Red Team Testing
Red team testing lets you proactively identify vulnerabilities in your AI workflows by running automated adversarial attacks. NodeLoom simulates real-world attack scenarios to evaluate how well your guardrails, prompts, and configurations resist manipulation.
Overview
AI workflows face unique security challenges. Adversaries can craft inputs designed to bypass safety measures, extract sensitive data, or cause the model to produce harmful content. Red team testing automates the process of discovering these weaknesses before attackers do.
A red team scan runs a suite of adversarial test cases against your workflow and produces a vulnerability report with a resilience score. Use these findings to strengthen your guardrails and refine your prompts.
Attack Categories
Red team scans test your workflow against multiple categories of adversarial attacks:
| Category | Description |
|---|---|
| Prompt Injection | Attempts to override the system prompt or inject new instructions into user input, causing the model to ignore its original instructions. |
| Jailbreak | Techniques designed to bypass the model's safety training and content policies, causing it to produce responses it would normally refuse. |
| Data Exfiltration | Attempts to extract sensitive information such as system prompts, training data, API keys, or credentials from the model's context. |
| Hallucination Probing | Inputs designed to cause the model to generate false, misleading, or fabricated information presented as fact. |
| Bias Testing | Evaluates whether the model produces biased, discriminatory, or unfair outputs when presented with inputs related to protected characteristics. |
| Compliance Violation | Tests whether the model can be induced to generate content that violates regulatory requirements or organizational policies. |
Running a Scan
To run a red team scan against a workflow:
- Open the workflow you want to test and navigate to the Security tab.
- Click Run Red Team Scan to open the configuration dialog.
- Select which attack categories to include in the scan. By default, all categories are enabled.
- Optionally configure the number of test cases per category and the intensity level (low, medium, high).
- Click Start Scan to begin. The scan runs in the background and you will be notified when it completes.
Scans typically take a few minutes depending on the number of categories and test cases selected.
Understanding Results
After a scan completes, you receive a vulnerability report containing:
| Component | Description |
|---|---|
| Resilience Score | An overall score from 1 (highly vulnerable) to 5 (highly resilient). This represents how well your workflow withstood the adversarial attacks across all categories. |
| Category Breakdown | Individual scores for each attack category, showing which areas are strong and which need improvement. |
| Findings | Detailed list of successful attacks, including the input used, the model's response, and which safety measure (if any) was bypassed. |
| Recommendations | Suggested guardrail configurations, prompt modifications, or other changes to address identified vulnerabilities. |
Sensitive results