Red Team Testing

Red team testing lets you proactively identify vulnerabilities in your AI workflows by running automated adversarial attacks. NodeLoom simulates real-world attack scenarios to evaluate how well your guardrails, prompts, and configurations resist manipulation.

Overview

AI workflows face unique security challenges. Adversaries can craft inputs designed to bypass safety measures, extract sensitive data, or cause the model to produce harmful content. Red team testing automates the process of discovering these weaknesses before attackers do.

A red team scan runs a suite of adversarial test cases against your workflow and produces a vulnerability report with a resilience score. Use these findings to strengthen your guardrails and refine your prompts.

Attack Categories

Red team scans test your workflow against multiple categories of adversarial attacks:

CategoryDescription
Prompt InjectionAttempts to override the system prompt or inject new instructions into user input, causing the model to ignore its original instructions.
JailbreakTechniques designed to bypass the model's safety training and content policies, causing it to produce responses it would normally refuse.
Data ExfiltrationAttempts to extract sensitive information such as system prompts, training data, API keys, or credentials from the model's context.
Hallucination ProbingInputs designed to cause the model to generate false, misleading, or fabricated information presented as fact.
Bias TestingEvaluates whether the model produces biased, discriminatory, or unfair outputs when presented with inputs related to protected characteristics.
Compliance ViolationTests whether the model can be induced to generate content that violates regulatory requirements or organizational policies.

Running a Scan

To run a red team scan against a workflow:

  1. Open the workflow you want to test and navigate to the Security tab.
  2. Click Run Red Team Scan to open the configuration dialog.
  3. Select which attack categories to include in the scan. By default, all categories are enabled.
  4. Optionally configure the number of test cases per category and the intensity level (low, medium, high).
  5. Click Start Scan to begin. The scan runs in the background and you will be notified when it completes.

Scans typically take a few minutes depending on the number of categories and test cases selected.

Understanding Results

After a scan completes, you receive a vulnerability report containing:

ComponentDescription
Resilience ScoreAn overall score from 1 (highly vulnerable) to 5 (highly resilient). This represents how well your workflow withstood the adversarial attacks across all categories.
Category BreakdownIndividual scores for each attack category, showing which areas are strong and which need improvement.
FindingsDetailed list of successful attacks, including the input used, the model's response, and which safety measure (if any) was bypassed.
RecommendationsSuggested guardrail configurations, prompt modifications, or other changes to address identified vulnerabilities.

Sensitive results

Red team results contain sensitive information about your workflow vulnerabilities. Restrict access using RBAC to ensure only authorized team members can view scan results.