Guardrails

The Guardrail node provides a configurable safety layer for AI agent workflows. It can detect prompt injection, redact PII, filter harmful content, validate schemas, and detect semantic manipulation -- all before data reaches your AI models or leaves your system.

Prompt Injection Detection

NodeLoom detects prompt injection attempts using a multi-layered approach. Suspicious inputs are identified and flagged before they reach downstream AI nodes.

The detection system matches known injection patterns (e.g., "ignore previous instructions," "you are now," role-switching phrases) and scores suspicious inputs. Inputs exceeding the configurable threshold are blocked or flagged.

Example guardrail configuration
{
  "type": "GUARDRAIL",
  "parameters": {
    "checks": ["PROMPT_INJECTION", "PII_REDACTION"],
    "severity": "BLOCK"
  }
}

PII Redaction

The PII redaction engine identifies and masks personally identifiable information before it is processed by AI models or stored in logs. The following data types are detected:

  • Email addresses: Detected and replaced with a redaction marker.
  • Social Security Numbers: US SSN patterns are detected and redacted.
  • Credit card numbers: Common card number formats are detected and redacted.
  • Phone numbers: International and domestic phone formats are detected and redacted.
  • API keys: Common API key patterns are detected and redacted.
  • JWT tokens: JWT patterns are detected and redacted.
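
As an illustration, here is a hypothetical input and the kind of redacted output the engine produces; the exact redaction marker format is an assumption and may differ from what NodeLoom emits.

{
  "input": "Reach me at jane.doe@example.com or +1 555 0100, card 4111 1111 1111 1111.",
  "redacted": "Reach me at [EMAIL_REDACTED] or [PHONE_REDACTED], card [CREDIT_CARD_REDACTED]."
}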

Credential response masking

In addition to PII redaction, NodeLoom automatically masks credential values (API keys, tokens, passwords) in all AI model responses. This prevents accidental leakage of secrets through generated text.
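
For example, if a generated response were to echo a secret, the masked output might look like the following; the masking marker and field names here are illustrative assumptions, not a documented format.

{
  "modelResponse": "Use the key sk-live-1234abcd to call the billing API.",
  "maskedResponse": "Use the key ****MASKED**** to call the billing API."
}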

Content Filtering

Content filtering uses a two-tier approach to detect harmful or inappropriate content:

  • OpenAI Moderation API: When configured, inputs are sent to the OpenAI moderation endpoint for classification across categories like hate speech, self-harm, violence, and sexual content.
  • Local pattern matching: A built-in set of patterns provides baseline content filtering without any external API dependency. This acts as a fallback or supplementary layer.
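
A sketch of enabling content filtering with the parameter shape from the earlier example; the CONTENT_FILTER check name and the contentFilter block with its provider field are assumptions shown for illustration, not confirmed parameter names.

{
  "type": "GUARDRAIL",
  "parameters": {
    "checks": ["CONTENT_FILTER"],
    "severity": "LOG",
    "contentFilter": {
      "provider": "OPENAI_MODERATION"
    }
  }
}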

JSON Schema Validation

The Guardrail node can validate AI model outputs against a JSON Schema definition. This ensures that structured data returned by AI models conforms to your expected format before it is passed to downstream nodes.

Schema validation example
{
  "schemaValidation": {
    "type": "object",
    "required": ["sentiment", "confidence"],
    "properties": {
      "sentiment": {
        "type": "string",
        "enum": ["positive", "negative", "neutral"]
      },
      "confidence": {
        "type": "number",
        "minimum": 0,
        "maximum": 1
      }
    }
  }
}
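
For reference, a model output that satisfies this schema looks like the following:

{
  "sentiment": "positive",
  "confidence": 0.92
}

An output missing either required field, using a sentiment value outside the enum, or reporting a confidence outside the 0 to 1 range fails validation and is handled according to the configured severity.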

Semantic Similarity Detection

This check uses embedding-based comparison to detect semantic manipulation attempts. NodeLoom maintains reference embeddings for known safe prompts and various categories of malicious input.

Incoming prompts are embedded and compared against these reference vectors. High similarity to known attack patterns triggers the configured severity action.
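
A configuration sketch for this check is shown below; the SEMANTIC_SIMILARITY check name and the similarityThreshold field are assumptions used for illustration and may not match the actual parameter names.

{
  "type": "GUARDRAIL",
  "parameters": {
    "checks": ["SEMANTIC_SIMILARITY"],
    "severity": "WARNING",
    "similarityThreshold": 0.85
  }
}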

SQL Injection Prevention

When AI agents generate SQL queries, the Guardrail node provides structural SQL injection prevention. Rather than relying solely on string matching, queries are analyzed for suspicious structures.

  • Structural analysis: Detects injected clauses, tautology conditions, and stacked queries.
  • Comment stripping: Removes SQL comments that are commonly used to hide injection payloads.
  • Whitelist/blacklist: Configure allowed tables, columns, and operations. Queries referencing unauthorized resources are blocked.
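
As a sketch of the whitelist approach, one possible configuration shape is shown below; the SQL_INJECTION check name and the sqlPolicy block with its field names are assumptions for illustration, not documented parameters.

{
  "type": "GUARDRAIL",
  "parameters": {
    "checks": ["SQL_INJECTION"],
    "severity": "BLOCK",
    "sqlPolicy": {
      "allowedTables": ["orders", "customers"],
      "allowedColumns": ["id", "status", "created_at"],
      "allowedOperations": ["SELECT"]
    }
  }
}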

Custom Guardrail Rules

Beyond the built-in checks, you can define custom guardrail rules using four different rule types:

  • REGEX: Match input against a regular expression pattern. Example use case: block messages containing base64-encoded data.
  • KEYWORD_LIST: Check for the presence of specific keywords or phrases. Example use case: flag messages mentioning competitor names.
  • JAVASCRIPT: Run a custom JavaScript function that returns pass/fail. Example use case: validate custom business logic or data formats.
  • LLM_PROMPT: Use an LLM to evaluate the input against a natural language rule. Example use case: check if a response aligns with your brand voice.
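
A sketch of a custom REGEX rule matching the first use case above; the customRules array and its field names are assumptions shown for illustration.

{
  "type": "GUARDRAIL",
  "parameters": {
    "severity": "BLOCK",
    "customRules": [
      {
        "type": "REGEX",
        "pattern": "[A-Za-z0-9+/]{40,}={0,2}",
        "description": "Block long base64-like strings"
      }
    ]
  }
}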

Severity Levels

Each guardrail check can be configured with one of three severity levels that determine what happens when a violation is detected:

  • LOG: The violation is recorded in the execution log but processing continues normally. Use this for monitoring without disruption.
  • WARNING: The violation is logged and a warning is attached to the node output. Downstream nodes can inspect the warning and decide how to proceed.
  • BLOCK: The execution is halted immediately. The workflow returns an error with details about the guardrail violation.

Layered defense

For production AI agents, combine multiple guardrail checks at different severity levels. For example, use LOG for content filtering, WARNING for PII detection, and BLOCK for prompt injection.
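
Using the parameter shape from the earlier examples, that layered setup can be expressed as three Guardrail nodes, each running a single check at its own severity. This sketch assumes checks are split across separate nodes; the CONTENT_FILTER check name is an assumption, while PROMPT_INJECTION and PII_REDACTION come from the configuration example above.

[
  {
    "type": "GUARDRAIL",
    "parameters": { "checks": ["CONTENT_FILTER"], "severity": "LOG" }
  },
  {
    "type": "GUARDRAIL",
    "parameters": { "checks": ["PII_REDACTION"], "severity": "WARNING" }
  },
  {
    "type": "GUARDRAIL",
    "parameters": { "checks": ["PROMPT_INJECTION"], "severity": "BLOCK" }
  }
]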