# Guardrails
The Guardrail node provides a configurable safety layer for AI agent workflows. It can detect prompt injection, redact PII, filter harmful content, validate schemas, and detect semantic manipulation, all before data reaches your AI models or leaves your system.
## Prompt Injection Detection
NodeLoom detects prompt injection attempts using a multi-layered approach. Suspicious inputs are identified and flagged before they reach downstream AI nodes.
The detection system matches known injection patterns (e.g., "ignore previous instructions," "you are now," role-switching phrases) and scores suspicious inputs. Inputs exceeding a configurable threshold are blocked or flagged, as in the configuration below:
```json
{
  "type": "GUARDRAIL",
  "parameters": {
    "checks": ["PROMPT_INJECTION", "PII_REDACTION"],
    "severity": "BLOCK"
  }
}
```

## PII Redaction
The PII redaction engine identifies and masks personally identifiable information before it is processed by AI models or stored in logs. The following data types are detected:
| PII Type | Description |
|---|---|
| Email addresses | Detected and replaced with a redaction marker. |
| Social Security Numbers | US SSN patterns are detected and redacted. |
| Credit card numbers | Common card number formats are detected and redacted. |
| Phone numbers | International and domestic phone formats are detected and redacted. |
| API keys | Common API key patterns are detected and redacted. |
| JWT tokens | JWT patterns are detected and redacted. |
**Credential response masking:** API keys and JWT tokens are also masked when they appear in responses, before they are stored in logs.
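For illustration, a redaction-focused configuration might look like the sketch below. The `piiTypes` and `redactionMarker` parameter names are assumptions for this example, not confirmed NodeLoom fields:

```json
{
  "type": "GUARDRAIL",
  "parameters": {
    "checks": ["PII_REDACTION"],
    "severity": "LOG",
    "piiTypes": ["EMAIL", "SSN", "CREDIT_CARD", "PHONE"],
    "redactionMarker": "[REDACTED]"
  }
}
```

Pairing redaction with `LOG` severity records each detection without halting the workflow, which suits monitoring-first rollouts.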
## Content Filtering
Content filtering uses a two-tier approach to detect harmful or inappropriate content:
- OpenAI Moderation API: When configured, inputs are sent to the OpenAI moderation endpoint for classification across categories like hate speech, self-harm, violence, and sexual content.
- Local pattern matching: A built-in set of patterns provides baseline content filtering without any external API dependency. This acts as a fallback or supplementary layer.
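As a sketch, the two tiers might be selected per node. The `contentFilter` object and its field names are illustrative assumptions rather than documented parameters:

```json
{
  "type": "GUARDRAIL",
  "parameters": {
    "checks": ["CONTENT_FILTER"],
    "severity": "WARNING",
    "contentFilter": {
      "provider": "OPENAI_MODERATION",
      "fallback": "LOCAL_PATTERNS"
    }
  }
}
```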
## JSON Schema Validation
The Guardrail node can validate AI model outputs against a JSON Schema definition. This ensures that structured data returned by AI models conforms to your expected format before it is passed to downstream nodes. For example, the following schema requires a sentiment label and a bounded confidence score:
```json
{
  "schemaValidation": {
    "type": "object",
    "required": ["sentiment", "confidence"],
    "properties": {
      "sentiment": {
        "type": "string",
        "enum": ["positive", "negative", "neutral"]
      },
      "confidence": {
        "type": "number",
        "minimum": 0,
        "maximum": 1
      }
    }
  }
}
```

## Semantic Similarity Detection
This check uses embedding-based comparison to detect semantic manipulation attempts. NodeLoom maintains reference embeddings for known safe prompts and various categories of malicious input.
Incoming prompts are embedded and compared against these reference vectors. High similarity to known attack patterns triggers the configured severity action.
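A hedged sketch of enabling this check follows; the `similarityThreshold` parameter name and its 0-1 scale are assumptions for illustration:

```json
{
  "type": "GUARDRAIL",
  "parameters": {
    "checks": ["SEMANTIC_SIMILARITY"],
    "severity": "BLOCK",
    "similarityThreshold": 0.85
  }
}
```

Lowering the threshold catches more paraphrased attacks at the cost of more false positives.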
## SQL Injection Prevention
When AI agents generate SQL queries, the Guardrail node provides structural SQL injection prevention. Rather than relying solely on string matching, queries are analyzed for suspicious structures.
- Structural analysis: Detects injected clauses, tautology conditions, and stacked queries.
- Comment stripping: Removes SQL comments that are commonly used to hide injection payloads.
- Whitelist/blacklist: Configure allowed tables, columns, and operations. Queries referencing unauthorized resources are blocked (see the sketch below).
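A sketch of such a configuration; the `sqlGuard`, `allowedTables`, and `allowedOperations` field names are assumptions chosen for this example:

```json
{
  "type": "GUARDRAIL",
  "parameters": {
    "checks": ["SQL_INJECTION"],
    "severity": "BLOCK",
    "sqlGuard": {
      "allowedTables": ["orders", "customers"],
      "allowedOperations": ["SELECT"]
    }
  }
}
```

Under this sketch, a generated query that selects from any other table, or that attempts an UPDATE or DELETE, would be blocked.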
## Custom Guardrail Rules
Beyond the built-in checks, you can define custom guardrail rules using four different rule types:
| Rule Type | Description | Example Use Case |
|---|---|---|
| REGEX | Match input against a regular expression pattern. | Block messages containing base64-encoded data. |
| KEYWORD_LIST | Check for the presence of specific keywords or phrases. | Flag messages mentioning competitor names. |
| JAVASCRIPT | Run a custom JavaScript function that returns pass/fail. | Validate custom business logic or data formats. |
| LLM_PROMPT | Use an LLM to evaluate the input against a natural language rule. | Check if a response aligns with your brand voice. |
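For illustration, a REGEX rule implementing the base64 example from the table might be declared as follows. The `customRules` array and its field names are assumptions, not a confirmed schema:

```json
{
  "customRules": [
    {
      "type": "REGEX",
      "pattern": "[A-Za-z0-9+/]{40,}={0,2}",
      "description": "Block long base64-looking payloads",
      "severity": "BLOCK"
    }
  ]
}
```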
## Severity Levels
Each guardrail check can be configured with one of three severity levels that determine what happens when a violation is detected:
| Severity | Behavior |
|---|---|
| LOG | The violation is recorded in the execution log but processing continues normally. Use this for monitoring without disruption. |
| WARNING | The violation is logged and a warning is attached to the node output. Downstream nodes can inspect the warning and decide how to proceed. |
| BLOCK | The execution is halted immediately. The workflow returns an error with details about the guardrail violation. |
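If different checks warrant different severities, a per-check form like the sketch below is one plausible shape; the object entries in `checks` are an assumption for illustration:

```json
{
  "type": "GUARDRAIL",
  "parameters": {
    "checks": [
      { "type": "PROMPT_INJECTION", "severity": "BLOCK" },
      { "type": "PII_REDACTION", "severity": "LOG" }
    ]
  }
}
```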
**Layered defense:** no single check catches everything. Running multiple checks together, such as prompt injection detection alongside PII redaction, content filtering, and schema validation, lets each layer catch what the others miss.