Creating Detection Rules
Detection rules are the core components of guardrails that identify potential risks in AI interactions. This guide explains how to create and manage detection rules in Agent Ops Director.
Understanding Detection Rules
Detection rules define specific patterns or conditions that, when matched in AI request or response content, trigger a guardrail event. Rules can be organized into categories and have configurable severity levels.
Each rule consists of:
- Pattern: A regular expression or matching condition
- Location: Where to apply the rule (request body, headers, etc.)
- Event Name: Identifier for the triggered event
- Severity: Impact level (Low, Medium, High)
Creating a New Detection Rule
To create a new detection rule:
- Navigate to the Guardrail Studio section
- Select a guardrail to edit or create a new one
- Go to the Detection Rules tab
- Click + Add Detection Rule
- Configure the rule details:
- Enter a descriptive name
- Select or create a rule category
- Define the pattern using regular expressions
- Specify where to apply the rule (location)
- Set the event name and severity
Using Rule Categories
Rules are organized into categories for easier management. Default categories include:
- Security: Rules for detecting security risks like prompt injections
- Regulatory: Rules related to compliance requirements
- Operational: Rules for operational concerns like model versioning
To work with categories:
- When creating/editing a rule, select an existing category or create a new one
- Use the Group by: Category dropdown in the Detection Rules view to organize rules by category
- Filter rules by category using the filter options
Pattern Matching Examples
Effective rules rely on well-crafted patterns. Here are some examples:
Email Detection
[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}
Credit Card Detection
4[0-9]{3}(?:[ -]?[0-9]{4}){3}|5[1-5][0-9]{2}(?:[ -]?[0-9]{4}){3}
Prompt Injection Detection
(ignore|disregard).+(previous|above).+(instructions|prompt)
Setting Rule Severity
Each rule should have an appropriate severity level:
- Low: Minor risks with limited impact
- Medium: Moderate risks that may affect operations
- High: Significant risks requiring immediate attention
The severity level determines how the rule behaves in different enforcement modes and may affect alerting thresholds.
Testing Detection Rules
After creating a rule:
- Go to the Tests tab in the guardrail details view
- Create test cases with sample inputs that should trigger your rule
- Run tests to validate the rule's effectiveness
- Adjust patterns as needed based on test results
For more details on testing, see Testing Guardrails.
Best Practices for Detection Rules
- Start Specific: Begin with precise patterns and broaden as needed
- Consider False Positives: Balance detection coverage with false positive risk
- Use Rule Comments: Add comments to document complex patterns
- Regular Updates: Review and update rules as new risk patterns emerge
- Consistent Naming: Use clear, consistent naming conventions for rules and events
Next Steps
- Learn how to test your guardrails thoroughly
- Understand how to handle false positives
- Explore monitoring guardrail analytics for rule performance