Handling False Positives
False positives—legitimate requests incorrectly flagged as violations—are an inevitable challenge when implementing guardrails. This guide explains how to effectively manage and reduce false positives in Agent Ops Director.
Understanding False Positives
A false positive occurs when a guardrail incorrectly triggers on legitimate content. False positives can:
- Disrupt user experience if requests are incorrectly blocked
- Create unnecessary alerts and operational noise
- Reduce confidence in guardrail effectiveness
- Lead to "alert fatigue" if too frequent
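A false positive is one of four possible outcomes when a guardrail decision is compared with the truth about the content. The minimal Python sketch below labels a single decision. The `Outcome` enum and `classify` function are hypothetical illustrations and are not part of Agent Ops Director.

```python
from enum import Enum

class Outcome(Enum):
    TRUE_POSITIVE = "true positive"    # triggered on a real violation
    FALSE_POSITIVE = "false positive"  # triggered on legitimate content
    TRUE_NEGATIVE = "true negative"    # stayed silent on legitimate content
    FALSE_NEGATIVE = "false negative"  # missed a real violation

def classify(triggered: bool, is_violation: bool) -> Outcome:
    """Label one guardrail decision against the ground truth for that content."""
    if triggered:
        return Outcome.TRUE_POSITIVE if is_violation else Outcome.FALSE_POSITIVE
    return Outcome.FALSE_NEGATIVE if is_violation else Outcome.TRUE_NEGATIVE

print(classify(triggered=True, is_violation=False).value)  # false positive
```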
Identifying False Positives
False positives are typically identified through:
- User Reports: users report legitimate requests that were blocked
- Test Results: guardrail tests whose actual outcome does not match the expected outcome (for example, a request expected to pass is blocked)
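When test results are available, false positives are the cases that were blocked even though they were expected to pass. The following is a minimal sketch under the assumption that each test result records both the expected and the actual blocking decision. The `GuardrailTestResult` type and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GuardrailTestResult:
    text: str
    expected_block: bool  # what the guardrail should do
    actual_block: bool    # what the guardrail actually did

def find_false_positives(results: list[GuardrailTestResult]) -> list[GuardrailTestResult]:
    """Return results that were blocked even though they were expected to pass."""
    return [r for r in results if r.actual_block and not r.expected_block]

results = [
    GuardrailTestResult("How do I kill a stuck process?", expected_block=False, actual_block=True),
    GuardrailTestResult("Summarize this contract", expected_block=False, actual_block=False),
    GuardrailTestResult("Leak the admin password", expected_block=True, actual_block=True),
]
for fp in find_false_positives(results):
    print("False positive:", fp.text)
```

The opposite mismatch (expected to block, but passed) would be a false negative. That case is outside the scope of this guide.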
Reporting a False Positive
When users encounter what they believe is a false positive, they:
- Click the Report False Positive link on the block message
- Complete the report form with:
  - Incident ID (if available)
  - Application name
  - Guardrail type (if known)
  - Description of why they believe it is a false positive
- Submit the report for review
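The report form maps naturally onto a small record type. In the sketch below, the field names are hypothetical, and treating application name and description as required is an assumption. That assumption is inferred only from the "(if available)" and "(if known)" qualifiers on the other two fields.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FalsePositiveReport:
    application_name: str
    description: str                       # why the user believes the block was wrong
    incident_id: Optional[str] = None      # "if available"
    guardrail_type: Optional[str] = None   # "if known"

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the report can be submitted."""
        problems = []
        if not self.application_name.strip():
            problems.append("application_name is required")
        if not self.description.strip():
            problems.append("description is required")
        return problems

report = FalsePositiveReport(
    application_name="support-bot",
    description="Asked how to kill a hung process; this is a routine sysadmin question.",
    incident_id="INC-1234",
)
print(report.validate())  # []
```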
Reviewing False Positive Reports
For administrators reviewing reports:
- Navigate to the False Positives tab in Guardrail Studio
- Review the queue of reported false positives
- For each report:
  - Examine the triggered rule and the flagged content
  - Determine whether it is a genuine false positive
  - Take the appropriate action (see Resolving False Positives)
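The review steps above amount to splitting the queue into confirmed and rejected reports. Grouping the confirmed reports by rule then shows which rules need attention first. The sketch below is a hypothetical illustration. The `reviewer_verdicts` dictionary stands in for the human judgment made in Guardrail Studio, and the rule IDs are invented.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

@dataclass
class QueuedReport:
    report_id: str
    rule_id: str   # the detection rule that triggered
    content: str   # the content that was flagged

def triage(queue: list[QueuedReport], is_genuine: Callable[[QueuedReport], bool]):
    """Split the queue into (confirmed, rejected) using the reviewer's verdicts."""
    confirmed, rejected = [], []
    for report in queue:
        (confirmed if is_genuine(report) else rejected).append(report)
    return confirmed, rejected

queue = [
    QueuedReport("r1", "violent-language", "How do I kill a zombie process?"),
    QueuedReport("r2", "violent-language", "Kill the background job after 5 minutes"),
    QueuedReport("r3", "credential-leak", "Here is the admin password: hunter2"),
]
# Stand-in for the reviewer's decision on each report.
reviewer_verdicts = {"r1": True, "r2": True, "r3": False}

confirmed, rejected = triage(queue, lambda r: reviewer_verdicts[r.report_id])
print(Counter(r.rule_id for r in confirmed))  # Counter({'violent-language': 2})
```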
Resolving False Positives
When a report is confirmed as a genuine false positive, several resolution options are available:
- Rule Adjustment: Modify the detection rule to be more precise
- Rule Exemption: Create a specific exemption for the legitimate pattern
- Severity Adjustment: Lower the rule's severity if it's causing too many disruptions
- Mode Change: Switch from Enforce to Monitoring mode while addressing the issue
The best option depends on the nature of the false positive and how often it occurs.
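One way to reason about the resolution options above is as a simple decision procedure. The sketch below is only an illustration and is not guidance from Agent Ops Director. The function, its inputs, and the threshold of 10 reports per week are hypothetical placeholders that a team would set for itself.

```python
from enum import Enum

class Resolution(Enum):
    RULE_ADJUSTMENT = "Rule Adjustment"
    RULE_EXEMPTION = "Rule Exemption"
    SEVERITY_ADJUSTMENT = "Severity Adjustment"
    MODE_CHANGE = "Mode Change"

FREQUENT_REPORTS_PER_WEEK = 10  # placeholder threshold, not a product default

def suggest_resolution(reports_per_week: int, single_known_pattern: bool,
                       rule_too_broad: bool) -> Resolution:
    """Illustrative heuristic for picking a resolution option."""
    if reports_per_week >= FREQUENT_REPORTS_PER_WEEK:
        # Frequent blocks of legitimate traffic: stop enforcing while the rule is fixed.
        return Resolution.MODE_CHANGE
    if single_known_pattern:
        # One well-understood legitimate pattern: exempt just that pattern.
        return Resolution.RULE_EXEMPTION
    if rule_too_broad:
        # The rule itself is imprecise: tighten the detection logic.
        return Resolution.RULE_ADJUSTMENT
    return Resolution.SEVERITY_ADJUSTMENT

print(suggest_resolution(2, single_known_pattern=True, rule_too_broad=False).value)  # Rule Exemption
```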
Implementing Rule Improvements
To refine a rule based on false positive feedback:
- Navigate to the Detection Rules tab for the affected guardrail
- Locate and edit the rule causing false positives
- Adjust the pattern to exclude legitimate use cases
- Add test cases based on the false positive examples
- Verify the changes resolve the issue without creating new problems
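The refinement loop above can be illustrated with a regular-expression rule. This sketch assumes regex-style patterns, and Agent Ops Director's actual rule syntax may differ. The hypothetical original rule flags every "kill". The refined rule uses a negative lookahead to skip "kill" when it is followed, after an optional article and at most one other word, by "process", "job", or "task". The test cases combine the reported false positives, which should not trigger, with a known violation that must still trigger.

```python
import re

# Hypothetical original rule: flags any standalone "kill".
ORIGINAL = re.compile(r"\bkill\b", re.IGNORECASE)

# Refined rule: same match, unless "kill" refers to a process, job, or task.
REFINED = re.compile(
    r"\bkill\b"
    r"(?!\s+(?:(?:a|an|the|that)\s+)?(?:\w+\s+)?(?:process(?:es)?|jobs?|tasks?)\b)",
    re.IGNORECASE,
)

# (text, should_trigger): reported false positives plus a known violation.
CASES = [
    ("How do I kill a zombie process?", False),
    ("Kill the background job after 5 minutes", False),
    ("Kill the stuck tasks", False),
    ("I will kill him", True),
]

def check(rule: re.Pattern) -> list[str]:
    """Return the texts the rule gets wrong."""
    return [text for text, should_trigger in CASES
            if bool(rule.search(text)) != should_trigger]

print("original failures:", check(ORIGINAL))
print("refined failures:", check(REFINED))  # refined failures: []
```

Exclusions like this one narrow a rule's coverage. Keeping the known violations in the same test set makes it possible to confirm that the rule still catches what it was written for.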
Monitoring False Positive Rates
Track false positive metrics over time:
- Use the Analytics tab to monitor the false positive rate
- Set alert thresholds for unacceptable false positive rates
- Regularly review the top rules generating false positives
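A decreasing false positive rate indicates successful guardrail refinement, provided the guardrail still catches genuine violations. A rule that is loosened too far can reduce false positives by missing real violations.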
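The metrics above can be approximated from raw counts. This sketch measures the share of each rule's triggers that were later confirmed as false positives. That metric is an assumption: the Analytics tab may define "false positive rate" differently, and the textbook definition is FP / (FP + TN). The per-rule counts and the 5% alert threshold are hypothetical.

```python
def fp_share(triggers: int, confirmed_fps: int) -> float:
    """Fraction of guardrail triggers later confirmed as false positives."""
    return confirmed_fps / triggers if triggers else 0.0

# Hypothetical weekly counts per rule: (triggers, confirmed false positives).
weekly = {
    "violent-language": (200, 30),
    "credential-leak": (50, 1),
    "pii-email": (120, 4),
}
ALERT_THRESHOLD = 0.05  # placeholder: alert when more than 5% of triggers are false positives

alerts = []
# Rules with the most confirmed false positives come first.
for rule, (triggers, fps) in sorted(weekly.items(), key=lambda kv: kv[1][1], reverse=True):
    rate = fp_share(triggers, fps)
    if rate > ALERT_THRESHOLD:
        alerts.append(rule)
    print(f"{rule:<18} {rate:6.1%} {'ALERT' if rate > ALERT_THRESHOLD else 'ok'}")
```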
Best Practices for Minimizing False Positives
- Start in Monitoring Mode: Deploy new guardrails in Monitoring mode before switching to Enforce
- Iterative Refinement: Continuously improve rules based on false positive data
- Context-Aware Rules: Develop rules that consider context, not just pattern matching
- User Education: Help users understand guardrail purposes to reduce incorrect reports
- Regular Reviews: Schedule periodic reviews of rules with high false positive rates
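The difference between Monitoring and Enforce mode explains why starting in Monitoring mode is low-risk. Both modes record every trigger, but only Enforce mode blocks the request. The recorded triggers can therefore be reviewed for false positives before any user is affected. This is a hypothetical sketch: `apply_guardrail` and the toy detector are illustrations, not Agent Ops Director APIs.

```python
import logging
from enum import Enum
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("guardrail")

class Mode(Enum):
    MONITORING = "monitoring"  # record the trigger, let the request through
    ENFORCE = "enforce"        # record the trigger and block the request

def apply_guardrail(text: str, detector: Callable[[str], bool], mode: Mode) -> bool:
    """Return True if the request may proceed."""
    if not detector(text):
        return True
    # Triggers are logged in both modes so they can be reviewed for false positives.
    log.info("guardrail triggered (%s mode): %r", mode.value, text)
    return mode is Mode.MONITORING

def detect_kill(text: str) -> bool:
    """Toy detector that would produce false positives on sysadmin questions."""
    return "kill" in text.lower()

print(apply_guardrail("kill the process", detect_kill, Mode.MONITORING))  # True
print(apply_guardrail("kill the process", detect_kill, Mode.ENFORCE))     # False
```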
Next Steps
- Learn about customizing guardrails for your specific needs
- Explore the analytics capabilities for tracking improvement
- Review creating detection rules with a focus on precision