Operationalizing guardrails after launch
The controls we ship so copilots stay safe once real data and edge cases arrive.
Guardrails rarely fail in staging - they fail when the model meets real-world prompts, strange files, and power users. We keep guardrails alive by treating them like product features, not one-time checklists.
Map the risky behaviors to signals
We list risky behaviors in plain language - PII leaks, off-policy actions, hallucinated citations - and assign signals we can actually measure. Examples:
- PII leaks → detector confidence, number of redactions, destination system.
- Off-policy actions → allowlist match rate, audit log diff from baseline workflows.
- Hallucinated citations → citation coverage %, unresolved source links per answer.
Each signal gets a threshold and an owner. If a signal cannot be measured, we redesign the workflow until it can.
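The mapping above can be sketched as a small config plus a check. A minimal sketch in TypeScript; the signal names, values, and thresholds here are illustrative assumptions, not our production config:

```typescript
// Each signal carries a threshold, a direction, and an owner.
// direction 'above' means the signal fires when value >= threshold;
// 'below' means it fires when value < threshold (e.g. coverage dropping).
type Direction = 'above' | 'below';

interface Signal {
  value: number;      // latest measured value
  threshold: number;  // agreed limit with the owner
  direction: Direction;
  owner: string;      // team paged when the signal fires
}

// Illustrative values only.
const signals: Record<string, Signal> = {
  piiDetectorConfidence: { value: 0.72, threshold: 0.65, direction: 'above', owner: 'security' },
  citationCoverage:      { value: 0.95, threshold: 0.90, direction: 'below', owner: 'platform' },
};

function breached(s: Signal): boolean {
  return s.direction === 'above' ? s.value >= s.threshold : s.value < s.threshold;
}

// Returns the names of signals currently firing.
function firing(all: Record<string, Signal>): string[] {
  return Object.keys(all).filter((name) => breached(all[name]));
}
```

The direction field matters: a PII detector fires when confidence is high, while citation coverage fires when it falls too low, so a single comparison operator cannot cover both.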
Build a small review lane instead of blocking
Pure blocking leads to brittle experiences. We route uncertain events to a review lane:
```typescript
// Thresholds, reviewer groups, and actions below are illustrative.
export const reviewLane = {
  pii:       { threshold: 0.65, reviewers: ['security'], action: 'auto-redact' },
  citations: { threshold: 0.90, reviewers: ['content'],  action: 'hold-for-review' },
};
```
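A minimal routing sketch, assuming a review-lane config shaped like the one above (the lane names, thresholds, and actions are hypothetical placeholders):

```typescript
interface Lane {
  threshold: number;
  reviewers: string[];
  action: string;
}

// Illustrative lane config.
const reviewLane: Record<string, Lane> = {
  pii:       { threshold: 0.65, reviewers: ['security'], action: 'auto-redact' },
  citations: { threshold: 0.90, reviewers: ['content'],  action: 'hold-for-review' },
};

// Route an event: scores below the lane's threshold pass through untouched;
// scores at or above it go to the lane's reviewers instead of being blocked.
function route(kind: string, score: number): { action: string; reviewers: string[] } {
  const lane = reviewLane[kind];
  if (!lane || score < lane.threshold) {
    return { action: 'pass', reviewers: [] };
  }
  return { action: lane.action, reviewers: lane.reviewers };
}
```

The key design choice is that an uncertain event never hard-fails the user: it either passes or lands with a reviewer, which keeps the experience usable while the thresholds are still being tuned.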