Real-Time LLM Protection — Monitor and Block Attacks in Production
Guide · 2023-04-18 · 14 min read · FilterPrompt Security Team
How real-time LLM protection works, the architecture that holds up under load, latency budgets, what to monitor, and the production checklist for catching attacks the moment they happen.
Real-time LLM protection is the discipline of catching attacks against your LLM application the moment they happen — at the prompt, before the model responds, and at the response, before it reaches the user. Batch detection (logs reviewed nightly) is fine for forensics but useless against an active prompt injection that exfiltrates a user's data in seconds. This guide covers the architecture, the latency budget, the monitoring signals that matter, and the production playbook for shipping real-time LLM protection that actually holds up.
Why real-time matters
Two reasons. First, attack windows. A successful prompt injection needs only one shot to do something destructive: exfiltrate data, call an unauthorised tool, leak the system prompt. If you detect it in nightly log review, the damage is already done. Second, user experience. A real-time block returns a clean error to the user. With after-the-fact detection, the model's harmful output has already reached the user's screen, sits in your logs, and possibly ends up in downstream training data.
Real-time LLM protection also enables policy enforcement at the prompt level: refuse a request that violates topical policy (no medical advice, no legal advice, no competitor mentions) before you pay the inference cost. For high-volume applications this saves real money on top of the security benefit, because every request refused at the prompt is a model call you never make.
Architecture that holds up
A production-grade real-time LLM protection layer has four detection stages, ordered cheap to expensive so that a cheap detection short-circuits the request before the expensive stages run:
- Pattern rules first: microseconds, deterministic, catch known injection templates.
- Structural validation second: low milliseconds, JSON schema and allowlist checks.
- Semantic classifiers third: 10–60ms, transformer models that catch novel injection variants.
- Output-side checks fourth: run only on model responses, looking for exfiltration markdown, PII, and secrets.
Total median budget across all stages: under 100ms.
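The short-circuit ordering can be sketched in a few lines of Python. This is a minimal illustration, not a production rule set. It assumes a JSON request with hypothetical `user_message` and `conversation_id` fields. The two regexes, the key allowlist, the length cap, and the 0.8 threshold are illustrative values. `classify` is a placeholder for whatever semantic model you actually run. The output-side stage is left out because it inspects the response, not the prompt.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# A stage inspects the request and returns a block reason, or None to pass it on.
Stage = Callable[[dict], Optional[str]]


@dataclass
class Verdict:
    blocked: bool
    stage: Optional[str] = None
    reason: Optional[str] = None


# Stage 1: pattern rules. These two regexes are illustrative, not a real rule set.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]


def pattern_stage(request: dict) -> Optional[str]:
    message = request.get("user_message")
    if not isinstance(message, str):
        return None  # leave malformed input to the structural stage
    for pattern in INJECTION_PATTERNS:
        if pattern.search(message):
            return f"matched pattern {pattern.pattern!r}"
    return None


# Stage 2: structural validation against a hypothetical request schema.
ALLOWED_KEYS = {"user_message", "conversation_id"}
MAX_MESSAGE_CHARS = 8000  # assumed limit


def structural_stage(request: dict) -> Optional[str]:
    unexpected = set(request) - ALLOWED_KEYS
    if unexpected:
        return f"unexpected keys {sorted(unexpected)}"
    message = request.get("user_message")
    if not isinstance(message, str):
        return "user_message missing or not a string"
    if len(message) > MAX_MESSAGE_CHARS:
        return "user_message too long"
    return None


# Stage 3: semantic classifier. `classify` stands in for a real model call
# that returns an injection probability in [0, 1].
def make_semantic_stage(classify: Callable[[str], float], threshold: float = 0.8) -> Stage:
    def semantic_stage(request: dict) -> Optional[str]:
        # Safe to index: the structural stage already ran and validated the field.
        score = classify(request["user_message"])
        if score >= threshold:
            return f"classifier score {score:.2f} >= {threshold}"
        return None
    return semantic_stage


def inspect_prompt(request: dict, stages: list[tuple[str, Stage]]) -> Verdict:
    # Stages run cheapest first; the first block skips every later (costlier) stage.
    for name, stage in stages:
        reason = stage(request)
        if reason is not None:
            return Verdict(blocked=True, stage=name, reason=reason)
    return Verdict(blocked=False)
```

Wire it up as `[("pattern", pattern_stage), ("structural", structural_stage), ("semantic", make_semantic_stage(my_classifier))]`. A prompt caught by a regex never reaches the classifier, which is where the latency savings come from.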
The fail-open vs fail-closed decision
Every real-time protection layer must answer one question before it ships: when the protection layer itself fails (timeout, dependency outage, panic), does traffic pass through unchecked (fail-open) or get blocked (fail-closed)? Fail-open optimises availability and is correct for low-risk consumer chatbots. Fail-closed optimises security and is correct for healthcare, finance, and any application where a single unfiltered response is worse than an outage. The right answer is to make the choice configurable per tenant. FilterPrompt defaults to fail-open with a loud alert; tenants in regulated industries flip to fail-closed.
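A minimal Python sketch of a per-tenant failure policy follows. It is loosely modelled on the default described above but is not FilterPrompt's implementation. The tenant ID `acme-health`, the 100ms timeout, and the `inspect` callable are hypothetical. The "loud alert" is approximated with an error-level log line.

```python
import concurrent.futures
import logging
from enum import Enum
from typing import Callable

logger = logging.getLogger("llm_protection")


class FailureMode(Enum):
    FAIL_OPEN = "fail_open"      # pass traffic through unchecked when the layer fails
    FAIL_CLOSED = "fail_closed"  # block traffic when the layer fails


# Hypothetical per-tenant settings; tenants not listed get the default.
TENANT_FAILURE_MODE = {"acme-health": FailureMode.FAIL_CLOSED}
DEFAULT_FAILURE_MODE = FailureMode.FAIL_OPEN

# Shared pool so each request does not pay thread start-up cost.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def should_block(tenant_id: str, prompt: str,
                 inspect: Callable[[str], bool],
                 timeout_s: float = 0.1) -> bool:
    """Return True to block the request.

    `inspect` is a placeholder for the protection layer: it returns True
    when the prompt is an attack and may raise or hang when the layer fails.
    """
    future = _executor.submit(inspect, prompt)
    try:
        return future.result(timeout=timeout_s)
    except Exception as exc:  # timeout, dependency outage, or a crash in the layer
        mode = TENANT_FAILURE_MODE.get(tenant_id, DEFAULT_FAILURE_MODE)
        # Fail loudly either way; here "loud" is an error-level log line.
        logger.error("protection layer failed for tenant %s, applying %s: %r",
                     tenant_id, mode.value, exc)
        # Note: a timed-out call keeps running in its worker thread.
        return mode is FailureMode.FAIL_CLOSED
```

One design caveat: Python threads cannot be killed, so a hung classifier still occupies a worker after the timeout fires. A real deployment would need a timeout enforced by the classifier client itself or a separate process.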
What to monitor
- Block rate by rule — sudden spikes mean either an attack or a false-positive regression
- Median + p99 inspection latency — if p99 climbs past 300ms, streaming UX degrades
- Detection-layer health — each layer's error rate, separately. A semantic classifier outage should not silently fail-open the whole pipeline
- Verdict log volume — a sudden drop usually means an integration outage upstream, not that attacks stopped
- OWASP LLM Top 10 coverage by control — auditors will ask
- Per-tenant block rate — wildly different rates between similar tenants usually indicate a misconfigured policy
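The first and fourth signals reduce to comparing a current window against a baseline. Below is a minimal Python sketch. It assumes verdicts are already aggregated per rule into fixed windows (hourly, say). The 3x spike factor, the 50% volume-drop threshold, and the 10-block minimum are illustrative, not tuned values.

```python
from dataclasses import dataclass, field


@dataclass
class WindowStats:
    total_verdicts: int  # every inspection in the window, blocked or allowed
    blocks_by_rule: dict[str, int] = field(default_factory=dict)  # rule -> block count


def block_rate_alerts(baseline: WindowStats, current: WindowStats,
                      spike_factor: float = 3.0, volume_drop: float = 0.5,
                      min_blocks: int = 10) -> list[str]:
    alerts = []
    # A sharp fall in verdict volume usually means an upstream integration broke,
    # not that attacks stopped.
    if baseline.total_verdicts and current.total_verdicts < volume_drop * baseline.total_verdicts:
        alerts.append(f"verdict volume dropped: {current.total_verdicts} "
                      f"vs baseline {baseline.total_verdicts}")
    if current.total_verdicts == 0:
        return alerts
    base_total = baseline.total_verdicts
    for rule, blocks in sorted(current.blocks_by_rule.items()):
        if blocks < min_blocks:
            continue  # too few blocks to call a spike
        rate = blocks / current.total_verdicts
        base_rate = baseline.blocks_by_rule.get(rule, 0) / base_total if base_total else 0.0
        # A rule that never fired in the baseline counts as a spike once it clears min_blocks.
        if base_rate == 0.0 or rate >= spike_factor * base_rate:
            alerts.append(f"block-rate spike on {rule}: {rate:.2%} vs baseline {base_rate:.2%}")
    return alerts
```

An alert only says that something changed. Deciding whether a spike is an attack or a false-positive regression still means looking at the prompts the rule blocked.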
Production playbook
- Deploy the protection layer in shadow mode first — log verdicts but do not block. Run for 1–2 weeks to baseline false positives.
- Promote rules from shadow to enforce one category at a time. Start with prompt injection patterns (lowest false-positive risk), then PII redaction, then topic enforcement.
- Wire verdict logs into your SIEM (Splunk, Datadog, Sentinel) with alerts on anomalous block-rate deltas.
- Add a fail-open or fail-closed default per tenant based on their risk tier. Document the choice.
- Run an adversarial scanner weekly to verify the protection layer still blocks what it claims (regressions happen when you change models or prompts).
- Build a tenant-facing audit export — verdict log, OWASP LLM Top 10 mapping, period summary — so enterprise customers can satisfy their own auditors.
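Shadow mode and one-category-at-a-time promotion come down to a per-category mode switch applied after detection, plus a structured record for every verdict. A minimal Python sketch follows. The category names, record fields, and `would_block` action label are hypothetical. Shadow verdicts are logged but never block, which is what lets you baseline false positives before enforcing.

```python
import json
import time
from typing import Optional

# Hypothetical per-category modes; promote one category at a time.
CATEGORY_MODE = {
    "prompt_injection": "enforce",
    "pii": "shadow",
    "topic": "shadow",
}


def apply_verdict(tenant_id: str, category: str, rule: str,
                  matched: bool, now: Optional[float] = None) -> tuple[bool, str]:
    """Return (block, log_line). Categories not yet configured default to shadow."""
    mode = CATEGORY_MODE.get(category, "shadow")
    block = matched and mode == "enforce"
    record = {
        "ts": now if now is not None else time.time(),
        "tenant_id": tenant_id,
        "category": category,
        "rule": rule,
        "matched": matched,
        "mode": mode,
        # "would_block" marks shadow-mode hits: detected, logged, not enforced.
        "action": "block" if block else ("would_block" if matched else "allow"),
    }
    # One JSON object per line, so a SIEM can parse each verdict into fields.
    return block, json.dumps(record, sort_keys=True)
```

Counting `would_block` records per rule during the shadow period gives the false-positive baseline. Promoting a category is then a one-line change to `CATEGORY_MODE`.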
Common mistakes
Three patterns we see fail in production. First, single-layer protection: pattern rules only, no semantic layer. It is trivial to bypass with paraphrased instructions. Second, no output-side check. The team protects the input, congratulates itself, and ships an application that happily echoes a markdown image pointing at attacker.com. The user's client then fetches that URL along with whatever data the injected instruction packed into it. Third, missing or unstructured verdict logs. When the auditor asks 'show me what you blocked last quarter', the team has nothing to show.
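The second mistake is cheap to avoid. Here is a minimal output-side sketch in Python that flags markdown images pointing at hosts outside an allowlist. `ALLOWED_IMAGE_HOSTS` and `cdn.example.com` are placeholders. The check covers only inline markdown images. Reference-style images, HTML `<img>` tags, and plain links would need their own rules.

```python
import re
from urllib.parse import urlparse

# Inline markdown image syntax: ![alt](url ...). Rendering it makes the client
# request the URL, so data packed into the query string leaves with the request.
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*<?([^)\s>]+)")

# Hypothetical allowlist of hosts the application legitimately serves images from.
ALLOWED_IMAGE_HOSTS = {"cdn.example.com"}


def find_exfiltration_images(response: str) -> list[str]:
    """Return image URLs in a model response whose host is not allowlisted.

    Relative URLs and data: URIs have no host and are flagged too,
    a deliberately strict default.
    """
    flagged = []
    for match in MARKDOWN_IMAGE.finditer(response):
        url = match.group(1)
        host = (urlparse(url).hostname or "").lower()
        if host not in ALLOWED_IMAGE_HOSTS:
            flagged.append(url)
    return flagged
```

A non-empty result can either block the response or strip the offending images before the response is streamed to the user.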
