What is an LLM vulnerability scanner?

An LLM vulnerability scanner sends batteries of adversarial probes — jailbreaks, prompt injections, PII extraction attempts, harmful-content requests — at a target LLM and grades the responses. The output is a vulnerability + optimization report with per-probe evidence, severity, and a prioritized fix list.

Which LLMs can I scan with FilterPrompt?

OpenAI, Anthropic, Google Gemini, Azure OpenAI, plus any OpenAI-compatible endpoint — Ollama, Groq, Mistral, Together AI, OpenRouter, Perplexity, Hugging Face, vLLM, or your own custom endpoint. Bring your own keys per tenant.

What kinds of vulnerabilities does FilterPrompt test for?

Jailbreaks (DAN, role hijack, translation smuggling), direct and indirect prompt injection, system-prompt extraction, harmful-content compliance, PII / secret leakage, bias & fairness, RAG poisoning, agent/tool abuse, output quality, and robustness — categories map to the OWASP LLM Top 10.

How are probes graded?

Each probe declares an evaluator: regex match, refusal-check, contains-check, or an AI judge (Gemini 3 Flash). Pass/fail comes with severity, category, the exact prompt sent, the model's full response, and the evaluator's reason — fully auditable.

How much does a scan cost?

1 credit per probe executed. New accounts get 1 welcome credit on signup. Pay-as-you-go credit packs after that — credits never expire. Connecting LLMs and creating tenants is free.

AI Firewall & LLM Firewall — What It Is and Why You Need One

Guide · 2026-02-21 · 11 min read · FilterPrompt Security Team

How AI firewalls block prompt injection, jailbreak, PII leakage and tool abuse in real time — and what to check before you buy.

The web has firewalls (WAFs). Networks have firewalls (NGFW). LLMs need firewalls too, and 2026 is the year 'AI firewall' / 'LLM firewall' moved from novelty to procurement checkbox. This post covers what an AI firewall does, the detections that matter, and how to evaluate one before you sign a contract.

What is an AI firewall?

An AI firewall (also called an LLM firewall, LLM gateway, or GenAI guardrail proxy) is a service that sits between your application and the LLM. Every prompt going in and every response coming out is inspected. Traffic that matches attack patterns — direct or indirect prompt injection, jailbreak phrasing, PII, secrets, off-policy topics, unauthorized tool calls — is blocked, masked, or rewritten before it reaches the model or the user.

Firewall vs scanner: complementary, not overlapping

The detections that matter

Prompt injection (direct + indirect) — the highest-volume attack class.
Jailbreak — role-play attacks, DAN/DUDE variants, safety-alignment bypasses.
PII exfiltration — emails, phones, national IDs, credit cards leaving the model.
Secret exfiltration — API keys, private keys, tokens embedded in responses.
Tool/function-call abuse — attempts to trigger unauthorized email, calendar, DB actions.
Off-policy topics — competitors, legal advice, medical advice, self-harm.
Toxicity / harmful content — for consumer-facing applications.

Architecture patterns

Three patterns dominate in 2026: (1) SDK middleware — a wrapper around the OpenAI/Anthropic client that inspects locally; simplest to add. (2) Sidecar proxy — an HTTP proxy the app calls instead of the model provider; language-agnostic. (3) API gateway plugin — installed on Kong/APIM/Cloudflare; fits enterprises that already gateway all traffic. FilterPrompt supports all three.

Latency budget

In 2026, 'AI firewall' buyers expect <50ms p95 for prompt inspection and <100ms p95 for response inspection. Anything higher is felt in chat UX. FilterPrompt inspects with a small classifier + rule engine and typically stays under 40ms p95.

Evaluation checklist

Ask for a live demo against your traffic, not a canned dataset.
Measure false-positive rate on 24h of your actual prompts.
Verify PII/secret detection covers your specific formats (national IDs, employee numbers).
Confirm blocked-request evidence is queryable for incident response.
Check the integration matches your architecture (SDK, proxy, gateway).
Understand pricing: per request, per 1M tokens, or seat-based.

How FilterPrompt Firewall works

Your app calls FilterPrompt with the prompt + destination model. FilterPrompt inspects, applies your rule set (which can be seeded from your last scan), forwards to the model if safe, inspects the response, and returns the answer to your app. Every decision is logged with full context for audit. The same rule engine ships in the free tier — you can wire it up in an afternoon.