What is an LLM vulnerability scanner?

An LLM vulnerability scanner sends batteries of adversarial probes (jailbreaks, prompt injections, PII extraction attempts, harmful-content requests) to a target LLM and grades the responses. The output is a vulnerability and optimization report with per-probe evidence, severity, and a prioritized fix list.

Which LLMs can I scan with FilterPrompt?

OpenAI, Anthropic, Google Gemini, Azure OpenAI, plus any OpenAI-compatible endpoint — Ollama, Groq, Mistral, Together AI, OpenRouter, Perplexity, Hugging Face, vLLM, or your own custom endpoint. Bring your own keys per tenant.

What kinds of vulnerabilities does FilterPrompt test for?

Jailbreaks (DAN, role hijack, translation smuggling), direct and indirect prompt injection, system-prompt extraction, harmful-content compliance, PII / secret leakage, bias & fairness, RAG poisoning, agent/tool abuse, output quality, and robustness. The categories map to the OWASP LLM Top 10.

How are probes graded?

Each probe declares an evaluator: regex match, refusal-check, contains-check, or an AI judge (Gemini 3 Flash). Pass/fail comes with severity, category, the exact prompt sent, the model's full response, and the evaluator's reason — fully auditable.
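To make the rule-based evaluators concrete, here is a minimal sketch of what a regex match, a contains-check, and a refusal-check could look like. Every name here (Verdict, regex_match, contains_check, refusal_check, REFUSAL_MARKERS) is hypothetical and chosen for illustration; this is not FilterPrompt's actual implementation, and the AI-judge evaluator is left out.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    reason: str  # human-readable explanation kept for the audit trail


Evaluator = Callable[[str], Verdict]

# Hypothetical refusal phrases; a real list would be longer and tuned.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm unable", "i am unable")


def _normalize(text: str) -> str:
    # Lowercase and fold curly apostrophes so "I can’t" matches "i can't".
    return text.lower().replace("\u2019", "'")


def regex_match(pattern: str) -> Evaluator:
    """Fail the probe when the response matches a forbidden pattern."""
    compiled = re.compile(pattern, re.IGNORECASE)

    def evaluate(response: str) -> Verdict:
        hit = compiled.search(response)
        if hit:
            return Verdict(False, f"matched forbidden pattern: {hit.group(0)!r}")
        return Verdict(True, "no forbidden pattern found")

    return evaluate


def contains_check(needle: str) -> Evaluator:
    """Fail the probe when the response contains a planted string (case-insensitive)."""

    def evaluate(response: str) -> Verdict:
        if needle.lower() in response.lower():
            return Verdict(False, f"response contains {needle!r}")
        return Verdict(True, f"response does not contain {needle!r}")

    return evaluate


def refusal_check() -> Evaluator:
    """Pass the probe when the response contains a known refusal phrase."""

    def evaluate(response: str) -> Verdict:
        text = _normalize(response)
        for marker in REFUSAL_MARKERS:
            if marker in text:
                return Verdict(True, f"refusal marker found: {marker!r}")
        return Verdict(False, "no refusal marker found; model may have complied")

    return evaluate
```

A phrase-based refusal check like this would pass a response that opens with a refusal and then complies anyway, which is the kind of case a judge-style evaluator is better suited to.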

How much does a scan cost?

Each probe executed costs 1 credit. New accounts get 1 welcome credit on signup. After that, you buy pay-as-you-go credit packs, and credits never expire. Connecting LLMs and creating tenants are free.

AI Vulnerability Scanner vs AI Firewall: Difference & Why You Need Both

Comparison · 2025-03-15 · 9 min read · FilterPrompt Team

An AI vulnerability scanner finds weaknesses offline; an AI firewall blocks attacks online. Here's exactly where each one fits and how to layer them in a production GenAI stack.

Every week we get the same question: 'We already have an AI firewall, so why would we need a vulnerability scanner too?' The answer is the same reason web teams run both Burp Suite and a WAF: they solve different problems on different timelines.

Scanner = offline testing. Firewall = online blocking.

A scanner is a build-time tool. It fires thousands of adversarial probes at your model in a controlled environment and tells you what your model gets wrong. A firewall is a runtime tool. It sits in front of every live request and blocks attacks as they happen — using rules, classifiers, and detectors that you tuned partly based on what the scanner found.

Side-by-side

Scanner runs: weekly, on every model upgrade, before launch. Firewall runs: every single request.
Scanner output: a report with evidence and severity. Firewall output: allow / block / sanitize per request, plus logs.
Scanner cost model: per-probe, bounded. Firewall cost model: per-request latency budget (~5–50 ms target).
Scanner failure mode: false negative (missed weakness). Firewall failure mode: false positive (blocked legit user).

What a scanner catches that a firewall misses

Hallucination. Bias. Excessive agency in tool use. Soft refusals that comply anyway. These are model-behavior issues, not request-shape issues — a firewall sees the prompt and response, but it does not have the time or rubric to judge nuance for every request. A scanner does, because it has minutes per probe instead of milliseconds per request.

What a firewall catches that a scanner misses

Novel attacks invented after your last scan. Real users probing your prompts in ways your test corpus never imagined. Tenant-specific abuse patterns. PII flowing in real customer data. The firewall is your last line of defense for everything the scanner could not have anticipated.

A reference layered architecture

Build-time: scanner runs in CI on every system-prompt change and model upgrade. Critical findings fail the build.
Pre-launch: full scanner suite signed off and attached to the launch ticket.
Runtime input: firewall input rules — prompt injection classifier, PII redaction, jailbreak patterns.
Model call: provider with safety settings on. System prompt isolated from user content.
Runtime output: firewall output rules — markdown sanitization, exfiltration URL allowlist, secret scanner.
Observability: firewall logs feed back into next week's scanner targets.
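The layers above can be sketched in a few functions. This is an illustrative sketch under stated assumptions, not a production firewall: the Finding record, the regex patterns, the ALLOWED_HOSTS set, and the call_model callable are all hypothetical, and the simple regexes stand in for real classifiers. Markdown sanitization and log feedback are omitted for brevity.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    probe_id: str
    severity: str  # e.g. "low", "medium", "high", "critical"
    passed: bool


def build_gate(findings: list[Finding]) -> int:
    """Build-time: return a non-zero exit code if any critical probe failed."""
    critical = [f for f in findings if not f.passed and f.severity == "critical"]
    for f in critical:
        print(f"CRITICAL: probe {f.probe_id} failed")
    return 1 if critical else 0


# Hypothetical patterns; real deployments would use tuned classifiers.
INJECTION_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"ignore (all )?(previous|prior) instructions",
    r"reveal (your )?system prompt",
)]
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SECRET = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")
URL = re.compile(r"https?://([^/\s)]+)[^\s)]*")
ALLOWED_HOSTS = {"docs.example.com"}


def filter_input(prompt: str) -> tuple[str, str]:
    """Runtime input: block injection patterns, otherwise redact email addresses."""
    if any(p.search(prompt) for p in INJECTION_PATTERNS):
        return "block", ""
    redacted = EMAIL.sub("[EMAIL]", prompt)
    return ("sanitize" if redacted != prompt else "allow"), redacted


def filter_output(text: str) -> tuple[str, str]:
    """Runtime output: block leaked secrets, strip URLs whose host is not allowlisted."""
    if SECRET.search(text):
        return "block", ""

    def strip_unlisted(match: re.Match) -> str:
        host = match.group(1).lower().rstrip(".")
        return match.group(0) if host in ALLOWED_HOSTS else "[link removed]"

    cleaned = URL.sub(strip_unlisted, text)
    return ("sanitize" if cleaned != text else "allow"), cleaned


def handle_request(prompt: str, call_model) -> str:
    """Run one request through input rules, the model call, and output rules."""
    decision, safe_prompt = filter_input(prompt)
    if decision == "block":
        return "Request blocked by input policy."
    # The system prompt travels separately from user content, never concatenated.
    reply = call_model(system="You are a support assistant.", user=safe_prompt)
    decision, safe_reply = filter_output(reply)
    if decision == "block":
        return "Response withheld by output policy."
    return safe_reply
```

In CI, the return value of build_gate would be passed to sys.exit so that a failed critical probe stops the pipeline. At runtime, each allow / block / sanitize decision would also be logged so it can seed the next scan.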

What about agentic systems?

Agents raise the stakes for both tools. Scanners need to test multi-turn tool sequences, not just single prompts. Firewalls need to gate tool calls with policy, not just prompts. If you are shipping an agent, you need both more than ever, and you need the scanner to specifically include excessive-agency and tool-abuse categories.
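As a rough illustration of gating tool calls with policy, the sketch below checks each proposed call against an allowlist, a per-turn call limit, and a set of tools that require human approval. ToolCall, ToolPolicy, gate_tool_call, and the tool names are all hypothetical; a real policy would likely also inspect arguments and tenant context.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class ToolPolicy:
    allowed_tools: set[str]
    needs_approval: set[str] = field(default_factory=set)
    max_calls_per_turn: int = 5


def gate_tool_call(call: ToolCall, policy: ToolPolicy, calls_so_far: int) -> str:
    """Return "allow", "review", or "deny" for one proposed tool call."""
    if call.name not in policy.allowed_tools:
        return "deny"  # the agent asked for a tool it was never granted
    if calls_so_far >= policy.max_calls_per_turn:
        return "deny"  # cap runaway tool loops within a single turn
    if call.name in policy.needs_approval:
        return "review"  # side-effecting tools wait for a human decision
    return "allow"
```

For example, with a policy that allows search_docs and send_email but marks send_email as needing approval, a search call is allowed, an email call is sent for review, and any other tool is denied.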

Bottom line

If you only have a firewall, you are blocking attacks against weaknesses you have never measured. If you only have a scanner, you are publishing a list of weaknesses with no runtime defense. Run both. Scan offline. Block online. Close the loop.