What is an LLM vulnerability scanner?

An LLM vulnerability scanner sends batteries of adversarial probes — jailbreaks, prompt injections, PII extraction attempts, harmful-content requests — at a target LLM and grades the responses. The output is a vulnerability + optimization report with per-probe evidence, severity, and a prioritized fix list.

Which LLMs can I scan with FilterPrompt?

OpenAI, Anthropic, Google Gemini, Azure OpenAI, plus any OpenAI-compatible endpoint — Ollama, Groq, Mistral, Together AI, OpenRouter, Perplexity, Hugging Face, vLLM, or your own custom endpoint. Bring your own keys per tenant.

What kinds of vulnerabilities does FilterPrompt test for?

Jailbreaks (DAN, role hijack, translation smuggling), direct and indirect prompt injection, system-prompt extraction, harmful-content compliance, PII / secret leakage, bias & fairness, RAG poisoning, agent/tool abuse, output quality, and robustness — categories map to the OWASP LLM Top 10.

How are probes graded?

Each probe declares an evaluator: regex match, refusal-check, contains-check, or an AI judge (Gemini 3 Flash). Pass/fail comes with severity, category, the exact prompt sent, the model's full response, and the evaluator's reason — fully auditable.

How much does a scan cost?

1 credit per probe executed. New accounts get 1 welcome credit on signup. Pay-as-you-go credit packs after that — credits never expire. Connecting LLMs and creating tenants is free.

AI Vulnerability Scanner — What to Look For in 2026

Buyer's Guide · 2026-02-07 · 12 min read · FilterPrompt Security Team

How AI vulnerability scanners work, the OWASP LLM Top 10 coverage checklist, and how to shortlist between FilterPrompt, Garak, Promptfoo and Lakera.

'AI vulnerability scanner' is a $15.58-CPC query with clear procurement intent. This guide walks through what a scanner does, what separates the good ones from the demoware, and how to shortlist for enterprise use.

What is an AI vulnerability scanner?

An AI vulnerability scanner runs a battery of adversarial probes against your LLM, agent, or GenAI application and reports which probes succeeded — i.e., which attacks the model failed to resist. A 'probe' is a crafted prompt (or multi-turn conversation) designed to trigger a known failure mode: prompt injection, jailbreak, PII disclosure, secret leakage, unauthorized tool use, and so on. A 'scan' is one execution of the probe battery against one endpoint. A 'finding' is a probe that produced an unsafe response.

How AI vulnerability scanners work

Probe library — curated attacks (usually 500–5000) covering the OWASP LLM Top 10.
Execution engine — parallel HTTP calls to your model with rate-limiting and retry.
Evaluator — grades the response as pass/fail/error. Best-in-class use an LLM-as-judge with confidence scoring.
Report generator — aggregates findings into a per-category score, per-probe evidence, and prioritized remediation.

The evaluator is where scanners diverge sharply. A cheap scanner uses substring-match ('if the response contains X, fail') and produces a flood of false positives. A serious scanner uses an LLM judge that reads both the attack and the response, considers whether the model refused vs complied, and grades with confidence. FilterPrompt uses the latter with a proprietary rubric plus a false-positive short-circuit for benign refusals.

OWASP LLM Top 10 coverage checklist

Shortlist rubric

Coverage — all 10 OWASP LLM categories, updated when OWASP publishes a new version.
Agentic probes — tests for function-calling injection and tool abuse, not just text-in/text-out.
Evaluator accuracy — LLM-graded with confidence, not substring match.
Evidence quality — PDF/HTML report per scan with full prompt/response chains and remediation.
Integrations — works with OpenAI, Anthropic, Gemini, Azure OpenAI, Bedrock, Ollama, vLLM, and any OpenAI-compatible endpoint.
Pricing predictability — per-scan or per-1M-token pricing, not 'contact sales'.

The top 4 in the market

FilterPrompt — enterprise scanner + firewall in one platform, 1,000+ probes, LLM-graded evaluator, PDF reports, and free-tier access to the full OWASP LLM Top 10 sampler. NVIDIA Garak — open-source, strong academic pedigree, CLI-only output. Promptfoo — developer-first, best for evaluations that live in CI. Lakera — commercial firewall-first with scanner add-on, strong on prompt injection.

Cost of running a scanner vs cost of not running one

The 2026 average cost of a public LLM security incident (per IBM's report) is $215k, driven by remediation, disclosure, credit monitoring, and reputational damage. A scanner running on every deploy for a year costs a few hundred dollars in credits. The break-even is a single prevented incident every 400 years — which is why AI security scanners now show up in every serious AppSec budget.