What is an LLM vulnerability scanner?

An LLM vulnerability scanner sends batteries of adversarial probes — jailbreaks, prompt injections, PII extraction attempts, harmful-content requests — at a target LLM and grades the responses. The output is a vulnerability + optimization report with per-probe evidence, severity, and a prioritized fix list.

Which LLMs can I scan with FilterPrompt?

OpenAI, Anthropic, Google Gemini, Azure OpenAI, plus any OpenAI-compatible endpoint — Ollama, Groq, Mistral, Together AI, OpenRouter, Perplexity, Hugging Face, vLLM, or your own custom endpoint. Bring your own keys per tenant.

What kinds of vulnerabilities does FilterPrompt test for?

Jailbreaks (DAN, role hijack, translation smuggling), direct and indirect prompt injection, system-prompt extraction, harmful-content compliance, PII / secret leakage, bias & fairness, RAG poisoning, agent/tool abuse, output quality, and robustness — categories map to the OWASP LLM Top 10.

How are probes graded?

Each probe declares an evaluator: regex match, refusal-check, contains-check, or an AI judge (Gemini 3 Flash). Pass/fail comes with severity, category, the exact prompt sent, the model's full response, and the evaluator's reason — fully auditable.

How much does a scan cost?

1 credit per probe executed. New accounts get 1 welcome credit on signup. Pay-as-you-go credit packs after that — credits never expire. Connecting LLMs and creating tenants is free.

LLM Vulnerability Scanner: What It Is & How to Pick One (2026)

Guide · 2026-02-14 · 11 min read · FilterPrompt Team

A buyer's guide to LLM vulnerability scanners — what they actually test, how an AI vulnerability scanner differs from a runtime firewall, and the 7 capabilities that matter when you evaluate one.

An LLM vulnerability scanner is the automated red team for your AI app. It sends a battery of adversarial prompts — jailbreaks, prompt injections, PII extraction attempts, harmful-content elicitations, regulatory probes — to a target LLM and grades the responses. If you ship anything built on GPT, Claude, Gemini, or a fine-tuned open model, you need one before, not after, a customer finds the bug.

What an LLM scanner actually does

Unlike a LLM vulnerability scanner (which blocks attacks at runtime), a scanner runs offline, on demand, against a connected model endpoint. It produces a report: which probes the model failed, what it leaked, how confident the verdict is, and a remediation hint.

Probe library — hundreds to thousands of pre-built adversarial prompts mapped to OWASP LLM Top 10 categories
Proprietary multi-stage detection — deterministic checks layered with AI-based grading
Multi-LLM target support — same probes against OpenAI, Anthropic, Google, Azure, and custom OpenAI-compatible endpoints
Severity grading — critical / high / medium / low per finding, mapped to category
Reproducible reports — exportable transcripts with full prompt + response pairs for compliance evidence

Scanner vs. LLM vulnerability scanner vs. red team service

These three are complementary, not interchangeable. A scanner is the cheapest, fastest, and most repeatable layer; a firewall enforces in production; a human red team finds the long-tail novelties a scanner misses.

Scanner — automated, runs in minutes, catches the known unknowns (90%+ of OWASP LLM Top 10)
LLM vulnerability scanner — runtime gateway, blocks live attacks, generates the data you'll later regression-scan
Human red team — engagement-based, expensive, finds zero-days and business-logic exploits a generic probe can't

7 capabilities to evaluate

1. Coverage of OWASP LLM Top 10

Ask for a probe-to-category mapping. A scanner that doesn't cover LLM01 (Prompt Injection), LLM02 (Insecure Output Handling), LLM06 (Sensitive Information Disclosure), and LLM10 (Model Theft) is incomplete.

2. Bring-your-own-model support

You should be able to point the scanner at any OpenAI-compatible endpoint — including your fine-tuned model behind a private VPC. SaaS-only scanners that lock you to a vendor list are a red flag.

3. AI-based detection for nuance

Pure-regex scanners over-flag. Pure-judge scanners are slow and expensive. The good ones layer cheap deterministic checks (refusal detection, contains, regex) and only escalate to a judge model when uncertain.

4. Reproducibility

Every finding must include the exact prompt sent, the full response received, the evaluator that scored it, and a timestamp. Without this, you can't show an auditor anything.

5. Per-tenant isolation

If you serve multiple customers, your scanner must scope results, credits, and provider keys per tenant. Shared-bucket scanners create privilege-escalation risk.

6. Cost transparency

Probe runs cost LLM API tokens. A good scanner shows the credit cost up front (e.g. 1 credit per category × LLM connection) so you can scope a scan before you spend.

7. CI integration

You want a CLI or webhook that fails the build when a critical regression appears. Scanners that only live in a UI become shelfware.

How FilterPrompt approaches this

FilterPrompt is a full LLM Vulnerability Scanner: bring your own provider key (OpenAI, Anthropic, Azure, Google, custom), pick a suite from the catalog (sampler, vulnerability, compliance, fairness, robustness, agent, cost, multimodal), and get a scored report in minutes. Every probe transcript is stored under your tenant for audit. The Free plan ships with 1 welcome credit so you can run a real scan before you commit.