What is an LLM vulnerability scanner?

An LLM vulnerability scanner sends batteries of adversarial probes — jailbreaks, prompt injections, PII extraction attempts, harmful-content requests — at a target LLM and grades the responses. The output is a vulnerability + optimization report with per-probe evidence, severity, and a prioritized fix list.

Which LLMs can I scan with FilterPrompt?

OpenAI, Anthropic, Google Gemini, Azure OpenAI, plus any OpenAI-compatible endpoint — Ollama, Groq, Mistral, Together AI, OpenRouter, Perplexity, Hugging Face, vLLM, or your own custom endpoint. Bring your own keys per tenant.

What kinds of vulnerabilities does FilterPrompt test for?

Jailbreaks (DAN, role hijack, translation smuggling), direct and indirect prompt injection, system-prompt extraction, harmful-content compliance, PII / secret leakage, bias & fairness, RAG poisoning, agent/tool abuse, output quality, and robustness — categories map to the OWASP LLM Top 10.

How are probes graded?

Each probe declares an evaluator: regex match, refusal-check, contains-check, or an AI judge (Gemini 3 Flash). Pass/fail comes with severity, category, the exact prompt sent, the model's full response, and the evaluator's reason — fully auditable.

How much does a scan cost?

1 credit per probe executed. New accounts get 1 welcome credit on signup. Pay-as-you-go credit packs after that — credits never expire. Connecting LLMs and creating tenants is free.

OWASP Top 10 for LLM Applications 2025 — Full Breakdown

Framework · 2026-03-21 · 15 min read · FilterPrompt Security Team

Every category in the OWASP Top 10 for Large Language Model Applications 2025 — what it means, real examples, and how to test for each.

The OWASP Top 10 for LLM Applications is the single most-cited framework in AI security — 8,100 searches per month for the top query and growing. This post walks the entire 2025 taxonomy with real examples and testing guidance.

LLM01 — Prompt Injection

Attackers manipulate model behavior through crafted input. Direct: typed into the prompt. Indirect: hidden in data the model reads (URLs, docs, emails). This is the highest-volume attack class in production LLMs — every serious defense program starts here. Test with the FilterPrompt LLM01 probe battery, run monthly.

LLM02 — Insecure Output Handling

LLM output is passed downstream (rendered as HTML, executed as SQL, run as shell) without sanitization. The model becomes an attack vector for classic web/DB vulnerabilities. Defense: treat model output as untrusted; sanitize before rendering or executing.

LLM03 — Training Data Poisoning

Attacker introduces malicious or biased data into pretraining or fine-tuning. Backdoor triggers, biased outputs, or knowledge corruption result. Defense: dataset provenance, integrity hashes, evaluation against poisoning benchmarks.

LLM04 — Model Denial of Service

Attackers craft prompts that force expensive generations or exhaust the context window, driving up cost and blocking legitimate users. Defense: rate limits, generation length caps, complexity scoring on prompts.

LLM05 — Supply Chain Vulnerabilities

Compromised model weights, poisoned fine-tuning datasets, malicious plugins, or unsafe vector stores. Defense: SBOM for AI, signed models, plugin allowlists.

LLM06 — Sensitive Information Disclosure

Model reveals system prompt, PII, secrets, or proprietary data. Defense: firewall with PII/secret detection, system-prompt hardening, output DLP.

LLM07 — Insecure Plugin/Tool Design

Tools/functions accept unvalidated input from the model, allowing SQLi, SSRF, or unauthorized actions. Defense: schema validation, strict allowlists, principle of least privilege.

LLM08 — Excessive Agency

Agents are given too many tools or too broad permissions; an injection triggers unauthorized email, calendar, DB, or code-execution actions. Defense: minimum-viable tool sets, human approval on high-risk actions, per-tool blast radius analysis.

LLM09 — Overreliance

Users trust hallucinated model output for high-stakes decisions. Defense: source citations, confidence surfacing, human review workflows for regulated domains.

LLM10 — Model Theft

Attackers extract model weights or replicate behavior via query-based extraction. Defense: rate limits per user, output watermarking, extraction-detection scanning.

How to score your app against the OWASP LLM Top 10

Run FilterPrompt Scanner's OWASP LLM Top 10 sampler — free, ~5 minutes.
Review the per-category score in the PDF report.
Prioritize LLM01 + LLM06 + LLM08 first (highest exploitation rates).
Fix, re-scan, and add rules to your AI firewall for anything you can't fix at the model level.