What is an LLM vulnerability scanner?

An LLM vulnerability scanner sends batteries of adversarial probes — jailbreaks, prompt injections, PII extraction attempts, harmful-content requests — at a target LLM and grades the responses. The output is a vulnerability + optimization report with per-probe evidence, severity, and a prioritized fix list.

Which LLMs can I scan with FilterPrompt?

OpenAI, Anthropic, Google Gemini, Azure OpenAI, plus any OpenAI-compatible endpoint — Ollama, Groq, Mistral, Together AI, OpenRouter, Perplexity, Hugging Face, vLLM, or your own custom endpoint. Bring your own keys per tenant.

What kinds of vulnerabilities does FilterPrompt test for?

Jailbreaks (DAN, role hijack, translation smuggling), direct and indirect prompt injection, system-prompt extraction, harmful-content compliance, PII / secret leakage, bias & fairness, RAG poisoning, agent/tool abuse, output quality, and robustness — categories map to the OWASP LLM Top 10.

How are probes graded?

Each probe declares an evaluator: regex match, refusal-check, contains-check, or an AI judge (Gemini 3 Flash). Pass/fail comes with severity, category, the exact prompt sent, the model's full response, and the evaluator's reason — fully auditable.

How much does a scan cost?

1 credit per probe executed. New accounts get 1 welcome credit on signup. Pay-as-you-go credit packs after that — credits never expire. Connecting LLMs and creating tenants is free.

AI Red Teaming vs. Automated Vulnerability Scanning: When to Use Each

Comparison · 2024-10-09 · 9 min read · FilterPrompt Team

When to hire a human red team, when to run an automated LLM scanner, and the realistic budget split most AI teams should use in 2026.

Both AI red teaming and automated vulnerability scanning produce findings against your LLM app. The difference is who runs them, what they catch, and how often you can afford to do them. Pick the wrong one and you either pay 50× too much or miss the only bug that matters.

What automated scanning does well

Runs in minutes, on every deploy
Catches the known patterns: jailbreaks, instruction overrides, PII leaks, refusal bypass
Produces the same probe twice → great for regression testing fine-tunes
Cheap enough to run nightly in CI

What human red teaming does well

Finds business-logic exploits — 'use the customer-support agent to issue a refund to my own account'
Chains multiple low-severity bugs into a high-severity attack
Tests social-engineering vectors a probe library doesn't have
Writes the narrative your insurer or board actually wants to read

The realistic 90/10 split

For most teams: 90% of your testing budget on automated scanning (continuous), 10% on a human red team (annually or pre-launch). The scanner finds the bugs that scale. The humans find the bugs that scare you.

When to use only automated scanning

Pre-revenue startups, internal-only tools, and apps where the LLM has read-only scope. The marginal value of a human engagement is low until your blast radius gets bigger.

When you must add human red teaming

Your LLM has tool-use / agent capabilities (write actions, payments, code execution)
You're handling regulated data (HIPAA, PCI, GDPR special categories)
You're shipping to enterprise — they will ask for a third-party report
You're pursuing SOC 2, ISO 42001, or NIST AI RMF certification