What is an LLM vulnerability scanner?

An LLM vulnerability scanner sends batteries of adversarial probes — jailbreaks, prompt injections, PII extraction attempts, harmful-content requests — at a target LLM and grades the responses. The output is a vulnerability + optimization report with per-probe evidence, severity, and a prioritized fix list.

Which LLMs can I scan with FilterPrompt?

OpenAI, Anthropic, Google Gemini, Azure OpenAI, plus any OpenAI-compatible endpoint — Ollama, Groq, Mistral, Together AI, OpenRouter, Perplexity, Hugging Face, vLLM, or your own custom endpoint. Bring your own keys per tenant.

What kinds of vulnerabilities does FilterPrompt test for?

Jailbreaks (DAN, role hijack, translation smuggling), direct and indirect prompt injection, system-prompt extraction, harmful-content compliance, PII / secret leakage, bias & fairness, RAG poisoning, agent/tool abuse, output quality, and robustness — categories map to the OWASP LLM Top 10.

How are probes graded?

Each probe declares an evaluator: regex match, refusal-check, contains-check, or an AI judge (Gemini 3 Flash). Pass/fail comes with severity, category, the exact prompt sent, the model's full response, and the evaluator's reason — fully auditable.

How much does a scan cost?

1 credit per probe executed. New accounts get 1 welcome credit on signup. Pay-as-you-go credit packs after that — credits never expire. Connecting LLMs and creating tenants is free.

AI Data Leak Protection: A Practical Guide for LLM Teams

Best Practice · 2026-07-01 · 9 min read · FilterPrompt Security Team

How AI data leakage happens in production LLMs, why AI DLP is different from legacy data loss prevention, and the data leakage prevention tools that actually stop it.

What AI data leakage really means

AI data leakage is not a breach in the classic sense. It is the silent, often accidental movement of sensitive data into or out of an AI model. A customer support agent pastes a credit card into a chatbot. A developer includes a production API key in a prompt. A retrieval pipeline returns a row of a patient database. The model memorises it, echoes it, or exfiltrates it through a tool call.

The result is the same: data that should have stayed inside your trust boundary is now sitting in a provider log, a model fine-tuning set, or an attacker's server. AI data leak protection is the discipline of stopping that movement before it happens.

The four LLM data leakage vectors

1. Input-side leakage

Every prompt is a data-transfer event. Users paste PII, source code, legal documents, and internal identifiers into chat interfaces. Without AI DLP, that data travels to the model provider, is stored in logs, and may be used in training. Input-side controls must redact, tokenise, or block before the prompt leaves your infrastructure.

2. Output-side leakage

LLMs can regurgitate training data or retrieved context. A model that has seen a private email dataset may quote it back to a different user. Output-side AI data leak protection checks the response for PII, secrets, and internal identifiers before it reaches the user.

3. Retrieval and grounding leakage

RAG pipelines leak when access controls are missing at the chunk level. A prompt about one customer can retrieve documents that belong to another. AI data security here means enforcing tenant isolation on the vector store, filtering retrieved chunks by user permissions, and validating that the model only grounds answers in allowed context.

4. Tool and agent exfiltration

Agents with tool access can send data to external APIs, render tracking images, or write sensitive values into function arguments. An attacker who jailbreaks the agent can turn it into a data exfiltration channel. This is why LLM data leakage prevention must include adversarial testing of tool-calling behaviour, not just static pattern matching.

Why legacy DLP fails for LLMs

Traditional data leakage prevention tools were built for endpoints, email, and file shares. They look for known patterns: Social Security numbers, credit cards, or classified file headers. They do not understand the structure of a conversation, the semantic meaning of an embedding, or the risk of a tool call.

Legacy DLP scans files; LLM risk lives in tokens, prompts, and context windows.
Regex cannot detect paraphrased PII, code that reveals architecture, or prompts that extract internal knowledge.
Email DLP does not inspect model responses, image inputs, or multi-turn conversations.
Static policies cannot adapt to new jailbreaks, encoding tricks, or provider model updates.

AI DLP is a different category. It must be semantic, bidirectional, and adversarial — running on both the request and the response, and continuously tested against new attacks.

The AI data leak protection stack

Layer 1: Input redaction and classification

Detect sensitive entities before they reach the model. Use a combination of deterministic detectors for emails, phones, credit cards, and secrets, plus semantic classifiers for custom categories like patient IDs, internal project names, or unreleased product features.

Layer 2: Output filtering and repetition checks

Scan every response for leaked PII, secrets, and internal identifiers. Add repetition checks to catch model memorisation: if the model starts returning long verbatim chunks of training or retrieval data, block or redact the response.

Layer 3: Retrieval access controls

Bind vector chunks to tenant, user, and role. Apply the same permissions at retrieval time that you apply at the source database. Never assume that a top-k similarity search will respect your access model.

Layer 4: Adversarial testing and continuous monitoring

Run LLM vulnerability scans against your production system. Test for prompt injection that exfiltrates data, jailbreaks that extract system prompts, and multi-turn attacks that slowly pull out sensitive context. Repeat the scans weekly to catch drift from provider model updates.

Choosing data leakage prevention tools

Not every tool marketed as AI DLP actually prevents LLM data leakage. When evaluating vendors, look for these capabilities rather than marketing language.

The best data leakage prevention tools for LLMs combine runtime protection with pre-deployment assurance. Runtime rules catch known patterns; adversarial scanning catches the unknown patterns attackers will use next week.

AI data leakage in real deployments

Customer support chatbots

Users paste order numbers, addresses, and partial card details into support chats. Without AI data leak protection, those details become training feedback or appear in model responses to other customers.

Internal coding assistants

Developers copy error messages, environment variables, and proprietary algorithms into coding assistants. AI data security here means redacting secrets and blocking paste of internal repository paths or architecture descriptions.

Healthcare and finance copilots

Regulated industries face the highest risk. A single LLM data leakage event involving PHI or payment data can trigger breach notification laws. These deployments need the full stack: redaction, retrieval isolation, output filtering, and audit-ready reporting.

Mapping AI data leak protection to compliance

Regulators are starting to treat AI data leakage as a preventable control failure. The EU AI Act requires risk management and data governance for high-risk AI systems. NIST AI RMF maps data leakage to the Govern, Map, and Manage functions. SOC 2 expects evidence that sensitive data is not exposed through application logic.

EU AI Act — maintain records of data governance and risk controls for high-risk systems.
NIST AI RMF — identify data leakage risks and measure effectiveness of mitigations.
SOC 2 CC6.1 / CC6.6 — implement logical access and system monitoring that covers LLM interactions.
HIPAA — treat prompts and responses containing PHI as protected information in transit and at rest.

Audit-ready data leakage prevention tools produce logs, reports, and scan evidence that satisfy these frameworks without manual spreadsheet work.

Getting started this week

Inventory every LLM integration in your environment and classify the data each one touches.
Turn on input-side redaction for known PII, secrets, and internal identifiers.
Add output-side filtering for the same categories, plus checks for long verbatim repeats.
Enforce tenant and role-based access on every retrieval pipeline.
Run an adversarial LLM scan to find data exfiltration paths, then schedule it weekly.
Document the controls and export evidence for your compliance team.