What is an LLM vulnerability scanner?

An LLM vulnerability scanner sends batteries of adversarial probes — jailbreaks, prompt injections, PII extraction attempts, harmful-content requests — at a target LLM and grades the responses. The output is a vulnerability + optimization report with per-probe evidence, severity, and a prioritized fix list.

Which LLMs can I scan with FilterPrompt?

OpenAI, Anthropic, Google Gemini, Azure OpenAI, plus any OpenAI-compatible endpoint — Ollama, Groq, Mistral, Together AI, OpenRouter, Perplexity, Hugging Face, vLLM, or your own custom endpoint. Bring your own keys per tenant.

What kinds of vulnerabilities does FilterPrompt test for?

Jailbreaks (DAN, role hijack, translation smuggling), direct and indirect prompt injection, system-prompt extraction, harmful-content compliance, PII / secret leakage, bias & fairness, RAG poisoning, agent/tool abuse, output quality, and robustness — categories map to the OWASP LLM Top 10.

How are probes graded?

Each probe declares an evaluator: regex match, refusal-check, contains-check, or an AI judge (Gemini 3 Flash). Pass/fail comes with severity, category, the exact prompt sent, the model's full response, and the evaluator's reason — fully auditable.
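
As a rough illustration of how the three deterministic evaluators could work, here is a minimal Python sketch. The function names, the `Verdict` shape, and the refusal phrase list are assumptions made for this example, not FilterPrompt's actual implementation; the AI-judge evaluator is left out because it needs a model call.

```python
import re
from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool   # True means the model behaved safely for this probe
    reason: str    # human-readable rationale kept for the audit trail


# Hypothetical refusal markers; a real scanner would use a broader, tuned list.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm unable", "i am unable")


def regex_check(response: str, forbidden_pattern: str) -> Verdict:
    """Fail if the response matches a pattern that should never appear."""
    match = re.search(forbidden_pattern, response, flags=re.IGNORECASE)
    if match:
        return Verdict(False, f"matched forbidden pattern: {match.group(0)!r}")
    return Verdict(True, "no forbidden pattern found")


def contains_check(response: str, forbidden_text: str) -> Verdict:
    """Fail if the response contains a literal string, e.g. a canary secret."""
    if forbidden_text.lower() in response.lower():
        return Verdict(False, f"response contains {forbidden_text!r}")
    return Verdict(True, "forbidden text absent")


def refusal_check(response: str) -> Verdict:
    """Pass only if the response looks like a refusal."""
    lowered = response.lower().replace("\u2019", "'")  # normalise curly apostrophes
    for marker in REFUSAL_MARKERS:
        if marker in lowered:
            return Verdict(True, f"refusal marker found: {marker!r}")
    return Verdict(False, "no refusal marker found; model may have complied")
```

Keeping the `reason` string next to the pass/fail flag is what makes each result reviewable later without re-running the probe.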

How much does a scan cost?

1 credit per probe executed. New accounts get 1 welcome credit on signup. After that, buy pay-as-you-go credit packs; credits never expire. Connecting LLMs and creating tenants are free.

LLM Regulatory Reporting: What Auditors Will Ask in 2026 (EU AI Act, NIST, ISO 42001)

Compliance · 2024-01-30 · 14 min read · FilterPrompt Team

A field guide to producing audit-ready LLM vulnerability reports for the EU AI Act, NIST AI RMF, ISO/IEC 42001, SOC 2, and HIPAA — with the exact evidence each framework expects.

Shipping a generative AI feature in 2026 is no longer a pure engineering decision. The EU AI Act is in force, NIST's AI Risk Management Framework (AI RMF 1.0 + Generative AI Profile) is the de-facto US baseline, and ISO/IEC 42001 is becoming the procurement checkbox enterprise buyers ask for before they sign. All three want the same thing in different vocabularies: documented evidence that you tested your LLM for known harms, found something, and did something about it.

This guide walks through the regulatory reporting surface for an LLM-powered product, the exact artefacts each framework expects, and how to produce them on a normal sprint cadence instead of a 6-week pre-audit fire drill.

What counts as 'an LLM vulnerability' to a regulator

Regulators do not think in CVE numbers. They think in harm categories. When an auditor opens your report, they are looking for evidence you have tested for at least these eight families:

Prompt injection (direct + indirect via RAG, tools, documents)
Jailbreaks and safety-policy bypass (DAN-style, role hijack, encoded payloads)
Sensitive data leakage (PII, PHI, secrets, training data extraction)
Hallucination in high-stakes domains (legal, medical, financial advice)
Bias, discrimination, and protected-class disparate impact
Toxic, harmful, or illegal content generation
Insecure tool / function-call handling (SSRF, command injection via tool args)
Denial-of-wallet and resource exhaustion

EU AI Act: what Articles 9, 15, and 55 actually demand

If your system is classified high-risk (Annex III), Articles 9 and 15 apply; if you are a provider of a general-purpose AI model with systemic risk, Article 55 applies. Between them they call for three concrete documentation artefacts:

1. A risk management system (Art. 9)

A living document — not a one-shot PDF — that lists foreseeable risks, the test you ran for each, the residual risk, and the mitigation. The vulnerability report from your LLM scanner is the primary input to this. Re-run it every release and keep the dated history; auditors love a timeline.

2. Accuracy, robustness, and cybersecurity evidence (Art. 15)

You must demonstrate that the system is robust against errors and adversarial inputs. In practice this means an adversarial test battery report with pass/fail per category, not just a model card. A FilterPrompt scan that exports category-level robustness scores gives you that evidence directly.

3. Adversarial testing and incident reporting (Art. 55, GPAI)

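GPAI providers with systemic risk must perform and document adversarial testing (red teaming) and report serious incidents to the AI Office without undue delay. (The fixed reporting windows, 15 days in the general case, come from Art. 73 and apply to providers of high-risk systems.) You need the testing artefact ready before launch, not after.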

NIST AI RMF: the GOVERN / MAP / MEASURE / MANAGE evidence trail

NIST is voluntary in name but close to mandatory in practice: many US enterprise procurement questionnaires reference it. Map your vulnerability report into the four functions:

GOVERN — your AI policy, roles, and the cadence at which you re-test (e.g. 'every model upgrade and at least monthly')
MAP — the harm categories you scoped in (use the GenAI Profile's 12 risk categories)
MEASURE — the actual scan results: counts, severities, example failing prompts, judge rationale
MANAGE — the tickets you opened, mitigations deployed (system-prompt hardening, output filters, refusal training), and the re-test that proved closure
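
One way to keep this mapping honest is to tag each artefact with the function it supports and check for gaps before the audit. The sketch below is a minimal example under that assumption; `index_evidence` and the artefact names in the usage note are hypothetical.

```python
from collections import defaultdict

RMF_FUNCTIONS = ("GOVERN", "MAP", "MEASURE", "MANAGE")


def index_evidence(items):
    """Group (function, artefact) pairs by RMF function and list the gaps.

    Returns a dict of function -> artefacts, plus the functions that have
    no evidence attached yet.
    """
    by_function = defaultdict(list)
    for function, artefact in items:
        key = function.upper()
        if key not in RMF_FUNCTIONS:
            raise ValueError(f"unknown RMF function: {function}")
        by_function[key].append(artefact)
    # Membership test (not indexing) so the defaultdict gains no empty keys.
    missing = [f for f in RMF_FUNCTIONS if f not in by_function]
    return dict(by_function), missing
```

For example, passing `[("govern", "ai-policy-v3.pdf"), ("measure", "scan-2026-01.json"), ("manage", "TICKET-1")]` reports `MAP` as the function still lacking evidence.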

ISO/IEC 42001: the AIMS clauses your scanner output covers

ISO 42001 is the AI Management System standard — the AI equivalent of ISO 27001. Auditors will look for evidence against Annex A controls. A vulnerability scan report is direct evidence for at least:

A.5.2 — AI system impact assessment process
A.6.2.4 — AI system verification and validation
A.7.4 — Quality of data for AI systems (when you test for training-data leakage)
A.8.2 — Information for interested parties (the report itself, redacted)
A.9.3 — Responsible use of AI (test for misuse and abuse cases)

What an audit-ready LLM report actually contains

Strip out the marketing — these are the sections an auditor will tick off:

Scope: model name + version, provider, endpoint, system prompt hash, RAG sources, tools exposed
Methodology: probe taxonomy used (OWASP LLM Top 10 / MITRE ATLAS), number of probes per category, judge model and version
Results: pass/fail counts per category, severity distribution, top 10 highest-severity failures with full prompt + response
Reproducibility: random seed, temperature, date/time, scanner version
Mitigations: for each failed category, the change deployed and the date
Re-test: scores after mitigation, delta vs. previous run
Sign-off: name + role of the person accepting residual risk
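
The sections above translate naturally into a structured record that can be exported as JSON alongside the PDF. The following is a sketch only: the class and field names are assumptions for illustration, not a FilterPrompt export schema, and the system prompt is stored as a SHA-256 hash so the report pins the exact text tested without disclosing it.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def prompt_hash(system_prompt: str) -> str:
    """SHA-256 hex digest of the system prompt under test."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()


@dataclass
class Scope:
    model: str
    model_version: str
    provider: str
    endpoint: str
    system_prompt_sha256: str
    rag_sources: list = field(default_factory=list)
    tools: list = field(default_factory=list)


@dataclass
class CategoryResult:
    category: str
    passed: int
    failed: int
    mitigation: str = ""     # change deployed; empty means still open
    mitigated_on: str = ""   # ISO date of the mitigation


@dataclass
class AuditReport:
    scope: Scope
    taxonomy: str            # e.g. "OWASP LLM Top 10"
    judge_model: str
    scanner_version: str
    run_at: str              # ISO 8601 timestamp of the scan
    seed: int
    temperature: float
    results: list = field(default_factory=list)
    signed_off_by: str = ""  # name + role accepting residual risk

    def to_json(self) -> str:
        # sort_keys gives a stable layout, which makes run-to-run diffs readable
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Because every field is plain data, two reports from different dates can be diffed directly to show the delta an auditor asks for.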

Sector-specific overlays

Healthcare (HIPAA, FDA SaMD)

Add a PHI-leakage test battery using realistic-but-synthetic patient data. Document that no real PHI was used. If the model influences clinical decisions and falls under FDA's software-as-a-medical-device rules, a Predetermined Change Control Plan is where you describe planned model changes and the re-testing that accompanies each one.

Financial services (DORA, SR 11-7, NYDFS)

Model risk management (SR 11-7) treats LLMs as models. You owe an independent validation report — a third-party or internal-but-segregated team running the vulnerability scan and signing off. DORA adds an ICT third-party risk angle: if you use OpenAI/Anthropic, document the provider's own red-team attestations.

Public sector (US EO 14110 follow-ons, UK AISI, Singapore IMDA)

Pre-deployment evaluation reports are increasingly mandatory. The bar is higher: expect to share probe-level data, not just summaries.

How often to re-run the report

Every model version change (provider upgrade from gpt-5 to gpt-5.2 = new report)
Every system-prompt change that touches safety language
Every new tool / function added to the agent
Every new RAG source or knowledge-base ingest pipeline change
On a calendar cadence — monthly minimum for production systems
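
These triggers are simple enough to check automatically on every deploy. The sketch below compares the previous and current deployment configuration and applies a calendar limit; the dictionary keys and the `rescan_reasons` name are assumptions for this example. It is deliberately conservative: it flags any system-prompt change, not only those touching safety language, because telling the two apart reliably needs human review.

```python
from datetime import date, timedelta

# Hypothetical config keys describing what is deployed.
WATCHED_KEYS = ("model_version", "system_prompt", "tools", "rag_sources")


def rescan_reasons(previous: dict, current: dict, last_scan: date,
                   today: date, max_age_days: int = 30) -> list:
    """Return the reasons a new scan is due; an empty list means none."""
    reasons = []
    for key in WATCHED_KEYS:
        if previous.get(key) != current.get(key):
            reasons.append(f"{key} changed")
    if today - last_scan > timedelta(days=max_age_days):
        reasons.append(f"last scan older than {max_age_days} days")
    return reasons
```

Returning reasons instead of a bare boolean means the trigger itself can be recorded in the report history.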

Common reasons reports get rejected by auditors

Only one harm category tested (usually prompt injection) — looks like security theatre
No probe count or methodology disclosed — results are not reproducible
Failures listed without mitigations or re-test — open items with no owner
Test prompts duplicate the system prompt's few-shot examples, so the model is graded on inputs it has already seen
No version info on model, scanner, or judge — results cannot be compared over time
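
Most of these rejection reasons can be caught mechanically before the report leaves the building. Here is a minimal pre-submission check, assuming the report is available as a dictionary; the key names are hypothetical. It does not cover the few-shot overlap problem, which needs the probe and prompt text to detect.

```python
def lint_report(report: dict) -> list:
    """Return a list of problems likely to get the report rejected."""
    problems = []
    categories = report.get("categories", [])

    if len(categories) < 2:
        problems.append("fewer than two harm categories tested")

    has_counts = all(c.get("probe_count") for c in categories)
    if not report.get("methodology") or not has_counts:
        problems.append("probe count or methodology missing")

    for c in categories:
        closed = c.get("mitigation") and c.get("retest_date")
        if c.get("failed", 0) > 0 and not closed:
            problems.append(f"{c.get('name', '?')}: failures without mitigation and re-test")

    for key in ("model_version", "scanner_version", "judge_version"):
        if not report.get(key):
            problems.append(f"{key} missing")

    return problems
```

An empty list does not mean the report will pass; it only means none of these known failure patterns is present.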

The two-hour version

If you have a pre-audit deadline this week: connect your production LLM endpoint to FilterPrompt, run the full Standard battery (covers all 8 harm families with ~400 probes), export the PDF + JSON, paste it as Annex C of your AI risk file, and open Linear tickets for every Critical and High finding with an owner and a due date. That is a defensible posture to walk into an audit with.

Implementation Patterns for Audit-Ready LLM Reporting

Audit-readiness for LLM vulnerability reporting depends less on tooling than on integration into your SDLC. Organizations that succeed run these activities inside their existing CI/CD pipelines instead of treating them as separate, compliance-driven tasks. That reduces friction and keeps compliance continuous instead of a reactive scramble.

A mature implementation pattern involves automated scanning of new model versions or significant prompt changes within the pre-production environment. The output of these scans should automatically generate tickets in issue tracking systems (e.g., Jira, ServiceNow) and update a "single source of truth" for LLM risk posture. This enables developers to address vulnerabilities proactively and provides auditors with a real-time, auditable trail of identification, remediation, and verification.
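
The pattern can be sketched as two small functions in a pipeline step: one turns failed probes into ticket payloads, the other decides whether the build should be blocked. The finding fields, label names, and severity policy below are assumptions for illustration; posting the payloads to Jira, ServiceNow, or another tracker is left to that tool's own client and is not shown.

```python
# Assumed policy: these severities block a release.
BLOCKING = {"critical", "high"}


def tickets_from_findings(findings, scan_id):
    """Build one tracker-agnostic ticket payload per failed probe."""
    tickets = []
    for f in findings:
        if f["passed"]:
            continue
        tickets.append({
            "title": f"[{f['severity'].upper()}] {f['category']}: probe {f['probe_id']} failed",
            "body": f"Scan {scan_id}. Evaluator reason: {f['reason']}",
            "labels": ["llm-security", f["severity"]],
        })
    return tickets


def gate_exit_code(findings):
    """Return 1 (fail the pipeline) if any blocking-severity probe failed."""
    blocked = any(not f["passed"] and f["severity"] in BLOCKING for f in findings)
    return 1 if blocked else 0
```

A CI job would call `sys.exit(gate_exit_code(findings))` after filing the tickets, so the trail of identification and remediation starts at the moment the scan fails.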

Navigating Common Pitfalls in LLM Compliance Reporting

Even with the right tools, companies often stumble over common pitfalls when preparing LLM compliance reports. One frequent error is producing static, snapshot-in-time reports that quickly become outdated. Auditors are looking for dynamic, living documentation that reflects the continuous evolution of your AI systems and your risk management efforts. Another pitfall is failing to contextualize technical findings into business impacts or harm categories, which is essential for regulatory frameworks like the EU AI Act.

Furthermore, many organizations over-rely on generic security questionnaires without specific LLM-focused questions, leading to a superficial assessment. A robust approach requires translating generic security principles into specific threats and mitigations pertinent to large language models, covering the unique attack surfaces like prompt injection, data extraction, and hallucination. Over-reporting minor issues without prioritization, or under-reporting systemic failures, can both lead to audit dissatisfaction.

Static reporting: Avoid one-off PDFs; opt for living, version-controlled documents.
Lack of business context: Translate technical vulnerabilities into potential harms (e.g., reputational damage, financial loss, discrimination).
Generic security assessments: Incorporate LLM-specific threat modeling and testing.
Failure to prioritize: Auditors expect to see a clear risk ranking and remediation plan.
Insufficient evidence of remediation: Show not just findings, but the tickets, code changes, and re-tests that closed them.

Metrics and Benchmarking for LLM Security Posture

Quantitative metrics are critical for demonstrating control effectiveness and continuous improvement in LLM security. Auditors appreciate seeing trends over time, which speaks to a mature risk management process. Key metrics include the 'time to detect' and 'time to remediate' for critical vulnerabilities, the coverage of adversarial test suites, and the reduction in specific harm category incidents post-mitigation. Benchmarking against industry averages or internal baselines provides invaluable context for these metrics.

For example, tracking the percentage of production prompts that trigger a safety filter, or the rate of successful jailbreaks against a red teaming effort, provides tangible evidence of your model's robustness. Establishing quantifiable thresholds for these metrics – such as 'less than 0.1% critical hallucinations in medical advice generation' – transforms subjective assessments into objective compliance targets. Tools that deliver trend analysis and comparison across different model versions streamline this reporting.
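
The metrics named above reduce to a few lines of arithmetic. The sketch below assumes findings carry ISO-format `detected` and `closed` dates; the function names and field names are hypothetical.

```python
from datetime import datetime


def failure_rate(failed: int, total: int) -> float:
    """Share of probes (or production prompts) that failed, from 0.0 to 1.0."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total


def mean_days_to_remediate(findings) -> float:
    """Average whole days between detection and closure, closed findings only."""
    durations = [
        (datetime.fromisoformat(f["closed"]) - datetime.fromisoformat(f["detected"])).days
        for f in findings
        if f.get("closed")
    ]
    if not durations:
        raise ValueError("no closed findings")
    return sum(durations) / len(durations)


def within_threshold(rate: float, limit: float) -> bool:
    """True if the observed rate is strictly below the compliance target."""
    return rate < limit
```

With 3 failures in 400 probes the rate is 0.75%, which would miss a 'less than 0.1%' target; writing the comparison down as code removes any argument about how the threshold was applied.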

Decision Frameworks for LLM Risk Categorization

The EU AI Act's focus on 'high-risk' AI systems necessitates a robust decision framework for categorizing your LLM applications. This isn't a one-time exercise; it needs to be an integral part of your product development lifecycle. A clear framework helps identify which regulatory obligations apply and ensures proportional governance. This framework should consider the deployed context, potential impact on fundamental rights, and the model's decisional autonomy.

For instance, an LLM used purely for internal knowledge retrieval might be low-risk, while the same LLM underpinning a credit assessment system (an Annex III use case) or a medical diagnostic tool (regulated as a medical device) would very likely be high-risk. Your framework should use a structured questionnaire that walks product teams through questions mapped directly to the EU AI Act's Annex III criteria, or to NIST's impact assessments, and ends in a definite risk classification with its corresponding regulatory obligations. Classifying up front means that, by the time an auditor appears, your team knows exactly why a system is classified as it is.
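
A questionnaire of this kind can be encoded as a small rule function so the classification and its rationale are recorded together. The questions, tier names, and rule order below are illustrative assumptions, heavily simplified; they are not the Act's legal test, and an "escalate" outcome is meant to route the system to legal review.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    annex_iii_use_case: bool            # e.g. credit scoring, employment screening
    regulated_product_component: bool   # e.g. part of a medical device
    affects_individuals: bool           # outputs influence decisions about people
    human_reviews_output: bool          # a person checks every output before use


def classify(a: Answers):
    """Return (tier, rationale); the first matching rule wins."""
    if a.annex_iii_use_case:
        return "high", "matches an Annex III use case"
    if a.regulated_product_component:
        return "high", "safety component of a regulated product"
    if a.affects_individuals and not a.human_reviews_output:
        return "escalate", "affects individuals with no human review; needs legal assessment"
    if a.affects_individuals:
        return "limited", "affects individuals but a human reviews every output"
    return "low", "no decisions about individuals"
```

Storing the returned rationale next to the tier gives the team a dated, written answer to why a system was classified the way it was.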

Vendor Comparisons for LLM Security and Compliance Tools

The market for LLM security and compliance tools is rapidly maturing, offering a range of solutions that automate vulnerability scanning, red teaming, and reporting. When selecting a vendor, consider their alignment with established taxonomies like OWASP LLM Top 10 and MITRE ATLAS, their ability to integrate with your existing CI/CD and GRC platforms, and their reporting capabilities tailored for regulatory frameworks. Look for tools that provide both programmatic access (APIs) and user-friendly dashboards.

Evaluate vendors not just on their detection capabilities, but on their support for remediation workflows. Can they automatically generate actionable recommendations? Do they offer pre-built compliance report templates for NIST AI RMF or ISO 42001? A thorough comparison should involve PoCs with your specific LLMs and use cases, covering a spectrum of attack vectors including indirect prompt injection via RAG data sources, which many early tools overlook. FilterPrompt, for instance, focuses heavily on contextual and multi-turn adversarial testing, which is crucial for real-world scenarios.