What is an LLM vulnerability scanner?

An LLM vulnerability scanner sends batteries of adversarial probes — jailbreaks, prompt injections, PII extraction attempts, harmful-content requests — at a target LLM and grades the responses. The output is a vulnerability + optimization report with per-probe evidence, severity, and a prioritized fix list.

Which LLMs can I scan with FilterPrompt?

OpenAI, Anthropic, Google Gemini, Azure OpenAI, plus any OpenAI-compatible endpoint — Ollama, Groq, Mistral, Together AI, OpenRouter, Perplexity, Hugging Face, vLLM, or your own custom endpoint. Bring your own keys per tenant.

What kinds of vulnerabilities does FilterPrompt test for?

Jailbreaks (DAN, role hijack, translation smuggling), direct and indirect prompt injection, system-prompt extraction, harmful-content compliance, PII / secret leakage, bias & fairness, RAG poisoning, agent/tool abuse, output quality, and robustness — categories map to the OWASP LLM Top 10.

How are probes graded?

Each probe declares an evaluator: regex match, refusal-check, contains-check, or an AI judge (Gemini 3 Flash). Pass/fail comes with severity, category, the exact prompt sent, the model's full response, and the evaluator's reason — fully auditable.

How much does a scan cost?

1 credit per probe executed. New accounts get 1 welcome credit on signup. Pay-as-you-go credit packs after that — credits never expire. Connecting LLMs and creating tenants is free.

AI in Cybersecurity: Pentesting AI, AI Pentesting, and the Certifications That Matter

Field Guide · 2018-06-21 · 15 min read · FilterPrompt Security Team

A 2026 field guide to AI in cybersecurity — how AI is changing SOCs, how to pentest AI systems, how attackers use AI to pentest you, and the AI security certifications worth the time.

AI in cybersecurity is no longer a buzzword on a vendor slide; it is the dominant force shaping both attacker and defender workflows in 2026. Detection engineering is being rewritten around LLM-driven triage, SOCs are reorganising around AI copilots, and the entire offensive industry is folding LLM-augmented reconnaissance and exploit generation into standard playbooks. At the same time, every enterprise rolling out AI internally has created a brand-new asset class — models, prompts, embeddings, agents — that classical security tooling does not protect. This guide walks through what AI cybersecurity actually looks like end to end, how to pentest AI systems, and which AI security certifications are worth your team's time.

How AI is changing cybersecurity, in concrete terms

Strip the marketing layer and three changes in cybersecurity actually matter:

Detection and triage are getting cheaper. LLMs summarise alerts, correlate signals across noisy feeds, and draft analyst notes. SOC teams report 40-60% reduction in tier-1 triage time when they wire an LLM into the alert pipeline correctly.
Offensive operations are getting cheaper. The same LLMs let attackers automate reconnaissance, generate phishing content per victim, rewrite malware to defeat signatures, and probe applications faster than human operators ever could.
A new asset class — AI systems themselves — is now in scope. Prompt injection, jailbreaks, model extraction, training data poisoning, and excessive agency on agents are categories that didn't exist on a 2020 risk register and dominate the 2026 one.

If your security program treats AI only as a tool you use defensively, you are missing the second and third changes — both of which are now larger threats than the first is a benefit.

AI in defensive cybersecurity: where it actually works

Alert triage and SOC copilots

The strongest production use case. An LLM reads the alert, the related logs, threat-intel context, and asset metadata, and produces a structured triage note: severity, suggested next action, related historical incidents. Analysts review and edit instead of writing from scratch. Crowdstrike Charlotte, Microsoft Security Copilot, and a long tail of in-house implementations all converge on the same shape.

Detection engineering

LLMs translate plain-language detections ('alert when a service account logs in from a new country and immediately escalates privileges') into Sigma, KQL, or SPL queries. The productivity multiplier on detection content is real, but the queries still need human review — LLMs hallucinate field names with enthusiasm.

Vulnerability management

AI-assisted prioritisation reads CVE descriptions, your asset inventory, and exploit-availability data and produces a rank-ordered remediation queue grounded in actual exploitability rather than raw CVSS. This is the single highest-value AI use case for resource-constrained security teams.

Phishing and social-engineering defence

LLMs are excellent at classifying inbound text for impersonation, urgency manipulation, and BEC patterns. The hard part is the false-positive rate; production deployments need careful tuning and a human-in-the-loop step for borderline cases.

AI in offensive cybersecurity: how attackers use AI against you

If you read only one section of this article, read this one. Attackers are ahead of most defenders in operational adoption.

Per-victim spear-phishing generated in seconds from public LinkedIn and conference data.
Voice and video deepfakes for vishing and BEC — the Hong Kong CFO deepfake wire-fraud case is the new normal.
LLM-driven reconnaissance — feed GitHub, DNS, breach dumps, and SaaS configs into a model and get a ranked list of attack paths.
Polymorphic malware generation — same payload, fifty syntactic variants, each defeats signature-based detection.
AI-assisted exploit chaining — code-capable models propose plausible exploit sequences faster than a junior pentester.
Adversarial ML against your defensive AI — attackers fuzz your phishing classifier and your prompt firewall, then craft inputs that slip past both.

Pentesting AI systems: a different discipline

Pentesting AI is not a subset of webapp pentesting. The asset is a model — its prompts, its training data, its tools, and its memory — and the techniques are mostly novel. A serious AI pentest covers six areas.

1. Prompt injection (direct and indirect)

Probe both user-input injection and content-borne injection planted in retrieved documents, tool responses, web pages, file uploads, and emails the agent reads. Indirect injection is the dominant real-world attack against RAG systems and agents in 2026 — make sure your pentest scope includes it.

2. Jailbreak resistance

Test against published jailbreak corpora (DAN, role-play, encoded payload, low-resource-language). Then move to multi-turn pressure: most production models hold up to a single jailbreak prompt but fold under conversational coercion.

3. PII and DLP leakage

Probe both directions. Inputs: does customer data the user shouldn't reach the provider, get sent? Outputs: can a user trick the model into echoing data it pulled from a knowledge base they shouldn't see?

4. Tool-use and excessive agency

For agents, enumerate every tool. For each, ask: can the agent be social-engineered into calling this tool for an unauthorised purpose? What are the parameters under attacker influence? What downstream system does the call hit?

5. Output handling

Does the application render model output in a context that can execute (HTML, markdown links, SQL, shell)? Indirect XSS via LLM output is now common; treat every model token as untrusted on the way out, the same way you treat user input on the way in.

6. Model and training data leakage

Membership inference, model inversion, and training data extraction probes — particularly for fine-tuned models on proprietary or personal data. Often missed because it requires statistical attacks rather than a single payload.

How to run an AI pentest in practice

Inventory the asset — model, system prompts, tools, retrieved corpora, memory, and the upstream provider.
Threat model — adversary profiles, target data, target actions. Don't reuse a webapp threat model wholesale.
Probe library — OWASP LLM Top 10, MITRE ATLAS, AVID, plus probes specific to your tools and data.
Multi-turn execution — single-shot probes miss two-thirds of production-relevant attacks.
AI-based detection grading — outperforms regex grading on every category we measure, especially DLP and tool-misuse.
Map every finding to a framework — OWASP LLM Top 10 IDs are the lingua franca; NIST AI RMF and EU AI Act references are the ones auditors actually want.
Hand the regressions back to engineering as CI checks — the goal of pentest #1 is to make pentest #2 boring.

AI security certifications worth the time

The certification market is still maturing. As of 2026 a small set are worth pursuing for credibility and skill-building; most are not.

ISACA AAISM — Advanced in AI Security Management

Management-level AI security certification from ISACA. Strong on governance, risk, and program structure. Right for a CISO, security manager, or anyone who has to defend an AI security program to executives and auditors.

ISC2 ISSMP / CISSP with AI focus areas

ISC2 has folded AI risk management into CISSP and ISSMP study domains. Not an AI-specific certification but, given how widespread CISSP is, the AI-aware coverage matters for hiring signal.

SANS / GIAC AI Security tracks

SANS launched dedicated AI red-team and AI defender courses in 2024-2025; GIAC certifications are following. Hands-on, expensive, and the deepest practical training currently available — worth it for senior individual contributors who actually have to do the work.

OWASP LLM Top 10 contributor work

Not a certification, but for IC-level credibility, contributing to OWASP LLM Top 10, the AVID database, or MITRE ATLAS is more meaningful in hiring than most paid courses. Recruiters in this niche read GitHub commits.

Vendor certifications (Microsoft AI-102, Google Cloud Generative AI Leader)

Useful if your stack is heavily on one cloud, but they're not security certifications — they're platform-engineering credentials with a security chapter. Don't confuse the two.

Avoid: 'AI cybersecurity' bootcamp certs

There is a long tail of low-quality 'Certified AI Security Professional' programmes that emerged after ChatGPT shipped. Most are slide-deck exams with no practical component. Recruiters in this field have learned to discount them.

Building an AI cybersecurity program: minimum viable controls

Inventory every LLM endpoint your organisation uses — including shadow IT — and assign an owner.
Run a baseline AI pentest against each production model and agent. Map findings to OWASP LLM Top 10.
Deploy input/output filtering (prompt firewall) on every customer-facing model. Pattern + ML + structural layers.
Add DLP redaction in both directions. Never ship customer PII to a third-party model provider.
Wire agents with explicit tool allowlists and parameter validation enforced outside the model.
Schedule continuous adversarial scans on staging — weekly minimum — to catch provider model drift.
Maintain an audit log of every model verdict and every blocked attack, with a 90-day retention floor.
Pick one framework as your reporting backbone: NIST AI RMF for US, EU AI Act Article 15 for EU, ISO/IEC 42001 if you operate globally.

Bottom line

AI in cybersecurity is a two-front problem. AI is a powerful defensive tool when used carefully — alert triage, vulnerability prioritisation, detection engineering, phishing classification all benefit. It is also a powerful offensive tool, and the attackers using it are ahead of most defenders in operational adoption. And the AI systems your business is rolling out are themselves a new asset class that classical security tooling does not protect. A serious 2026 program covers all three: defensive use, offensive resilience, and AI-system pentesting — with continuous adversarial scanning as the connective tissue. Pick your certifications carefully, ignore the bootcamp ones, and put the AI red team next to the appsec team, not inside the data-science org.