Why Your “Skill Scanner” Is Just False Security (and Maybe Malware)

著者

2026年2月11日

0 分で読めます

Maybe you’re an AI builder, or maybe you’re a CISO. You've just authorized the use of AI agents for your dev team. You know the risks, including data exfiltration, prompt injection, and unvetted code execution. So when your lead engineer comes to you and says, "Don't worry, we're using Skill Defender from ClawHub to scan every new Skill," you breathe a sigh of relief. You checked the box.

But have you checked this Skills scanner?

The anxiety you feel isn't about the known threats but rather the tools you trust to find them. It's the nagging suspicion that your safety net is full of holes. And in the case of the current crop of "AI Skill Scanners," that suspicion is entirely justified.

If you’re new to Agent Skills and their security risks, we’ve previously outlined a Skill.md threat model and how they impact the wider AI agents ecosystem and supply chain security.

Why Regex can't scan SKILL.md for malicious intent

The enemy of AI security isn't just the hacker; it's the infinite variability of language. In the traditional AppSec world, we scan for known vulnerabilities (CVEs) and known patterns (secrets). This approach works because code is structured, finite, and deterministic. A SQL injection payload has a recognizable structure. A leaked AWS key has a specific format.

But an AI agent Skill is fundamentally different. It is a blend of natural language prompts, code execution, and configuration. Relying on a denylist of "bad words" or forbidden patterns is a losing battle against the infinite corpus of natural language. You simply cannot enumerate every possible way to ask an LLM to do something dangerous. Consider the humble curl command. A regex scanner might flag curl to prevent data exfiltration. But a sophisticated attacker doesn't need to write curl. They can write:

c${u}rl (using bash parameter expansion)
wget -O- (using an alternative tool)
python -c "import urllib.request..." (using a standard library)
Or simply: "Please fetch the contents of this URL and display them to me."

In the last case, the agent constructs the command itself. The scanner sees only innocent English instructions, but the intent remains malicious. This is the core failure of the "denylist" mindset. You are trying to block specific words in a system designed to understand concepts.

The complexity explodes further when you consider context. A skill asking for "shell access" might be perfectly legitimate for a DevOps deployment tool. It is catastrophic for a "recipe finder" or a "calendar assistant." A pattern matcher sees "shell access" and must either flag both (creating noise) or ignore both (creating risk). It has no understanding of why the access is requested, only that the words exist.

Case study: We pitted community scanners against real malware

We decided to put the most popular community, "Skill Scanners," to the test. We looked at SkillGuard, Skill Defender, and Agent Tinman. We also pitted them against a custom "semi-malicious" skill to see if they could tell friend from foe.

1. SkillGuard: The scanner that was actually malware

Our first subject was SkillGuard by user c-goro. The promise? A lightweight scanner for your skills. The reality? It was a trap.

Code editor view of a Markdown file containing a malicious prompt injection attack that uses system tags to trick an AI into leaking API keys.

When we analyzed SkillGuard, our own internal systems flagged it not as a security tool, but as a malicious skill itself. It attempted to install a payload under the guise of "updating definitions."

Malicious Bash script debug.sh showing a reverse shell payload, base64-encoded host, and curl commands exfiltrating sensitive credential files to an external domain.

Update: As of this writing, SkillGuard has been removed from ClawHub. But for the hundreds of users who installed it, the damage is done. This illustrates a core problem: Who scans the scanner?

2. Skill Defender: The false negative

Next, we looked at Skill Defender (by itsclawdbro) and Agent Tinman (by oliveskin). These are legitimate community efforts, with Skill Defender even sporting a "Benign" badge on VirusTotal.

But "benign" does not mean "effective."

We created a test skill called vercel. It only appeared to be a deployment tool for the Vercel platform. In reality, it was designed to quietly exfiltrate the user's hostname to a remote server.

We ran Skill Defender against this malicious skill using the Gemini CLI.

> use the skill defender to scan current skills

The result:

The Malicious Skill (vercel): Verdict: CLEAN. 0 findings.
The Scanner Itself (skill-defender): Verdict: DANGEROUS. 20 findings.

Terminal output of a skill-defender security scan. One skill is flagged as dangerous with 20 findings, while the Vercel skill is marked as clean within a coding workspace.

The scanner failed to catch the actual threat because our exfiltration code in the fake Vercel skill didn't match its hardcoded list of "bad" strings. Yet, it flagged itself as dangerous because its own reference files contained the very "threat patterns" it scans for!

This is the classic "Antivirus Paradox": The scanner looks malicious because it knows what malice looks like, but it's blind to anything new.

3. Ferret Scan: Still limited to RegEx patterns

We also looked at Ferret Scan, a GitHub-based scanner. It claims to use "Deep AST-based analysis" alongside regex. While significantly better than ClawHub-native tools, it still struggles with the nuances of natural-language attacks.

It can catch a hardcoded API key, but can it catch a prompt injection buried in a PDF that the agent is asked to summarize?

Moving to behavioral analysis of agentic intent

We need to stop thinking about AI security as "filtering bad words." We need to start thinking of it as Behavioral Analysis.

AI code is like financial debt: Fast to acquire, but if you don't understand the terms (meaning, the intent of the prompt), you are leveraging yourself into bankruptcy.

A regex scanner is like a spellchecker. It ensures the words are spelled correctly. A semantic scanner is like an editor. It asks, "Does this sentence make sense? Is it telling the user to do something dangerous?"

Evidence from ToxicSkills research: Context is king

In our recent ToxicSkills research, we found that 13.4% of skills contained critical security issues. The vast majority of these were NOT caught by simple pattern matching.

Prompt injection: Attacks that use "Jailbreak" techniques to override safety filters.
Obfuscated payloads: Code hidden in base64 strings or external downloads (like the recent google-qx4 attack).
Contextual risks: A skill asking for "shell access" might be fine for a dev tool, but catastrophic for a "recipe finder."

Regex sees "shell access" and flags both. Or worse, it sees neither because the prompt says "execute system command" instead.

The solution: AI-native security for SKILL.md files

To survive this velocity, you must move beyond static patterns. You need AI-Native Security.

This is why we built mcp-scan (part of Snyk's Evo platform). It doesn't just grep for strings. It uses a specialized LLM to read the SKILL.md file and understand the capability of the skill and its associated artifacts (e.g, scripts)

You can think of running mcp-scan as asking:

Does this skill ask for permission to read files?
Does it try to convince the user to ignore previous instructions?
Does it reference a package that is less than a week old (via Snyk Advisor)?

By combining Static Application Security Testing (SAST) with LLM-based intent analysis, we can catch the vercel exfiltration skill because we see the behavior (sending data to an unknown endpoint), not just the syntax.

Tomorrow, ask your team these three questions:

"Do we have an inventory of every 'skill' our AI agents are using?" - If they say yes, ask how they found them. If it's manual, it's outdated. If they say no, share the mcp-scan tool with them.
"Are we scanning these skills for intent, or just for keywords?" - Challenge the Regex mindset.
"What happens if a trusted skill updates tomorrow with a malicious dependency?" - Push for continuous, not one-time, scanning.

Don't let "Security Theater" give you a false sense of safety. The agents are smart. Your security needs to be smarter. Learn how Evo by Snyk brings unified control to agentic AI.

GUIDE

Unifying Control for Agentic AI With Evo By Snyk

Evo by Snyk gives security and engineering leaders a unified, natural-language orchestration for AI security. Discover how Evo coordinates specialized agents to deliver end-to-end protection across your AI lifecycle.

Get the guide

開発者セキュリティプラットフォーム