Pipeline Active
Last: 09:00 UTC|Next: 15:00 UTC
← Back to Insights

Every AI Agent Capability Advance Expands the Attack Surface

ClawHavoc weaponized 12% of ClawHub's registry in two weeks. Snyk found 36% prompt injection and 91% hybrid attacks. As agents gain speed, tools, and open-source access, security is losing an exponential race.

TL;DRBreakthrough 🟢
  • ClawHavoc compromised 341 of 2,857 ClawHub skills (12%) with AMOS infostealers in under two weeks; daily submissions grew 10x (under 50 to 500+) outpacing moderation
  • <a href="https://snyk.io/blog/toxicskills-malicious-ai-agent-skills-clawhub/">Snyk's ToxicSkills audit</a> found 36% prompt injection rate across 3,984 skills, with 91% using hybrid prompt-injection-plus-malware attacks that bypass both AI safety filters and traditional antivirus
  • Three CVEs in <a href="https://adversa.ai/blog/top-mcp-security-resources-february-2026/">Anthropic's own Git MCP server</a> (CVE-2025-68145 path traversal, CVE-2025-68143 unrestricted git_init, CVE-2025-68144 argument injection) enable RCE; if the reference implementation ships with exploitable vulnerabilities, community servers are presumptively less secure
  • MCP's 97M monthly SDK downloads and 5,800+ servers create transitive trust risk: a compromised server can cross-shadow legitimate servers by injecting hidden instructions targeting other tool connections
  • DeepSeek V4's Apache 2.0 release will enable self-hosted agents outside cloud security perimeters; Bitdefender already confirms <em>Shadow AI</em> agents on corporate machines with ungoverned terminal and disk access
securityai-agentssupply-chainmcpattack-surface5 min readFeb 17, 2026

Key Takeaways

  • ClawHavoc compromised 341 of 2,857 ClawHub skills (12%) with AMOS infostealers in under two weeks; daily submissions grew 10x (under 50 to 500+) outpacing moderation
  • Snyk's ToxicSkills audit found 36% prompt injection rate across 3,984 skills, with 91% using hybrid prompt-injection-plus-malware attacks that bypass both AI safety filters and traditional antivirus
  • Three CVEs in Anthropic's own Git MCP server (CVE-2025-68145 path traversal, CVE-2025-68143 unrestricted git_init, CVE-2025-68144 argument injection) enable RCE; if the reference implementation ships with exploitable vulnerabilities, community servers are presumptively less secure
  • MCP's 97M monthly SDK downloads and 5,800+ servers create transitive trust risk: a compromised server can cross-shadow legitimate servers by injecting hidden instructions targeting other tool connections
  • DeepSeek V4's Apache 2.0 release will enable self-hosted agents outside cloud security perimeters; Bitdefender already confirms Shadow AI agents on corporate machines with ungoverned terminal and disk access

The Structural Problem: Capability and Attack Surface Scale Together

The AI security story of February 2026 is not simply that attacks are happening -- it is that the attack surface grows with every capability improvement the AI industry celebrates. Faster inference means more agent actions per minute. Broader tool access via MCP creates transitive trust. Open-source model releases remove the cloud vendor security layer. Each advance the industry treats as progress is simultaneously a new attack vector.

Capability 1: Faster Inference Means More Agent Actions Per Minute

GPT-5.3-Codex-Spark generates 1000+ tokens/sec on Cerebras WSE-3 -- 15x faster than standard GPU inference. For coding agents, this translates to faster code execution, more tool calls per session, and higher-frequency system interactions. Each interaction is a potential attack vector.

An agent operating at 1000 tok/s can execute a malicious instruction, exfiltrate data, and cover its tracks before a human reviewer could notice the activity in real-time logs. The persistent WebSocket infrastructure (80% RTT reduction) that makes Spark fast also creates a persistent connection channel that a compromised agent could exploit for continuous data exfiltration -- unlike HTTP request-response patterns that create natural monitoring boundaries.

Capability 2: MCP Creates Transitive Trust

MCP's 97M monthly SDK downloads and 5,800+ servers mean AI agents now have programmatic access to emails, git repositories, file systems, databases, and enterprise APIs. Every MCP tool connection extends the blast radius of a compromised agent.

The transitive trust problem is critical: when an MCP client connects to multiple servers, a compromised server can inject instructions that modify the agent's behavior toward other legitimate servers. Cross-server shadowing -- where malicious MCP server A injects hidden instructions affecting legitimate MCP server B -- represents a new attack class with no traditional security analog. The agent does not distinguish between tool descriptions from trusted and malicious servers because MCP metadata is treated as trusted input.

Capability 3: Open-Source Models Enable Unmanaged Agent Deployments

DeepSeek V4's planned Apache 2.0 release enables self-hosted agent deployments outside cloud vendor security perimeters. Enterprises and individuals can run frontier-class models on consumer hardware (dual RTX 4090s) without cloud provider security monitoring, threat detection, or compliance enforcement.

Bitdefender's GravityZone telemetry already confirms Shadow AI -- employees deploying OpenClaw agents on corporate machines with broad terminal and disk access, outside IT governance. Gartner projects 40% of enterprise applications will integrate AI agents by EOY 2026 (from <5%). The shadow deployment problem will accelerate faster than security teams can catalog and govern agent permissions.

The Hybrid Attack Problem: Neither AI Safety Nor Traditional Security Catches It

Snyk's finding that 91% of malicious skills combine prompt injection with traditional malware reveals a detection gap that neither existing defense addresses:

  • Traditional antivirus sees no suspicious binary because the initial vector is a text-based prompt injection in tool metadata
  • AI safety filters are bypassed because tool descriptions are treated as trusted system input, not user input
  • The malware payload (AMOS infostealer) is delivered as a second stage after the prompt injection grants execution privileges

AMOS specifically targets the credentials AI agents use: browser tokens, OAuth credentials, crypto wallets, SSH keys, and AI agent API tokens. This hybrid attack class requires a new detection category that bridges AI safety and traditional security -- a gap OWASP MCP Top 10 acknowledges but does not yet solve.

The Escalation Timeline

The ClawHub compromise did not arrive without warning. The Shai-Hulud 2.0 self-propagating worm (October 2025) exposed 33,185 secrets across 20,649 repositories using stolen npm tokens, with 3,760 still valid days after discovery. Remediation at scale is fundamentally slower than exploitation. At MCP's 97M download scale, even a 1% compromise rate affects millions of agent deployments.

The npm ecosystem stabilized only after years of investment in provenance signing, 2FA enforcement, and automated malware scanning -- investments the AI agent ecosystem has not yet made. Daily ClawHub skill submissions jumped from under 50 to over 500 by early February 2026, a 10x increase that the registry maintainer admitted cannot be secured at current moderation capacity.

What This Means for Practitioners

ML engineers deploying AI agents must treat every MCP tool connection as a potential attack vector. The security problem is current -- ClawHavoc is active and MCP CVEs are exploitable today. Enterprise security hardening tooling (MCP-specific scanning, agent permission governance) is 6-12 months from maturity. That gap is the critical vulnerability window.

Immediate actions:

  • Audit all MCP server sources; implement allow-list-only skill policies rather than permissive discovery
  • Encrypt credential storage -- never plaintext config files. AMOS specifically targets credentials accessible to agent processes
  • Establish SIEM monitoring for agent API calls; the volume and frequency of calls is an anomaly signal even without content inspection
  • Catalog Shadow AI agent deployments at the enterprise level before Gartner's 40% adoption threshold materializes
  • Treat tool description metadata as untrusted input, not trusted system configuration

Security vendor opportunity: Snyk, Wiz, CrowdStrike, and Palo Alto have a new product category -- AI agent security posture management. Anthropic faces reputational risk from CVEs in their own protocol. Enterprise AI platform vendors that ship with MCP security defaults (allow-listing, credential encryption, audit logging) gain immediate trust advantage.

AI Agent Supply Chain Attack Escalation: August 2025 to February 2026

Timeline showing escalating sophistication and scale of AI agent supply chain attacks over 6 months

Aug 2025s1ngularity Credential Harvesting

2,349 credentials exposed via early agent targeting

Oct 2025Shai-Hulud 2.0 Self-Propagating Worm

33,185 secrets across 20,649 repos; self-sustaining via stolen NPM tokens

Jan 27, 2026ClawHavoc Campaign Begins

28 malicious skills uploaded Jan 27-29; escalation to 386 skills Jan 31-Feb 2

Feb 1, 2026Anthropic MCP CVEs Disclosed

3 CVEs in official Git MCP server enabling RCE via prompt injection

Feb 5, 2026Snyk ToxicSkills Published

36% prompt injection rate across 3,984 skills; 91% hybrid attacks

Feb 13, 2026OWASP MCP Top 10 Released

Security community formalizes AI agent threat taxonomy

Feb 14, 2026First Wild MCP npm Backdoor

'postmark-mcp' package harvesting emails from AI agents

Source: PointGuard AI, Snyk, Dark Reading, Adversa AI, OWASP

AI Agent Security Crisis: Key Metrics (February 2026)

Critical security statistics quantifying the scale of the AI agent attack surface

12%
ClawHub Malicious Rate
341 of 2,857 skills
36%
Prompt Injection Rate
1,467 of 3,984 skills
91%
Hybrid Attack Rate
Prompt injection + malware
97M
MCP Monthly Downloads
970x since Nov 2024
40%
Enterprise Agent Adoption
From <5% (Gartner EOY 2026)

Source: Koi Security, Snyk ToxicSkills, Gartner, MCP registry data

Share