How AI Agents Still Break Security When Nothing Is Broken
AI agents are quickly moving from experiments to production. They triage alerts, review pull requests, summarize logs, route tickets, and take action across systems using real credentials.
From a security perspective, many of these systems look clean. There’s no memory-unsafe code. No obvious injection flaw. No broken authentication. And yet, they can still fail, sometimes catastrophically, simply by following a single sentence.
This gap isn’t about missing patches or unscanned dependencies, but rather about how non-deterministic systems, such as LLMs, change the meaning of trust boundaries.
Recent Snyk Labs research explores why traditional AppSec models struggle to reason about agent behavior and why threat modeling is emerging as one of the most effective defenses for AI-native applications
Deterministic security doesn’t map cleanly to agentic systems
Traditional application security is built around predictable behavior. Inputs are treated as data, instructions are constrained by code, and controls enforce clear boundaries. However, AI agents operate differently. They interpret meaning, reason over context, and decide when to act. This creates a new class of risk in which the system behaves exactly as designed, yet the outcome remains harmful.
For example:
An AI agent reads untrusted text from an issue, email, or document.
That text subtly changes how the agent interprets its task.
The agent uses legitimate tools and permissions to take an action.
Sensitive data ends up somewhere it never should have.
From a security scanner’s perspective, nothing is “broken,” and from an attacker’s perspective, everything worked. This is why prompt injection is at the top of the OWASP Top 10 for LLMs, and why static analysis alone cannot catch these failures.
The real risk of agentic systems is behavioral, not technical
One of the most important shifts highlighted in the research is where failures occur. Agentic systems tend to fail at the behavioral layer, not the code layer. Security teams are increasingly forced to ask questions like:
Is this input acting as context or instruction?
Should this agent be allowed to act on what it just read?
What happens if this output becomes another agent’s input?
These are semantic decisions, and they are probabilistic by nature. Even well-designed guardrails cannot guarantee perfect outcomes at scale. This is where many existing tools stop being effective—not because they are poorly built, but because they were never designed to reason about intent, agency, and information flow.
Recent industry incidents make this shift unmistakable.
In early 2025, researchers disclosed failures in enterprise agent deployments at both ServiceNow and Salesforce, where no traditional vulnerability was present.
In the ServiceNow case, an internal agent correctly processed a request that appeared to come from a trusted provider, but because identity was treated as user-supplied metadata rather than a verified claim, the agent instantiated a high-privilege session on behalf of an attacker — a classic confused-deputy outcome triggered entirely by agent logic.
In Salesforce, a public form field flowed unmodified into an internal “summarizer” agent’s context window, allowing untrusted input to reshape the agent’s intent and turn a routine workflow into a data-exfiltration path. In both cases, code scans passed, permissions were technically valid, and the systems behaved as designed. The failure occurred at the behavioral layer, where agents inferred meaning, authority, and intent from context that security tooling treated as inert data.
Why composition matters more than individual agents
Many of the most serious failures in agentic systems don’t originate inside a single agent. They emerge when agents are combined into workflows, pipelines, or orchestrated systems. On their own, each agent may appear well designed: permissions are scoped, policies are enforced, and behaviors align with expectations. Security reviews pass because nothing looks obviously unsafe in isolation.
The risk appears at the seams. When one agent’s output becomes another agent’s input, new data paths form—often without any explicit checks on sensitivity, intent, or destination. Information can cross trust boundaries simply because the system assumes downstream agents will “do the right thing.” This mirrors familiar security patterns such as confused deputies and time-of-check, time-of-use gaps, but with an important difference: in agentic systems, these failures can be triggered dynamically by model reasoning rather than fixed logic.
As agent workflows grow more complex, risk becomes an emergent property of the system rather than a defect in any single component. That makes composition, not individual agents, the primary unit of security analysis.
Threat modeling gives security teams leverage
Traditional security tools struggle to reason about agentic behavior because nothing is technically broken. Threat modeling reintroduces structure by shifting the focus from code correctness to system behavior. Instead of asking whether a function is vulnerable, teams examine how data, authority, and decisions move through an agentic system.
This approach surfaces questions that scanners cannot answer. Where does untrusted input enter the workflow? Which agents can read sensitive data? Which actions change the external state? How do decisions made earlier in a workflow constrain or enable later actions?
By mapping these relationships, threat modeling reveals failure paths that emerge only when components interact, which remain invisible when reviewing agents in isolation. For security teams navigating non-deterministic systems, this is a critical advantage. Threat modeling doesn’t attempt to predict every outcome. Instead, it helps teams understand where failures could occur and design controls that limit impact when they do.
From prevention to containment
Agentic systems force a rethink of what “secure” means. When behavior is probabilistic, likelihood can rarely be reduced to zero. Even strong safeguards leave room for failure, and at scale, small probabilities become operational realities. In this environment, security strategies that rely solely on prevention fall short.
Containment becomes just as important as detection. The goal shifts toward limiting what any agent can access, what combinations of capability are possible, and how far sensitive data can travel. Instead of assuming boundaries will always hold, systems are designed to remain safe when they don’t. This mindset prioritizes blast-radius reduction, capability separation, and explicit controls on data flow.
Threat modeling supports this transition by helping teams identify where containment matters most. It provides a framework for designing resilient systems that continue to protect users and data even when models behave in unexpected ways.
Why agentic AI security starts with threat modeling
AI agents are changing how software behaves, and in doing so, they are changing how security failures occur. The most significant risks no longer live in isolated vulnerabilities or misconfigurations, but in how autonomous systems interpret context, combine capabilities, and act across trust boundaries. Securing these systems requires moving beyond deterministic assumptions and adopting approaches that account for behavior, interaction, and impact.
Threat modeling offers a practical way forward. By treating agent behavior and system composition as first-class security concerns, teams gain the clarity needed to design controls that scale with AI-driven development. As agentic systems become foundational to modern applications, this shift will define how organizations build security that keeps pace with what’s next.
Interested in diving deeper and reading the comprehensive research? Step into the Lab today.
SNYK LABS
Try Snyk’s Latest Innovations in AI Security
Snyk customers now have access to Snyk AI-BOM and Snyk MCP-Scan in experimental preview – with more to come!