Understanding Toxic Flows in MCP and the Hidden Risk of AI-Native Systems
Most discussions about AI security still focus on prompts, models, and direct access to data. Those areas matter, but once an AI agent is allowed to call tools, invoke APIs, and act across multiple systems, the real risk shifts. It is no longer only about what the model can see. It is about what the agent can do when it begins composing tools on its own.
This is the context in which the concept of a toxic flow becomes important.
A toxic flow is a sequence of agent actions that takes an environment from attacker-controlled instructions to sensitive data and then onward to an exfiltration point. None of the individual steps needs to be malicious. Each tool may be doing exactly what it was designed to do. The danger lies in the end-to-end path created when tools, data, and instructions are combined under the control of an AI agent.
Model Context Protocol (MCP) amplifies both sides of this equation. MCP standardizes how models and agents connect to tools, repositories, services, and data sources. For development teams, this is a clear win: they gain a consistent way to plug AI into everyday workflows such as reading issues, updating tickets, querying logs, modifying code, and triggering scripts. At the same time, MCP provides agents with a flexible toolkit that makes it easier to inadvertently create toxic flows.
In a traditional deterministic application, a given input follows a defined path through code. Engineers can enumerate those paths, test them, and reason about their security properties. An MCP-based agent behaves differently. It can choose among many tools, in different orders, based on natural language instructions, context, and tool descriptions. No one writes every possible sequence explicitly. The model selects the path at runtime.
This shift creates a new category of exposure. It is no longer sufficient to ask whether a model has access to secrets or private repositories. The more relevant question becomes: under what conditions will an agent decide to move sensitive data through a chain of tools that ultimately exposes it to an untrusted party?
“Toxic flow” is the term for that chain. The name underscores that the risk is emergent: it is a property of how data and control move through many components, rather than the consequence of a single misconfiguration. In MCP environments, understanding and governing those flows is essential because agents are already wired into systems that matter: development environments, source control, incident response workflows, and production-adjacent services.
The lethal trifecta: how a tool chain becomes a breach
Despite the variability of agent behavior, the pattern behind toxic flows is remarkably consistent. When real incidents are examined, three elements tend to appear together inside a single agent execution:
attacker-influenced instructions
access to sensitive data
a way to exfiltrate that data
When these three conditions coexist in one flow, the environment is exposed, even if each tool involved was added for a legitimate reason.
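As a rough illustration of how these conditions combine, the trifecta can be expressed as a predicate over a single agent execution. This is a minimal sketch in Python, not a detection engine; the ToolCall fields and flag names are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    """One step in an agent's execution trace (illustrative fields only)."""
    tool: str
    reads_untrusted_input: bool   # e.g. content of a public issue or ticket
    touches_sensitive_data: bool  # e.g. secrets, config files, private source
    can_exfiltrate: bool          # e.g. arbitrary HTTP, outbound email or chat

def exhibits_lethal_trifecta(flow: list[ToolCall]) -> bool:
    """A flow is toxic when all three conditions appear somewhere within it."""
    return (
        any(c.reads_untrusted_input for c in flow)
        and any(c.touches_sensitive_data for c in flow)
        and any(c.can_exfiltrate for c in flow)
    )
```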
Untrusted instructions are often the starting point. In an MCP setting, they are not limited to a chat prompt. An attacker can shape the content of a GitHub issue, a customer support ticket, a message in a monitored chat channel, or any other object that the agent is configured to read. If the agent’s purpose is to triage, summarize, or act on those items, the attacker has gained a path into the agent’s reasoning process.
Sensitive data typically resides behind tools that were introduced to improve productivity. These might include actions such as reading repository contents, retrieving configuration files, querying issue trackers, pulling logs, or accessing internal records. Engineers want the agent to see the same information they would use to fix a bug or understand a production problem. As a result, the agent’s toolset often includes direct access to high-value data.
Exfiltration sinks are tools or channels that can move data beyond its safe boundary. Common examples include HTTP clients that can call arbitrary URLs, connectors that write to third-party systems, integrations that send emails or chat messages, and, in some cases, the model response itself when the caller is untrusted. These capabilities are often added gradually as teams connect their agents to more workflows.
A simple MCP scenario illustrates how the lethal trifecta comes together. Consider a GitHub-connected MCP server that powers a development assistant. The assistant reads issues from a repository, uses MCP tools to fetch relevant files and configuration details, and produces helpful summaries for maintainers. To support integration testing, an HTTP tool is also exposed that can send requests to arbitrary endpoints.
An attacker opens an issue in that repository with detailed instructions. The issue claims that a complex bug can only be diagnosed by collecting environment files and configuration data from the codebase, then sending them as a JSON payload to a specific URL so that an external analysis system can review them. From the agent’s perspective, this appears to be a thorough and reasonable request.
If the agent follows that guidance, it will read the issue (untrusted instructions), traverse the repository to gather configuration and environment files (sensitive data), and invoke the HTTP tool to transmit that information to the attacker’s URL (exfiltration sink). No individual tool is misconfigured in an obvious way. The breach emerges from the way these tools are composed under model control.
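Written as the kind of trace the earlier sketch would evaluate, the path looks like this. The tool names (github.get_issue, repo.read_file, http.post) are hypothetical stand-ins for whatever the MCP servers actually expose.

```python
# Hypothetical trace of the scenario above, reusing the ToolCall sketch from earlier.
toxic_trace = [
    ToolCall("github.get_issue", reads_untrusted_input=True,
             touches_sensitive_data=False, can_exfiltrate=False),
    ToolCall("repo.read_file", reads_untrusted_input=False,
             touches_sensitive_data=True, can_exfiltrate=False),   # .env, config files
    ToolCall("http.post", reads_untrusted_input=False,
             touches_sensitive_data=False, can_exfiltrate=True),   # attacker's URL
]

assert exhibits_lethal_trifecta(toxic_trace)  # all three conditions met in one flow
```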
Traditional AI security controls struggle with this pattern. Prompt filters and LLM firewalls inspect individual prompts and responses. They have little visibility into the intermediate tool calls and data movements that connect those prompts to external systems. Code scanning validates how tools are implemented, not how they will be composed at runtime. Access reviews confirm that each tool has a legitimate purpose, yet they rarely evaluate whether untrusted content can drive a sequence that links those tools into a toxic flow.
The result is a gap. Organizations may believe they have secured their prompts, models, and tools, while the true risk lies in the dynamic chains of actions orchestrated through MCP. Viewing this gap as the “lethal trifecta” helps focus attention on the real problem: whenever attacker-controlled instructions, sensitive data, and an exfiltration path are reachable within a single flow, the environment is at risk, regardless of how carefully each component was introduced.
Why most AI security approaches miss toxic flows
Once the lethal trifecta is understood, it becomes clear that many current AI security approaches are optimized for a different class of problems. They focus on what the model sees and says, rather than on what the agent does across interconnected systems.
Prompt controls, content filters, and LLM firewalls are usually the first defenses organizations deploy. These systems inspect input and output, looking for policy violations or sensitive terms. They can reduce blatant misuse and prevent certain categories of prompt injection. However, toxic flows often unfold as a series of apparently legitimate steps. In the GitHub example, the assistant appears to be following a detailed debugging request. Nothing in the final answer necessarily reveals that secrets have been exfiltrated through intermediate tool calls.
Conventional application security tooling is also oriented around static artifacts and deterministic paths. Static analyzers, software composition analysis, and infrastructure-as-code scanning are well-suited to environments where code and configuration define every permissible action. They can validate that MCP tools are implemented safely, that dependencies are up to date, and that access tokens are managed correctly. What they cannot easily capture is an agent’s decision to chain those tools together in an unexpected way based on natural language input.
Runtime logging and monitoring add another layer, but they, too, are constrained in this context. Engineers can collect traces of tool invocations, responses, and errors. They can investigate individual incidents. The challenge is combinatorial, as the number of possible flows skyrockets when more tools and systems are connected via MCP. Relying on manual review to detect dangerous patterns becomes impractical, especially when the agent’s behavior may shift with small changes in input or context.
Even newer AI-specific security offerings are often focused on local checks. They may enforce parameter constraints for a specific tool or restrict a particular agent’s access to certain secrets. These controls remain component-centric. They do not generally reason over complete paths that start with untrusted content and end with data leaving the trust boundary.
The consequence is a coverage gap: investments in model safety, prompt validation, and individual tool security do not automatically extend to the interaction space created by MCP. The environment may appear well-controlled when each piece is viewed separately, yet it still allows toxic flows when those pieces are combined.
To address this, security teams need to ask a different question: Does the environment allow any path where attacker-influenced instructions can drive sensitive data into an exfiltration sink? Toxic Flow Analysis is designed to provide a structured answer.
Introducing Toxic Flow Analysis
Toxic Flow Analysis (TFA) provides a graph-based perspective on AI-enabled systems. Instead of looking at prompts or tools in isolation, it maps how agents, MCP servers, tools, and underlying systems are connected. Then it searches for paths that can manifest the lethal trifecta.
The starting point is a representation of the environment. In an MCP context, this includes MCP servers, their tool manifests, model and agent configurations, and the external systems that those tools reach, such as source control, ticketing platforms, messaging systems, and generic HTTP endpoints. From these inputs, TFA builds a flow graph that captures which components can call which tools, what those tools can access, and where their outputs can be sent.
This graph is then enriched with security-relevant attributes. Nodes and edges are annotated to indicate whether untrusted parties can influence their originating instructions, whether the data they handle is sensitive, and whether the step crosses a trust boundary. This annotation makes it possible to distinguish routine internal flows from flows that connect attacker-controlled surfaces to high-value assets and then to external destinations.
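A minimal, hand-built version of such a graph for the GitHub scenario might look like the following. The node names, attributes, and edges are assumptions for illustration; a real TFA engine would derive them from MCP tool manifests, agent configurations, and system inventories.

```python
# Nodes annotated with the security-relevant attributes described above.
nodes = {
    "github_issues": {"untrusted_source": True,  "sensitive": False, "sink": False},
    "dev_assistant": {"untrusted_source": False, "sensitive": False, "sink": False},
    "repo_files":    {"untrusted_source": False, "sensitive": True,  "sink": False},
    "http_client":   {"untrusted_source": False, "sensitive": False, "sink": True},
}

# Directed edges: data or control can move from the key node to each listed node.
edges = {
    "github_issues": ["dev_assistant"],              # the agent reads and acts on issues
    "dev_assistant": ["repo_files", "http_client"],  # the agent can invoke these tools
    "repo_files":    ["http_client"],                # file contents can reach outbound requests
    "http_client":   [],                             # requests cross the trust boundary
}
```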
With the annotated graph in place, TFA can systematically search for paths where all three conditions of the lethal trifecta are present. It identifies sequences where untrusted instructions can reach an agent, that agent has a path to tools exposing sensitive data, and the same context includes an exfiltration sink. These are the toxic flows that represent realistic attack paths for a determined adversary using natural language and existing integrations, rather than custom malware.
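Over a graph like the one above, the search itself can be as simple as a depth-first walk that keeps only paths satisfying all three conditions. This is a naive sketch; production analysis would also need to handle scale, transitive data flow, and confidence levels.

```python
def find_toxic_paths(nodes, edges, max_depth=6):
    """Enumerate paths from an untrusted source to an exfiltration sink that
    also pass through sensitive data (naive depth-first search, for illustration)."""
    toxic = []

    def walk(node, path):
        path = path + [node]
        attrs = [nodes[n] for n in path]
        if (nodes[node]["sink"]
                and any(a["untrusted_source"] for a in attrs)
                and any(a["sensitive"] for a in attrs)):
            toxic.append(path)
        if len(path) < max_depth:
            for nxt in edges.get(node, []):
                if nxt not in path:  # avoid cycles in this simple sketch
                    walk(nxt, path)

    for start, attrs in nodes.items():
        if attrs["untrusted_source"]:
            walk(start, [])
    return toxic

print(find_toxic_paths(nodes, edges))
# [['github_issues', 'dev_assistant', 'repo_files', 'http_client']]
```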
Detection is only part of the requirement. For TFA to be operationally useful, it must also support prioritization and action. Not every potential flow carries the same level of risk. A path that can leak production secrets to an arbitrary external endpoint has a different impact profile than a path that could expose non-sensitive metadata to a controlled internal system. Toxic Flow Analysis can assign impact scores based on factors such as data classification, ease of exploitation, breadth of access, and the nature of the sink, helping teams decide where to focus their efforts.
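One way to make that prioritization concrete is a simple additive score over the factors listed above. The categories and weights here are invented for illustration; they are not a published scoring model.

```python
# Illustrative scoring of a toxic flow; weights and categories are assumptions.
DATA_WEIGHT = {"public": 0, "internal": 2, "confidential": 4, "secret": 6}
SINK_WEIGHT = {"internal_system": 1, "partner_saas": 2, "arbitrary_url": 4}

def impact_score(data_class, sink_type, low_attacker_effort, breadth_of_access):
    """Higher scores indicate toxic flows that should be remediated first."""
    score = DATA_WEIGHT[data_class] + SINK_WEIGHT[sink_type]
    score += 2 if low_attacker_effort else 0   # e.g. exploitable via a public issue
    score += min(breadth_of_access, 3)         # number of systems reachable, capped
    return score

# The GitHub scenario: secrets, arbitrary URL, triggered by anyone who can open an issue.
print(impact_score("secret", "arbitrary_url", low_attacker_effort=True, breadth_of_access=2))  # 14
```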
This graph-aware perspective enables security and platform teams to answer questions that are otherwise difficult to address. Which MCP servers expose combinations of tools that can create toxic flows? Which agents are simultaneously connected to untrusted instruction sources and exfiltration sinks? How does the introduction of a new tool, such as a generic HTTP client, change the set of possible flows?
Importantly, TFA is not a one-time exercise. As agents evolve, as MCP configurations change, and as new tools are added, the flow graph must be updated and re-evaluated. Treated as a continuous capability, Toxic Flow Analysis becomes the foundation for a more mature approach to AI-native risk: one that understands how the environment behaves as a whole, not just how its individual components are configured.
This brings the narrative to the next step. Once an organization can see and score toxic flows, it must decide how to prevent them from being executed in practice. Visibility without control is not sufficient in an environment where agents are already empowered to act.
Why MCP environments need guardrails, not just visibility
Toxic Flow Analysis reveals where AI-native risks reside, but insight alone does not prevent incidents. If a system can identify that a specific agent, MCP server, and tool combination can create a toxic flow, it still needs a mechanism to intervene when that flow is about to be executed.
MCP increases the urgency of this requirement because it connects heterogeneous systems under a single protocol. Through MCP, an agent may reach source control, build pipelines, monitoring systems, collaboration platforms, and internal services. Each of these integrations may be administered by a different team or vendor. There is usually no central point in those underlying systems where a single policy can be applied to govern all flows that involve them.
The most practical place to enforce policy is within the AI layer itself: at the level of MCP servers, the agents they expose, and the orchestration logic that decides which tools are available and how they may be used. In this context, guardrails are concrete enforcement mechanisms that can examine planned or ongoing sequences of actions, compare them with toxic flow findings and policy, and then allow, modify, or block those sequences.
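In code, such a guardrail can be pictured as a thin layer that sits between the agent's plan and actual tool execution. The sketch below reuses the trifecta predicate from earlier; in a real deployment, the decision would be driven by Toxic Flow Analysis findings and policy rather than a single hard-coded check.

```python
def guarded_execute(planned_calls, execute_fn):
    """Allow, block, or escalate a planned sequence of MCP tool calls (sketch)."""
    if exhibits_lethal_trifecta(planned_calls):
        # Block outright, or hand off for human approval, rather than continuing silently.
        raise PermissionError(
            "Blocked: planned sequence combines untrusted input, sensitive data, "
            "and an exfiltration-capable tool."
        )
    return [execute_fn(call) for call in planned_calls]
```

Placing the check on the whole planned sequence, rather than on each call in isolation, is what lets the guardrail see the flow instead of the individual step.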
To function effectively, guardrails require context. They need more than the current prompt or a single tool invocation. They must understand which agent is running, what tools are available in the environment, how those tools are configured, and which data and external systems they touch. This is where AI-BOM and MCP scanning play critical roles.
AI-BOM provides a structured description of the AI stack: models, datasets, frameworks, MCP servers, and key integrations. MCP scanning contributes a real-world inventory of what is actually deployed on developer and operator endpoints: which MCP servers are installed, what tools they expose, and how they are configured. Combined, these capabilities allow an orchestration layer to align TFA findings with concrete execution contexts.
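The context those two sources provide might be shaped roughly like the structure below. The field names are assumptions for illustration; real AI-BOM documents and MCP scan outputs will have their own schemas.

```python
# Illustrative environment context a guardrail layer could consume.
environment_context = {
    "ai_bom": {                                   # declared AI stack
        "models": ["example-llm-v1"],             # hypothetical model identifier
        "mcp_servers": ["github-mcp", "http-mcp"],
        "frameworks": ["agent-orchestrator"],
    },
    "mcp_scan": {                                 # what is actually installed
        "host": "dev-laptop-42",
        "servers": {
            "github-mcp": {"tools": ["get_issue", "read_file"]},
            "http-mcp":   {"tools": ["post"]},    # exfiltration-capable
        },
    },
}
```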
With this information, guardrails can be applied at precise points. If Toxic Flow Analysis identifies that a particular agent and MCP configuration would allow untrusted GitHub issues to drive secrets into an external HTTP endpoint, a policy can be deployed to prevent that specific combination, while leaving other flows intact. This avoids the need for broad, blunt restrictions that undermine the usefulness of agents.
Without such integrated enforcement, organizations are at risk of repeating a familiar pattern: rich dashboards, insightful findings, but limited impact on day-to-day behavior. Guardrails close the loop between analysis and action, ensuring that awareness of toxic flows translates into concrete constraints on what agents are allowed to do.
From analysis to control: guardrails in practice and the case for unified governance
Turning toxic flow findings into effective MCP guardrails requires policy, orchestration, and alignment with existing tools.
Policy is the starting point. Security, platform, and development teams agree on boundaries that must not be crossed. Examples include prohibiting flows where agents exposed to external tickets can access production secrets and send data to arbitrary URLs, or requiring additional controls for flows involving regulated data. Toxic Flow Analysis provides the evidence needed to define and justify these policies.
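Expressed as data, the boundary examples above might look like the following. The schema is invented for illustration; the point is that each rule is justified by a concrete TFA finding rather than by intuition.

```python
policies = [
    {
        "id": "no-external-ticket-to-secret-exfiltration",
        "when": {
            "instruction_source": "external_tickets",
            "data_classification": "secret",
            "sink": "arbitrary_url",
        },
        "action": "block",
    },
    {
        "id": "regulated-data-requires-approval",
        "when": {"data_classification": "regulated"},
        "action": "require_human_approval",
    },
]
```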
An orchestration layer then evaluates agent behavior against those policies. When an agent, via MCP, is about to execute a sequence of tool calls that would complete a toxic flow, the guardrail can intervene. It might block the sequence, request additional authorization, or route the request through a safer pattern. Enforcement occurs close to the agent’s decision point, where the full context of the flow is visible.
To scale this approach, organizations need a governance layer that coordinates across their ecosystem. AI-BOM and MCP scanning give this layer accurate, up-to-date information about both centrally managed and locally installed MCP environments. The governance layer can then apply consistent guardrails, whether an agent runs in a shared platform or on a developer’s machine.
Remediation in existing systems remains essential. The objective is not to replace CI/CD systems, IDE assistants, ticketing platforms, or data governance tools, but to ensure they act on a shared understanding of risk. When TFA surfaces a new toxic flow, the orchestration platform can open issues, propose configuration changes, or update access rules in the systems teams already use. The governance layer becomes the coordination point; existing tools remain the execution engines for change.
As adoption grows, these capabilities allow organizations to move from experimental controls to a coherent AI security program. They can begin with visibility through Toxic Flow Analysis, add targeted guardrails for their highest-risk flows, and then extend those guardrails over time. Throughout this process, AI-BOM and MCP scanning ensure that policies stay aligned with the actual state of the environment.
MCP is poised to be a central building block for AI-native systems. Toxic Flow Analysis, combined with guardrails grounded in accurate inventories and unified governance, offers a way to realize its benefits while keeping the most consequential risks under control. As these capabilities evolve, the path forward becomes clear: organizations need practical ways to operationalize this kind of analysis.
Excited to learn more? Explore how Snyk is advancing protection for AI-native systems and try Snyk MCP Scan today.