
Agent Stack Crystallizes: OpenAI SDK + Monty + MCP = Production Code-Executing Agents (Q2 2026)

Three independent infrastructure pieces—OpenAI's Agents SDK, Pydantic's Monty sandboxed VM (50,000x faster than Docker), and Anthropic's MCP protocol (75+ connectors)—together form the first complete production stack for code-executing AI agents. The $8.5B agent market finally has its missing safety layer.

TL;DR · Breakthrough 🟢
  • OpenAI Agents SDK provides production-grade handoff-based orchestration supporting 100+ LLMs (not just OpenAI models)
  • Pydantic's Monty (sub-microsecond sandboxed Python VM in Rust) is 50,000x faster than Docker for agent code execution
  • Monty uses deny-by-default interpreter-level security: no filesystem, network, or system calls without explicit MCP connectors
  • MCP protocol now has 75+ connectors, donated to Linux Foundation; A2A has 150+ supporting organizations
  • Six competing production agent frameworks (OpenAI, Anthropic, Google, LangGraph, CrewAI, Mastra) now converge on MCP + A2A for interoperability
Tags: agents, sdk, monty, mcp, sandbox · 4 min read · Mar 29, 2026
Impact: High · Horizon: Short-term

ML engineers building agent systems should immediately evaluate Monty for the code-between-tool-calls pattern. Adopt MCP for tool integration regardless of which SDK you choose. For enterprise deployments, Monty's state serialization + SDK tracing provides the audit trail compliance teams require.

Adoption: The complete stack (SDK + Monty + MCP) is available now for early adopters. Monty is v0.0.3 and missing some Python features (classes, generators). Production-grade Monty is 3-6 months away. Enterprise adoption of full code-executing agent stacks is Q3-Q4 2026.

Cross-Domain Connections

  • Monty provides sub-microsecond sandboxed Python execution, 50,000x faster than Docker
  • OpenAI Agents SDK provides handoff-based orchestration with built-in tracing for 100+ LLMs

Monty fills the exact gap the agent SDKs leave unspecified—secure code execution between tool calls. The combined stack enables 'code mode' agents that are both powerful and safe at production latency.

  • MCP has 75+ connectors, donated to Linux Foundation; A2A has 150+ supporting organizations
  • Six competing agent SDKs (OpenAI, Anthropic, Google, LangGraph, CrewAI, Mastra) fragment the market

The framework war is a distraction—protocol convergence on MCP + A2A means the tool and communication layers are becoming interoperable regardless of SDK choice.

  • Monty's state serialization enables durable agent workflows that survive process restarts
  • 61% of business leaders are deploying agents; Gartner projects 15% of daily decisions automated by 2028

Enterprise adoption requires durability and auditability. Monty's state serialization + SDK built-in tracing provide both.


The Orchestration Layer Standardized

Agent frameworks have matured rapidly. OpenAI's Agents SDK (March 2026) evolved from the experimental Swarm framework with a production-grade handoff architecture: agents transfer execution control while carrying conversation context. Google's ADK followed with deeper multimodal capabilities. Anthropic's SDK provides the deepest MCP integration. Six production-grade frameworks now compete (OpenAI SDK, Anthropic SDK, Google ADK, LangGraph, CrewAI, Mastra), with 61% of business leaders already deploying agents and Gartner projecting 15% of daily business decisions automated by agents by 2028.

The Real Breakthrough: Safe Code Execution at Production Latency

The critical gap in agent systems has always been safe code execution. Pydantic's Monty, released February 6, 2026, solves this at the right abstraction layer: a from-scratch Python bytecode VM written in Rust that starts in 4 microseconds cold (sub-microsecond hot). Against that baseline, Docker (195ms) is roughly 50,000x slower, and third-party sandbox services (~1,000ms) and Pyodide (~2,800ms) lag even further.

For agentic workloads—thousands of short code executions per session—this latency difference is transformative. The security model is deny-by-default at the interpreter level: open(), __import__(), eval(), exec() literally do not exist. No filesystem, no network, no system calls unless explicitly enabled through MCP connectors.
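
To make the deny-by-default idea concrete, here is a small illustrative checker, not Monty's real API or implementation: Monty enforces this policy inside its own Rust VM, where the dangerous builtins simply do not exist. This sketch merely shows the policy shape by rejecting any loaded name outside an explicit allowlist before execution.

```python
import ast

# Conceptual sketch only: Monty's actual enforcement happens at the
# interpreter level in Rust. ALLOWED and check_denied are illustrative.
ALLOWED = {"len", "sum", "min", "max", "sorted", "range"}

def check_denied(source: str) -> list[str]:
    """Return the names a deny-by-default policy would reject."""
    denied = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            denied.append("import")
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            if node.id not in ALLOWED:
                denied.append(node.id)
    return denied

# open() is rejected; pure computation passes untouched.
print(check_denied("data = open('/etc/passwd').read()"))  # ['open']
print(check_denied("total = sum(range(10))"))             # []
```

In Monty the equivalent of this check is not a pre-pass over the AST but the absence of the capability itself: there is no open() to call unless an MCP connector provides one.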

[Chart: Agent Code Execution Sandbox Startup Latency (ms). Monty's sub-microsecond startup enables thousands of code executions per agent session at production latency. Source: Pydantic official benchmark comparison]

Why This Enables the Most Powerful Agent Pattern

The most powerful agentic pattern is 'code mode'—where LLMs write Python that calls tools as functions rather than making sequential tool calls via API. This approach is dramatically more flexible and token-efficient. An agent could write: results = [analyze_paper(url) for url in arxiv_search("attention mechanisms")] in a single step, rather than sequentially calling arxiv_search, iterating, then calling analyze_paper repeatedly.
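
A runnable sketch of the pattern, using local stubs in place of real connectors: arxiv_search and analyze_paper here are stand-ins, not real APIs. In a production stack they would be MCP connectors exposed to the sandbox as plain Python functions.

```python
# Stub tools: in a real deployment these would be MCP connectors.
def arxiv_search(query: str) -> list[str]:
    # Stand-in for a real arXiv API call; returns fake result URLs.
    return [f"https://arxiv.org/abs/000{i}" for i in range(3)]

def analyze_paper(url: str) -> dict:
    # Stand-in for fetching and summarizing a paper.
    return {"url": url, "summary": "stub summary"}

# Code mode: one LLM-generated snippet composes both tools in a single
# execution, instead of N+1 round trips through the model.
results = [analyze_paper(url) for url in arxiv_search("attention mechanisms")]
print(len(results))  # 3
```

The token savings come from the composition happening inside the sandbox: the model emits one snippet instead of serializing every intermediate result back through its context window.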

This approach was previously untenable without heavy sandboxing overhead. Monty eliminates that overhead while providing stronger security guarantees than OS-level virtualization. Running arbitrary LLM-generated code in production was a compliance nightmare. Monty makes it a solved problem.

The Protocol Layer: The Sleeper Story

Anthropic's MCP (Model Context Protocol) was donated to the Linux Foundation in December 2025 and now has 75+ connectors. Google's A2A (Agent-to-Agent) protocol has 150+ supporting organizations. Both are being adopted across competing SDKs. This means the framework war (OpenAI vs Anthropic vs Google) is a developer experience competition, not a protocol lock-in battle. Agents built on any SDK can interoperate at the tool layer (MCP) and the agent communication layer (A2A).

Six competing frameworks would normally fragment the market into incompatible ecosystems. Instead, protocol convergence creates a situation where the real competition is on orchestration simplicity, observability, and ecosystem breadth—not on whether your agents can talk to your tools.

The Assembled Stack for Q2 2026

Agent orchestration (any of 6 SDKs) provides the control plane. MCP provides the tool integration layer (databases, APIs, file systems). Monty provides the secure code execution layer. A2A provides inter-agent communication. Built-in tracing (all major SDKs) provides the observability layer. Monty's state serialization to bytes enables durable agent workflows that survive process restarts—a critical enterprise requirement.
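
A toy composition of the layers just described. Every name here is illustrative, not a real SDK API: a dict of variables stands in for Monty's interpreter state, pickle stands in for Monty's serialize-to-bytes, and a plain list stands in for SDK built-in tracing.

```python
import pickle

class AgentRuntime:
    """Illustrative sketch of the assembled stack, not a real SDK."""

    def __init__(self, tools):
        self.tools = tools   # MCP layer: whitelisted tool functions
        self.state = {}      # execution layer: sandboxed variables
        self.trace = []      # observability layer: audit trail

    def call_tool(self, name, *args):
        # Every tool call is traced before it runs.
        self.trace.append({"tool": name, "args": args})
        return self.tools[name](*args)

    def snapshot(self) -> bytes:
        # Durability: persist state so a workflow survives a restart.
        return pickle.dumps(self.state)

    def restore(self, blob: bytes):
        self.state = pickle.loads(blob)

rt = AgentRuntime({"add": lambda a, b: a + b})
rt.state["total"] = rt.call_tool("add", 2, 3)
blob = rt.snapshot()

# Simulate a process restart: a fresh runtime resumes from the snapshot.
rt2 = AgentRuntime({})
rt2.restore(blob)
print(rt2.state["total"])  # 5
```

The snapshot/restore round trip is the enterprise-critical piece: a long-running approval workflow can park its state in a database and resume days later on a different host.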

This is the infrastructure that enterprises need before automating business decisions. The stack is real and shipping.

Market Context: $8.5B in 2026

The market sizing is aggressive but grounded: $8.5B in 2026, $35B by 2030. 56% of teams report improved scalability with agent orchestration. Google ADK alone has 17,800 GitHub stars and 3.3 million monthly downloads.

AI Agent Market Adoption (2026)

Agent deployment has crossed from experimentation to production at enterprise scale

  • Market size, 2026: $8.5B
  • Business leaders deploying agents: 61%
  • MCP connectors: 75+
  • A2A protocol supporters: 150+ organizations

Source: Gartner / Deloitte / Google / Anthropic

The Contrarian Case: Fragmentation and Limitations

Framework fragmentation is a genuine problem. Six competing production frameworks in 12 months create 'decision paralysis for enterprise buyers.' The interoperability story (MCP + A2A) is aspirational; in practice, switching costs between SDKs are real. Monty is v0.0.3 with no class definitions, match statements, or standard library modules. Hacker News critics noted that 'the real power of terminal agents depends on network/filesystem access that Monty deliberately removes.'

Enterprise agents that cannot access databases, APIs, or filesystems are toys, not tools.

What Critics Are Missing: The Security-Capability Split

Monty is designed for the code-between-tool-calls pattern, not general-purpose computation. The agent SDK provides the tool layer (via MCP) for filesystem/API/database access. Monty provides the computation layer between tool calls. Together, they create a security model where the LLM can compute freely but can only interact with the outside world through whitelisted, logged, auditable MCP connectors.

This is actually a stronger security posture than Docker containers, where escape vulnerabilities are regularly discovered. The LLM's code is sandboxed. The LLM's tool access is audited. This is the compliance layer enterprises need.
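
The split can be sketched in a few lines. Everything here is illustrative rather than a real MCP client: WHITELIST stands in for registered connectors, and AUDIT_LOG for the SDK's trace. Computation over connector results is unrestricted; reaching outside the whitelist fails loudly.

```python
# Illustrative sketch of the security-capability split; not a real MCP API.
AUDIT_LOG = []

WHITELIST = {
    # A registered, read-only connector (stub data in place of a database).
    "read_orders": lambda: [{"id": 1, "amount": 40}, {"id": 2, "amount": 60}],
}

def call_connector(name, *args):
    if name not in WHITELIST:
        raise PermissionError(f"connector {name!r} not whitelisted")
    AUDIT_LOG.append({"connector": name, "args": args})  # every call is logged
    return WHITELIST[name](*args)

# LLM-generated computation: free-form Python over connector results.
orders = call_connector("read_orders")
total = sum(o["amount"] for o in orders)
print(total)           # 100
print(len(AUDIT_LOG))  # 1

# Any attempt to reach a non-whitelisted capability is rejected up front:
try:
    call_connector("delete_database")
except PermissionError as e:
    print("blocked:", e)
```

Note the ordering inside call_connector: the whitelist check precedes the log append, so rejected attempts never masquerade as executed calls; a production system would log the denial separately.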

What This Means for ML Engineers

If you're building agent systems, immediately evaluate Monty for the code-between-tool-calls pattern. Adopt MCP for tool integration regardless of which SDK you choose—protocol-level interoperability protects against framework switching costs. For enterprise deployments, Monty's state serialization + SDK tracing provides the audit trail compliance teams require before automating business decisions.

The framework war is a distraction. The real story is protocol convergence at the tool and communication layers. Build your systems expecting that you might switch agent frameworks in 2-3 years. MCP + A2A make that switching cost bearable.
