· Noorle Team · 8 min read
Frameworks vs. Runtimes: The Next Shift in AI Agent Infrastructure
Frameworks taught us how to build agents. Runtimes are where agents actually run. Understanding this distinction is the key to shipping agents that work in production.

If you started building AI agents in 2024, you probably picked a framework. LangChain, CrewAI, AutoGen, maybe OpenAI’s Agents SDK. You got tool calling working in an afternoon, had a demo by end of day, and felt great about it.
Then you tried to put it in production.
That’s when you discovered the gap. Not in the model’s intelligence or the framework’s abstractions, but in everything between “it works on my laptop” and “it runs reliably for real users.” Your three-agent pipeline that cost $0.12 per run locally starts looping in production and burns through $200 in API calls overnight because nothing enforces a budget. The hosting, the sandboxing, the UI, the scheduling, the audit trails, the cost controls — the 80% of the work that frameworks were never designed to handle.
This isn’t a failure of frameworks. It’s a category distinction the industry is only now making explicit.
The Framework Explosion
The AI agent framework space has grown remarkably. LangGraph sees tens of millions of monthly PyPI downloads. CrewAI has tens of thousands of GitHub stars. Microsoft unified AutoGen and Semantic Kernel into a single Agent Framework SDK. OpenAI, Google, and AWS have all shipped their own SDKs.
Developers have more choices than ever for the design-time experience of building agents: defining tool schemas, orchestrating multi-step workflows, managing conversation memory, and wiring up multi-agent patterns.
But a curious pattern has emerged. Despite this abundance of frameworks, only a small fraction of agent projects reach production with full security approval. The demo-to-production chasm is so consistent that experienced teams now budget the initial working prototype as 20% of total effort. Production hardening is the other 80%.
What Frameworks Actually Give You
To be clear, frameworks are genuinely useful. They solve real problems:
Abstraction over models. Switch between Claude, GPT-4, and Gemini without rewriting your tool-calling logic. Type-safe schemas for tool definitions. Streaming support across providers.
Orchestration patterns. ReAct loops, plan-and-execute, multi-agent delegation, supervisor architectures. These are hard to get right from scratch, and frameworks encode best practices.
Memory and state. Conversation buffers, summary memory, vector-backed retrieval. The building blocks of agents that remember context across interactions.
Observability hooks. Callbacks, tracing, and integration with tools like LangSmith and Weights & Biases for debugging agent behavior.
This is valuable. Nobody should have to reinvent ReAct from first principles.
The 80% Problem
Here’s what frameworks don’t give you:
Nowhere to run. No framework ships a hosting story. You need to provision servers, configure auto-scaling, manage deployments, and handle zero-downtime updates. Every team solves this independently.
No security model. OWASP published a Top 10 for Agentic Applications in 2026, based on real production incidents. Frameworks treat security as your problem. Code execution without sandboxing, tool access without access controls, data handling without encryption. When your agent can browse the web and execute code, “just be careful” isn’t a security strategy.
No UI layer. Agents that only produce text are limited agents. Real applications need interactive tables, approval dialogs, forms, progress indicators. Frameworks give you a chat completion; building the frontend is your job.
No scheduling or automation. Agents that only respond to user messages are reactive, not autonomous. Cron triggers, event-driven execution, webhook-based activation, proactive monitoring: these all require custom infrastructure.
No cost controls. Multi-agent systems routinely cost 5-10x what single agents cost. A supervisor agent that delegates to three sub-agents, each making tool calls, can generate hundreds of LLM roundtrips per task. Without token budgets, execution limits, and cost-aware model routing, one stuck loop on a Friday evening can cost more than your entire month of development.
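To make the gap concrete, here is a minimal sketch of the kind of hard budget a runtime enforces and a framework does not. The class and method names are illustrative, not any real API; a production runtime would enforce this at the platform layer, not in application code.

```python
class BudgetExceeded(RuntimeError):
    pass

class TokenBudget:
    """Hard cap on tokens and LLM roundtrips for a single agent run."""

    def __init__(self, max_tokens: int, max_calls: int):
        self.max_tokens = max_tokens
        self.max_calls = max_calls
        self.tokens_used = 0
        self.calls_made = 0

    def charge(self, tokens: int) -> None:
        """Record one LLM roundtrip; halt the run once the budget is blown."""
        self.calls_made += 1
        self.tokens_used += tokens
        if self.tokens_used > self.max_tokens or self.calls_made > self.max_calls:
            raise BudgetExceeded(
                f"{self.tokens_used} tokens / {self.calls_made} calls "
                f"exceeds budget ({self.max_tokens} tokens / {self.max_calls} calls)"
            )

# A stuck loop hits the cap instead of burning API credit all weekend.
budget = TokenBudget(max_tokens=50_000, max_calls=20)
try:
    while True:  # simulate a supervisor agent that never converges
        budget.charge(tokens=3_000)
except BudgetExceeded as e:
    print(f"halted: {e}")
```

The point is not the fifteen lines of Python; it is that this check must wrap every LLM call of every agent, which is why it belongs in the runtime rather than in each application.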
No audit trail. In regulated environments, you need to know exactly what every agent did, when, and why. Frameworks log to stdout. Production needs a proper journal.
Each of these is solvable individually. But solving all of them simultaneously, for every agent you deploy, is an infrastructure problem, not a framework problem.
The Emerging Taxonomy
The industry is starting to formalize this distinction. Analytics Vidhya published a three-layer taxonomy in late 2025 that maps neatly to what practitioners are discovering:
Frameworks handle the design-time experience. They’re libraries you import that give you abstractions for building agent logic. LangGraph, CrewAI, OpenAI Agents SDK: these are frameworks.
Runtimes handle the operational layer. They’re where agents actually execute, with managed compute, security boundaries, persistence, observability, and deployment infrastructure built in.
Harnesses handle validation. They test whether your agents actually work correctly before and after deployment.
Harrison Chase, CEO of LangChain, uses this same framing explicitly. LangChain is the abstraction layer, LangGraph is moving toward being a runtime, and evaluation tools are the harness. The layers are complementary, not competing.
This isn’t just academic taxonomy. It reflects a real architectural boundary. Frameworks are imported as dependencies. Runtimes are infrastructure you deploy to (or that’s managed for you). The concerns are fundamentally different.
What a Runtime Provides
If frameworks answer “how do I define my agent’s behavior?”, runtimes answer “where does my agent live and how does it operate safely?”
A purpose-built agent runtime provides:
Managed compute with security boundaries. Not just “a server to run on,” but sandboxed execution environments where agent-generated code runs in isolation. WebAssembly sandboxes for fast, lightweight operations. Container sandboxes for heavier workloads. Persistent machines for long-running processes. Each with resource limits, filesystem isolation, and network controls.
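To show only the thinnest slice of what "security boundaries" means, here is a sketch that runs agent-generated code in a separate process with a wall-clock limit. This is the weakest possible layer, offered purely for intuition: a real runtime adds the filesystem, network, and memory isolation described above (WASM, containers, VMs), which a subprocess alone does not provide.

```python
import subprocess
import sys

def run_untrusted(code: str, timeout_s: float = 2.0) -> str:
    """Run agent-generated Python in a separate process with a time limit.

    Illustrative only: a subprocess gives you crash isolation and a
    kill switch, but no filesystem, network, or memory boundaries.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env/site
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return result.stdout
    except subprocess.TimeoutExpired:
        return "<killed: exceeded time limit>"

print(run_untrusted("print(2 + 2)"))      # well-behaved code completes
print(run_untrusted("while True: pass"))  # runaway code is killed at the limit
```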
Built-in capabilities. Web browsing, file operations, HTTP clients, code execution, search, knowledge retrieval: available as managed services, not libraries you install and host yourself.
Tool extensibility with governance. Agents need access to external services, but that access needs authentication, rate limiting, and audit trails. A runtime provides a governed layer between agents and the outside world.
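A governed tool layer can be pictured as a single chokepoint that every tool call passes through. The sketch below is hypothetical (the allowlist, rate limit, and in-memory audit log are stand-ins for managed runtime services), but it shows the three checks named above in one place:

```python
import time

ALLOWED_TOOLS = {"web_search", "read_file"}  # per-agent allowlist (illustrative)
AUDIT_LOG: list[dict] = []                   # a real runtime persists this durably
_last_call: dict[str, float] = {}

def call_tool(agent: str, tool: str, min_interval_s: float = 1.0, **args):
    """Gate a tool call with an allowlist, a rate limit, and an audit entry."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"{agent} may not call {tool}")
    now = time.monotonic()
    if now - _last_call.get(tool, 0.0) < min_interval_s:
        raise RuntimeError(f"rate limit hit for {tool}")
    _last_call[tool] = now
    AUDIT_LOG.append({"agent": agent, "tool": tool, "args": args, "ts": now})
    return f"<result of {tool}>"  # a real gateway dispatches to the actual tool
```

Because the gate sits between the agent and the outside world, the agent’s own code (or the LLM’s output) cannot opt out of it.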
Persistent memory and state. Not just an in-memory buffer, but durable storage across sessions. Working memory for immediate context, message journals for complete history, summarized memory for long-term recall.
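The difference between a conversation buffer and a message journal is simply where the bytes live. A minimal sketch, using SQLite as a stand-in for whatever durable store a runtime actually manages:

```python
import sqlite3

class MessageJournal:
    """Durable per-session message log. Unlike an in-memory buffer,
    history survives process restarts and redeploys."""

    def __init__(self, path: str):
        # ":memory:" works for demos; a runtime points this at managed storage
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages "
            "(session TEXT, role TEXT, content TEXT)"
        )

    def append(self, session: str, role: str, content: str) -> None:
        self.db.execute(
            "INSERT INTO messages VALUES (?, ?, ?)",
            (session, role, content),
        )
        self.db.commit()

    def history(self, session: str) -> list[tuple[str, str]]:
        # insertion order is preserved via SQLite's implicit rowid
        return list(self.db.execute(
            "SELECT role, content FROM messages "
            "WHERE session = ? ORDER BY rowid",
            (session,),
        ))

journal = MessageJournal(":memory:")
journal.append("s1", "user", "summarize the report")
journal.append("s1", "assistant", "here is the summary...")
print(journal.history("s1"))
```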
Scheduling and triggers. Cron-based schedules, event-driven triggers, webhook activations. Agents that act proactively, not just reactively.
Frontend rendering. Agents that produce interactive UI components, not just text. Tables, charts, forms, and approval dialogs rendered in the conversation without requiring frontend engineering.
Agent discovery and coordination. Protocols like MCP and A2A allow agents to find tools and find each other. A runtime makes these protocols operational with managed endpoints, authentication, and routing.
Cost controls and observability. Token budgets, execution quotas, cost-aware model routing, and complete audit trails. Every action logged, every cost tracked.
Why This Shift Is Happening Now
Three forces are converging:
Protocol standardization. The Model Context Protocol now has over 10,000 registered servers and is governed by a multi-vendor foundation. A2A (Agent-to-Agent) has 150+ participating organizations, Noorle among them. AG-UI is emerging for agent-to-frontend communication. These protocols create a common interface layer that runtimes can build on.
Enterprise demand. Gartner predicts 40% of enterprise applications will feature AI agents by the end of 2026, up from less than 5% in 2025. Enterprises don’t adopt frameworks; they adopt platforms. They need managed infrastructure with security, compliance, and governance built in.
Model commoditization. When every major provider offers capable models, the differentiator shifts from “which model” to “what infrastructure surrounds it.” Smart routing across providers, cost optimization, and fallback strategies become runtime concerns, not application concerns.
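Cost-aware routing is simple to state: pick the cheapest available model that clears the task’s capability bar, and fall through to a stronger one when providers are down. The model names, prices, and tiers below are entirely made up for illustration:

```python
# Hypothetical catalog: names, prices, and tiers are illustrative only.
MODELS = [
    {"name": "small-fast",  "cost_per_1k": 0.15, "tier": 1},
    {"name": "mid-general", "cost_per_1k": 1.00, "tier": 2},
    {"name": "large-smart", "cost_per_1k": 5.00, "tier": 3},
]

def route(task_tier: int, providers_up: set[str]) -> str:
    """Pick the cheapest available model that meets the task's capability tier."""
    candidates = [
        m for m in MODELS
        if m["tier"] >= task_tier and m["name"] in providers_up
    ]
    if not candidates:
        raise RuntimeError("no available provider can serve this task")
    return min(candidates, key=lambda m: m["cost_per_1k"])["name"]

all_up = {"small-fast", "mid-general", "large-smart"}
print(route(1, all_up))                            # easy task: cheapest model wins
print(route(2, {"small-fast", "large-smart"}))     # mid model down: falls through
```

Living in the runtime, this policy applies to every agent automatically; living in application code, it gets reimplemented (or forgotten) per project.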
Frameworks and Runtimes Work Together
This isn’t frameworks versus runtimes. It’s frameworks on runtimes.
A well-designed runtime exposes its capabilities through open protocols. If you’ve built agent logic in LangGraph or CrewAI, you should be able to connect to the runtime’s tools, compute, and storage via MCP — using it as the operational layer without abandoning your existing code.
The alternative, building all of that infrastructure yourself, is how most teams currently operate. It’s why the 80% production hardening tax exists. It’s why so many promising agent demos never ship.
The Decision Framework
When evaluating how to build agents, consider which layer you’re actually solving for:
| Concern | Framework | Runtime |
|---|---|---|
| Tool schemas and calling | Yes | Yes |
| Multi-agent orchestration | Yes | Yes |
| Prompt and model configuration | Yes | Yes |
| Secure code execution | - | Yes |
| Hosting and deployment | - | Yes |
| Authentication and access control | - | Yes |
| Scheduling and triggers | - | Yes |
| UI rendering | - | Yes |
| Cost controls and budgets | - | Yes |
| Audit trails and compliance | - | Yes |
| Agent discovery (A2A) | Some | Yes |
| Tool discovery (MCP) | Some | Yes |
The top three rows are table stakes — both layers handle them. If most of your unsolved problems are in the bottom rows, you’re looking for a runtime, not another framework.
What to Look for in a Runtime
If you’re evaluating runtimes — whether Noorle or something else — here’s a checklist:
- Sandboxed compute. Can agents execute code without risking your infrastructure? Look for WASM, container, or VM isolation with resource limits.
- Built-in capabilities. Does the runtime ship with browser access, file operations, code execution, and search? Or are you installing and hosting those yourself?
- Governed tool access. Can you control which agents access which tools, with authentication, rate limiting, and audit trails?
- Durable memory. Is conversation state persisted across sessions, or does it live in a Python variable that dies with the process?
- Scheduling and triggers. Can agents run on a schedule, react to events, or activate on webhooks — without you building a cron service?
- Frontend rendering. Can agents produce interactive UI (tables, forms, approvals), or are you building a separate frontend?
- Agent discovery. Can agents find and coordinate with each other via standard protocols (MCP, A2A)?
- Cost controls. Are there token budgets, execution quotas, and cost-aware routing? What happens when an agent loops?
- Framework compatibility. Can frameworks like LangGraph or CrewAI consume the runtime’s capabilities via MCP, or is it a closed system?
Where Do You Go from Here?
The framework you chose is probably fine. The question is: what’s running underneath it?
If you’re spending more time on deployment, security, and operational tooling than on agent logic, the gap isn’t in your framework — it’s in the missing runtime layer. That’s the 80% problem, and it’s worth evaluating separately.
A few questions worth asking your team:
- What production gaps keep coming up that your framework doesn’t address?
- How much of your engineering time goes to infrastructure vs. agent behavior?
- If your agent loops at 2am, what stops it?
The answers usually point to the same place.
This is why we built Noorle — as a runtime, not a framework. If you’re hitting the 80% problem, take a look or read the docs.