Implementation

The Complete Guide to Hosting AI Agents in Production

Ryan Lynn

Ryan Lynn

Founder, OuterEdge

December 31, 202515 min read

When most people think about using Claude or other large language models, they imagine a simple request-response cycle: send a message, get an answer, done. This stateless pattern works perfectly for chatbots and simple Q&A applications.

But AI agents—systems that can execute code, edit files, use tools, and work on complex tasks over hours or days—require something fundamentally different. They need a persistent environment where they can operate. They need hosting.

Understanding how to host AI agents is becoming a critical capability for enterprises deploying production AI systems. According to Anthropic's Agent SDK documentation, the SDK "can run custom agents in hosted sandboxes using a range of providers, enabling scalable agentic workloads that operate securely in the cloud."

This guide explains, from first principles, why AI agents need hosting, what the key concepts mean, and how to choose between the four primary hosting patterns for your enterprise use cases.

Why AI Agents Need Hosting

The Stateless Problem

Traditional API calls to Claude work like this: you send a message, Claude responds, the connection closes, and Claude "forgets" everything. Each request is independent—like texting someone with amnesia who needs you to re-explain the context every time.

For simple Q&A, this works fine. But consider what happens when you ask an agent to:

  • Clone a GitHub repository and fix bugs
  • Analyze a set of financial documents and build a model
  • Monitor your CRM for warning signs and alert your team
  • Process hundreds of invoices and extract structured data

These tasks require the agent to remember what it's working on, execute code on a real computer, read and write files in a persistent workspace, and use tools that interact with real systems.

That computer needs to live somewhere. That's what "hosting" means.

Think of it this way: Claude (the brain) lives at Anthropic's servers. But the body—the container with tools, file systems, and execution capabilities—lives wherever you host it.

Key Concepts Explained Simply

Before diving into hosting patterns, four foundational concepts require clarity.

Containers

Think of a container like a virtual computer inside your computer. If your server is an apartment building, each container is an apartment with its own rooms (isolated file system), utilities (CPU, memory allocation), locks (security boundaries), and address (network configuration).

Containers matter for AI agents because they provide:

  • Isolation: If one agent goes haywire, it can only affect its own container
  • Reproducibility: Every container starts from the same template
  • Scalability: Spin up thousands of containers in seconds
  • Cost efficiency: Far cheaper than physical servers

Sandboxing

A sandbox is a restricted environment where code can run without harming the host system—like a children's sandbox at a playground. Kids can play freely inside, build whatever they want, but they can't affect anything outside.

For AI agents, sandboxing means the agent can execute arbitrary code and read/write files, but only within its designated space. It cannot access your company's database or make network calls unless you explicitly allow it.

State

State is simply "what's currently true" at any moment. For an AI agent, state includes what files exist in the workspace, what's in those files, what commands have been run, the conversation history, and what the agent is currently working on.

Stateful systems remember this information. Stateless systems forget everything between requests.

Sessions

A session is a bounded period of interaction with maintained state. When you log into Amazon and browse products, your cart persists across pages—that's session state. For agents, a session starts when you assign a task and ends when the task completes, maintaining workspace, memory, and context throughout.

The Four Hosting Patterns

Anthropic's documentation describes four primary patterns for hosting AI agents. Each serves different enterprise needs.

Pattern 1: Ephemeral Sessions

What it means: Create a fresh container for each task. When the task completes, destroy the container entirely.

Analogy: Like renting a hotel room. You check in, do your work, check out. The room gets cleaned and reset for the next guest.

How it works:

  1. User request comes in
  2. Spin up new container (2-5 seconds)
  3. Run agent task to completion
  4. Return results to user
  5. Destroy container completely

Best for:

  • One-off tasks with clear completion criteria
  • Maximum security requirements (nothing persists)
  • Tasks where state preservation isn't needed

Enterprise Examples:

Investment Banking: CIM Analysis. An analyst uploads a Confidential Information Memorandum. The agent extracts key metrics, builds a financial model, generates a summary. Results delivered, container destroyed. No risk of data leakage between deals.

Legal: Contract Review. A lawyer uploads a contract for review. The agent identifies key terms, risks, and deviations from standard. Produces redline suggestions. Container destroyed—client A's contract can never accidentally mix with client B's.

Document Processing: Invoice Extraction. User uploads a stack of invoices. Agent extracts line items, totals, vendor information. Exports to structured format. No persistent storage of financial data.

Cost characteristics: You only pay for compute while the task runs. Most cost-efficient for sporadic usage, though cold starts add 2-5 seconds of latency.

Pattern 2: Long-Running Sessions

What it means: Keep containers running continuously. They're always ready, always maintaining state.

Analogy: Like owning a home. It's always there, always yours. More expensive but always available.

How it works:

  1. Container starts and stays running indefinitely
  2. Agent monitors for events and triggers
  3. When events occur, agent takes action
  4. Container stays running, waiting for next event

Best for:

  • Proactive agents that monitor and react
  • Agents that serve content or APIs
  • High-frequency interactions where cold starts are unacceptable
  • Agents maintaining complex state over long periods

Enterprise Examples:

Customer Success: Proactive Account Monitoring. An agent continuously monitors CRM for warning signs—usage drops, support tickets, engagement changes. Proactively alerts CSMs when intervention is needed. Maintains deep context about account history. Never "forgets" what's happening with each account.

Sales: Email Response Agent. An agent monitors inbox continuously, immediately responds to inbound leads, tracks conversation threads over days or weeks. Maintains context: "This is the third follow-up with John who asked about pricing last Tuesday."

DevOps: Site Reliability Agent. Monitors dashboards and alerts 24/7. When incidents occur, automatically investigates and can take remediation actions. Maintains knowledge of recent deployments and known issues.

Cost characteristics: You pay for compute 24/7 regardless of usage. Approximately $0.05/hour × 720 hours = ~$36/month per container just for compute. No cold start delays. Best for high-frequency, consistent usage patterns.

Pattern 3: Hybrid Sessions

What it means: Containers are ephemeral, but state is saved and restored. You get the best of both worlds.

Analogy: Like a gym locker. You don't own space in the gym 24/7. But you have a locker where your stuff stays. When you arrive, your stuff is there, and you continue where you left off.

This pattern represents the optimal balance for most enterprise use cases—cost efficiency of ephemeral containers with the user experience of persistent state.

How it works:

  1. User request comes in
  2. Spin up container
  3. Load saved state (conversation history, files, context)
  4. Continue working with full context from before
  5. When idle for X minutes, save state to database/storage
  6. Destroy container (but state is safe)
  7. Next request repeats from top

Best for:

  • Tasks spanning multiple user sessions
  • Intermittent interaction patterns
  • Cost optimization with state preservation
  • Enterprise workflows where context matters but 24/7 uptime isn't needed

Enterprise Examples:

Private Equity: Deal Pipeline Agent. An agent tracks multiple deals over weeks or months. Associates interact when they need help with specific deals. Full context preserved: "Last week you asked me to analyze Company X's margins—here's what I found." Container spins down between interactions to save cost. State persists in database.

Sales: Opportunity Research Agent. Sales rep asks for research on a prospect company. Agent does deep research, compiles report. Rep comes back days later: "What else did you find about their tech stack?" Agent rehydrates context, continues seamlessly.

Customer Support: Ticket Investigation Agent. Support ticket comes in requiring investigation. Agent investigates, asks clarifying questions. Customer responds hours or days later. Agent picks up exactly where it left off, with full context.

Cost characteristics: Only pay for compute during active work. Small storage cost for persisted state (~$0.02/GB/month). Best balance of cost and user experience. Ideal for enterprises with unpredictable interaction patterns.

Pattern 4: Multi-Agent Sessions

What it means: Multiple agent processes run inside the same container, potentially collaborating with each other.

Analogy: Like a shared office space. Multiple workers (agents) in one room, able to collaborate and share resources.

How it works:

  1. Single container runs continuously
  2. Multiple agent processes inside
  3. Agents can communicate with each other
  4. Shared file system enables collaboration

Best for:

  • Multi-agent simulations
  • Agents that need close collaboration
  • Research and experimentation
  • Specialized architectures where agent separation isn't needed

Enterprise Examples:

M&A Due Diligence Multi-Agent Team. A "Financial Analysis Agent" reviews financial statements. A "Legal Risk Agent" reviews contracts. A "Market Research Agent" analyzes competitive position. All share the same data room, can reference each other's work. An orchestrator agent coordinates and synthesizes findings.

Trading Strategy Simulation. Multiple "trader" agents with different strategies interact in a simulated market. Used for strategy backtesting and optimization.

Caution: This pattern requires careful coordination to prevent agents from interfering with each other. Most enterprise use cases are better served by Patterns 1-3.

Practical Considerations

Choosing a Sandbox Provider

Several providers offer containerized environments suitable for AI agents:

| Provider | Strengths | Best For |

|----------|-----------|----------|

| E2B | Purpose-built for AI agents, fast cold starts | Quick prototyping, standard agent workloads |

| Modal | Excellent Python support, generous free tier | Python-heavy workloads, data processing |

| Fly.io | Flexible, good for stateful applications | Hybrid sessions, complex networking |

| Cloudflare | Edge deployment, global distribution | Low-latency needs, global users |

| Self-hosted | Full control, compliance-friendly | Strict security/compliance requirements |

For most enterprise deployments, E2B or Modal work well for standard workloads, Fly.io excels for hybrid session patterns, and self-hosted Docker/Kubernetes makes sense for clients with strict compliance requirements.

Container Sizing

Anthropic recommends a baseline of 1GiB RAM, 5GiB disk, and 1 CPU. Adjust based on workload:

| Use Case | RAM | Disk | CPU |

|----------|-----|------|-----|

| Simple chat agent | 512MB | 2GB | 0.5 |

| Code analysis/generation | 1-2GB | 5-10GB | 1 |

| Document processing (PDF, DOCX) | 2GB | 10GB | 1 |

| Data analysis with large datasets | 4-8GB | 20GB+ | 2 |

| Multi-agent orchestration | 4GB+ | 10GB | 2+ |

Understanding Costs

Two cost components dominate:

1. Container compute: ~$0.05/hour baseline

  • Ephemeral: Only when running
  • Long-running: 24/7
  • Hybrid: Only when active + storage for state

2. Claude API tokens: This is typically 70-90% of total cost

  • Input tokens: ~$3/million (Sonnet)
  • Output tokens: ~$15/million (Sonnet)

Example: For a deal analysis agent using the hybrid pattern:

  • Average task: 50K input tokens, 10K output tokens
  • Per-task API cost: ~$0.30
  • Per-task compute (5 min): ~$0.004
  • Total per task: ~$0.30 (API dominates)

The strategic implication: Optimize prompts and context management first. Container costs are relatively minor compared to API usage.

Security Considerations

For enterprise deployment, ensure:

  • Network isolation: Containers can only reach allowed endpoints
  • Credential management: Secrets injected at runtime, never hardcoded
  • Audit logging: Record all agent actions for compliance
  • Data boundaries: Ensure client A's data cannot leak to client B

A Framework for Choosing

When deciding which hosting pattern to use, consider four factors:

Interaction frequency: How often will users interact with the agent?

  • Sporadic → Ephemeral or Hybrid
  • Continuous → Long-running

State requirements: Does context need to persist between interactions?

  • No persistence needed → Ephemeral
  • Persistence required → Hybrid or Long-running

Security requirements: How sensitive is the data?

  • Maximum isolation → Ephemeral (fresh container per task)
  • Standard enterprise → Hybrid or Long-running with proper controls

Cost sensitivity: What's the budget constraint?

  • Minimize cost → Ephemeral or Hybrid
  • Minimize latency → Long-running

For most enterprise use cases, Hybrid Sessions offer the optimal balance—the cost efficiency of ephemeral containers with the user experience benefits of persistent state.

The Bottom Line

AI agents represent a fundamental shift from stateless API calls to stateful, persistent systems that can execute code, use tools, and maintain context over extended periods.

Understanding hosting patterns isn't optional for enterprise AI deployment—it's foundational. The choice between ephemeral, long-running, hybrid, or multi-agent patterns directly impacts:

  • Security posture: How isolated is each task or client?
  • User experience: How quickly can agents respond? Do they remember context?
  • Cost structure: What's the monthly spend per agent?
  • Scalability: How easily can you deploy agents across the organization?

The four hosting patterns each serve legitimate enterprise needs:

  1. Ephemeral for maximum security and one-off tasks
  2. Long-running for proactive monitoring and high-frequency use
  3. Hybrid for the optimal balance of cost and experience
  4. Multi-agent for specialized collaboration scenarios

As AI agents become central to enterprise operations, the organizations that understand these architectural choices will deploy agents more effectively, more securely, and more cost-efficiently than those that don't.


At OuterEdge, we help enterprises design and deploy production AI agent systems. From selecting the right hosting pattern to implementing secure, scalable architectures, we bring deep expertise in turning AI capabilities into operational reality. If you're building AI agents for your organization, book a strategy call to discuss your requirements and architecture.


Tags

ai-agentsclaude-sdkenterprise-aicontainershostingmcp-serversai-infrastructure

Share this article

Ready to transform your organization?

Book a free strategy call to discuss how AI can drive results for your business.

Book a Strategy Call