5 Architecture Patterns for Running AI Agents in Production
Battle-tested architecture patterns for deploying AI agents at scale. From single-agent setups to multi-agent networks with practical examples.
Building an AI agent demo is easy. Running one in production - handling real users, managing state reliably, staying secure, and scaling without breaking - is a completely different challenge.
Over the past year, we've seen hundreds of teams deploy AI agents on Oblien. These are the five architecture patterns that keep showing up - the ones that actually work at scale.
Pattern 1: The Persistent Agent
Best for: Personal assistants, coding agents, research bots, automation tools
This is the simplest pattern and where most teams should start.
┌────────────────────────────────┐
│ Persistent Agent │
│ │
│ Agent framework (always-on) │
│ Long-term memory on disk │
│ Restart policy: always │
│ Internet: yes (for LLM API) │
│ │
│ This workspace never stops. │
└────────────────────────────────┘
How it works: Your agent runs as a managed process in a permanent workspace. It has persistent storage for memory and files. It restarts automatically if it crashes. Users interact with it through your application, and the agent maintains context across sessions.
Why it works: Most agents don't need anything fancier. A single workspace with persistent storage and a managed process handles conversation history, tool execution, and state management. Don't over-architect until you hit a real limit.
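To make "long-term memory on disk" concrete, here is a minimal Python sketch. It assumes the agent keeps its conversation history as a JSON file on the workspace's persistent storage. `DiskMemory` and the file layout are hypothetical and are not part of any Oblien SDK. The write is atomic because the restart policy will bring the agent back after a crash, and the agent should come back to a readable file.

```python
import json
import os
import tempfile
from pathlib import Path


class DiskMemory:
    """Conversation history kept as a JSON file on the workspace's persistent disk.

    Hypothetical helper: the file location and record shape are illustrative.
    """

    def __init__(self, path: Path) -> None:
        self.path = path
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def load(self) -> list:
        # No file yet simply means this is the agent's first session.
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def append(self, role: str, content: str) -> None:
        history = self.load()
        history.append({"role": role, "content": content})
        # Write to a temporary file, then atomically swap it in, so a crash
        # in the middle of a write (followed by an automatic restart) never
        # leaves a half-written history file behind.
        fd, tmp_path = tempfile.mkstemp(dir=str(self.path.parent), suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(history, f)
        os.replace(tmp_path, self.path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        memory = DiskMemory(Path(d) / "memory" / "history.json")
        memory.append("user", "My name is Ada.")
        # Simulate a restart: a brand-new object reads the same file.
        print(DiskMemory(memory.path).load())
```

A real agent would probably also trim or summarize old turns before sending them to the LLM. The point here is only that context survives a process restart.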
When to evolve: When you need to isolate code execution from the agent itself, or when you need multiple agents with different capabilities.
Pattern 2: Agent + Sandbox
Best for: Coding assistants, code evaluation platforms, AI tutors, any agent that runs untrusted code
┌──────────────────┐ ┌──────────────────┐
│ Persistent Agent│ │ Sandbox (temp) │
│ │ creates │ │
│ Makes decisions │────────►│ Runs code │
│ Holds memory │ reads │ Air-gapped │
│ Calls LLM │◄────────│ TTL: 60 seconds │
│ │ │ Auto-destroys │
└──────────────────┘ └──────────────────┘
How it works: The agent lives in a permanent workspace (Pattern 1), but when it needs to run code, it doesn't do it in its own environment. Instead, it creates a temporary, air-gapped sandbox workspace. The code runs there. Results are read back. The sandbox is destroyed.
Why it works: It separates the agent's brain from its hands. If the code execution goes wrong (infinite loop, malicious package, filesystem corruption), only the disposable sandbox is affected. The agent's memory, credentials, and state are untouched.
The key detail: Sandboxes should be truly isolated - no internet access, no private links to other workspaces, no access to the agent's environment. They receive input, produce output, and that's it.
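The create → run → read → destroy lifecycle can be sketched locally in Python, under some stated assumptions. `run_in_disposable_sandbox` is a hypothetical helper. A child process in a temporary directory only mimics the lifecycle and the TTL. It provides none of the isolation (no network, separate kernel) of a real air-gapped sandbox workspace, which would come from your platform's workspace API.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class SandboxResult:
    stdout: str
    stderr: str
    exit_code: Optional[int]  # None when the run was killed at its TTL
    timed_out: bool


def run_in_disposable_sandbox(code: str, ttl_seconds: float = 60.0) -> SandboxResult:
    """Create -> run -> read results -> destroy, for a single piece of code.

    Local stand-in only: a child process in a throwaway directory reproduces
    the lifecycle, NOT the isolation of an air-gapped microVM. On a real
    platform, the create and destroy steps would be workspace API calls.
    """
    # Create: a fresh, empty directory used by this run alone. Destroy happens
    # automatically when the `with` block exits, even after a timeout.
    with tempfile.TemporaryDirectory(prefix="sandbox-") as workdir:
        try:
            # Run: the untrusted code never executes inside the agent's process.
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=ttl_seconds,
            )
        except subprocess.TimeoutExpired:
            # TTL reached: subprocess.run has already killed the child process.
            return SandboxResult(stdout="", stderr="", exit_code=None, timed_out=True)
        # Read back: only the output crosses the boundary to the agent.
        return SandboxResult(proc.stdout, proc.stderr, proc.returncode, timed_out=False)
```

The design choice worth copying is structural. The agent only ever sees the returned `SandboxResult`, and cleanup is tied to a `with` block, so it still happens when the code hangs until its TTL.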
Pattern 3: Agent + Private Services
Best for: Agents that need databases, search indices, or backend services without exposing them to the internet
┌──────────────┐
│ Agent │
│ (permanent) │
└──┬───────┬───┘
│ │
│ │ Private links (10.x.x.x)
│ │
┌──▼──┐ ┌──▼───────┐
│ DB │ │ Search │
│ │ │ Index │
│ No │ │ No │
│ net │ │ internet │
└─────┘ └──────────┘
How it works: The agent workspace connects to service workspaces (database, search, cache) over private networking. Each service workspace has no internet access and no public endpoints. Only the agent can reach them, and only on specific ports.
Why it works: Your agent gets full access to rich data services, but those services are invisible to the internet. A leaked database password is worthless if the database can't be reached from outside the private network.
The security model: Each connection is explicitly declared. The database workspace accepts connections only from the agent workspace, only on port 5432. Even other workspaces on the same account can't reach it.
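The "explicitly declared" model is default-deny: a connection is allowed only if a matching link exists, and each link is one-directional. The Python sketch below models that rule as plain data. The workspace names and the search port (9200) are hypothetical. On a real platform the network enforces this check, not application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivateLink:
    """One explicitly declared connection: `source` may reach `target` on `port`."""
    source: str
    target: str
    port: int


class LinkPolicy:
    """Default-deny: anything not declared as a link is refused."""

    def __init__(self, links: list[PrivateLink]) -> None:
        self._links = frozenset(links)

    def is_allowed(self, source: str, target: str, port: int) -> bool:
        # Exact match on (source, target, port); direction matters.
        return PrivateLink(source, target, port) in self._links


# Hypothetical workspace names; 9200 is an assumed port for the search service.
policy = LinkPolicy([
    PrivateLink("agent", "postgres", 5432),
    PrivateLink("agent", "search", 9200),
])
```

With this policy, `policy.is_allowed("agent", "postgres", 5432)` is true. The same request from any other workspace, on any other port, or in the reverse direction is refused.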
Pattern 4: Per-User Isolated Agents
Best for: SaaS platforms where each user gets their own AI agent or execution environment
┌─────────────────────┐
│ Your Application │
│ (manages users) │
└──┬───────┬──────┬───┘
│ │ │
┌──▼──┐ ┌──▼──┐ ┌──▼──┐
│User │ │User │ │User │
│ A │ │ B │ │ C │
│ VM │ │ VM │ │ VM │
│ │ │ │ │ │
│Own │ │Own │ │Own │
│disk │ │disk │ │disk │
│Own │ │Own │ │Own │
│key │ │key │ │key │
└─────┘ └─────┘ └─────┘
▲ ▲ ▲
Can't see each other
How it works: When a user signs up (or starts a session), your backend creates a workspace for them. All the user's activity - code execution, file storage, agent interactions - happens in their workspace. When they're done, the workspace is paused (to save costs) or destroyed (for one-off sessions).
Why it works: Hardware-level isolation between users. User A can't see User B's data, processes, or network. That isn't because your code correctly filters by user ID. It's because each user runs in a separate virtual machine with its own kernel and its own encrypted disk.
Cost management: Pause idle workspaces (resume takes milliseconds). Set TTLs on temporary sessions. Delete churned users' workspaces - cryptographic erasure ensures data is unrecoverable.
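Here is a minimal Python sketch of the per-user lifecycle: create on first use, resume on demand, pause when idle, destroy on churn. It assumes a backend that tracks one workspace per user. The in-memory dictionary and every name in the sketch are hypothetical placeholders for your platform's create, pause, resume, and destroy calls. The injectable clock exists only to make the idle logic testable.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"


@dataclass
class UserWorkspace:
    workspace_id: str
    state: State = State.RUNNING
    last_active: float = 0.0


class PerUserWorkspaces:
    """One workspace per user: created on first use, paused when idle.

    The in-memory dict stands in for a real platform's workspace API;
    every name here is hypothetical.
    """

    def __init__(self, idle_after_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_after_s = idle_after_s
        self.clock = clock
        self.by_user: dict[str, UserWorkspace] = {}

    def acquire(self, user_id: str) -> UserWorkspace:
        """Return the user's own workspace, creating or resuming it as needed."""
        ws = self.by_user.get(user_id)
        if ws is None:
            ws = UserWorkspace(workspace_id=f"ws-{user_id}")  # would be a "create" call
            self.by_user[user_id] = ws
        elif ws.state is State.PAUSED:
            ws.state = State.RUNNING  # would be a "resume" call
        ws.last_active = self.clock()
        return ws

    def pause_idle(self) -> list[str]:
        """Pause every running workspace idle for at least idle_after_s seconds."""
        now = self.clock()
        paused = []
        for user_id, ws in self.by_user.items():
            if ws.state is State.RUNNING and now - ws.last_active >= self.idle_after_s:
                ws.state = State.PAUSED  # would be a "pause" call
                paused.append(user_id)
        return paused

    def delete_user(self, user_id: str) -> None:
        """Churned user: drop the workspace (a "destroy" call on a real platform)."""
        self.by_user.pop(user_id, None)
```

Call `pause_idle()` periodically from your backend. `acquire()` transparently resumes a paused workspace on the user's next request.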
Pattern 5: Multi-Agent Network
Best for: Complex workflows requiring reasoning + research + coding + testing + deployment
┌──────────────┐
│ Orchestrator │
│ (permanent) │
└──┬──┬──┬──┬──┘
│ │ │ │
┌──────┘ │ │ └──────┐
│ │ │ │
┌────▼───┐ ┌──▼──▼──┐ ┌───▼────┐
│Research│ │ Coder │ │ Tester │
│ Agent │ │ Agent │ │ Agent │
│ │ │ │ │ │
│Internet│ │ Files │ │Air-gap │
│ yes │ │ & exec │ │only │
└────────┘ └────────┘ └────────┘
How it works: A lead orchestrator agent receives tasks, breaks them down, and delegates to specialist agents. Each specialist runs in its own workspace with only the capabilities it needs. The orchestrator coordinates results.
Why it works: Separation of concerns at the infrastructure level. The researcher can browse the web but can't modify code. The coder can write files but can't access the internet. The tester runs in an air-gapped sandbox where malicious test code is harmless.
Communication: The orchestrator creates specialist workspaces with private links back to itself. Specialists send results through the private network. Nothing goes over the internet.
The advanced move: Spin up specialist workspaces on demand, per task. Don't keep them running. Create → delegate → collect result → destroy. This keeps costs low and eliminates stale state.
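The create → delegate → collect result → destroy loop maps naturally onto `asyncio`. This Python sketch assumes a hypothetical workspace API, with `FakeWorkspaces` standing in for it. It also assumes capability profiles that mirror the specialists in the diagram. The `SpecialistSpec` fields are illustrative and do not come from a real configuration schema.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class SpecialistSpec:
    """Capabilities granted to one specialist's workspace (illustrative fields)."""
    role: str
    internet: bool
    can_write_files: bool


SPECS = {
    "research": SpecialistSpec("research", internet=True, can_write_files=False),
    "code": SpecialistSpec("code", internet=False, can_write_files=True),
    "test": SpecialistSpec("test", internet=False, can_write_files=False),  # air-gapped
}


class FakeWorkspaces:
    """Stand-in for a platform workspace API: tracks which workspaces are live."""

    def __init__(self) -> None:
        self.live: set[str] = set()
        self._counter = 0

    async def create(self, spec: SpecialistSpec) -> str:
        self._counter += 1
        ws_id = f"{spec.role}-{self._counter}"
        self.live.add(ws_id)
        return ws_id

    async def destroy(self, ws_id: str) -> None:
        self.live.discard(ws_id)


# A worker sends one subtask to the specialist running in `ws_id` and returns its answer.
Worker = Callable[[str, str], Awaitable[str]]


async def delegate(api: FakeWorkspaces, spec: SpecialistSpec, subtask: str, worker: Worker) -> str:
    ws_id = await api.create(spec)            # create a fresh workspace for this subtask
    try:
        return await worker(ws_id, subtask)   # delegate and collect the result
    finally:
        await api.destroy(ws_id)              # destroy it, even if the worker failed


async def orchestrate(api: FakeWorkspaces, plan: list[tuple[str, str]], worker: Worker) -> list[str]:
    # Independent subtasks run concurrently; results come back in plan order.
    return list(await asyncio.gather(
        *(delegate(api, SPECS[role], subtask, worker) for role, subtask in plan)
    ))


if __name__ == "__main__":
    async def demo_worker(ws_id: str, subtask: str) -> str:
        await asyncio.sleep(0)  # a real worker would message the specialist agent
        return f"[{ws_id}] done: {subtask}"

    api = FakeWorkspaces()
    plan = [("research", "find API docs"), ("code", "write client"), ("test", "run tests")]
    print(asyncio.run(orchestrate(api, plan, demo_worker)))
    print("still live:", api.live)  # empty: every specialist workspace was destroyed
```

The `try`/`finally` in `delegate` is the important part. A specialist that crashes still has its workspace destroyed, so failures don't leave stale state behind.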
How to choose your pattern
Start with Pattern 1. Upgrade when you hit a real limit:
| Problem | Pattern to add |
|---|---|
| "Agent errors when executing code corrupt its own environment" | Pattern 2 (sandboxes) |
| "Agent needs a database but I don't want to expose it" | Pattern 3 (private services) |
| "Users can see each other's data" | Pattern 4 (per-user workspaces) |
| "Tasks are too complex for one agent" | Pattern 5 (multi-agent) |
| "Nothing's broken, it just works" | Stay on Pattern 1 |
Most production agents are Pattern 2 or 3. Multi-agent (Pattern 5) is powerful but has more moving parts - only use it when the task genuinely requires specialization.
Implementation tips
Don't optimize prematurely
A single workspace with a well-designed agent handles 90% of use cases. Add complexity only when you have a specific problem to solve.
Destroy temporary workspaces aggressively
Don't accumulate sandbox workspaces. Create, use, destroy. Set TTLs. Clean up failed workspaces. With a ~130ms boot time, creating a fresh environment costs almost nothing, so there's no reason to keep old ones around.
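Aggressive cleanup is easiest to reason about as a pure function over workspace records. This sketch assumes each record carries a creation time, an optional TTL, and a status string. The field names and the `"failed"` status value are hypothetical. If your platform already enforces TTLs, the TTL branch is only a backstop, and the failed-workspace cleanup is the part you'd still own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkspaceRecord:
    workspace_id: str
    created_at: float              # seconds since epoch
    ttl_seconds: Optional[float]   # None = permanent (e.g. the agent itself)
    status: str                    # e.g. "running" or "failed" (illustrative values)


def workspaces_to_destroy(records: list[WorkspaceRecord], now: float) -> list[str]:
    """Return IDs of temporary workspaces that have expired or failed."""
    doomed = []
    for r in records:
        if r.ttl_seconds is None:
            continue  # permanent workspaces are never reaped here
        expired = now - r.created_at >= r.ttl_seconds
        if expired or r.status == "failed":
            doomed.append(r.workspace_id)
    return doomed
```

Feed the returned IDs to your platform's destroy call. Permanent workspaces are skipped on purpose, because the restart policy handles their crashes, not deletion.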
Use restart policies
Set the restart policy to "always" for permanent agents. If they crash, they restart automatically, with no cron jobs or external process managers to maintain.
Monitor per-workspace metrics
Each workspace has its own CPU, memory, disk, and network metrics. Watch for anomalies - an agent suddenly using 10x normal CPU might be in a broken loop.
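One simple way to catch the "10x normal CPU" case is to compare each sample against the workspace's own recent median. This Python sketch assumes you already poll a per-workspace CPU percentage from your metrics source. The window size, the 10x factor, and the minimum baseline are illustrative defaults, not platform settings.

```python
from collections import deque
from statistics import median


class CpuSpikeDetector:
    """Flags CPU samples far above a workspace's own recent baseline."""

    def __init__(self, window: int = 60, factor: float = 10.0, min_baseline: float = 1.0) -> None:
        self.samples: deque[float] = deque(maxlen=window)
        self.factor = factor
        # Floor for the baseline, so a nearly idle agent doesn't alarm on tiny blips.
        self.min_baseline = min_baseline

    def observe(self, cpu_percent: float) -> bool:
        """Record one sample and return True if it looks anomalous."""
        anomalous = False
        if len(self.samples) >= 5:  # need some history before judging
            baseline = max(median(self.samples), self.min_baseline)
            anomalous = cpu_percent >= self.factor * baseline
        if not anomalous:
            # Keep spikes out of the baseline so a runaway loop keeps alerting.
            self.samples.append(cpu_percent)
        return anomalous
```

Keep one detector per workspace, because "normal" differs between an idle assistant and a busy coder agent.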
Keep secrets in environment variables
Don't hardcode API keys in code. Set them in workspace environment variables through the dashboard or SDK. They're encrypted at rest and available to the agent process.
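Below is a small guard that reads a secret from the workspace environment and fails fast when it is missing. This is a generic Python sketch. `require_secret` and `MissingSecretError` are hypothetical names, not part of an SDK.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def require_secret(name: str) -> str:
    """Read a secret from the environment, failing fast if it is absent or empty."""
    value = os.environ.get(name)
    if not value:
        # Fail at startup with a clear message instead of on the first LLM call.
        raise MissingSecretError(f"environment variable {name} is not set")
    return value
```

Call it once at startup, for example `api_key = require_secret("LLM_API_KEY")` (the variable name is an assumption). Then pass the value to your LLM client without printing or logging it.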
Summary
Production AI agents need more than an API key. They need isolated infrastructure with the right architecture:
- Persistent Agent - Simple, always-on, handles most use cases
- Agent + Sandbox - Safe code execution separate from the agent
- Agent + Private Services - Databases and services without internet exposure
- Per-User Isolated - Hardware isolation between users
- Multi-Agent Network - Specialist agents for complex tasks
Start simple. Add patterns when you need them. Every workspace is a microVM that boots in ~130ms, so the infrastructure adapts as fast as your requirements change.