
5 Architecture Patterns for Running AI Agents in Production

Battle-tested architecture patterns for deploying AI agents at scale. From single-agent setups to multi-agent networks with practical examples.

Oblien Team

Building an AI agent demo is easy. Running one in production - handling real users, managing state reliably, staying secure, and scaling without breaking - is a completely different challenge.

Over the past year, we've seen hundreds of teams deploy AI agents on Oblien. These are the five architecture patterns that keep showing up - the ones that actually work at scale.


Pattern 1: The Persistent Agent

Best for: Personal assistants, coding agents, research bots, automation tools

This is the simplest pattern and where most teams should start.

┌─────────────────────────────────┐
│        Persistent Agent         │
│                                 │
│  Agent framework (always-on)    │
│  Long-term memory on disk       │
│  Restart policy: always         │
│  Internet: yes (for LLM API)    │
│                                 │
│  This workspace never stops.    │
└─────────────────────────────────┘

How it works: Your agent runs as a managed process in a permanent workspace. It has persistent storage for memory and files. It restarts automatically if it crashes. Users interact with it through your application, and the agent maintains context across sessions.
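
The heart of this pattern is memory that outlives any single process. Here is a minimal sketch of that idea in Python - the file path and function names are illustrative, not the Oblien SDK; a temp directory stands in for the workspace's persistent disk so the sketch runs anywhere:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical location on the workspace's persistent disk; a fresh
# temp dir stands in for it here so the sketch runs anywhere.
MEMORY_FILE = Path(tempfile.mkdtemp()) / "agent_memory.json"

def load_memory() -> dict:
    """Restore conversation state after a crash or restart."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return {"history": []}

def remember(memory: dict, role: str, text: str) -> None:
    """Append a turn and flush to disk so a restart loses nothing."""
    memory["history"].append({"role": role, "text": text})
    MEMORY_FILE.write_text(json.dumps(memory))

# Simulate a session, a "crash", and a resumed session.
memory = load_memory()
remember(memory, "user", "Summarize yesterday's logs")
del memory                     # the process dies here
memory = load_memory()         # the restart policy brings the agent back
print(len(memory["history"]))  # → 1
```

Because every turn is flushed before the agent replies, an "always" restart policy is enough to make context survive any crash.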

Why it works: Most agents don't need anything fancier. A single workspace with persistent storage and a managed process handles conversation history, tool execution, and state management. Don't over-architect until you hit a real limit.

When to evolve: When you need to isolate code execution from the agent itself, or when you need multiple agents with different capabilities.


Pattern 2: Agent + Sandbox

Best for: Coding assistants, code evaluation platforms, AI tutors, any agent that runs untrusted code

┌──────────────────┐         ┌───────────────────┐
│ Persistent Agent │         │  Sandbox (temp)   │
│                  │ creates │                   │
│ Makes decisions  │────────►│  Runs code        │
│ Holds memory     │ reads   │  Air-gapped       │
│ Calls LLM        │◄────────│  TTL: 60 seconds  │
│                  │         │  Auto-destroys    │
└──────────────────┘         └───────────────────┘

How it works: The agent lives in a permanent workspace (Pattern 1), but when it needs to run code, it doesn't do it in its own environment. Instead, it creates a temporary, air-gapped sandbox workspace. The code runs there. Results are read back. The sandbox is destroyed.
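
On the platform the sandbox is a separate microVM; as a local stand-in, the same create → run → read → destroy lifecycle can be sketched with a throwaway directory and a subprocess under a hard timeout (the function name is illustrative, not an SDK call):

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

def run_in_sandbox(code: str, ttl_seconds: int = 60) -> str:
    """Create a throwaway workdir, run the code with a hard timeout,
    capture the result, then destroy everything."""
    workdir = Path(tempfile.mkdtemp(prefix="sandbox-"))
    try:
        (workdir / "task.py").write_text(code)
        result = subprocess.run(
            [sys.executable, "task.py"],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=ttl_seconds,  # runaway code is killed, not waited on
        )
        return result.stdout.strip()
    finally:
        shutil.rmtree(workdir, ignore_errors=True)  # sandbox auto-destroys

output = run_in_sandbox("print(2 + 2)")
print(output)  # → 4
```

The `finally` block is the point: whether the code succeeds, hangs, or throws, the sandbox is gone afterward and the agent's own environment was never touched.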

Why it works: It separates the agent's brain from its hands. If the code execution goes wrong (infinite loop, malicious package, filesystem corruption), only the disposable sandbox is affected. The agent's memory, credentials, and state are untouched.

The key detail: Sandboxes should be truly isolated - no internet access, no private links to other workspaces, no access to the agent's environment. They receive input, produce output, and that's it.


Pattern 3: Agent + Private Services

Best for: Agents that need databases, search indices, or backend services without exposing them to the internet

┌──────────────┐
│    Agent     │
│  (permanent) │
└──┬───────┬───┘
   │       │
   │       │ Private links (10.x.x.x)
   │       │
┌──▼──┐ ┌──▼───────┐
│ DB  │ │ Search   │
│     │ │ Index    │
│ No  │ │ No       │
│ net │ │ internet │
└─────┘ └──────────┘

How it works: The agent workspace connects to service workspaces (database, search, cache) over private networking. Each service workspace has no internet access and no public endpoints. Only the agent can reach them, and only on specific ports.

Why it works: Your agent gets full access to rich data services, but those services are invisible to the internet. A leaked database password is worthless if the database can't be reached from outside the private network.

The security model: Each connection is explicitly declared. The database workspace accepts connections only from the agent workspace, only on port 5432. Even other workspaces on the same account can't reach it.
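
The security model amounts to a deny-by-default allowlist of (source, destination, port) tuples. A toy version makes the behavior concrete - the workspace names and ports here are hypothetical examples, not real configuration:

```python
# Hypothetical allowlist mirroring the declared private links:
# each tuple is (source workspace, destination workspace, port).
ALLOWED_LINKS = {
    ("agent", "db", 5432),
    ("agent", "search", 9200),
}

def is_allowed(src: str, dst: str, port: int) -> bool:
    """Deny by default; only explicitly declared links pass."""
    return (src, dst, port) in ALLOWED_LINKS

print(is_allowed("agent", "db", 5432))  # → True
print(is_allowed("other", "db", 5432))  # → False (same account, still blocked)
print(is_allowed("agent", "db", 22))    # → False (undeclared port)
```

Anything not on the list - a different workspace, a different port, the public internet - simply has no route.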


Pattern 4: Per-User Isolated Agents

Best for: SaaS platforms where each user gets their own AI agent or execution environment

┌─────────────────────┐
│  Your Application   │
│  (manages users)    │
└──┬───────┬───────┬──┘
   │       │       │
┌──▼──┐ ┌──▼──┐ ┌──▼──┐
│User │ │User │ │User │
│  A  │ │  B  │ │  C  │
│ VM  │ │ VM  │ │ VM  │
│     │ │     │ │     │
│Own  │ │Own  │ │Own  │
│disk │ │disk │ │disk │
│Own  │ │Own  │ │Own  │
│key  │ │key  │ │key  │
└─────┘ └─────┘ └─────┘
   ▲       ▲       ▲
   Can't see each other

How it works: When a user signs up (or starts a session), your backend creates a workspace for them. All the user's activity - code execution, file storage, agent interactions - happens in their workspace. When they're done, the workspace is paused (to save costs) or destroyed (for one-off sessions).

Why it works: Hardware isolation between users. User A can't see User B's data, processes, or network. Not because your code correctly filters by user ID, but because they're in physically separate virtual machines with separate kernels and separate encrypted disks.

Cost management: Pause idle workspaces (resume takes milliseconds). Set TTLs on temporary sessions. Delete churned users' workspaces - cryptographic erasure ensures data is unrecoverable.
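
The create / pause-when-idle / destroy-on-churn lifecycle is straightforward to drive from your backend. A toy registry sketches the bookkeeping - states here are plain strings, where the real platform would be pausing and destroying microVMs:

```python
import time

class WorkspacePool:
    """Toy registry for the per-user lifecycle: create on signup,
    pause when idle, destroy on churn."""

    def __init__(self, idle_ttl: float = 1800.0):
        self.idle_ttl = idle_ttl
        self.workspaces: dict[str, dict] = {}

    def create(self, user_id: str) -> None:
        self.workspaces[user_id] = {"state": "running", "last_seen": time.time()}

    def touch(self, user_id: str) -> None:
        ws = self.workspaces[user_id]
        ws["last_seen"] = time.time()
        ws["state"] = "running"      # resume is cheap, so resume eagerly

    def sweep(self) -> None:
        """Pause anything idle past the TTL to stop paying for it."""
        now = time.time()
        for ws in self.workspaces.values():
            if ws["state"] == "running" and now - ws["last_seen"] > self.idle_ttl:
                ws["state"] = "paused"

    def destroy(self, user_id: str) -> None:
        del self.workspaces[user_id]  # churned user: workspace and data gone

pool = WorkspacePool(idle_ttl=0.0)    # pause immediately, for the demo
pool.create("user-a")
pool.create("user-b")
time.sleep(0.01)
pool.sweep()
print(pool.workspaces["user-a"]["state"])  # → paused
pool.destroy("user-b")
print("user-b" in pool.workspaces)         # → False
```

Run the sweep from a periodic job; since resume is near-instant, pausing aggressively costs users nothing noticeable.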


Pattern 5: Multi-Agent Network

Best for: Complex workflows requiring reasoning + research + coding + testing + deployment

          ┌───────────────┐
          │ Orchestrator  │
          │ (permanent)   │
          └───┬───┬───┬───┘
              │   │   │
      ┌───────┘   │   └───────┐
      │           │           │
 ┌────▼───┐  ┌────▼───┐  ┌────▼───┐
 │Research│  │ Coder  │  │ Tester │
 │ Agent  │  │ Agent  │  │ Agent  │
 │        │  │        │  │        │
 │Internet│  │ Files  │  │Air-gap │
 │ yes    │  │ & exec │  │ only   │
 └────────┘  └────────┘  └────────┘

How it works: A lead orchestrator agent receives tasks, breaks them down, and delegates to specialist agents. Each specialist runs in its own workspace with only the capabilities it needs. The orchestrator coordinates results.

Why it works: Separation of concerns at the infrastructure level. The researcher can browse the web but can't modify code. The coder can write files but can't access the internet. The tester runs in an air-gapped sandbox where malicious test code is harmless.

Communication: The orchestrator creates specialist workspaces with private links back to itself. Specialists send results through the private network. Nothing goes over the internet.

The advanced move: Spin up specialist workspaces on demand, per task. Don't keep them running. Create → delegate → collect result → destroy. This keeps costs low and eliminates stale state.
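
The create → delegate → collect → destroy loop is easier to see in code than in prose. In this sketch the specialists are plain functions standing in for freshly created workspaces - the names and task strings are invented for illustration:

```python
# Each entry stands in for a specialist workspace created on demand;
# in production each would be a fresh microVM with one capability.
SPECIALISTS = {
    "research": lambda task: f"notes on {task}",
    "code":     lambda task: f"patch for {task}",
    "test":     lambda task: f"report: {task} passed",
}

def orchestrate(task: str) -> dict:
    """Pipeline the task through ephemeral specialists, destroying
    each one's state as soon as its result is collected."""
    results = {}
    for kind in ("research", "code", "test"):
        workspace = {"kind": kind, "scratch": []}  # create, per task
        results[kind] = SPECIALISTS[kind](task)    # delegate, collect
        workspace.clear()                          # destroy: no stale state
    return results

out = orchestrate("rate limiter")
print(out["test"])  # → report: rate limiter passed
```

Because every specialist starts from nothing, a failed or poisoned run can't contaminate the next task - the orchestrator's own state is the only thing that persists.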


How to choose your pattern

Start with Pattern 1. Upgrade when you hit a real limit:

Problem                                                           Pattern to add
"Agent errors when executing code corrupt its own environment"    Pattern 2 (sandboxes)
"Agent needs a database but I don't want to expose it"            Pattern 3 (private services)
"Users can see each other's data"                                 Pattern 4 (per-user workspaces)
"Tasks are too complex for one agent"                             Pattern 5 (multi-agent)
"Nothing's broken, it just works"                                 Stay on Pattern 1

Most production agents are Pattern 2 or 3. Multi-agent (Pattern 5) is powerful but has more moving parts - only use it when the task genuinely requires specialization.


Implementation tips

Don't prematurely optimize

A single workspace with a well-designed agent handles 90% of use cases. Add complexity only when you have a specific problem to solve.

Destroy temporary workspaces aggressively

Don't accumulate sandbox workspaces. Create, use, destroy. Set TTLs. Clean up failed workspaces. With ~130ms boot times, creating a fresh environment costs essentially nothing.
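
One reliable way to guarantee cleanup even when the task inside fails is a context manager - the moral equivalent of a TTL plus a sweep for failed workspaces. A sketch, with an in-memory list standing in for the platform's workspace registry:

```python
from contextlib import contextmanager

@contextmanager
def ephemeral_workspace(registry: list, name: str):
    """Register a workspace on entry and always remove it on exit,
    even when the code inside raises."""
    registry.append(name)        # create
    try:
        yield name
    finally:
        registry.remove(name)    # always destroy, success or failure

live = []
try:
    with ephemeral_workspace(live, "sandbox-1"):
        raise RuntimeError("task crashed")
except RuntimeError:
    pass
print(live)  # → []  (nothing accumulated despite the failure)
```

Wrap every sandbox creation this way and leaked workspaces become structurally impossible rather than something a cron job has to mop up.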

Use restart policies

Set restart policy to "always" for permanent agents. If they crash, they restart. No cron jobs, no health checks, no process managers.

Monitor per-workspace metrics

Each workspace has its own CPU, memory, disk, and network metrics. Watch for anomalies - an agent suddenly using 10x normal CPU might be in a broken loop.
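
A cheap detector for the broken-loop case is to compare each workspace's current CPU against its own baseline. A minimal sketch with made-up sample numbers:

```python
def cpu_anomalies(samples: dict[str, float],
                  baseline: dict[str, float],
                  factor: float = 10.0) -> list[str]:
    """Flag workspaces whose current CPU is far above their own
    baseline - a cheap proxy for 'agent stuck in a broken loop'."""
    return [ws for ws, cpu in samples.items()
            if cpu > factor * baseline.get(ws, cpu)]

baseline = {"agent-1": 4.0, "agent-2": 5.0}   # e.g. rolling averages, in %
now      = {"agent-1": 3.5, "agent-2": 62.0}  # agent-2 spiked ~12x
print(cpu_anomalies(now, baseline))  # → ['agent-2']
```

Feed it whatever per-workspace metrics you already scrape; unknown workspaces default to their own reading, so they never false-alarm on first sight.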

Keep secrets in environment variables

Don't hardcode API keys in code. Set them in workspace environment variables through the dashboard or SDK. They're encrypted at rest and available to the agent process.


Summary

Production AI agents need more than an API key. They need isolated infrastructure with the right architecture:

  1. Persistent Agent - Simple, always-on, handles most use cases
  2. Agent + Sandbox - Safe code execution separate from the agent
  3. Agent + Private Services - Databases and services without internet exposure
  4. Per-User Isolated - Hardware isolation between users
  5. Multi-Agent Network - Specialist agents for complex tasks

Start simple. Add patterns when you need them. Every workspace is a ~130ms microVM, so the infrastructure adapts as fast as your requirements change.

Read the full docs →