How to Build a Multi-Tenant AI Platform Where Every User Gets Their Own Computer
Architecture guide for multi-tenant SaaS: user-per-VM isolation, private networking, session management, and cost optimization patterns.
The dream for any AI-powered SaaS: every user gets their own isolated environment. Their own filesystem, their own processes, their own network. Complete privacy and zero interference from other users.
This used to be insanely expensive. Giving every user a full virtual machine meant minutes of provisioning time, $50+/month per user just for the VM, and an operational nightmare at scale.
With microVMs, it's now practical. Each user gets a hardware-isolated Linux environment that boots in milliseconds and costs pennies per hour. Here's how to architect it.
Why User-Per-VM Matters
The Container Problem
Most multi-tenant platforms use containers. User A and User B get separate Docker containers but share:
- The host kernel (300+ shared syscalls)
- The container runtime (a bug in containerd affects everyone)
- The host network (if misconfigured, containers can see each other)
- Physical hardware (cache side-channel attacks are real)
For a blog platform, this is fine. For an AI platform where users run arbitrary code, process sensitive data, and interact with agents that make autonomous decisions - it's terrifying.
What User-Per-VM Gives You
| Property | Containers | User-Per-VM |
|---|---|---|
| Kernel isolation | Shared | Separate |
| Memory isolation | cgroups (software) | KVM (hardware) |
| Network isolation | iptables (configurable) | Namespace (default) |
| Filesystem isolation | Union mount | Encrypted block device |
| Escape risk | Multiple CVEs/year | Zero known escapes |
| Data deletion | Delete files | Cryptographic erasure |
Architecture Overview
┌────────────────────────────────────┐
│ Your SaaS Platform │
│ │
│ ┌─────────┐ ┌──────────────────┐ │
│ │ Auth & │ │ User Dashboard │ │
│ │ Billing │ │ │ │
│ └────┬────┘ └────────┬─────────┘ │
│ │ │ │
│ └────────┬───────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ Workspace │ │
│ │ Manager │ │
│ │ (the key piece) │ │
│ └────────┬────────┘ │
└────────────────┼─────────────────────┘
│
┌────────────┼────────────┐
│ │ │
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│ User A │ │ User B │ │ User C │
│ VM │ │ VM │ │ VM │
│ │ │ │ │ │
│ Agent │ │ Agent │ │ Agent │
│ Files │ │ Files │ │ Files │
│ DB │ │ DB │ │ DB │
└────────┘ └────────┘ └────────┘The Workspace Manager is the key component. It maps users to workspaces, handles lifecycle, and manages resources.
The Workspace Manager
This service handles:
User-to-workspace mapping
Maintain a mapping in your database:
user_id → workspace_id, status, created_at, last_active

When a user logs in:
- Check if they have an active workspace
- If yes → connect them to it
- If no → create one
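The login flow above reduces to a get-or-create lookup. Here is a minimal in-memory sketch; `WorkspaceStore` and `provision` are hypothetical stand-ins for your database table and your microVM provider's create API.

```python
from dataclasses import dataclass

@dataclass
class Workspace:
    workspace_id: str
    user_id: str
    status: str  # "running" | "paused" | "deleted"

class WorkspaceStore:
    """In-memory stand-in for the user_id -> workspace mapping table."""
    def __init__(self):
        self._by_user: dict[str, Workspace] = {}

    def get(self, user_id: str):
        return self._by_user.get(user_id)

    def put(self, ws: Workspace) -> None:
        self._by_user[ws.user_id] = ws

def get_or_create_workspace(store: WorkspaceStore, user_id: str, provision):
    """On login: reuse an active workspace, otherwise provision one."""
    ws = store.get(user_id)
    if ws is not None and ws.status != "deleted":
        return ws  # active (running or paused) workspace: reuse it
    ws = provision(user_id)  # calls the microVM provider's create API
    store.put(ws)
    return ws
```

Note that a paused workspace counts as existing: resuming it is the Workspace Manager's job, not a reason to provision a new VM.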
Lifecycle management
Each workspace goes through states:
Created → Running → Idle → Paused → Resumed → Running → ... → Deleted

- Running: user is active, workspace is live
- Idle: no activity for 15-30 minutes
- Paused: frozen, no compute cost, disk preserved
- Resumed: user comes back, workspace unfreezes
- Deleted: TTL expired or user deleted account
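The states above can be encoded as an explicit transition table so illegal jumps fail loudly. A minimal sketch, with "Resumed" modeled as the paused → running transition:

```python
# Allowed workspace lifecycle transitions, mirroring the states above.
ALLOWED = {
    "created": {"running"},
    "running": {"idle", "deleted"},
    "idle":    {"running", "paused"},   # activity resumes, or the idle timer fires
    "paused":  {"running", "deleted"},  # resume on return, delete on TTL expiry
}

def transition(state: str, new_state: str) -> str:
    """Apply a lifecycle transition, rejecting anything not in the table."""
    if new_state not in ALLOWED.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```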
Resource allocation
Different user tiers get different resources:
| Tier | CPU | RAM | Disk | Max Sessions |
|---|---|---|---|---|
| Free | 1 | 512 MB | 2 GB | 1 |
| Pro | 2 | 2 GB | 10 GB | 3 |
| Team | 4 | 4 GB | 20 GB | 5 |
| Enterprise | 8 | 16 GB | 100 GB | 10 |
Enforce these limits at workspace creation - the VM literally cannot use more than it was allocated. No noisy neighbors.
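The tier table translates directly into a config lookup whose values are passed to the VM create call as hard caps. The field names here are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TierLimits:
    cpus: int
    ram_mb: int
    disk_gb: int
    max_sessions: int

# Values from the tier table above.
TIERS = {
    "free":       TierLimits(1,   512,   2,  1),
    "pro":        TierLimits(2,  2048,  10,  3),
    "team":       TierLimits(4,  4096,  20,  5),
    "enterprise": TierLimits(8, 16384, 100, 10),
}

def limits_for(tier: str) -> TierLimits:
    """Look up the hard caps to pass to the microVM create call."""
    return TIERS[tier]
```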
Session Management
Users connect to their workspace through your web app. The connection flows:
- User authenticates with your platform
- Your backend gets/creates a workspace for the user
- Your backend creates a session token scoped to that workspace
- Frontend connects to the workspace via WebSocket
- All interactions (terminal, file ops, agent commands) go through the WebSocket
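Step 3 - a session token scoped to one workspace - might look like this HMAC-signed sketch. In production you would likely use a JWT library, and the secret would come from your secret manager:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"replace-with-a-real-secret"  # load from a secret manager in production

def create_session_token(user_id: str, workspace_id: str, ttl_s: int = 3600) -> str:
    """Mint a token that is only valid for one workspace."""
    payload = json.dumps({
        "sub": user_id,
        "workspace": workspace_id,        # scope: this workspace only
        "exp": int(time.time()) + ttl_s,  # short-lived
    }).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()  # 32 bytes
    return base64.urlsafe_b64encode(payload + sig).decode()

def verify_session_token(token: str, workspace_id: str) -> bool:
    """Check the signature, the workspace scope, and the expiry."""
    raw = base64.urlsafe_b64decode(token.encode())
    payload, sig = raw[:-32], raw[-32:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, sig):
        return False
    claims = json.loads(payload)
    return claims["workspace"] == workspace_id and claims["exp"] > time.time()
```

The WebSocket gateway verifies the token against the workspace it fronts, so a token leaked from one user's tab can never open a connection to another user's VM.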
Multiple sessions
Users might open multiple browser tabs. Each tab creates a new session to the same workspace. Handle this by:
- Allowing up to N concurrent sessions per workspace
- Sharing the filesystem (all tabs see the same files)
- Sharing running processes (a server started in tab 1 is visible in tab 2)
- Independent terminals (each tab can have its own terminal session)
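A per-workspace session registry with a tier-based cap is enough to implement the rules above. A minimal sketch:

```python
class SessionRegistry:
    """Tracks concurrent sessions per workspace, capped by the user's tier."""
    def __init__(self, max_sessions: int):
        self.max_sessions = max_sessions
        self.sessions: dict[str, set[str]] = {}  # workspace_id -> session ids

    def open(self, workspace_id: str, session_id: str) -> bool:
        active = self.sessions.setdefault(workspace_id, set())
        if len(active) >= self.max_sessions:
            return False  # reject: too many tabs/devices
        active.add(session_id)
        return True

    def close(self, workspace_id: str, session_id: str) -> bool:
        active = self.sessions.get(workspace_id, set())
        active.discard(session_id)
        return len(active) == 0  # True -> all tabs gone, start idle countdown
```

Because every session attaches to the same VM, filesystem and process sharing come for free; only the terminal multiplexing is per-session.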
Session timeout
If all sessions disconnect (user closes all tabs):
- Start a countdown (e.g., 30 minutes)
- If no reconnection → pause the workspace
- User comes back → resume workspace, re-establish session
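The countdown is easiest to test with an injectable clock. A sketch, where the caller is expected to pause each returned workspace through the provider API:

```python
IDLE_TIMEOUT_S = 30 * 60  # 30 minutes, per the example above

class IdleTracker:
    """Tracks disconnect timestamps and reports workspaces ready to pause."""
    def __init__(self, clock):
        self.clock = clock  # injectable for testing; time.time in production
        self.disconnected_at: dict[str, float] = {}

    def all_sessions_closed(self, workspace_id: str) -> None:
        self.disconnected_at[workspace_id] = self.clock()

    def session_reopened(self, workspace_id: str) -> None:
        self.disconnected_at.pop(workspace_id, None)  # cancel the countdown

    def workspaces_to_pause(self) -> list[str]:
        now = self.clock()
        return [ws for ws, t in self.disconnected_at.items()
                if now - t >= IDLE_TIMEOUT_S]
```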
Connecting Users to Services
Users often need databases, caches, or other services. Give each user their own:
Option A: Embedded services
Run Postgres/Redis inside the user's workspace. Simplest approach - everything is in one VM. Works for development environments and small-scale apps.
Option B: Dedicated service workspaces
Create a separate workspace per service:
User A:
├── App workspace (2 CPU, 2 GB RAM)
├── Postgres workspace (1 CPU, 1 GB RAM)
└── Redis workspace (1 CPU, 512 MB RAM)

Connect them via private networking. The Postgres workspace is air-gapped (no internet) - only the user's app workspace can reach it.
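Option B's networking rules can be expressed as a small allow-list policy. The field names below are assumptions for illustration, not any specific provider's API:

```python
def service_workspace_policy(app_workspace_id: str) -> dict:
    """Policy for a dedicated service workspace (Postgres, Redis, ...)."""
    return {
        "internet_egress": False,             # air-gapped: no outbound internet
        "ingress_allow": [app_workspace_id],  # only the user's app VM may connect
    }

def can_connect(policy: dict, source_workspace_id: str) -> bool:
    """Enforce the allow-list on an incoming connection attempt."""
    return source_workspace_id in policy["ingress_allow"]
```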
Option C: Shared managed services (careful)
For non-sensitive services, use a shared managed database where each user gets a separate schema or is isolated by row-level security policies. This is cheaper but weakens the isolation guarantee.
Data Privacy by Design
With user-per-VM, privacy comes built-in:
Encryption at rest
Each workspace has a unique encryption key. Even if someone physically stole the disk, they'd need the key to read any data.
Cryptographic deletion
When a user deletes their account or you need to purge their data (GDPR Article 17):
- Delete the encryption key from the KMS
- The workspace data becomes cryptographically unrecoverable
- Sanitize the disk for defense in depth
- Done
This is provable deletion. You can show auditors that the encryption key no longer exists, making recovery computationally infeasible.
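The mechanism can be shown with a toy example. A real system would use AES-GCM with keys held in a KMS; a random XOR pad stands in here only to keep the sketch dependency-free. The point is the last step: destroy the key, and the ciphertext becomes unrecoverable noise.

```python
import secrets

kms: dict[str, bytes] = {}  # stand-in for your key management service

def create_workspace_key(workspace_id: str, size: int) -> None:
    kms[workspace_id] = secrets.token_bytes(size)  # unique key per workspace

def xor_crypt(workspace_id: str, data: bytes) -> bytes:
    """Encrypt/decrypt (XOR is its own inverse) under the workspace key."""
    key = kms[workspace_id]  # raises KeyError once the key is erased
    return bytes(b ^ k for b, k in zip(data, key))

def crypto_erase(workspace_id: str) -> None:
    del kms[workspace_id]  # the workspace's data is now unrecoverable
```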
Network privacy
Workspace traffic is isolated at the network level. User A cannot even detect that User B exists, let alone access their data.
Cost Analysis
The naive calculation (scary)
"If I have 10,000 users and each gets a VM, that's 10,000 VMs. That'll cost a fortune!"
The real calculation (reasonable)
Of 10,000 users:
- ~500 are active right now (running workspace)
- ~2,000 were active today (paused workspace)
- ~7,500 haven't been active this week (no workspace)
The key insight: you only pay for active workspaces. Paused workspaces cost a fraction (just storage), and inactive users cost nothing. At scale, the per-user cost is dramatically lower than running a fixed fleet of servers with container-based multi-tenancy.
User-per-VM is actually cheaper at scale - primarily because you don't pay for idle compute and you don't need a dedicated infrastructure team.
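A back-of-the-envelope version of that calculation, with placeholder rates (not real pricing):

```python
def monthly_cost(active: int, paused: int, inactive: int,
                 running_rate_per_hour: float = 0.05,
                 paused_storage_per_month: float = 1.0) -> float:
    """Estimate monthly spend: compute for running VMs, storage for paused."""
    hours_per_month = 730
    compute = active * running_rate_per_hour * hours_per_month
    storage = paused * paused_storage_per_month
    return compute + storage  # inactive users have no workspace: zero cost

# 10,000 registered users, but only ~500 VMs running at any moment:
total = monthly_cost(active=500, paused=2000, inactive=7500)
per_user = total / 10_000  # far below the naive $50/user/month
```

With these placeholder rates the fleet comes out around $2/user/month rather than $50, because 95% of users are contributing little or nothing to the bill at any given time.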
Horizontal Scaling
As you grow from 1,000 to 100,000 users:
There's no cluster to scale
Each workspace is independent. You don't have a Kubernetes cluster that needs bigger node groups, or a Docker Swarm that needs more managers. You call the API to create a workspace - the platform handles placement.
State is self-contained
Each user's state lives entirely in their workspace. There's no shared database of agent states, no central file store, no session store to scale. When you delete a user, you delete their workspace. When you migrate a user, you snapshot and restore their workspace.
Regional deployment
Serve users from the nearest region for lower latency. Each region is independent - no cross-region coordination needed.
Building the User Experience
First login
User signs up → you create a workspace → show them a loading bar while the workspace starts → they land in their environment.
Make the loading bar informative:
- "Setting up your workspace..." (creating VM)
- "Installing tools..." (workspace boots + package install)
- "Ready!" (redirect to dashboard)
Returning user
User logs in → check if workspace exists and is running → if paused, resume it → connect.
If the workspace doesn't exist (user was inactive for weeks), recreate it. Store workspace configuration in your database so recreation uses the same settings.
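A sketch of that returning-user path, where `store` and `provider` are hypothetical stand-ins for your database and microVM provider API:

```python
def connect_returning_user(user_id: str, store, provider):
    """Resume a paused workspace, or recreate one from saved config."""
    ws = store.get(user_id)
    if ws is None or ws["status"] == "deleted":
        # Inactive for weeks: recreate with the same tier/tools/settings.
        config = store.saved_config(user_id)
        return provider.provision(user_id, config)
    if ws["status"] == "paused":
        provider.resume(ws)       # unfreeze: disk was preserved while paused
        ws["status"] = "running"
    return ws
```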
Account deletion
User clicks "Delete my account":
- Delete workspace (including encryption key → cryptographic erasure)
- Remove user record from your database
- Confirm deletion
GDPR compliance in 3 steps, all verifiable.
Summary
The user-per-VM architecture:
- Every user gets hardware-level isolation - separate kernel, encrypted disk
- Create on login, pause on idle, delete on churn - pay only for active users
- Cheaper than containers at scale - no SRE team, no idle compute
- Privacy by default - encryption + cryptographic deletion = easy compliance
- No scaling bottlenecks - each workspace is independent
The era of "we can't give every user a VM, that's too expensive" is over. With millisecond boot times and per-second billing, user-per-VM is the most cost-effective way to build a secure multi-tenant platform.
Related reading:
- Isolated Sandboxes for Every User | Oblien Documentation
- How to Let Your Users Run Code Safely Inside Your App - add safe code execution to your product: users write and run code in sandboxed environments that can't touch your infrastructure.
- Oblien vs Traditional Cloud: Why We Built a Platform Just for AI Agents - why existing cloud platforms fail AI agent workloads, and how purpose-built microVM infrastructure solves what EC2, Lambda, and K8s can't.