How to Build a Multi-Tenant AI Platform Where Every User Gets Their Own Computer

Architecture guide for multi-tenant SaaS: user-per-VM isolation, private networking, session management, and cost optimization patterns.

Oblien Team

The dream for any AI-powered SaaS: every user gets their own isolated environment. Their own filesystem, their own processes, their own network. Complete privacy and zero interference from other users.

This used to be insanely expensive. Giving every user a full virtual machine meant minutes of provisioning time, $50+/month per user just for the VM, and an operational nightmare at scale.

With microVMs, it's now practical. Each user gets a hardware-isolated Linux environment that boots in milliseconds and costs pennies per hour. Here's how to architect it.


Why User-Per-VM Matters

The Container Problem

Most multi-tenant platforms use containers. User A and User B get separate Docker containers but share:

  • The host kernel (300+ shared syscalls)
  • The container runtime (a bug in containerd affects everyone)
  • The host network (if misconfigured, containers can see each other)
  • Physical hardware (cache side-channel attacks are real)

For a blog platform, this is fine. For an AI platform where users run arbitrary code, process sensitive data, and interact with agents that make autonomous decisions - it's terrifying.

What User-Per-VM Gives You

| Property | Containers | User-Per-VM |
| --- | --- | --- |
| Kernel isolation | Shared | Separate |
| Memory isolation | cgroups (software) | KVM (hardware) |
| Network isolation | iptables (configurable) | Namespace (default) |
| Filesystem isolation | Union mount | Encrypted block device |
| Escape risk | Multiple CVEs/year | Zero known escapes |
| Data deletion | Delete files | Cryptographic erasure |

Architecture Overview

┌──────────────────────────────────────┐
│          Your SaaS Platform          │
│                                      │
│  ┌──────────┐  ┌──────────────────┐  │
│  │ Auth &   │  │ User Dashboard   │  │
│  │ Billing  │  │                  │  │
│  └────┬─────┘  └────────┬─────────┘  │
│       │                 │            │
│       └────────┬────────┘            │
│                │                     │
│       ┌────────▼────────┐            │
│       │ Workspace       │            │
│       │ Manager         │            │
│       │ (the key piece) │            │
│       └────────┬────────┘            │
└────────────────┼─────────────────────┘
                 │
    ┌────────────┼────────────┐
    │            │            │
    ▼            ▼            ▼
┌────────┐   ┌────────┐   ┌────────┐
│ User A │   │ User B │   │ User C │
│ VM     │   │ VM     │   │ VM     │
│        │   │        │   │        │
│ Agent  │   │ Agent  │   │ Agent  │
│ Files  │   │ Files  │   │ Files  │
│ DB     │   │ DB     │   │ DB     │
└────────┘   └────────┘   └────────┘

The Workspace Manager is the key component. It maps users to workspaces, handles lifecycle, and manages resources.


The Workspace Manager

This service has three responsibilities:

User-to-workspace mapping

Maintain a mapping in your database:

user_id → workspace_id, status, created_at, last_active

When a user logs in:

  1. Check if they have an active workspace
  2. If yes → connect them to it
  3. If no → create one
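
The login check above can be sketched as a small get-or-create lookup. This is a minimal illustration, not a definitive implementation: the in-memory dictionary stands in for your database table, and `WorkspaceDirectory`, `WorkspaceRecord`, and `fake_create_workspace` are hypothetical names invented for the example rather than part of any real API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count


@dataclass
class WorkspaceRecord:
    workspace_id: str
    status: str  # e.g. "running" or "paused"
    created_at: datetime
    last_active: datetime


class WorkspaceDirectory:
    """In-memory stand-in for the user_id -> workspace table."""

    def __init__(self, create_workspace):
        # create_workspace is a hypothetical callable that provisions a VM
        # and returns its workspace ID; swap in your provider's API call.
        self._create_workspace = create_workspace
        self._by_user = {}

    def get_or_create(self, user_id: str) -> WorkspaceRecord:
        now = datetime.now(timezone.utc)
        record = self._by_user.get(user_id)
        if record is None:
            # No workspace yet: create one and remember the mapping.
            workspace_id = self._create_workspace(user_id)
            record = WorkspaceRecord(workspace_id, "running", now, now)
            self._by_user[user_id] = record
        record.last_active = now
        return record


_ids = count(1)


def fake_create_workspace(user_id: str) -> str:
    """Placeholder provisioner used only for this example."""
    return f"ws-{next(_ids)}"
```

Calling `get_or_create` twice for the same user returns the same record, which is what keeps a user pinned to a single workspace across logins.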

Lifecycle management

Each workspace goes through states:

Created → Running → Idle → Paused → Resumed → Running → ... → Deleted

  • Running: user is active, workspace is live
  • Idle: no activity for 15-30 minutes
  • Paused: frozen, no compute cost, disk preserved
  • Resumed: user comes back, workspace unfreezes
  • Deleted: TTL expired or user deleted account
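
One way to keep these states honest is an explicit transition table that rejects anything the lifecycle does not allow. The sketch below is an assumption-laden illustration: the state names come from the list above, but the exact set of allowed transitions (for example, letting an idle workspace go back to running, or allowing deletion from any live state) is a guess, and `State`, `TRANSITIONS`, and `transition` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    IDLE = "idle"
    PAUSED = "paused"
    RESUMED = "resumed"
    DELETED = "deleted"


# Allowed transitions, read off the lifecycle described above.
# Assumptions of this sketch: IDLE may return to RUNNING if the user
# becomes active again, and any live state may move to DELETED.
TRANSITIONS = {
    State.CREATED: {State.RUNNING, State.DELETED},
    State.RUNNING: {State.IDLE, State.DELETED},
    State.IDLE: {State.RUNNING, State.PAUSED, State.DELETED},
    State.PAUSED: {State.RESUMED, State.DELETED},
    State.RESUMED: {State.RUNNING, State.DELETED},
    State.DELETED: set(),  # terminal state
}


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Rejecting illegal moves in one place means a bug elsewhere (say, trying to run a paused workspace without resuming it) fails loudly instead of silently corrupting the mapping table.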

Resource allocation

Different user tiers get different resources:

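| Tier | CPU | RAM | Disk | Max Sessions |
| --- | --- | --- | --- | --- |
| Free | 1 | 512 MB | 2 GB | 1 |
| Pro | 2 | 2 GB | 10 GB | 3 |
| Team | 4 | 4 GB | 20 GB | 5 |
| Enterprise | 8 | 16 GB | 100 GB | 10 |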

Enforce these limits at workspace creation - the VM literally cannot use more than it was allocated. No noisy neighbors.
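
The tier values can live in code as a small lookup that feeds the create-workspace call. This is a sketch under stated assumptions: the numbers are the ones from the tier list above (with 1 GB taken as 1024 MB), while `TierLimits`, `TIERS`, `creation_params`, and the request field names are hypothetical and not taken from any real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    cpus: int
    ram_mb: int
    disk_gb: int
    max_sessions: int


# Values copied from the tier list above; 1 GB is treated as 1024 MB.
TIERS = {
    "free": TierLimits(cpus=1, ram_mb=512, disk_gb=2, max_sessions=1),
    "pro": TierLimits(cpus=2, ram_mb=2048, disk_gb=10, max_sessions=3),
    "team": TierLimits(cpus=4, ram_mb=4096, disk_gb=20, max_sessions=5),
    "enterprise": TierLimits(cpus=8, ram_mb=16384, disk_gb=100, max_sessions=10),
}


def creation_params(tier: str) -> dict:
    """Build the resource part of a (hypothetical) create-workspace request."""
    limits = TIERS[tier]  # raises KeyError for an unknown tier
    return {
        "cpus": limits.cpus,
        "memory_mb": limits.ram_mb,
        "disk_gb": limits.disk_gb,
    }
```

Keeping `max_sessions` in the same record lets the session layer read its limit from the same source as the VM sizing.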


Session Management

Users connect to their workspace through your web app. The connection flow:

  1. User authenticates with your platform
  2. Your backend gets/creates a workspace for the user
  3. Your backend creates a session token scoped to that workspace
  4. Frontend connects to the workspace via WebSocket
  5. All interactions (terminal, file ops, agent commands) go through the WebSocket
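
Step 3 is where isolation is enforced at the session layer: the token must be bound to one workspace and must expire. Below is a minimal sketch using only the Python standard library, assuming an HMAC-signed token checked by whatever terminates the WebSocket. The token format, `SECRET`, `issue_token`, and `verify_token` are illustrative assumptions, not the platform's actual scheme.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Assumption: a secret shared by your backend and the WebSocket gateway.
SECRET = b"replace-with-a-real-secret"


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def issue_token(user_id: str, workspace_id: str, ttl_seconds: int = 300,
                now: Optional[float] = None) -> str:
    """Create a signed token that is only valid for one workspace."""
    issued = time.time() if now is None else now
    claims = {"uid": user_id, "ws": workspace_id, "exp": issued + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    signature = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(signature)}"


def verify_token(token: str, workspace_id: str, now: Optional[float] = None) -> bool:
    """Check signature, workspace scope, and expiry."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong shape or invalid base64
        return False
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return False
    claims = json.loads(payload)
    current = time.time() if now is None else now
    return claims["ws"] == workspace_id and claims["exp"] > current
```

A token issued for one workspace fails verification against any other, so a leaked token cannot be replayed across users.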

Multiple sessions

Users might open multiple browser tabs. Each tab creates a new session to the same workspace. Handle this by:

  • Allowing up to N concurrent sessions per workspace
  • Sharing the filesystem (all tabs see the same files)
  • Sharing running processes (a server started in tab 1 is visible in tab 2)
  • Independent terminals (each tab can have its own terminal session)
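
The "up to N concurrent sessions" rule can be sketched as a small per-workspace registry. This is an illustration only: `WorkspaceSessions` and `SessionLimitReached` are hypothetical names, and the filesystem and process sharing described above come from the VM itself, not from this bookkeeping.

```python
class SessionLimitReached(Exception):
    """Raised when a workspace already has its maximum number of sessions."""


class WorkspaceSessions:
    """Tracks open sessions (browser tabs) for one workspace."""

    def __init__(self, max_sessions: int):
        self.max_sessions = max_sessions
        self._open = set()

    def open(self, session_id: str) -> None:
        # Re-opening a known session is a no-op; a new one must fit the limit.
        if session_id not in self._open and len(self._open) >= self.max_sessions:
            raise SessionLimitReached(
                f"limit of {self.max_sessions} sessions reached"
            )
        self._open.add(session_id)

    def close(self, session_id: str) -> None:
        self._open.discard(session_id)

    @property
    def active(self) -> int:
        return len(self._open)
```

The `active` count is also what the idle countdown needs: it starts when this number drops to zero.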

Session timeout

If all sessions disconnect (user closes all tabs):

  1. Start a countdown (e.g., 30 minutes)
  2. If no reconnection → pause the workspace
  3. User comes back → resume workspace, re-establish session
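
The countdown logic reduces to a pure decision function that a periodic job can call for each running workspace. This is a sketch, assuming you record the time of the last disconnect; `next_action` and its return values are hypothetical names chosen for the example.

```python
IDLE_TIMEOUT_SECONDS = 30 * 60  # the 30-minute example countdown


def next_action(open_sessions: int, last_disconnect_at: float, now: float,
                timeout: float = IDLE_TIMEOUT_SECONDS) -> str:
    """Decide what to do with a running workspace.

    Returns "keep" while any session is connected, "wait" while the
    countdown is still running, and "pause" once it has elapsed.
    """
    if open_sessions > 0:
        return "keep"
    if now - last_disconnect_at >= timeout:
        return "pause"
    return "wait"
```

Because the function takes `now` as an argument instead of reading the clock, it is easy to test and to run from a scheduler at whatever interval suits you.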

Connecting Users to Services

Users often need databases, caches, or other services. There are three ways to provide them:

Option A: Embedded services

Run Postgres/Redis inside the user's workspace. Simplest approach - everything is in one VM. Works for development environments and small-scale apps.

Option B: Dedicated service workspaces

Create a separate workspace per service:

User A:
  ├── App workspace (2 CPU, 2 GB RAM)
  ├── Postgres workspace (1 CPU, 1 GB RAM)
  └── Redis workspace (1 CPU, 512 MB RAM)

Connect them via private networking. The Postgres workspace is air-gapped (no internet) - only the user's app workspace can reach it.
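
The access rule for Option B is default-deny with an explicit allow list. The sketch below only models that policy in plain Python so the rule is unambiguous; how the policy is actually applied depends on your platform's networking API, and `Workspace`, `allowed_peers`, and `can_connect` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    name: str
    internet_access: bool = True
    # Names of workspaces allowed to open connections to this one.
    allowed_peers: set = field(default_factory=set)


def can_connect(source: str, target: Workspace) -> bool:
    """Default-deny: only explicitly allowed peers may reach the target."""
    return source in target.allowed_peers


# User A's topology from the example above.
app = Workspace("user-a-app")
postgres = Workspace(
    "user-a-postgres",
    internet_access=False,          # air-gapped
    allowed_peers={"user-a-app"},   # only the user's own app workspace
)
```

With this model, `can_connect("user-a-app", postgres)` is true, while any other workspace name, including another user's app, is refused.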

Option C: Shared managed services (careful)

For non-sensitive services, use a shared managed database in which each user either gets their own schema or is confined by row-level security. This is cheaper but weakens the isolation guarantee.


Data Privacy by Design

With user-per-VM, privacy comes built-in:

Encryption at rest

Each workspace has a unique encryption key. Even if someone physically stole the disk, they'd need the key to read any data.

Cryptographic deletion

When a user deletes their account or you need to purge their data (GDPR Article 17):

  1. Delete the encryption key from the KMS
  2. The workspace data becomes cryptographically unrecoverable
  3. Sanitize the disk for defense in depth
  4. Done

This is provable deletion. You can show auditors that the encryption key no longer exists, making recovery of the data computationally infeasible.
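
To make the idea concrete, here is a toy demonstration of why destroying the key is enough. Everything in it is an assumption for illustration: `ToyKMS` is an in-memory stand-in for a real key management service, and `xor_keystream` is a deliberately simple stream cipher built from SHAKE-256 that stands in for real disk encryption. It is not suitable for production use.

```python
import hashlib
import secrets


class ToyKMS:
    """In-memory key store. A real KMS call would replace these methods."""

    def __init__(self):
        self._keys = {}

    def create_key(self, workspace_id: str) -> None:
        self._keys[workspace_id] = secrets.token_bytes(32)

    def get_key(self, workspace_id: str) -> bytes:
        # Raises KeyError once the key has been destroyed.
        return self._keys[workspace_id]

    def destroy_key(self, workspace_id: str) -> None:
        del self._keys[workspace_id]


def xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy cipher: XOR the data with a SHAKE-256 keystream. Applying it
    # twice with the same key and nonce returns the original bytes.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


kms = ToyKMS()
kms.create_key("ws-1")
nonce = secrets.token_bytes(16)
ciphertext = xor_keystream(kms.get_key("ws-1"), nonce, b"user notes")

# While the key exists, the data can be read back.
recovered = xor_keystream(kms.get_key("ws-1"), nonce, ciphertext)

# Step 1 of the purge: destroy the key. The ciphertext may still sit on
# disk, but there is no longer any key to decrypt it with.
kms.destroy_key("ws-1")
```

After `destroy_key`, any attempt to fetch the key fails, which is the property an auditor would check in the real KMS's logs.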

Network privacy

Workspace traffic is isolated at the network level. User A cannot even detect that User B exists, let alone access their data.


Cost Analysis

The naive calculation (scary)

"If I have 10,000 users and each gets a VM, that's 10,000 VMs. That'll cost a fortune!"

The real calculation (reasonable)

Of 10,000 users:

  • ~500 are active right now (running workspace)
  • ~2,000 were active today (paused workspace)
  • ~7,500 haven't been active this week (no workspace)

The key insight: you only pay for active workspaces. Paused workspaces cost a fraction (just storage), and inactive users cost nothing. At scale, the per-user cost is dramatically lower than running a fixed fleet of servers with container-based multi-tenancy.

User-per-VM is actually cheaper at scale - primarily because you don't pay for idle compute and you don't need a dedicated infrastructure team.
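
The two calculations can be compared with a few lines of arithmetic. The hourly rates below are placeholders invented for the example, not real prices, and the model assumes a steady state in which about 500 workspaces are running and 2,000 are paused at any given moment. Substitute your provider's actual rates.

```python
HOURS_PER_MONTH = 730


def monthly_cost(avg_running: int, paused: int,
                 running_rate_per_hour: float, paused_rate_per_hour: float,
                 hours: int = HOURS_PER_MONTH) -> float:
    """Monthly bill for a steady-state mix of running and paused workspaces."""
    hourly = avg_running * running_rate_per_hour + paused * paused_rate_per_hour
    return hours * hourly


# Placeholder rates (assumptions, not real pricing).
RUNNING_RATE = 0.02   # per running workspace per hour
PAUSED_RATE = 0.001   # per paused workspace per hour (storage only)

# Naive: all 10,000 users have a VM running around the clock.
naive = monthly_cost(10_000, 0, RUNNING_RATE, PAUSED_RATE)

# Real: ~500 running, ~2,000 paused, ~7,500 with no workspace at all.
real = monthly_cost(500, 2_000, RUNNING_RATE, PAUSED_RATE)

print(f"naive: {naive:,.0f}  real: {real:,.0f}  ratio: {real / naive:.0%}")
```

With these placeholder rates the usage-based bill comes to 6% of the naive one. The exact ratio depends entirely on your rates and on how many users are concurrently active.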


Horizontal Scaling

As you grow from 1,000 to 100,000 users:

There's no cluster to scale

Each workspace is independent. You don't have a Kubernetes cluster that needs bigger node groups, or a Docker Swarm that needs more managers. You call the API to create a workspace - the platform handles placement.

State is self-contained

Each user's state lives entirely in their workspace. There's no shared database of agent states, no central file store, no session store to scale. When you delete a user, you delete their workspace. When you migrate a user, you snapshot and restore their workspace.

Regional deployment

Serve users from the nearest region for lower latency. Each region is independent - no cross-region coordination needed.


Building the User Experience

First login

User signs up → you create a workspace → show them a loading bar while the workspace starts → they land in their environment.

Make the loading bar informative:

  • "Setting up your workspace..." (creating VM)
  • "Installing tools..." (workspace boots + package install)
  • "Ready!" (redirect to dashboard)

Returning user

User logs in → check if workspace exists and is running → if paused, resume it → connect.

If the workspace doesn't exist (user was inactive for weeks), recreate it. Store workspace configuration in your database so recreation uses the same settings.
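
The returning-user path can be written as a small planner that turns the stored row into an ordered list of steps. This is a sketch: `UserRow`, `plan_login`, and the step names are hypothetical, and the saved configuration is reduced to a tier name for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserRow:
    user_id: str
    tier: str  # saved configuration, reused if the workspace is recreated
    workspace_status: Optional[str]  # None, "running", or "paused"


def plan_login(row: UserRow) -> List[str]:
    """Return the ordered steps the backend should run for this login."""
    if row.workspace_status is None:
        # Workspace was removed after long inactivity: rebuild it from the
        # configuration stored in the database, then connect.
        return [f"create(tier={row.tier})", "connect"]
    if row.workspace_status == "paused":
        return ["resume", "connect"]
    if row.workspace_status == "running":
        return ["connect"]
    raise ValueError(f"unexpected workspace status: {row.workspace_status}")
```

Returning a plan instead of performing the calls directly keeps the decision testable and lets the frontend show an accurate progress message for each step.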

Account deletion

User clicks "Delete my account":

  1. Delete workspace (including encryption key → cryptographic erasure)
  2. Remove user record from your database
  3. Confirm deletion

GDPR erasure in 3 steps, all verifiable.


Summary

The user-per-VM architecture:

  1. Every user gets hardware-level isolation - separate kernel, encrypted disk
  2. Create on login, pause on idle, delete on churn - pay only for active users
  3. Cheaper than containers at scale - no SRE team, no idle compute
  4. Privacy by default - encryption + cryptographic deletion = easy compliance
  5. No scaling bottlenecks - each workspace is independent

The era of "we can't give every user a VM, that's too expensive" is over. With millisecond boot times and per-second billing, user-per-VM is the most cost-effective way to build a secure multi-tenant platform.

Related reading: Isolated Sandboxes for Every User | Oblien Documentation