Oblien
Security

How to Keep Your AI Agent's Data Private by Design

Build AI agent systems where data stays private by default - encrypted at rest, isolated per user, and cryptographically erased when no longer needed.

Oblien Team
1 min read


AI agents process sensitive data: code repositories, database credentials, API keys, customer information, proprietary business logic. When you give an agent access to your codebase or your users' data, you need guarantees that the data stays private.

Most AI platforms treat privacy as an afterthought - a checklist of features added on top of a shared infrastructure. This article shows how to build privacy into the architecture from the ground up.


The Problem with Shared Infrastructure

When multiple agents (or multiple users' agents) share infrastructure, several things can go wrong:

Shared filesystem

If agents share a host and your isolation isn't perfect, Agent A might read Agent B's files. Even with Docker volumes, a misconfigured mount or a container escape exposes everything.

Shared memory

Containers on the same host share kernel memory structures. Side-channel attacks (Spectre-class) can leak data between containers. This isn't theoretical - researchers have demonstrated cross-container data extraction.

Shared network

Agents on the same Docker bridge network can see each other's traffic. ARP spoofing, DNS hijacking, and direct IP access are all possible within a shared network namespace.

Shared logs

Agent output often goes to a centralized logging system. One agent's sensitive data (API keys, credentials, PII) might appear in logs that other teams can access.

Shared disk

Even after you delete files, the data remains on the physical disk until overwritten. On shared infrastructure, another tenant's workload might read those disk blocks.


Privacy by Design: The Architecture

True data privacy requires isolation at every layer:

┌──────────────────────────────────────┐
│  User A's Agent                      │
│                                      │
│  ┌────────────────────────────────┐  │
│  │ Own kernel (separate from host)│  │
│  ├────────────────────────────────┤  │
│  │ Own encrypted filesystem       │  │
│  │ (unique per-workspace key)     │  │
│  ├────────────────────────────────┤  │
│  │ Own network namespace          │  │
│  │ (dark by default)              │  │
│  ├────────────────────────────────┤  │
│  │ Own memory (hardware-isolated) │  │
│  └────────────────────────────────┘  │
└──────────────────────────────────────┘
     ↕ ZERO visibility
┌──────────────────────────────────────┐
│  User B's Agent                      │
│  (identical isolation)               │
└──────────────────────────────────────┘

No shared kernel. No shared filesystem. No shared network. No shared memory. Complete isolation at the hardware level.


Layer 1: Encrypted Storage

Every workspace's filesystem is encrypted with AES-256 using a unique per-workspace key:

How it works

  1. Key generation - a random encryption key is generated when the workspace is created
  2. Key management - the workspace key is protected through the KMS
  3. Transparent encryption - the Linux block device is encrypted transparently. All reads and writes go through the encryption layer
  4. Performance - hardware acceleration means cryptographic operations add negligible overhead

What this means

  • Data at rest is always encrypted - if someone clones the disk, they get ciphertext
  • Each workspace has a unique key - compromising one key reveals nothing about other workspaces
  • The agent doesn't need to encrypt anything manually - it reads and writes files normally, encryption happens below

Deletion

When a workspace is deleted:

  1. The workspace's encryption key is destroyed
  2. Without the key, the encrypted data is computationally unrecoverable
  3. All storage is securely erased for defense in depth

This is cryptographic erasure - the strongest form of data deletion. You can prove to an auditor that the key no longer exists, making data recovery computationally infeasible.
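The whole Layer 1 flow - a unique key per workspace, transparent encryption below the filesystem, and erasure by key destruction - can be sketched in a few lines. Note the hedge: real deployments encrypt the block device with AES-256 (typically via dm-crypt); the SHAKE-256 keystream below is a stdlib-only stand-in so the example runs anywhere, and `WorkspaceVolume` is an illustrative name, not a real API.

```python
import secrets
import hashlib

class WorkspaceVolume:
    """Toy model of a per-workspace encrypted volume.

    Real stacks use AES-256 at the block layer (e.g. dm-crypt);
    a SHAKE-256 keystream stands in here for portability.
    """

    def __init__(self):
        self.key = secrets.token_bytes(32)   # unique per-workspace key
        self._blocks = {}                    # path -> (nonce, ciphertext)

    def _keystream(self, nonce, length):
        if self.key is None:
            raise RuntimeError("key destroyed: data is unrecoverable")
        return hashlib.shake_256(self.key + nonce).digest(length)

    def write(self, path, plaintext):
        # Transparent encryption: callers never touch keys or ciphertext.
        nonce = secrets.token_bytes(16)
        stream = self._keystream(nonce, len(plaintext))
        self._blocks[path] = (nonce, bytes(a ^ b for a, b in zip(plaintext, stream)))

    def read(self, path):
        nonce, ciphertext = self._blocks[path]
        stream = self._keystream(nonce, len(ciphertext))
        return bytes(a ^ b for a, b in zip(ciphertext, stream))

    def crypto_erase(self):
        # Destroying the key makes every block undecryptable,
        # no matter what bytes remain on the physical disk.
        self.key = None

vol = WorkspaceVolume()
vol.write("/workspace/.env", b"DATABASE_URL=postgres://db")
assert vol.read("/workspace/.env") == b"DATABASE_URL=postgres://db"

vol.crypto_erase()                           # workspace deleted
try:
    vol.read("/workspace/.env")
except RuntimeError as err:
    print(err)                               # ciphertext is all that's left
```

The point of the demo is the last four lines: the ciphertext still exists after erasure, but without the key there is nothing to decrypt it with.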


Layer 2: Network Isolation

Every workspace starts in a "dark" network state - no inbound connections, no visibility to other workspaces. You explicitly enable every network path.

Default state

  • No inbound ports open
  • No connectivity to other workspaces
  • Internet access: available (for package installation, etc.)
  • Internal services: blocked

Controlled connectivity

Need to connect an agent to a database? Create an explicit private link between the two workspaces. This link is:

  • Point-to-point (only these two workspaces can talk)
  • Encrypted in transit
  • Temporary (destroyed when either workspace is deleted)

Need to lock down an agent completely? Disable internet access. The workspace becomes air-gapped - it can only use what's already installed.


What the agent can't do

  • Scan for other workspaces on the network
  • Access cloud metadata endpoints
  • Reach your internal services (unless explicitly allowed)
  • See any network traffic except its own
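The dark-by-default model above boils down to a small allowlist policy: deny everything, except explicit point-to-point links and (optionally) the internet. This is an illustrative sketch only - real enforcement happens in the hypervisor's network layer, not in application code, and all names here are hypothetical.

```python
class NetworkPolicy:
    """Toy default-deny ("dark") network policy for workspaces."""

    def __init__(self, internet=True):
        self.internet = internet                 # set False to air-gap
        self.links = set()                       # explicit point-to-point links

    def create_link(self, ws_a, ws_b):
        # Private link: only these two workspaces can talk to each other.
        self.links.add(frozenset((ws_a, ws_b)))

    def destroy_workspace(self, ws):
        # Links are temporary: deleting either endpoint tears them down.
        self.links = {link for link in self.links if ws not in link}

    def allowed(self, src, dst):
        if dst == "internet":
            return self.internet
        return frozenset((src, dst)) in self.links   # everything else: deny

policy = NetworkPolicy()
policy.create_link("agent-ws", "db-ws")

assert policy.allowed("agent-ws", "db-ws")           # explicit link works
assert not policy.allowed("agent-ws", "other-ws")    # dark by default
policy.destroy_workspace("db-ws")
assert not policy.allowed("agent-ws", "db-ws")       # link died with the workspace
```

Passing `internet=False` models the air-gapped case: the workspace can only use what's already installed.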

Layer 3: Memory Isolation

Each workspace runs in its own KVM virtual machine. Memory is isolated at the hardware level:

  • No shared memory at any level - each VM's memory is mapped by the hypervisor and is inaccessible to other VMs
  • Side-channel mitigations - speculative-execution attacks between workspaces are blunted at the hardware and OS level
  • Clean teardown - when a workspace is destroyed, all resources are securely released

This eliminates entire classes of attacks:

  • Spectre/Meltdown variants that exploit shared caches
  • Rowhammer attacks on shared DRAM modules
  • Memory bus snooping

Layer 4: Process Isolation

Traditional containers share the host kernel - every container's processes are visible to the kernel and potentially to other containers. With microVMs:

  • Each workspace has its own kernel
  • Processes inside a workspace are invisible to the host and other workspaces
  • Process IDs are independent (no PID exhaustion from other workspaces)
  • System call filtering is enforced at the VMM level, not the container runtime level

Practical Implementation

For SaaS platforms handling user data

If your product processes user data through AI agents:

Step 1: One workspace per user request

Don't reuse agent workspaces across users. Each user interaction creates a fresh workspace. When it's done, the workspace is destroyed with cryptographic erasure.

Step 2: Inject only what's needed

Pass the minimum required data to the workspace. If the agent needs to analyze a single CSV file, send just that file - not the entire database.

Step 3: Lock down the network

User data processing workspaces should be air-gapped. No internet access, no access to your backend. The data goes in, the result comes out, nothing leaks.

Step 4: Log carefully

Workspace-internal logs stay inside the workspace (encrypted). Only send sanitized, non-sensitive metadata (duration, success/failure, resource usage) to your monitoring system.
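The four steps above fit naturally into a single lifecycle primitive. Here is a minimal sketch of that per-request lifecycle, assuming a hypothetical `ephemeral_workspace` helper (the name and the dict-based workspace model are illustrative, not a real API):

```python
import secrets
from contextlib import contextmanager

@contextmanager
def ephemeral_workspace(payload):
    """Hypothetical per-request lifecycle: fresh workspace, minimal data
    injected, network locked down, cryptographic erasure guaranteed on exit."""
    ws = {
        "id": secrets.token_hex(8),
        "key": secrets.token_bytes(32),   # unique per-workspace key
        "data": payload,                  # inject only what's needed
        "network": "air-gapped",          # no internet, no backend access
    }
    try:
        yield ws
    finally:
        ws["key"] = None                  # cryptographic erasure
        ws["data"] = None                 # nothing survives the request

with ephemeral_workspace(b"single.csv contents") as ws:
    # The agent works inside the workspace; only a sanitized,
    # non-sensitive result (here, a byte count) is allowed out.
    result = len(ws["data"])

print(result)
```

Using a context manager makes the erasure unconditional: even if the agent crashes mid-task, the `finally` block destroys the key and the injected data.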

For companies using agents internally

Step 1: Per-team workspaces

Each team's agents run in isolated workspaces. The marketing agent can't access the engineering codebase. The sales agent can't see the financial models.

Step 2: Secret injection via KMS

Don't store API keys in environment variables that persist in the workspace image. Inject them at runtime via your KMS, scoped to each workspace.

Step 3: Audit trails

Log which workspace accessed which secrets, ran which commands, and communicated with which services. The workspace isolation makes this straightforward - each workspace is a clean audit boundary.
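Steps 2 and 3 above pair naturally: every runtime secret fetch is also an audit event. The sketch below uses a hypothetical `KMS` facade and `AuditLog` (both names and APIs are illustrative, not any real KMS SDK) to show workspace-scoped grants and a per-access trail:

```python
import time

class AuditLog:
    """Minimal audit trail: one entry per secret-access attempt."""
    def __init__(self):
        self.entries = []
    def record(self, workspace_id, action, target):
        self.entries.append(
            {"ws": workspace_id, "action": action, "target": target, "ts": time.time()}
        )

class KMS:
    """Hypothetical KMS facade: secrets are granted per workspace,
    injected at runtime (never baked into images), and every fetch
    lands in the audit trail - including denied ones."""
    def __init__(self, audit):
        self._grants = {}        # (workspace_id, secret_name) -> value
        self._audit = audit
    def grant(self, workspace_id, name, value):
        self._grants[(workspace_id, name)] = value
    def fetch(self, workspace_id, name):
        self._audit.record(workspace_id, "secret_access", name)
        if (workspace_id, name) not in self._grants:
            raise PermissionError(f"{name} not granted to {workspace_id}")
        return self._grants[(workspace_id, name)]

audit = AuditLog()
kms = KMS(audit)
kms.grant("eng-ws", "GITHUB_TOKEN", "ghp-example")

assert kms.fetch("eng-ws", "GITHUB_TOKEN") == "ghp-example"
try:
    kms.fetch("marketing-ws", "GITHUB_TOKEN")    # scoped: other teams denied
except PermissionError:
    pass
assert len(audit.entries) == 2                   # both attempts were logged
```

Logging the denied fetch is deliberate: failed access attempts are often the most useful audit signal.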


GDPR, HIPAA, and SOC 2 Compliance

GDPR (Right to Erasure, Article 17)

User requests data deletion:

  1. Identify all workspaces containing the user's data
  2. Delete those workspaces (cryptographic erasure)
  3. Delete the user record from your database
  4. Provide proof: "Encryption keys destroyed at [timestamp]"
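The four-step erasure flow above can be sketched as a single function that returns an auditable proof record. Everything here is illustrative - the function name, the dict-based key store, and the user database are stand-ins for your KMS and application database:

```python
from datetime import datetime, timezone

def erase_user_data(user_id, workspace_ids, key_store, user_db):
    """Sketch of the Article 17 flow above (names are hypothetical).
    Returns a proof record an auditor can verify against the KMS log."""
    destroyed = []
    for ws in workspace_ids:              # 1. every workspace holding the data
        key_store.pop(ws, None)           # 2. cryptographic erasure: destroy key
        destroyed.append(ws)
    user_db.pop(user_id, None)            # 3. delete the user record
    return {                              # 4. proof of erasure
        "user": user_id,
        "workspaces_erased": destroyed,
        "keys_destroyed_at": datetime.now(timezone.utc).isoformat(),
    }

keys = {"ws-1": b"key-1", "ws-2": b"key-2"}
users = {"u-42": {"email": "user@example.com"}}
proof = erase_user_data("u-42", ["ws-1", "ws-2"], keys, users)

assert keys == {} and users == {}
assert proof["workspaces_erased"] == ["ws-1", "ws-2"]
```

The returned record is what you hand to the data subject or auditor: a timestamped statement that the keys no longer exist.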

HIPAA (Technical Safeguards)

  • Access control ✅ - workspaces have unique identities and scoped access
  • Encryption ✅ - AES-256 at rest, TLS in transit
  • Audit controls ✅ - per-workspace command and access logging
  • Integrity controls ✅ - encrypted filesystem prevents tampering
  • Transmission security ✅ - all inter-workspace communication encrypted

SOC 2 (Security, Availability, Confidentiality)

  • Isolation ✅ - hardware-level separation between customers
  • Encryption ✅ - per-workspace keys with KMS protection
  • Access logging ✅ - every workspace action is traceable
  • Data deletion ✅ - cryptographic erasure with provable key destruction

The key theme: hardware isolation makes compliance provable. Instead of arguing that your container configuration prevents data leakage, you show that workspaces are isolated at the hardware level with independent encryption keys.


The Privacy Checklist

Before running AI agents on sensitive data:

  • Each agent runs in its own isolated environment (not a shared container)
  • Storage is encrypted with a per-agent unique key
  • Network is dark by default - no inbound, minimal outbound
  • Memory is hardware-isolated (VMs, not containers)
  • Agent deletion includes cryptographic erasure
  • Only required data is passed to the agent environment
  • Logs are separated between workspace-internal and monitoring
  • Secrets are injected at runtime, not baked into images
  • Audit trails track every data access and operation
  • Retention policies automatically delete expired data

If you can check all of these, your agent infrastructure is private by design - not by configuration.

Related reading: Zero-Trust Networking | How to Run Untrusted Code Safely | Oblien Documentation