
How to Run Untrusted Code Safely in the Cloud

Sandbox untrusted code safely - whether from users or AI agents. The right way to execute arbitrary code without risking your infrastructure.

Oblien Team

At some point, your application will need to run code you didn't write.

Maybe your users submit code - an online IDE, a code interview platform, an educational tool. Maybe your AI agent generates code and needs to execute it to verify the output. Maybe you run a plugin system where third parties contribute extensions.

In all these cases, the code is untrusted. You don't know what it does. It might be perfectly fine. It might also try to read your environment variables, scan your internal network, consume all available memory, or install a cryptominer.

The question isn't whether to run untrusted code - it's how to run it without putting your infrastructure at risk.


What can go wrong

Let's be specific about the threats:

Data exfiltration

Untrusted code reads environment variables, filesystem contents, or network-accessible services and sends them to an external server. This is the most common attack on shared environments.

Resource abuse

Code that allocates all available memory, spawns thousands of processes, or runs an infinite loop. On shared infrastructure, this affects every other workload on the same machine.

Network attacks

Code that scans your internal network, connects to databases it shouldn't have access to, or sends spam from your IP addresses.

Persistence

Code that installs backdoors, modifies system files, or creates cron jobs that survive beyond the intended execution. The next user of that environment inherits the compromise.

Escape

The worst case: code that exploits a vulnerability in the isolation mechanism (container escape, kernel exploit) and gains access to the host machine.


Common approaches (and their limits)

Process-level sandboxing (seccomp, AppArmor)

Restricts which system calls a process can make. Effective at preventing some attacks, but complex to configure correctly. A seccomp profile that's too restrictive breaks legitimate code. Too permissive, and the sandbox is worthless.

Problem: Still shares a kernel. A kernel bug bypasses all process-level sandboxes.

Docker containers

Isolates the process in a namespace with its own filesystem and network. Better than process-level sandboxing, but shares the host kernel.

Problem: Container escapes happen. CVE-2019-5736, CVE-2020-15257, CVE-2022-0185, CVE-2024-21626. Each one allowed untrusted code to break out of the container and access the host.

Lambda / serverless functions

Cloud functions run in managed sandboxes. Good isolation, but limited: short execution times, limited filesystem, no persistent state, restricted networking.

Problem: Not suitable for complex workloads. You can't run a dev server, install packages interactively, or maintain state between executions.

Full VMs (EC2, GCE)

Strong hardware isolation. Own kernel, own memory. But slow to start (30-60 seconds), expensive to run per-execution, and complex to manage.

Problem: Boot time makes them impractical for per-request sandboxing.


The right approach: Firecracker microVMs

Firecracker microVMs solve the tradeoff. They give you:

  • VM-level isolation - Own kernel, own memory space, own block device. A kernel exploit inside the VM can't reach the host.
  • Container-like speed - ~130ms cold start. Fast enough to create per-request.
  • Full Linux environment - Not a limited sandbox. Install packages, run servers, use the filesystem. Untrusted code gets a real computer - just one that can't reach anything it shouldn't.
  • Aggressive resource limits - CPU, memory, disk, and network are all capped at the hardware level. A fork bomb hits the VM's limit, not the host's.

How to set up safe code execution on Oblien

Here's the practical setup for running untrusted code safely:

Step 1: Create a temporary workspace

For each code execution request, create a fresh workspace with strict limits:

const sandbox = await ws.create({
  image: 'python-3.13',   // or whatever runtime the code needs
  cpus: 1,                 // minimum necessary
  memory_mb: 512,          // cap memory usage
  writable_size_mb: 256,   // small disk
  ttl_seconds: 60,         // auto-destroy after 60 seconds
  allow_internet: false,   // air-gapped - no network access
});

This workspace is completely isolated. No internet, no access to other workspaces, no access to your services. It exists for 60 seconds and then self-destructs.

Step 2: Execute the untrusted code

const result = await ws.exec(sandbox.id, {
  cmd: ['python3', '-c', userCode],
  timeout_seconds: 30,     // kill if it runs too long
});

console.log(result.stdout);
console.log(result.stderr);
console.log(result.exit_code);

The code runs inside the isolated VM. It can do whatever it wants within the sandbox - install packages, create files, use all the CPU. But it can't:

  • Access the internet
  • Reach any other workspace or service
  • Persist beyond the TTL
  • Use more than 512MB of memory or 1 CPU
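Once the call returns, you still have to decide what to show the caller. A minimal sketch of interpreting the result, assuming the `stdout`/`stderr`/`exit_code` field names from the snippet above (adapt to the actual SDK response shape):

```typescript
// Shape assumed from the exec snippet above; not the official SDK type.
interface ExecResult {
  stdout: string;
  stderr: string;
  exit_code: number; // non-zero on failure or a killed process
}

function summarize(result: ExecResult): string {
  if (result.exit_code === 0) {
    return result.stdout;
  }
  // Surface stderr to the caller instead of silently returning partial stdout
  return `execution failed (exit ${result.exit_code}): ${result.stderr}`;
}
```

Treating a non-zero exit code as a first-class outcome matters for AI-agent loops in particular: the agent needs the stderr text to repair its own code.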

Step 3: Get the result and let it self-destruct

You've already set a TTL of 60 seconds. After that, the workspace is automatically destroyed - encryption key deleted, storage securely erased, network interfaces removed. No cleanup scripts needed.

For even faster cleanup, explicitly delete the workspace after getting the result:

await ws.delete(sandbox.id);

Defense in depth

The workspace setup above gives you multiple layers of protection:

Layer                               What it prevents
----------------------------------  --------------------------------------------------------
Hardware isolation (microVM)        Kernel exploits, container escapes, side-channel attacks
No internet access                  Data exfiltration, C2 communication, network scanning
Resource limits (CPU, RAM, disk)    Fork bombs, memory exhaustion, disk fill
Execution timeout                   Infinite loops, resource abuse
TTL auto-destroy                    Persistence, zombie processes, leftover data
Encrypted disk                      Data recovery after deletion
Unique encryption key               Cross-workspace data access

Each layer protects against failures in the other layers. Even if the code somehow bypasses one layer, the others contain the damage.


Real-world patterns

AI agent code execution

Your AI agent generates Python code to solve a user's problem. Instead of running it in the same process as the agent (one bad import and your agent is compromised), spin up a sandbox:

  1. Agent generates code
  2. Create air-gapped workspace (130ms)
  3. Execute code in workspace
  4. Return stdout/stderr to agent
  5. Destroy workspace

The agent gets the execution result without any risk to its own environment.
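The five steps above can be sketched as one helper. The `SandboxClient` interface is an assumption made so the sketch is self-contained; it mirrors the `ws.create` / `ws.exec` / `ws.delete` calls shown earlier, and you would swap in the real SDK types:

```typescript
// Assumed shape of the ws client from the snippets above, not the official SDK.
interface SandboxClient {
  create(opts: object): Promise<{ id: string }>;
  exec(
    id: string,
    opts: { cmd: string[]; timeout_seconds: number }
  ): Promise<{ stdout: string; stderr: string; exit_code: number }>;
  delete(id: string): Promise<void>;
}

async function runGeneratedCode(ws: SandboxClient, code: string) {
  // Fresh, air-gapped workspace per execution
  const sandbox = await ws.create({
    image: 'python-3.13',
    cpus: 1,
    memory_mb: 512,
    ttl_seconds: 60,
    allow_internet: false,
  });
  try {
    // The agent only ever sees stdout/stderr, never the sandbox itself
    return await ws.exec(sandbox.id, {
      cmd: ['python3', '-c', code],
      timeout_seconds: 30,
    });
  } finally {
    // Destroy immediately; the TTL is the backstop if this call fails
    await ws.delete(sandbox.id);
  }
}
```

The `finally` block is the point: even when the execution throws (timeout, OOM kill), the workspace is deleted on the way out, and the TTL covers the rare case where the delete call itself fails.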

Online code playground

Users write code in a browser-based editor and click "Run":

  1. Frontend sends code to your API
  2. API creates a temporary workspace (130ms)
  3. Execute user code with a 10-second timeout
  4. Stream stdout back to the browser
  5. Workspace auto-destroys via TTL

Each "Run" click gets a fresh, isolated VM. One user's code can't affect another's.

Automated testing of pull requests

A PR can carry malicious test code (open-source repos, external contributors):

  1. CI creates a workspace per test suite
  2. Clones repo, checks out PR branch
  3. Runs tests inside the isolated workspace
  4. Reports results back to CI
  5. Destroys workspace

A malicious test can't access your CI secrets, other repos, or internal services.


Cost and performance

"Isn't a VM per execution expensive?"

For per-request sandboxing with short TTLs (10-60 seconds), cost is minimal. You're paying for a fraction of a minute of compute per execution. Oblien's pricing is based on actual CPU-time and memory-time consumed.

"Is 130ms overhead noticeable?"

For interactive use cases (user clicks "Run" and sees output), the 130ms workspace creation is invisible - the code execution itself usually takes longer than the boot.

For high-throughput batch processing (thousands of executions per second), keep a pool of pre-created workspaces and reuse them. Create the pool on startup, assign workspaces from the pool, return them after execution.
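A minimal sketch of that pool, generic over whatever workspace handle the SDK returns (the class and its names are illustrative, not part of the Oblien API):

```typescript
// Fixed-size pool: hand out idle workspaces, queue callers when empty.
class WorkspacePool<T> {
  private idle: T[];
  private waiters: ((w: T) => void)[] = [];

  constructor(workspaces: T[]) {
    // Pre-created on startup, e.g. via ws.create() in a loop
    this.idle = [...workspaces];
  }

  // Hand out an idle workspace, or wait until one is released
  acquire(): Promise<T> {
    const w = this.idle.pop();
    if (w !== undefined) return Promise.resolve(w);
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  // Return a workspace; wake the longest-waiting caller first
  release(w: T): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter(w);
    else this.idle.push(w);
  }
}
```

One caveat worth stating plainly: reusing a workspace trades away the fresh-VM guarantee from earlier in this post, so only share a pooled workspace across executions you trust equally, and reset or recycle it between tenants.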


Summary

Running untrusted code safely requires:

  1. Hardware isolation - Own kernel, own memory, own disk
  2. Network isolation - No access to internal services or the internet
  3. Resource limits - CPU, memory, and disk caps
  4. Time limits - Auto-kill after a timeout, auto-destroy after a TTL
  5. Cryptographic cleanup - Encryption key destroyed on deletion

Oblien gives you all five with a single API call. Create a workspace, run the code, get the result, destroy the workspace. 130ms to create, sub-second to destroy.

Your infrastructure stays safe. Your users get a full execution environment. Everyone's happy.

Read the execution docs →