How to Run Untrusted Code Safely in the Cloud
Sandbox untrusted code safely - whether from users or AI agents. The right way to execute arbitrary code without risking your infrastructure.
At some point, your application will need to run code you didn't write.
Maybe your users submit code - an online IDE, a code interview platform, an educational tool. Maybe your AI agent generates code and needs to execute it to verify the output. Maybe you run a plugin system where third parties contribute extensions.
In all these cases, the code is untrusted. You don't know what it does. It might be perfectly fine. It might also try to read your environment variables, scan your internal network, consume all available memory, or install a cryptominer.
The question isn't whether to run untrusted code - it's how to run it without putting your infrastructure at risk.
What can go wrong
Let's be specific about the threats:
Data exfiltration
Untrusted code reads environment variables, filesystem contents, or network-accessible services and sends them to an external server. This is the most common attack on shared environments.
Resource abuse
Code that allocates all available memory, spawns thousands of processes, or runs an infinite loop. On shared infrastructure, this affects every other workload on the same machine.
Network attacks
Code that scans your internal network, connects to databases it shouldn't have access to, or sends spam from your IP addresses.
Persistence
Code that installs backdoors, modifies system files, or creates cron jobs that survive beyond the intended execution. The next user of that environment inherits the compromise.
Escape
The worst case: code that exploits a vulnerability in the isolation mechanism (container escape, kernel exploit) and gains access to the host machine.
Common approaches (and their limits)
Process-level sandboxing (seccomp, AppArmor)
Restricts which system calls a process can make. Effective at preventing some attacks, but complex to configure correctly. A seccomp profile that's too restrictive breaks legitimate code. Too permissive, and the sandbox is worthless.
Problem: Still shares a kernel. A kernel bug bypasses all process-level sandboxes.
Docker containers
Isolates the process in a namespace with its own filesystem and network. Better than process-level sandboxing, but shares the host kernel.
Problem: Container escapes happen. CVE-2019-5736, CVE-2020-15257, CVE-2022-0185, CVE-2024-21626. Each one allowed untrusted code to break out of the container and access the host.
Lambda / serverless functions
Cloud functions run in managed sandboxes. Good isolation, but constrained: short execution times, a limited filesystem, no persistent state, restricted networking.
Problem: Not suitable for complex workloads. You can't run a dev server, install packages interactively, or maintain state between executions.
Full VMs (EC2, GCE)
Strong hardware isolation. Own kernel, own memory. But slow to start (30-60 seconds), expensive to run per-execution, and complex to manage.
Problem: Boot time makes them impractical for per-request sandboxing.
The right approach: Firecracker microVMs
Firecracker microVMs solve the tradeoff. They give you:
- VM-level isolation - Own kernel, own memory space, own block device. A kernel exploit inside the VM can't reach the host.
- Container-like speed - ~130ms cold start. Fast enough to create per-request.
- Full Linux environment - Not a limited sandbox. Install packages, run servers, use the filesystem. Untrusted code gets a real computer - just one that can't reach anything it shouldn't.
- Aggressive resource limits - CPU, memory, disk, and network are all capped at the hardware level. A fork bomb hits the VM's limit, not the host's.
How to set up safe code execution on Oblien
Here's the practical setup for running untrusted code safely:
Step 1: Create a temporary workspace
For each code execution request, create a fresh workspace with strict limits:
```javascript
const sandbox = await ws.create({
  image: 'python-3.13',    // or whatever runtime the code needs
  cpus: 1,                 // minimum necessary
  memory_mb: 512,          // cap memory usage
  writable_size_mb: 256,   // small disk
  ttl_seconds: 60,         // auto-destroy after 60 seconds
  allow_internet: false,   // air-gapped - no network access
});
```

This workspace is completely isolated. No internet, no access to other workspaces, no access to your services. It exists for 60 seconds and then self-destructs.
Step 2: Execute the untrusted code
```javascript
const result = await ws.exec(sandbox.id, {
  cmd: ['python3', '-c', userCode],
  timeout_seconds: 30, // kill if it runs too long
});

console.log(result.stdout);
console.log(result.stderr);
console.log(result.exit_code);
```

The code runs inside the isolated VM. It can do whatever it wants within the sandbox - install packages, create files, use all the CPU. But it can't:
- Access the internet
- Reach any other workspace or service
- Persist beyond the TTL
- Use more than 512MB of memory or 1 CPU
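Before handing the result to a caller (or an AI agent), it helps to normalize it. A minimal sketch, assuming the `{ stdout, stderr, exit_code }` shape from the example above; the truncation limit is an arbitrary choice, not part of any documented API:

```javascript
// Cap stored/forwarded output so a noisy program can't bloat your logs
// or a model's context window. 64k characters is an assumed limit.
const MAX_OUTPUT_CHARS = 64 * 1024;

function summarizeResult(result) {
  const truncate = (s) =>
    s.length > MAX_OUTPUT_CHARS
      ? s.slice(0, MAX_OUTPUT_CHARS) + '\n[truncated]'
      : s;
  return {
    ok: result.exit_code === 0,
    stdout: truncate(result.stdout ?? ''),
    stderr: truncate(result.stderr ?? ''),
    exitCode: result.exit_code,
  };
}
```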
Step 3: Get the result and let it self-destruct
You've already set a TTL of 60 seconds. After that, the workspace is automatically destroyed - encryption key deleted, storage securely erased, network interfaces removed. No cleanup scripts needed.
For even faster cleanup, explicitly delete the workspace after getting the result:
```javascript
await ws.delete(sandbox.id);
```

Defense in depth
The workspace setup above gives you multiple layers of protection:
| Layer | What it prevents |
|---|---|
| Hardware isolation (microVM) | Kernel exploits, container escapes, side-channel attacks |
| No internet access | Data exfiltration, C2 communication, network scanning |
| Resource limits (CPU, RAM, disk) | Fork bombs, memory exhaustion, disk fill |
| Execution timeout | Infinite loops, resource abuse |
| TTL auto-destroy | Persistence, zombie processes, leftover data |
| Encrypted disk | Data recovery after deletion |
| Unique encryption key | Cross-workspace data access |
Each layer protects against failures in the other layers. Even if the code somehow bypasses one layer, the others contain the damage.
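The three steps above can be combined into a single helper. A sketch, assuming `ws` exposes the `create`/`exec`/`delete` calls from the earlier examples; the option names mirror those examples but shouldn't be treated as authoritative:

```javascript
// Create -> execute -> delete, with the delete in a finally block so the
// workspace is torn down even if exec throws. The TTL is the backstop if
// this process dies before the finally runs.
async function runUntrusted(ws, userCode, { timeoutSeconds = 30 } = {}) {
  const sandbox = await ws.create({
    image: 'python-3.13',
    cpus: 1,
    memory_mb: 512,
    writable_size_mb: 256,
    ttl_seconds: 60,
    allow_internet: false,
  });
  try {
    return await ws.exec(sandbox.id, {
      cmd: ['python3', '-c', userCode],
      timeout_seconds: timeoutSeconds,
    });
  } finally {
    // Best-effort eager cleanup; ignore errors, TTL will catch stragglers.
    await ws.delete(sandbox.id).catch(() => {});
  }
}
```

The try/finally matters: without it, an exec failure would leave the workspace alive until its TTL expires, holding resources you're paying for.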
Real-world patterns
AI agent code execution
Your AI agent generates Python code to solve a user's problem. Instead of running it in the same process as the agent (one bad import and your agent is compromised), spin up a sandbox:
- Agent generates code
- Create air-gapped workspace (130ms)
- Execute code in workspace
- Return stdout/stderr to agent
- Destroy workspace
The agent gets the execution result without any risk to its own environment.
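In practice the agent usually needs more than one attempt: it generates code, sees the error, and repairs it. A sketch of that loop, where `generateCode` stands in for your model call and `runUntrusted` for a helper wrapping the create/exec/delete steps shown earlier; both names are placeholders, not a real SDK:

```javascript
// Generate -> execute -> feed stderr back to the model -> retry.
// Each attempt gets a fresh sandbox, so a broken or malicious attempt
// can't poison the next one.
async function solveWithRetries(generateCode, runUntrusted, task, maxAttempts = 3) {
  let feedback = '';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = await generateCode(task, feedback);
    const result = await runUntrusted(code);
    if (result.exit_code === 0) {
      return { code, output: result.stdout };
    }
    feedback = result.stderr; // let the model see what went wrong
  }
  throw new Error('No working solution after ' + maxAttempts + ' attempts');
}
```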
Online code playground
Users write code in a browser-based editor and click "Run":
- Frontend sends code to your API
- API creates a temporary workspace (130ms)
- Execute user code with a 10-second timeout
- Stream stdout back to the browser
- Workspace auto-destroys via TTL
Each "Run" click gets a fresh, isolated VM. One user's code can't affect another's.
Automated testing of pull requests
PR contains test code that could be malicious (open-source repos, external contributors):
- CI creates a workspace per test suite
- Clones repo, checks out PR branch
- Runs tests inside the isolated workspace
- Reports results back to CI
- Destroys workspace
A malicious test can't access your CI secrets, other repos, or internal services.
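A sketch of that CI flow, reusing the `ws` calls from the earlier examples. The repo URL, branch, limits, and test command are placeholders for your CI's values; note that cloning requires outbound network, so this workspace is not fully air-gapped (if your platform supports an egress allowlist, restrict it to your git host):

```javascript
// Clone the PR branch and run its tests inside a throwaway workspace.
async function testPullRequest(ws, repoUrl, prBranch) {
  const sandbox = await ws.create({
    image: 'python-3.13',
    cpus: 2,
    memory_mb: 2048,
    ttl_seconds: 900,      // long enough for clone + test run
    allow_internet: true,  // needed for the clone; still isolated from your services
  });
  try {
    const script = [
      `git clone --depth 1 --branch ${prBranch} ${repoUrl} repo`,
      'cd repo',
      'python3 -m pytest',
    ].join(' && ');
    return await ws.exec(sandbox.id, {
      cmd: ['bash', '-c', script],
      timeout_seconds: 600,
    });
  } finally {
    await ws.delete(sandbox.id);
  }
}
```

Crucially, no CI secrets are injected into the workspace: the test code sees only the repo itself.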
Cost and performance
"Isn't a VM per execution expensive?"
For per-request sandboxing with short TTLs (10-60 seconds), cost is minimal. You're paying for a fraction of a minute of compute per execution. Oblien's pricing is based on actual CPU-time and memory-time consumed.
"Is 130ms overhead noticeable?"
For interactive use cases (user clicks "Run" and sees output), the 130ms workspace creation is invisible - the code execution itself usually takes longer than the boot.
For high-throughput batch processing (thousands of executions per second), keep a pool of pre-created workspaces and reuse them. Create the pool on startup, assign workspaces from the pool, return them after execution.
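A minimal warm-pool sketch. `createWorkspace` stands in for `ws.create` with your standard limits; real code would also handle TTL expiry and health checks, and the names here are illustrative, not an Oblien API:

```javascript
// Keep a fixed number of pre-created workspaces warm. Acquire pops a warm
// one (or creates on demand); release returns it, up to the target size.
class WorkspacePool {
  constructor(createWorkspace, size) {
    this.createWorkspace = createWorkspace;
    this.size = size;
    this.idle = [];
  }

  async fill() {
    while (this.idle.length < this.size) {
      this.idle.push(await this.createWorkspace());
    }
  }

  async acquire() {
    // Warm workspace if available, otherwise fall back to a cold create.
    return this.idle.pop() ?? (await this.createWorkspace());
  }

  release(workspace) {
    // Only keep up to the target size; extras die via their TTL.
    if (this.idle.length < this.size) this.idle.push(workspace);
  }
}
```

One caveat: reusing a workspace across requests weakens the isolation story, since the previous execution's state survives. Pool only within a single trust domain, or reset the workspace between uses.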
Summary
Running untrusted code safely requires:
- Hardware isolation - Own kernel, own memory, own disk
- Network isolation - No access to internal services or the internet
- Resource limits - CPU, memory, and disk caps
- Time limits - Auto-kill after a timeout, auto-destroy after a TTL
- Cryptographic cleanup - Encryption key destroyed on deletion
Oblien gives you all five with a single API call. Create a workspace, run the code, get the result, destroy the workspace. 130ms to create, sub-second to destroy.
Your infrastructure stays safe. Your users get a full execution environment. Everyone's happy.