Safe AI Code Execution: How to Let a Coding Agent Run Commands Without Risking Your System
AI coding agents need to run shell commands, install packages, and write files. Without sandbox isolation, a hallucinated or misdirected command can take down your machine. Here's how to contain the risk.
AI coding agents are getting good enough to write real code. The next step — the one that actually makes them useful — is letting them run it.
But there's a problem: once you give an AI agent shell access, every command it runs is one rm -rf away from disaster. Even well-behaved agents can be tripped up by ambiguous instructions, misunderstood paths, or errors that cascade into destructive behavior.
Here's how to contain that risk without giving up productivity.
The Problem: Shell Access Is Ambient
When a coding agent runs commands in your terminal, it usually inherits your user's full environment — your SSH keys, your Python virtualenvs, your ~/.config, your access to every mounted filesystem. A hallucinated pip uninstall or a misguided git reset --hard can undo hours of work or worse.
The pattern you see in most AI coding tools:
# The agent wants to test the build...
python3 -m build
# ...but the generated command runs in YOUR environment
# with your permissions and your filesystem access
This is fine when you're sitting there watching. It's not fine when the agent is running unattended overnight.
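One way to narrow what a command inherits is to launch it with an explicit working directory and a hand-picked environment instead of the parent shell's. The sketch below assumes a POSIX-like system. `run_scoped` and the `ALLOWED_ENV` allow-list are hypothetical names chosen for illustration, not part of any particular tool.

```python
import os
import subprocess

# Hypothetical allow-list: only these variables are passed to the child.
ALLOWED_ENV = ("PATH", "LANG", "LC_ALL")


def run_scoped(cmd, workdir, allowed_env=ALLOWED_ENV, timeout=60):
    """Run cmd in workdir with a minimal environment instead of the caller's."""
    env = {k: os.environ[k] for k in allowed_env if k in os.environ}
    # Point HOME at the workspace so tools that read ~/.ssh or ~/.config
    # look inside the workspace rather than the real home directory.
    env["HOME"] = str(workdir)
    return subprocess.run(
        cmd,
        cwd=workdir,          # relative paths resolve inside the workspace
        env=env,              # nothing outside the allow-list is inherited
        capture_output=True,
        text=True,
        timeout=timeout,      # a hung command cannot block forever
    )
```

Calling `run_scoped(["python3", "-m", "build"], workspace_path)` would run the build from the earlier example without exposing tokens or keys held in environment variables. This limits only the environment and the starting directory. The process can still reach any absolute path your user can, so it is one layer rather than a complete sandbox.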
The Fix: Execution Sandboxing
The solution is to put every agent command in a sandboxed workspace — a temporary directory with bounded permissions, isolated from your real project until you approve the output.
Three things a proper execution sandbox does:
Filesystem isolation. The agent runs in a temp directory. Its file writes, package installs, and configuration changes never touch your real project until you explicitly merge them back.
Bounded concurrency. Multiple agent steps can't spawn unlimited parallel shells. A max-slot limit caps how many commands run at once, which contains runaway resource exhaustion, a real failure mode when an agent retries a command in a loop.
Stale cleanup. Old sandbox directories from failed or abandoned runs shouldn't accumulate forever. A manager that prunes stale workspace pools keeps your disk clean without manual intervention.
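The first and third properties can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the `agent_ws_` prefix, the function names, and the age-based pruning rule are all hypothetical, and the slot limit is left out to keep the sketch short.

```python
import shutil
import tempfile
import time
from pathlib import Path

# Hypothetical prefix used to recognise sandbox directories when pruning.
PREFIX = "agent_ws_"


def create_workspace(project_dir, pool_dir):
    """Copy the project into a fresh directory under pool_dir and return its path."""
    pool = Path(pool_dir)
    pool.mkdir(parents=True, exist_ok=True)
    workspace = Path(tempfile.mkdtemp(prefix=PREFIX, dir=pool))
    # The agent edits this copy; the real project is never written to.
    shutil.copytree(project_dir, workspace / "project")
    return workspace


def prune_stale(pool_dir, max_age_seconds, now=None):
    """Delete workspaces whose directory mtime is older than max_age_seconds.

    Directory mtime is a rough proxy for "last used"; a real manager might
    record an explicit timestamp instead.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in Path(pool_dir).glob(PREFIX + "*"):
        if entry.is_dir() and now - entry.stat().st_mtime > max_age_seconds:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

A copied directory keeps relative writes away from the real project, but on its own it does not stop a command from writing to an absolute path. Stronger guarantees would need OS-level isolation on top of this.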
What Happens Without Sandboxing
A few real failure modes we've seen:
The packaging cascade. The agent runs pip install in what it thinks is a virtualenv. It's not. Every dependency gets installed globally, breaking the system Python.
The cleanup mistake. The agent generates a cleanup script to remove test artifacts. The script runs rm -rf $TMP_DIR/* with TMP_DIR unset, so the path expands to /* and the deletion starts at the filesystem root.
The resource leak. The agent retries a failing command 200 times in 200 parallel subprocesses. The machine runs out of memory and the OOM killer terminates your database process.
None of these is the agent being malicious. Each is a mismatch between "what the human intended" and "what the agent understood."
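Each of these failure modes has a cheap guard that a command runner could apply before executing anything. The following is a sketch only: the function names are hypothetical, and the checks are deliberately conservative rather than complete.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def in_virtualenv():
    """True when the running interpreter belongs to a venv, not the base Python."""
    return sys.prefix != sys.base_prefix


def safe_rmtree(target, sandbox_root):
    """Delete target only if it is a non-empty path strictly inside sandbox_root."""
    if not str(target).strip():
        # An empty string is what an unset shell variable would have produced.
        raise ValueError("refusing to delete: empty path")
    root = Path(sandbox_root).resolve()
    path = Path(target).resolve()
    if path == root or root not in path.parents:
        raise ValueError(f"refusing to delete outside sandbox: {path}")
    shutil.rmtree(path)


def run_with_retries(cmd, max_attempts=3, timeout=60):
    """Retry a command sequentially with a hard cap, never in parallel."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for _ in range(max_attempts):
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        if result.returncode == 0:
            break
    return result  # the first success, or the last failed attempt
```

A runner could refuse to call pip when `in_virtualenv()` is false, route every delete through `safe_rmtree`, and replace open-ended retry loops with `run_with_retries`.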
What a Sandboxed Execution Looks Like
┌─ Workspace Pool ──────────────────────────┐
│ workspace_0/  (slot 0, running)           │
│ workspace_1/  (slot 1, running)           │
│ workspace_2/  (idle, pending prune)       │
│ max_slots = 2, pool lock = held           │
└───────────────────────────────────────────┘
The agent requests a workspace slot. The manager either hands one out (under the max_slots cap) or makes the caller wait. Commands run inside the workspace, not your home directory. When the task finishes, the workspace is staged for deletion or kept for inspection — your choice.
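The request, wait, and release cycle described here maps naturally onto a semaphore. The class below is a minimal in-process sketch, not Ralph Workflow's implementation: `WorkspacePool`, its `keep` flag, and the `workspace_` prefix are assumptions made for illustration, and coordinating several processes would need something like a file lock instead.

```python
import shutil
import tempfile
import threading
from contextlib import contextmanager
from pathlib import Path


class WorkspacePool:
    """Hands out at most max_slots workspaces at a time; extra callers wait."""

    def __init__(self, pool_dir, max_slots=2):
        self.pool_dir = Path(pool_dir)
        self.pool_dir.mkdir(parents=True, exist_ok=True)
        self._slots = threading.BoundedSemaphore(max_slots)

    @contextmanager
    def workspace(self, keep=False, timeout=None):
        # Block until a slot is free; give up if timeout (seconds) elapses.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no free workspace slot")
        try:
            path = Path(tempfile.mkdtemp(prefix="workspace_", dir=self.pool_dir))
            try:
                yield path
            finally:
                # Delete by default; keep=True leaves the directory for inspection.
                if not keep:
                    shutil.rmtree(path, ignore_errors=True)
        finally:
            # The slot is returned even if the task raised.
            self._slots.release()
```

A caller would write `with pool.workspace() as ws:` and run every command with `ws` as its working directory. With `max_slots=2`, a third concurrent request blocks until one of the first two finishes.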
Why This Matters for Unattended Work
If you only use coding agents interactively — watching every command, approving every file write — sandboxing is nice but not essential.
If you want to hand a task to an agent at 10 PM and wake up to reviewable output at 8 AM, sandboxing is non-negotiable. You can't supervise commands while you're asleep.
The difference between a tool that needs supervision and one that can run unattended isn't model intelligence — it's execution safety.
Putting It Together
A safe unattended coding pipeline needs three layers:
- Spec-driven task definition: the agent knows what "done" means because you wrote acceptance criteria, not just a prompt.
- Sandboxed command execution: every shell command, package install, and file write is contained.
- Verifiable output: you don't read every line. You check that tests pass, artifacts exist, and the changeset is reviewable in a normal git diff.
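The third layer can be reduced to a small automated gate that runs before a human looks at the diff. This sketch assumes the test command and the list of expected artifacts come from the task spec; `verify_output` and the shape of its report are hypothetical.

```python
import subprocess
from pathlib import Path


def verify_output(workspace, test_cmd, expected_artifacts, timeout=600):
    """Report whether the tests passed and the expected files exist."""
    workspace = Path(workspace)
    tests = subprocess.run(
        test_cmd, cwd=workspace, capture_output=True, text=True, timeout=timeout
    )
    # Artifact paths are interpreted relative to the workspace.
    missing = [name for name in expected_artifacts if not (workspace / name).exists()]
    return {
        "tests_passed": tests.returncode == 0,
        "missing_artifacts": missing,
        "ok": tests.returncode == 0 and not missing,
    }
```

For example, `verify_output(ws, ["python3", "-m", "pytest", "-q"], ["dist"])` would flag a run whose tests fail or whose build directory never appeared. The gate only decides whether the output is worth reviewing; the changeset itself still goes through a normal git diff.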
Without all three, you're either babysitting the agent or gambling on luck.
Ralph Workflow implements this three-layer pattern as a free and open-source CLI orchestrator. It sandboxes execution, enforces max concurrency, prunes stale workspaces, and leaves you with reviewable output — diffs, logs, test results — you inspect in your normal git workflow.