How to Run Claude Code Unattended

Claude Code can be automated, but unattended coding needs more than non-interactive mode. Here is why specs, checkpoints, verification, recovery, and Ralph Workflow matter if you want to walk away and still trust the result.

The first idea everybody has is the obvious one: run claude -p, give it a big prompt, go make coffee, and come back later.

That is not a bad instinct. Anthropic clearly supports that kind of usage. Claude Code has non-interactive mode, hooks, subagents, worktrees, permission modes, background execution, and now even cloud-hosted routines for scheduled work. If you want to automate Claude Code, the docs are not coy about it.

But there is a big difference between "Claude Code can be automated" and "Claude Code by itself is a reliable unattended coding loop."

That gap is the whole article.

If you found this because you are specifically looking for a Claude Code autonomous mode wrapper, read this short companion page too: Claude Code Autonomous Mode Wrapper: What Actually Works.

The problem is not automation

Claude Code already has plenty of automation primitives. Anthropic documents claude -p for CI and scripts, hooks for lifecycle automation, permission modes, subagents, and routines for scheduled or event-driven runs. That part is real.

What those features do not give you automatically is a durable engineering loop that keeps checking the work until the work is actually complete. They give you ways to run the agent. They do not, by themselves, give you a trustworthy stop condition, and that matters more than people expect.
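To make "trustworthy stop condition" concrete, here is a minimal Python sketch in which the run ends on the exit code of a check the agent does not control, never on the agent's own claim. The function name, the pass limit, and the example commands in the comment are my assumptions for illustration; nothing here is an API that Claude Code ships.

```python
import subprocess
from typing import Sequence


def run_until_verified(
    agent_cmd: Sequence[str],
    verify_cmd: Sequence[str],
    max_passes: int = 3,
) -> int:
    """Run the agent, then an external check, until the check passes.

    Returns the 1-based pass on which verification succeeded, or 0 if
    every pass failed. The agent's output is never consulted.
    """
    for attempt in range(1, max_passes + 1):
        # The agent's exit code is deliberately ignored: a clean exit only
        # means it stopped, not that the work is correct.
        subprocess.run(list(agent_cmd), check=False)
        # The stop condition lives outside the model: a repo-native check.
        result = subprocess.run(list(verify_cmd), check=False)
        if result.returncode == 0:
            return attempt
    return 0


# Hypothetical usage (prompt text and test command are placeholders):
# run_until_verified(
#     ["claude", "-p", "Follow PROMPT.md and fix the failing tests."],
#     ["python", "-m", "pytest", "-q"],
# )
```

The pass limit matters as much as the check. Without it, a run that can never pass just burns tokens until something else stops it.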

The annoying failure mode is not dramatic

The failure mode is usually not that Claude Code goes completely off the rails. It is something more boring than that, which is part of why it catches people.

It does a plausible amount of work, solves most of the immediate task, and then tells you it is done. Sometimes that is true. Sometimes it fixed the easy path and missed the real edge case. Sometimes it touched more files than it should have, ran the wrong check, or produced a diff that technically works but is miserable to review. Sometimes it got stuck, retried badly, and then narrated confidence anyway.

That is not a Claude Code-specific flaw. It is just how agentic coding behaves when the model is also being asked to judge its own success.

Models are very good at producing a completion story.

Engineering needs stronger evidence than a completion story.

Interactive Claude Code works because you are doing the missing job

This is why Claude Code feels so good when you are nearby. You read the plan and say yes or no. You notice when it is drifting. You tell it which command actually matters. You look at the diff. You decide whether the answer is complete or merely convincing. You restart when the session gets noisy. You decide when the result is clean enough to commit.

In other words, the human is the workflow. That is fine when you are at the keyboard. It becomes a bottleneck when the run is long, repetitive, or something you want to leave alone for an hour or overnight.

The hidden human loop has to become explicit.

Otherwise the run is not unattended.

It is just unsupervised.

Bigger prompts do not fix this

People usually try to solve the problem with a more forceful prompt. You can absolutely tell Claude Code to follow the spec, avoid unrelated changes, run tests, fix failures, and stop only when everything passes.

That helps. It is still not the same as having a workflow.

Long-running coding work is a loop. The agent has to plan, edit, inspect, verify, recover, retry, and stop. If all of that responsibility lives inside one prompt, the prompt is trying to be the task contract, the process, the reviewer, and the stop condition all at once.

That is too much to stuff into one instruction block.

Prompts are useful.

They are not governance.

Anthropic's docs are actually pretty clear about this

One thing I wanted to be careful about here is not understating Claude Code. Anthropic has done real work on automation. The docs are better than the dismissive “it is just a chat tool” takes people sometimes repeat.

But the docs also do not claim that Claude Code magically turns into a complete self-correcting engineering workflow if you remove the prompts. Non-interactive mode is presented as a way to integrate Claude Code into scripts and CI. Plan mode is explicitly about exploring and proposing changes before editing, which is useful, but it is still a human-oriented approval flow. Hooks are deterministic event handlers, not a general-purpose reasoning layer.

And one small detail in the docs says a lot: in -p mode, repeated blocked actions abort the session because there is no user to ask. That is a very honest design detail. It tells you exactly what these primitives are: useful building blocks, not the whole unattended loop.

What unattended coding actually needs

If the goal is "start a run, walk away, come back to something you can trust," the requirements are not especially mysterious.

You need a written spec, a planning gate, verification that happens outside the model's own self-confidence, recovery when the run flakes or gets rate-limited, and a handoff that looks like engineering evidence instead of a chat transcript.

In practice that means a task contract in a file, plan-first execution, a way to loop weak plans back before coding starts, a way to stop weak diffs from drifting into commits, repo-native verification, and some notion of logs, artifacts, checkpoints, and resume.

That is the difference between automation and orchestration.

The spec is not paperwork

For Ralph Workflow, the task lives in PROMPT.md. That is not because everything needs to become a giant RFC. It is because unattended work needs a durable contract.

Bad:

Improve the dashboard.

Better:

Add loading and empty states to the dashboard. Do not change the data model. Reuse the existing component style. Add tests for the empty state. Run the frontend test suite and type checker before review.

The second version does not just help the agent start. It gives the workflow something concrete to keep coming back to after the first pass, after the first failed check, and after the first retry.

That is what matters. The spec is not there to make the prompt longer. It is there to stop the run from quietly redefining success halfway through.
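One way to see the difference between the two specs above is to ask what a workflow could mechanically check against. The sketch below is my own illustration, not a format Ralph Workflow defines: TaskContract, gaps, and render are hypothetical names, and PROMPT.md itself can stay free text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskContract:
    """A hypothetical structured view of what the spec should pin down."""

    goal: str
    constraints: List[str] = field(default_factory=list)
    checks: List[str] = field(default_factory=list)

    def gaps(self) -> List[str]:
        """List what is missing before the task is safe to leave alone."""
        missing = []
        if not self.goal.strip():
            missing.append("goal")
        if not self.constraints:
            missing.append("constraints")
        if not self.checks:
            missing.append("checks")
        return missing

    def render(self) -> str:
        """Render the contract as plain text suitable for a spec file."""
        lines = [self.goal.strip(), "", "Constraints:"]
        lines += [f"- {c}" for c in self.constraints]
        lines += ["", "Checks to run before review:"]
        lines += [f"- {c}" for c in self.checks]
        return "\n".join(lines) + "\n"


vague = TaskContract(goal="Improve the dashboard.")
specific = TaskContract(
    goal="Add loading and empty states to the dashboard.",
    constraints=[
        "Do not change the data model.",
        "Reuse the existing component style.",
        "Add tests for the empty state.",
    ],
    checks=["frontend test suite", "type checker"],
)
```

Here vague.gaps() reports that constraints and checks are missing, while specific.gaps() comes back empty. The vague version gives a later pass nothing to be measured against.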

Weak plans should not drift into code

This is one of the core lessons from using coding agents for real work: many bad runs fail before the first edit.

The agent chooses the wrong abstraction, grabs the wrong files, ignores the constraint that matters, or proposes something that sounds tidy in English but does not actually solve the task.

When you are interactive, you catch that. When the run is unattended, the workflow has to catch it. That is why I keep coming back to a planning gate.

The question is simple:

Is this plan good enough to earn the handoff into implementation?

A weak plan should loop back. It should not become a diff just because the model sounded persuasive.
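A planning gate can start out very plain. The sketch below assumes a hypothetical plan format (a dict with "steps", "checks", and "files" lists) and a caller-supplied make_plan function that asks the agent for a plan; both are my inventions for illustration, not something Claude Code or Ralph Workflow is documented to emit.

```python
from typing import Callable, Dict, List, Optional

Plan = Dict[str, List[str]]


def review_plan(plan: Plan, allowed_prefixes: List[str]) -> List[str]:
    """Return reasons to send the plan back; an empty list means proceed."""
    problems = []
    if not plan.get("steps"):
        problems.append("plan lists no concrete steps")
    if not plan.get("checks"):
        problems.append("plan names no verification command")
    for path in plan.get("files", []):
        if not any(path.startswith(prefix) for prefix in allowed_prefixes):
            problems.append(f"plan touches out-of-scope file: {path}")
    return problems


def plan_with_gate(
    make_plan: Callable[[List[str]], Plan],
    allowed_prefixes: List[str],
    max_rounds: int = 3,
) -> Optional[Plan]:
    """Request a plan, loop weak ones back with feedback, then give up."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        plan = make_plan(feedback)
        feedback = review_plan(plan, allowed_prefixes)
        if not feedback:
            return plan  # Good enough to earn the handoff.
    return None  # No plan passed the gate; do not start coding.
```

Structural checks like these only catch mechanical problems. Judging whether the abstraction is right still needs a separate review pass, but even this much keeps an out-of-scope plan from turning into a diff.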

Weak diffs should not cross the commit boundary

The same logic applies after implementation. This is where a lot of “autonomous coding” talk gets mushy. People say the agent built the feature, but what they really mean is that it produced a diff.

A diff is not the same thing as a good result.

The workflow should still ask a few blunt questions: did it solve the actual task, did it stay inside scope, did it run the right checks, is this diff readable enough that a human reviewer will not hate you, and is there evidence for why the run stopped here?

That is why the handoff should be a diff, commits, logs, verification output, and artifacts. Not vibes. Not “Claude said it was complete.” Evidence.
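Two of those blunt questions, scope and reviewability, can be checked without any model at all. This is a rough sketch that reads the output of git diff --numstat; the function names, the allowed-prefix idea, and the 400-line threshold are assumptions I picked for the example.

```python
from typing import List, Tuple


def parse_numstat(numstat: str) -> List[Tuple[int, int, str]]:
    """Parse `git diff --numstat` output into (added, deleted, path) rows."""
    rows = []
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        # Binary files are reported as "-" for both counts; count them as 0.
        rows.append((
            0 if added == "-" else int(added),
            0 if deleted == "-" else int(deleted),
            path,
        ))
    return rows


def review_diff(
    numstat: str,
    allowed_prefixes: List[str],
    max_changed_lines: int = 400,
) -> List[str]:
    """Return reasons to block the commit; an empty list lets the diff pass."""
    rows = parse_numstat(numstat)
    problems = []
    if not rows:
        problems.append("diff is empty")
    for _, _, path in rows:
        if not any(path.startswith(p) for p in allowed_prefixes):
            problems.append(f"out-of-scope change: {path}")
    total = sum(added + deleted for added, deleted, _ in rows)
    if total > max_changed_lines:
        problems.append(f"diff too large to review comfortably: {total} lines")
    return problems
```

In a real run the numstat text would come from something like subprocess.run(["git", "diff", "--numstat", "HEAD"], capture_output=True, text=True).stdout. Note that this form does not list untracked files, so a fuller gate would also look at git status.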

Failure recovery is part of the pitch

This is another place where most demos undersell the real problem.

Long unattended runs fail in very ordinary ways. The network flakes. The provider rate-limits. The agent gets stuck. The context gets noisy. The plan was fine but the implementation pass was weak.

If your only model of autonomy is "fire one huge prompt and hope," every one of those failures turns into wasted time.

The workflow I actually want is more like autopilot than magic. It should checkpoint, resume, preserve evidence, and let one weak pass fail without erasing the whole run.
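Checkpoint and resume do not need much machinery. The sketch below keeps a small JSON file of finished phases and gives each phase a retry budget; the file layout, the phase names in the comment, and the function names are all hypothetical and are not how any particular tool stores its state.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def load_done(path: str) -> List[str]:
    """Read the list of completed phase names, or [] on a fresh run."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["done"]


def save_done(path: str, done: List[str]) -> None:
    """Write the checkpoint atomically so a crash never leaves half a file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"done": done}, fh)
    os.replace(tmp, path)


def run_phases(
    phases: Dict[str, Callable[[], bool]],
    checkpoint: str,
    retries: int = 2,
) -> bool:
    """Run phases in order, skipping finished ones and retrying weak passes."""
    done = load_done(checkpoint)
    for name, phase in phases.items():
        if name in done:
            continue  # Finished in an earlier run; do not redo it.
        for _ in range(1 + retries):
            if phase():
                done.append(name)
                save_done(checkpoint, done)
                break
        else:
            return False  # Retry budget spent; earlier checkpoints survive.
    return True


# Hypothetical usage: run_phases({"plan": plan, "develop": develop}, "run.json")
```

The useful property is that a failed development pass costs one phase, not the whole run. Starting the same command again picks up after the last phase that actually succeeded.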

That is much more compelling to me than some theatrical promise of full autonomy.

Why the Ralph pattern still matters

The original Ralph idea is still good because it is simple: run the agent in a fresh context and let the filesystem carry the state.

That solves a real problem. Long agent sessions accumulate junk. Tool output piles up, failed attempts hang around, and eventually the model starts working through a fog of its own history.

Fresh context helps, but raw repetition is not enough. For unattended work, that loop still needs structure around it: planning, planning analysis, development, development analysis, verification, commit, recovery, and clear stop conditions.
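The core of the pattern fits in a few lines. In this sketch every pass gets a prompt rebuilt from files, and the only memory between passes is a notes file on disk. The names build_prompt, ralph_loop, and the notes file are my assumptions; run_pass stands in for whatever launches the agent, and is_done stands in for an external check such as a test run.

```python
from pathlib import Path
from typing import Callable


def build_prompt(spec: Path, notes: Path) -> str:
    """Assemble a pass prompt from files only; no chat history is reused."""
    text = spec.read_text(encoding="utf-8")
    if notes.exists():
        text += "\n\nNotes from earlier passes:\n"
        text += notes.read_text(encoding="utf-8")
    return text


def ralph_loop(
    spec: Path,
    notes: Path,
    run_pass: Callable[[str], str],
    is_done: Callable[[], bool],
    max_passes: int = 5,
) -> int:
    """Run fresh-context passes until an external check says stop.

    run_pass receives the prompt and returns a short note to persist.
    Returns the number of passes used, or -1 if the limit was reached.
    """
    for n in range(1, max_passes + 1):
        note = run_pass(build_prompt(spec, notes))
        # The filesystem, not the model's memory, carries state forward.
        with notes.open("a", encoding="utf-8") as fh:
            fh.write(f"pass {n}: {note}\n")
        if is_done():
            return n
    return -1
```

A run_pass built on claude -p could be as small as a subprocess.run(["claude", "-p", prompt], capture_output=True, text=True) call that returns a trimmed summary of stdout. The structure the article argues for (planning, analysis, verification, commit, recovery) is what goes around this loop, not inside it.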

That is the difference between "run the agent again" and "run an engineering workflow."

Where Ralph Workflow becomes compelling

Ralph Workflow is interesting to me not because it adds more AI, but because it puts discipline around the AI.

Claude Code can still be the coding engine. Or OpenCode. Or Codex. Or a mix of them. The stronger pitch is not vendor loyalty. It is that the workflow is explicit, inspectable, and repo-native.

The task lives in a file. The routing lives in TOML. The run leaves commits and artifacts behind. The next pass starts from fresh context instead of a bloated chat. Weak plans loop back. Weak diffs do not quietly cross the commit boundary.

That is the part I find compelling. Not "look how autonomous this is," but "look how much less babysitting this takes without turning the repo into a mystery novel."

A practical way to start

Do not start with "build the whole product." Start with something boring and bounded.

Add tests to an existing module. Fix a known class of lint or type-check failures. Refactor one narrow subsystem. Update docs from code. Migrate one small internal API.

These are the kinds of tasks where unattended work is actually useful. They are annoying enough to hand off, but constrained enough that a spec and a verification loop mean something.

That is where you learn whether the workflow is trustworthy—not in a demo, but in the repo.

Minimal setup

Install Ralph Workflow:

pipx install ralph-workflow

Initialize the default workflow:

ralph --init

Write the task in PROMPT.md.

Configure your agents in .agent/ralph-workflow.toml.

Then run:

ralph

The advice here is the same advice I would give for any automation system: keep the first version boring. Do not invent a giant custom flow on day one. Use the default loop. Pick a task you can judge clearly. Read the commits and artifacts afterward. Then change the workflow once you know what your actual failure modes look like.

The handoff should be evidence

When the run finishes, do not review it like a conversation. Review it like engineering work.

Look at the diff, the commits, the verification output, the plan, the analysis artifacts, and whether the result stayed inside the spec.

That is the point of all of this. The workflow should give you something you can inspect, not just something you can believe.
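If you want to hold yourself to that, a tiny pre-review check helps: refuse to call a run finished unless the evidence exists on disk. The artifact file names below are placeholders I made up, not the names any tool actually writes, so map them to whatever your workflow leaves behind.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical artifact names; use whatever your workflow actually writes.
REQUIRED = {
    "plan": "plan.md",
    "diff": "changes.diff",
    "verification": "verify.log",
}


def missing_evidence(
    run_dir: Path,
    required: Dict[str, str] = REQUIRED,
) -> List[str]:
    """Name each kind of evidence that is absent or empty in run_dir."""
    missing = []
    for kind, filename in required.items():
        artifact = run_dir / filename
        # An empty file is treated the same as a missing one.
        if not artifact.is_file() or artifact.stat().st_size == 0:
            missing.append(kind)
    return missing
```

An empty result does not mean the work is good. It only means there is something to inspect, which is the minimum bar before reading the diff.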

Claude Code can absolutely be automated. Anthropic has put real effort into making that true.

But automation primitives are not the same thing as a reliable unattended coding workflow. If you want unattended work you can trust, the important question is not whether the agent can keep typing while you are gone. It is what happens when the plan is weak, the diff is weak, the checks fail, the context gets noisy, or the run stops halfway through.

That is where the workflow matters. Claude Code provides the coding power. Ralph Workflow makes that power easier to leave alone without giving up reviewability, recovery, or control.