How to Tell if an AI Coding Task Is Actually Done
Ralph Workflow is a free and open-source AI agent orchestrator and composable loop framework that runs the coding agents you already use on your own machine. It is built around a simple core loop inspired by the original Ralph loop. That core composes into a stronger workflow system for serious repo work, and the default workflow is strong enough to start with before you customize anything.
It is for developers and technical teams with work that is too big to babysit and too risky to trust blindly.
What makes it different is not that it produces a cleaner summary. Ralph Workflow is built to leave you with software and verification you can actually judge (executable changes, checks, artifacts, and a short result summary) instead of just a transcript and a confident "done" claim.
Why read this now? An AI coding task is not done when the model sounds done. It is done when the result comes back in a shape you can review, verify, and decide whether you would merge.
The fast test: done means mergeable or honestly blocked
A trustworthy finish state should answer five questions quickly:
1. What changed?
   - You should be able to name the files or surfaces that moved.
   - The scope should still match the task you actually handed off.
2. What proof came back?
   - Tests, lint, build, screenshots, artifact files, or other concrete evidence should exist.
   - “It works” is not proof.
3. What is still uncertain?
   - Open questions should be called out explicitly.
   - Hidden uncertainty is how “done” turns into cleanup work for the human reviewer.
4. Would you merge it?
   - If the answer is yes, the task is done enough to matter.
   - If the answer is no, the run is not done just because the agent stopped typing.
5. If it is blocked, is the block legible?
   - A good failed run still leaves a readable trail: what was attempted, what failed, and what should happen next.
That is the real standard: mergeable or honestly blocked.
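The five questions can be sketched as a small classifier. This is only an illustration of the standard described above: `RunReport`, `BlockTrail`, `FinishState`, and `classify` are hypothetical names invented for this example, not part of Ralph Workflow, and the `would_merge` field still has to be filled in by a human reviewer.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class FinishState(Enum):
    MERGEABLE = "mergeable"
    BLOCKED = "honestly blocked"
    NOT_DONE = "not done"


@dataclass
class BlockTrail:
    """Hypothetical record of a blocked run: attempted, failed, next step."""
    attempted: str = ""
    failed: str = ""
    next_step: str = ""

    def is_legible(self) -> bool:
        # A block is legible only if all three parts are actually written down.
        return all(s.strip() for s in (self.attempted, self.failed, self.next_step))


@dataclass
class RunReport:
    """Hypothetical shape for what a run handed back (an assumption, not a real API)."""
    changed_files: list[str] = field(default_factory=list)  # 1. what changed
    matches_task: bool = False                               # 1. scope still matches the ask
    evidence: list[str] = field(default_factory=list)       # 2. tests, lint, build, artifacts
    open_questions: list[str] | None = None                  # 3. None = uncertainty never stated
    would_merge: bool = False                                # 4. the reviewer's call
    block_trail: BlockTrail | None = None                    # 5. only set for blocked runs


def classify(report: RunReport) -> FinishState:
    """Apply the 'mergeable or honestly blocked' standard to one report."""
    mergeable = (
        bool(report.changed_files)
        and report.matches_task
        and bool(report.evidence)
        and report.open_questions is not None  # an empty list is fine; silence is not
        and report.would_merge
    )
    if mergeable:
        return FinishState.MERGEABLE
    if report.block_trail is not None and report.block_trail.is_legible():
        return FinishState.BLOCKED
    return FinishState.NOT_DONE
```

Note the design choice for `open_questions`: an empty list means the run explicitly stated that nothing is open, while `None` means uncertainty was never addressed, which this sketch treats as not done.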
Red flags that the task is not actually done
Be skeptical if you see any of these:
the summary is confident but the diff does not match the ask
there are changed files but no meaningful verification evidence
the run touched shared boundaries and nobody checked the merged state
the agent produced a long transcript but no short result summary
the result created obvious follow-up work that was never named
you still need to reconstruct the whole night before you can judge anything
Those are signs that the run has stopped but the task is not truly done.
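The first red flag, a diff that does not match the ask, is one of the few that can be partly checked mechanically. A minimal sketch, assuming the run's work sits on a branch cut from a base branch named `main` and that you wrote down the paths the task was allowed to touch as glob patterns. The function names and patterns are illustrative assumptions; only `git diff --name-only` and Python's standard library are used.

```python
from __future__ import annotations

import fnmatch
import subprocess


def changed_files(base: str = "main", repo: str = ".") -> list[str]:
    """List files changed on HEAD since it diverged from `base`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,  # raise if git itself fails, rather than report an empty diff
    ).stdout
    return [line for line in out.splitlines() if line]


def out_of_scope(files: list[str], allowed_patterns: list[str]) -> list[str]:
    """Return the changed files that match none of the allowed glob patterns."""
    return [
        f
        for f in files
        if not any(fnmatch.fnmatchcase(f, p) for p in allowed_patterns)
    ]


if __name__ == "__main__":
    # Hypothetical scope for a task that should only touch billing code and its tests.
    allowed = ["src/billing/*", "tests/billing/*"]
    for path in out_of_scope(changed_files(), allowed):
        print(f"outside the brief: {path}")
```

An empty result does not prove the change is right. It only shows that the run stayed inside the paths you expected, which is enough to rule out one red flag before you read the diff itself.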
What “done” should look like tomorrow morning
A strong unattended coding result should hand back:
working behavior you can verify
changed files that match the brief
checks that actually ran
a short result summary
artifacts you can inspect when needed
explicit open questions or residual risk
a review path that ends in one question: does the implementation hold up?
If you want the concrete review checklist, read How to Review AI Coding Output Before You Merge.
If you want to see the artifact shape first, open Example Review Bundle.
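"Checks that actually ran" is easier to trust when the evidence is recorded by a script rather than asserted in a summary. The sketch below is one hedged way to do that, not how Ralph Workflow records checks: it runs each command, stores the exit code, and keeps the tail of the output. `run_checks` and the record fields are hypothetical names, and the commands you pass in are whatever your repo already uses.

```python
from __future__ import annotations

import subprocess
import sys


def run_checks(commands: list[list[str]], tail_lines: int = 20) -> list[dict]:
    """Run each command and record whether it passed, with a short output tail."""
    results = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        combined = (proc.stdout + proc.stderr).splitlines()
        results.append(
            {
                "command": " ".join(cmd),
                "exit_code": proc.returncode,
                "passed": proc.returncode == 0,
                "output_tail": combined[-tail_lines:],  # enough to inspect, not a full log
            }
        )
    return results


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; replace with your real test,
    # lint, and build commands.
    checks = [
        [sys.executable, "-c", "print('ok')"],
        [sys.executable, "-c", "raise SystemExit(3)"],
    ]
    for result in run_checks(checks):
        status = "PASS" if result["passed"] else "FAIL"
        print(f"{status} ({result['exit_code']}): {result['command']}")
```

A record like this answers "what proof came back?" with something a reviewer can inspect: which commands ran, how each exited, and the last lines each one printed.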
Why this matters for unattended coding
The problem with most “AI coding is done” claims is not that the model failed to write code.
It is that the finish line stayed fuzzy.
For small interactive work, you can often patch that with live supervision. For overnight or unattended work, that does not scale.
You need the end state to be legible without replaying the entire session. That is why Ralph Workflow is built around repo-local execution, verification, and artifacts, not just raw logs.
The honest first evaluation path
If you are deciding whether Ralph Workflow is worth trying, use this sequence:
1. Inspect the primary Codeberg repo first: https://codeberg.org/RalphWorkflow/Ralph-Workflow
2. Run one real bounded task on your own machine.
3. Review the output with this question: does the implementation hold up?
4. Turn the outcome into one public Codeberg action:
   - promising run → star or watch on Codeberg
   - rough run → open a Codeberg issue with the missing proof or friction
Best next public actions:
Inspect / star / watch on Codeberg: https://codeberg.org/RalphWorkflow/Ralph-Workflow
Report first-run friction on Codeberg: https://codeberg.org/RalphWorkflow/Ralph-Workflow/issues/new
Use GitHub only as the mirror: https://github.com/Ralph-Workflow/Ralph-Workflow