Your First Overnight Task with Ralph Workflow: A Start-Here Guide

You have a backlog. Some of those items are real work — not toy tasks, not AI demos, but things you would actually merge if they were done right. The question is not whether an AI coding agent can write code. The question is whether it can do a task well enough that you would trust the result without a forensic audit.

This guide helps you pick the right first task, set it up, and judge the outcome honestly. It assumes you have read about Ralph Workflow and want to try it on something real. If you have not read about it yet, the project lives on Codeberg — open source, vendor-neutral, runs on your machine.

Step 1: Pick the right first task

The most common mistake is starting with a task that is too ambitious or too vague. Either kills trust in the tool, even when the tool itself would have worked.

A good first task fits four criteria:

1. It has a clear boundary. You should be able to describe what "done" means in one sentence. If you need a paragraph to define the boundary, the task is too big.

2. It has a clear correctness check. Something concrete must tell you whether the result is right — tests that pass, a script that runs end to end, a diff shape you can recognize at a glance.

3. It is real but not critical. Pick a backlog item you would actually merge, but not one where a mistake breaks production. The goal is to build trust in the workflow, not to gamble the deployment.

4. It is 2-6 hours of work. Micro-tasks (15 minutes) do not exercise the loop. Multi-day monsters do not let you build a tight feedback cycle. A good first task fits in an evening while you sleep.

Good candidates

Type	Example	Why it works
Refactor	Split an 800-line module into layers	Clear boundary, tests validate behavior
Migration	Move from config files to a database-backed config	Mechanical work, well-scoped
Test coverage	Add unit tests for an untested service class	Verifiable, low risk
Documentation	Generate API docs from docstrings + examples	Helpful, hard to break
Dependency bump	Upgrade a library with breaking changes	Tests catch failures

Bad candidates

Type	Why it fails
"Improve performance"	Too vague — no clear correctness check
"Add feature X" where X requires design decisions	AI will guess instead of asking
Tasks needing access to SaaS dashboards	The agent cannot log into your Stripe console
"Explore and suggest" assignments	No concrete deliverable to judge

The sweet spot: a task where the path is clear but the work is tedious enough that you would prefer not to do it yourself.

Action: Open your backlog right now. Find one item that fits all four criteria. Open it in a tab. You will write the spec next.

Step 2: Write the spec in five minutes

Ralph Workflow uses PROMPT.md in your project root as the task contract. This file is the single most important thing you will write. A weak spec produces a weak run. A tight spec produces a run you can actually review.

The spec is not a prompt. A prompt says "please do this." A spec defines what done means.

The spec template

## Task
[One sentence describing what to do.]

## Scope
- [Concrete boundary 1]
- [Concrete boundary 2]
- [Concrete boundary 3]

## What must not change
- [Behavior that must be preserved]
- [Tests that must still pass]
- [Public interfaces that must stay stable]

## Verification
- [ ] [Test or check 1]
- [ ] [Test or check 2]
- [ ] [Test or check 3]

Here is a real example. The task is to replace raw SQL queries in a data-access layer with an ORM while keeping all existing tests passing.

## Task
Migrate all raw SQL queries in `dal/report_store.py` to SQLAlchemy ORM, using the existing model definitions in `models/report.py`.

## Scope
- Replace raw `execute()` calls with SQLAlchemy session queries
- Keep the same method signatures on `ReportStore`
- Add session-scoped transaction handling

## What must not change
- Every test in `tests/unit/test_report_store.py` must pass without modification
- The `ReportStore` public API surface stays identical
- Return types (list of dicts) stay the same even though the backend changes

## Verification
- [ ] `pytest tests/unit/test_report_store.py -v` passes
- [ ] `pytest tests/integration/ -v` passes
- [ ] `black --check dal/` passes (no formatting drift)
- [ ] `mypy dal/` passes (type coverage holds)

That took about four minutes to write. The key elements: scope boundaries, invariants, and concrete verification steps.

The "what must not change" section is mandatory

This is where most prompts go wrong. They describe what to do but not what to protect. The invariant section is what makes the output reviewable — you can scan the diff and check whether the protected things stayed intact.

Action: Write your spec using the template above. Take five minutes. Keep it under 25 lines. If it is longer, the task is either too large or your boundaries are fuzzy. Save it as PROMPT.md in your project root.

Step 3: Start the run and walk away

pip install ralph-workflow
ralph --pipeline build

That is it. Ralph Workflow takes over from here:

Planning pass. The agent reads your PROMPT.md, your repo structure, and your project conventions. It produces a plan. If the plan is weak — vague steps, missing boundaries — the workflow loops it back rather than charging into implementation.
Implementation. The agent writes code against the plan. Your filesystem carries the state forward. No long chat session accumulating fog.
Verification. Your spec's verification steps run automatically. If tests fail, the workflow retries with the failure context. If they pass, the handoff is clean.
Handoff. You get back commits, logs, and a diff you can read in a few minutes.

The whole thing runs on your machine, uses the agents you already have installed (Claude Code, Codex, whatever is on your PATH), and does not phone home. You can inspect every line on Codeberg — it is AGPL-3.0, vendor-neutral, and local-first.

Step 4: The morning-after review

This is the part that matters. Open your laptop, check the diff, and ask yourself three questions:

1. Would I merge this?

Not "would I merge this after fixing a few things." Would you merge it as-is? If yes, the workflow did its job. If no, ask the next question.

2. What specifically would I need to change?

The answer should be a short list. "The error handling in the new module swallows exceptions silently" is useful feedback. "It did not feel right" is not.

3. Could I have written a better spec to prevent this?

Most bad results are spec failures, not code-generation failures. If the agent did something you did not want, ask whether your PROMPT.md actually said not to do it. The invariant section should have caught it. If it did not, add it for next time.

This loop — run, review, tighten the spec — is the engine that makes unattended coding work. Each run teaches you something about what makes a good spec. After three or four runs, you will develop a sense for what "done" means in a way the agent can actually verify.

Step 5: What to do next

After your first successful run, you have options:

Run it again tonight. Pick the next backlog item. Same workflow, less anxiety.

Tune your verification steps. Your spec's verification section is the highest-leverage part. Spend time on it. Good verification gates make the difference between "I trust this" and "I need to read every line."

Try a larger task. After a few successful runs, you can scope tasks that take 4-8 hours. The key is keeping boundaries tight even as the work gets larger.

Use two agents together. Ralph Workflow can run Claude Code for planning and Codex for implementation, or vice versa. Sometimes one model is stronger at design and the other at execution. The workflow handles the handoff.

Configure checkpoints. For long tasks, Ralph Workflow can checkpoint after verification gates pass. If a later step fails, you resume from the last good checkpoint instead of starting over.

The important thing: the first run builds the muscle. After one good outcome, "run it tonight" becomes a real option instead of a hypothetical.

The honest assessment

Ralph Workflow is not magic. Here is when it works and when it does not.

When it works

Tasks with clear boundaries and existing tests
Mechanical work (refactoring, migration, test expansion)
Tasks where "correct" is checkable before merge
Repos with good project conventions the agent can learn from

When it does not

Greenfield design where requirements are still forming
Tasks requiring access to SaaS dashboards, databases you cannot expose, or accounts you would not share
Work that needs a lot of back-and-forth with humans
Tasks where the spec itself would take longer than just doing the work

If your task falls in the first list, you are in the right place. If it falls in the second, write a tighter spec or pick a different task.

Start here, now

Get Ralph Workflow on Codeberg — it is free, open source (AGPL-3.0), and runs on your machine. The repo includes a getting-started guide, example specs, and full documentation.

A mirror is also available on GitHub.

Pick a backlog task. Write the spec. Run it tonight. Come back to something reviewable.

Quick install: pipx install ralph-workflow

Your First Overnight Task with Ralph Workflow: A Start-Here Guide

Step 1: Pick the right first task

Good candidates

Bad candidates

Step 2: Write the spec in five minutes

The spec template

The "what must not change" section is mandatory

Step 3: Start the run and walk away

Step 4: The morning-after review

Step 5: What to do next

The honest assessment

When it works

When it does not

Start here, now

Related Posts

Good vs Bad Unattended AI Coding Tasks: How to Know Before You Start

When Your Overnight AI Coding Run Fails: A Troubleshooting Guide

Overnight Refactoring with Ralph Workflow: A Walkthrough

Step 1: Pick the right first task

Good candidates

Bad candidates

Step 2: Write the spec in five minutes

The spec template

The "what must not change" section is mandatory

Step 3: Start the run and walk away

Step 4: The morning-after review

Step 5: What to do next

The honest assessment

When it works

When it does not

Start here, now

Related Posts

Related posts

Good vs Bad Unattended AI Coding Tasks: How to Know Before You Start

When Your Overnight AI Coding Run Fails: A Troubleshooting Guide

Overnight Refactoring with Ralph Workflow: A Walkthrough