The verification model
Mental model page. This is explanation, not a how-to. For the practical verification contract, see
docs/agents/verification.md and Advanced pipeline configuration.
Ralph Workflow treats verification as a non-bypassable gate that runs after every code-affecting phase and at the terminal of every run. The verification model is the same regardless of which agent produced the change, which phase emitted the artifact, or which policy bundle drove the run.
What make verify proves
make verify is the canonical verification command. It runs four kinds
of checks:
Lint: ruff check ralph/ tests/
Typecheck: mypy ralph/
Test: pytest under the immutable 60-second combined budget
Audit: the ralph.testing.audit_* scripts that detect circumvention of the policy and quality gates
A clean make verify proves:
The Python code is lint-clean
The Python code is type-clean
The unit and integration tests pass within the budget
No audit invariant has been silently weakened
A green make verify is a necessary precondition for declaring work
done, but it is not sufficient: the runtime also verifies the run
artifact against the phase’s declared contract (see
Artifact lifecycle).
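The "necessary but not sufficient" relationship can be sketched as a two-part gate. This is a minimal illustration, not the runtime's actual code: StepResult, verify_is_green, and work_is_done are hypothetical names, and the artifact check is reduced to a boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepResult:
    """Outcome of one verification step (hypothetical type)."""
    label: str
    passed: bool


def verify_is_green(results):
    """Green only if at least one step ran and every step passed."""
    return bool(results) and all(r.passed for r in results)


def work_is_done(results, artifact_matches_contract):
    """A green verify is required, and so is the artifact contract check."""
    return verify_is_green(results) and artifact_matches_contract


results = [
    StepResult("lint", True),
    StepResult("typecheck", True),
    StepResult("test", True),
    StepResult("audit", True),
]
print(work_is_done(results, artifact_matches_contract=False))  # False
```

A green result with a failing artifact check still leaves the work not done, which is the point of the paragraph above.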
The 60-second combined test budget: immutable
The test budget is 60 seconds, combined, ABSOLUTE and IMMUTABLE. This
is enforced by ralph/verify.py:_TOTAL_TEST_BUDGET_SECONDS = 60.0 and
tracked cumulatively across all _BUDGET_TRACKED_STEPS via
time.monotonic().
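Cumulative tracking of this kind can be sketched as follows. This is an illustration under assumptions, not the contents of ralph/verify.py: CumulativeBudget and BudgetExceeded are hypothetical names, and the injectable clock exists only so the example can be exercised without waiting.

```python
import time

TOTAL_TEST_BUDGET_SECONDS = 60.0  # mirrors the documented budget


class BudgetExceeded(RuntimeError):
    """Raised when cumulative tracked time exceeds the budget."""


class CumulativeBudget:
    """Hypothetical tracker: one shared total across every tracked step."""

    def __init__(self, budget_seconds, clock=time.monotonic):
        self._budget = budget_seconds
        self._clock = clock
        self._spent = 0.0

    @property
    def remaining(self):
        return max(0.0, self._budget - self._spent)

    def run(self, label, step):
        """Run one step and charge its wall time to the shared total."""
        start = self._clock()
        try:
            result = step()
        finally:
            # Time is charged even when the step itself raises.
            self._spent += self._clock() - start
        if self._spent > self._budget:
            raise BudgetExceeded(
                f"{label}: {self._spent:.1f}s spent of {self._budget:.1f}s"
            )
        return result
```

Because the total is shared, splitting a slow suite into two smaller ones changes nothing: both halves are charged to the same counter.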
The budget cannot be circumvented by:
Splitting tests into more suites (cumulative tracker sums time across ALL budget-tracked steps)
Moving slow tests to a different suite, target, or Makefile recipe
Renaming test targets without updating _KNOWN_TEST_STEP_LABELS
Raising DEFAULT_SUITE_TIMEOUT_SECONDS or PYTEST_SUITE_TIMEOUT_SECONDS
Setting RALPH_PYTEST_SUITE_TIMEOUT_SECONDS or RALPH_PYTEST_TEST_TIMEOUT_SECONDS
Raising _TOTAL_TEST_BUDGET_SECONDS (blocked by import-time RuntimeError checks, which are immune to python -O)
Emptying _KNOWN_TEST_STEP_LABELS to hide test steps
Emptying _BUDGET_TRACKED_STEPS to disable enforcement
Removing 'make test' from _KNOWN_TEST_STEP_LABELS
Each RuntimeError is enforced via if/raise, not assert, so it
survives python -O. A timeout failure is a test design defect — fix
the test, not the budget.
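The difference between if/raise and assert matters because Python removes assert statements when run with -O. A minimal sketch of an import-time guard in that style is shown below. The function name check_budget_invariants and the contents of _BUDGET_TRACKED_STEPS are assumptions made for illustration; only the constant names and the 'make test' label come from the text above.

```python
_REQUIRED_BUDGET_SECONDS = 60.0
_REQUIRED_LABEL = "make test"


def check_budget_invariants(budget_seconds, tracked_steps, known_labels):
    """Guard written with if/raise so that python -O cannot strip it."""
    if budget_seconds > _REQUIRED_BUDGET_SECONDS:
        raise RuntimeError("test budget was raised above 60 seconds")
    if not tracked_steps:
        raise RuntimeError("budget-tracked steps must not be empty")
    if not known_labels:
        raise RuntimeError("known test step labels must not be empty")
    if _REQUIRED_LABEL not in known_labels:
        raise RuntimeError("'make test' must stay in the known labels")


# Hypothetical module-level values; the call below runs at import time.
_TOTAL_TEST_BUDGET_SECONDS = 60.0
_BUDGET_TRACKED_STEPS = ("make test",)
_KNOWN_TEST_STEP_LABELS = frozenset({"make test"})

check_budget_invariants(
    _TOTAL_TEST_BUDGET_SECONDS,
    _BUDGET_TRACKED_STEPS,
    _KNOWN_TEST_STEP_LABELS,
)
```

Had the same checks been written as assert statements, running under python -O would skip them silently and the edited constants would take effect.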
The audit invariant set
Ralph Workflow ships with 14 audit scripts in ralph/testing/audit_*.py.
Each one detects a class of circumvention:
| Audit | Detects |
|---|---|
| | Lint rule weakening via per-file-ignores or blanket noqa |
| | Mypy rule weakening via |
| | Real I/O or |
| | Unbounded blocking calls in |
| | Unbounded accumulators (deque without maxlen, unbounded lists) |
| | Artifact writes not via canonical path |
| | Dormant parallel mode invariant violations |
| … and 7 more | See |
Each audit has a documented allowlist. Adding an entry to an allowlist is the only way to weaken a check, and the entry must cite a real justification.
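To make the audit-plus-allowlist pattern concrete, here is a minimal sketch of a check for deque calls without maxlen. It is not one of the shipped ralph.testing.audit_* scripts: find_unbounded_deques and the allowlist shape (a mapping from file and line to a justification string) are assumptions for illustration.

```python
import ast


def find_unbounded_deques(source, filename, allowlist):
    """Return (filename, line) for each deque(...) call lacking maxlen.

    allowlist maps (filename, line) to a justification string; an entry
    with an empty or whitespace-only justification does not count.
    """
    violations = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both deque(...) and collections.deque(...).
        if isinstance(func, ast.Name):
            name = func.id
        else:
            name = getattr(func, "attr", None)
        if name != "deque":
            continue
        # maxlen is the second positional argument or a keyword.
        has_maxlen = len(node.args) >= 2 or any(
            kw.arg == "maxlen" for kw in node.keywords
        )
        if has_maxlen:
            continue
        justification = allowlist.get((filename, node.lineno), "").strip()
        if not justification:
            violations.append((filename, node.lineno))
    return sorted(violations)
```

The allowlist lookup is the only escape hatch in this sketch, and a blank justification is treated the same as a missing entry.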
Per-step timeouts
The runtime enforces per-step timeouts (_VERIFY_STEP_TIMEOUT_SECONDS)
in addition to the combined budget. The per-step timeout is a secondary
cap — it cannot extend the combined budget, only fail fast on a stuck
step. The default per-step timeout is at least 5.0 seconds.
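One way to read "secondary cap" is that the timeout applied to a step is the smaller of the per-step limit and whatever remains of the combined budget. The sketch below assumes that interpretation; effective_step_timeout is a hypothetical helper, not a function from the runtime.

```python
def effective_step_timeout(per_step_timeout, budget_remaining):
    """Pick the timeout for one budget-tracked step.

    The per-step limit can only shorten the wait; it never grants more
    time than the combined budget still has left.
    """
    return min(per_step_timeout, budget_remaining)


# A generous per-step limit is clipped by the remaining budget.
print(effective_step_timeout(30.0, 12.5))  # 12.5
# A tight per-step limit fails a stuck step early.
print(effective_step_timeout(5.0, 40.0))   # 5.0
```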
Non-circumvention rules
The verification model has explicit non-circumvention rules. The full set
lives in AGENTS.md; the highlights:
Lint, typecheck, and test checks cannot be weakened to get green
The MCP timeout contract cannot be bypassed without an inline marker and a documented reason
Resource accumulators must carry a size cap or a justified marker
The test budget cannot be circumvented by splitting or renaming
Artifact submissions must go through the canonical path
The fabrication guard cannot be weakened or skipped
These are policies, not suggestions. Each one is enforced by an audit or a runtime check, and each bypass requires an entry in a documented allowlist.
Why the verification model is strict
The verification model exists because the project has shipped bugs, stale claims, and fabricated stats. The strict checks are the response: they make the failure mode loud rather than silent. Every rule that seems excessive is the scar tissue of a real failure that happened.
The 60-second budget, in particular, exists because slow tests create feedback loops that erode developer trust in the test suite. A test that takes 30 seconds to run is a test that gets skipped in the inner loop. The budget forces the test design to be fast by construction, which forces production code to be testable by construction.
What to read next
docs/agents/verification.md: the contributor-side verification contract
AGENTS.md: the full non-circumvention policy
Advanced pipeline configuration: per-phase verification overrides
Watchdogs and timeouts: the runtime watchdog contract