Verification Before Completion
Core Rule: NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.
Step 1: Identify the Proof Command
Before claiming any work is complete, determine the exact command that proves the claim:
| Claim type | Required proof command |
|---|---|
| “Tests pass” | Fresh test runner invocation (e.g., pytest, npm test) |
| “Bug fixed” | Test that reproduces the bug, run after the fix |
| “Deployed” | Health check or status endpoint (e.g., curl https://api/health) |
| “Configured” | Command that reads back the configured value |
| “Build succeeds” | Full build command (e.g., tsc --noEmit, npm run build) |
| “Complete” | All acceptance criteria individually verified |
If no proof command can be identified, do not proceed to a completion claim. Ask the user how to verify.
Step 2: Scan Own Response for Unverified Language
Before finalizing any response that implies completion, check for the following phrases. If any are present, they must be replaced with verified evidence or removed:
| Flagged phrase | Problem |
|---|---|
| “should work now” | Conditional, not verified |
| “that should fix it” | Assumption, not proof |
| “probably passes” | Uncertainty — not verified |
| “looks good” | Visual inspection is not execution |
| “Done!” (before testing) | Premature |
| “I believe this works” | Belief is not evidence |
If any flagged phrase is found: do not publish the response. Execute Step 3 first.
Rationalization Prevention
If you catch yourself thinking any of these, STOP and run verification instead:
| Excuse | Reality |
|---|---|
| “Should work now” | RUN the verification |
| “I’m confident” | Confidence ≠ evidence |
| “Just this once” | No exceptions |
| “Linter passed” | Linter ≠ compiler ≠ tests |
| “Agent said success” | Verify independently — agent reports are claims, not proof |
| “Partial check is enough” | Partial proves nothing about the whole |
| “Different words so rule doesn’t apply” | Spirit over letter — any implication of success counts |
| “I’ll verify after I commit” | Verification gates the commit, not the other way around |
Red Flags — Immediate STOP
- Using “should”, “probably”, “seems to” in a completion claim
- Expressing satisfaction before verification (“Great!”, “Perfect!”, “All done!”)
- About to commit/push/PR without fresh verification output
- Trusting a subagent’s success report without checking the diff/output yourself
- Thinking “just this once” or “it’s a trivial change”
- Relying on a previous run’s output instead of a fresh execution
If any red flag is present: do not publish the response. Execute Step 3 first.
Step 3: Execute the Proof Command
- Run the proof command identified in Step 1.
- Use a fresh run — do not rely on cached output, previous runs, or agent-reported results.
- Capture the full output including exit code.
If execution is blocked (no shell access, command unavailable): state this explicitly. Do not substitute a visual inspection or assumption.
Step 4: Read and Interpret the Output
- Read the complete output, not just the final line.
- Check the exit code: non-zero means failure even if partial output looks successful.
- Check whether the output matches the specific claim being made:
- One test passing does not confirm all tests pass.
- Linter passing does not confirm the build succeeds.
- Agent output saying “success” does not confirm actual success.
- If the output confirms the claim: proceed to Step 5.
- If the output does not confirm the claim: proceed to Step 6.
Step 5: State the Result with Evidence
Report the outcome using this structure:
[Action taken]. Verifying:
$ [command]
[actual output]
[Conclusion with specific evidence, e.g., "All 5 tests pass."]
Do not summarize the output — include it directly. Do not editorialize. Let the output speak.
Step 6: Handle Verification Failure
If verification fails at Step 4:
- Do not claim completion.
- Report what actually happened: quote the relevant output lines.
- Investigate the failure before making any further changes.
- Apply a fix.
- Return to Step 3. Repeat until verification passes.
- Only after Step 4 confirms success: proceed to Step 5.
Step 7: TDD Special Case
For test-driven development, verification requires the full red-green cycle. Apply this extended gate:
- Write the test. Run it. Confirm it fails (red). If it passes without implementation, the test is invalid — rewrite it.
- Implement the feature. Run the test. Confirm it passes (green).
- Refactor if needed. Run the test again. Confirm it still passes.
A test that never failed proves nothing. Skip the red phase at the cost of proof validity.
Adapted from obra/superpowers verification-before-completion skill