ultragoal

Durable repo-native multi-goal plans with embedded success criteria and an evidence audit trail.

The ultragoal skill is LazyCodex (LZX) acting as a goal orchestration agent for multi-goal work that survives across turns and sessions. Every goal carries its own success criteria, every criterion is proven by observable evidence from a real-usage scenario that was actually run, and every pass, fail, block, steering change, and checkpoint is audited in an append-only ledger. It ships in the OmO (oh-my-openagent) plugin that LazyCodex installs into Codex.

A green test suite is supporting evidence, never proof of completion: tests alone do not show that a goal is done.

Durable state

State lives in the repo, never in the model's head. The skill reads these artifacts before resuming, steering, or checkpointing, and never invents state beyond what they and `omo ultragoal status --json` report.

| Artifact | Purpose |
| --- | --- |
| `.omo/ultragoal/brief.md` | The original brief and durable constraints. |
| `.omo/ultragoal/goals.json` | Goals, each with embedded `successCriteria`. |
| `.omo/ultragoal/ledger.jsonl` | Append-only audit trail of every result and checkpoint. |

All writes go through the `omo ultragoal` CLI; state files are never hand-edited.
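Because the ledger is append-only, the current state can be derived by replaying it from the top. The sketch below is read-only and illustrative: the entry fields (`goalId`, `criterionId`, `status`) and the "last entry wins" rule are assumptions, not the documented ledger schema.

```python
import json

# Hypothetical ledger lines; the real ledger.jsonl schema may differ.
SAMPLE_LEDGER = """\
{"goalId": "g1", "criterionId": "c1", "status": "fail"}
{"goalId": "g1", "criterionId": "c2", "status": "pass"}
{"goalId": "g1", "criterionId": "c1", "status": "pass"}
"""


def latest_status(ledger_text):
    """Replay JSONL entries in order; the last entry per criterion wins."""
    state = {}
    for line in ledger_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        entry = json.loads(line)
        state[(entry["goalId"], entry["criterionId"])] = entry["status"]
    return state


print(latest_status(SAMPLE_LEDGER))
```

Earlier entries are never rewritten, so the failed first attempt on `c1` stays in the trail even though the replayed state shows it passing.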

Manual-QA channels

Each criterion is exercised through exactly one channel, run by the agent itself before recording a PASS. A `--dry-run`, a printed command, "should respond", and "looks correct" never count as evidence.

| Channel | How |
| --- | --- |
| HTTP call | Hit the live endpoint with `curl -i` (or Playwright APIRequestContext); capture status line + headers + body. |
| tmux | `tmux new-session`, drive with `send-keys`, dump with `capture-pane`; the transcript is the artifact. |
| Browser use | Drive the real page via Playwright / Puppeteer / Chromium; capture the action log + screenshot. |
| Computer use | OS-level GUI automation against the running app; capture the action log + screenshot. |

Pure CLI stdout, a DB state diff, or a parsed config dump can satisfy CLI- or data-shaped criteria, but none of them replaces a channel scenario for user-facing behavior.
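To make the HTTP channel concrete, here is a minimal sketch of capturing the status line, headers, and body from a request that is really sent. It uses Python's standard library in place of `curl -i`, and the `_Health` handler and `/health` path are hypothetical stand-ins for the app under test.

```python
import http.client
import http.server
import threading


class _Health(http.server.BaseHTTPRequestHandler):
    """Stand-in for the app under test; always answers 200 with a JSON body."""

    def do_GET(self):
        body = b'{"ok": true}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the evidence free of server log noise


def capture_http_evidence(host, port, path):
    """Issue a real GET and return status line, headers, and body as text."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        status_line = f"HTTP/{resp.version / 10:.1f} {resp.status} {resp.reason}"
        headers = "\n".join(f"{k}: {v}" for k, v in resp.getheaders())
        body = resp.read().decode("utf-8")
    finally:
        conn.close()
    return f"{status_line}\n{headers}\n\n{body}"


server = http.server.HTTPServer(("127.0.0.1", 0), _Health)  # port 0 = any free port
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
try:
    evidence = capture_http_evidence("127.0.0.1", server.server_address[1], "/health")
finally:
    server.shutdown()      # stop the serve_forever loop
    server.server_close()  # release the bound port
    thread.join()
print(evidence)
```

The `finally` block tears the server down whether or not the request succeeds, so the scenario leaves no live process or bound port behind.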

How it runs

Bootstrap

Create goals from the brief with `omo ultragoal create-goals` (`--brief`, `--brief-file`, or `--from-stdin`). Refine each goal so it carries at least three `successCriteria` covering happy path, edge, regression, and adversarial risk. Each criterion has an `id`, a `scenario`, `expectedEvidence`, adversarial classes, a stop condition, and the named Manual-QA channel. Then inspect state with `omo ultragoal status --json`.
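The shape requirements above can be checked mechanically before execution starts. This is a sketch under stated assumptions: `id`, `scenario`, `expectedEvidence`, and `successCriteria` come from the text, while the key names `adversarialClasses`, `stopCondition`, and `channel`, and the channel identifiers, are guesses at how `goals.json` might spell them.

```python
# Key names after expectedEvidence are assumed, not taken from a documented schema.
REQUIRED_FIELDS = ("id", "scenario", "expectedEvidence",
                   "adversarialClasses", "stopCondition", "channel")
CHANNELS = {"http", "tmux", "browser", "computer"}  # assumed identifiers


def criteria_problems(goal):
    """Return human-readable problems; an empty list means the goal is well-formed."""
    problems = []
    criteria = goal.get("successCriteria", [])
    if len(criteria) < 3:
        problems.append(f"need at least 3 criteria, found {len(criteria)}")
    for index, criterion in enumerate(criteria):
        label = criterion.get("id", f"#{index}")
        for field in REQUIRED_FIELDS:
            if not criterion.get(field):
                problems.append(f"{label}: missing {field}")
        channel = criterion.get("channel")
        if channel and channel not in CHANNELS:
            problems.append(f"{label}: unknown channel {channel!r}")
    return problems


goal = {
    "successCriteria": [
        {"id": "c1", "scenario": "login with valid credentials",
         "expectedEvidence": "HTTP 200 with session cookie",
         "adversarialClasses": ["expired token"],
         "stopCondition": "3 identical failures", "channel": "http"},
        {"id": "c2", "scenario": "login with empty password", "channel": "fax"},
    ]
}
problems = criteria_problems(goal)
for problem in problems:
    print(problem)
```

The check covers count and field presence only. Whether the criteria really span happy path, edge, regression, and adversarial risk still needs a reviewer's judgment.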

Execution loop

Run the loop once per goal, capped at 5 cycles per goal and 3 identical failures on the same criterion. Acquire the next goal via `omo ultragoal complete-goals --json`. For each criterion: plan from the scenario and prior ledger entries, register atomic todos, execute one bounded change, then actually run the named channel scenario and capture the observable artifact.
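The two caps can be sketched as a small driver. It is illustrative only and rests on three assumptions: a "cycle" is one sweep over the criteria that have not yet passed, "identical" means consecutive failures with the same signature, and any non-pass result counts as a failure. The `attempt` callback is a hypothetical stand-in for planning, changing, and running the channel scenario.

```python
MAX_CYCLES = 5          # cap per goal
MAX_SAME_FAILURES = 3   # cap on identical failures for one criterion


def run_goal(criteria_ids, attempt):
    """Drive one goal; attempt(criterion_id, cycle) returns (status, signature)."""
    streaks = {}  # criterion id -> (last failure signature, consecutive count)
    passed = set()
    for cycle in range(1, MAX_CYCLES + 1):
        for cid in criteria_ids:
            if cid in passed:
                continue
            status, signature = attempt(cid, cycle)
            if status == "pass":
                passed.add(cid)
                streaks.pop(cid, None)
                continue
            last, count = streaks.get(cid, (None, 0))
            count = count + 1 if signature == last else 1
            streaks[cid] = (signature, count)
            if count >= MAX_SAME_FAILURES:
                return "failed", f"{cid} failed {count}x with {signature!r}"
        if passed == set(criteria_ids):
            return "complete", f"all criteria passed in cycle {cycle}"
    return "failed", f"{MAX_CYCLES} cycles without all-pass"


def flaky(cid, cycle):
    """c1 passes at once; c2 times out twice, then passes."""
    if cid == "c2" and cycle < 3:
        return "fail", "timeout"
    return "pass", None


print(run_goal(["c1", "c2"], flaky))
```

A failure with a new signature resets the streak, so only a repeated failure trips the 3-failure cap, while the 5-cycle cap catches a goal that keeps failing in different ways.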

Clean, then record

Tear down every runtime artifact the scenario spawned (server PIDs, tmux sessions, browser contexts, containers, bound ports, temp files) before recording. Embed a one-line cleanup receipt in the evidence string. Record exactly one result with `omo ultragoal record-evidence --status pass|fail|blocked`. A missing cleanup receipt means BLOCKED, not PASS.
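The receipt rule amounts to a gate in front of the recorded status. In this minimal sketch the helper names and the `cleanup:` evidence format are hypothetical, and treating a missing receipt as blocked even when the scenario itself failed is an assumption; the text only rules out recording PASS.

```python
def status_to_record(scenario_passed, cleanup_receipt):
    """Pick the single status to record for one criterion run."""
    if not cleanup_receipt or not cleanup_receipt.strip():
        # Missing receipt: BLOCKED, never PASS (applied to failures too, by assumption).
        return "blocked"
    return "pass" if scenario_passed else "fail"


def evidence_string(artifact, cleanup_receipt):
    """Embed the one-line receipt in the evidence text (format is an assumption)."""
    return f"{artifact}\ncleanup: {cleanup_receipt.strip()}"


receipt = "killed server pid, port released, temp dir removed"
print(status_to_record(True, receipt))
print(status_to_record(True, ""))
print(evidence_string("HTTP/1.1 200 OK", receipt))
```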

Goal completion

Confirm with `omo ultragoal criteria` that every criterion is `pass`, then run `omo ultragoal checkpoint --status complete` with the evidence summary and a fresh goal snapshot. Blocked or failed goals checkpoint with `--status blocked` or `--status failed` plus diagnosis evidence.

Final quality gate

On the last remaining goal with all criteria passing: run targeted verification, run `ai-slop-cleaner` on changed files, rerun verification, then run `$code-review`. Clean means `recommendation == "APPROVE"` and `architectStatus == "CLEAR"`. A clean review checkpoints final completion with `--quality-gate-json`; a non-clean review records review blockers instead.
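The clean-review test can be written as a two-field check. The field names and clean values come from the text above; that they sit at the top level of a JSON object, and the non-clean value `"WATCH"`, are assumptions made for the sketch.

```python
def is_clean_review(review):
    """True only when both gate fields carry the clean values."""
    return (review.get("recommendation") == "APPROVE"
            and review.get("architectStatus") == "CLEAR")


def gate_outcome(review):
    """Map a review result to the next action described in this section."""
    if is_clean_review(review):
        return "checkpoint final completion"
    return "record review blockers"


print(gate_outcome({"recommendation": "APPROVE", "architectStatus": "CLEAR"}))
print(gate_outcome({"recommendation": "APPROVE", "architectStatus": "WATCH"}))
```

Both fields must match: an approving recommendation with any other architect status is still a non-clean review.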

Dynamic steering

Steering is only for structured, evidence-backed mutation; natural-language steering requests are rejected. Each kind has required fields and runs through `omo ultragoal steer --kind <kind> ... --evidence "<...>" --rationale "<...>" --json`. Kinds include `add_subgoal`, `split_subgoal`, `reorder_pending`, `revise_pending_wording`, `revise_criterion`, `annotate_ledger`, and `mark_blocked_superseded`.
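A thin wrapper can enforce the "structured and evidence-backed" rule before the command is ever built. This is a sketch, not the CLI's own validation: the kind list may not be exhaustive, the per-kind required fields are not specified here so they pass through an opaque `extra` tuple, and the example values are invented.

```python
import shlex

# Kinds named in this section; the real CLI may accept more.
KNOWN_KINDS = {
    "add_subgoal", "split_subgoal", "reorder_pending", "revise_pending_wording",
    "revise_criterion", "annotate_ledger", "mark_blocked_superseded",
}


def steer_argv(kind, evidence, rationale, extra=()):
    """Build the steer command line, refusing unstructured or unbacked requests."""
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unknown steering kind: {kind!r}")
    if not evidence.strip() or not rationale.strip():
        raise ValueError("steering requires both evidence and a rationale")
    return ["omo", "ultragoal", "steer", "--kind", kind, *extra,
            "--evidence", evidence, "--rationale", rationale, "--json"]


argv = steer_argv("annotate_ledger",
                  evidence="capture-pane transcript shows a stale prompt",
                  rationale="note flaky terminal state for later runs")
print(shlex.join(argv))
```

Returning an argument list rather than a shell string leaves quoting to the caller, which matters when the evidence text contains spaces or quotes.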

Stop rules

All goals complete + all criteria pass + a clean final quality gate: DONE.
The same criterion failing 3 times, or 5 cycles on one goal without all-pass: checkpoint the goal as failed and surface the diagnosis.
A safety boundary (destructive command, secret exfiltration, production write): block and surface a safe substitute.
Leftover QA state (a live process, tmux session, browser context, bound port, temp dir): not a PASS — clean up, append the receipt, then continue.
The user issues /cancel: release in-progress state cleanly and do not auto-resume.