Multi-model routing
Automatic model selection per task category — quota discipline, benchmark-driven, not random churn.
A LazyCodex (LZX) run does not spend your best model on every subtask. The underlying OmO harness defines task categories and fallback chains so each unit of work lands on the most appropriate GPT model. This is benchmark-driven routing, not random model churn.
Why different GPT models appear
Do not be surprised if a run shows models like gpt-5.2 at xhigh reasoning effort, gpt-5.4-mini, gpt-5.3-codex, or newer equivalents like gpt-5.5 at xhigh. That is intentional.
The harness picks the model that fits the job:
| Category | Routes to | For |
|---|---|---|
| quick | gpt-5.4-mini | Small edits |
| ultrabrain | a high-reasoning GPT model | Hard logic |
| agentic coding | a Codex-tuned GPT model (e.g. gpt-5.3-codex) | Software-engineering paths, when available |
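The category-plus-fallback idea can be sketched in a few lines of Python. This is an illustration only, not the OmO harness's actual configuration or API: the `FALLBACK_CHAINS` mapping, the `pick_model` function, the `agentic-coding` key, and the specific fallback order are all assumptions made for the example.

```python
from typing import Dict, List, Set

# Hypothetical fallback chains, ordered from preferred to last resort.
# The concrete models and their order are assumptions, not OmO's real config.
FALLBACK_CHAINS: Dict[str, List[str]] = {
    "quick": ["gpt-5.4-mini"],
    "ultrabrain": ["gpt-5.5", "gpt-5.2"],
    "agentic-coding": ["gpt-5.3-codex", "gpt-5.2"],
}


def pick_model(category: str, available: Set[str]) -> str:
    """Return the first model in the category's chain that is available."""
    chain = FALLBACK_CHAINS.get(category)
    if chain is None:
        raise KeyError(f"unknown task category: {category}")
    for model in chain:
        if model in available:
            return model
    raise LookupError(f"no available model for category: {category}")


if __name__ == "__main__":
    # The Codex-tuned model is missing here, so the chain falls back.
    print(pick_model("agentic-coding", {"gpt-5.2", "gpt-5.4-mini"}))
```

An ordered list per category keeps the "when available" behavior explicit: the preferred model is tried first, and the next entry is used only if it is missing.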
Quota discipline
The point is quota discipline: use the strongest model when the task needs deep reasoning, use a cheaper/faster model when that is enough, and keep parallel agent work efficient instead of burning premium quota on routine steps.
This pairs with parallel execution: many lanes run at once, each at the right cost.
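As a rough sketch of how per-lane routing keeps premium usage visible, the snippet below runs several stand-in lanes concurrently and tallies how many landed on each model. Everything here is hypothetical: `MODEL_FOR`, `run_lane`, and `run_parallel` are illustrative names, the category-to-model mapping is an assumption, and no real model call is made.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical category-to-model map; the real harness may route differently.
MODEL_FOR: Dict[str, str] = {
    "quick": "gpt-5.4-mini",
    "ultrabrain": "gpt-5.2",
    "agentic-coding": "gpt-5.3-codex",
}

Task = Tuple[str, str]  # (task name, task category)


def run_lane(task: Task) -> Tuple[str, str]:
    """Stand-in for one agent lane: reports which model the task would use."""
    name, category = task
    return name, MODEL_FOR[category]


def run_parallel(tasks: List[Task], max_lanes: int = 4):
    """Run lanes concurrently and count how many tasks hit each model."""
    with ThreadPoolExecutor(max_workers=max_lanes) as pool:
        results = list(pool.map(run_lane, tasks))  # map preserves input order
    usage = Counter(model for _, model in results)
    return results, usage


if __name__ == "__main__":
    tasks = [
        ("rename-variable", "quick"),
        ("fix-typo", "quick"),
        ("design-scheduler", "ultrabrain"),
        ("implement-endpoint", "agentic-coding"),
    ]
    _, usage = run_parallel(tasks)
    for model, count in usage.items():
        print(f"{model}: {count} task(s)")
```

In this toy run only one of the four lanes reaches the high-reasoning model, which is the quota-discipline effect in miniature.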
Benchmark-driven, model by model
The routing reflects what OpenAI documents about each model:
- GPT-5.2 is documented by OpenAI as stronger at code review, bug finding, and complex tool use; the announcement notes that its maximum API reasoning effort uses xhigh.
- GPT-5.3-Codex is OpenAI's Codex-tuned model for agentic software engineering, with public coding-agent benchmarks such as SWE-Bench Pro, Terminal-Bench 2.0, and OSWorld Verified reported in its announcement.
- GPT-5.4 mini is positioned for efficient everyday coding, computer use, and subagents — which is why lightweight tasks can land there instead of spending a frontier reasoning model.