Evening Retrospective — 2026-05-20
Summary
A strong day for self-improvement: three bugs fixed (router budget reset, parser normalization gap, stale model_map entries), one long-standing auth issue closed (#3110, open 9+ days), and one new feature shipped (prompt-file task bodies). Throughput remained healthy at ~175 successes vs ~12 failures in 24h. The service lags the latest release (0.73.0 vs 0.73.3), so today's fixes are not yet deployed in production.
What Was Accomplished
| PR | Description | Closes |
|---|---|---|
| #3170 | bug(router): LLM routing budget fallback recurs throughout day, degrading to round-robin | #3167 |
| #3171 | bug(parser): skipped status alias not normalized, causing avoidable run failures | #3168 |
| #3172 | fix(router): prune unavailable opencode models from model_map at config load time | #3169 |
| #3166 | feat(jobs): allow task body to be loaded from a prompt file | — |
Additionally, issue #3110 (Claude auth 401 Invalid authentication credentials, blocked 9+ days) was closed today — the most significant open blocker from the morning review.
Task Run Outcomes (Last 24h)
| Agent | Model | Outcome | Count |
|---|---|---|---|
| claude | sonnet | success | 47 |
| kimi | opus | success | 37 |
| codex | gpt-5.3-codex | success | 29 |
| opencode | github-copilot/gpt-5-mini | success | 18 |
| opencode | github-copilot/claude-sonnet-4.6 | success | 13 |
| opencode | opencode/qwen3.6-plus-free | success | 5 |
| kimi | opus | failed | 4 |
| claude | sonnet | failed | 3 |
| codex | gpt-5.3-codex | failed | 3 |
| glm | opus | rate_limit | 2 |
| minimax | opus | rate_limit | 2 |
| opencode | github-copilot/gpt-5.3 | failed | 1 |
| kimi | opus | aborted | 1 |
| codex | gpt-5.3-codex | blocked | 1 |
| opencode | github-copilot/gpt-5-mini | blocked | 1 |
Overall: ~175 successes vs ~12 failures and 4 rate limits, which is healthy and consistent with recent days.
What Failed, Retried, Or Needed Intervention
1) Routing budget reset bug — fixed today
The LLM routing budget (llm_budget_secs) was being reset with each task routing decision, causing budget exhaustion to re-trigger throughout the day and effectively degrading the router to round-robin for all subsequent tasks. PR #3170 fixes the reset semantics. Root cause was subtle: the budget counter reset on each routing attempt rather than persisting across the tick. This explains routing quality fluctuations observed in prior days.
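The intended semantics (a budget that persists across routing attempts within a tick and is reset only at the tick boundary) can be sketched in Python as follows. This is an illustration, not the router's code: the RoutingBudget class, its methods, and the per-task costs are hypothetical, and only the name llm_budget_secs comes from this report.

```python
from dataclasses import dataclass


@dataclass
class RoutingBudget:
    """Hypothetical per-tick LLM routing budget (not the router's real type)."""

    llm_budget_secs: float   # budget available in each tick
    spent_secs: float = 0.0  # time consumed so far in the current tick

    def start_tick(self) -> None:
        # The only place the counter is reset: once per tick.
        self.spent_secs = 0.0

    def try_spend(self, cost_secs: float) -> bool:
        """Return True if LLM routing fits the budget, False to fall back."""
        if self.spent_secs + cost_secs > self.llm_budget_secs:
            return False
        self.spent_secs += cost_secs
        return True


def route_tick(budget: RoutingBudget, task_costs: list[float]) -> list[str]:
    """Route every task in one tick, drawing from a single shared budget."""
    budget.start_tick()
    return ["llm" if budget.try_spend(c) else "round_robin" for c in task_costs]


budget = RoutingBudget(llm_budget_secs=10.0)
first_tick = route_tick(budget, [4.0, 4.0, 4.0])  # third task exceeds the budget
second_tick = route_tick(budget, [4.0])           # new tick, budget available again
```

In this sketch the fallback to round-robin is confined to the tick in which the budget ran out, and the next tick starts with a full budget.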
2) skipped status alias gap — fixed today
The parser had a normalization map for most status aliases (complete, ready_for_review, etc.) but skipped was missing, causing runner failures when an agent used that value. PR #3171 adds skipped → done normalization. This was one of the 8 "unrecognized status" failures tracked in the known issue from yesterday.
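A minimal Python sketch of alias normalization along these lines is shown below. Only the skipped → done mapping is taken from this report; the targets for the other aliases, the set of canonical statuses, and the function name are assumptions made for illustration.

```python
# Alias targets are assumptions, except skipped -> done, which is the
# mapping this report attributes to PR #3171.
STATUS_ALIASES = {
    "complete": "done",               # assumed target
    "ready_for_review": "in_review",  # assumed target
    "skipped": "done",                # from the report
}
CANONICAL_STATUSES = {"done", "in_review", "blocked"}  # assumed set


def normalize_status(raw: str) -> str:
    """Map an agent-reported status onto a canonical one, or raise."""
    key = raw.strip().lower()
    status = STATUS_ALIASES.get(key, key)
    if status not in CANONICAL_STATUSES:
        raise ValueError(f"unrecognized status: {raw!r}")
    return status
```

With a map like this, an alias missing from STATUS_ALIASES falls through to the "unrecognized status" error, which is the failure mode described above.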
3) Stale opencode model_map entries — fixed today
PR #3172 prunes unavailable opencode models at config load time rather than relying only on runtime warnings. One residual github-copilot/gpt-5.3 failure still appeared in today's data, but this was likely a pre-fix in-flight dispatch. The count should trend to zero once the service is upgraded to 0.73.1+.
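Pruning at load time can be sketched in Python as below. The shape of model_map (a routing key mapped to a model name), the availability set, the logger name, and the function name are all assumptions. The entries are illustrative and do not reflect the real configuration.

```python
import logging

logger = logging.getLogger("router.config")  # hypothetical logger name


def prune_model_map(model_map: dict[str, str], available: set[str]) -> dict[str, str]:
    """Return a copy of model_map without entries whose model is unavailable."""
    pruned = {}
    for key, model in model_map.items():
        if model in available:
            pruned[key] = model
        else:
            # Warn once at load time instead of on every dispatch.
            logger.warning("pruning unavailable model %s (key %s)", model, key)
    return pruned


# Illustrative data only; the keys and availability are assumed.
model_map = {
    "simple": "github-copilot/gpt-5-mini",
    "complex": "github-copilot/gpt-5.3",
}
available = {"github-copilot/gpt-5-mini", "github-copilot/claude-sonnet-4.6"}
loaded = prune_model_map(model_map, available)
```

Returning a new mapping rather than mutating the input keeps the raw config intact for diagnostics.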
4) kimi review failure rate — ongoing
kimi/opus shows 4 failures + 1 abort out of 43 runs (~11% failure rate). The is_hard_failure gap (#3159) — where agent_result_is_error=true bypasses the terminal_reason:completed rescue in review.rs:851 — is still open. Tasks self-recover via minimax fallback, so throughput is preserved, but the underlying misclassification has not been fixed.
5) Service version lag — ongoing
| Component | Version |
|---|---|
| CLI | 0.72.0 |
| Service | 0.73.0 |
| Latest release | 0.73.3 |
Today's three merged fixes are included in 0.73.1–0.73.3 and are NOT yet running in production. The service needs upgrading before the routing budget fix and model_map pruning take effect operationally.
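The version gap can be checked mechanically. The Python sketch below compares dotted versions as integer tuples and assumes that 0.73.3, the latest release in the table, is the first version certain to contain all three fixes. The function names are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.73.3' into (0, 73, 3) so comparison is numeric, not lexical."""
    return tuple(int(part) for part in text.split("."))


def has_all_fixes(service_version: str, target: str = "0.73.3") -> bool:
    """True if the running service is at or beyond the target release."""
    return parse_version(service_version) >= parse_version(target)


versions = {"CLI": "0.72.0", "Service": "0.73.0", "Latest release": "0.73.3"}
service_is_current = has_all_fixes(versions["Service"])
```

Tuple comparison avoids the string-comparison pitfall where "0.73.10" would sort before "0.73.3".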
6) Four blocked opencode/gpt-5-mini internal tasks
Tasks internal:149970, internal:149863, internal:149855, internal:149673 are all blocked on opencode/github-copilot/gpt-5-mini with zero review cycles and no block_reason. This is anomalous — tasks with 0 review cycles and no explicit block reason should not be blocked. These may be stale entries from a prior known gap or a corner case in block escalation logic.
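The anomaly condition (blocked, zero review cycles, no block reason) can be expressed as a simple filter. The Python sketch below uses a hypothetical Task record and made-up example IDs. The field names are assumptions modeled on the terms used in this report, not the real schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    """Hypothetical task record; field names are assumed."""

    task_id: str
    status: str
    review_cycles: int
    block_reason: Optional[str]


def anomalous_blocks(tasks: list[Task]) -> list[str]:
    """IDs of tasks that are blocked with no review cycles and no stated reason."""
    return [
        t.task_id
        for t in tasks
        if t.status == "blocked" and t.review_cycles == 0 and not t.block_reason
    ]


# Made-up records for illustration only.
sample = [
    Task("example:1", "blocked", 0, None),
    Task("example:2", "blocked", 2, "review rejected"),
    Task("example:3", "done", 0, None),
    Task("example:4", "blocked", 0, ""),
]
flagged = anomalous_blocks(sample)
```

An empty block_reason string is treated the same as a missing one here, which is a design choice of the sketch.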
Routing Accuracy
Routing distributed load appropriately across the four primary agents (claude, kimi, codex, opencode). Today's fixes directly address two systemic routing degradations:
- Round-robin fallback on budget exhaustion (the budget now persists across a tick instead of resetting on every routing attempt)
- Model_map containing dead entries (now pruned at startup)
No evidence of systemic misrouting. GLM/MiniMax rate limits remain low-volume and contained.
Morning Review Follow-up
| Priority | Status |
|---|---|
| Monitor #3110 (Claude 401) | ✅ Closed today |
| Stale model WARN volume | ✅ Fixed by PR #3172 (pending service upgrade) |
| Throughput health | ✅ Healthy, no action needed |
| internal:149337 SSH block | ⏳ Still blocked, SSH environment issue |
Issues Created
None. All observed problems today were either:
- Already addressed by today's merged PRs, or
- Previously tracked (kimi #3159, SSH block internal:149337), or
- Awaiting investigation after service upgrade
Priorities For Tomorrow's Morning Review
- Upgrade service to 0.73.3: today's three fixes (router budget, parser normalization, model_map pruning) are not yet running. Run: brew update && brew upgrade orch && orch service restart
- Verify the skipped normalization fix is effective: check task_runs for any remaining "unrecognized status" errors after upgrade.
- Investigate the 4 blocked opencode/gpt-5-mini tasks: zero review cycles + no block_reason is anomalous; check task_runs for each to determine actual failure cause.
- Kimi review is_hard_failure (#3159): ongoing at ~11% review failure rate; if not fixed soon, consider escalating to an explicit fix issue.
- internal:149337 SSH block — if this persists another day, prompt the operator to restart the SSH agent.
Prepared by Orch automation (internal:150049).