Morning Review — 2026-05-29
Recent Commits (Last 24h)
| Commit | Description |
|---|---|
| e1a571d2 | docs(posts): add evening retrospective for 2026-05-28 (#3201) |
| ea2c4662 | docs(posts): add morning review for 2026-05-28 (#3199) |
No code changes landed in the last 24 hours. The last real code fix was 6ac8f851 (fix(runner): reject CLI parser diagnostics, part of v0.73.14), which remains undeployed: 0.73.14 has not been installed and the service has not been restarted.
Operational Health
Overall: Degraded. The service is still running 0.73.8 (Day 5 of no restart). Codex failure rate has worsened to ~86%. Claude and opencode are carrying all productive work. Multi-agent degradation (kimi/minimax/glm) continues with ~1d10h remaining on cooldowns. Throughput is up from yesterday despite the degraded pool.
Service Version
CLI: 0.73.13
Service: 0.73.13 ✓ in sync ← FALSE REPORT (issue #3200)
Latest: 0.73.14 ⚠ upgrade available
Running binary: /opt/homebrew/Cellar/orch/0.73.8/bin/orch (PID 84871)
The engine is still running 0.73.8. orch version falsely reports "in sync"; this is the known bug (#3200). lsof confirms it: PID 84871 maps to Cellar/orch/0.73.8/bin/orch. Neither #3190 (remove codex --ask-for-approval) nor #3198 (reject CLI diagnostics in synthesize) is running in production. A single brew services restart orch fixes everything.
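The path check above can be sketched as a small shell helper. This is a minimal illustration, not part of orch: the function names `cellar_version` and `check_sync` are hypothetical, and the helper only compares strings (it assumes the binary path has already been obtained, for example from lsof).

```shell
# Print the version directory embedded in a Homebrew Cellar path for orch.
# Prints nothing if the path does not contain /Cellar/orch/<version>/.
cellar_version() {
  printf '%s\n' "$1" | sed -n 's#.*/Cellar/orch/\([^/]*\)/.*#\1#p'
}

# Compare a reported version string against the version implied by a binary path.
# Hypothetical helper: it inspects strings only, never a live process.
check_sync() {
  local reported="$1" binary_path="$2" running
  running="$(cellar_version "$binary_path")"
  if [ -z "$running" ]; then
    echo "UNKNOWN: no Cellar version in path"
  elif [ "$reported" = "$running" ]; then
    echo "OK: running $running"
  else
    echo "MISMATCH: reported $reported, running $running"
  fi
}

# Values taken from this report.
check_sync "0.73.13" "/opt/homebrew/Cellar/orch/0.73.8/bin/orch"
```

With the values from this report, the check prints a mismatch between the reported 0.73.13 and the running 0.73.8.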
Agent/Model Health (Last 24h)
| Agent | Model | Outcome | Count |
|---|---|---|---|
| claude | opus | success | 27 |
| claude | sonnet | success | 22 |
| opencode | mimo-v2.5-free | success | 15 |
| opencode | deepseek-v4-flash-free | success | 13 |
| codex | gpt-5.3-codex | failed | 18 |
| codex | gpt-5.4 | success | 8 |
| opencode | nemotron-3-super-free | success | 4 |
| claude | opus | failed | 4 |
| opencode | github-copilot/gpt-5-mini | failed | 5 |
| glm | opus | rate_limit | 2 |
| codex | gpt-5.3-codex | success | 3 |
| codex | gpt-5.3-codex | rate_limit | 1 |
| minimax | opus | rate_limit | 1 |
Key observations:
- Codex gpt-5.3-codex failure rate: ~86% (18 failed / 21 completed, excluding 1 rate_limit). Day 5 of degradation. Root cause remains the running 0.73.8 engine emitting `--ask-for-approval`, a flag that codex 0.133.0+ rejects with a clap error. The fix is in the installed binary but takes effect only after a service restart.
- Claude remains the primary workhorse: opus 87% success (27/31), sonnet 22/22. The 4 opus failures are new; monitor them, but they are not alarming at current rates.
- opencode/deepseek-v4-flash-free recovered — 13 successes overnight, exactly as the retro predicted (was in ~58m cooldown at 23:02 UTC).
- opencode/mimo-v2.5-free continues strong at 15 successes, third by success count behind claude opus (27) and sonnet (22).
- opencode/github-copilot/gpt-5-mini is in a 6d11h cooldown; the 5 failed runs predate that cooldown.
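The percentages in the observations above can be reproduced from the table counts. The sketch below assumes, as the 18/21 figure implies, that rate_limit rows are excluded from the denominator; `share_pct` is a hypothetical helper name.

```shell
# Percentage of the first count out of (first + second), to one decimal place.
share_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    if (a + b == 0) { print "n/a"; exit }
    printf "%.1f\n", 100 * a / (a + b)
  }'
}

echo "gpt-5.3-codex failure rate: $(share_pct 18 3)%"   # 18 failed, 3 success
echo "claude opus success rate:   $(share_pct 27 4)%"   # 27 success, 4 failed
```

This prints 85.7% and 87.1%, matching the rounded ~86% and 87% quoted above.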
Active Cooldowns
| Key | Remaining | Reason |
|---|---|---|
| glm | 1d 10h | agent_error |
| kimi | 1d 10h | billing_cycle_exhausted |
| minimax | 1d 10h | agent_error |
| opencode:github-copilot/gpt-5-mini | 6d 11h | persisted |
The kimi/minimax/glm triple degradation continues. All three expire roughly 20:00 UTC tonight. opencode:deepseek-v4-flash-free has cleared (no longer listed) — routing pool is recovering.
Task Activity (Last 12h)
| Event | Count |
|---|---|
| status_change | 445 |
| dispatch | 128 |
| branch_delete | 124 |
| push | 96 |
| review_start | 64 |
| routed | 62 |
| review_decision | 40 |
| pr_create | 39 |
| error | 33 |
| rerouted | 3 |
Throughput improved versus yesterday (445 vs 328 status_changes). 39 PRs created with 128 dispatches — productive day driven by claude and opencode. The error count (33) is slightly elevated but proportional to volume; no crash-level errors in logs. Error log is 0 bytes (clean service run).
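For reference, the throughput comparison works out as follows. This is a small arithmetic sketch using only the counts in this report; `pct_change` and `ratio_pct` are hypothetical helper names.

```shell
# Relative change from old to new, as a percentage with one decimal place.
pct_change() {
  awk -v new="$1" -v old="$2" 'BEGIN { printf "%.1f\n", 100 * (new - old) / old }'
}

# First count as a percentage of the second, with one decimal place.
ratio_pct() {
  awk -v num="$1" -v den="$2" 'BEGIN { printf "%.1f\n", 100 * num / den }'
}

echo "status_change vs yesterday: +$(pct_change 445 328)%"   # 445 today, 328 yesterday
echo "PRs per dispatch:           $(ratio_pct 39 128)%"      # 39 PRs, 128 dispatches
```

The status_change count is up about 35.7%, and roughly 30.5% of dispatches resulted in a PR.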
Log Patterns
- Recurring `WARN multi-agent degradation detected` every tick: kimi/minimax/glm. Expected, benign, cosmetic noise.
- One transient `WARN HTTP send failed, will retry` from GitHub API. Auto-recovered, not a pattern.
- No WATCHDOG fires, no panics, no critical errors.
Stuck / Blocked Tasks
- internal:149337: blocked (18 days). SSH agent signing failure on auto-merge push: `sign_and_send_pubkey: signing failed for ED25519 "/Users/gb/.ssh/default_id_ed25519.pub"`. Operator action required: `ssh-add ~/.ssh/default_id_ed25519`, then `orch task unblock all`.
Retro Follow-ups
| Item | Status |
|---|---|
| Service restart / upgrade to 0.73.13+ | NOT DONE (Day 5) — CRITICAL |
| Upgrade to 0.73.14 | NOT DONE |
| Unblock internal:149337 (ssh-add) | NOT DONE (Day 18) |
| Prune dead opencode model entries from config | NOT DONE |
| Verify WATCHDOG stalls don't recur | ✓ No stalls today |
| opencode/deepseek-v4-flash-free recovery | ✓ Recovered — 13 successes |
| Minimax investigation after recovery | kimi/minimax/glm all still cooled; re-check tonight |
Priorities For Today
CRITICAL (operator — blocks codex entirely)
Restart the service and upgrade:

    orch service restart                # loads installed 0.73.13 (fixes #3190, #3198)
    brew update && brew upgrade orch    # gets 0.73.14
    brew services restart orch          # deploys 0.73.14
    lsof -p $(pgrep -f 'orch serve' | head -1) | grep -i 'txt.*Cellar/orch'    # verify real binary

This is Day 5. Every codex task run is either silently faked or erroring. The fix is two commands.
Unblock internal:149337 (Day 18):

    ssh-add ~/.ssh/default_id_ed25519
    orch task unblock all

Monitoring (once service restarts)
- Verify codex recovery — post-restart runs should show non-zero cost, runtime >30s, real work product.
- Watch claude/opus failures — 4 failures today is new. If it persists post-restart, investigate failure mode.
- kimi/minimax/glm cooldown expiry tonight (~20:00 UTC) — confirm they re-enter the routing pool cleanly without immediately re-entering cooldown.
- Prune dead opencode model entries from `~/.orch/config.yml` (`github-copilot/gpt-5.3`, `github-copilot/claude-opus-4.6`) to reduce router WARN noise every tick.
Open Code Issues
- #3200: `orch version` false "✓ in sync". The fix should PID-bind the version file or query the live engine. Still open, not yet assigned.
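One way the PID-binding idea could work is sketched below. Everything here is an assumption for illustration: the one-line "PID VERSION" file format, the function names, and the second PID (99999) are hypothetical, and this is not how orch currently stores its version.

```shell
# Hypothetical file format: a single line "PID VERSION".
# Arguments: file, pid, version.
write_version_file() {
  printf '%s %s\n' "$2" "$3" > "$1"
}

# Print the recorded version only if the file was written by the given live PID;
# otherwise print "stale" so the caller does not trust the recorded version.
bound_version() {
  local file="$1" live_pid="$2" pid version
  read -r pid version < "$file" || true
  if [ "$pid" = "$live_pid" ]; then
    echo "$version"
  else
    echo "stale"
  fi
}

tmp="$(mktemp)"
write_version_file "$tmp" 84871 0.73.8
bound_version "$tmp" 84871    # PID matches the running engine
bound_version "$tmp" 99999    # hypothetical other PID: recorded version is not trusted
rm -f "$tmp"
```

Under this scheme a version file left behind by an upgrade or by another process would be reported as stale instead of "in sync". Querying the live engine directly, the other option named above, would avoid the file altogether.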
Prepared by Orch automation (internal:150845)