Morning Review — 2026-05-28
Recent Commits (Last 24h)
| Commit | Description |
|---|---|
| 6ac8f851 | fix(runner): reject CLI parser diagnostics in synthesize_response_from_text (#3198) |
| f99c15e9 | docs(posts): add morning review for 2026-05-27 (#3196) |
One code fix landed: the runner now rejects CLI parser diagnostic messages when synthesizing text responses, preventing spurious parse errors from being treated as real content.
Operational Health
Overall: Improved. The operator completed the long-pending service upgrade, so CLI and Service are now both at 0.73.13. A new release, 0.73.14, is already available. WATCHDOG stalls occurred this morning on the old 0.73.8 service during routing, but should not recur now that the cascade fix from #3189 is deployed. Multi-agent degradation persists (minimax, glm, kimi), though the minimax cooldown is close to expiring (~2h). The Codex gpt-5.3-codex failure rate worsened to ~79%.
Service Version
CLI: 0.73.13
Service: 0.73.13 ✓ in sync
Latest: 0.73.14 ⚠ upgrade available
The service upgrade that had been carried for 3+ days is complete. CLI and Service are now aligned at 0.73.13. However, 0.73.14 is already available, so the operator should run the upgrade cycle again:
brew update && brew upgrade orch && brew services restart orch
WATCHDOG Stalls (This Morning, Pre-Upgrade)
The router stalled for ~3.5 minutes while routing this task (internal:150765) on the old 0.73.8 service. The cascade attempted 4 failing pool entries before succeeding:
- opencode/nemotron-3-super-free → "no text output" → cooldown recorded
- kimi/haiku → LLM pool timeout (~50s)
- minimax/haiku → LLM pool timeout (~50s)
- glm/haiku → LLM pool timeout (~50s)
- claude/haiku → LLM router succeeded, selected opencode → rerouted to claude (opencode cooled)
WATCHDOG fired at 80s, 110s, 140s, 170s, 200s, and 230s. The cascade fix (#3189) is part of the 0.73.x line but was NOT in 0.73.8. Now that the service is on 0.73.13, stalls should not recur. Monitor today to confirm.
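The timings above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch: the 80 s first fire and 30 s period are inferred from the listed WATCHDOG times, not read from any Orch setting, and all names are hypothetical.

```python
# Values copied from the stall report above. The first-fire offset and the
# period are inferred from the listed times, not taken from Orch config.
FIRST_FIRE_S = 80
PERIOD_S = 30
LAST_FIRE_S = 230


def watchdog_fire_times(first, period, last):
    """Return every fire time from `first` to `last` inclusive."""
    return list(range(first, last + 1, period))


fires = watchdog_fire_times(FIRST_FIRE_S, PERIOD_S, LAST_FIRE_S)

# Three pool entries each timed out after roughly 50 s.
POOL_TIMEOUT_S = 50
timed_out_entries = ["kimi/haiku", "minimax/haiku", "glm/haiku"]
timeout_budget_s = POOL_TIMEOUT_S * len(timed_out_entries)

print(fires)             # the six fire times listed in the report
print(timeout_budget_s)  # seconds spent waiting on pool timeouts alone
```

Under these assumptions, the three pool timeouts account for roughly 150 s of the stall on their own.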
Multi-Agent Degradation
| Agent | Reason | Cooldown Remaining |
|---|---|---|
| minimax | agent_error | ~2 hours |
| glm | agent_error | ~10 hours |
| kimi | billing_cycle_exhausted | ~58 hours |
Minimax is near recovery — cooldown expires within ~2 hours. GLM expires tonight. Kimi remains out for ~2.4 more days. Routing continues to fall back correctly to claude, opencode, and codex.
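The recovery order and the hours-to-days conversion can be reproduced as follows. This is a small sketch using the approximate cooldown values from the table; the dictionary and helper names are hypothetical, not part of Orch.

```python
# Approximate cooldown hours remaining, copied from the table above.
cooldowns_h = {"minimax": 2, "glm": 10, "kimi": 58}


def hours_to_days(hours):
    """Convert hours to days, rounded to one decimal place."""
    return round(hours / 24, 1)


def recovery_order(cooldowns):
    """Return agent names sorted by soonest expected recovery."""
    return [name for name, _ in sorted(cooldowns.items(), key=lambda kv: kv[1])]


order = recovery_order(cooldowns_h)
for name in order:
    print(f"{name}: ~{cooldowns_h[name]}h (~{hours_to_days(cooldowns_h[name])} days)")
```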
Agent/Model Health (Last 24h)
| Agent | Model | Outcome | Count |
|---|---|---|---|
| opencode | deepseek-v4-flash-free | success | 24 |
| claude | opus | success | 22 |
| claude | sonnet | success | 19 |
| codex | gpt-5.3-codex | failed | 15 |
| codex | gpt-5.4 | success | 6 |
| codex | gpt-5.3-codex | success | 4 |
| opencode | mimo-v2.5-free | success | 4 |
| claude | haiku | success | 3 |
| opencode | nemotron-3-super-free | success | 3 |
| opencode | github-copilot/gpt-5-mini | failed | 2 |
| codex | gpt-5.3-codex | rate_limit | 1 |
Key observations:
- opencode/deepseek-v4-flash-free is the new throughput leader (24 successes).
- Claude remains healthy: opus 100%, sonnet ~95%. Haiku clean.
- Codex gpt-5.3-codex failure rate worsened to ~79% (15 failed of 19 runs, not counting 1 rate_limit), up from ~47% yesterday. Generic cooldown active. Third consecutive day of degradation.
- codex gpt-5.4 continues healthy (6/6 successes).
- opencode/mimo-v2.5-free appeared with 4 clean successes.
- cooldown:github:5xx is active: GitHub is returning 5xx errors to codex, contributing to failures.
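The failure rates quoted above can be recomputed from the health table. The sketch below hard-codes the table rows and uses hypothetical names. It assumes rate_limit outcomes are left out of the denominator by default, which matches the "19 total" figure; counting them lowers the gpt-5.3-codex rate to 75%.

```python
from collections import defaultdict

# (agent, model, outcome, count) rows copied from the health table above.
rows = [
    ("opencode", "deepseek-v4-flash-free", "success", 24),
    ("claude", "opus", "success", 22),
    ("claude", "sonnet", "success", 19),
    ("codex", "gpt-5.3-codex", "failed", 15),
    ("codex", "gpt-5.4", "success", 6),
    ("codex", "gpt-5.3-codex", "success", 4),
    ("opencode", "mimo-v2.5-free", "success", 4),
    ("claude", "haiku", "success", 3),
    ("opencode", "nemotron-3-super-free", "success", 3),
    ("opencode", "github-copilot/gpt-5-mini", "failed", 2),
    ("codex", "gpt-5.3-codex", "rate_limit", 1),
]


def failure_rates(rows, include_rate_limit=False):
    """Return {(agent, model): failed / total} for every pair in `rows`.

    rate_limit rows are skipped unless include_rate_limit is True, in which
    case they count toward the total but not toward failures.
    """
    failed = defaultdict(int)
    total = defaultdict(int)
    for agent, model, outcome, count in rows:
        if outcome == "rate_limit" and not include_rate_limit:
            continue
        key = (agent, model)
        total[key] += count
        if outcome == "failed":
            failed[key] += count
    return {key: failed[key] / total[key] for key in total}


rates = failure_rates(rows)
rates_with_rl = failure_rates(rows, include_rate_limit=True)
codex = ("codex", "gpt-5.3-codex")
print(f"{rates[codex]:.0%} excluding rate_limit, {rates_with_rl[codex]:.0%} including it")
```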
Task Activity (Last 12h)
| Event | Count |
|---|---|
| status_change | 328 |
| branch_delete | 130 |
| dispatch | 107 |
| push | 86 |
| review_start | 53 |
| routed | 52 |
| review_decision | 34 |
| pr_create | 34 |
| error | 21 |
| rerouted | 1 |
Throughput is reduced compared to yesterday (328 vs 1,476 status_changes). The WATCHDOG stall this morning likely consumed a significant tick window. Error count (21) is low — a healthy signal for the volume processed.
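To put the throughput drop in relative terms, a quick calculation helps. This sketch assumes the two status_change figures cover comparable windows, which the report does not state; the function name is hypothetical.

```python
def pct_change(previous, current):
    """Percentage change from `previous` to `current` (negative means a drop)."""
    return (current - previous) / previous * 100


# status_change counts quoted above: yesterday vs. today.
drop = pct_change(1476, 328)
print(f"status_change volume changed by {drop:.1f}%")
```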
Stuck / Blocked Tasks
- internal:149337 — blocked (17d). SSH agent signing failure during auto-merge push:
sign_and_send_pubkey: signing failed for ED25519 "/Users/gb/.ssh/default_id_ed25519.pub". Requires operator: ssh-add ~/.ssh/default_id_ed25519.
Retro Follow-ups
- RESOLVED: Service upgrade to 0.73.13 — completed! CLI and Service now in sync.
- NEW: Upgrade to 0.73.14: brew update && brew upgrade orch && brew services restart orch
- Operator (persistent): Resolve internal:149337 SSH signing failure: ssh-add ~/.ssh/default_id_ed25519
- Operator (persistent): Prune stale opencode model entries (github-copilot/gpt-5.3, github-copilot/claude-opus-4.6) from ~/.orch/config.yml to reduce router WARN noise.
- Monitor: WATCHDOG stalls were pre-upgrade. Verify they don't recur on 0.73.13 today.
- Monitor: Codex gpt-5.3-codex failure rate is at ~79% on the third consecutive day of degradation. Generic cooldown active. If the rate persists through today, consider whether there's an underlying API issue distinct from the 0.133.0 approval_policy fix (#3190).
Priorities For Today
- Operator (new): brew update && brew upgrade orch && brew services restart orch (0.73.14 available).
- Operator: ssh-add ~/.ssh/default_id_ed25519 to unblock internal:149337 (17 days stale).
- Operator: Prune dead opencode model entries from config.
- Monitor: Confirm WATCHDOG stalls don't recur on 0.73.13 with cascade fix in effect.
- Monitor: Codex gpt-5.3-codex. Day 3 of elevated failures. If still above 60% today, worth checking codex CLI changelog or API status.
- Watch: Minimax recovers in ~2h — verify it re-enters the routing pool cleanly.
Prepared by Orch automation (internal:150765)