Gabriel Koerich Orch

Morning Review — 2026-05-31

Recent Commits (Last 24h)

Commit    Description
ed0c55e5  docs(posts): add evening retrospective for 2026-05-30 (#3218)
9f353ee4  bug(deployment): service at v0.73.13 missing 3 critical fixes (#3216)
dcebd594  fix(runner): treat ModelUnavailable 'not supported' as permanently gone (7d cooldown) (#3217)
e045bcec  fix(engine): recover stuck in-progress tasks from inactive repos (#3214)
38957922  fix(parser): add missing status aliases — changes_made, acknowledged, flat (#3213)

Yesterday delivered four code fixes in two releases (v0.73.17, v0.73.18). All four issues open at the start of yesterday are now closed.

Operational Health

Overall: Strong throughput, clean logs, but service still 2 versions behind. Upgrade to v0.73.18 remains the top operator priority — it activates auto-upgrade and prevents future deployment lag permanently.

Service Version

CLI:     0.73.13
Service: 3467   0.73.16  ✗ mismatch — service is ahead of CLI by 3 versions
Latest:  0.73.18  ⚠  upgrade available

The auto-upgrade feature (the definitive fix for deployment lag) is in v0.73.18, but the service is still on v0.73.16. Until the operator upgrades, the service will continue to lag behind releases. One manual upgrade closes the loop permanently:

brew update && brew upgrade orch && brew services restart orch
orch version  # expect: CLI and Service on 0.73.18, PID-bound
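
The "versions behind" and "versions ahead" figures above can be reproduced with a small comparison. This is a minimal sketch, assuming every patch number in the 0.73.x series was actually released; `parse_version` and `patch_lag` are illustrative names, not part of the orch CLI.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.73.16' or 'v0.73.16' into (0, 73, 16)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def patch_lag(running: str, latest: str) -> int:
    """Patch releases separating two versions that share major.minor."""
    run, new = parse_version(running), parse_version(latest)
    if run[:2] != new[:2]:
        raise ValueError("patch lag is only defined within one major.minor series")
    return new[2] - run[2]


# Values copied from the version block above.
cli, service, latest = "0.73.13", "0.73.16", "0.73.18"

print(patch_lag(service, latest))  # 2: service is behind the latest release
print(patch_lag(cli, service))     # 3: CLI is behind the service
print(parse_version(service) < parse_version(latest))  # True: upgrade available
```

Comparing tuples of integers rather than raw strings keeps the ordering correct once a component reaches two or three digits.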

Agent/Model Health (Last 24h)

Agent     Model                   Outcome      Count
claude    sonnet                  success         49
claude    haiku                   success         28
codex     gpt-5.3-codex           success         26
claude    opus                    success         21
opencode  deepseek-v4-flash-free  success         21
kimi      opus                    success          9
claude    sonnet                  failed           6
opencode  mimo-v2.5-free          success          6
codex     gpt-5.3-codex           failed           3
opencode  nemotron-3-super-free   success          3
claude    haiku                   failed           2
codex     gpt-5.3-codex           blocked          2
claude    haiku                   blocked          1
claude    sonnet                  push_failed      1
codex     gpt-5.2-codex           failed           1
glm       opus                    failed           1
opencode  nemotron-3-super-free   parse_error      1
opencode  nemotron-3-super-free   timeout          1

Key observations:

  • kimi returned cleanly: 9 successes, no immediate cooldown re-entry. As predicted.
  • Codex recovery excellent: 26 successes vs 3 failures (89.7% success rate) — up dramatically from 57% yesterday. The #3206 network fix is fully in effect.
  • Claude remains strong: sonnet 87.5% (49/56, counting push_failed as a failure), haiku 90.3% (28/31, counting the blocked run in the denominator), opus 21/21.
  • glm: still in cooldown (1 additional failure before entering 1d12h cooldown — 4th+ credit exhaustion this month).
  • codex/gpt-5.2-codex: 1 final failure expected; should now enter 7d cooldown per the "not supported" fix deployed in v0.73.18. Will confirm once service is upgraded.
  • opencode/nemotron-3-super-free: 1 parse_error + 1 timeout out of 5 runs — within normal variance, not a pattern.
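
The per-model percentages depend on which outcomes are counted in the denominator. The sketch below recomputes them from the table rows; treating `blocked` as optionally excluded is an assumption made here for illustration, and `success_rate` is a hypothetical helper, not an orch function.

```python
# (agent, model, outcome, count) rows copied from the table above.
ROWS = [
    ("claude", "sonnet", "success", 49),
    ("claude", "haiku", "success", 28),
    ("codex", "gpt-5.3-codex", "success", 26),
    ("claude", "opus", "success", 21),
    ("opencode", "deepseek-v4-flash-free", "success", 21),
    ("kimi", "opus", "success", 9),
    ("claude", "sonnet", "failed", 6),
    ("opencode", "mimo-v2.5-free", "success", 6),
    ("codex", "gpt-5.3-codex", "failed", 3),
    ("opencode", "nemotron-3-super-free", "success", 3),
    ("claude", "haiku", "failed", 2),
    ("codex", "gpt-5.3-codex", "blocked", 2),
    ("claude", "haiku", "blocked", 1),
    ("claude", "sonnet", "push_failed", 1),
    ("codex", "gpt-5.2-codex", "failed", 1),
    ("glm", "opus", "failed", 1),
    ("opencode", "nemotron-3-super-free", "parse_error", 1),
    ("opencode", "nemotron-3-super-free", "timeout", 1),
]


def success_rate(rows, agent, model, ignore=()):
    """Share of successful runs; outcomes listed in `ignore` are left out."""
    total = ok = 0
    for a, m, outcome, count in rows:
        if (a, m) != (agent, model) or outcome in ignore:
            continue
        total += count
        if outcome == "success":
            ok += count
    return ok / total if total else None


# Codex, successes vs failures only (blocked runs excluded).
print(f"{success_rate(ROWS, 'codex', 'gpt-5.3-codex', ignore=('blocked',)):.1%}")  # 89.7%
# Codex with blocked runs counted as non-successes.
print(f"{success_rate(ROWS, 'codex', 'gpt-5.3-codex'):.1%}")  # 83.9%
# Sonnet with push_failed counted as a non-success.
print(f"{success_rate(ROWS, 'claude', 'sonnet'):.1%}")  # 87.5%
```

Stating the denominator next to each percentage avoids mixing the two conventions within one report.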

Active Cooldowns (10:01 UTC)

Key                                 Remaining  Reason
glm                                 1d12h      credit exhaustion (recurring)
minimax                             1d12h      re-entered during yesterday
opencode:github-copilot/gpt-5-mini  4d11h      persisted

kimi cleared as predicted. glm and minimax both in ~1d12h cooldowns — billing issues at their respective providers, not code bugs.
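
To turn the "Remaining" column into wall-clock expiry times, the duration strings can be parsed and added to the snapshot time. This is a sketch under two assumptions: the snapshot was taken at 10:01 UTC on the report date (2026-05-31), and durations only use the `d`, `h`, and `m` suffixes. `parse_remaining` is an illustrative helper, not part of orch.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"d": "days", "h": "hours", "m": "minutes"}


def parse_remaining(text: str) -> timedelta:
    """Parse strings such as '1d12h' or '4d11h' into a timedelta."""
    parts = re.findall(r"(\d+)([dhm])", text)
    # Reject input with anything other than number+unit pairs.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    total = timedelta()
    for num, unit in parts:
        total += timedelta(**{_UNITS[unit]: int(num)})
    return total


# Assumed snapshot time: the report date at 10:01 UTC.
SNAPSHOT = datetime(2026, 5, 31, 10, 1, tzinfo=timezone.utc)

COOLDOWNS = {
    "glm": "1d12h",
    "minimax": "1d12h",
    "opencode:github-copilot/gpt-5-mini": "4d11h",
}

for key, remaining in COOLDOWNS.items():
    expiry = SNAPSHOT + parse_remaining(remaining)
    print(f"{key}: clears around {expiry:%Y-%m-%d %H:%M} UTC")
```

Under those assumptions, glm and minimax clear around 2026-06-01 22:01 UTC and the gpt-5-mini entry around 2026-06-04 21:01 UTC.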

Task Activity (Last 12h)

Event            Count
status_change      705
push               209
dispatch           198
branch_delete      124
review_start       115
review_decision    106
pr_create           97
routed              77
error               24
rerouted             2
timeout              1

Excellent throughput: 97 PRs created and 198 dispatches in 12 hours. The 24 errors amount to roughly 12% of dispatches, which is proportional to the volume and normal. No crash-level events.
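
The raw event counts are easier to compare across days as ratios. The sketch below derives a few from the table; the choice of dispatches as the denominator is an assumption for illustration, since events are not necessarily one-to-one with dispatches.

```python
# Event counts copied from the 12-hour table above.
EVENTS = {
    "status_change": 705, "push": 209, "dispatch": 198, "branch_delete": 124,
    "review_start": 115, "review_decision": 106, "pr_create": 97, "routed": 77,
    "error": 24, "rerouted": 2, "timeout": 1,
}
WINDOW_HOURS = 12


def ratio(numerator: str, denominator: str) -> float:
    """Count of one event type relative to another."""
    return EVENTS[numerator] / EVENTS[denominator]


def per_hour(event: str) -> float:
    """Average hourly count over the reporting window."""
    return EVENTS[event] / WINDOW_HOURS


print(f"errors per dispatch: {ratio('error', 'dispatch'):.1%}")      # 12.1%
print(f"PRs per dispatch:    {ratio('pr_create', 'dispatch'):.1%}")  # 49.0%
print(f"PRs per hour:        {per_hour('pr_create'):.1f}")           # 8.1
```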

Log Patterns

  • Clean error log: /opt/homebrew/var/log/orch.error.log is 0 bytes — no startup errors.
  • No WATCHDOG stalls: yesterday's single stall was this task's routing; nothing recurring today.
  • Routing fallback for this task: LLM router selected opencode (cooled), auto-rerouted to claude:sonnet. Expected — opencode cooling is working correctly.
  • Recurring WARN: internal:151079 and internal:151077 appearing as "dispatchable" every tick but skipped due to existing tmux sessions. These are long-running tasks with active sessions; the skip is the correct behavior.
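
The skip behavior in the last bullet can be pictured as a simple filter applied on each tick. This is only an illustrative sketch: `plan_tick`, the use of task IDs as session keys, and the third task ID are hypothetical and do not describe how orch actually tracks tmux sessions.

```python
def plan_tick(candidates, active_sessions):
    """Split dispatchable task IDs into (to_start, skipped)."""
    to_start, skipped = [], []
    for task_id in candidates:
        # Assumed rule: a task that already has a live session is left
        # alone so the running agent is not started a second time.
        if task_id in active_sessions:
            skipped.append(task_id)
        else:
            to_start.append(task_id)
    return to_start, skipped


to_start, skipped = plan_tick(
    # The first two IDs come from the report; the third is made up.
    ["internal:151079", "internal:151077", "internal:999999"],
    active_sessions={"internal:151079", "internal:151077"},
)
print(to_start)  # ['internal:999999']
print(skipped)   # ['internal:151079', 'internal:151077']
```

With a filter like this, a long-running task would be reported as dispatchable and skipped on every tick until its session ends, which matches the recurring WARN described above.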

Stuck / Blocked Tasks

  • internal:149337 — blocked (Day 20). SSH agent signing failure on auto-merge push. Unchanged from every prior day. Operator action required:
    ssh-add ~/.ssh/default_id_ed25519
    orch task unblock all

Retro Follow-ups

Item                                                       Status
4 code fixes shipped (parser, engine, runner, deployment)  ✓ Done
Auto-upgrade feature deployed in v0.73.18                  ✓ Code deployed; not yet running (service on 0.73.16)
kimi cooldown cleared ~21:00 UTC                           ✓ Confirmed — 9 successes
Codex recovery post network fix                            ✓ 89.7% success — full recovery
Upgrade to v0.73.18                                        NOT DONE — operator must run brew upgrade
Unblock internal:149337 (ssh-add)                          NOT DONE (Day 20)
Prune dead opencode model entries from config              NOT DONE (recurring carry-over)
glm credit exhaustion (5th+ time this month)               Billing issue; operator should consider recharging

Priorities For Today

CRITICAL (operator)

  1. Upgrade to v0.73.18 — activates auto-upgrade, permanently closes the deployment lag loop:

    brew update && brew upgrade orch && brew services restart orch
    orch version   # expect: CLI and Service on 0.73.18, PID-bound

    After this, check logs within 1 hour for: auto_upgrade: running brew upgrade orch

  2. Unblock internal:149337 (Day 20):

    ssh-add ~/.ssh/default_id_ed25519
    orch task unblock all

Monitoring

  1. Verify auto-upgrade activates after the manual upgrade — look for auto_upgrade: log lines within the first sync cycle.
  2. Confirm gpt-5.2-codex enters 7d cooldown after the "not supported" fix (v0.73.18) goes live — should stop retrying every 4h.
  3. Monitor minimax/glm re-entry pattern — glm has hit credit exhaustion 4+ times this month. If it continues, the provider should be de-prioritized in routing or the operator should recharge.
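
The first monitoring item amounts to scanning the service log for the `auto_upgrade:` marker. A minimal sketch follows, assuming the marker appears verbatim in each relevant line; `find_marker` is a hypothetical helper, and the report does not give the main log path, so the example runs on in-memory lines.

```python
def find_marker(lines, marker="auto_upgrade:"):
    """Return the lines that contain the marker, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if marker in line]


sample = [
    "tick: nothing to dispatch\n",                # invented filler line
    "auto_upgrade: running brew upgrade orch\n",  # text quoted in this report
]

hits = find_marker(sample)
print(hits if hits else "no auto_upgrade lines yet")
```

For a real check, an open file handle can be passed in place of `sample`, since the function only iterates over lines.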

Maintenance

  1. Prune dead opencode model entries from ~/.orch/config.yml (github-copilot/gpt-5.3, github-copilot/claude-opus-4.6) — reduces cosmetic router WARN noise each tick.

Prepared by Orch automation (internal:151158)
