Gabriel Koerich Orch

Morning Review — 2026-05-29

Recent Commits (Last 24h)

Commit     Description
e1a571d2   docs(posts): add evening retrospective for 2026-05-28 (#3201)
ea2c4662   docs(posts): add morning review for 2026-05-28 (#3199)

No code changes landed in the last 24 hours. The last real code fix was 6ac8f851 (fix(runner): reject CLI parser diagnostics — part of v0.73.14), which remains undeployed because the service has not been restarted.

Operational Health

Overall: Degraded. The service is still running 0.73.8 (Day 5 of no restart). Codex failure rate has worsened to ~86%. Claude and opencode are carrying all productive work. Multi-agent degradation (kimi/minimax/glm) continues with ~1d10h remaining on cooldowns. Throughput is up from yesterday despite the degraded pool.

Service Version

CLI:      0.73.13
Service:  0.73.13  ✓ in sync  ← FALSE REPORT (issue #3200)
Latest:   0.73.14  ⚠  upgrade available

Running binary: /opt/homebrew/Cellar/orch/0.73.8/bin/orch (PID 84871)

The engine is still running 0.73.8. orch version falsely reports "in sync"; this is the known bug (#3200). lsof confirms it: PID 84871 maps to Cellar/orch/0.73.8/bin/orch. Neither #3190 (remove codex --ask-for-approval) nor #3198 (reject CLI diagnostics in synthesize) is running in production. A single brew services restart orch loads the installed binary and brings both fixes live.
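The check described above (compare the version embedded in the running binary's path with the version the CLI reports) can be sketched as two small shell helpers. This is a minimal sketch, not part of orch: the function names are hypothetical, and it assumes the Homebrew layout .../Cellar/orch/<version>/bin/orch seen in the lsof output.

```shell
# Hypothetical helper: extract the version segment from a Homebrew Cellar path.
# Assumes the layout .../Cellar/orch/<version>/bin/orch; prints nothing otherwise.
cellar_version_from_path() {
  printf '%s\n' "$1" | sed -n 's|.*/Cellar/orch/\([^/]*\)/.*|\1|p'
}

# Hypothetical check: succeeds only when the binary path carries the expected version.
#   $1 = path of the running binary, $2 = version reported by the CLI
version_matches() {
  [ "$(cellar_version_from_path "$1")" = "$2" ]
}
```

Fed the path and reported version from this report (/opt/homebrew/Cellar/orch/0.73.8/bin/orch and 0.73.13), version_matches returns non-zero, which is the mismatch that orch version currently hides.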

Agent/Model Health (Last 24h)

Agent     Model                      Outcome     Count
claude    opus                       success     27
claude    sonnet                     success     22
opencode  mimo-v2.5-free             success     15
opencode  deepseek-v4-flash-free     success     13
codex     gpt-5.3-codex              failed      18
codex     gpt-5.4                    success     8
opencode  nemotron-3-super-free      success     4
claude    opus                       failed      4
opencode  github-copilot/gpt-5-mini  failed      5
glm       opus                       rate_limit  2
codex     gpt-5.3-codex              success     3
codex     gpt-5.3-codex              rate_limit  1
minimax   opus                       rate_limit  1

Key observations:

  • Codex gpt-5.3-codex failure rate: ~86% (18 failed of 21 completed runs; the 1 rate_limit run is excluded). Day 5 of degradation. Root cause remains the running 0.73.8 engine emitting --ask-for-approval, a flag codex 0.133.0+ rejects with a clap error. The fix is in the installed binary but only takes effect on service restart.
  • Claude remains the primary workhorse: opus 87% success (27/31), sonnet 22/22. The 4 opus failures are new; worth monitoring, but not alarming at current rates.
  • opencode/deepseek-v4-flash-free recovered — 13 successes overnight, exactly as the retro predicted (was in ~58m cooldown at 23:02 UTC).
  • opencode/mimo-v2.5-free continues strong at 15 successes, now third by success count behind claude opus (27) and sonnet (22).
  • opencode/github-copilot/gpt-5-mini in a 6d11h cooldown — the 5 failed runs predate that cooldown.
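The percentages quoted in the observations above follow directly from the counts in the table. As a minimal sketch (the helper name is hypothetical, and it assumes non-negative integer counts), the arithmetic can be reproduced with shell integer math:

```shell
# Hypothetical helper: rounded integer percentage of part over total.
#   $1 = part (e.g. failed runs), $2 = total runs; fails if total is not positive.
percent() {
  [ "$2" -gt 0 ] || return 1
  # Adding total/2 before dividing rounds to the nearest whole percent.
  echo $(( (100 * $1 + $2 / 2) / $2 ))
}

# Usage with the counts from the table:
#   percent 18 21   # codex gpt-5.3-codex failures over completed runs
#   percent 27 31   # claude opus successes over all opus runs
```

Whether the single rate_limit run belongs in the denominator is a judgment call; including it (18 of 22) gives a lower figure than the ~86% quoted here.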

Active Cooldowns

Key                                 Remaining  Reason
glm                                 1d 10h     agent_error
kimi                                1d 10h     billing_cycle_exhausted
minimax                             1d 10h     agent_error
opencode:github-copilot/gpt-5-mini  6d 11h     persisted

The kimi/minimax/glm triple degradation continues. All three expire roughly 20:00 UTC tonight. opencode:deepseek-v4-flash-free has cleared (no longer listed) — routing pool is recovering.
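To check a remaining-time value against a wall-clock expiry, it helps to convert the "Nd Nh" strings from the table into hours. The following is a minimal sketch with a hypothetical helper name; it assumes space-separated tokens ending in d or h, as printed in the table above, and plain decimal numbers without leading zeros.

```shell
# Hypothetical helper: convert a remaining-time string such as "1d 10h" to hours.
# Assumes space-separated tokens with a trailing "d" (days) or "h" (hours).
cooldown_hours() {
  days=0
  hours=0
  for token in $1; do
    case $token in
      *d) days=${token%d} ;;   # strip the unit suffix, keep the number
      *h) hours=${token%h} ;;
    esac
  done
  echo $(( days * 24 + hours ))
}
```

Adding the resulting hour count to the time the table was captured gives the expected expiry, which can then be compared with the estimate in the paragraph above.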

Task Activity (Last 12h)

Event            Count
status_change    445
dispatch         128
branch_delete    124
push             96
review_start     64
routed           62
review_decision  40
pr_create        39
error            33
rerouted         3

Throughput improved versus yesterday (445 vs 328 status_changes). 39 PRs created with 128 dispatches — productive day driven by claude and opencode. The error count (33) is slightly elevated but proportional to volume; no crash-level errors in logs. Error log is 0 bytes (clean service run).
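The day-over-day comparison above can be expressed as a percent change. This is a minimal sketch with a hypothetical helper name; it assumes awk is available and that the previous value is non-zero.

```shell
# Hypothetical helper: percent change from a previous count to a current count,
# printed with one decimal place.
#   $1 = previous value (must be non-zero), $2 = current value
percent_change() {
  awk -v prev="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (cur - prev) / prev * 100 }'
}

# Usage with the status_change counts from this report:
#   percent_change 328 445
```

The same call works for any pair of event counts, for example comparing today's error count with yesterday's once that figure is known.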

Log Patterns

  • Recurring WARN multi-agent degradation detected every tick: kimi/minimax/glm — expected, benign, cosmetic noise.
  • One transient WARN HTTP send failed, will retry from GitHub API — auto-recovered, not a pattern.
  • No WATCHDOG fires, no panics, no critical errors.

Stuck / Blocked Tasks

  • internal:149337 — blocked (18 days). SSH agent signing failure on auto-merge push: sign_and_send_pubkey: signing failed for ED25519 "/Users/gb/.ssh/default_id_ed25519.pub". Operator action required:
    ssh-add ~/.ssh/default_id_ed25519
    orch task unblock all

Retro Follow-ups

Item                                            Status
Service restart / upgrade to 0.73.13+           NOT DONE (Day 5), CRITICAL
Upgrade to 0.73.14                              NOT DONE
Unblock internal:149337 (ssh-add)               NOT DONE (Day 18)
Prune dead opencode model entries from config   NOT DONE
Verify WATCHDOG stalls don't recur              ✓ No stalls today
opencode/deepseek-v4-flash-free recovery        ✓ Recovered, 13 successes
Minimax investigation after recovery            kimi/minimax/glm all still cooled; re-check tonight

Priorities For Today

CRITICAL (operator — blocks codex entirely)

  1. Restart the service and upgrade:

    orch service restart                              # loads installed 0.73.13 (fixes #3190, #3198)
    brew update && brew upgrade orch                  # gets 0.73.14
    brew services restart orch                        # deploys 0.73.14
    lsof -p $(pgrep -f 'orch serve' | head -1) | grep -i 'txt.*Cellar/orch'  # verify real binary

    This is Day 5. Every codex task run is either silently faked or erroring. The fix is two commands.

  2. Unblock internal:149337 (Day 18):

    ssh-add ~/.ssh/default_id_ed25519
    orch task unblock all

Monitoring (once service restarts)

  1. Verify codex recovery — post-restart runs should show non-zero cost, runtime >30s, real work product.
  2. Watch claude/opus failures — 4 failures today is new. If it persists post-restart, investigate failure mode.
  3. kimi/minimax/glm cooldown expiry tonight (~20:00 UTC) — confirm they re-enter the routing pool cleanly without immediately re-entering cooldown.
  4. Prune dead opencode model entries from ~/.orch/config.yml (github-copilot/gpt-5.3, github-copilot/claude-opus-4.6) — reduces router WARN noise every tick.
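The codex recovery criteria in item 1 of this list (non-zero cost, runtime above 30 seconds) can be turned into a quick filter. This is a minimal sketch: the helper name is hypothetical, how cost and runtime are obtained from orch is not covered here, and "real work product" still needs a manual look.

```shell
# Hypothetical heuristic: a run counts as real when its cost is non-zero
# and its runtime exceeds 30 seconds.
#   $1 = cost (decimal), $2 = runtime in seconds
is_real_run() {
  awk -v cost="$1" -v secs="$2" \
    'BEGIN { exit !((cost + 0) > 0 && (secs + 0) > 30) }'
}

# Usage:
#   is_real_run 0.12 45 && echo "looks real"
#   is_real_run 0 2     || echo "likely faked or errored"
```

The thresholds are taken from the monitoring note and are only a first-pass signal; a run that passes can still have produced nothing useful.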

Open Code Issues

  • #3200: orch version false "✓ in sync". The fix should PID-bind the version file or query the live engine. Still open, not yet assigned.

Prepared by Orch automation (internal:150845)
