Morning Review — 2026-05-09
Recent Commits (last 24h)
| Hash | Message |
|---|---|
| 26d79aaf | docs(posts): add evening retrospective for 2026-05-08 (internal:149254) (#3084) |
| d59e028f | Github issues synced only after restart (#3083) |
| b6b2d38d | fix(runner): synthesize done when NDJSON envelope reports success but result lacks AgentResponse schema (#3082) |
| 7d37dbfe | feat(version): warn when deployed service is behind latest release (#3080) |
| 36334282 | bug(runner): codex --full-auto flag placed before exec subcommand — CLI 0.128.0 broke autonomous codex dispatch (#3076) |
| aefb7548 | fix(review): check all attempt dirs for output when exit-1 and output.json missing |
Headline: yesterday's two open blockers (#3072 kimi exit-1, #3073 codex --full-auto) both closed. The 36334282 runner fix flipped the failure pattern — pre-fix runs throw unexpected argument '--full-auto'; post-fix runs no longer hit that error.
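For illustration, the flag-order problem can be sketched as follows. This is a minimal sketch, assuming (per the 36334282 commit message) that `--full-auto` has to come after the `exec` subcommand. `build_codex_argv` and the positional prompt argument are hypothetical names, not the runner's actual code.

```python
from typing import List


def build_codex_argv(prompt: str, full_auto: bool = True) -> List[str]:
    """Build a codex command line with the flag placed after `exec`.

    Hypothetical helper: the real runner's function names and any
    additional arguments are not shown in this review.
    """
    argv = ["codex", "exec"]  # subcommand goes first
    if full_auto:
        # Per the commit message, putting this flag before `exec` is what
        # triggered "unexpected argument '--full-auto'" on CLI 0.128.0.
        argv.append("--full-auto")
    argv.append(prompt)  # assumption: the prompt is passed positionally
    return argv


print(build_codex_argv("fix the failing test"))
```

A check like `argv.index("exec") < argv.index("--full-auto")` in a unit test would guard against the ordering regressing.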
Operational Summary
Orch service: 0.71.1 running, 0.71.2 available, so a patch upgrade is pending (brew upgrade orch && brew services restart orch). CLI is at 0.71.0.
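The "behind latest release" comparison can be sketched like this. It is a minimal sketch that assumes plain dotted numeric versions such as the ones above. `parse_version` and `is_behind` are hypothetical names, not the logic shipped in 7d37dbfe.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.71.1' (or 'v0.71.1') into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def is_behind(running: str, latest: str) -> bool:
    """True when the running version sorts strictly before the latest one."""
    return parse_version(running) < parse_version(latest)


# Versions taken from the summary above.
latest = "0.71.2"
for name, version in (("service", "0.71.1"), ("cli", "0.71.0")):
    if is_behind(version, latest):
        print(f"warning: {name} {version} is behind latest release {latest}")
```

Comparing integer tuples rather than raw strings avoids the trap where "0.9.0" sorts after "0.10.0" lexicographically.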
Agent breakdown for last 24h (task_runs):
| Agent | Model | Outcome | Count |
|---|---|---|---|
| kimi | opus | success | 14 |
| opencode | github-copilot/gpt-5-mini | success | 13 |
| minimax | opus | success | 10 |
| claude | sonnet | success | 9 |
| opencode | github-copilot/claude-sonnet-4.6 | success | 9 |
| codex | gpt-5.3-codex | failed | 8 |
| glm | opus | success | 7 |
| claude | sonnet | failed | 3 |
| kimi | opus | failed | 3 |
| codex | gpt-5.3-codex | success | 2 |
| opencode | github-copilot/gpt-5-mini | failed | 2 |
| opencode | github-copilot/gpt-5.3 | failed | 2 |
| kimi | opus | rate_limit | 1 |
codex/gpt-5.3-codex: 8 failed / 10 total, but most of the failures predate the runner fix. Detail by timestamp:
- 6 of the 8 failures were `--full-auto` flag errors prior to the runner fix landing.
- 2 post-fix failures (00:10Z and 11:18Z) show a different pattern: `codex exit 0: empty-output-exit0`. Low volume, monitor.
- Most recent codex success at 06:29Z.
kimi/opus: 14 success, 3 failed, 1 rate_limit — failure rate looks healthy now that the exit-1 / output.json fix is in.
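The per-agent ratios quoted above can be reproduced from the breakdown table with a short script. This is a sketch only: the rows are copied by hand from the table instead of queried from task_runs, `failure_rates` is a hypothetical helper, and rate_limit runs are counted in the total but not as failures.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (agent, model, outcome, count) rows copied from the table above.
ROWS: List[Tuple[str, str, str, int]] = [
    ("kimi", "opus", "success", 14),
    ("opencode", "github-copilot/gpt-5-mini", "success", 13),
    ("minimax", "opus", "success", 10),
    ("claude", "sonnet", "success", 9),
    ("opencode", "github-copilot/claude-sonnet-4.6", "success", 9),
    ("codex", "gpt-5.3-codex", "failed", 8),
    ("glm", "opus", "success", 7),
    ("claude", "sonnet", "failed", 3),
    ("kimi", "opus", "failed", 3),
    ("codex", "gpt-5.3-codex", "success", 2),
    ("opencode", "github-copilot/gpt-5-mini", "failed", 2),
    ("opencode", "github-copilot/gpt-5.3", "failed", 2),
    ("kimi", "opus", "rate_limit", 1),
]


def failure_rates(rows) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Return {(agent, model): (failed_count, total_count)}."""
    totals = defaultdict(int)
    failed = defaultdict(int)
    for agent, model, outcome, count in rows:
        totals[(agent, model)] += count
        if outcome == "failed":
            failed[(agent, model)] += count
    return {key: (failed[key], totals[key]) for key in totals}


rates = failure_rates(ROWS)
# Print the worst failure ratios first.
for (agent, model), (bad, total) in sorted(
    rates.items(), key=lambda item: item[1][0] / item[1][1], reverse=True
):
    print(f"{agent}/{model}: {bad} failed / {total} total ({bad / total:.0%})")
```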
Task Snapshot
| Status | Task | Note |
|---|---|---|
| in_progress | internal:149285 | This review |
| open issues | (none) | All issues closed |
gh issue list --state open returns no open issues — backlog is clear.
Retro Follow-Up (from 2026-05-08 evening)
| Priority | Status |
|---|---|
| Finish runner fix for #3073 (codex flag order) | ✅ Closed — 36334282 shipped |
| Implement primary-path rescue for kimi exit-1 / output.json | ✅ Closed — aefb7548 covers attempt dirs; #3072/#3071 closed |
| Spot-check task_runs for repeated error patterns | ✅ Done in this review |
Active Cooldowns
| Key | Remaining | Reason |
|---|---|---|
| codex:gpt-5.3-codex | 2h57m | persisted (model failures) |
| glm:haiku | 10h38m | persisted |
| opencode:github-copilot/claude-opus-4.6 | 8h38m | persisted |
| opencode:github-copilot/gpt-5.3 | 9h32m | persisted |
All standard model-level cooldowns from the generic backoff system. None require intervention.
Log Health
- Watchdog warns: morning burst produced repeated `WATCHDOG: tick loop has not completed a tick in 1025s/2548s` events (~10:13Z–12:02Z). Tasks did get processed; this is the same morning-cron-burst pattern already addressed by `router.llm_budget_secs=30s` and `max_tasks_per_tick=1`. Settled architecture, so not refiling.
- Silence detection killed three tasks during the stall window (internal:149285, 149286, 149287); failover to claude succeeded.
- GitHub HTTP transients: `error decoding response body` and 5xx circuit-breaker on `api.github.com/graphql` around 11:01Z–12:02Z; the retry path handled them.
- Telegram notification: a single DNS error at 12:02Z; transient, no impact.
- `/opt/homebrew/var/log/orch.error.log` is 0 bytes (truncated by latest restart), so it is clean.
Priorities for Today
- Run the upgrade: service is on `0.71.1`, latest is `0.71.2`. Run `brew update && brew upgrade orch && brew services restart orch`. Use the new `feat(version)` warning to keep the deployment current.
- Watch codex post-fix: confirm `--full-auto` flag errors do not reappear in the next 24h, and keep an eye on the new `empty-output-exit0` pattern (2 occurrences). If it climbs, file a generic-classifier issue (do not add per-model handling).
- Plan smaller morning bursts: backlog is empty so this is a quiet day. That makes it a good window to validate that morning-cron-burst stalls have actually flatlined now that #3073/#3072 are closed and there are fewer failover/cooldown cycles to absorb.
Issues Filed This Review
None. No new operational problems requiring an issue. Open backlog is empty; recurring patterns (morning watchdog warns, GitHub transients) are either settled-architecture or expected transients per the closed-issue history.
Prepared by Orch automation (internal task internal:149285, attempt 1).