Morning Review -- 2026-03-30
Summary
A very high-output night and morning: 22+ commits landed in the last 24 hours, addressing the full kimi billing-cycle cascade, error classification, routing weight configuration, and migration safety. The pipeline is healthy: minimax is the reliable workhorse (72 successful runs), routing weights are now configurable, and the router now gates on cooldown availability. Remaining focus: 6 open bugs currently in progress, plus a stale worktree metadata issue in the error log.
Recent Activity (Last 24h)
Infrastructure / Architecture
- `feat: configurable per-agent routing weights`: operators can now tune per-agent weights in config.yml; replaces the fragile LLM "distribute evenly" heuristic
- `feat: gate router on cooldown availability (#1266)`: router no longer selects cooled agents/models; prevents wasted dispatch
- `feat: separate runners for MiniMax and Kimi (claude CLI wrappers)`: MiniMaxClaudeRunner and KimiClaudeRunner now handle non-`"type":"result"` stream parsing correctly
- `fix: restore migration 012 checksum and split new column to 014` + `test: add migration idempotency tests`: migration safety fix plus idempotency tests to catch future checksum breaks
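The two routing changes above (configurable weights and cooldown gating) can be sketched together. This is a minimal Python illustration, not the project's actual code: `pick_agent`, the shape of the `weights` mapping, and the `cooldown_until` map are all hypothetical, since this review does not show the config.yml schema or the router internals.

```python
import random
import time
from typing import Optional


def pick_agent(
    weights: dict[str, float],
    cooldown_until: dict[str, float],
    now: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick an agent by weight, skipping any agent still in cooldown.

    All names are hypothetical. `weights` stands in for the per-agent
    weights an operator would set in config.yml; `cooldown_until` maps an
    agent to the Unix timestamp at which its cooldown ends.
    """
    now = time.time() if now is None else now
    rng = rng or random.Random()

    # Gate first: drop cooled agents and non-positive weights.
    available = {
        agent: weight
        for agent, weight in weights.items()
        if weight > 0 and cooldown_until.get(agent, 0.0) <= now
    }
    if not available:
        # The caller must treat None as "nothing to dispatch".
        return None

    agents = list(available)
    return rng.choices(agents, weights=[available[a] for a in agents], k=1)[0]
```

Filtering before the weighted draw means a cooled agent can never be selected, whatever its weight, and returning `None` forces the caller to handle the "everything is cooled" case explicitly rather than dispatching with an empty assignment.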
Bug Fixes
- `fix: remove 24h billing-cycle cooldown fallback from parse_retry_at`: cooldown fallback was interfering with vendor-specified retry-at timestamps
- `fix: stash uncommitted changes before rebase on startup (#1277)`: prevents startup rebase destroying agent work (#1245 companion fix)
- `bug: transport last_output persists across retries, replaying stale output (#1279)`: stale output from prior attempt was being replayed to review agent
- `bug: auto_unblock_count should reset when block reason changes (#1276)`: counter sharing between CI recovery and auto-unblock mechanisms fixed
- `fix: restore ci_recovery_count usage in handle_review_changes (#1284)`: counter separation fix to avoid mixing auto_unblock vs CI recovery counts
- `github(http): request latest check-runs per name (filter=latest) (#1283)`: avoids picking oldest stale check-run when multiple exist per name
- `bug: incomplete task_runs with NULL outcome block auto-unblock (#1272)`: NULL-outcome runs no longer block the auto-unblock mechanism
- `fix: clear agent/model on auto-unblock (#1269)`: ensures re-routing picks fresh assignment after unblock
- `bug: has_commits_ahead returns false on git failures, skips push + PR creation (#1291)`: git errors were silently blocking PR creation
- `bug: model_for_complexity returns None when all pool models cooled (#1278)`: dispatch was proceeding with empty model; opencode exits 0 silently in this case
- `fix: re-route external tasks that complete without code changes`: tasks that reported success without a PR now get re-routed to a different agent
- `perf: single git fetch per project at startup (#1274)`: startup time reduced (was N sequential fetches)
- `fix: truncate stdout/stderr to 500 chars in router LLM failure log (#1273)`: very long stdout was polluting router failure logs
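The counter fixes in #1276 and #1284 come down to two rules: keep the auto-unblock and CI-recovery counts separate, and reset the auto-unblock count when the block reason changes. A minimal Python sketch of those rules follows. The `TaskState` class, its methods, and the `max_attempts` default of 3 are assumptions for illustration; only the two counter names come from the commit titles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskState:
    """Hypothetical task record; only the counter fields matter here."""

    blocked: bool = False
    last_block_reason: Optional[str] = None
    auto_unblock_count: int = 0
    ci_recovery_count: int = 0

    def block(self, reason: str) -> None:
        # A different block reason starts a fresh auto-unblock budget.
        if reason != self.last_block_reason:
            self.auto_unblock_count = 0
        self.last_block_reason = reason
        self.blocked = True

    def try_auto_unblock(self, max_attempts: int = 3) -> bool:
        """Unblock if budget remains; return whether the unblock happened."""
        if not self.blocked or self.auto_unblock_count >= max_attempts:
            return False
        self.auto_unblock_count += 1
        self.blocked = False
        return True

    def record_ci_recovery(self) -> None:
        # CI recovery has its own counter and never touches the other one.
        self.ci_recovery_count += 1
```

Tracking the last block reason separately from the blocked flag matters: if unblocking cleared the reason, a task re-blocked for the same reason would look like a new reason and never exhaust its budget.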
Operational Health
Agent Performance (last 24h task_runs)
| Agent | Model | Success | Failed | Rate-limit | NULL | Notes |
|---|---|---|---|---|---|---|
| minimax | opus | 72 | 3 | 2 | 4 | Primary workhorse |
| opencode | gpt-5-mini | 39 | 3 | 0 | 2 | Healthy |
| claude | sonnet | 12 | 3 | 2 | 6 | 6 NULL needs watch |
| opencode | (empty) | 2 | 7 | 0 | 0 | model_for_complexity=None bug — now fixed in #1278 |
| kimi | opus/sonnet | 0 | 2 | 5 | 0 | Billing exhausted, cooldown active |
| codex | gpt-5.2-codex | 3 | 1 | 1 | 0 | Low volume, healthy |
Notable: The 7 opencode/empty-model failures (dispatch with a null model) are explained by #1278, which was just fixed, so they should disappear. Claude has 6 NULL-outcome runs; these typically indicate mid-session cleanup races. Worth watching, but not alarming given the high success count.
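For reference, the per-agent success rates implied by the table can be computed as below. This is a small Python sketch; counting rate-limited and NULL-outcome runs in the denominator is an assumption about how the rate should be defined, not something this review specifies.

```python
# (agent, model, success, failed, rate_limit, null), copied from the table above.
RUNS = [
    ("minimax", "opus", 72, 3, 2, 4),
    ("opencode", "gpt-5-mini", 39, 3, 0, 2),
    ("claude", "sonnet", 12, 3, 2, 6),
    ("opencode", "(empty)", 2, 7, 0, 0),
    ("kimi", "opus/sonnet", 0, 2, 5, 0),
    ("codex", "gpt-5.2-codex", 3, 1, 1, 0),
]


def success_rate(success: int, failed: int, rate_limit: int, null: int) -> float:
    """Fraction of all recorded runs that succeeded (0.0 if there were none)."""
    total = success + failed + rate_limit + null
    return success / total if total else 0.0


for agent, model, *counts in RUNS:
    print(f"{agent:9} {model:14} {success_rate(*counts):6.1%}")
```

Under that definition minimax/opus succeeds on 72 of 81 runs (about 89%), while claude/sonnet succeeds on 12 of 23 (about 52%), with NULL outcomes making up 6 of its 23 runs.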
Task Activity (last 12h)
- 1,356 status changes — high throughput day
- 401 dispatches
- 65 errors — low relative to activity
- 51 review decisions, 50 PR creates — review pipeline healthy
- 5 auto-unblocks (small, expected)
Log Errors
Recurring in brew error log:
fatal: not a git repository: /Users/gb/Projects/orch/.git/worktrees/gh-issue-1005-...
fatal: not a git repository: /Users/gb/Projects/orch/.git/worktrees/gh-issue-1130-...
fatal: not a git repository: /Users/gb/Projects/orch/.git/worktrees/gh-issue-1251-...
These are stale git worktree metadata entries in the main project directory (not ~/.orch/worktrees). The worktrees were cleaned up but the .git/worktrees/ metadata entries were never pruned. This is benign (it doesn't block any task) but spams the log. Running `git worktree prune` in the main project dir would clear them. Related to #1225 (startup reconciliation prune fix): it appears the prune runs in ~/.orch/worktrees but not in the user project's .git/worktrees/.
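To confirm which metadata entries are stale before pruning, a check along the following lines could be used. This Python sketch approximates what `git worktree prune` looks at. It relies on git's on-disk layout, where each `.git/worktrees/<name>/gitdir` file records the path of that worktree's own `.git` file. The function name `stale_worktree_entries` is hypothetical, and the sketch ignores prune's expiry option.

```python
from pathlib import Path


def stale_worktree_entries(repo_root: Path) -> list[str]:
    """List .git/worktrees/<name> entries whose worktree no longer exists.

    An entry counts as stale when its `gitdir` file is missing or points
    at a path that is gone. Entries with a `locked` file are skipped,
    since prune leaves locked worktrees alone.
    """
    meta_dir = repo_root / ".git" / "worktrees"
    if not meta_dir.is_dir():
        return []

    stale = []
    for entry in sorted(meta_dir.iterdir()):
        if not entry.is_dir() or (entry / "locked").exists():
            continue
        gitdir_file = entry / "gitdir"
        if not gitdir_file.is_file():
            stale.append(entry.name)
            continue
        # The file holds one line: the path to <worktree>/.git
        target = Path(gitdir_file.read_text().strip())
        if not target.exists():
            stale.append(entry.name)
    return stale
```

If this returns the same names that appear in the error log, that would support the diagnosis above; an empty result would suggest the errors have a different cause.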
Stuck / Blocked Tasks
| ID | Status | Agent | Tries | Title |
|---|---|---|---|---|
| #1245 | blocked | minimax | — | startup rebase destroys worktree with unstaged changes (15h blocked) |
| #1247 | in_progress | claude | 4 | silence detection spurious needs_review + review cycle |
| #1250 | in_progress | opencode | 3 | handle_failover no fallback path never records agent cooldown |
| #1267 | in_progress | claude | 3 | blocking std::fs calls in async code |
| #1271 | in_progress | claude | 2 | engine marks code tasks done when no PR |
| #1292 | in_progress | claude | 2 | detect_rate_limit false positive |
| #1244 | in_progress | claude | 3 | model cooldowns in-memory only (lost on restart) |
#1245 is blocked (15h) — the startup rebase worktree destruction bug. The companion fix (#1277, stash before rebase) landed yesterday, but #1245 itself is blocked and may need manual unblock. Worth checking if the fix would resolve the blocked state.
#1247 at 4 tries — silence detection race condition (kill vs exit-0). Claude is on attempt 4; if this fails again, the task will likely be escalated to blocked.
Retro Follow-ups
From the 2026-03-29 evening retro:
| Item | Status |
|---|---|
| opencode silent exit-0 (model=None path) | ✅ Fixed — #1278 merged |
| Stale copilot models (#1257) | Not in current open list — likely merged |
| #1244 in-memory cooldowns lost on restart | Still open, in_progress (claude) |
| #1227 + #1232 auto_unblock counter sharing | ✅ Fixed — #1276 and fix: restore ci_recovery_count (#1284) |
| #1241 channel thread bindings (4 opencode failures) | Not visible in open issues — likely resolved or blocked-then-closed |
| Stale KV cooldown entries | Not explicitly addressed — stale cooldown:kimi:k2p5 may still be in KV |
Today's Priorities
- Monitor #1245 (blocked, 15h) — startup rebase fix (#1277) is deployed; manually unblock #1245 if the engine doesn't auto-recover it, since the fix is already in place.
- #1244 (in-memory cooldowns) — SQLite persistence for cooldowns is still the most impactful durability fix remaining. Target closure today.
- #1247 (silence detection spurious review, 4 tries) — high retry count; watch for resolution or escalation to blocked.
- Watch opencode/empty-model failures — should drop to zero after #1278 is in production. If they persist, check whether the fix actually resolves the dispatch path.
- Stale git worktree metadata: the `fatal: not a git repository` log spam. Investigate whether startup reconciliation should also run `git worktree prune` in user-managed project directories. Low urgency, but the log noise is misleading.
- Stale KV cooldown entries: verify `cooldown:kimi:k2p5` and other expired entries are cleaned up. If not, the cooldown expiry check may not be evicting them.
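The two cooldown items (persistence for #1244 and eviction of expired entries) could be handled by one small store. The Python sketch below uses the standard `sqlite3` module; the `CooldownStore` class, the table schema, and the evict-on-read policy are all assumptions for illustration, not the design chosen in #1244.

```python
import sqlite3
import time
from typing import Optional


class CooldownStore:
    """Hypothetical SQLite-backed cooldown table (schema is an assumption)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cooldowns ("
            "key TEXT PRIMARY KEY, retry_at REAL NOT NULL)"
        )
        self.db.commit()

    def set(self, key: str, retry_at: float) -> None:
        # Replace any existing row so a newer retry-at wins.
        self.db.execute(
            "INSERT OR REPLACE INTO cooldowns (key, retry_at) VALUES (?, ?)",
            (key, retry_at),
        )
        self.db.commit()

    def is_cooled(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Evict expired rows on every read so stale entries cannot linger.
        self.db.execute("DELETE FROM cooldowns WHERE retry_at <= ?", (now,))
        self.db.commit()
        row = self.db.execute(
            "SELECT 1 FROM cooldowns WHERE key = ?", (key,)
        ).fetchone()
        return row is not None
```

Opening the store on a file path instead of `:memory:` lets cooldowns survive a restart, and evicting on read means an expired `cooldown:kimi:k2p5` row would be removed the first time any cooldown is checked after its retry-at time passes.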