Gabriel Koerich Orch

Daily Review — 2026-06-06

What Shipped (Last 24h)

1 new commit landed on gabrielkoerich/orch:

| Commit | Description |
| --- | --- |
| 3f26c6f2 | fix(cooldown): sync in-memory map from KV every tick so external clears land |
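
The cooldown fix (3f26c6f2) is described as syncing the in-memory map from KV on every tick so that external clears take effect. A minimal Rust sketch of that idea follows. `KvSnapshot`, `CooldownMap`, and `sync_from_kv` are hypothetical names for illustration, not orch's actual types, and expiry is assumed to be stored as unix seconds.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the persisted KV store:
/// cooldown key -> expiry time in unix seconds (assumed representation).
pub type KvSnapshot = HashMap<String, u64>;

/// Hypothetical in-memory cooldown view held by the scheduler.
#[derive(Debug, Default)]
pub struct CooldownMap {
    entries: HashMap<String, u64>,
}

impl CooldownMap {
    /// Rebuild the in-memory view from the KV snapshot, dropping expired entries.
    /// A key that was cleared externally is absent from `kv`, so it vanishes here too.
    pub fn sync_from_kv(&mut self, kv: &KvSnapshot, now: u64) {
        self.entries = kv
            .iter()
            .filter(|(_, expires_at)| **expires_at > now)
            .map(|(key, expires_at)| (key.clone(), *expires_at))
            .collect();
    }

    /// True while the key still has an unexpired cooldown.
    pub fn is_cooled(&self, key: &str, now: u64) -> bool {
        self.entries.get(key).map_or(false, |&t| t > now)
    }
}
```

Calling `sync_from_kv` at the start of each tick means a cooldown removed from KV by another process stops blocking routing on the next tick, rather than lingering until the in-memory entry expires.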

Service upgraded to v0.80.1 (was v0.79.1 in yesterday's review). This includes the CLI trim (#3270), orch commit LLM messages (#3269), the cooldown sync fix, and all prior fixes through the v0.79.x line.

No issues were closed since yesterday's review. The 3 bug-fix issues (#3272, #3273, #3274) remain open.

Operational Health

Task Run Summary (Last 24h)

Agent | Model | Success | Failed | Rate Limit | Timeout | Parse Error | Other
claude | sonnet | 1137131 push_failed
claude | opus | 41521 aborted
opencode | deepseek-v4-flash-free | 209611 empty
opencode | nemotron-3-ultra-free | 113112
kimi | opus | 11071 aborted
codex | gpt-5.5 | 34
codex | gpt-5.4 | 31
codex | gpt-5.3 | 02
minimax | opus | 052
opencode | minimax-m3-free | 04
opencode | mimo-v2.5-free | 011

Total agent runs: ~270 (lower than yesterday's ~376 — cooldowns throttled capacity).

Agent Pool Health

  • Active cooldowns:
    • codex — 39m (agent-wide, persisted)
    • kimi — 1d21h (agent-wide, billing cycle exhaustion)
    • kimi:opus — 21h8m (model-specific)
    • minimax — 20h27m (agent-wide, persisted)
  • Degraded agents: codex, kimi, minimax (3 degraded — same as yesterday)
  • Recovered agents: opencode (cleared degradation during this tick)
  • Effective routing pool: claude (sonnet/opus) — effectively single-agent operation

Key Error Patterns

  1. kimi massive cooldown (1d21h) — kimi hit its usage limit and is locked out for nearly 2 days. All 3 open bug fixes (#3272-#3274) are stuck behind kimi's forced agent:kimi label.
  2. minimax 429 persisted (5 agent + 2 rate_limit failures) — agent on 20h cooldown.
  3. opencode empty-output-exit0 (4× deepseek-v4-flash-free) — agent exits with code 0 but no JSON output.
  4. Claude "session limit" misclassified (sonnet 2×, opus 3×) — still classified as failed not rate_limit. #3272 filed but stuck on kimi 429.
  5. Codex gpt-5.3 account restriction (2×) — "not supported when using Codex with a ChatGPT account".
  6. Router LLM pool timed out at 02:39:04 — tried minimax/haiku (20h cooldown), wasted 45s before weighted round-robin fallback selected opencode. This is the same task running this review.
  7. Watchdog triggered at 02:39:24 — tick stalled 79s (threshold 60s) during worktree creation + dispatch.
  8. Multi-agent degradation warning persistent: codex=persisted, kimi=agent_error, minimax=persisted.

Stuck / Blocked Tasks

| Task | Status | Agent/Model | Issue |
| --- | --- | --- | --- |
| internal:151442 | blocked | opencode/gpt-5-mini | Self-improvement (old, Jun 2). Children done but auto-unblock failed. |
| #3272 | new | — (was kimi) | claude session limit misclassification — 5 attempts, all kimi 429 |
| #3273 | blocked | — (was kimi/sonnet) | normalize_status missing aliases — waiting on PR #3275 contributor |
| #3274 | blocked | — (was kimi/opus) | opencode false-positive rate_limit — waiting on PR #3275 contributor |
| internal:151994 | blocked | claude/sonnet | Bean close daily — escalated after 6 retries |
| internal:152092 | new | | Not yet routed (cooled pool) |

Note: #3273 and #3274 have PR #3275 from contributor @Jah-yee, but review requested splitting into separate PRs. #3276 was opened as an alternative with the split. Owner set ~24h hold for contributor response.

Routing Accuracy

  • LLM routing unavailable for most of the period — all agents in the routing LLM pool (kimi, minimax, codex) were cooled.
  • Weighted round-robin fallback selected opencode (weight 0.2) when LLM pool timed out.
  • Effective single-agent mode for execution: only claude sonnet/opus + opencode deepseek-v4-flash-free are available.
  • Router LLM selected minimax/haiku despite 20h cooldown — wasted 45s before timeout. The pool index should skip cooled agents.
  • The agent:kimi labels on #3272-#3274 are now blocking those tasks since kimi is on cooldown. The engine clears the label on failure, but the router keeps re-selecting kimi. The root cause is likely the label override being reapplied by issue sync.
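
The point that the pool index should skip cooled agents can be sketched as a filter applied before any routing call is made. The code below is an illustration under assumptions: `Candidate`, `routable`, and `pick_weighted` are hypothetical names, only the opencode weight of 0.2 comes from this review, and a simple cumulative-weight pick stands in for the engine's weighted round-robin fallback.

```rust
use std::collections::HashSet;

/// Hypothetical routing candidate: an agent name plus its fallback weight.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub agent: String,
    pub weight: f64,
}

/// Drop candidates whose agent is on cooldown, so no time is spent
/// waiting on an agent that cannot answer.
pub fn routable<'a>(pool: &'a [Candidate], cooled: &HashSet<String>) -> Vec<&'a Candidate> {
    pool.iter().filter(|c| !cooled.contains(&c.agent)).collect()
}

/// Pick a candidate by cumulative weight. `cursor` is expected in [0, 1]
/// (for example a rotating counter scaled down). Returns None when nothing is routable.
pub fn pick_weighted<'a>(candidates: &[&'a Candidate], cursor: f64) -> Option<&'a Candidate> {
    let total: f64 = candidates.iter().map(|c| c.weight).sum();
    if candidates.is_empty() || total <= 0.0 {
        return None;
    }
    let mut target = cursor.clamp(0.0, 1.0) * total;
    for c in candidates {
        if target < c.weight {
            return Some(*c);
        }
        target -= c.weight;
    }
    // Floating-point leftovers land on the last candidate.
    candidates.last().copied()
}
```

With minimax in the cooled set, `routable` would never hand minimax/haiku to the router, which is the 45s timeout this review attributes to selecting a cooled agent.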

Performance

  • Watchdog triggered at 02:39:24 — tick stalled 79s. Caused by worktree creation + opencode dispatch during routing cooldown recovery.
  • Router LLM timeout (45s minimax/haiku) — contributed 45s of the 79s stall. Fallback to weighted round-robin succeeded.
  • GitHub GraphQL operations appear healthy (no EOF errors observed today).
  • SQLite query latency minimal across all operations (<1ms for rate limit queries).
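
The watchdog numbers above can be checked with a few lines of arithmetic. This is a sketch only: `STALL_THRESHOLD`, `is_stalled`, and `share_percent` are hypothetical names, and the 60s threshold, 79s stall, and 45s router timeout are the figures reported in this review.

```rust
use std::time::Duration;

/// Watchdog threshold as reported in this review (60s).
pub const STALL_THRESHOLD: Duration = Duration::from_secs(60);

/// A tick counts as stalled once it runs longer than the threshold.
pub fn is_stalled(tick_elapsed: Duration, threshold: Duration) -> bool {
    tick_elapsed > threshold
}

/// Percentage of `whole` that `part` accounts for (0.0 when `whole` is zero).
pub fn share_percent(part: Duration, whole: Duration) -> f64 {
    if whole.is_zero() {
        return 0.0;
    }
    100.0 * part.as_secs_f64() / whole.as_secs_f64()
}
```

With these figures the router timeout accounts for roughly 57% of the stall, and the remaining 34s (79s minus 45s) is below the 60s threshold, so the tick would not have tripped the watchdog without the timeout.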

Evening Update

What Shipped (Afternoon)

6 additional commits landed since the morning review, plus orch upgraded to v0.80.2:

| Commit | Description |
| --- | --- |
| 1834788a | fix(parser): normalize_status aliases + detect_rate_limit word-boundary guard (#3279) |
| 4cb7b176 | ci+review: trigger CI on pull_request, fix sandbox image, add review-pr-ci recipe |
| 6a748d09 | chore(review): sandboxed external-PR review workflow |
| 5f9f459d | docs(agents): require clone+execute inside Docker for external PRs |
| 831a2289 | docs(agents): policy for reviewing external-contributor PRs |
| b5afd534 | bug(runner): claude 'session limit' still classified as failed → #3278 |

Issues closed this afternoon: #3272 (claude session limit misclassification), #3273 (normalize_status missing aliases).

PRs merged: #3278 (maintainer fix for claude session limit), #3279 (contributor @Jah-yee parser fixes — merged after review split).

External PR Review Security Workflow

A full secure review workflow was added for external-contributor PRs:

  • CLAUDE.md: hard policy — no fork clones, no code execution outside Docker, new deps = immediate no-merge
  • scripts/review/Dockerfile.fetch + Dockerfile.run — two-stage sandboxed execution (network-gated fetch → offline run)
  • just review-pr <N> recipe — automated hooks-check + Docker spin-up + cleanup
  • scripts/review/hooks-check.sh — tripwire for .cargo/config.toml, build.rs, CI workflows, shell scripts
  • CI now triggers on pull_request for test validation (but not pull_request_target — fork secrets remain safe)
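
The tripwire idea behind hooks-check.sh can be illustrated in Rust as a path filter over the files a PR changes. This is not the script itself: `is_tripwire` and `flagged` are hypothetical names, and treating `.github/workflows/` as the location of CI workflows is an assumption.

```rust
/// True when a changed path belongs to one of the categories listed above:
/// .cargo/config.toml, build.rs, CI workflows, or shell scripts.
pub fn is_tripwire(path: &str) -> bool {
    let file_name = path.rsplit('/').next().unwrap_or(path);
    path == ".cargo/config.toml"
        || path.ends_with("/.cargo/config.toml")
        || file_name == "build.rs"
        || path.starts_with(".github/workflows/") // assumed CI workflow location
        || file_name.ends_with(".sh")
}

/// Return the subset of changed paths that should stop an automated review
/// and be inspected by a human first.
pub fn flagged<'a>(changed: &[&'a str]) -> Vec<&'a str> {
    changed.iter().copied().filter(|p| is_tripwire(p)).collect()
}
```

A non-empty result from `flagged` would be the signal to halt before any Docker stage runs, since each of these files can execute code at build or CI time.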

Operational Issues Filed (Evening)

| Issue | Title | Severity |
| --- | --- | --- |
| #3283 | bug(runner): claude "weekly limit" misclassified as failed — reset timestamp not parsed | high |

#3283 root cause: detect_rate_limit() in src/engine/runner/agents/mod.rs lacks "weekly limit" in its pattern list. Claude messages like "You've hit your weekly limit · resets Jun 9 at 1am (America/Sao_Paulo)" fall through to a generic Failed classification — no cooldown is set, the 3-day reset timestamp is discarded, and orch retries immediately instead of waiting. Fix: add "weekly limit" alongside the existing "session limit" entry (same parse_retry_at logic applies). Tasks affected: 152324, 152327, 152331.
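
The shape of the proposed fix can be sketched as follows. This is an illustration, not the code in src/engine/runner/agents/mod.rs: `Outcome`, `LIMIT_PATTERNS`, and `classify` are hypothetical names, and the sketch only extracts the raw reset text rather than reproducing what parse_retry_at does with it.

```rust
/// Hypothetical classification result for an agent run.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// A usage limit was hit; `resets` holds the raw text after "resets ", if present.
    RateLimited { resets: Option<String> },
    Failed,
}

/// Phrases treated as usage limits. "weekly limit" is the proposed addition.
const LIMIT_PATTERNS: [&str; 2] = ["session limit", "weekly limit"];

/// Classify an agent message, case-insensitively, as a rate limit or a generic failure.
pub fn classify(message: &str) -> Outcome {
    let lower = message.to_lowercase();
    if LIMIT_PATTERNS.iter().any(|p| lower.contains(*p)) {
        let resets = message
            .split_once("resets ")
            .map(|(_, rest)| rest.trim().to_string());
        Outcome::RateLimited { resets }
    } else {
        Outcome::Failed
    }
}
```

With only "session limit" in the list, the weekly-limit message quoted above would fall through to `Failed`, which matches the immediate-retry behaviour reported for tasks 152324, 152327, and 152331.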

Current Cooldown State (Evening)

| Key | Remaining | Reason |
| --- | --- | --- |
| codex:gpt-5.3 | 17h | persisted |
| kimi | 22h13m | persisted (billing) |
| minimax | 1d21h | persisted |
| minimax:opus | 1d21h | persisted |
| opencode/deepseek-v4-flash-free | 1d3h | persisted |
| opencode/mimo-v2.5-free | 4h26m | persisted |
| opencode/minimax-m3-free | 4h58m | persisted |
| opencode/nemotron-3-ultra-free | 8h11m | persisted |

ALL opencode models are cooled at time of this review — all agents in the routing pool hit cooldowns simultaneously. Effective pool for next ~4h: claude sonnet/opus + codex (gpt-5.4/gpt-5.5 only).
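
The "~4h" figure follows from the shortest remaining cooldown in the table. A small sketch that parses the remaining-time strings and finds the first key to clear is shown below. `parse_minutes` and `first_to_clear` are hypothetical helpers, and the `d`/`h`/`m` suffix format is inferred from the values in this review.

```rust
/// Parse strings such as "1d21h", "22h13m", or "17h" into total minutes.
/// Returns None for empty input, unknown units, or trailing digits without a unit.
pub fn parse_minutes(s: &str) -> Option<u64> {
    if s.is_empty() {
        return None;
    }
    let mut total = 0u64;
    let mut digits = String::new();
    for ch in s.chars() {
        if ch.is_ascii_digit() {
            digits.push(ch);
            continue;
        }
        let n: u64 = digits.parse().ok()?;
        digits.clear();
        total += match ch {
            'd' => n * 24 * 60,
            'h' => n * 60,
            'm' => n,
            _ => return None,
        };
    }
    if digits.is_empty() { Some(total) } else { None }
}

/// Return the key with the shortest remaining cooldown, with its minutes.
/// Entries whose remaining time cannot be parsed are skipped.
pub fn first_to_clear<'a>(entries: &[(&'a str, &str)]) -> Option<(&'a str, u64)> {
    entries
        .iter()
        .filter_map(|(key, remaining)| parse_minutes(remaining).map(|m| (*key, m)))
        .min_by_key(|(_, m)| *m)
}
```

Applied to the table above, the first key to clear is opencode/mimo-v2.5-free at 4h26m (266 minutes), followed by opencode/minimax-m3-free at 4h58m.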

Remaining Stuck Tasks

| Task | Status | Age | Issue |
| --- | --- | --- | --- |
| #3281 | blocked | 4h, 5 attempts | control: split oversized messages — opencode failing on all attempts |
| #3274 | blocked | 1d, 3 attempts | opencode false-positive rate_limit (#3279 partially addressed word-boundary fix) |
| internal:151442 | blocked | 4d | Self-improvement — children done but auto-unblock still stale |

Task Run Summary (Full Day)

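| Agent | Model | Success | Failed | Other |
| --- | --- | --- | --- | --- |
| claude | sonnet | 74 | 4 | |
| opencode | nemotron-3-ultra-free | 43 | 4 | 1 parse_error, 1 timeout |
| codex | gpt-5.5 | 25 | 2 | 1 blocked |
| codex | gpt-5.4 | 18 | | 1 rate_limit |
| claude | opus | 17 | 5 | 3 blocked |
| opencode | deepseek-v4-flash-free | 10 | 3 | |
| opencode | mimo-v2.5-free | 10 | 3 | 2 aborted, 2 timeout |
| opencode | minimax-m3-free | 8 | 2 | 3 timeout |
| minimax | opus | 0 | 6 | — (all failed) |
| codex | gpt-5.3 | 0 | 4 | — (model restricted) |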

Activity totals (24h): 315 dispatches · 222 pushes · 103 review starts · 89 review decisions · 86 PR creates · 40 errors · 9 timeouts · 3 auto-unblocks.

Tomorrow's Priorities

  1. Fix #3283 (claude weekly limit misclassification) — add "weekly limit" to detect_rate_limit() patterns alongside "session limit". The parse_retry_at logic already handles the reset timestamp format; the fix is a one-line addition + regression test. Without it, 3-day cooldowns are discarded and orch retries immediately on weekly limit exhaustion.
  2. Fix #3274 (opencode rate_limit false-positive) — the word-boundary guard in #3279 may not be sufficient; the root cause is nextest output containing test function names with rate_limit. Needs a smarter check (e.g., JSON output gate or test-output exclusion pattern).
  3. Fix #3281 (control oversized messages) — 5 attempts, still blocked. Assign to claude when opencode remains cooled. Message chunking is a straightforward string-split task.
  4. Unblock internal:151442 — 4-day-old self-improvement task, children done, auto-unblock stale. Check orch task unblock all.
  5. Monitor kimi/minimax cooldown recovery — kimi clears in ~22h, minimax in ~1d21h. When they recover, re-route any remaining backlog.
  6. Verify #3279 parser fix: detect_rate_limit now uses a word-boundary guard. Watch for any residual false-positives on rate_limit in test output over the next cycle.
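
To make priorities 2 and 6 concrete, here is a minimal sketch of a word-boundary guard, assuming "word" means a run of ASCII letters, digits, and underscores. `contains_word` is a hypothetical helper, not the code merged in #3279, and the sample test names are invented for illustration.

```rust
/// Characters that can be part of an identifier such as `test_rate_limit_parsing`.
fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// True when `needle` occurs in `haystack` without an identifier character
/// directly before or after it.
pub fn contains_word(haystack: &str, needle: &str) -> bool {
    if needle.is_empty() {
        return false;
    }
    haystack.match_indices(needle).any(|(start, m)| {
        let before = haystack[..start].chars().next_back();
        let after = haystack[start + m.len()..].chars().next();
        !before.map_or(false, is_ident_char) && !after.map_or(false, is_ident_char)
    })
}
```

Under this guard, a line containing `test_rate_limit_parsing` no longer matches, but a test path such as `parser::rate_limit` still does because `:` is not an identifier character. That is one way a guard of this kind could leave residual false positives, which is consistent with the suggestion of a JSON output gate or a test-output exclusion pattern.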

Morning section prepared by internal:152037 (attempt 4). Evening update by internal:152385 (attempt 3).
