Gabriel Koerich Orch

Daily Review — 2026-06-09

What Shipped (Since 2026-06-09 Morning)

10 commits landed today across two batches (morning + evening), closing all 3 operational priorities from the 06-06 review:

Batch 1 (morning) — Prior report

| Commit | Description |
| --- | --- |
| 90cee705 | control: split oversized messages into chunks instead of truncating (#3282) |
| 86a49fef | fix(runner): detect "weekly limit" as rate_limit, not failed (#3285) |
| ec517123 | Daily review (last 24h) (#3284) |
| 1834788a | fix(parser): normalize_status aliases + detect_rate_limit word-boundary guard (#3279) |
| 4cb7b176 | ci+review: trigger CI on pull_request, fix sandbox image, add review-pr-ci recipe |
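
To illustrate the chunking fix in #3282, here is a minimal sketch of splitting an oversized message rather than truncating it. The function name `split_message` and the `limit` parameter are hypothetical; the report does not show the real implementation or the actual size limit. The sketch prefers to cut at a newline so chunks stay readable, and falls back to a hard cut when a single line exceeds the limit.

```python
from typing import List


def split_message(text: str, limit: int) -> List[str]:
    """Split text into chunks of at most `limit` characters without losing content."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: List[str] = []
    remaining = text
    while len(remaining) > limit:
        # Prefer the last newline inside the window; keep the newline in this chunk.
        cut = remaining.rfind("\n", 0, limit) + 1
        if cut == 0:
            # No newline in the window: hard cut at the limit.
            cut = limit
        chunks.append(remaining[:cut])
        remaining = remaining[cut:]
    if remaining:
        chunks.append(remaining)
    return chunks


chunks = split_message("line one\nline two\nline three", 12)
```

Because every character ends up in exactly one chunk, concatenating the chunks reproduces the original message.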

Batch 2 (evening) — New since prior report

| Commit | Description |
| --- | --- |
| 0a1e5380 | fix(router): check agent-level cooldown in call_router_llm before model-level check (#3289) |
| 28ee1992 | fix(runner): all-agents-exhausted resets to new/blocked instead of needs_review (#3290) |
| b4a18d76 | docs(posts): daily review for 2026-06-09 (#3288) |
| e2a8986c | bug(parser): normalize_status missing 'NO_SETUPS', 'alerts' (plural), 'not_configured' (#3292) |
| 79f65f16 | Self-improvement: debug agent errors and fix root causes (#3293) |
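
The alias gap fixed in #3292 can be pictured as a lookup table that maps agent-reported strings onto a small set of canonical statuses. The sketch below is an assumption-heavy illustration: only the three alias strings come from this report, while the canonical set, the targets each alias maps to, and the function signature are placeholders.

```python
import re
from typing import Optional

# Assumed canonical statuses; "done" in particular is a placeholder.
CANONICAL = {"new", "in_progress", "blocked", "needs_review", "done"}

# Alias keys are from the report; the targets they map to are hypothetical.
ALIASES = {
    "no_setups": "done",
    "alerts": "done",
    "alert": "done",
    "not_configured": "blocked",
}


def normalize_status(raw: str) -> Optional[str]:
    """Return a canonical status, or None when the string is unrecognized."""
    # Case-fold and unify separators so "NO_SETUPS" and "not-configured" both resolve.
    key = re.sub(r"[\s-]+", "_", raw.strip().lower())
    if key in CANONICAL:
        return key
    return ALIASES.get(key)
```

Returning None for unknown strings lets the caller distinguish a parse problem from a genuine task failure.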

Service version: v0.80.7 (unchanged — fixes committed but not yet released).

Issues Closed

| Issue | Title | Status |
| --- | --- | --- |
| #3286 / #3289 | fix(router): check agent-level cooldown in call_router_llm | ✅ Committed, pending deployment |
| #3287 / #3290 | fix(runner): all-agents-exhausted needs_review refire loop | ✅ Committed, pending deployment |
| #3274 | bug(runner): opencode false-positive rate_limit on cargo test | ✅ Resolved (word-boundary guard) |
| #3291 / #3292 | bug(parser): normalize_status missing NO_SETUPS, alerts, not_configured | ✅ Committed |
| #3285 / #3283 | fix(runner): detect "weekly limit" as rate_limit | ✅ Shipped |
| #3281 / #3282 | control: split oversized messages | ✅ Shipped |
| #3272 | claude session limit misclassification | ✅ Shipped |
| #3271 | ALL AGENTS COOLED false fire | ✅ Shipped |
| #3268 | orch commit: LLM message generation | ✅ Shipped |

All 3 priorities from the 06-06 review are now resolved. Priority 2 (LLM pool should skip cooled agents) is addressed by #3289 but requires a release deploy.
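
The word-boundary guard behind the #3274 resolution can be sketched with a regular expression. The phrase list below is hypothetical (the real detect_rate_limit patterns are not shown in this report), and the cargo test line is an invented example of the kind of output that a plain substring match would misclassify.

```python
import re

# Hypothetical phrase list; \b stops matches inside identifiers and longer numbers.
RATE_LIMIT_PATTERN = re.compile(
    r"\b(?:rate[ _-]?limit(?:ed)?|weekly limit|usage limit|429)\b",
    re.IGNORECASE,
)


def detect_rate_limit(output: str) -> bool:
    """Return True when agent output looks like a provider rate limit."""
    return RATE_LIMIT_PATTERN.search(output) is not None
```

Since `_` counts as a word character, a test name such as `test_rate_limit_parsing` no longer matches, and neither does a number like 4290. The trade-off is that an identifier-style error code with extra suffixes would also be skipped, so the phrase list has to cover the exact provider wording.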

Operational Health

Task Run Summary (Last 24h)

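| Agent | Model | Success | Failed | Timeout | Parse Error | Other |
| --- | --- | --- | --- | --- | --- | --- |
| claude | sonnet | 52 | 8 | | | |
| opencode | nemotron-3-ultra-free | 39 | 6 | 1 | 1 | 2 rate_limit |
| opencode | deepseek-v4-flash-free | 22 | 1 | 3 | | 2 empty |
| opencode | mimo-v2.5-free | 16 | 3 | 4 | 1 | |
| kimi | opus | | 11 | | | 1 rate_limit |
| minimax | opus | | 8 | | | |
| codex | gpt-5.3 | | 4 | | | |
| codex | gpt-5.5 | | 1 | | | |
| opencode | north-mini-code-free | 3 | | | | |
| opencode | (other) | 1 | | | | |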

Total dispatches (24h): 265 (up from 148 in the morning). Opencode free-tier + claude/sonnet handled the entire workload; claude stepped up significantly in the afternoon/evening session as codex/kimi/minimax remain degraded.

Task activity totals: 929 status changes · 265 dispatches · 131 routes · 138 pushes · 61 PR creates · 49 errors · 39 reroutes · 48 review decisions · 8 timeouts · 384 branch deletes.

Agent Pool Health

Three agents remain degraded:

| Agent | Status | Reason | Expected Recovery |
| --- | --- | --- | --- |
| codex | Degraded + cooled | Usage limit hit; retry-at Jun 10 9:31 PM | ~36h |
| kimi | Degraded + cooled | Persisted cooldown (billing cycle) | Unknown |
| minimax | Degraded + cooled | 429 usage limit (code 2056) · repeated failures | Unknown |

Effective routing pool: opencode free-tier (nemotron-3, mimo-v2.5, deepseek-v4-flash) + claude/sonnet as fallback.

Router LLM pool: minimax/haiku is still in the pool but timing out consistently (45s timeout fires, then fallback to weighted round-robin → claude/sonnet). This wastes 45s on every routing attempt that hits minimax. The router should skip cooled agents in the LLM pool, not just in execution routing.

Key Error Patterns

  1. Codex billing limit (unchanged) — Hit usage ceiling. parse_retry_at correctly parsed "Jun 10th, 2026 9:31 PM" → cooldown until 2026-06-11 00:31 UTC. Codex remains degraded.

  2. Minimax 429 (code 2056) (unchanged) — "Request rejected (429) · usage limit exceeded (2056)" appearing repeatedly. Minimax remains degraded with no clear recovery window.

  3. Kimi 11 failures + 1 rate_limit — All kimi/opus runs failing. Billing cycle exhaustion at the provider level. The single rate_limit event suggests the system tried to parse a retry-at timestamp but the billing cooldown dominates.

  4. Router LLM pool timeout — Fix committed (#3289: check agent-level cooldown in call_router_llm) but not deployed. The live service still attempted minimax/haiku at 23:01 UTC and timed out after 45s before falling back to weighted round-robin → opencode. Deploying v0.80.8 will resolve this.

  5. Claude/sonnet 8 failures — Claude is picking up tasks that codex/kimi/minimax would normally handle (38% of all dispatches vs. much less previously). The 8 failures may reflect task-agent mismatch for tasks designed for other agents. Worth monitoring — if failures trend above 15%, investigate.

  6. Watchdog alert (69s) — Tick stalled 69s at 23:01 UTC during worktree creation + routing + dispatch of the daily review task. Same pattern as the morning's 61s stall. Root cause: router LLM timeout (45s) + worktree creation overhead. Should be mitigated by #3289 deployment.

  7. Self-improvement task completed — internal:152792 successfully analyzed root causes, addressing 5 agent error patterns across parser/normalize_status, detect_rate_limit, weekly-limit detection, opencode variant support, and kimi/minimax rate-limit classification. The self-improvement loop is functioning correctly.
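
The retry-at conversion in item 1 can be reproduced with a short sketch. The signature below is hypothetical and not the project's actual parse_retry_at; the UTC-3 local offset is an assumption inferred from the report's own conversion (9:31 PM local becoming 00:31 UTC the next day).

```python
import re
from datetime import datetime, timedelta, timezone


def parse_retry_at(text: str, local_offset_hours: int = -3) -> datetime:
    """Parse a provider retry-at string and return the cooldown end in UTC."""
    # Drop ordinal suffixes ("10th" -> "10") so strptime can read the day.
    cleaned = re.sub(r"(\d+)(?:st|nd|rd|th)\b", r"\1", text)
    local = datetime.strptime(cleaned, "%b %d, %Y %I:%M %p")
    # Assumed offset: the provider message carries no timezone of its own.
    tz = timezone(timedelta(hours=local_offset_hours))
    return local.replace(tzinfo=tz).astimezone(timezone.utc)


cooldown_until = parse_retry_at("Jun 10th, 2026 9:31 PM")
```

With the assumed offset, the parsed value lands on 2026-06-11 00:31 UTC, which matches the cooldown recorded for codex.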

Stuck / Blocked Tasks

Previously Blocked — Now Resolved

The 10+ blocked trading/bean tasks from this morning's report are all resolved:

  • 152672, 152675, 152677, 152686, 152689, 152690, 152693, 152370, 152431: all completed
  • The entire morning trading batch ran successfully on opencode free-tier models
  • No SSH/push dependency chain issues remain; the cleanup command "orch task unblock all" cleared the accumulated blockages

Currently Blocked (Orch — stale, not actionable)

| Task | Title | Age | Tries | Issue |
| --- | --- | --- | --- | --- |
| 148985 | Research: Anthropic prompt framework | 37d | 1 | Blocked: needs human review, no retry code |
| 149038 | Research: Monitor USDPT on Solana | 36d | 1 | Blocked: needs human review, no retry code |

These are research tasks that were blocked awaiting human review. They are not operational.

Still Blocked (Bean — security audit findings)

More than 30 tasks in the bean/oblivion project (security audit findings from April) remain blocked. These are audit-discovered bugs that need manual prioritization; they won't auto-resolve. If still relevant, they should be re-triaged.

In Progress

| Task | Title | Agent | Status |
| --- | --- | --- | --- |
| internal:152928 | Daily review (this task) | opencode/deepseek-v4-flash-free | in_progress |
| internal:152929 | Daily evening retrospective | claude/sonnet | in_progress |

Routing Accuracy

  • LLM routing: Degraded. Fix committed (#3289) but not deployed. The live service still attempts minimax/haiku on every routing tick and times out after 45s before falling back to weighted round-robin, wasting 45s per tick.
  • Weighted round-robin: Working correctly. When LLM pool fails, fallback selects claude → opencode by routing weight (0.2).
  • Cooldown system: Working correctly. codex/gpt-5.5 retry-at parsed accurately. kimi/minimax on extended billing-cycle cooldowns.
  • Agent failure routing: Failover from minimax → claude/sonnet and codex → claude triggered correctly throughout the day.
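
For readers unfamiliar with the fallback path, the sketch below shows one common variant, smooth weighted round-robin. It is not confirmed to be the algorithm the router uses, and the integer weights are hypothetical stand-ins for the routing weights mentioned above.

```python
from typing import Dict, List


def smooth_weighted_round_robin(weights: Dict[str, int], picks: int) -> List[str]:
    """Return `picks` selections, interleaved in proportion to each agent's weight."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    order: List[str] = []
    for _ in range(picks):
        # Every candidate gains its weight each round.
        for name, weight in weights.items():
            current[name] += weight
        # The current leader is chosen and pushed back by the total weight.
        chosen = max(current, key=lambda name: current[name])
        current[chosen] -= total
        order.append(chosen)
    return order


# Hypothetical weights: opencode is preferred four times as often as claude.
order = smooth_weighted_round_robin({"opencode": 4, "claude": 1}, 5)
```

Over one full cycle each agent is picked exactly as often as its weight, and the lighter agent is spread through the cycle rather than always going last.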

Deployment needed: #3289 adds is_agent_in_cooldown() check in call_router_llm before attempting the LLM call. The next release (v0.80.8) will eliminate the 45s wasted timeout on every routing attempt.
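
The effect of the #3289 check can be sketched as follows. The names is_agent_in_cooldown and call_router_llm come from this report, but the signatures, the cooldown map, the `ask_llm` callback, and the `claude/haiku` pool entry are hypothetical. The point of the sketch is that a cooled agent is skipped before any LLM call is made, so no timeout is spent on it.

```python
from typing import Callable, Dict, List, Optional


def is_agent_in_cooldown(agent: str, cooldown_until: Dict[str, float], now: float) -> bool:
    """True while the agent-level cooldown timestamp is still in the future."""
    return cooldown_until.get(agent, 0.0) > now


def call_router_llm(
    pool: List[str],
    cooldown_until: Dict[str, float],
    ask_llm: Callable[[str], str],
    now: float,
) -> Optional[str]:
    """Ask the first usable pool entry for a route; None means fall back."""
    for entry in pool:
        agent = entry.split("/", 1)[0]  # "minimax/haiku" -> "minimax"
        if is_agent_in_cooldown(agent, cooldown_until, now):
            continue  # skipped up front, no timeout is spent
        try:
            return ask_llm(entry)
        except TimeoutError:
            continue  # try the next pool entry
    return None  # caller falls back to weighted round-robin
```

In this shape, a pool made up entirely of cooled agents returns None immediately, and the caller goes straight to the weighted round-robin fallback.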

Priorities for Tomorrow (2026-06-10)

  1. Deploy v0.80.8 — Merge and push to main. The release pipeline auto-tags, publishes to Homebrew, and restarts the service. Three critical fixes are pending:

    • #3289: router LLM skips cooled agents (eliminates 45s timeout on every routing tick)
    • #3290: all-agents-exhausted resets properly (stops spurious needs_review refire loop)
    • #3292: parser normalize_status missing aliases (prevents false task failures)
    git push origin main
    gh run watch --exit-status
    brew update && brew upgrade orch
    brew services restart orch
  2. Deploy after CI: run git push, then gh run watch --exit-status, then brew update && brew upgrade orch && brew services restart orch.

  3. Monitor codex recovery — Codex usage limit clears Jun 10 9:31 PM. After recovery:

    • Verify gpt-5.5 routes correctly
    • Verify gpt-5.3 (account-restricted) stays in permanent cooldown via record_persistent_model_failure
    • Watch for routing weight restoration
  4. Monitor kimi/minimax recovery — Both remain on extended billing-cycle cooldowns with no clear recovery window. When they recover:

    • Verify the LLM pool picks them back up correctly
    • Watch for routing weight restoration
    • Check that model cooldowns don't immediately re-fire
  5. Watch claude/sonnet failure rate — 8 failures in the last 24h (vs. 52 successes = 13% failure rate). If this trends up, investigate whether tasks designed for codex/kimi are poorly suited to claude, or if claude is hitting its own usage limits.

  6. Stale bean/oblivion blocked tasks — The cluster of ~30 blocked security audit findings from April needs human triage. If relevant, these should be retried or closed. This is not a code fix — it requires the owner to review and decide.
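
The failure-rate figure in item 5 can be checked with a few lines. The helper name is hypothetical; the counts (52 successes, 8 failures) and the 15% investigation threshold are the ones quoted in this report.

```python
def failure_rate(successes: int, failures: int) -> float:
    """Fraction of completed runs that failed; 0.0 when there were no runs."""
    total = successes + failures
    return failures / total if total else 0.0


rate = failure_rate(successes=52, failures=8)  # 8 / 60
needs_investigation = rate > 0.15  # threshold quoted in the error patterns section
```

At 8 failures out of 60 runs the rate is roughly 13%, so the threshold is not yet crossed; two more failures against the same success count would take it past 15%.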


Prepared by internal:152928 (evening update — routed opencode/deepseek-v4-flash-free via weighted round-robin after LLM pool timed out on minimax/haiku).
