Gabriel Koerich Orch

Evening Retrospective — 2026-04-14

Summary

Lighter commit day (10 merges vs. yesterday's 28), but the coverage is meaningful: Ollama local model routing ships as a feature, corrupted worktree index recovery lands, and the cooldown setters are now async, so cooldowns are reliably persisted to KV and survive restarts. 59 tasks completed in the last 12h, 156 in the last 24h.

The dominant concern carried over and got worse: claude/opus fell from a 50% to a 27% success rate, the third consecutive day of decline. In addition, 10 "no PR or code changes produced" failures appeared across multiple agents, not just opus. Both patterns warrant investigation.


What was accomplished today

10 commits merged:

Commit | Issue | Description
86a990de | #2651 | Fallback message for empty task_runs.error; addresses the silent-failure diagnostic gap
f87bc031 | #2647 | Hoist batch_session_active() into tick(); eliminates a duplicate tmux subprocess per cycle (perf)
50dc043e | #2646 | New PushResult::NoCommits variant; semantic fix, since this case previously returned PushResult::Failed
fedaf9f5 | #2645 | Security: has_leaks triggers only on high-confidence patterns; reduces false positives in the gitleaks scan
190c086a | - | set_model_cooldown / set_agent_cooldown are now async, guaranteeing KV persistence (previously fire-and-forget)
f96a1cc9 | #2641 | RouterConfig::from_config() is called once per review cycle instead of once per dispatch
976ef4f1 | #2610 | files_modified falls back to empty when the agent doesn't report it
ddf5636e | #2623 | Ollama local model routing; new feature that routes tasks to local models via Ollama
b26b7ab2 | #2636 | GitHub API calls made fire-and-forget in tick phases 1b, 2, and 4; reduces tick latency
d33f4463 | #2635 | Corrupted worktree index recovery; setup_worktree detects and recovers from a bad index

Morning priorities — status

Priority | Status
Fix CLI version mismatch | Not verified: brew upgrade orch is still pending a check. Do this first.
Investigate claude/opus 50% rate | Worsened: now 27% (3 success / 8 failed in 12h). Three-day declining trend.
Verify tick loop stall resolved | A new stall (#2633, 350s) was filed and closed today; it is a separate instance. Monitoring continues.
Monitor kimi recovery (Apr 15) | cooldown:kimi has 7h1m remaining. On track for ~03:00 UTC Apr 15.
Investigate claude/(blank) model | Not investigated. Carry forward.
Review blocked tasks | 47 tasks blocked. Not audited today.

Agent health (12h snapshot)

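Agent | Model | Success | Failed | Other | Rate
claude | sonnet | 34 | 23 | 1 timeout | 59%
claude | opus | 3 | 8 | - | 27%
glm | opus | 11 | 6 | 2 rl, 1 to | 55%
minimax | opus | 23 | 3 | 4 rl, 1 to | 74%
opencode | gpt-5-mini | 12 | 0 | - | 100%
opencode | minimax-m2.5-free | 15 | 1 | - | 94%
opencode | nemotron-3-super-free | 6 | 5 | - | 55%
opencode | gemini-3.1-pro-preview | 2 | 6 | 1 unknown | 25%
opencode | gpt-5.4 | 1 | 4 | - | 20%
opencode | claude-sonnet-4.6 (copilot) | 0 | 4 | - | 0%
opencode | claude-opus-4.6 (copilot) | 0 | 2 | - | 0%

(rl = rate-limited, to = timed out)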
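The Rate column appears to use this formula: success / (success + failed + timeouts + rate limits), rounded to the nearest percent. The single "unknown" run is left out of the gemini row. That formula reproduces every row above; for example, claude/sonnet is 34 / (34 + 23 + 1) ≈ 59%. It is inferred from these numbers, not taken from orch's code.

The helper below recomputes a figure when checking a snapshot by hand. It uses awk, and its name, rate, is hypothetical:

```shell
#!/usr/bin/env bash
# rate SUCCESS FAILED [OTHER]
# Assumed formula (inferred from the snapshot, not from orch's code):
#   rate = SUCCESS / (SUCCESS + FAILED + OTHER), rounded to the nearest percent,
# where OTHER counts timeouts and rate limits; "unknown" outcomes are left out.
rate() {
  awk -v s="$1" -v f="$2" -v o="${3:-0}" 'BEGIN {
    total = s + f + o
    if (total == 0) { print "n/a"; exit }
    # Adding 0.5 before %d (which truncates) rounds half up for non-negative values.
    printf "%d%%\n", 100 * s / total + 0.5
  }'
}

rate 34 23 1   # claude/sonnet -> 59%
rate 3 8       # claude/opus   -> 27%
rate 11 6 3    # glm/opus      -> 55%
```

If a future snapshot stops matching this formula, the Other column is probably being counted differently, so the rows are no longer directly comparable across days.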

Notable:

  • claude/opus at 27%, the worst yet. Error patterns: empty errors (pre-fix noise that should clear), "no PR or code changes produced" (2 of them on opus), plus general task failures.
  • claude/sonnet at 59% — down from 69% yesterday. Concerning but less dramatic.
  • opencode/gpt-5-mini 100% — steady best-performing free model. Carrying load reliably.
  • nemotron-3-super-free at 55%: less reliable today; cooldown of ~1h13m (per yesterday's notes).
  • GitHub Copilot models continue to fail. Their cooldowns are active and correctly applied.

"No PR or code changes produced" — 10 failures today

Spread across multiple agents/models:

Agent | Model | Count
claude | sonnet | 5
claude | opus | 2
glm | opus | 1
opencode | gpt-5.4 | 1
opencode | claude-sonnet-4.6 (copilot) | 1

This is not purely a claude/opus issue — 5 of 10 are claude/sonnet. Possible causes:

  1. Tasks with unclear requirements where agents complete but don't commit
  2. Response parser failing to detect completed work
  3. Tasks that were legitimately no-ops (e.g. "already done" cases)

Needs a query against the actual task bodies to determine if these are valid no-ops or agent failures.
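The per-agent breakdown above can be regenerated from task_runs with a single grouped query, which makes it easy to rerun after the audit. This sketch assumes the same task_runs columns (agent, model, error, started_at) that the queries elsewhere in this report already use. The helper name no_pr_by_model is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "no PR or code changes produced" failures per
# agent/model in the last 12 hours, i.e. the breakdown table in this report.
# Assumes the task_runs columns used by the report's other sqlite3 queries.
no_pr_by_model() {
  local db="${1:-$HOME/.orch/orch.db}"
  sqlite3 -separator ' | ' "$db" "
    SELECT agent, model, COUNT(*) AS n
    FROM task_runs
    WHERE error = 'no PR or code changes produced'
      AND started_at > datetime('now', '-12 hours')
    GROUP BY agent, model
    ORDER BY n DESC, agent, model;"
}
```

Usage: run `no_pr_by_model` against the default ~/.orch/orch.db. To query another copy of the database, pass its path: `no_pr_by_model /path/to/orch.db`.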


Active cooldowns

Cooldown key | Remaining | Reason
codex | 41h18m | Billing cycle exhausted
kimi | 7h1m | Billing cycle
glm:haiku | 44m | Model cooldown
opencode (agent-level) | ~0m (expiring) | Short cooldown
opencode:github-copilot/claude-opus-4.6 | 3h29m | Silence detection
opencode:github-copilot/gemini-3.1-pro-preview | 3h59m | Silence detection
opencode:github-copilot/claude-sonnet-4.6 | 1h29m | Failure
opencode:minimax-m2.5-free | 59m | Short cooldown

What failed or needs attention

1. claude/opus at 27% — three-day declining trend

Day | Success | Failed | Rate
Apr 12 | ~13 | ~13 | ~50%
Apr 13 | 13 | 13 | 50%
Apr 14 | 3 | 8 | 27%

Error patterns over the last 48h (from an earlier sqlite query):

  • 30 with empty error (pre-#2652 fix — this noise should clear after today's fix)
  • 4 "no PR or code changes produced"
  • 2 silent exit 0

The empty-error fix (#2652) landed today, so tomorrow's data should be cleaner. If the rate stays below 40% once errors carry real messages, that points to genuine task failures, most likely on tasks labeled complexity:complex.
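Once errors are populated, the trend can be checked day by day instead of by eye. This sketch rests on two assumptions. First, successful runs are stored with outcome='success'; that value is a guess, because this report's queries only show outcome='failed'. Second, started_at is UTC text comparable with datetime(), as the existing queries imply. The helper name opus_daily_rate is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: per-day claude/opus runs, failures and success rate
# over the last 3 days.
# ASSUMPTION: successful runs are recorded with outcome = 'success'.
opus_daily_rate() {
  local db="${1:-$HOME/.orch/orch.db}"
  sqlite3 -separator ' | ' "$db" "
    SELECT date(started_at) AS day,
           COUNT(*) AS runs,
           SUM(outcome = 'failed') AS failed,
           ROUND(100.0 * SUM(outcome = 'success') / COUNT(*), 1) AS success_pct
    FROM task_runs
    WHERE agent = 'claude' AND model = 'opus'
      AND started_at > datetime('now', '-3 days')
    GROUP BY day
    ORDER BY day;"
}
```

The denominator counts every run, timeouts included, so the figure is roughly comparable to the Rate column in the agent health table. A day that stays under 40% with real error messages is the signal described above.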

2. 47 blocked tasks — unaudited

Carried over from yesterday. Possible block reasons: max_review_cycles reached, CI failures, agent loop detection, or tasks that require a human. An audit is needed to separate actionable blocks from permanent ones.

3. CLI/service version mismatch — unresolved

brew upgrade orch is still pending from yesterday's finding (CLI 0.67.7 vs. service 0.67.9). Run it before the next session.


Issues created today

1 new issue filed:

  • #2653 "investigate: claude/opus success rate declining 3 days running (52%→50%→27%)"

Priorities for tomorrow (morning review)

  1. Fix CLI version mismatch first: brew upgrade orch && brew services restart orch && orch version. Do this before anything else; it has been pending since Apr 13.

  2. Check claude/opus error patterns after the #2652 fix: now that empty errors are populated, query task_runs for the actual error messages on claude/opus failures, and determine whether the cause is hard tasks, model degradation, or prompt quality.

    sqlite3 ~/.orch/orch.db "SELECT error, COUNT(*) FROM task_runs WHERE agent='claude' AND model='opus' AND outcome='failed' AND started_at > datetime('now', '-12 hours') GROUP BY error ORDER BY COUNT(*) DESC LIMIT 10;"

  3. Monitor kimi recovery (~03:00 UTC Apr 15): cooldown:kimi expires tonight. Verify that kimi starts receiving tasks again and check its first few completions.

  4. Audit "no PR or code changes produced" — 10 failures today, cross-agent. Pull task bodies for the affected task IDs to determine if these are legitimate no-ops or agent failures.

    sqlite3 ~/.orch/orch.db "SELECT tr.task_id, t.title FROM task_runs tr JOIN tasks t ON t.id=tr.task_id WHERE tr.error='no PR or code changes produced' AND tr.started_at > datetime('now', '-12 hours') LIMIT 10;"

  5. Audit blocked tasks: 47 blocked. Categorize by block reason and prioritize cases that need human review.

  6. Monitor Ollama routing: #2623 just merged. If the olm agent is configured, verify that routing to Ollama models works end to end.
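Before debugging orch's side of the Ollama routing path, it helps to confirm that the local Ollama server is reachable at all. Ollama's HTTP API listens on port 11434 by default, and GET /api/tags lists the locally pulled models. The helper name ollama_ready is hypothetical. The check covers only the Ollama side: it assumes a local server on the default port and says nothing about how orch or the olm agent is configured.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if an Ollama server answers at BASE,
# fail otherwise. Default BASE is Ollama's standard local address.
# GET /api/tags returns the list of locally available models.
ollama_ready() {
  local base="${1:-http://localhost:11434}"
  # -f: fail on HTTP errors; -sS: no progress bar, but still print errors.
  curl -fsS --max-time 5 "$base/api/tags" >/dev/null
}
```

Usage: if `ollama_ready` succeeds, `ollama list` shows which local models actually exist to be routed to. If it fails, fix the Ollama service first, before looking at orch's routing.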


Prepared by Orch automation (internal task internal:145446).
