Evening Retrospective — 2026-04-06
Summary
Extremely productive day. ~35 commits merged, 20 issues closed. The dominant theme was two high-severity security fixes that should have shipped sooner (GitHub token exposure, GraphQL injection), followed by a broad review pipeline correctness sweep and auto_merge reliability fixes. By end of day the review pipeline is substantially more reliable: per-agent extractors, Alibaba rate limit detection, review_agent_failures reset, and auto_merge merge-conflict bail correctly routed.
Queue is active: 4 tasks in_review (CI fix, NetworkError backoff, GraphQL projects.rs, NeedsReview backoff), 1 in_progress (#2048 pagination), and 4 blocked on review agent failures.
Morning Priorities — Outcome
| Priority from morning review | Status |
|---|---|
| Upgrade CLI/service (mismatch 0.60.44 vs 0.60.50) | No evidence of action — CLI/service mismatch may persist |
| Codex usage limit cooldown (until Apr 9) | ✓ Cooldown applied: cooldown:codex expires Apr 9 18:22 UTC. 8 dispatches today — all failed before cooldown locked in. |
| kimi recovery at ~13:00 UTC | ✗ Not yet recovered — cooldown expires ~23:33 local (Apr 7 02:33 UTC). failure_count:kimi=19, kimi:haiku=22. |
| Unblock internal:54549 | Unknown — not visible in current task list. Presumably handled or expired. |
| Async blocking audit (#2045) | ✗ Now blocked ("review agent blocked — exceeded failure threshold"). Fix for #2043 would unblock it. |
| Watch review pipeline stability | ✓ No unexpected accumulation. Five overnight fixes confirmed stable. |
What Was Accomplished
Security fixes (highest priority)
- GitHub token in CLI args (#2007 /
7e5dd799) — token was passed directly viagit -c http.extraheader=...making it visible inpsoutput to any local user. Fixed to use env var or credential helper. - GraphQL injection via
format!()(#2005 /42de5506) — string interpolation in GraphQL queries allowed injection attacks. Fixed to use variables in all queries. A companion issue (#2030) remains open forprojects.rsspecifically (currently in_review).
Review pipeline correctness sweep
- Per-agent extractors for review parse (#2032 /
8bb493d2) — replaced manual parse logic with per-agent extractors. Codex and opencode previously failed JSON extraction frequently; parse failures now give structured error with raw output captured (8939bdcb). - Alibaba "rate increased too quickly" detection (#2023 /
e454c61d) — opencode/qwen3.6 rate limit messages were not being detected, causing re-routing without cooldown. 7 out of 10 qwen3.6 runs failed today (pre-fix). Fix now in production. - review_agent_failures not reset across review cycles (#2024 /
eb486c3b) — transient failures from previous review cycles were accumulating and blocking tasks prematurely. Fixed: failures now reset when re-routing for a new cycle. - Review watermark persistence (
a52e518c) — watermark was persisted beforehandle_review_changessucceeded; on DB error, same review was re-processed every tick. Fixed to persist only on success. - Auth headers in branch deletion (#2010 /
efd25fbe) — cleanup.rsdelete_brancheswas missing auth headers, causing silent 401 failures on remote branch cleanup.
auto_merge reliability
- Merge-conflict bail misrouted as review_agent_failure (#2031 /
3294fc3e) —auto_merge_prmerge-conflict errors were going throughclassify_review_failure, incorrectly incrementingreview_agent_failuresinstead ofmerge_conflict_retries. This was the root cause of task #2001 blocking cascade. Fixed. - Null
mergeabletreated as merge-ready (#2011 /7106b2e7) — GitHub returnsnullformergeablewhile still computing it; was proceeding to merge and hitting 405. Now retried as transient error. - Stash drop by reference (#2006 /
697e1d95) —git stash dropwas called with SHA hash instead of stash ref (stash@{N}), causing silent no-ops during rebase cleanup.
Data integrity / correctness
- store_set_result for attempts counter (
21d04147) —max_attemptssafeguard used a non-atomic increment that returned 0 on DB failure, allowing bypass of the max attempt limit. - reconcile_closed_tasks worktree cleanup only on status update success (
3f933608) — worktrees were being deleted even when the DB status update failed, losing work on transient DB errors. - PR recovery check on non-transient final errors (
aa9c4aa0) — final non-transient errors skipped the PR recovery path, leaving stale PRs open. - skip GitHub API calls for internal task IDs (
88428c3a) — internal task IDs (e.g.internal:65302) were being sent to GitHub API, producing 404 errors on every tick.
Performance
- Store helpers redundant SQL (
499bebf3) — store helpers were re-resolvingtask_idon every call via separate SQL query; now passed directly. - Ingest fan-out collapse (
f644fd4a/ #1998) —ingest_external_taskswas issuing separatelist_by_statusqueries per status; collapsed into singleactive-issuesquery.
Infrastructure
- Router LLM fallback routing to same cooled agent (
57b0a5b7) — when only one available agent was cooled, the fallback still routed to it. Fixed. - Control session: restrict directory listing (
3ade1051) — control sessions were able to read/list the main project directory, bypassing the sandbox intent.
What Failed and Why
Agent failures (last 12h)
| Agent | Model | Dispatches | Successes | Failures | Root cause |
|---|---|---|---|---|---|
| codex | gpt-5.3-codex | 8 | 0 | 8 | Credit exhaustion (until Apr 9 18:22 UTC) |
| kimi | opus | 5 | 0 | 5 | Billing cycle exhausted |
| opencode | qwen3.6-plus-free | 10 | 3 | 7 | Alibaba rate limit (detection fix now deployed) |
| claude | sonnet | 50 | 47 | 3 | 2 cooldown-triggered, 1 unknown |
| minimax | opus | 50 | 45 | 5 | Periodic agent_error (failure_count:minimax:haiku=3) |
codex (8 failures): All failed due to credit exhaustion (Apr 9). Cooldown is now applied correctly. No dispatches expected until Apr 9.
kimi (5 failures): Billing cycle not reset yet. failure_count:kimi=19, kimi:haiku=22. Cooldowns: kimi expires ~02:33 UTC Apr 7, kimi:haiku ~00:00 UTC Apr 7. Both should clear overnight. No intervention needed.
opencode/qwen3.6 (7 failures): Alibaba rate limit pre-dates today's detection fix. With e454c61d deployed, next failures will apply cooldown and stop the retry loop.
Review agent blocks (4 tasks)
Four tasks are blocked with "review agent blocked — exceeded failure threshold": #2045, #2043, #2001, internal:63857. All blocked due to review parse failures before the per-agent extractor fix (8bb493d2) was deployed. Issue #2043 (parse error should re-route instead of blocking) is currently in_review — once that fix lands, these tasks can be unblocked and re-reviewed successfully.
Routing Accuracy
Routing is working correctly for degraded agents — codex and kimi are cooled and skipped. Load is spread across claude, minimax, and opencode variants.
| Agent | Successes | Notes |
|---|---|---|
| claude/sonnet | 47 | Primary workhorse. 94% success. |
| minimax/opus | 45 | 90% success. Minor periodic cooldowns. |
| opencode/minimax-m2.5-free | 13 | 100% |
| opencode/gh-copilot/gpt-5-mini | 12 | 100% |
| opencode/gh-copilot/claude-sonnet-4.6 | 10 | 100% |
| opencode/qwen3.6-plus-free | 3 | 30% — rate limited (fix deployed) |
| opencode/nemotron-3-super-free | 8 | 89% |
| claude/haiku | 6 | 100% |
| claude/opus | 5 | 100% |
| codex/gpt-5.3-codex | 0 | Cooled until Apr 9 |
| kimi/opus | 0 | Cooled until ~02:33 UTC Apr 7 |
The router LLM fallback fix (57b0a5b7) prevents the edge case where a single available cooled agent was still selected on fallback. Weight decay and cooldown signals are working correctly to avoid wasted dispatches.
System Health
- Queue: 4 in_review (CI, NetworkError backoff, projects.rs GraphQL, NeedsReview backoff), 1 in_progress (#2048), 4 blocked (review agent failure — pending #2043 fix)
- Errors (last 12h): 5 task errors visible in activity (HTTP transient). Claude cooldown at 23:33 local suggests a rate limit event this evening.
- Active cooldowns: codex (until Apr 9), kimi (until ~02:33 UTC Apr 7), kimi:haiku (tonight UTC), minimax (short — already close to expiry), opencode (short)
- CI: #2047 (CI broken on main) currently in_review — pipeline may be temporarily unstable
- Stale KV:
failure_count:opencode:github-copilot/gemini-3.1-pro. Did you mean(17) andfailure_count:opencode:opus/(15) are stale corrupted keys from #1934. Harmless but persist.
Priorities for Tomorrow
Verify CLI/service sync — Morning review noted 0.60.44 vs 0.60.50 mismatch. Check
orch versionand runbrew upgrade orch && brew services restart orchif still mismatched.#2043 fix landing — Parse error in review should re-route instead of block. Currently in_review. Once merged, manually unblock the 4 tasks stuck with "review agent blocked" (#2045, #2043, #2001, internal:63857) so they get re-reviewed with the new extractor logic.
kimi recovery overnight — Both kimi cooldowns expire before morning UTC. Verify at morning review that
failure_count:kimidrops and dispatches resume.opencode/qwen3.6 stability — With Alibaba rate limit detection now deployed, watch that qwen3.6 failures apply cooldown correctly on next rate limit. If 0% success continues without cooldown being applied, something in the detection path is still wrong.
Async blocking audit (#2045) — Deferred three days running. Once unblocked post-#2043 fix, this is a targeted
rg 'std::fs::' src/pass across async functions. Should be simple/quick but keeps slipping.#2030 GraphQL projects.rs — Currently in_review. Completes the injection prevention sweep from #2005. Should land tomorrow.