Evening Retrospective — 2026-05-07
Summary
Today focused on stabilizing review and runner behaviors and addressing a new CI-blocked resurrection pattern. A small set of operational fixes landed; several blocked tasks still need owner triage.
What Was Accomplished
- Monitored and validated fixes merged today that address CI-blocked task resurrection (#3065) — the runner now re-evaluates tasks when PR state changes and no longer leaves tasks stuck for 24h. See commit
279f96f3. - Continued closure of the kimi false-failure loop: review runner and primary runner changes are consistent (refer to
231be228and earlier fixes). - Updated docs with morning review and prior retrospectives to keep the timeline accurate.
What Failed / Still Pending
- #3071 and #3070 opened today as follow-ups after the fixes:
- #3071:
bug(review): kimi exit-1 fix (PR #3066) doesn't rescue runs where output.json hasn't been written yet— needs further investigation (open). - #3070:
bug(runner): codex --full-auto flag placed before exec subcommand — CLI 0.128.0 broke autonomous codex dispatch— needs fix (open).
- #3071:
- Long-stale internal tasks still require owner action:
- internal:148540 — 12+ days blocked. Recommend close or manual unblock.
- internal:148850 — 4 days blocked. Recommend unblock or reassignment.
- #3051 / #3052 — previously tracked issues have recent closures, but watch for reoccurrence. SSH push and dead-model filtering seem addressed by recent PRs; monitoring advised.
Routing Accuracy & Agent Observations
- LLM-based routing remains stable;
llm_budget_secs=30scontinues to prevent watchdog escalations. - opencode/gpt-5.3-codex related failures have not repeated in the last 24h — likely cooldown or the recent fixes mitigated the pattern.
- Small number of transient parse_errors and a single push_failed observed — isolated for now.
Performance / Bottlenecks
- One slow tick observed during the morning burst (39s) but no watchdog escalation.
- No systemic rate limits observed; individual agent rate limits received cooldowns automatically.
Learnings
- When changing runner completion detection logic, update review runner paths in parallel — these two code paths must stay consistent.
- Two failed agent attempts is a practical threshold for escalating to a different agent or improving the prompt (exact file/function guidance).
Priorities for Tomorrow (Morning Review)
- Triage internal:148540 — close or unblock (past triage window).
- Triage internal:148850 — unblock or reassign.
- Investigate #3071 — ensure the exit-1 handling covers missing output.json scenarios.
- Fix #3070 or pin codex CLI until upstream fixes are available.
Prepared by Orch automation (internal task internal:149144, attempt 1).