Evening Retrospective — 2026-04-23
High-throughput reliability day. The project closed a large batch of bug tickets across parser normalization, review reroute correctness, cooldown behavior, cleanup reliability, and control/session safety.
What Was Accomplished
- Closed a large set of production bugs today, including:
  - review/merge-loop correctness fixes (#2998, #2991, #2990, #2962, #2956)
  - parser envelope/status normalization fixes (#2986, #2979, #2977, #2915)
  - PR/base and git-ops reliability fixes (#2978, #2976, #2973)
  - cleanup and transport robustness fixes (#2987, #2966, #2953)
  - cooldown and failure classification corrections (#2950, #2941, #2936)
- The most important morning-review priorities were executed:
  - routing/review correctness work landed across multiple issues
  - parse-error hardening continued and reduced the risk of parser-related misclassification
  - #2881 appears to have been addressed by subsequent reliability work and is now closed
- The open queue is very small: only #2789 remains open ("Collect raw GLM failing run artifacts...").
What Failed (and Why)
Today’s task_runs still show recurring failure modes on agent runs:
- failed (9), rate_limit (4), timeout (2), plus a few aborted/push_failed outcomes.
- Top repeated error strings:
  - silence detection set task to new (3)
  - max attempts reached (3)
  - GLM/Kimi 429 rate limits after repeated attempts
  - model-availability mismatches (e.g., unavailable github-copilot/gpt-5.3, unsupported gpt-5.2-codex account/model combination)
Interpretation: core engine reliability has improved, but provider/model lane instability and account/model compatibility gaps remain material sources of failure.
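To make that interpretation checkable against task_runs, the error strings above could be bucketed into provider/model failures versus engine-side recovery events. The sketch below is a minimal illustration under that assumption: the bucket names, substring patterns, and the `classify_failure` / `summarize` helpers are hypothetical and not part of Orch.

```python
from collections import Counter
from typing import List

# Hypothetical buckets and substring patterns, modeled loosely on today's
# error summary; these are not Orch's actual failure classes.
# Order matters: the first bucket with a matching pattern wins.
BUCKETS = [
    ("provider_rate_limit", ("429", "rate limit")),
    ("model_availability", ("unavailable", "unsupported")),
    ("retry_or_recovery", ("silence detection", "max attempts reached")),
]


def classify_failure(error: str) -> str:
    """Map a raw error string to a coarse failure bucket (hypothetical helper)."""
    text = error.lower()
    for bucket, patterns in BUCKETS:
        if any(p in text for p in patterns):
            return bucket
    return "unclassified"


def summarize(errors: List[str]) -> Counter:
    """Count failures per bucket so the provider/model share can be compared with the rest."""
    return Counter(classify_failure(e) for e in errors)


if __name__ == "__main__":
    # Illustrative inputs paraphrased from the summary above, not verbatim log lines.
    sample = [
        "silence detection set task to new",
        "max attempts reached",
        "429 rate limit on GLM lane",
        "unavailable github-copilot/gpt-5.3",
        "unsupported gpt-5.2-codex account/model combination",
    ]
    for bucket, count in summarize(sample).most_common():
        print(f"{bucket}: {count}")
```

A "max attempts reached" entry may itself be a downstream symptom of earlier provider errors, so a real classifier would probably need each run's attempt history rather than the final string alone.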
Routing Accuracy
Routing quality was mixed but generally acceptable:
- Strong lanes:
  - opencode/github-copilot/gpt-5-mini: 19/20 success (95.0%)
  - claude/sonnet: 15/18 success (83.3%)
  - codex/gpt-5.3-codex: 10/12 success (83.3%)
- Weaker lanes still active:
  - glm/opus: 1/5 success (20.0%)
  - kimi/opus: 4/9 success (44.4%)
Assessment: routing mostly chooses viable executors for common tasks, but degraded model lanes still consume retries. Pre-emptively avoiding clearly degraded or unavailable model combinations would reduce wasted attempts.
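One way to make "clearly degraded" concrete is a success-rate threshold with a minimum sample size. The sketch below uses today's lane counts; the `Lane` type, `degraded_lanes` helper, and the `MIN_RATE` / `MIN_ATTEMPTS` values are hypothetical policy knobs, not existing Orch settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Lane:
    name: str
    successes: int
    attempts: int

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0


# Counts copied from today's Routing Accuracy figures.
LANES = [
    Lane("opencode/github-copilot/gpt-5-mini", 19, 20),
    Lane("claude/sonnet", 15, 18),
    Lane("codex/gpt-5.3-codex", 10, 12),
    Lane("glm/opus", 1, 5),
    Lane("kimi/opus", 4, 9),
]

# Hypothetical thresholds: flag lanes under 50% success once they have 5+ attempts.
MIN_RATE = 0.5
MIN_ATTEMPTS = 5


def degraded_lanes(lanes: List[Lane], min_rate: float = MIN_RATE,
                   min_attempts: int = MIN_ATTEMPTS) -> List[str]:
    """Return names of lanes with enough samples whose success rate is below min_rate."""
    return [lane.name for lane in lanes
            if lane.attempts >= min_attempts and lane.success_rate < min_rate]


if __name__ == "__main__":
    for lane in LANES:
        print(f"{lane.name}: {lane.successes}/{lane.attempts} ({lane.success_rate:.1%})")
    print("deprioritize:", degraded_lanes(LANES))
```

With these counts the sketch flags glm/opus and kimi/opus. The minimum-attempts floor keeps a single early failure from demoting a lane that has barely been sampled.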
Performance and Operational Notes
- Throughput remained high enough to close many issues in one day.
- Review pipeline behavior improved significantly (fewer infinite reroute/review loops after today’s fixes).
- Retry attempts still end in non-success states often enough to warrant continued focus on cooldown and routability checks.
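For the cooldown side, one common policy is capped exponential backoff per lane after consecutive rate-limit hits, combined with a routability check before dispatch. The sketch below assumes that policy; the `Cooldown` class and the `base_s` / `cap_s` defaults are hypothetical and do not describe Orch's actual cooldown logic.

```python
import time
from typing import Dict, Optional


class Cooldown:
    """Per-lane capped exponential backoff after consecutive rate-limit hits.

    Hypothetical sketch: base_s, cap_s and this class are illustrative, not Orch settings.
    """

    def __init__(self, base_s: float = 30.0, cap_s: float = 900.0) -> None:
        self.base_s = base_s
        self.cap_s = cap_s
        self._strikes: Dict[str, int] = {}
        self._until: Dict[str, float] = {}

    def record_rate_limit(self, lane: str, now: Optional[float] = None) -> float:
        """Register a 429 for `lane` and return the cooldown length in seconds."""
        now = time.monotonic() if now is None else now
        strikes = self._strikes.get(lane, 0) + 1
        self._strikes[lane] = strikes
        # Double the delay per consecutive strike, but never exceed cap_s.
        delay = min(self.cap_s, self.base_s * 2 ** (strikes - 1))
        self._until[lane] = now + delay
        return delay

    def record_success(self, lane: str) -> None:
        """Clear the lane's strike count and cooldown after a successful run."""
        self._strikes.pop(lane, None)
        self._until.pop(lane, None)

    def is_routable(self, lane: str, now: Optional[float] = None) -> bool:
        """True once the lane's cooldown window (if any) has elapsed."""
        now = time.monotonic() if now is None else now
        return now >= self._until.get(lane, float("-inf"))


if __name__ == "__main__":
    cd = Cooldown()
    print(cd.record_rate_limit("glm/opus", now=0.0))   # 30.0
    print(cd.record_rate_limit("glm/opus", now=30.0))  # 60.0 (cooldown until t=90)
    print(cd.is_routable("glm/opus", now=89.0))        # False
    print(cd.is_routable("glm/opus", now=90.0))        # True
```

The explicit `now` parameter is there only so the behavior can be checked deterministically; in normal use it falls back to `time.monotonic()`.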
Skill/Operational Learnings Check
~/.claude/skills/orch/SKILL.md remains broadly aligned with current operations:
- The emphasis on task_runs-first diagnosis is still correct.
- Guidance around stale tmux/review states and autonomous recovery remains relevant.
- No skill-file changes were observed today that would require a docs sync from this retrospective.
Priorities for Tomorrow Morning Review
- Finish #2789 (GLM artifact capture) and decide whether GLM lanes should be temporarily deprioritized until artifact-backed recovery criteria are met.
- Audit and tighten model-availability guards to avoid retries on known-invalid model/account combinations.
- Review silence-detection-triggered resets ("silence detection set task to new") to confirm they are true positives and not over-eager recoveries.
- Recheck today’s weak lanes (GLM/Kimi) against cooldown and routing-weight behavior after overnight runs.
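A model-availability guard could be as simple as remembering (account, model) pairs that failed with a non-retryable availability error and refusing to dispatch to them until an expiry passes. The sketch below assumes that design; the `AvailabilityGuard` class, the six-hour TTL, and the "account-a" label are hypothetical, and only the model identifier comes from today's error summary.

```python
import time
from typing import Dict, Optional, Tuple


class AvailabilityGuard:
    """Skips dispatch for account/model pairs recently seen failing with a
    non-retryable availability error. Hypothetical sketch, not Orch's guard.
    """

    def __init__(self, ttl_s: float = 6 * 3600) -> None:
        # Entries expire after ttl_s so a pair can be re-probed if the provider fixes it.
        self.ttl_s = ttl_s
        self._invalid_until: Dict[Tuple[str, str], float] = {}

    def mark_invalid(self, account: str, model: str, now: Optional[float] = None) -> None:
        """Record a known-invalid pair, valid until now + ttl_s."""
        now = time.time() if now is None else now
        self._invalid_until[(account, model)] = now + self.ttl_s

    def should_dispatch(self, account: str, model: str, now: Optional[float] = None) -> bool:
        """False while the pair is marked invalid, so no retry is spent on it."""
        now = time.time() if now is None else now
        until = self._invalid_until.get((account, model))
        if until is None:
            return True
        if now >= until:
            del self._invalid_until[(account, model)]  # expired: allow one fresh attempt
            return True
        return False


if __name__ == "__main__":
    guard = AvailabilityGuard()
    # "account-a" is a placeholder; the model id comes from today's error summary.
    guard.mark_invalid("account-a", "gpt-5.2-codex", now=0.0)
    print(guard.should_dispatch("account-a", "gpt-5.2-codex", now=60.0))  # False
    print(guard.should_dispatch("account-b", "gpt-5.2-codex", now=60.0))  # True
```

Keying on the pair rather than the model alone matches the observed failure, where a model was unsupported only for a particular account.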
Issues Created From This Review
No new issues were created. The observed problems are already covered by existing open or closed issues, with #2789 remaining the primary unresolved thread.
Prepared by Orch automation (internal task internal:148544).