Evening Retrospective — 2026-05-19
Summary
Delivery stayed strong with 127 successful task runs in the last 24h, and two reliability fixes landed today (#3159, #3161). Core routing remained effective, with Claude/Codex/Kimi/OpenCode carrying most of the successful throughput, while known degraded pools (GLM/Minimax review rate limits and one stale OpenCode model alias) continued to add retry noise.
What Was Accomplished
| Area | Outcome |
|---|---|
| Reliability fixes merged | f32bd1d1 fixed review hard-fail behavior for completed Kimi NDJSON; 38e17faf added service auto-upgrade support |
| Throughput | Last-24h outcomes: success=127, failed=11, rate_limit=5, blocked=3, timeout=1 |
| High-performing pools | claude/sonnet (26 agent + 17 review successes), codex/gpt-5.3-codex (15 agent + 12 review successes), kimi/opus (15 agent successes), opencode/copilot-sonnet (12 agent successes) |
| Daily planning cadence | Morning review for 2026-05-19 was published and aligned with today’s observed behavior |
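
The Throughput row can be reduced to a single non-success share for day-over-day comparison. The sketch below is a minimal illustration: the counts are copied from the table, but the helper name `fail_ish_ratio` and the choice to treat every non-success outcome (failed, rate_limit, blocked, timeout) as "fail-ish" are assumptions, not a definition taken from orch.

```python
from collections import Counter

# Outcome counts copied from the Throughput row of the table above.
outcomes = Counter(success=127, failed=11, rate_limit=5, blocked=3, timeout=1)


def fail_ish_ratio(counts):
    """Share of runs whose outcome is anything other than success.

    Assumption: all non-success outcomes count equally as "fail-ish".
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts.get("success", 0)) / total


total_runs = sum(outcomes.values())
ratio = fail_ish_ratio(outcomes)
print(f"runs={total_runs} fail-ish={ratio:.1%}")  # prints: runs=147 fail-ish=13.6%
```

Under that assumption, 20 of 147 runs were non-success. A stricter definition (for example, excluding blocked runs caused by environment constraints) would give a lower figure.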
What Failed, Retried, Or Needed Intervention
1) Provider instability still concentrated in known pools
Observed recurring failures were mostly known classes, not new regressions:
- kimi failed: API Error: The server had an error while processing your request (3)
- review-stage rate_limit on glm/opus (3) and minimax/opus (2)
- a small number of silence-detection resets (2)
2) Stale OpenCode alias still surfaced once
One run still hit:
model unavailable (github-copilot/gpt-5.3)
This is a known stale-alias pattern already addressed by recent routing/cooldown work; no duplicate issue was filed.
3) Non-orch environmental blockers appeared in internal tasks
A few blocked runs were caused by external environment constraints (credentials/timeouts/host tooling), not orch core routing logic.
Routing Accuracy
Routing quality was good overall:
- Most successful load remained on reliable pools (claude/sonnet, codex/gpt-5.3-codex, kimi/opus, opencode/copilot-sonnet).
- Degraded pools were contained to relatively low volumes and mostly handled through retries/fallbacks.
- No evidence today of systemic misrouting comparable to prior dead-model recurrence incidents.
Net: routing is directionally accurate; residual failures are predominantly provider-side or environment-side.
Prompt / Workflow Effectiveness
Prompting appears adequate. The dominant failure causes were provider API quality/rate limits and runtime/environment prerequisites, not task-instruction ambiguity. The highest leverage remains operational hardening (cooldowns, model pool hygiene, service upgrade freshness), not broad prompt rewrites.
Learnings Reflected From Orch Skill Notes
Today’s outcomes align with current orch skill guidance:
- Keep failure handling generic via classifier + cooldown pipeline.
- Avoid model-specific hardcoding; let per-model cooldowns and health checks absorb dead-model events.
- Use task_runs outcomes/errors as the primary operational truth source.
Newly merged work today (review hard-fail fix + service auto-upgrade support) is consistent with those principles.
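The classifier + cooldown principle can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not orch's actual implementation: the rule table, the failure-class names other than rate_limit, the cooldown durations, and the `CooldownTable` class are all hypothetical. The point it shows is that no model name is hardcoded in the logic; a dead model is simply cooled down like any other failing one.

```python
import time

# Hypothetical substring -> failure-class rules; the real classifier may differ.
RULES = [
    ("rate limit", "rate_limit"),
    ("model unavailable", "dead_model"),
    ("server had an error", "provider_error"),
]

# Hypothetical cooldown lengths in seconds per failure class.
COOLDOWN_SECONDS = {"rate_limit": 300, "dead_model": 3600, "provider_error": 60}


def classify(error_text):
    """Map a raw error string to a generic failure class."""
    lowered = error_text.lower()
    for needle, failure_class in RULES:
        if needle in lowered:
            return failure_class
    return "unknown"


class CooldownTable:
    """Tracks, per model, the time before which it should not be routed to."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._until = {}

    def record_failure(self, model, error_text):
        """Classify the error and extend the model's cooldown if the class has one."""
        failure_class = classify(error_text)
        seconds = COOLDOWN_SECONDS.get(failure_class, 0)
        if seconds:
            deadline = self._clock() + seconds
            self._until[model] = max(self._until.get(model, 0.0), deadline)
        return failure_class

    def is_available(self, model):
        return self._clock() >= self._until.get(model, 0.0)


table = CooldownTable()
table.record_failure("github-copilot/gpt-5.3", "model unavailable (github-copilot/gpt-5.3)")
candidates = ["github-copilot/gpt-5.3", "claude/sonnet"]
routable = [m for m in candidates if table.is_available(m)]
print(routable)  # prints: ['claude/sonnet']
```

Injecting the clock keeps the table testable without sleeping, and unknown error classes deliberately trigger no cooldown so that a new failure type does not silently drain a pool.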
Priorities For Tomorrow Morning Review
- Verify the deployed service is now running the latest release that includes today’s merged fixes, and confirm upgrade automation behavior in practice.
- Re-check the 24h fail-ish ratio on kimi/opus, glm/opus, and minimax/opus; escalate only if throughput degrades materially.
- Confirm stale github-copilot/gpt-5.3 events trend to zero after current routing protections and deployed version catch-up.
- Follow up on open blocker #3110 (Claude auth 401), which remains the only open GitHub issue in this repo view.
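The per-pool re-check could be run as a small query over task_runs. The sketch below is illustrative only: the two-column schema, the in-memory SQLite store, the rows, and the 0.4 escalation threshold are all assumptions made for the example, and the rows are synthetic rather than today's real counts. A real check would also restrict the query to the last 24 hours.

```python
import sqlite3

# Hypothetical schema: the real task_runs store and its columns are not shown in this report.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_runs (pool TEXT, outcome TEXT)")

# Synthetic rows for illustration only; these are not the day's real counts.
rows = (
    [("kimi/opus", "success")] * 3
    + [("kimi/opus", "failed")]
    + [("glm/opus", "success")]
    + [("glm/opus", "rate_limit")]
)
conn.executemany("INSERT INTO task_runs VALUES (?, ?)", rows)

# Hypothetical threshold for "throughput degrades materially".
ESCALATE_THRESHOLD = 0.4


def fail_ish_by_pool(connection, pools):
    """Return {pool: non-success share} for pools that have at least one run."""
    placeholders = ",".join("?" for _ in pools)
    query = (
        "SELECT pool, AVG(outcome != 'success') FROM task_runs "
        f"WHERE pool IN ({placeholders}) GROUP BY pool"
    )
    return dict(connection.execute(query, list(pools)))


ratios = fail_ish_by_pool(conn, ["kimi/opus", "glm/opus", "minimax/opus"])
to_escalate = sorted(p for p, r in ratios.items() if r >= ESCALATE_THRESHOLD)
print(to_escalate)  # prints: ['glm/opus']
```

Pools with no runs in the window (minimax/opus in this synthetic data) are absent from the result, so a missing pool should be read as "no traffic" rather than "healthy".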
Issues Created
None.
No new root-cause issue was filed tonight: observed problems were either already tracked, provider-transient, or consistent with known environmental blockers outside orch core code.
Prepared by Orch automation (internal:149943).