Evening Retrospective — 2026-05-19


Summary

Delivery stayed strong: 127 task runs succeeded in the last 24h, and two reliability fixes landed today (#3159, #3161). Core routing remained effective, with Claude, Codex, Kimi, and OpenCode carrying most of the successful throughput, while known degraded pools (GLM and Minimax review-stage rate limits, plus one stale OpenCode model alias) continued to add retry noise.

What Was Accomplished

Area	Outcome
Reliability fixes merged	`f32bd1d1` fixed review hard-fail behavior for completed Kimi NDJSON; `38e17faf` added service auto-upgrade support
Throughput	Last-24h outcomes: `success=127`, `failed=11`, `rate_limit=5`, `blocked=3`, `timeout=1`
High-performing pools	`claude/sonnet` (26 agent + 17 review successes), `codex/gpt-5.3-codex` (15 agent + 12 review successes), `kimi/opus` (15 agent successes), `opencode/copilot-sonnet` (12 agent successes)
Daily planning cadence	Morning review for 2026-05-19 was published and aligned with today’s observed behavior
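
As a quick sanity check on the Throughput row above, the outcome counts can be turned into rates. This is a minimal sketch, assuming the five listed outcomes cover every run in the window and that every non-success outcome (including `blocked`) counts against the success rate; orch itself may define these ratios differently.

```python
# Outcome counts copied from the Throughput row of this report.
outcomes = {
    "success": 127,
    "failed": 11,
    "rate_limit": 5,
    "blocked": 3,
    "timeout": 1,
}

# Assumption: these five outcomes are the complete set for the 24h window.
total = sum(outcomes.values())
non_success = total - outcomes["success"]

success_rate = outcomes["success"] / total
non_success_rate = non_success / total

print(f"total runs:       {total}")
print(f"success rate:     {success_rate:.1%}")
print(f"non-success rate: {non_success_rate:.1%}")
```

Under those assumptions the window contains 147 runs, of which roughly 86% succeeded.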

What Failed, Retried, Or Needed Intervention

1) Provider instability still concentrated in known pools

The recurring failures observed were mostly of known classes, not new regressions:

kimi failed: API Error: The server had an error while processing your request (3)
review-stage rate_limit on glm/opus (3) and minimax/opus (2)
silence-detection resets (2)

2) Stale OpenCode alias still surfaced once

One run still hit:

model unavailable (github-copilot/gpt-5.3)

This is a known stale-alias pattern already addressed by recent routing/cooldown work; no duplicate issue was filed.

3) Non-orch environmental blockers appeared in internal tasks

A few blocked runs were caused by external environment constraints (credentials/timeouts/host tooling), not orch core routing logic.

Routing Accuracy

Routing quality was good overall:

Most successful load remained on reliable pools (claude/sonnet, codex/gpt-5.3-codex, kimi/opus, opencode/copilot-sonnet).
Degraded pools were contained to relatively low volumes and mostly handled through retries/fallbacks.
There was no evidence today of systemic misrouting comparable to the earlier recurring dead-model incidents.

Net: routing is directionally accurate; residual failures are predominantly provider-side or environment-side.

Prompt / Workflow Effectiveness

Prompting appears adequate. The dominant failure causes were provider API quality and rate limits, plus runtime and environment prerequisites, not ambiguous task instructions. The highest-leverage work remains operational hardening (cooldowns, model pool hygiene, keeping the deployed service current), not broad prompt rewrites.

Learnings Reflected From Orch Skill Notes

Today’s outcomes align with current orch skill guidance:

Keep failure handling generic via classifier + cooldown pipeline.
Avoid model-specific hardcoding; let per-model cooldowns and health checks absorb dead-model events.
Use task_runs outcomes/errors as the primary operational truth source.

Newly merged work today (review hard-fail fix + service auto-upgrade support) is consistent with those principles.
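
The "classifier + cooldown" guidance above can be illustrated with a small sketch. Everything here is hypothetical: the failure class names, the cooldown durations, the substring rules, and the `CooldownTracker` type are assumptions made for illustration and are not orch's actual implementation. The point is only that model names appear as data passed in at runtime, never as hardcoded branches.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical failure classes and cooldown durations in seconds.
COOLDOWN_SECONDS = {
    "rate_limit": 300,
    "model_unavailable": 3600,
    "provider_error": 60,
}


def classify(error: str) -> str:
    """Map a raw error string to a generic failure class via substring rules."""
    text = error.lower()
    if "rate limit" in text or "rate_limit" in text:
        return "rate_limit"
    if "model unavailable" in text:
        return "model_unavailable"
    if "api error" in text:
        return "provider_error"
    return "unknown"


@dataclass
class CooldownTracker:
    """Tracks when each model becomes eligible again after a failure."""

    until: Dict[str, float] = field(default_factory=dict)

    def record_failure(self, model: str, error: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        failure_class = classify(error)
        seconds = COOLDOWN_SECONDS.get(failure_class, 0)
        if seconds:
            # Keep the later expiry if the model is already cooling down.
            self.until[model] = max(self.until.get(model, 0.0), now + seconds)
        return failure_class

    def is_available(self, model: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return now >= self.until.get(model, 0.0)


tracker = CooldownTracker()
failure_class = tracker.record_failure(
    "github-copilot/gpt-5.3",
    "model unavailable (github-copilot/gpt-5.3)",
    now=0.0,
)
print(failure_class)
print(tracker.is_available("github-copilot/gpt-5.3", now=10.0))
```

With this shape, a stale alias is absorbed by the same path as any other dead-model event: it is classified, cooled down, and skipped by routing until the cooldown expires.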

Priorities For Tomorrow Morning Review

Verify the deployed service is now running the latest release that includes today’s merged fixes, and confirm upgrade automation behavior in practice.
Re-check the 24h fail-ish ratio on kimi/opus, glm/opus, and minimax/opus; escalate only if throughput degrades materially.
Confirm stale github-copilot/gpt-5.3 events trend to zero after current routing protections and deployed version catch-up.
Follow up on open blocker #3110 (Claude auth 401), which remains the only open GitHub issue in this repo view.
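
For the per-pool re-check, one possible way to compute a fail-ish ratio from run records is sketched below. The row shape `(pool, outcome)`, the set of outcomes treated as fail-ish, and the 0.3 escalation threshold are all assumptions for illustration; the sample rows are made up and are not today's task_runs data.

```python
from collections import Counter, defaultdict

# Assumption: these outcome labels count as "fail-ish". "blocked" is excluded
# here because the blocked runs in this report were environmental.
FAILISH = {"failed", "rate_limit", "timeout"}


def failish_ratio_by_pool(rows):
    """Return {pool: fail-ish runs / all runs} for (pool, outcome) rows."""
    counts = defaultdict(Counter)
    for pool, outcome in rows:
        counts[pool][outcome] += 1
    ratios = {}
    for pool, per_outcome in counts.items():
        total = sum(per_outcome.values())
        failish = sum(per_outcome[o] for o in FAILISH)
        ratios[pool] = failish / total
    return ratios


# Illustrative rows only; pool names and counts are placeholders.
sample_rows = (
    [("pool-a", "success")] * 8
    + [("pool-a", "failed")] * 2
    + [("pool-b", "success")] * 3
    + [("pool-b", "rate_limit")] * 3
)

ESCALATION_THRESHOLD = 0.3  # hypothetical cutoff
ratios = failish_ratio_by_pool(sample_rows)
flagged = sorted(pool for pool, ratio in ratios.items() if ratio >= ESCALATION_THRESHOLD)

print(ratios)
print(flagged)
```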

Issues Created

None.

No new root-cause issue was filed tonight: observed problems were either already tracked, provider-transient, or consistent with known environmental blockers outside orch core code.

Prepared by Orch automation (internal:149943).
