Evening Retrospective — 2026-04-24
Focused correctness day. Four commits landed addressing cooldown backoff, merge-conflict handling, task lifecycle semantics, and test reliability. Three new bugs were discovered and filed against audit data quality and model validation.
What Was Accomplished
Commits (last 12 hours)
| Commit | Description |
|---|---|
| dc88e160 | fix(cooldown): past retry_at timestamp skips exponential backoff (#3004) |
| c4c96688 | fix: remove auto_close_task_on_approval — tasks done only when PR merges |
| 0a7bf86a | fix: route merge conflicts to task agent instead of blocking or looping |
| 9c41cb02 | fix(tests): eliminate real network calls to Discord and GitHub APIs |
Most impactful fix: dc88e160 — record_agent_failure_with_message() was comparing a stale retry_at timestamp against the current time after every failure. When retry_at was in the past (e.g., an expired vendor hint), the function returned early before applying exponential backoff. Net effect: after a rate-limited agent recovered, its next failure would reset the backoff clock to zero rather than accumulating. Correct behavior now: retry_at only applies when it's in the future; otherwise, normal record_agent_failure() logic runs.
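The corrected rule can be sketched as follows. This is a minimal illustration, not the code from dc88e160: the `Cooldown` struct, the `record_failure` function, the 30-second base delay, and the one-hour cap are all assumptions made for the example, as is the choice to count the failure even when a future hint is honored.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical per-agent cooldown state; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct Cooldown {
    pub consecutive_failures: u32,
    pub blocked_until: Option<SystemTime>,
}

// Assumed tuning values, not taken from the real configuration.
const BASE_DELAY: Duration = Duration::from_secs(30);
const MAX_DELAY: Duration = Duration::from_secs(3600);

/// Exponential backoff: BASE_DELAY * 2^(failures - 1), capped at MAX_DELAY.
pub fn backoff_delay(consecutive_failures: u32) -> Duration {
    // Clamp the exponent so the shift below cannot overflow.
    let exponent = consecutive_failures.saturating_sub(1).min(16);
    BASE_DELAY.saturating_mul(1u32 << exponent).min(MAX_DELAY)
}

/// Record one failure. A vendor-supplied `retry_at` hint is honored only
/// when it lies in the future; a stale hint falls through to normal
/// exponential backoff instead of short-circuiting it.
pub fn record_failure(state: &mut Cooldown, retry_at: Option<SystemTime>, now: SystemTime) {
    // Assumption: the failure is counted in both branches so backoff keeps accumulating.
    state.consecutive_failures = state.consecutive_failures.saturating_add(1);
    let backoff_until = now + backoff_delay(state.consecutive_failures);
    state.blocked_until = Some(match retry_at {
        Some(hint) if hint > now => hint,
        _ => backoff_until,
    });
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the stale-hint case easy to test deterministically.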
Merge-conflict rerouting (0a7bf86a): Previously, merge conflicts detected in review_poll either blocked tasks outright or looped. They now re-route to the original task agent for resolution, which is the correct operational behavior.
auto_close_task_on_approval removal (c4c96688): Approval of a PR was closing tasks before the PR actually merged, creating false done states. The engine now marks tasks done only on actual PR merge.
Test network isolation (9c41cb02): Tests that made live calls to Discord and GitHub APIs introduced flakiness tied to external availability. All replaced with deterministic stubs.
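One common way to get this kind of isolation is to put the external client behind a trait and substitute an in-memory stub in tests. The sketch below assumes that approach. The `Notifier` trait, `StubNotifier`, and `announce_merge` are hypothetical names, and the retro does not say how 9c41cb02 actually structures its stubs.

```rust
use std::cell::RefCell;

/// Hypothetical abstraction over an outbound API client (Discord, GitHub, ...).
pub trait Notifier {
    fn send(&self, channel: &str, message: &str) -> Result<(), String>;
}

/// Deterministic stub: records every call in memory and never touches the network.
#[derive(Default)]
pub struct StubNotifier {
    pub sent: RefCell<Vec<(String, String)>>,
    /// When set, every call returns this error, to exercise failure paths.
    pub fail_with: Option<String>,
}

impl Notifier for StubNotifier {
    fn send(&self, channel: &str, message: &str) -> Result<(), String> {
        self.sent
            .borrow_mut()
            .push((channel.to_string(), message.to_string()));
        match &self.fail_with {
            Some(err) => Err(err.clone()),
            None => Ok(()),
        }
    }
}

/// Illustrative code under test: it depends on the trait, not on a concrete client.
pub fn announce_merge(notifier: &dyn Notifier, pr_number: u64) -> Result<(), String> {
    notifier.send("merges", &format!("PR #{} merged", pr_number))
}
```

Because the stub records its calls, a test can assert on what would have been sent as well as on the return value, independent of external availability.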
What Failed (and Why)
Task runs (last 12 hours)
| Agent | Model | Success | Failed | Other |
|---|---|---|---|---|
| claude | opus | 14 | 1 | 1 parse_error |
| claude | sonnet | 9 | 1 | — |
| codex | gpt-5.3-codex | 8 | — | 1 parse_error |
| minimax | opus | 8 | — | — |
| glm | opus | 5 | 1 | 1 timeout |
| kimi | opus | 1 | — | — |
| opencode | github-copilot/claude-opus-4.6 | — | 1 | — |
| opencode | github-copilot/gpt-5.4 | — | 1 | — |
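Overall failure rate is low: 5 of 53 runs failed outright (about 9%), or 8 of 53 (about 15%) when the two parse errors and the timeout are counted. The two opencode failures are on dead Copilot model identifiers (claude-opus-4.6, gpt-5.4); see issue #3010 below.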
GLM is recovering slowly (5 successes today after yesterday's suppression from extended backoff). One timeout and one failure remain but the trend is positive.
Routing Accuracy
Routing decisions today were generally sound:
- High-volume success lanes: claude/opus (14/16), codex/gpt-5.3-codex (8/9), minimax/opus (8/8)
- Degraded lanes: opencode on Copilot models wasting dispatch slots; glm improving but still unreliable
The dead Copilot model problem (#3010) is the main routing accuracy drag: the router dispatches github-copilot/gpt-5.4 and github-copilot/claude-opus-4.6 which fail immediately with Model not found. There are 52 such failures in the database.
Morning Review Priority Check-in
| Priority | Status |
|---|---|
| Monitor GLM recovery | ✅ Improving — 5 successes today, extended backoff working |
| Investigate bean SSH failures | ❌ No code change — root cause still open |
| Fix #2881 | Already resolved (yesterday's work) |
| Evening retro gap | ✅ This retro addresses it |
| Tune LLM routing budget | ❌ No change yet — watchdog stall risk persists |
Blocked Tasks
- #2789 - GLM artifact collection. Still waiting.
- internal:148540 - Self-improvement task. Hit dead Copilot model (#3010), then review exceeded threshold.
- internal:148556 - Twitter bookmarks research. Agent returned blocked due to inaccessible data sources.
- internal:148569 - Trading update. Blocked after review.
- Oblivion/Solana/keeper backlog (48 blocked total) - old tasks requiring human intervention.
New Issues Filed Today
#3010 — bug(router): model_map accepts invalid opencode model identifiers
github-copilot/gpt-5.3, github-copilot/gpt-5.4, github-copilot/claude-opus-4.6 are in the live model_map but return Model not found on every dispatch. 52 failed runs attributable to this. Root cause: RouterConfig::from_config() loads model identifiers verbatim with no validation.
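A load-time check along these lines would reject such entries before they reach the router. This is a sketch under stated assumptions: the retro does not show the shape of model_map, so a lane-to-model `BTreeMap` is assumed, and `validate_model_map`, `InvalidModel`, and the `known_models` allow-list (for example, whatever the provider reports as available) are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One rejected entry: which lane pointed at which unknown model.
#[derive(Debug, PartialEq)]
pub struct InvalidModel {
    pub lane: String,
    pub model: String,
}

/// Check every configured model identifier against a known-good set.
/// Returns all offending entries at once so an operator can fix the
/// config in one pass instead of one failure at a time.
pub fn validate_model_map(
    model_map: &BTreeMap<String, String>,
    known_models: &BTreeSet<String>,
) -> Result<(), Vec<InvalidModel>> {
    let invalid: Vec<InvalidModel> = model_map
        .iter()
        .filter(|(_, model)| !known_models.contains(*model))
        .map(|(lane, model)| InvalidModel {
            lane: lane.clone(),
            model: model.clone(),
        })
        .collect();
    if invalid.is_empty() {
        Ok(())
    } else {
        Err(invalid)
    }
}
```

Whether an invalid entry should abort startup or only be dropped with a warning is a policy decision the issue leaves open.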
#3011 — bug(audit): blocked task runs recorded as success
classify_run_outcome() maps status == "blocked" to "success". This makes retrospective and alerting queries that filter on outcome != 'success' silently miss agent-blocked tasks.
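The intended mapping for #3011 might look like the sketch below. It is a stand-in, not the real classify_run_outcome(): the `RunOutcome` enum, the function name `classify_status`, and every status string other than "blocked" are assumptions.

```rust
/// Hypothetical outcome categories as they would be stored in the audit data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunOutcome {
    Success,
    Blocked,
    Failed,
}

impl RunOutcome {
    /// String form suitable for an outcome column.
    pub fn as_str(self) -> &'static str {
        match self {
            RunOutcome::Success => "success",
            RunOutcome::Blocked => "blocked",
            RunOutcome::Failed => "failed",
        }
    }
}

/// Map an agent-reported status to an outcome. The key point is that
/// "blocked" gets its own category instead of being folded into success.
/// Unknown statuses are treated as failures here (an assumption).
pub fn classify_status(status: &str) -> RunOutcome {
    match status {
        "done" | "success" => RunOutcome::Success,
        "blocked" => RunOutcome::Blocked,
        _ => RunOutcome::Failed,
    }
}
```

With a distinct "blocked" value, queries that filter on outcome != 'success' pick up agent-blocked runs again.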
#3012 — bug(runner): agent-returned blocked reasons not persisted
When an agent responds with status=blocked and a human-readable summary, the response handler does not write the summary to tasks.last_error or tasks.block_reason. Operators see a blocked task with empty error fields; the explanation is buried in task_runs.parsed_response.
Issues #3011 and #3012 are closely related (both affect how blocked outcomes are represented) but have distinct code paths and can be fixed independently.
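For #3012, the response-handler change could be as small as the following sketch. The `TaskRecord` and `AgentResponse` structs and `apply_agent_response` are hypothetical in-memory stand-ins for the real tables and handler. Writing the summary to both last_error and block_reason, and the fallback text for an empty summary, are assumptions.

```rust
/// Hypothetical in-memory view of the relevant columns on the tasks table.
#[derive(Debug, Default)]
pub struct TaskRecord {
    pub status: String,
    pub last_error: Option<String>,
    pub block_reason: Option<String>,
}

/// Hypothetical parsed agent response.
pub struct AgentResponse {
    pub status: String,
    pub summary: Option<String>,
}

/// When the agent reports `blocked`, copy its human-readable summary onto
/// the task so operators do not have to dig through the parsed response.
pub fn apply_agent_response(task: &mut TaskRecord, response: &AgentResponse) {
    if response.status != "blocked" {
        return;
    }
    // Fall back to a fixed message if the agent gave no usable summary.
    let reason = response
        .summary
        .as_deref()
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .unwrap_or("agent returned blocked without a summary");
    task.status = "blocked".to_string();
    task.block_reason = Some(reason.to_string());
    task.last_error = Some(reason.to_string());
}
```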
Performance and Operational Notes
- No watchdog stalls observed in this window (unlike yesterday's 80s stall).
- 4 tasks in progress at time of writing; 2664 done overall.
- orch.error.log is 0B (no fatal errors).
Priorities for Tomorrow's Morning Review
- Fix #3010 first: dead Copilot model identifiers are actively burning dispatch slots. Either add model-pool validation to RouterConfig::from_config() or remove the dead entries from the live config. 52 wasted runs is a concrete cost.
- Fix #3011 + #3012 together: classify blocked runs as "blocked" in task_runs.outcome and persist the agent-provided summary into last_error. These are straightforward response-handler patches with clear test coverage paths.
- Investigate LLM routing budget: the 45s budget caused an 80s tick yesterday. Monitor if today's tick timing improved. If watchdog stalls continue, lower llm_budget_secs to 30s.
- Unblock internal:148540 (self-improvement): it was routed to a dead Copilot model before doing meaningful analysis. Once #3010 is fixed, requeue manually.
- SSH auto-merge failures for bean project: sign_and_send_pubkey ED25519 errors need investigation. GH_TOKEN fix won't help SSH (separate auth path).
Prepared by Orch automation (internal task internal:148580).