Evening Retrospective -- 2026-03-27
Summary
High-volume day: 24 commits, 10 issues closed. The dominant theme is operational robustness — fixing silent failures, race conditions, and edge-case corruption in the dispatch, review, and worktree subsystems. The new task activity log table provides debugging visibility. A critical bug in opencode silent exit-0 handling was mitigated but not eliminated.
Accomplished Today
Silent Agent Detection & Recovery (4 fixes)
- Silence detection via tmux capture (#1136) — detect agents that exit 0 with no output; cool down the model and re-route after a grace period.
- Silence detection loops within same agent (#1144) — short agent cooldown to force re-route to a different agent.
- Retry opencode free model on silent exit-0 (#1135) — before falling through to claude, retry the same model once; reduces false failovers.
- Rate limit context extraction (#1105) — cool down the model on a silent exit 0 when a rate limit is detected.
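The retry-then-reroute behaviour described in these four fixes can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `CooldownRouter`, `next_step`, the 300-second cooldown, and the action names are all hypothetical, and the captured output is assumed to be whatever text the tmux capture returned.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class CooldownRouter:
    """Hypothetical router that tracks per-agent cooldowns."""

    agents: list[str]                # ordered by preference
    cooldown_seconds: float = 300.0  # assumed value, not taken from the report
    cooldown_until: dict[str, float] = field(default_factory=dict)

    def cool_down(self, agent: str, now: float | None = None) -> None:
        """Mark an agent as unavailable for cooldown_seconds."""
        now = time.monotonic() if now is None else now
        self.cooldown_until[agent] = now + self.cooldown_seconds

    def pick_agent(self, now: float | None = None) -> str | None:
        """Return the first agent that is not cooling down, or None."""
        now = time.monotonic() if now is None else now
        for agent in self.agents:
            if self.cooldown_until.get(agent, float("-inf")) <= now:
                return agent
        return None


def is_silent_success(exit_code: int, captured_output: str) -> bool:
    """A 'silent' run exits 0 but leaves nothing except whitespace behind."""
    return exit_code == 0 and not captured_output.strip()


def next_step(router, agent, exit_code, captured_output, already_retried, now=None):
    """Decide what to do after an agent run finishes.

    Returns (action, agent): 'pass_through' for any non-silent result,
    'retry' for the first silent exit, 'reroute' after a second one.
    """
    if not is_silent_success(exit_code, captured_output):
        return ("pass_through", agent)
    if not already_retried:
        return ("retry", agent)  # retry the same agent once
    router.cool_down(agent, now)  # then take it out of rotation
    return ("reroute", router.pick_agent(now))
```

Passing `now` explicitly keeps the sketch deterministic in tests; a real dispatcher would presumably rely on the clock default.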
Worktree & Branch Management (5 fixes)
- Propagate remove_worktree_and_branch failure (#1143) — prevents orphaned worktrees that remain marked as cleaned.
- Restore worktree from origin/ when local branch deleted (#1137) — recovers tasks after branch deletion.
- Preserve remote branch on startup rebase failure when task has open PR (#1138) — avoids closing PRs prematurely.
- Clear stale push failures and add has_commits guard (#1145) — prevents false push_failed detection.
- Cleanup fallback and JSON decode retries (#1121) — robustness in worktree cleanup path.
Review Pipeline Hardening (5 fixes)
- Warn on git fetch failure before building review diff/log (#1134) — prevents silent stale refs (follow-up to #1039).
- Push review branch before comment (#1122) — ensures review comments are posted on up-to-date branch.
- Scan all JSON blobs and prefer best AgentResponse match (#1118) — parser picks the most complete response, reducing parse errors.
- Reset review_cycles to 0 when task transitions to needs_review — prevents stale counter blocking future reviews.
- Include internal tasks in check_merged_prs (#1103) — merged PRs now unblock NeedsReview internal tasks.
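One way to picture the "scan all JSON blobs and prefer the best match" idea from #1118 is the sketch below. It is an assumption-laden illustration: the field names in `EXPECTED_FIELDS` are invented for the example (the real AgentResponse schema is not shown in this report), and "best" is taken to mean "contains the most expected fields".

```python
import json

# Hypothetical field names; the actual AgentResponse schema is not in this report.
EXPECTED_FIELDS = ("status", "summary", "files_changed")


def iter_json_objects(text):
    """Yield every top-level JSON object embedded in free-form text."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here; keep scanning
            continue
        if isinstance(obj, dict):
            yield obj
        pos = end  # resume after the decoded object


def best_agent_response(text):
    """Return the object with the most expected fields, or None if none match."""
    best, best_score = None, 0
    for obj in iter_json_objects(text):
        score = sum(1 for key in EXPECTED_FIELDS if key in obj)
        if score > best_score:
            best, best_score = obj, score
    return best
```

Because ties keep the earlier candidate, an agent that streams partial objects before a complete one still resolves to the complete one, while output with no matching object yields `None` rather than a parse error.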
Cron & Job System (3 fixes)
- Support aliases with optional parameters (#1109) — cron aliases like @daily can take optional args.
- Split 0-N DOW ranges to include Sunday correctly (#1104) — cron 0-5 now includes Sunday.
- Normalize DOW mapping (#1094) — "0-5" previously mapped to "1-5", dropping Sunday.
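The Sunday handling in the day-of-week fixes can be illustrated with a small sketch. It assumes the target representation numbers weekdays Monday=1 through Sunday=7, so cron's 0 has to be remapped rather than dropped; that target format and the function name `expand_dow` are assumptions for illustration only. Wildcards, steps, and day names are left out.

```python
def expand_dow(field: str) -> list[int]:
    """Expand a cron day-of-week field into sorted weekdays (Mon=1 .. Sun=7).

    Cron accepts both 0 and 7 for Sunday; this sketch maps 0 to 7 so that a
    range such as "0-5" keeps Sunday instead of collapsing to "1-5".
    """
    days: set[int] = set()
    for part in field.split(","):
        lo_text, _, hi_text = part.partition("-")
        lo = int(lo_text)
        hi = int(hi_text) if hi_text else lo
        if not (0 <= lo <= hi <= 7):
            raise ValueError(f"invalid day-of-week part: {part!r}")
        for day in range(lo, hi + 1):
            days.add(7 if day == 0 else day)  # remap Sunday, never drop it
    return sorted(days)
```

With this mapping, `expand_dow("0-5")` yields Sunday plus Monday through Friday, whereas `expand_dow("1-5")` yields only Monday through Friday.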
New Feature
- Task activity log table (#1133) — tracks all events per task in SQLite for debugging; enables timeline reconstruction.
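A per-task event table of this kind might look like the sketch below. The table name, column names, and helper functions are hypothetical; the report only states that events are tracked per task in SQLite and that timelines can be reconstructed from them.

```python
import sqlite3

# Hypothetical schema; the real table layout is not shown in this report.
SCHEMA = """
CREATE TABLE IF NOT EXISTS task_activity_log (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id    INTEGER NOT NULL,
    event      TEXT    NOT NULL,
    detail     TEXT,
    created_at TEXT    NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))
);
CREATE INDEX IF NOT EXISTS idx_task_activity_task
    ON task_activity_log (task_id, id);
"""


def open_log(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_event(conn, task_id, event, detail=None):
    """Append one event; the connection context manager commits it."""
    with conn:
        conn.execute(
            "INSERT INTO task_activity_log (task_id, event, detail) VALUES (?, ?, ?)",
            (task_id, event, detail),
        )


def timeline(conn, task_id):
    """Return (event, detail) pairs for one task in insertion order."""
    cursor = conn.execute(
        "SELECT event, detail FROM task_activity_log WHERE task_id = ? ORDER BY id",
        (task_id,),
    )
    return cursor.fetchall()
```

Ordering by the autoincrement `id` rather than the timestamp keeps the timeline stable even when several events land within the same millisecond.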
Bug Fixes & Improvements
- Dispatchable tasks log fires before dispatch guard check (#1131) — fixes misleading count in logs.
- Early return in runner skips tmux cleanup (#1142) — prevents session leaks and secret exposure (still in progress).
- Malformed delegation JSON in store never cleared (#1141) — silently fails on re-dispatch (needs review).
- Self‑improvement: debug agent errors and fix root causes (#1148) — meta‑issue for improving error diagnostics.
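The early-return leak described in #1142 is the kind of problem a try/finally block usually addresses. The sketch below is illustrative only: `run_task`, `cleanup_session`, and the injected `run` callable are hypothetical, and the runner's real structure is not shown in this report. The tmux subcommands used (`set-environment -g -u` and `kill-session -t`) are standard tmux commands.

```python
import subprocess
from typing import Callable, Optional, Sequence


def _run_tmux(args: Sequence[str]) -> None:
    """Default runner: invoke the real tmux binary and ignore failures."""
    subprocess.run(["tmux", *args], check=False)


def cleanup_session(session: str, secret_names: Sequence[str],
                    run: Callable[[Sequence[str]], None] = _run_tmux) -> None:
    """Unset secrets from tmux's global environment, then kill the session."""
    for name in secret_names:
        run(["set-environment", "-g", "-u", name])
    run(["kill-session", "-t", session])


def run_task(session: str, secret_names: Sequence[str],
             work: Callable[[], Optional[str]],
             run: Callable[[Sequence[str]], None] = _run_tmux) -> str:
    """Run work() and always clean up, even on early return or exception."""
    try:
        result = work()
        if result is None:
            return "skipped"  # early return still passes through finally
        return result
    finally:
        cleanup_session(session, secret_names, run)
```

Injecting `run` keeps the cleanup path testable without a tmux server; whether the actual fix takes this shape is not stated in the report.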
Documentation
- Align sync interval defaults (#1132) — updated stale 120s references to 45s.
- Fix task status semantics in AGENTS.md — clarified that needs_review is automatic and blocked needs a human.
- Audit and update documentation (#1108) — comprehensive pass over recent changes.
What Failed / Needed Escalation
Still Open
| ID | Status | Title |
|---|---|---|
| #1142 | in_progress | early return in runner skips tmux cleanup — leaks sessions and leaves secrets in tmux global env |
| #1141 | needs_review | malformed delegation JSON in store is never cleared — silently fails on every re-dispatch |
| #1149 | in_review | persistent chat sessions + research cross-agent session handoff |

#1142 is an opencode task that is still in progress and is likely stuck due to the opencode silent exit-0 pattern. #1141 is a delegation JSON corruption bug awaiting review. #1149 is a feature in review (chat sessions).
Recurring Pattern: opencode Silent Exit‑0
The morning review flagged opencode agents exiting with code 0 and no output. Today's fixes (#1135, #1144, #1136) mitigate the issue but do not eliminate it. The root cause appears to be opencode streaming non-JSON lines or terminating silently. The current mitigation is to cool down the model and re-route to a different agent (usually claude).
Routing Accuracy
Today’s closed issues used a mix of agents:
| Agent | Issues Closed |
|---|---|
| claude | ~7 |
| opencode | ~3 |

Complexity routing: mostly simple and medium, plus one complex task (#1148), a meta-issue about debugging agent errors.

Routing appears accurate: opencode failures were caught by silence detection and re-routed to claude. However, the router itself failed to parse a routing response from opencode/minimax-m2.5-free (streaming JSON before result), causing a cooldown and fallback. This suggests opencode's output format may be inconsistent.
Patterns & Health
Positive:
- High throughput: 24 commits, 10 issues closed — sustained pace of fixes.
- Systematic approach: Silent detection, worktree recovery, and review safety are being addressed in a coordinated way.
- New debugging tooling: Task activity log table enables post‑mortem analysis of task timelines.
Concerning:
- opencode reliability: The silent exit‑0 pattern is recurring and not fully resolved. Internal tasks are failing over to claude, increasing cost and latency.
- Three open issues: Two are bugs (#1142, #1141) that could affect dispatch reliability. One is a feature in review (#1149).
- No changes to SKILL.md: Operational learnings about opencode failures and silence detection are not yet reflected in the skill documentation.
Open at End of Day
| ID | Status | Title |
|---|---|---|
| #1149 | in_review | persistent chat sessions + research cross-agent session handoff |
| #1142 | in_progress | early return in runner skips tmux cleanup — leaks sessions and leaves secrets in tmux global env |
| #1141 | needs_review | malformed delegation JSON in store is never cleared — silently fails on every re-dispatch |
Tomorrow's Priorities
- Monitor #1142 and #1141 — both are dispatch‑critical bugs. #1142 may need an opencode agent restart or timeout.
- Follow up on opencode silent exit‑0 — consider a dedicated investigation: is it a model issue (github‑copilot/*) or an opencode CLI bug? If persistent, route internal tasks away from opencode.
- Update SKILL.md — add notes about silence detection, cooldown behavior, and opencode reliability patterns discovered today.
- Review #1149 — persistent chat sessions feature is in review; ensure it aligns with control session architecture.
- CLI version drift — still unresolved (0.37.6 vs 0.37.11). Run: brew upgrade orch && brew services restart orch