Evening Retrospective — 2026-03-02
Summary
Exceptionally productive day. 19 commits merged, 6 tasks closed, zero open issues at EOD (excluding this retro). The biggest structural change was the status-driven review workflow that eliminates the review_started sidecar flag. Agent prompts were hardened with a mandatory pre-output checklist. Service is stable and the backlog is empty.
Morning Review Recap
| Priority | Outcome |
|---|---|
| No specific bugs (service stable) | Confirmed — no bugs opened |
| Monitor: status-based review refactor | Landed this morning, all tasks passed through it successfully |
| Consider: reduce stuck threshold 30→10-15 min | Filed today as #274 |
All priorities acknowledged.
Tasks Completed Today
| Issue | Title | Agent | Notes |
|---|---|---|---|
| #267 | Graceful restart — wait for running agents | claude | Clean first-attempt |
| #265 | branch_name() panics on non-ASCII titles | claude | Byte-offset panic fixed |
| #264 | Transient GitHub API error clears active_task_id | claude | active_task_id preserved on 502 |
| #263 | Code review: orch | claude | Routine scheduled review |
| #257 | Extract LLM routing into engine/llm_router.rs | claude | Clean architecture refactor |
| #271 | Morning review | claude | Completed, drove today's priorities |
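The byte-offset panic fixed in #265 is the kind that occurs when a `&str` is sliced at an index that falls inside a multi-byte UTF-8 character. The actual fix is not shown in this retro. The sketch below only illustrates the general technique of backing up to a character boundary before slicing; the helper name `truncate_on_char_boundary` and its byte-limit parameter are hypothetical.

```rust
/// Hypothetical helper (not the real `branch_name()`): return the longest
/// prefix of `s` that is at most `max_bytes` long and ends on a UTF-8
/// character boundary, so the slice can never panic.
fn truncate_on_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // `max_bytes < s.len()` here, so the index is always in range.
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

With the title `"héllo"`, a limit of 2 bytes lands in the middle of `é`; a raw `&title[..2]` would panic, while this helper backs up and returns `"h"`.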
Key Changes This Cycle
1. Status-Driven Review Workflow (95a2113)
The review_started sidecar flag — spread across 11 sites — is gone. Review lifecycle now uses status transitions exclusively:
- Agent done + PR exists → `needs_review`
- Engine spawns review agent → transitions to `in_review` as the guard
- Review failure → reset to `needs_review` (was: stuck `in_review` forever)
- Stale `in_review` with no tmux session → reset at startup and sync tick
This is a large behavioral change. Today's tasks all passed through it successfully, but edge cases (concurrent review + sync tick, rapid retry loops) should be watched.
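The transitions listed above can be sketched as a small state machine. This is a minimal illustration, not the engine's code: the `Status` and `Event` enums, the `InProgress` variant, and the assumption that a stale `in_review` task is reset to `needs_review` are all hypothetical.

```rust
/// Hypothetical status set; only `needs_review` and `in_review` are named
/// in the retro.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    InProgress,
    NeedsReview,
    InReview,
}

/// Hypothetical events that drive the review lifecycle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    AgentDoneWithPr,
    ReviewSpawned,
    ReviewFailed,
    /// `in_review` with no tmux session, seen at startup or on a sync tick.
    StaleSession,
}

/// Return the next status, or `None` if the event is not valid in the
/// current status. Rejecting `ReviewSpawned` while already `InReview` is
/// what lets the status itself act as the guard against double spawns.
fn next_status(current: Status, event: Event) -> Option<Status> {
    use Event::*;
    use Status::*;
    match (current, event) {
        (InProgress, AgentDoneWithPr) => Some(NeedsReview),
        (NeedsReview, ReviewSpawned) => Some(InReview),
        (InReview, ReviewFailed) => Some(NeedsReview),
        // Assumption: a stale review is reset to needs_review.
        (InReview, StaleSession) => Some(NeedsReview),
        _ => None,
    }
}
```

Keeping the transition table in one function makes the edge cases to watch (concurrent review plus sync tick, rapid retries) easy to enumerate in unit tests.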
2. Sidecar State Scoped Per-Repo (1dc75cb)
Sidecar files now land in ~/.orch/state/{repo}/{id}.json via tokio::task_local! { REPO_CONTEXT }. Previously, running as a Brew service (cwd=/) caused all sidecars to be written flat to ~/.orch/state/{id}.json, which broke worktree cleanup.
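As a rough illustration of the new layout, the sketch below builds a per-repo sidecar path. It is not the real implementation: the function name `sidecar_path` is hypothetical, and the repo is passed as an explicit argument here instead of being read from the `REPO_CONTEXT` task-local.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve `{state_root}/{repo}/{id}.json`.
/// In the real service the repo would come from `REPO_CONTEXT`; passing it
/// explicitly keeps this sketch free of the tokio dependency.
fn sidecar_path(state_root: &Path, repo: &str, id: &str) -> PathBuf {
    let mut path = state_root.to_path_buf();
    path.push(repo);
    path.push(format!("{}.json", id));
    path
}
```

Because the repo is part of the path, the result no longer depends on the process working directory, which is the property the Brew service setup was missing.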
3. Graceful Restart (bb0e4f3)
orch restart now waits for all running agents to finish before restarting the service. Previously, an in-progress agent could be killed mid-task.
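One way to express "wait for running agents" is a simple poll loop, sketched below. The function name, the polling approach, and the timeout are assumptions made for illustration; the retro does not say whether the real `orch restart` polls or has an upper bound on the wait.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: poll `running_agents` until it reports zero or
/// `timeout` elapses. Returns `true` when it is safe to restart.
fn wait_for_agents<F>(mut running_agents: F, poll_interval: Duration, timeout: Duration) -> bool
where
    F: FnMut() -> usize,
{
    let deadline = Instant::now() + timeout;
    loop {
        if running_agents() == 0 {
            return true;
        }
        if Instant::now() >= deadline {
            // Assumption: the caller decides whether to abort or force
            // the restart when agents are still running.
            return false;
        }
        thread::sleep(poll_interval);
    }
}
```

Taking the agent count as a closure keeps the loop testable without a live tmux session.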
4. Agent Prompt Hardening
Two improvements to prevent the most common agent failure (work done but not pushed):
- Pre-output checklist (`3cee8fd`): agents must verify `git status`, `git log`, `git push`, and `gh pr view` before writing output JSON
- Route prompt cleanup (`1f8245d`): removed hardcoded executor descriptions that caused routing bias toward specific agents
5. Review Prompt Rewritten (404b557)
Rewrote review_task.md to be more explicit about rebase-first, CI-must-run, and decision rules. Review agents were previously approving without running CI.
6. Auto-Close on Done (c934eeb)
Issues are now auto-closed when tasks complete. Labels that were stuck at status:in_review on closed issues are also corrected.
What Didn't Go Well
Stale Integration Test (036ddbf)
Codex autonomous mode changed from --ask-for-approval never to --full-auto, but the integration test was never updated. The test is marked #[ignore] in the repo, which masked the drift. Root cause: #[ignore]d tests are not run in CI and can silently go stale. Filed as #272 and fixed.
Labels Not Updating on Close (historical, fixed today)
Several older closed issues still showed the status:in_review label because the label-update code had a gap in the no-PR path. Fixed by c934eeb.
Prompt Effectiveness
| Prompt | Assessment |
|---|---|
| `agent_system.md` | Strong. Mandatory pre-output checklist is new and should eliminate push-forgetting. |
| `review_task.md` | Improved. Rebase-first + explicit CI requirement are clear. Watch for edge cases with large diffs. |
| `route.md` | Good. Hardcoded executor descriptions removed. Historical labels now explicitly ignored. |
| `agent_message.md` | Not reviewed today; no issues observed. |
No prompt changes needed beyond what's already landed.
Routing Accuracy
All 6 tasks today were routed to claude. No mis-routes observed. The route prompt cleanup (1f8245d) removes historical bias — upcoming scheduled tasks will be the first real test of neutral routing.
Performance
- GitHub API: Transient 502s at ~11:21 UTC (last reported in morning review) recovered automatically. `fix: preserve active_task_id on transient GitHub API errors` was merged today, so this exact scenario is now handled.
- No lock contention observed in logs.
- Review latency: status-driven workflow eliminates the `review_started` flag polling delay. Expected improvement in review cycle time.
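The transient-error fix mentioned above can be illustrated with a small decision function. This is a sketch under stated assumptions, not the merged code: the function names are hypothetical, the retro only mentions 502 (treating 503 and 504 as transient is an assumption), and clearing the ID on other errors is also an assumption.

```rust
/// Hypothetical: HTTP statuses treated as transient. Only 502 is
/// confirmed by the retro; 503 and 504 are assumptions.
fn is_transient(status: u16) -> bool {
    matches!(status, 502 | 503 | 504)
}

/// Decide what `active_task_id` should be after a task lookup.
/// `lookup` is `Ok(true)` if the task is still active, `Ok(false)` if it
/// is not, and `Err(status)` if the API call failed with that HTTP status.
fn reconcile_active_task(active: Option<u64>, lookup: Result<bool, u16>) -> Option<u64> {
    match lookup {
        Ok(true) => active,
        Ok(false) => None,
        // The key case: a transient failure says nothing about the task,
        // so the current value is preserved.
        Err(status) if is_transient(status) => active,
        // Assumption: non-transient errors clear the ID, as before.
        Err(_) => None,
    }
}
```

For example, `reconcile_active_task(Some(264), Err(502))` keeps `Some(264)`, whereas a lookup that confirms the task is no longer active clears it.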
New Issues Filed
| Issue | Title | Priority |
|---|---|---|
| #274 | Reduce stuck-detection threshold for tasks with no active tmux session | Low |
Tomorrow's Priorities
- Monitor the new review workflow: `needs_review` → `in_review` status transitions are live but new. Watch for any stuck tasks in `in_review` without an active review session, especially under concurrency.
- Stuck-detection threshold: the newly filed issue (#274). Once deployed, tasks with no tmux session should be reclaimed in 10-15 min, not 30 min.
- No other known bugs. Service is in the best shape it's been. The next scheduled jobs will reveal if the route prompt cleanup introduced any unexpected routing changes.
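For the stuck-detection change in #274, the intended behavior could look roughly like the sketch below. The function names are hypothetical, and the 15-minute value is a placeholder picked from the 10-15 min range above; the final threshold has not been decided in this retro.

```rust
use std::time::Duration;

/// Current threshold for tasks that still have a tmux session.
const DEFAULT_STUCK_THRESHOLD: Duration = Duration::from_secs(30 * 60);
/// Placeholder for the proposed shorter threshold (assumption: 15 min).
const NO_SESSION_STUCK_THRESHOLD: Duration = Duration::from_secs(15 * 60);

/// Pick the stuck threshold based on whether a tmux session is alive.
fn stuck_threshold(has_tmux_session: bool) -> Duration {
    if has_tmux_session {
        DEFAULT_STUCK_THRESHOLD
    } else {
        NO_SESSION_STUCK_THRESHOLD
    }
}

/// A task is considered stuck once it has been idle for at least the
/// threshold that applies to it.
fn is_stuck(idle: Duration, has_tmux_session: bool) -> bool {
    idle >= stuck_threshold(has_tmux_session)
}
```

Under these placeholder values, a task idle for 16 minutes is reclaimed only if it has no tmux session.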