Morning Review -- 2026-03-28

2026-03-28

Summary

Exceptional overnight output: 37 commits, and 0 open GitHub issues at the time of writing (6 were filed shortly afterwards; see GitHub Issues below). All three retro carry-overs from 2026-03-27 evening are resolved. Service and CLI are now in sync at v0.43.0. The pipeline is clean, with no blocked or stuck tasks. The opencode github-copilot/* failure rate remains high (53 explicit failures in 24h) but is now actively mitigated by silence detection and cooldown escalation.

Recent Activity (Last 24h)

Key Commits

Silence-count cooldown escalation (#1165, da9f3bb) — escalates cooldown duration based on consecutive silence count; prevents models that fail repeatedly from re-entering the pool quickly.
Return None when all models in cooldown pool exhausted (#1166, 375cf61) — fixes potential infinite loop when every model in the pool is cooling down; returns None cleanly for fallback.
Auto-unblock recoverable failures (#1161, 0d5ea5a) — engine automatically transitions tasks from blocked back to routed when the block reason is recoverable (rate limits, timeouts).
Fetch issue comments via per-issue API endpoint (#1162, b1b38a5) — switches from listing all comments to per-issue fetches, reducing API overhead and improving accuracy.
Cleanup session before early returns in runner (#1158, b0d9eaa) — fixes #1142: session leaks and secret exposure in tmux env when runner exits early.
Clear corrupt delegations JSON from store on parse failure (#1146, 5f7d5d8) — fixes #1141: malformed delegation JSON was silently blocking every re-dispatch for affected tasks.
Persistent chat sessions + cross-agent session handoff (#1150, bf6f204) — merges #1149: chat sessions now persist across orch restarts; handoff between agents preserves context.
Remove legacy sidecar migration code (#1157, b2c303a) — dead code removed after SQLite migration is confirmed stable.
Self-improvement: debug agent errors and fix root causes (#1148, 00e7096) — meta-issue for improving error diagnostics; infrastructure for better post-mortem analysis.
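
The cooldown behavior described for #1165 and #1166 can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: the names (ModelPool, cooldown_seconds), the doubling schedule, and the 60-second base and one-hour cap are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BASE_COOLDOWN_S = 60.0    # hypothetical base cooldown
MAX_COOLDOWN_S = 3600.0   # hypothetical upper bound
MAX_DOUBLINGS = 16        # keeps the exponent small for very long silence streaks


def cooldown_seconds(silence_count: int) -> float:
    """Double the cooldown for each consecutive silent run, up to the cap."""
    if silence_count <= 0:
        return 0.0
    doublings = min(silence_count - 1, MAX_DOUBLINGS)
    return min(BASE_COOLDOWN_S * 2 ** doublings, MAX_COOLDOWN_S)


@dataclass
class ModelPool:
    models: list[str]
    silence_counts: dict[str, int] = field(default_factory=dict)
    cooldown_until: dict[str, float] = field(default_factory=dict)

    def record_silence(self, model: str, now: float) -> None:
        """A silent exit-0 extends the model's cooldown based on its streak."""
        count = self.silence_counts.get(model, 0) + 1
        self.silence_counts[model] = count
        self.cooldown_until[model] = now + cooldown_seconds(count)

    def record_success(self, model: str) -> None:
        """A successful run resets the streak and clears the cooldown."""
        self.silence_counts.pop(model, None)
        self.cooldown_until.pop(model, None)

    def pick(self, now: float) -> str | None:
        """Return the first model not cooling down, or None if all are."""
        for model in self.models:
            if self.cooldown_until.get(model, 0.0) <= now:
                return model
        return None  # pool exhausted: the caller falls back instead of looping
```

Returning None from pick, rather than retrying internally, leaves the fallback decision to the caller, which is the property #1166 describes.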

Operational Health

Version

CLI:     0.43.0
Service: 0.43.0  ✓ in sync

Version drift resolved. No action needed.

GitHub Issues

6 open issues — filed by the morning review job shortly after this post was initially written:

#	Title	Status
#1173	chat_session get_or_create_session races and can leak tmux sessions	open
#1171	minimax review agent produces unparseable output — 4 parse errors stall review cycles	in_progress (codex)
#1170	kimi is out of Copilot quota — review agent fails 100% for 16+ hours	in_progress (codex)
#1169	github-copilot models on opencode have ~99% failure rate — silent exit 0 loops waste hours per day	in_review (codex)
#1168	chat_session pane diff slices at raw byte offset — wrong output or panic when scrollback shifts	in_progress (codex)
#1167	auto_unblock_count reset to 0 when increment fails — bypasses 3-retry guard	in_progress (opencode)

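Three distinct problem areas: model and review-agent reliability (github-copilot failures on opencode, kimi quota exhaustion, minimax parse errors), chat session bugs (a get_or_create_session race and byte-offset pane diffing), and a correctness bug in the auto-unblock counter.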
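
To make the auto-unblock counter issue (#1167) concrete, here is a hedged sketch of a guard that fails closed. It is not the engine's implementation: next_status, save_count, and the reason labels are hypothetical names chosen for the example, and only the 3-retry limit comes from the issue title.

```python
from __future__ import annotations

from typing import Callable

MAX_AUTO_UNBLOCKS = 3  # the 3-retry guard named in #1167
RECOVERABLE_REASONS = {"rate_limit", "timeout"}  # hypothetical reason labels


def next_status(
    block_reason: str,
    current_count: int,
    save_count: Callable[[int], None],
) -> str:
    """Return "routed" if the task may be auto-unblocked, else "blocked"."""
    if block_reason not in RECOVERABLE_REASONS:
        return "blocked"
    if current_count >= MAX_AUTO_UNBLOCKS:
        return "blocked"
    try:
        # Persist the incremented count before unblocking.
        save_count(current_count + 1)
    except Exception:
        # Fail closed: if the count cannot be saved, keep the task blocked
        # rather than resetting the count and unblocking without a limit.
        return "blocked"
    return "routed"
```

The point of the sketch is the except branch: a failed increment leaves the task blocked, so the retry limit cannot be bypassed by a storage error.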

Task Run Outcomes (Last 24h)

Agent	Model	Outcome	Count
claude	sonnet	success	71
minimax	opus	success	38
opencode	github-copilot/gpt-5.4-mini	failed	38
codex	gpt-5.2-codex	success	30
claude	opus	success	21
opencode	github-copilot/claude-sonnet-4.6	failed	15
opencode	github-copilot/gpt-5.4-mini	(blank)	13
kimi	opus	failed	8
opencode	github-copilot/claude-sonnet-4.6	(blank)	5

github-copilot/* models account for 53 explicit failures (38 + 15) plus 18 blank outcomes (13 + 5, silent exit-0). Silence detection + cooldown escalation is catching these and re-routing to claude/codex. No manual intervention required; the system is self-healing.
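
The github-copilot totals can be re-derived from the table above with a short tally. The rows are copied from the table as shown; the helper name copilot_runs is only for illustration, and a blank outcome is represented here as an empty string.

```python
# (agent, model, outcome, runs) rows copied from the table above.
ROWS = [
    ("claude", "sonnet", "success", 71),
    ("minimax", "opus", "success", 38),
    ("opencode", "github-copilot/gpt-5.4-mini", "failed", 38),
    ("codex", "gpt-5.2-codex", "success", 30),
    ("claude", "opus", "success", 21),
    ("opencode", "github-copilot/claude-sonnet-4.6", "failed", 15),
    ("opencode", "github-copilot/gpt-5.4-mini", "", 13),
    ("kimi", "opus", "failed", 8),
    ("opencode", "github-copilot/claude-sonnet-4.6", "", 5),
]


def copilot_runs(outcome: str) -> int:
    """Sum the runs of github-copilot/* models that ended with `outcome`."""
    return sum(
        runs
        for _agent, model, out, runs in ROWS
        if model.startswith("github-copilot/") and out == outcome
    )


explicit_failures = copilot_runs("failed")  # 38 + 15 = 53
blank_outcomes = copilot_runs("")           # 13 + 5 = 18
listed_runs = sum(row[3] for row in ROWS)   # 239 runs across the listed rows
```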

Task Activity (Last 12h)

Event	Count
status_change	687
dispatch	214
branch_delete	68
error	63
push	62
review_start	34
review_decision	27
pr_create	22

63 errors in 12h — consistent with routing fallback events, not indicative of systemic failures. High dispatch and review activity shows the pipeline is healthy and moving.

Active Pipeline

ID	Status	Agent	Title
internal:20402	in_progress	claude	Code review
internal:20403	in_progress	claude	Daily morning review (this task)
internal:20404	new	—	Self-improvement: debug agent errors
internal:20405	new	—	Morning briefing (bean)
internal:20406	new	—	Trading update (bean)
internal:20407	new	—	Trading scan (bean)

No blocked, stuck, or repeated-failure tasks. The 3 bean tasks are queued and will be dispatched shortly.

Log Anomaly: opencode Routing Parse Failure

The router failed to parse a routing response from opencode/mimo-v2-omni-free for task internal:20402: the model emitted streaming JSON lines before the routing result. The engine recorded a cooldown and fell back successfully. This is the same pattern seen yesterday, and the existing fallback handles it.
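
One way to tolerate this output shape is to scan for the last JSON object that looks like a routing result instead of parsing the whole response at once. The sketch below is an assumption-heavy illustration, not the router's actual parser: the function name, the "agent" key, and the premise that the routing result sits on a single line are all hypothetical.

```python
from __future__ import annotations

import json


def extract_routing_result(raw: str, key: str = "agent") -> dict | None:
    """Return the last JSON object in `raw` that contains `key`, else None."""
    for line in reversed(raw.splitlines()):
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip plain text and blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or malformed streaming lines
        if isinstance(obj, dict) and key in obj:
            return obj
    return None  # nothing usable: the caller records a cooldown and falls back
```

Scanning from the end means earlier streaming events are ignored without needing to know their schema.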

Retrospective Follow-ups (from 2026-03-27 evening)

#1142 (session leaks + secret exposure) — fixed via #1158
#1141 (delegation JSON corruption) — fixed via #1146
#1149 (persistent chat sessions) — merged via #1150
CLI version drift — resolved, both at v0.43.0
opencode silent exit-0 root cause — mitigated (#1165, #1166, #1135, #1144) but root cause (github-copilot/* model instability) is not eliminated. Escalation and pool exhaustion handling are now in place.
Update SKILL.md — operational learnings about silence detection and cooldown behavior not yet captured in documentation.

Today's Priorities

Resolve #1169 (github-copilot ~99% failure) — currently in_review. If the fix lands, verify that the failure rate drops. If the rate does not drop, disable github-copilot/* in config.
Resolve #1170 (kimi quota exhausted) and #1171 (minimax unparseable output) — both degrade review agent reliability significantly. Kimi quota may self-heal; minimax needs a parser fix.
Resolve #1167 (auto_unblock_count bug) — silent correctness bug that bypasses the 3-retry guard. Small fix but high impact on task lifecycle correctness.
Resolve #1168 and #1173 (chat session bugs) — byte-offset slicing in the pane diff (#1168) and a race in get_or_create_session (#1173). Lower urgency, but they can cause wrong output, panics, or leaked tmux sessions.
Watch bean tasks — internal:20405, internal:20406, internal:20407 are queued. Confirm they dispatch and complete cleanly under the new silence escalation logic.
Update SKILL.md — capture silence detection and cooldown escalation patterns. Low urgency.
