Evening Retrospective — 2026-04-30

Two production bugs in core execution paths were closed today:

#3037 (9697aa27) made the git common-dir writable inside the codex sandbox, so worktree commits stop failing on index lock permissions.
#3038 (c3d0e09d) fixed router alias handling so that dead opencode Copilot aliases are filtered out instead of being remapped to the invalid gpt-5.3-codex.
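
The filter-instead-of-remap behavior described for #3038 can be illustrated with a minimal sketch. The names `DEAD_ALIASES` and `filter_candidates`, and the idea that the router works on an ordered candidate list, are assumptions made for illustration; this is not the router's actual code.

```python
# Hypothetical sketch: drop dead aliases from a fallback candidate list
# instead of rewriting them to some other model id.

# Assumed set of aliases known to be dead. Only this one alias is named
# in the retro; the real router's list may differ.
DEAD_ALIASES = {"github-copilot/gpt-5.3"}


def filter_candidates(candidates):
    """Return candidates in their original order, skipping dead aliases."""
    return [model for model in candidates if model not in DEAD_ALIASES]


if __name__ == "__main__":
    lanes = ["github-copilot/gpt-5.3", "github-copilot/claude-sonnet-4.6"]
    print(filter_candidates(lanes))
```

Filtering means a dead alias can never resolve to a model id that the provider rejects; the trade-off is that the candidate list may end up empty, which a caller would need to handle.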

What Was Accomplished

c3d0e09d fix(router): filter dead github-copilot/gpt-5.3 alias instead of remapping to invalid gpt-5.3-codex
9697aa27 fix(runner): add codex git-dir as writable sandbox path
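
For context on the runner fix: a linked git worktree keeps its index and shared objects under the main repository's git directory, not inside the worktree checkout, so a sandbox that only allows writes to the checkout can block commits. The sketch below shows one way to resolve that directory using `git rev-parse --git-common-dir`. The helper names `git_common_dir` and `writable_paths`, and the injectable `run` parameter, are hypothetical and do not reflect the runner's real interface.

```python
import os
import subprocess


def git_common_dir(worktree, run=subprocess.run):
    """Resolve the common git directory for a worktree as an absolute path.

    Assumes `worktree` is the root of the checkout.
    """
    result = run(
        ["git", "rev-parse", "--git-common-dir"],
        cwd=worktree,
        capture_output=True,
        text=True,
        check=True,
    )
    # git may print a relative path (e.g. ".git"), so anchor it at the worktree.
    return os.path.abspath(os.path.join(worktree, result.stdout.strip()))


def writable_paths(worktree, run=subprocess.run):
    """Hypothetical helper: paths a sandbox would need to allow writes to."""
    paths = [os.path.abspath(worktree)]
    common = git_common_dir(worktree, run=run)
    if common not in paths:
        paths.append(common)
    return paths
```

Passing `run` as a parameter is only a convenience so the helper can be exercised without a real repository.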

gh issue list --state open returned no open issues in the configured scope at review time.
Recent closed queue is dominated by reliability hardening in router/parser/runner paths, with today adding two more targeted fixes.

Outcome totals:

Success rate excluding NULL outcomes: ~91.8% (101 / 110).
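
The arithmetic behind this figure can be reproduced with a short sketch. The outcome labels and the number of NULL rows are assumptions: the retro reports only 101 successes out of 110 non-NULL outcomes, so the sample data uses an arbitrary NULL count to show that those rows are ignored.

```python
def success_rate(outcomes):
    """Share of successes among runs whose outcome is not None (NULL)."""
    known = [o for o in outcomes if o is not None]
    if not known:
        return None  # no rated runs, so the rate is undefined
    return sum(1 for o in known if o == "success") / len(known)


if __name__ == "__main__":
    # 101 successes and 9 non-successes, plus an arbitrary number of NULL rows.
    sample = ["success"] * 101 + ["failed"] * 9 + [None] * 5
    print(f"{success_rate(sample):.1%}")  # prints 91.8%
```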

Model availability failures still appear in fallback edges:
- opencode/github-copilot/claude-sonnet-4.6 produced a silent-exit retry path.
- opencode/gpt-5.3-codex produced Model not found.
- These are now better guarded by #3038, but runtime fallback paths should be watched tomorrow to confirm no reintroduction.
Transient infra/network failures remain sparse:
- One push_failed run failed on DNS resolution (Could not resolve host: github.com).
- This appears environmental, not a persistent orch logic regression.
Blocked/failed trading-task runs were mostly transient and retried to success:
- A small number of runs logged lockfile/commit-path blockers before succeeding on retry.
- #3037 specifically addresses one of these recurring lockfile classes.

Routing quality remained strong overall:

Highest successful lanes were codex:gpt-5.3-codex (31), claude:sonnet (20), kimi:opus (17), minimax:opus (14), glm:opus (11).
Today’s two fixes directly targeted routing/execution mismatches seen in prior retros (dead alias mapping and codex git-dir sandbox coverage).
Remaining misses were concentrated in known model-availability edges rather than broad misclassification.

Morning priorities were to unblock long-lived blocked work, reduce churn, and verify routing/retry health.

Churn reduction: progressed via #3037 and #3038, both addressing recurring operational failure classes.
Long-lived blocked work (#2789, internal:148540): no clear closure evidence in today’s captured issue snapshot; should be explicitly re-checked in tomorrow’s morning run.
Throughput/health: remained high with >100 successful runs and no broad outage signature.

Prompt format appears effective; parser-format regressions were not the dominant failure mode today.
Failures continue to cluster around runtime/model availability and external network conditions, not prompt comprehension.
Routing rationale quality in task records remains high, with clear route reasons and no sign of widespread wrong-agent selection.

Verify that #3038 eliminated opencode dead-alias retries in fresh task_runs (no new "Model not found: gpt-5.3-codex/." errors from opencode paths).
Verify #3037 reduced/removed codex commit lockfile failures in worktree runs.
Re-check status and concrete unblock path for long-lived blocked items (#2789, internal:148540) with explicit closure criteria.
Monitor whether the occasional push_failed DNS/network error remains isolated or trends upward.
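
The checks above amount to counting failure classes in fresh run records. The following is a minimal sketch, assuming error messages are available as plain strings; the class names, the `classify` and `summarize` helpers, and the `index.lock` pattern are hypothetical, and the schema of task_runs is not described in this retro.

```python
import re
from collections import Counter

# Hypothetical patterns keyed by failure class. The first two strings are
# quoted in this retro; the lockfile pattern is an assumption.
PATTERNS = {
    "model_not_found": re.compile(r"Model not found", re.IGNORECASE),
    "dns": re.compile(r"Could not resolve host", re.IGNORECASE),
    "lockfile": re.compile(r"index\.lock", re.IGNORECASE),
}


def classify(message):
    """Return the first matching failure class, or "other"."""
    for name, pattern in PATTERNS.items():
        if pattern.search(message):
            return name
    return "other"


def summarize(messages):
    """Count how many messages fall into each failure class."""
    return Counter(classify(m) for m in messages)
```

Comparing these counts day over day would show whether a class has disappeared after a fix or is trending upward.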

Prepared by Orch automation (internal task internal:148834).