Morning Review — 2026-05-25
Recent Commits (Last 24h)
| Commit | Description |
|---|---|
| 28968b0a | docs(posts): update evening retrospective for 2026-05-24 with late-day fixes (#3191) |
| e59a1dda | fix(codex): replace removed --ask-for-approval with -c approval_policy= for codex 0.133.0 (#3190) |
| ec66fb1e | fix(router): stop cascading timeouts within a single tick (#3189) |
Notable: the router timeout cascade fix (#3189) and the Codex CLI compatibility fix (#3190) landed yesterday; the jobs and budget removals remain in main.
Operational Health
Overall: Mostly healthy. Core agents are running and producing successes. Recent logs show frequent tick activity and task dispatches.
Highlights from the last 24 hours:
- opencode + opencode-hosted models show many successful runs (multiple successful attempts across gpt-5-mini, sonnet variants, and gpt-5.4).
- kimi/opus and claude/sonnet exhibit stable success rates.
Known noise and warnings:
- Config contains stale opencode model entries: github-copilot/gpt-5.3 and github-copilot/claude-opus-4.6. They are pruned automatically at runtime but generate WARN lines on each dispatch. Operator action recommended: remove them from ~/.orch/config.yml.
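Before and after editing the config, an operator may want to confirm which lines still mention the stale model IDs. The following is a minimal sketch, not part of Orch itself: the function name `find_stale_models` is hypothetical, and it makes no assumption about the YAML layout beyond the model IDs appearing as literal strings in the file named above.

```shell
# Hypothetical helper: print the lines (with line numbers) of a config file
# that still mention either stale model ID from this report.
# The default path is the one named in the report; pass another path to override.
find_stale_models() {
  config_file="${1:-$HOME/.orch/config.yml}"
  # The trailing group stops the match from also catching longer model IDs
  # that merely start with the same text (for example a "-codex" suffix).
  grep -nE 'github-copilot/(gpt-5\.3|claude-opus-4\.6)([^[:alnum:]._-]|$)' "$config_file"
}
```

Running `find_stale_models` with no argument checks ~/.orch/config.yml. grep exits with status 1 when nothing matches, so an empty result with a non-zero status suggests the entries are already gone.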
WATCHDOG / tick stalls:
- The WATCHDOG reported tick stalls earlier (70s–130s) that correlated with the router timeout cascade. The cascade fix (#3189) has landed and should prevent multi-minute stalls going forward. Continue to monitor for recurrence.
Agent/model failure patterns (last 24h snapshot from task_runs):
- codex|gpt-5.3-codex: several failures (backoff will apply). Fixes for Codex CLI compatibility were deployed (#3190).
Stuck / Blocked Tasks
- internal:149337 — blocked (SSH signing error during auto-merge push). Pattern:
sign_and_send_pubkey: signing failed for ED25519 "/Users/gb/.ssh/default_id_ed25519.pub" from agent: communication with agent failed
This requires operator intervention: re-add the SSH key (ssh-add ~/.ssh/default_id_ed25519) in the service's environment, or switch the push URL to HTTPS for the affected worktree.
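The two remediation options above can be sketched as shell helpers. This is an illustrative sketch under stated assumptions, not the service's actual tooling: the function names are hypothetical, the remote is assumed to be named origin, and the worktree path and HTTPS URL must be supplied by the operator since the report does not state them. The commands need to run in the same environment as the service, because the agent socket (SSH_AUTH_SOCK) is per-environment.

```shell
# Report whether the SSH agent reachable from this environment holds a key.
# ssh-add -l exits 0 when identities are listed, 1 when the agent holds none,
# and 2 when the agent cannot be contacted.
check_agent() {
  ssh-add -l >/dev/null 2>&1
  case $? in
    0) echo "agent ok" ;;
    1) echo "agent has no identities" ;;
    *) echo "agent unreachable" ;;
  esac
}

# Option 1: re-add the private key (not the .pub file) to the agent.
# The default path is the one named in the report.
readd_key() {
  ssh-add "${1:-$HOME/.ssh/default_id_ed25519}"
}

# Option 2: change only the push URL to HTTPS, leaving the fetch URL alone.
# $1 is the worktree path and $2 the HTTPS URL; both are operator-supplied.
# Remote settings normally live in the repository config, so other linked
# worktrees of the same repository are likely to pick up the change too.
switch_push_to_https() {
  worktree="$1"
  https_url="$2"
  git -C "$worktree" remote set-url --push origin "$https_url"
}
```

A reasonable order is to run `check_agent` first: "agent unreachable" points at restarting the agent or fixing SSH_AUTH_SOCK, while "agent has no identities" points at `readd_key`.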
Retro Follow-ups (carried forward)
- Operator: remove dead opencode model entries (github-copilot/gpt-5.3, github-copilot/claude-opus-4.6) from ~/.orch/config.yml to eliminate persistent WARNs.
- Operator: fix the internal:149337 SSH agent signing failure by restarting the SSH agent and re-adding keys, or by reconfiguring the push method.
- Engineering: confirm router timeout cascade fix prevents WATCHDOG stalls in production; collect WATCHDOG metrics for the next 24h.
- Monitoring: observe Codex dispatch health after the approval_policy fix (#3190).
Priorities For Today
- Operator: resolve the SSH signing error for internal:149337 so the blocked auto-merge can proceed.
- Operator: prune stale opencode models from config; verify WARN lines on dispatch decrease.
- Engineering: monitor WATCHDOG logs and task_runs for any repeated stalls or cascading timeouts.
- Engineering: spot-check Codex dispatches across a few representative tasks to ensure compatibility with codex 0.133.0.
Prepared by Orch automation (internal:150315)