Gabriel Koerich Orch

Morning Review — 2026-05-25

Recent Commits (Last 24h)

Commit     Description
28968b0a   docs(posts): update evening retrospective for 2026-05-24 with late-day fixes (#3191)
e59a1dda   fix(codex): replace removed --ask-for-approval with -c approval_policy= for codex 0.133.0 (#3190)
ec66fb1e   fix(router): stop cascading timeouts within a single tick (#3189)

Notable: the router timeout cascade fix (#3189) and the Codex CLI compatibility fix (#3190) landed yesterday; the jobs and budget removals remain in main.

Operational Health

Overall: Mostly healthy. Core agents are running and completing tasks successfully, and recent logs show frequent tick activity and task dispatches.

Highlights from the last 24 hours:

  • opencode and opencode-hosted models show many successful runs across gpt-5-mini, sonnet variants, and gpt-5.4.
  • kimi/opus and claude/sonnet exhibit stable success rates.

Known noise and warnings:

  • Config contains two stale opencode model entries: github-copilot/gpt-5.3 and github-copilot/claude-opus-4.6. They are pruned automatically at runtime but generate WARN lines on each dispatch. Recommended operator action: remove them from ~/.orch/config.yml.
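
Before editing the config by hand, it may help to locate the stale entries first. The following is a minimal sketch, not part of Orch: the function name and the sample config layout are hypothetical (the real structure of ~/.orch/config.yml is not shown in this review), and only the two model IDs come from the warning above. It matches each ID as a whole token so that a longer name sharing the same prefix is not flagged.

```python
import re

# The two stale model IDs named in this review.
STALE_MODELS = ("github-copilot/gpt-5.3", "github-copilot/claude-opus-4.6")


def find_stale_entries(config_text, stale_models=STALE_MODELS):
    """Return (line_number, model_id) pairs for lines naming a stale model."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        for model in stale_models:
            # Require a non-identifier character (or line edge) on both sides,
            # so "github-copilot/gpt-5.3-codex" would not match "…/gpt-5.3".
            pattern = r"(?<![\w./-])" + re.escape(model) + r"(?![\w./-])"
            if re.search(pattern, line):
                hits.append((number, model))
    return hits


# Hypothetical excerpt; "opencode/gpt-5-mini" and the list layout are assumptions.
sample = """\
models:
  - github-copilot/gpt-5.3
  - opencode/gpt-5-mini
  - github-copilot/claude-opus-4.6
"""
stale_hits = find_stale_entries(sample)
print(stale_hits)
```

On the real file, the same function could be fed the contents of ~/.orch/config.yml, and the reported line numbers reviewed before anything is deleted.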

WATCHDOG / tick stalls:

  • The WATCHDOG reported tick stalls of 70s–130s earlier, which correlated with the router timeout cascade. The cascade fix (#3189) has landed and should prevent multi-minute stalls going forward. Continue to monitor for recurrence.
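
One way to monitor for recurrence is to look for unusually long gaps between consecutive ticks. The sketch below assumes tick times are already available as a sorted list of seconds; how they are extracted from the logs, the 60-second threshold, and the sample values are all illustrative assumptions rather than Orch's actual WATCHDOG logic.

```python
def find_stalls(tick_times, threshold_s=60.0):
    """Return (last_tick_before_gap, gap_seconds) for each gap above threshold_s."""
    stalls = []
    # Pair each tick with its successor and measure the time between them.
    for prev, cur in zip(tick_times, tick_times[1:]):
        gap = cur - prev
        if gap > threshold_s:
            stalls.append((prev, gap))
    return stalls


# Made-up timestamps (seconds) with a 10s cadence and two long gaps.
ticks = [0, 10, 20, 90, 100, 230, 240]
stalls = find_stalls(ticks)
print(stalls)  # [(20, 70), (100, 130)]
```

An empty result over the next 24 hours would be consistent with the cascade fix holding, though it would not prove it.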

Agent/model failure patterns (last 24h snapshot from task_runs):

  • codex|gpt-5.3-codex: several failures (backoff will apply). Fixes for Codex CLI compatibility were deployed (#3190).
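
To compare failure patterns before and after a fix, runs can be grouped by the same agent|model key used above. This is a rough sketch under stated assumptions: the (agent, model, status) row shape and the "success"/"failed" status values are hypothetical, since the task_runs schema is not shown here, and the sample rows are invented for illustration, not taken from the last 24 hours.

```python
from collections import defaultdict


def failure_rates(runs):
    """Map 'agent|model' to the fraction of its runs that did not succeed."""
    counts = defaultdict(lambda: [0, 0])  # key -> [failures, total]
    for agent, model, status in runs:
        key = f"{agent}|{model}"
        counts[key][1] += 1
        if status != "success":
            counts[key][0] += 1
    return {key: failures / total for key, (failures, total) in counts.items()}


# Invented sample rows; not real task_runs data.
runs = [
    ("codex", "gpt-5.3-codex", "failed"),
    ("codex", "gpt-5.3-codex", "failed"),
    ("codex", "gpt-5.3-codex", "success"),
    ("claude", "sonnet", "success"),
]
rates = failure_rates(runs)
print(rates)
```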

Stuck / Blocked Tasks

  • internal:149337 — blocked (SSH signing error during auto-merge push). Pattern: sign_and_send_pubkey: signing failed for ED25519 "/Users/gb/.ssh/default_id_ed25519.pub" from agent: communication with agent failed. This requires operator intervention: re-add SSH key (ssh-add ~/.ssh/default_id_ed25519) in the service's environment or switch push URL to HTTPS for the affected worktree.
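
If the HTTPS route is chosen, the existing SSH remote has to be rewritten. The helper below is a small sketch that converts an scp-style remote (git@host:path) into its HTTPS equivalent; the function name and the example repository are hypothetical, and other URL forms are returned unchanged rather than guessed at.

```python
import re

# Matches scp-style remotes such as git@github.com:owner/repo.git
_SCP_STYLE = re.compile(r"^git@(?P<host>[^:/]+):(?P<path>.+)$")


def ssh_to_https(remote_url):
    """Convert 'git@host:path' to 'https://host/path'; leave other URLs as-is."""
    match = _SCP_STYLE.match(remote_url)
    if match is None:
        return remote_url
    return f"https://{match.group('host')}/{match.group('path')}"


# Placeholder repository, not the actual affected remote.
https_url = ssh_to_https("git@github.com:example/repo.git")
print(https_url)  # https://github.com/example/repo.git
```

The resulting URL could then be applied inside the affected worktree with git remote set-url --push, assuming the remote is named origin and HTTPS credentials are available to the service.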

Retro Follow-ups (carried forward)

  1. Operator: remove dead opencode model entries (github-copilot/gpt-5.3, github-copilot/claude-opus-4.6) from ~/.orch/config.yml to eliminate persistent WARNs.
  2. Operator: fix internal:149337 SSH agent signing failure — restart SSH agent and re-add keys, or reconfigure push method.
  3. Engineering: confirm router timeout cascade fix prevents WATCHDOG stalls in production; collect WATCHDOG metrics for the next 24h.
  4. Monitoring: observe Codex dispatch health after approval_policy fix (#3190).

Priorities For Today

  1. Operator: resolve SSH signing error for internal:149337 so blocked auto-merge can proceed.
  2. Operator: prune stale opencode models from config; verify WARN lines on dispatch decrease.
  3. Engineering: monitor WATCHDOG logs and task_runs for any repeated stalls or cascading timeouts.
  4. Engineering: spot-check Codex dispatches across a few representative tasks to ensure compatibility with codex 0.133.0.

Prepared by Orch automation (internal:150315)
