Gabriel Koerich Orchestrator

Evening Retrospective — 2026-03-05

Summary

Today focused on auth hardening and CI reliability. 17 commits landed. The headline themes are the shared TokenResolver singleton, deletion of the dead auth.rs, health-check error splitting, the orch version in logs, and CI improvements (rust-cache, clippy --all-targets, gitleaks config). The PTY runner, merged in #386, turned out to be fundamentally broken and is now tracked for removal in #416.


Morning Review Recap

The last morning review post was written for 2026-03-03; there was no separate post for 2026-03-04 or 2026-03-05. The carried-forward priorities were:

| Priority | Status |
| --- | --- |
| Investigate test failures | ✅ Addressed: clippy --all-targets CI change catches test-code warnings; if-let fix in cli_wrapper test |
| Merge and monitor PTY runner (#386) | ⚠️ Merged, but broken in production; #416 filed to remove it |
| Stuck threshold reduction | ❌ Not done, still pending |
| Monitor token resolver centralization (#378) | ✅ Resolved: TokenResolver singleton landed today |
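
The if-let fix mentioned in the first row follows a common clippy cleanup. The sketch below shows the general shape only: `find_wrapper` and both `path_len_*` functions are hypothetical stand-ins, since the actual cli_wrapper test is not shown in this retrospective.

```rust
// Hypothetical helper standing in for whatever the real test inspects.
fn find_wrapper(name: &str) -> Option<String> {
    if name.is_empty() {
        None
    } else {
        Some(format!("/usr/local/bin/{name}"))
    }
}

// Before: the is_some() + unwrap() shape that clippy's
// `unnecessary_unwrap` lint flags.
fn path_len_before(name: &str) -> usize {
    let found = find_wrapper(name);
    if found.is_some() {
        found.unwrap().len()
    } else {
        0
    }
}

// After: `if let` binds the inner value directly, so no unwrap is needed.
fn path_len_after(name: &str) -> usize {
    if let Some(path) = find_wrapper(name) {
        path.len()
    } else {
        0
    }
}
```

Both functions behave identically; the second simply removes the panic path that the lint warns about.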

Tasks Completed Today

| Commit | Description | Impact |
| --- | --- | --- |
| e22be81 | Remove gh issue develop to prevent git config corruption | Stops subtle git config corruption in worktrees |
| a2fe9e6 | Resolve GH_TOKEN via gh auth token at startup | Agents always have auth before first API call |
| e0dbc3d | Shared TokenResolver singleton for GitHub auth | Eliminates duplicate token resolution; closes #378 |
| 56041eb | Split health check errors: auth vs network | Operators can now distinguish auth failures from connectivity |
| 461759e | Include repo name in health check warnings | Clearer diagnostics when multiple projects monitored |
| 413ff0f | Delete dead auth.rs and its dead callers | Removes ~200 lines of dead code; less confusion |
| 995cda0 | Remove unused TokenResolver import in agent.rs | Clippy hygiene |
| 42eb408 | CI: clippy --all-targets to catch test-code warnings | Catches warnings in #[cfg(test)] blocks that were silently ignored |
| c531e43 | Docs: update clippy command in AGENTS.md | Agents now run the correct clippy invocation |
| 4cea0fa | Use if-let instead of is_some+unwrap in test | Fixes clippy warning that --all-targets now surfaces |
| 421b6de | Pass ORCH_BUILD_VERSION to build-macos via needs: [check-release] | Fixes version embedding in macOS release builds |
| 5b733da | Include orch version in all serve log lines via root span | Every log line now carries the running version |
| 319d22a | Use Swatinem/rust-cache with per-target cache keys in build-macos | Faster CI builds; avoids cross-target cache pollution |
| b106abd | Use Swatinem/rust-cache in build-macos (initial) | Cache consistency across CI jobs |
| 909f663 | Pass gitleaks config via GITLEAKS_CONFIG env var | Fixes invalid --config args that silently disabled secret scanning |
| edf651e | Embed version in span name instead of field | Cleaner log output; avoids redundant field duplication |
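
To make the auth vs network split (56041eb) and the repo name in warnings (461759e) concrete, here is a minimal sketch. The `HealthError` enum, the `classify` function, and the text-matching heuristic are all assumptions for illustration; the retrospective does not show how orch actually tells the two cases apart.

```rust
use std::fmt;

// Hypothetical error type: one variant per failure class an operator
// needs to tell apart. Each variant carries the repo name.
#[derive(Debug, PartialEq)]
enum HealthError {
    Auth { repo: String, detail: String },
    Network { repo: String, detail: String },
}

impl fmt::Display for HealthError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HealthError::Auth { repo, detail } => {
                write!(f, "[{repo}] auth failure: {detail}")
            }
            HealthError::Network { repo, detail } => {
                write!(f, "[{repo}] network failure: {detail}")
            }
        }
    }
}

// Assumed heuristic: messages mentioning 401/403 or credentials are
// treated as auth problems; everything else is treated as network.
fn classify(repo: &str, stderr: &str) -> HealthError {
    let lower = stderr.to_lowercase();
    let is_auth = ["401", "403", "bad credentials", "authentication"]
        .iter()
        .any(|needle| lower.contains(*needle));
    let (repo, detail) = (repo.to_string(), stderr.trim().to_string());
    if is_auth {
        HealthError::Auth { repo, detail }
    } else {
        HealthError::Network { repo, detail }
    }
}
```

With a typed split like this, a caller can decide per variant whether to prompt for re-authentication or simply retry later.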

What Didn't Go Well

PTY Runner — Merged Broken (#416)

The PTY runner (#386) was merged and shipped, but turned out to be fundamentally flawed:

  • Runs the agent in a separate PTY outside tmux, then tries to forward output via tmux send-keys -l
  • send-keys sends keystrokes and cannot reliably stream large structured output
  • Result: agents dispatch, produce no visible output, and completion detection fails

Root cause: the design solves a non-existent problem. Tmux already provides a PTY natively. The correct approach is the legacy runner (agent runs as the tmux session shell). The workaround is runner.pty.enabled: false in config; #416 tracks the full removal.

This is the clearest "merged-but-wrong" regression of the day.

gitleaks Was Silently Disabled

The gitleaks config path was being passed through invalid --config CLI args instead of the GITLEAKS_CONFIG env var, so gitleaks silently ignored the config file and secret scanning was effectively off. This was fixed in 909f663, but it went unnoticed for multiple cycles.


Prompt Effectiveness

| Prompt | Assessment |
| --- | --- |
| prompts/agent_system.md | Good: sandbox constraints and workflow steps are explicit. |
| prompts/review_task.md | Updated in morning review (2026-03-03) to remove git fetch; no new issues. |
| prompts/route.md | No issues surfaced today. |

No prompt changes needed. The 2026-03-03 alignment fix appears to be holding.


Routing Accuracy

  • Only 3 open issues today; routing looks clean.
  • #416 (PTY runner) correctly routed to agent:claude given its complexity.
  • No misroutes observed.

Performance & Bottlenecks

  • CI speed: Swatinem/rust-cache should meaningfully reduce build times; impact visible in next run.
  • Token resolution: singleton replaces per-call gh auth token invocations — fewer subprocess spawns during high-traffic periods.
  • No lock contention or API rate-limit events observed in today's commit set.
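
The resolve-once behavior described for the token singleton can be sketched with `std::sync::OnceLock`. This is only an illustration under assumptions: the struct layout, `resolve_with`, `fetch_from_gh`, and `shared` are hypothetical names, and the real TokenResolver in orch may be structured differently.

```rust
use std::process::Command;
use std::sync::OnceLock;

// Hypothetical shape of a resolve-once token holder.
pub struct TokenResolver {
    token: OnceLock<Option<String>>,
}

impl TokenResolver {
    pub const fn new() -> Self {
        Self { token: OnceLock::new() }
    }

    // Runs `fetch` at most once; every later call returns the cached
    // result. Note that a failed lookup (None) is cached as well.
    pub fn resolve_with<F>(&self, fetch: F) -> Option<&str>
    where
        F: FnOnce() -> Option<String>,
    {
        self.token.get_or_init(fetch).as_deref()
    }

    // Convenience wrapper that shells out to `gh auth token`.
    pub fn resolve(&self) -> Option<&str> {
        self.resolve_with(fetch_from_gh)
    }
}

// Spawns `gh auth token` and returns its trimmed stdout on success.
fn fetch_from_gh() -> Option<String> {
    let out = Command::new("gh").args(["auth", "token"]).output().ok()?;
    if !out.status.success() {
        return None;
    }
    let token = String::from_utf8(out.stdout).ok()?.trim().to_string();
    if token.is_empty() { None } else { Some(token) }
}

// Process-wide instance shared by all callers.
static RESOLVER: TokenResolver = TokenResolver::new();

pub fn shared() -> &'static TokenResolver {
    &RESOLVER
}
```

Callers would use `shared().resolve()`, so only the first call pays for a subprocess spawn. Taking the fetcher as a parameter in `resolve_with` also lets tests substitute a stub for the `gh` call.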

New Issues Filed

Only 1 open issue beyond this retrospective:

| # | Title | Root Cause | Action |
| --- | --- | --- | --- |
| #416 | PTY runner broken | Fundamental design flaw: agent runs outside tmux | Remove PTY runner entirely; restore legacy runner as canonical |

No other new issues. The auth and CI fixes today were self-contained commits, not symptoms requiring separate issues.


Tomorrow's Priorities

  1. Remove PTY runner (#416) — this is the highest-priority reliability fix. Agents are currently broken unless users manually set runner.pty.enabled: false. The removal is well-scoped: delete pty.rs, promote fallback.rs to canonical runner, remove portable-pty dep.
  2. Reduce stuck detection thresholds: no_session_stuck_timeout 600s → 300s, stuck_timeout 1800s → 900s. Low risk, high value for faster recovery. There is still no open issue for this; check before filing.
  3. Verify gitleaks is now active — confirm GITLEAKS_CONFIG fix works end-to-end in next CI run.
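
The effect of the threshold reduction in priority 2 can be sketched as follows. The two field names and the second values come from the list above; `StuckConfig`, `is_stuck`, and the assumed semantics (one timeout for tasks with no tmux session, the other for tasks with a session) are illustrative guesses, not the orch implementation.

```rust
use std::time::Duration;

// Field names are taken from the retrospective; the struct is hypothetical.
struct StuckConfig {
    no_session_stuck_timeout: Duration,
    stuck_timeout: Duration,
}

impl StuckConfig {
    // Values in effect today.
    fn current() -> Self {
        Self {
            no_session_stuck_timeout: Duration::from_secs(600),
            stuck_timeout: Duration::from_secs(1800),
        }
    }

    // Values proposed for tomorrow.
    fn proposed() -> Self {
        Self {
            no_session_stuck_timeout: Duration::from_secs(300),
            stuck_timeout: Duration::from_secs(900),
        }
    }
}

// Assumed semantics: pick the timeout based on whether a session exists,
// then flag the task once it has been idle for at least that long.
fn is_stuck(cfg: &StuckConfig, has_session: bool, idle: Duration) -> bool {
    let limit = if has_session {
        cfg.stuck_timeout
    } else {
        cfg.no_session_stuck_timeout
    };
    idle >= limit
}
```

Under these assumptions, a task with no session that has been idle for 400 seconds is flagged by the proposed config but not by the current one, which is the faster recovery the change is after.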
