yoi/.yoi/tickets/00001KSKBP9YG/thread.md


Migrated

Migrated from tickets/e2e-harness.md. No legacy review file was present at migration time.


Decision

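E2E scope refinement: TUI/Panel PTY automation is also in scope for this Ticket.

Background:

  • For the recent Panel mouse selection / Panel Quit latency defects, work was judged done on the basis of focused unit tests and code-path review alone; positive validation and measured validation of the real terminal path were missing.
  • The existing body text, "we do not drive the TUI binary through a PTY," is read as intending to avoid blind fixed-input scripts and ad hoc operation as a GUI substitute. An automated PTY harness that verifies the real TUI/Panel process and real terminal input is included in this Ticket.
  • Pod protocol/subprocess E2E and TUI/Panel PTY E2E use different harness components, but both "spawn a real process and verify the user-visible boundary," so they are handled as phases of this E2E harness Ticket rather than split under separate umbrellas.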

Policy:

  • PTY scripts that rely only on fixed sleeps and fixed input are not adopted. The harness waits for structured feedback from the UI before sending input.
  • Add a test-only, opt-in observability route to the TUI/Panel. It is not a command channel that bypasses UI actions; it is a read-only probe for state observation, synchronization, and failure diagnosis.
  • Actual keyboard / mouse / Ctrl+C input is sent through the PTY. The probe emits structured events such as first_draw, panel_snapshot_ready, rows_rendered, selection_changed, actionbar_changed, background_task_started/finished/aborted, quit_requested, terminal_cleanup_started/finished, and exit, as JSONL or similar.
  • Mouse E2E waits for the row key and screen rect in rows_rendered, sends an SGR mouse sequence to the PTY, and confirms selection_changed plus the change in screen/actionbar/detail.
  • Quit latency E2E waits for barrier events such as panel_ready / background work pending, then sends Ctrl+C / Ctrl+D and measures the elapsed time from quit_requested to exit. Events also confirm that nonessential background work is aborted/dropped and that terminal cleanup runs.
  • Screen output is parsed with a terminal parser such as vt100/vte and saved as a secondary oracle / artifact. Primary synchronization relies on structured events.
  • The test probe is enabled only through an explicit hidden/dev/test flag such as --tui-test-events <path>, or a configuration under an e2e feature, and must not affect normal execution, model context, Ticket authority, or the Pod protocol.
  • Failure artifacts to save: event JSONL, input log, screen dump, stdout/stderr, the relevant tree of the runtime/data/workspace tmpdir, and a timing summary.
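A minimal sketch of the coordinate-to-bytes step in the mouse item above: turning a rendered row rect into the SGR mouse bytes written to the PTY. RowRect and sgr_left_click are hypothetical names, and the rect layout (0-based cells with width/height) is an assumption; the thread only says rows_rendered carries a row key and a screen rect. The SGR encoding itself (ESC [ < button ; column ; row, then M for press and m for release, 1-based coordinates) is the standard xterm form.

```rust
/// Screen rectangle of a rendered row in 0-based terminal cells.
/// Hypothetical shape: the thread only says `rows_rendered` carries a rect.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RowRect {
    pub x: u16,
    pub y: u16,
    pub width: u16,
    pub height: u16,
}

/// Encode a left-button press followed by a release at the centre of `rect`
/// as SGR (mode 1006) mouse reports. SGR coordinates are 1-based.
/// Returns `None` for an empty rect, which has no cell to click.
pub fn sgr_left_click(rect: RowRect) -> Option<Vec<u8>> {
    if rect.width == 0 || rect.height == 0 {
        return None;
    }
    // Widen to u32 so the arithmetic cannot overflow near u16::MAX.
    let col = u32::from(rect.x) + u32::from(rect.width) / 2 + 1;
    let row = u32::from(rect.y) + u32::from(rect.height) / 2 + 1;
    let press = format!("\x1b[<0;{col};{row}M"); // 'M' = button pressed
    let release = format!("\x1b[<0;{col};{row}m"); // 'm' = button released
    Some([press.into_bytes(), release.into_bytes()].concat())
}
```

These bytes only reach the application as mouse events if it has enabled mouse reporting, so a helper like this is one part of a click, not a sufficient test on its own.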

Proposed additions to the acceptance criteria:

  • With cargo test -p e2e --features e2e (or an equivalent opt-in command), a harness runs that launches the real yoi panel on a PTY and waits for structured probe feedback before sending input.
  • Panel row click E2E: uses the rendered row rect to send an SGR mouse click and asserts that the selected row changes.
  • Panel quit latency E2E: sends the Quit input after the ready / pending-background-work barrier and asserts that exit latency is within the threshold and that nonessential background work does not block quit.
  • Tests that depend only on fixed sleeps are not acceptable. If the ready/barrier event does not arrive, the test fails with the screen dump and event log saved as artifacts.
  • A reviewer confirms that the probe is read-only observability and does not bypass the input/action path.
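The "wait for a barrier event, otherwise fail with diagnostics" rule can be sketched with a bounded wait over a channel of probe events. This assumes a reader thread that tails the event JSONL and forwards parsed events; ProbeEvent, WaitError, and wait_for here are hypothetical names, not the harness's actual types.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// One probe event; `line` keeps the raw JSONL text for artifacts.
#[derive(Debug, Clone, PartialEq)]
pub struct ProbeEvent {
    pub name: String,
    pub line: String,
}

#[derive(Debug)]
pub enum WaitError {
    /// The deadline passed; `last_seen` lists recent non-matching events.
    Timeout { wanted: String, last_seen: Vec<String> },
    /// The probe stream closed (for example, the process exited).
    ProbeClosed { wanted: String, last_seen: Vec<String> },
}

/// Block until an event named `wanted` arrives or `timeout` elapses.
/// There is no fixed sleep: the wait ends as soon as the event is seen.
pub fn wait_for(
    rx: &Receiver<ProbeEvent>,
    wanted: &str,
    timeout: Duration,
) -> Result<ProbeEvent, WaitError> {
    let deadline = Instant::now() + timeout;
    let mut recent: VecDeque<String> = VecDeque::new();
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(event) if event.name == wanted => return Ok(event),
            Ok(event) => {
                // Keep a short tail of other events for the failure report.
                if recent.len() == 8 {
                    recent.pop_front();
                }
                recent.push_back(event.name);
            }
            Err(RecvTimeoutError::Timeout) => {
                return Err(WaitError::Timeout {
                    wanted: wanted.to_string(),
                    last_seen: recent.into_iter().collect(),
                })
            }
            Err(RecvTimeoutError::Disconnected) => {
                return Err(WaitError::ProbeClosed {
                    wanted: wanted.to_string(),
                    last_seen: recent.into_iter().collect(),
                })
            }
        }
    }
}
```

On either error the caller would write the screen dump and event log to the artifact directory before failing the test.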

Decision

E2E design decision: assume a Playwright-like declarative test API and no contamination of the production binary.

Decision:

  • E2E is structured so that scenarios can be written declaratively from an independent Rust crate, not as ad hoc shell / fixed-sleep scripts.
  • Example: provide a Playwright-style wait/action/assertion API such as PanelHarness::spawn(...), panel.wait_for(PanelReady), panel.click(row("ticket", id)), panel.expect_selection(...), panel.press(CtrlC), panel.expect_exit_within(...).
  • The harness crate is independent of the production binary / normal library API. The expected location is tests/e2e/ or crates/e2e_harness + integration tests; test harness logic is not mixed into normal builds, release packages, or the normal yoi binary.
  • Anything that must live in the production binary is limited, in principle, to "a minimal test hook that emits read-only diagnostic events from existing TUI state." That hook is disabled in the normal runtime and enabled only through an explicit feature / hidden dev flag / cfg(test/e2e) or similar.
  • The E2E harness does not call production-code internals directly to mutate state. Input goes through the PTY, observation comes from structured test events / the terminal screen parser, and assertions run on the harness side.
  • Structured events are treated as test observability artifacts, not protocol authority. They do not change Ticket/Pod authority or user-visible semantics.
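One possible shape for the wait/action/assertion style, sketched as scenario steps run against a driver. This is an illustration under assumptions, not the PanelHarness API: Scenario, Step, Driver, and RecordingDriver are hypothetical, and the real driver would own the PTY and the event stream. The point shown is that a failure names the step that failed, which is what a failure artifact would record.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Key {
    CtrlC,
    CtrlD,
}

/// One declarative step: something to wait for, do, or assert.
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    WaitFor(&'static str),
    Click { kind: &'static str, id: String },
    ExpectSelection { kind: &'static str, id: String },
    Press(Key),
    ExpectExitWithin(Duration),
}

/// Whatever executes a step against the real process (PTY + probe events).
pub trait Driver {
    fn perform(&mut self, step: &Step) -> Result<(), String>;
}

#[derive(Debug, Default)]
pub struct Scenario {
    steps: Vec<Step>,
}

impl Scenario {
    pub fn new() -> Self {
        Self::default()
    }
    fn then(mut self, step: Step) -> Self {
        self.steps.push(step);
        self
    }
    pub fn wait_for(self, event: &'static str) -> Self {
        self.then(Step::WaitFor(event))
    }
    pub fn click(self, kind: &'static str, id: &str) -> Self {
        self.then(Step::Click { kind, id: id.to_string() })
    }
    pub fn expect_selection(self, kind: &'static str, id: &str) -> Self {
        self.then(Step::ExpectSelection { kind, id: id.to_string() })
    }
    pub fn press(self, key: Key) -> Self {
        self.then(Step::Press(key))
    }
    pub fn expect_exit_within(self, limit: Duration) -> Self {
        self.then(Step::ExpectExitWithin(limit))
    }
    /// Run steps in order; the error names the failing step for artifacts.
    pub fn run(&self, driver: &mut dyn Driver) -> Result<(), String> {
        for (index, step) in self.steps.iter().enumerate() {
            driver
                .perform(step)
                .map_err(|err| format!("step {index} {step:?}: {err}"))?;
        }
        Ok(())
    }
}

/// Stand-in driver for illustration only: records steps, can fail one.
#[derive(Debug, Default)]
pub struct RecordingDriver {
    pub seen: Vec<Step>,
    pub fail_at: Option<usize>,
}

impl Driver for RecordingDriver {
    fn perform(&mut self, step: &Step) -> Result<(), String> {
        if self.fail_at == Some(self.seen.len()) {
            return Err("timed out".to_string());
        }
        self.seen.push(step.clone());
        Ok(())
    }
}
```

A quit scenario in this style would read Scenario::new().wait_for("panel_ready").press(Key::CtrlC).expect_exit_within(limit), with the limit chosen by the test author.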

Rationale:

  • The recent Panel mouse / Quit latency failures showed that unit/focused tests and code-path review alone cannot guarantee user-visible terminal behavior.
  • On the other hand, fixed sleep + input scripts have poor reproducibility and diagnosability, and cannot confirm the ready state or a background work barrier.
  • With a Playwright-like API, a test declaratively expresses "what to wait for, what to input, and what to observe," and can leave event log / screen dump / timing artifacts on failure.
  • Avoiding contamination of the production binary keeps release behavior / binary size / authority surface / model-visible surfaces clean.

Acceptance refinement:

  • E2E test authors can write Panel/TUI scenarios with wait_for / expect / within instead of fixed sleeps.
  • The mouse selection and Quit latency regressions are expressed as scenarios on this declarative harness API.
  • A reviewer confirms that the test-only observability route is opt-in and is disabled or unreachable in release/normal execution.
  • Failure artifacts include the scenario step, last observed events, screen snapshot, timing, binary path, and workspace/runtime dirs.

Decision

Routing decision: implementation_ready

Reason:

  • The user stated explicitly that the E2E harness is to be handled as one Ticket, proceeding on the basis of a Playwright-like declarative API, structured feedback, and no production binary contamination.
  • The Ticket body contains old names / old structure, but the thread decisions have made the current binding direction clear: Pod subprocess/protocol E2E and TUI/Panel PTY E2E are handled as phases of the same harness Ticket.
  • Given the recent Panel mouse selection / Panel Quit latency regressions, the minimal slice must include a real process, a real PTY, structured event feedback, and failure artifacts.
  • TicketRelationQuery shows no durable blocker; related Tickets are context links only.
  • The Orchestrator worktree is clean. Implementation side effects happen in a dedicated child worktree after state acceptance.

Evidence checked:

  • Ticket body / thread decisions.
  • relation records: related links to 00001KV072V89 / 00001KV0723PC.
  • orchestration plan records: none.
  • current workspace state: Orchestrator worktree clean, no queued/inprogress work, no implementation child Pods.
  • project context: AGENTS guidance noting that E2E is not yet designed, the prompt/resource boundary, the policy of avoiding production binary contamination, and the recent Panel validation failure records.

IntentPacket:

Intent:

  • Introduce Yoi's E2E testing foundation as an opt-in harness that handles both real process spawning and TUI/Panel PTY automation.
  • The first vertical slice should be shaped so that the Playwright-like declarative API, structured UI feedback, failure artifacts, and the Panel mouse selection / Panel quit latency regression scenarios can be implemented.

Binding decisions / invariants:

  • The E2E harness is an independent crate / test surface; harness logic is not mixed into the normal release / normal yoi binary.
  • Changes needed on the production binary side are limited to an opt-in read-only observability hook. The test hook must not bypass UI actions or state mutation.
  • Real input is sent through the PTY. Structured events are observations for synchronization / assertion / artifacts, not an authority channel.
  • Blind scripts made only of fixed sleeps and fixed input do not count toward acceptance.
  • Pod/Ticket authority, the prompt/resource boundary, and public runtime behavior are not distorted for the convenience of E2E.

Requirements / acceptance criteria:

  • E2E authors can write scenarios declaratively in Rust code using spawn / wait_for / click / press / expect_* / within.
  • An opt-in command (e.g. cargo test -p e2e --features e2e, or an equivalent) keeps the suite separate from the default CI run.
  • TUI/Panel tests wait for structured feedback such as panel ready / rows rendered / selection changed / background task / quit events before sending PTY input.
  • At least a skeleton or a minimal passing scenario for the Panel mouse selection regression and the Panel quit latency regression is expressed on the declarative harness.
  • Failure artifacts remain: event log, input log, screen dump, timing, binary path, and workspace/runtime dirs.
  • A reviewer can confirm that there is no production binary contamination, or that the opt-in hook is disabled/unreachable in the normal runtime.

Implementation latitude:

  • The Coder may choose between a tests/e2e/ crate and crates/e2e_harness + integration tests after checking codebase constraints, but must avoid normal build/release contamination.
  • The Coder may choose the concrete design of the PTY crate, terminal parser, event JSONL format, and fixture workspace builder.
  • The first slice may prioritize the Panel/TUI harness and a minimal process lifecycle / artifact foundation over full provider E2E.
  • Existing old-name INSOMNIA_* / pod references may be tidied to match the current yoi / config surface.

Escalate if:

  • The read-only observability hook proves insufficient and there is a pull toward driving the production UI action path directly through a test-only command channel.
  • Test-only options would need to be exposed on the normal release binary / normal CLI surface.
  • Large changes to the workspace structure, Cargo package layout, or Nix/package source filter become necessary.
  • The Panel slice cannot progress without simultaneously expanding to Provider stub / Pod protocol E2E.

Validation:

  • focused E2E harness tests / example scenarios.
  • cargo fmt --check
  • git diff --check
  • cargo check --workspace --all-targets or narrower package checks, depending on the scope of the change.
  • Report that the new E2E command can be run opt-in.

Current code map:

  • crates/yoi / CLI launch path: candidate for hidden/test-only flag injection.
  • crates/tui/src/multi_pod.rs: candidate for Panel events / observable state emission.
  • tests/e2e/ or new harness crate: declarative scenario API / PTY runner / artifact collector.
  • root Cargo.toml / package metadata: opt-in package registration and release contamination check.

Critical risks / reviewer focus:

  • Harness code is not mixed into the production binary.
  • The observability hook is read-only and does not bypass the input/action path.
  • Tests do not depend on fixed sleeps and instead have structured feedback / timeouts / artifacts.
  • Coverage of the user-visible path is sufficient that Panel mouse / quit latency regressions will not again be called "done on unit tests alone."

Intake summary

Following user confirmation, the existing E2E harness Ticket treats Pod subprocess E2E and TUI/Panel PTY E2E as a single implementation target. The acceptance direction has been clarified: Playwright-like declarative API, independent opt-in crate, no production binary contamination, read-only structured observability, PTY input, failure artifacts, and Panel mouse / quit latency regression scenarios.


State changed

Ticket planning is complete. state planning -> ready.


State changed

yoi ticket moved the Ticket to queued.


State changed

The user explicitly approved moving the Ticket to inprogress and proceeding. Ticket evidence / relation records / the accepted OrchestrationPlan plan / the clean Orchestrator worktree state have been checked, and there is no blocking relation. The state is recorded as inprogress before any implementation side effects.


Implementation report

Implementation handoff started.

  • Worktree: /home/hare/Projects/yoi/.worktree/e2e-harness
  • Branch: ticket-00001KSKBP9YG-e2e-harness
  • Coder Pod: coder-00001KSKBP9YG-e2e
  • Scope: child worktree read; root Cargo.toml / Cargo.lock write; tests/e2e write; selected crates (crates/tui, crates/yoi, crates/pod, crates/protocol) write; target write; this Ticket record write. .yoi/memory and local/runtime/log/lock/secret-like .yoi paths are not in the write scope.
  • Note: a non-recursive read grant on /home/hare/Projects/yoi was added for runtime launch validation, but the Coder has been told explicitly not to inspect/write/git/validate/merge/cleanup the root/original workspace.

Next:

  • The Coder implements the first slice: Playwright-like declarative E2E harness, read-only structured observability, PTY input, and the Panel mouse / quit latency regression scenarios.
  • After checking the Coder's commit / implementation_report / validation evidence, launch the Reviewer, read-only by default.

Decision

Design note: for Panel mouse E2E, sending a fixed raw SGR sequence is not sufficient on its own.

Rationale:

  • If the harness writes ESC [ < ... M directly to the PTY, it bypasses the condition that a real terminal generates mouse sequences only while mouse capture is enabled, and can miss defects of the "no events arrive on a real terminal" kind, like the recent one.
  • Mouse E2E must not only use the locator/row rect to choose coordinates, but also verify that the TUI enabled mouse capture mode.

Test harness requirements to add:

  • The virtual terminal / screen parser tracks terminal mode sequences in app output (e.g. normal mouse + SGR mouse mode, the equivalent of ?1000h / ?1006h), and the click helper fails if capture enabled has not been observed.
  • Alternatively, confirm mouse_capture_enabled through both an explicit read-only structured event and a terminal output assertion.
  • click(row_selector) chooses coordinates from the rows_rendered rect, but raw SGR injection alone does not count as success.
  • Assertions combine the selection_changed event, the screen/actionbar/detail secondary oracle, and confirmation that no workflow action fired.
  • This makes mouse selection E2E more laborious, but lets capture setup / coordinate mapping / event handling on the user-visible path be verified separately.
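A minimal sketch of the mode tracking described above, assuming the harness feeds every chunk read from the PTY into a small tracker. MouseCaptureTracker is a hypothetical name. It recognizes DEC private mode set/reset sequences (ESC [ ? params h / l), including several modes in one sequence, and carries an unfinished sequence across reads.

```rust
/// Tracks whether the app has enabled mouse reporting, based on PTY output.
#[derive(Debug, Default)]
pub struct MouseCaptureTracker {
    normal_tracking: bool, // DEC private mode 1000
    sgr_mode: bool,        // DEC private mode 1006
    pending: Vec<u8>,      // unfinished escape sequence from the last chunk
}

impl MouseCaptureTracker {
    /// Feed one chunk of raw PTY output; chunks may split a sequence.
    pub fn feed(&mut self, chunk: &[u8]) {
        self.pending.extend_from_slice(chunk);
        let buf = std::mem::take(&mut self.pending);
        let prefix: &[u8] = b"\x1b[?";
        let mut i = 0;
        while i < buf.len() {
            let rest = &buf[i..];
            if rest.len() < prefix.len() && prefix.starts_with(rest) {
                // Possible start of a sequence: wait for more bytes.
                self.pending = rest.to_vec();
                return;
            }
            if !rest.starts_with(prefix) {
                i += 1;
                continue;
            }
            let params = &rest[prefix.len()..];
            let end = params
                .iter()
                .position(|b| !(b.is_ascii_digit() || *b == b';'));
            let Some(end) = end else {
                // Parameters not terminated yet: keep for the next chunk.
                self.pending = rest.to_vec();
                return;
            };
            let final_byte = params[end];
            if final_byte == b'h' || final_byte == b'l' {
                let enable = final_byte == b'h';
                for param in params[..end].split(|b| *b == b';') {
                    if param == b"1000" {
                        self.normal_tracking = enable;
                    } else if param == b"1006" {
                        self.sgr_mode = enable;
                    }
                }
            }
            i += prefix.len() + end + 1;
        }
    }

    /// True only while both normal tracking and SGR mode are enabled.
    pub fn capture_enabled(&self) -> bool {
        self.normal_tracking && self.sgr_mode
    }
}
```

A click helper would check capture_enabled() first and fail if it is false, so that raw SGR injection cannot pass against an app that never asked for mouse events.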

Decision

Coder first-slice review: changes requested before external review.

The initial E2E harness direction is promising, but it does not yet satisfy the regression-prevention bar from this Ticket.

Required changes sent to Coder:

  1. Mouse E2E must not pass solely by injecting raw SGR mouse bytes. Harness must track terminal mouse capture enable output (?1000h / ?1006h equivalent) and fail click(...) / expose expect_mouse_capture_enabled() when capture was not observed.
  2. Production binary contamination must be reduced. The TUI JSONL observer currently appears to be available in normal builds via YOI_TUI_TEST_EVENTS; prefer explicit e2e-test feature/cfg gating so release/normal binary has the hook compiled out. Report if feature gating is too large.
  3. Quit latency E2E must wait for a real pending/background-work barrier at the moment of quit, not merely assert that background_task_started happened sometime earlier. Strengthen the scenario to prove Ctrl+C while pending work exists exits promptly.
  4. Update implementation_report and validation evidence after changes.

Reason:

  • The mouse selection regression specifically needs to catch missing terminal mouse capture, which raw SGR injection can bypass.
  • The user explicitly requested a structure where E2E harness logic does not mix into the production binary.
  • The quit latency regression needs measured user-visible behavior under a synchronized pending-work condition, not a loose startup smoke test.

Implementation report

Implemented an opt-in E2E testing foundation for real yoi panel process automation.

API / harness shape:

  • Added workspace package tests/e2e (yoi-e2e) with required feature e2e, so scenarios are opt-in.
  • Added PanelHarness::spawn(...), wait_for(...), wait_for_rows(...), click(...), press(...), expect_selection(...), expect_exit_within(...), and artifacts().
  • Harness drives the real yoi panel binary through a Unix PTY and sends SGR mouse / keyboard bytes through that PTY only.
  • Harness captures artifacts under a per-run artifact directory: events.jsonl, input.log, pty-output.log, and run.json.

Production / non-production boundary:

  • Harness logic stays in tests/e2e and is not mixed into production crates.
  • Production-side change is limited to an opt-in read-only TUI JSONL observer enabled only by YOI_TUI_TEST_EVENTS.
  • Observer records Panel/TUI synchronization and assertion events (panel_ready, rows_rendered, selection_changed, mouse_click, action_requested, quit_requested, background task lifecycle, terminal cleanup, exit).
  • The observer does not mutate UI state, inject input, bypass actions, or grant authority; real input remains PTY-only.

Scenarios added:

  • Panel mouse selection regression: waits for rendered rows, sends an SGR mouse click through PTY, asserts selection changed, and asserts no panel action was dispatched.
  • Panel quit latency regression: waits for Panel ready plus background-task barrier, sends Ctrl+C through PTY, asserts clean exit within the threshold, and verifies the quit event.

Files changed:

  • Cargo.toml, Cargo.lock
  • crates/tui/src/lib.rs
  • crates/tui/src/multi_pod.rs
  • crates/tui/src/e2e_observer.rs
  • tests/e2e/Cargo.toml
  • tests/e2e/src/lib.rs
  • tests/e2e/tests/panel.rs

Validation:

  • cargo build -p yoi — passed.
  • YOI_E2E_BIN=/home/hare/Projects/yoi/.worktree/e2e-harness/target/debug/yoi cargo test -p yoi-e2e --features e2e --test panel -- --nocapture — passed (2 tests).
  • cargo fmt --check — passed.
  • cargo check -p tui --all-targets — passed.
  • cargo check -p yoi --all-targets — passed.
  • cargo check -p yoi-e2e --all-targets --features e2e — passed.
  • git diff --check — passed.

Remaining gaps / risks:

  • The first slice is Unix PTY-based; cross-platform PTY support is not implemented.
  • The screen artifact is currently raw PTY output rather than a parsed terminal snapshot.
  • Fixture setup blocks Companion/Orchestrator role auto-spawn by placing intentionally invalid isolated test pod metadata in the fixture data dir; this avoids leaking live background Pods but should be replaced by a cleaner fixture strategy if the Panel gains an explicit test fixture mode.

Implementation report

Follow-up implementation update for review-required changes:

Mouse E2E false-positive prevention:

  • PanelHarness now tracks the PTY output for terminal mouse mode enable/disable escape sequences.
  • It requires both normal mouse tracking (ESC[?1000h, with disable ESC[?1000l) and SGR mouse mode (ESC[?1006h, with disable ESC[?1006l) to have been observed as currently enabled.
  • Added PanelHarness::expect_mouse_capture_enabled() and the Panel mouse E2E calls it before clicking.
  • PanelHarness::click(...) also fails with MouseCaptureNotEnabled if mouse capture has not been observed, so raw SGR mouse injection alone cannot make the test pass.

Production binary contamination reduction:

  • Added explicit feature gates:
    • tui/e2e-test
    • yoi/e2e-test = ["tui/e2e-test"]
  • The YOI_TUI_TEST_EVENTS JSONL event path and the background-task hold seam are compiled only under tui/e2e-test.
  • Normal builds still type-check a no-op facade so call sites stay simple, but env-var observability/hold behavior is compiled out unless the feature is enabled.
  • E2E binary build command is now: cargo build -p yoi --features e2e-test.

Quit latency barrier strengthening:

  • Added a feature-gated test-only seam YOI_TUI_TEST_HOLD_BACKGROUND_TASK=reload that holds the Panel reload task pending in the E2E binary until quit aborts it.
  • Added PanelHarness::expect_background_task_pending("reload"), which verifies that a background_task_started event exists and no matching background_task_finished or background_task_aborted event has superseded it.
  • The quit latency test now uses the hold seam and asserts pending reload work immediately before sending Ctrl+C through the PTY.
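The pending check described above can be read as a fold over the event log: a task is pending if its most recent lifecycle event is a start. The sketch below assumes events have already been parsed from events.jsonl into a simple struct; TaskEvent, TaskEventKind, and is_task_pending are hypothetical names, not the harness's own.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TaskEventKind {
    Started,
    Finished,
    Aborted,
}

/// A background-task lifecycle event, in the order it was observed.
#[derive(Debug, Clone, PartialEq)]
pub struct TaskEvent {
    pub kind: TaskEventKind,
    pub task: String,
}

/// True if `task` has a start that no later finish/abort has superseded.
pub fn is_task_pending(events: &[TaskEvent], task: &str) -> bool {
    let mut pending = false;
    for event in events.iter().filter(|event| event.task == task) {
        // The latest matching event decides: only a start leaves it pending.
        pending = event.kind == TaskEventKind::Started;
    }
    pending
}
```

Checking this immediately before sending Ctrl+C is what distinguishes "quit while work is pending" from "work started at some earlier point."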

Harness/artifact adjustment:

  • E2E fixture artifacts now persist under workspace target/e2e-artifacts/... instead of package-local temporary dirs, so failure artifacts remain inspectable.

Validation after update:

  • cargo fmt --check — passed.
  • cargo build -p yoi --features e2e-test — passed.
  • YOI_E2E_BIN=/home/hare/Projects/yoi/.worktree/e2e-harness/target/debug/yoi cargo test -p yoi-e2e --features e2e --test panel -- --nocapture — passed (2 tests).
  • git diff --check — passed.
  • cargo check -p tui --all-targets — passed.
  • cargo check -p yoi --all-targets — passed.
  • cargo check -p tui --all-targets --features e2e-test — passed.
  • cargo check -p yoi --all-targets --features e2e-test — passed.
  • cargo check -p yoi-e2e --all-targets --features e2e — passed.

Remaining gaps / risks:

  • The E2E harness remains Unix PTY based.
  • The screen artifact remains raw PTY output, not a parsed terminal snapshot.
  • The test-only pending-task hold intentionally changes timing only for the e2e-test binary; it is not compiled into normal builds.

Review: request changes

Request changes.

Evidence reviewed:

  • Inspected Ticket record and git diff 134e8b8b..HEAD for commits 96561897 and 10a1c383.
  • tests/e2e provides a credible first declarative harness (PanelHarness::spawn, wait_for, wait_for_rows, click, press, expect_selection, expect_exit_within, artifacts/metadata/input/output/event logs). This is not merely a fixed-sleep shell script.
  • Mouse-selection scenario waits for rendered rows, verifies both normal mouse and SGR mouse capture before click, sends the click through PTY bytes, waits for selection_changed, and asserts no action_requested dispatch.
  • Quit-latency scenario creates a real feature-gated background-task hold barrier, waits until the task is actually waiting before sending Ctrl+C through the PTY, and measures bounded exit latency.
  • yoi-e2e is opt-in via package feature/test required-features = ["e2e"]; e2e tests are outside default members. YOI_TUI_TEST_EVENTS and YOI_TUI_TEST_HOLD_BACKGROUND_TASK env behavior is behind tui/e2e-test / yoi/e2e-test feature gates, and the hook is observability-only.

Required change:

  • The normal production build still contains/evaluates too much e2e harness glue. In non-e2e-test builds, crates/tui/src/e2e_observer.rs exposes no-op emit/hold functions, but call sites still execute test-specific data construction. In particular App::emit_rows_rendered and its panel row key/rect DTOs are compiled unconditionally and app.emit_rows_rendered() is called from the panel render path, causing row snapshots to be built every draw even though emission is a no-op. Selection/action/quit call sites also construct serde_json::json! payloads before the no-op facade. This violates the recorded boundary that production binaries should not contain harness logic and production-side hooks must be feature-gated/compiled out for normal builds.
    • Please cfg-gate the call sites/helpers/DTOs, or use a lazy cfg-gated macro/helper so normal builds do not evaluate or retain e2e event payload construction. A tiny compile-only facade is acceptable only if it does not execute or allocate e2e-specific work and does not keep harness DTO logic in the normal runtime path.
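One way to read the "lazy cfg-gated macro" option, as a sketch rather than the fix that was applied: the macro expands to an empty block unless the feature is on, so the payload expression is never compiled or evaluated in normal builds. e2e_emit!, the e2e_observer::emit signature, and draw_frames are hypothetical and only illustrate the gating.

```rust
/// Emit a probe event only when the `e2e-test` feature is compiled in.
/// Without the feature the inner block is stripped, so `$payload`
/// is neither type-checked nor evaluated.
macro_rules! e2e_emit {
    ($name:expr, $payload:expr) => {{
        #[cfg(feature = "e2e-test")]
        {
            e2e_observer::emit($name, $payload);
        }
    }};
}

/// Hypothetical observer; exists only in e2e-test builds.
#[cfg(feature = "e2e-test")]
mod e2e_observer {
    pub fn emit(name: &str, payload: String) {
        eprintln!("{{\"event\":\"{name}\",\"payload\":{payload}}}");
    }
}

/// Simulates a render loop and reports how many payloads were built.
/// In a normal build the count stays 0; with the feature it equals `frames`.
#[allow(unused_mut, unused_variables)]
pub fn draw_frames(frames: usize) -> usize {
    let mut snapshots_built = 0;
    for frame in 0..frames {
        e2e_emit!("rows_rendered", {
            snapshots_built += 1; // stands in for building the row snapshot
            format!("{{\"frame\":{frame}}}")
        });
    }
    snapshots_built
}
```

Compared with a no-op function facade, the argument expression here disappears at compile time instead of being built and then discarded.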

Validation run in /home/hare/Projects/yoi/.worktree/e2e-harness:

  • git diff --check 134e8b8b..HEAD — passed.
  • cargo fmt --check — passed.
  • cargo check -p tui --all-targets — passed.
  • cargo check -p yoi --all-targets — passed.
  • cargo build -p yoi --features e2e-test — passed.
  • YOI_E2E_BIN=/home/hare/Projects/yoi/.worktree/e2e-harness/target/debug/yoi cargo test -p yoi-e2e --features e2e --test panel -- --nocapture — passed.
  • cargo check -p tui --all-targets --features e2e-test — passed.
  • cargo check -p yoi --all-targets --features e2e-test — passed.
  • cargo check -p yoi-e2e --all-targets --features e2e — passed.

No source changes were made during review.


Implementation report

Follow-up implementation update for reviewer request 559adb9a ticket: request e2e harness changes:

Production/non-production boundary fix:

  • Removed the normal-build no-op e2e_observer module path and now compile the observer module only behind tui/e2e-test via #[cfg(feature = "e2e-test")] mod e2e_observer;.
  • Added #[cfg(feature = "e2e-test")] to all Panel/TUI observability call sites that build JSON payloads, including terminal cleanup/exit, panel ready, rows-rendered, mouse click, selection changes, action requests, quit request, and background task lifecycle events.
  • App::emit_rows_rendered, PanelE2eRowKey, PanelE2eRect, PanelE2eRenderedRow, PanelE2eRowsRendered, and the conversion helpers are now compiled only with tui/e2e-test.
  • Normal builds no longer call app.emit_rows_rendered(), no longer evaluate serde_json::json! e2e payloads, and no longer retain the Panel E2E DTO/helper logic in the runtime path.
  • The background reload hold seam remains compiled/called only under tui/e2e-test; YOI_TUI_TEST_EVENTS and YOI_TUI_TEST_HOLD_BACKGROUND_TASK behavior remains feature-gated.

Preserved E2E behavior:

  • Mouse E2E still verifies PTY output for normal mouse tracking + SGR mouse enable sequences before any raw SGR click can be sent.
  • PanelHarness::click(...) still fails if mouse capture was not observed.
  • Quit latency E2E still uses the feature-gated pending reload hold barrier and asserts the reload task is pending before Ctrl+C.

Validation:

  • cargo fmt --check — passed.
  • cargo check -p tui --all-targets — passed.
  • cargo check -p yoi --all-targets — passed.
  • cargo check -p tui --all-targets --features e2e-test — passed.
  • cargo check -p yoi --all-targets --features e2e-test — passed.
  • cargo build -p yoi --features e2e-test — passed.
  • YOI_E2E_BIN=/home/hare/Projects/yoi/.worktree/e2e-harness/target/debug/yoi cargo test -p yoi-e2e --features e2e --test panel -- --nocapture — passed (2 tests).
  • cargo check -p yoi-e2e --all-targets --features e2e — passed.
  • git diff --check — passed.

Remaining gaps / risks unchanged:

  • The E2E harness remains Unix PTY based.
  • The screen artifact remains raw PTY output rather than a parsed terminal snapshot.

Review: approve

Approve.

Delta reviewed:

  • Re-reviewed the fix commit b30b43b9 test: cfg-gate e2e observer payloads after the earlier request-changes review.
  • Inspected the updated observer module boundary and call sites in crates/tui/src/lib.rs and crates/tui/src/multi_pod.rs, plus the unchanged harness/tests in tests/e2e.

Evidence:

  • e2e_observer is now only compiled from crates/tui/src/lib.rs under #[cfg(feature = "e2e-test")]; the previous normal-build no-op facade was removed.
  • Observer payload construction is gated at call sites with #[cfg(feature = "e2e-test")], including panel_ready, selection_changed, action_requested, quit_requested, and emit_rows_rendered calls.
  • Panel E2E DTOs/helpers (PanelE2eRowKey, PanelE2eRect, PanelE2eRenderedRow, PanelE2eRowsRendered, App::emit_rows_rendered) are now behind #[cfg(feature = "e2e-test")], so the normal panel render path no longer builds row snapshots or retains that runtime helper path.
  • The background-task hold seam is still feature-gated: check_background_task_hold and release_background_task_hold calls are under #[cfg(feature = "e2e-test")], and YOI_TUI_TEST_HOLD_BACKGROUND_TASK behavior lives in the gated observer module.
  • Mouse capture tracking remains intact in the harness: it tracks ?1000h and ?1006h, click(...) requires both capture modes before injecting PTY bytes, the test waits for rendered rows, asserts selection_changed, and asserts no action_requested dispatch.
  • Quit-latency coverage remains intact: the test waits for panel_ready, then verifies an actual pending reload background-task barrier before sending Ctrl+C through the PTY and asserting bounded exit.
  • The production/non-production boundary now satisfies the Ticket intent: the harness remains opt-in, observability is read-only and feature-gated, and no UI input/action path is bypassed.

Validation run in /home/hare/Projects/yoi/.worktree/e2e-harness:

  • git diff --check 134e8b8b..HEAD — passed.
  • cargo fmt --check — passed.
  • cargo check -p tui --all-targets — passed.
  • cargo check -p yoi --all-targets — passed.
  • cargo check -p tui --all-targets --features e2e-test — passed.
  • cargo check -p yoi --all-targets --features e2e-test — passed.
  • cargo build -p yoi --features e2e-test — passed.
  • YOI_E2E_BIN=/home/hare/Projects/yoi/.worktree/e2e-harness/target/debug/yoi cargo test -p yoi-e2e --features e2e --test panel -- --nocapture — passed; 2 tests passed.
  • cargo check -p yoi-e2e --all-targets --features e2e — passed.

No source changes were made during re-review.


Implementation report

Integration outcome: merged and validated.

  • Implementation branch: ticket-00001KSKBP9YG-e2e-harness
  • Implementation commits: 96561897 test: add opt-in panel e2e harness, 10a1c383 test: harden panel e2e harness, b30b43b9 test: cfg-gate e2e observer payloads
  • Review commits/events: initial request_changes for production-boundary leakage, then approve after e2e-test cfg gating.
  • Orchestrator merge commit: b3bd6b11 merge: e2e harness

Implemented first vertical slice:

  • Added opt-in tests/e2e package yoi-e2e with Playwright-like PanelHarness API for real yoi panel process automation through PTY.
  • Added feature-gated read-only TUI observer behind tui/e2e-test / yoi/e2e-test; normal builds do not compile observer module, event payload construction, row DTOs, or background hold seam.
  • Added Panel mouse selection E2E that waits for rendered rows, verifies terminal mouse capture output (?1000h and ?1006h), sends click through PTY, asserts selection change, and asserts no action dispatch.
  • Added Panel quit latency E2E that creates a feature-gated pending reload barrier, sends Ctrl+C through PTY, and asserts bounded exit.
  • Artifacts include event log, input log, raw PTY output, and run metadata under target/e2e-artifacts.

Orchestrator validation after merge:

  • cargo fmt --check: PASS
  • cargo check -p tui --all-targets: PASS
  • cargo check -p yoi --all-targets: PASS
  • cargo check -p tui --all-targets --features e2e-test: PASS
  • cargo check -p yoi --all-targets --features e2e-test: PASS
  • cargo build -p yoi --features e2e-test: PASS
  • YOI_E2E_BIN=/home/hare/Projects/yoi/.worktree/orchestration/yoi-orchestrator/target/debug/yoi cargo test -p yoi-e2e --features e2e --test panel -- --nocapture: PASS (2 tests)
  • cargo check -p yoi-e2e --all-targets --features e2e: PASS
  • git diff --check: PASS

Remaining gaps / risks:

  • Harness is Unix PTY based.
  • Screen artifact is raw PTY output rather than parsed terminal snapshot.
  • This is a first vertical slice for Panel/TUI PTY E2E; broader Pod protocol/provider stub scenarios remain future work under this E2E harness direction.

Next:

  • Mark Ticket done and clean up child coder/reviewer Pods plus implementation worktree/branch. Closure remains separate.

State changed

E2E harness implementation branch was reviewed, approved after requested production-boundary changes, merged into the Orchestrator branch as b3bd6b11, and validated in the Orchestrator worktree. Opt-in Panel PTY E2E tests for mouse selection and quit latency passed, along with formatting, diff check, normal/e2e feature package checks, and E2E package check. Ticket implementation work is done; closure remains separate.


Comment

Post-merge design gap: freshness of the yoi binary used by the E2E harness is not yet guaranteed automatically.

Current behavior:

  • tests/e2e/src/lib.rs::yoi_binary() uses the path in YOI_E2E_BIN if it is set.
  • If YOI_E2E_BIN is not set, it infers target/{debug,release}/yoi from the E2E test binary's current_exe(), and finally falls back to target/debug/yoi.
  • The harness uses that binary path for both PanelHarness::spawn and the fixture setup commands, and also points YOI_POD_RUNTIME_COMMAND at the same binary.
  • However, the harness itself does not run cargo build -p yoi --features e2e-test. So running cargo test -p yoi-e2e --features e2e at an arbitrary time does not guarantee that a binary rebuilt from the latest source is used.

Gap:

  • This round of validation used the correct binary because the Orchestrator ran cargo build -p yoi --features e2e-test beforehand.
  • As a harness design, however, freshness depends on runner/manual discipline: tests can still run against a stale target/debug/yoi or a YOI_E2E_BIN pointing at a different path.

Follow-up direction:

  • A single entrypoint is needed, such as cargo xtask e2e / yoi-e2e-runner / a documented just e2e, that always runs cargo build -p yoi --features e2e-test and then YOI_E2E_BIN=<fresh target binary> cargo test -p yoi-e2e --features e2e ....
  • In addition, the harness should verify through a handshake/event/version that the launched binary has the e2e-test feature enabled and, where possible, record source commit / build timestamp / path metadata in the artifacts so that a stale or mismatched binary shows up as a diagnostic.

Decision

Follow-up design note: for E2E yoi binary freshness, make cargo build inside the harness followed by spawning the built binary the standard, rather than launching directly through cargo run.

Decision candidate:

  • Running cargo build -p yoi --features e2e-test --bin yoi from the test setup of cargo test -p yoi-e2e --features e2e is possible, and is acceptable for opt-in E2E.
  • However, avoid making cargo run ... -- panel the process-under-test for PTY scenarios. The Cargo wrapper's build output, process tree, signal forwarding, and exit timing would get mixed in and blur what the Panel quit latency measurement targets.
  • Give the harness a launch path such as BinaryProvider::CargoBuild that builds yoi from the current workspace source at test start and spawns the resulting target/{profile}/yoi path directly on the PTY.
  • This guarantees, as a property of the launch path, that an E2E run at any time uses a binary built from the latest source, while the actual UI/latency measurement targets the yoi binary itself rather than the Cargo wrapper.
  • Duplicate builds across multiple tests are collapsed into one through OnceLock / suite setup or similar. Waiting on the cargo target lock during parallel test execution is acceptable for opt-in E2E; serialize if necessary.

Rationale:

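  • Building the build step into the harness as part of the launch path is simpler than verifying correctness through a post-launch handshake.
  • cargo run would work, but run performs build + wrapper spawn together, which puts Cargo in the measured PTY/signal/timing path. Splitting into cargo build and a direct binary spawn makes the E2E oracle clearer.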

Decision

Follow-up requested by user: E2E harness should build the current yoi binary itself instead of relying on a prebuilt YOI_E2E_BIN / inferred target/debug/yoi.

Required correction:

  • Default E2E binary provider should run cargo build -p yoi --features e2e-test --bin yoi from the workspace root at test time, then spawn the resulting target/{profile}/yoi directly through PTY.
  • YOI_E2E_BIN may remain as an explicit override, but normal arbitrary cargo test -p yoi-e2e --features e2e ... should use a freshly built binary without requiring a separate manual build step.
  • Do not use cargo run as the process-under-test because that would put Cargo in the PTY/signal/quit-latency measurement path.
  • Preserve the existing production/non-production boundary and E2E feature gating.
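A rough sketch of the requested provider, under stated assumptions: the workspace root is known to the caller, the target directory is <root>/target with the debug profile (CARGO_TARGET_DIR and --release are not handled), and the override path is passed in by the caller, for example from std::env::var_os("YOI_E2E_BIN"). BinaryProvider, from_override, and yoi_binary are hypothetical shapes, not the merged code. The build runs once per test process through OnceLock, and only the resulting binary path is handed to the PTY spawn.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;
use std::sync::OnceLock;

#[derive(Debug, Clone, PartialEq)]
pub enum BinaryProvider {
    /// Explicit override (e.g. the value of YOI_E2E_BIN).
    Prebuilt(PathBuf),
    /// Default: build from the workspace at test time.
    CargoBuild { workspace_root: PathBuf },
}

impl BinaryProvider {
    pub fn from_override(workspace_root: &Path, override_bin: Option<PathBuf>) -> Self {
        match override_bin {
            Some(path) => BinaryProvider::Prebuilt(path),
            None => BinaryProvider::CargoBuild {
                workspace_root: workspace_root.to_path_buf(),
            },
        }
    }

    /// Arguments for the build step named in the thread.
    pub fn build_args() -> [&'static str; 7] {
        ["build", "-p", "yoi", "--features", "e2e-test", "--bin", "yoi"]
    }

    /// Where the binary is expected. Assumes the default debug target dir.
    pub fn binary_path(&self) -> PathBuf {
        match self {
            BinaryProvider::Prebuilt(path) => path.clone(),
            BinaryProvider::CargoBuild { workspace_root } => {
                workspace_root.join("target").join("debug").join("yoi")
            }
        }
    }

    /// Build if needed, then return the path to spawn directly on the PTY.
    /// Cargo is never the process-under-test.
    pub fn resolve(&self) -> Result<PathBuf, String> {
        if let BinaryProvider::CargoBuild { workspace_root } = self {
            let status = Command::new("cargo")
                .args(Self::build_args())
                .current_dir(workspace_root)
                .status()
                .map_err(|err| format!("failed to start cargo: {err}"))?;
            if !status.success() {
                return Err(format!("cargo build failed: {status}"));
            }
        }
        Ok(self.binary_path())
    }
}

/// Result of the first resolve, shared by every test in this process.
static BUILT: OnceLock<Result<PathBuf, String>> = OnceLock::new();

/// Resolve once; later calls reuse the first result and ignore `provider`.
pub fn yoi_binary(provider: &BinaryProvider) -> Result<PathBuf, String> {
    BUILT.get_or_init(|| provider.resolve()).clone()
}
```

OnceLock only deduplicates within one test process; separate test binaries would each run cargo build and wait on the target lock, which the thread already treats as acceptable for opt-in E2E.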