yoi/.yoi/tickets/closed/20260608-015630-abort-podclient-reader-task-on-drop/thread.md

309 lines
15 KiB
Markdown

<!-- event: create author: LocalTicketBackend at: 2026-06-08T01:56:30Z -->
## Created
Created by LocalTicketBackend create.
---
<!-- event: intake_summary author: intake at: 2026-06-08T02:40:51Z -->
## Intake summary
Existing Ticket refined enough for Orchestrator routing. Request is an implementation-ready bug fix: make client::PodClient own/cancel its background Unix-socket reader task on drop so short-lived panel probes do not leak file descriptors. Acceptance focuses on Drop-triggered reader abort, preserving normal event receive behavior while alive, regression coverage for repeated connect/drop cleanup, and validation via focused client tests plus formatting/diff/ticket doctor checks. needs_preflight: false; risk_flags: [async-cancellation, pod-client-lifecycle, fd-leak].
---
<!-- event: state_changed author: intake at: 2026-06-08T02:40:51Z from: intake to: ready reason: intake_ready field: workflow_state -->
## State changed
Intake marked ready: requirements, acceptance criteria, implementation boundary, validation, and escalation surface are clear enough for Orchestrator routing. No implementation is started by Intake.
---
<!-- event: state_changed author: workspace-panel at: 2026-06-08T02:40:58Z from: ready to: queued reason: queued field: workflow_state -->
## State changed
Ticket queued for Orchestrator routing.
---
<!-- event: decision author: orchestrator at: 2026-06-08T02:41:57Z -->
## Decision
Routing decision: implementation_ready
Reason:
- Intake marked this as ready with `needs_preflight: false`; the Ticket describes a concrete bug and a narrow lifecycle fix.
- Requirements and acceptance criteria are observable: `PodClient` must own/cancel its background reader task, dropping the client must abort the reader and release the read half promptly, and existing send/receive behavior must remain intact.
- The relevant code map is narrow: `crates/client/src/pod_client.rs` owns `PodClient::connect`, the spawned reader task, send/recv methods, and existing client tests; `crates/tui/src/pod_list.rs` exercises repeated short-lived live status probes.
- Remaining uncertainty is bounded async test design, not a product/API/authority boundary decision.
Evidence checked:
- Ticket body: background, goal, requirements, acceptance criteria.
- Thread: intake summary and `ready -> queued` event.
- Workspace state: no matching branch/worktree exists; main has unrelated Ticket-record changes for `allow-spawnpod-child-workspace-cwd` that are understood and outside the implementation paths.
- Code map search for `PodClient`, `PodClient::connect`, reader task spawn, live status probing, and `try_connect_live_pod`.
- Ticket doctor: 0 errors; existing warnings are unrelated legacy closed-Ticket diagnostics.
IntentPacket:
Intent:
- Fix the `PodClient` reader-task lifecycle so short-lived clients used by panel/live Pod probing do not leak Unix socket file descriptors.
Binding decisions / invariants:
- `PodClient` owns the lifetime of the background reader task created by `PodClient::connect()`.
- Dropping `PodClient` must cancel/abort the reader task without relying on remote socket close.
- Aborting the reader task must drop the read half of the Unix socket promptly.
- While `PodClient` is alive, normal event receive behavior must continue to work.
- Existing one-shot clients, TUI clients, Ticket role launcher communication, and Pod-management tools must continue using `PodClient` normally.
- Do not change Pod socket protocol, event format, controller behavior, registry semantics, or panel polling policy unless a focused test shows a minimal caller adjustment is necessary.
Requirements / acceptance criteria:
- Store the reader task `JoinHandle` in `PodClient` or provide equivalent owned cancellation.
- Implement `Drop` for `PodClient` to abort/cancel the reader task.
- Preserve `send`, `try_recv_event`, and `recv_event` behavior while alive.
- Add focused regression coverage proving drop aborts/cleans up the reader task or otherwise closes repeated connect/drop clients without leaking stuck reader tasks.
- Add/adjust live status probe tests if needed.
- Avoid brittle FD-count-only tests if a deterministic task/connection cleanup assertion is more robust.
Implementation latitude:
- Coder may choose exact field names, test seams, and whether to expose test-only observability through existing module tests.
- Coder may use server-side connection closure or task completion probes in tests if they deterministically prove the dropped client no longer holds the socket read half.
- Coder may add a small helper if it improves readability, but should avoid broad refactors.
Escalate if:
- Fixing reader-task cleanup requires changing `PodClient` public API or Pod socket protocol.
- Drop-based cancellation conflicts with long-lived TUI event consumption.
- Deterministic regression coverage is impossible without invasive test-only hooks.
- The observed FD leak appears to come from a different source after inspecting `PodClient`.
Validation:
- Focused client tests for `pod_client` / reader-task drop behavior.
- Relevant TUI live status probe tests if changed.
- `cargo test -p client ... --lib` or focused equivalent selected by coder.
- `cargo test -p tui pod_list` if status probing is touched.
- `cargo fmt --check`.
- `git diff --check`.
- `cargo run -q -p yoi -- ticket doctor`.
- If runtime/client behavior is touched, final merge-completion should include `nix build .#yoi`.
Current code map:
- `crates/client/src/pod_client.rs`: `PodClient` struct, `connect`, spawned reader task, send/recv tests.
- `crates/tui/src/pod_list.rs`: repeated `PodClient::connect` live status probing and existing probe tests.
- `crates/tui/src/single_pod.rs` and `crates/client/src/ticket_role.rs`: long-lived / one-shot client users to preserve conceptually.
Critical risks / reviewer focus:
- Dropping `PodClient` must abort the reader task even if the server stays open and silent.
- The fix must not drop or abort the reader while a live client is still expected to receive events.
- Regression tests should fail against the old behavior rather than only checking that a field exists.
- Avoid masking leaks by relying only on remote close or timeout behavior.
---
<!-- event: state_changed author: orchestrator at: 2026-06-08T02:42:04Z from: queued to: inprogress reason: orchestrator_acceptance field: workflow_state -->
## State changed
Accepted queued implementation after reading the Ticket, workspace state, and `PodClient` code map. This acceptance precedes worktree creation and coder/reviewer Pod spawning.
---
<!-- event: plan author: orchestrator at: 2026-06-08T02:42:29Z -->
## Plan
Implementation routing accepted and worktree created.
Worktree plan:
- Branch: `abort-podclient-reader-task-on-drop`
- Worktree: `.worktree/abort-podclient-reader-task-on-drop`
- Base: `89b4369 ticket: route podclient reader cleanup`
- Scope: implement `PodClient` reader-task owned cancellation and focused cleanup tests.
Delegation plan:
- Spawn sibling coder Pod with write scope limited to `.worktree/abort-podclient-reader-task-on-drop`.
- Coder must run Bash commands from the child worktree, not from the main workspace.
- Coder must not edit main-workspace `.yoi` / Ticket / workflow / docs records, and must not create generated memory/local/runtime/secret-like files.
- Reviewer will be spawned read-only after coder reports diff/commits and validation evidence.
Concurrency note:
- Main workspace has unrelated Ticket-record edits for other queued/intake work. This implementation is isolated to the child worktree and should not edit those records.
Stop/completion policy:
- Because the human authorized cleanup after completion, this Orchestrator will proceed through reviewer approval, merge-completion validation, Ticket close, Pod stop, worktree removal, and branch deletion if the branch is approved and merge authority remains clear.
---
<!-- event: implementation_report author: orchestrator at: 2026-06-08T02:50:42Z -->
## Implementation report
Coder completed and reviewer delegated.
Coder result:
- Pod: `coder-abort-podclient-reader`
- Commit: `aec75b3 fix: abort PodClient reader task on drop`
- Worktree status: clean after commit.
- Diff against `develop`: 1 file changed, 143 insertions, 2 deletions.
Files touched by coder:
- `crates/client/src/pod_client.rs`
Coder reported implementation:
- Added owned `reader_task: JoinHandle<()>` field to `PodClient`.
- Stored the background reader task created by `PodClient::connect()`.
- Implemented `Drop` for `PodClient` to abort the reader task.
- Preserved send and receive behavior while `PodClient` is alive.
- Added focused tests for receiving while alive, sending while alive, repeated connect/drop connection cleanup, and aborting a blocked reader so the Unix socket read half is released.
Coder reported validation:
- `cargo test -p client pod_client --lib`
- `cargo test -p client --lib`
- `cargo test -p tui pod_list --lib`
- `cargo fmt --check`
- `git diff --check`
- `cargo run -q -p yoi -- ticket doctor`
- `nix build .#yoi`
Reviewer delegation:
- Spawned sibling reviewer Pod: `reviewer-abort-podclient-reader`.
- Reviewer scope: read-only child worktree plus non-recursive parent-root read required by launch cwd validation.
- Reviewer was instructed to judge against the recorded Ticket requirements and binding decisions.
Pending:
- Await reviewer verdict before merge-ready dossier / merge-completion.
- No merge, close, final approval, or cleanup has occurred for this Ticket.
---
<!-- event: implementation_report author: orchestrator at: 2026-06-08T02:52:51Z -->
## Implementation report
Merge-ready dossier: PodClient reader task cleanup
Ticket id/slug:
- `20260608-015630-abort-podclient-reader-task-on-drop` / `abort-podclient-reader-task-on-drop`
Branch/worktree:
- Branch: `abort-podclient-reader-task-on-drop`
- Worktree: `.worktree/abort-podclient-reader-task-on-drop`
- Current branch commit:
- `aec75b3 fix: abort PodClient reader task on drop`
Intent / invariant check:
- `PodClient` now owns the background reader task spawned by `PodClient::connect()`.
- Dropping `PodClient` aborts the reader task and releases the socket read half without relying on remote close.
- Live client send/receive behavior remains intact.
- No Pod socket protocol, event format, controller behavior, registry semantics, or panel polling policy changes were introduced.
Implementation summary:
- Added an owned `reader_task: JoinHandle<()>` field to `PodClient`.
- Stored the `tokio::spawn` handle for the background reader task.
- Implemented `Drop` for `PodClient` to call `reader_task.abort()`.
- Added focused regression tests for alive receive/send behavior, repeated connect/drop cleanup, and aborting a blocked reader task.
Files touched:
- `crates/client/src/pod_client.rs`
Coder / reviewer Pods:
- Coder: `coder-abort-podclient-reader`
- Reviewer: `reviewer-abort-podclient-reader`
Review evidence:
- Reviewer verdict: `approve`.
- Reviewer inspected `develop...aec75b3`, confirmed only `crates/client/src/pod_client.rs` changed, and confirmed the owned `JoinHandle` + `Drop::abort` implementation.
- Reviewer confirmed public client methods remain unchanged and the regression tests target the reported leak mode, including a blocked reader on incomplete JSON.
Validation performed by coder and/or reviewer:
- `cargo test -p client pod_client --lib`
- `cargo test -p client --lib`
- `cargo test -p tui pod_list --lib`
- `cargo fmt --check`
- `git diff --check develop...aec75b3`
- `cargo run -q -p yoi -- ticket doctor`
- `nix build .#yoi`
Blockers fixed or rejected findings:
- No reviewer blockers.
Residual risks:
- `Drop` aborts but cannot await task cancellation completion; reviewer judged this normal for `Drop`, and the observable socket cleanup tests cover the required behavior.
Dirty state:
- Child worktree is clean at `aec75b3`.
- Main workspace has unrelated Ticket-record edits for other active/new Tickets; they are outside this branch's touched paths and are understood.
Parent/human decision needs:
- User has authorized merge-completion and cleanup after approved work. Proceeding to merge-completion unless post-merge validation fails.
---
<!-- event: review author: orchestrator at: 2026-06-08T02:54:16Z status: approve -->
## Review: approve
Final merge-completion approval after merge to `develop` and post-merge validation.
Evidence:
- Merged branch `abort-podclient-reader-task-on-drop` with `--no-ff`.
- Reviewer `reviewer-abort-podclient-reader` approved the branch-local implementation.
- Post-merge validation passed: `cargo test -p client pod_client --lib`, `cargo test -p client --lib`, `cargo test -p tui pod_list --lib`, `cargo check -q`, `cargo fmt --check`, `git diff --check`, `cargo run -q -p yoi -- ticket doctor`, and `nix build .#yoi`.
- Coder/reviewer Pods stopped and delegated scope reclaimed.
- Merged worktree removed and branch deleted.
This approval is for the merged main-branch result, not merely the branch-local reviewer verdict.
---
<!-- event: state_changed author: orchestrator at: 2026-06-08T02:54:16Z from: inprogress to: done reason: merged_and_validated field: workflow_state -->
## State changed
Merged to `develop`, post-merge validation passed, final merge-completion approval recorded, and PodClient reader cleanup branch/worktree/Pods cleaned up.
---
<!-- event: close author: hare at: 2026-06-08T02:54:31Z status: closed -->
## Closed
Merged and completed the PodClient reader-task cleanup.
Summary:
- `PodClient` now stores the background reader task `JoinHandle` created during `PodClient::connect()`.
- `PodClient` implements `Drop` and aborts the reader task when the client is dropped.
- Dropping short-lived clients releases the Unix socket read half without waiting for remote socket close.
- Existing live-client send and receive behavior remains covered.
- No Pod socket protocol, event format, controller behavior, registry semantics, or panel polling policy changed.
Merged branch/worktree:
- Branch: `abort-podclient-reader-task-on-drop`
- Commit: `aec75b3 fix: abort PodClient reader task on drop`
- Merge commit on `develop`: `b0f37b5 merge: abort podclient reader task on drop`
Validation passed after merge:
- `cargo test -p client pod_client --lib`
- `cargo test -p client --lib`
- `cargo test -p tui pod_list --lib`
- `cargo check -q`
- `cargo fmt --check`
- `git diff --check`
- `cargo run -q -p yoi -- ticket doctor`
- `nix build .#yoi`
Cleanup completed:
- Stopped coder/reviewer Pods and reclaimed scope.
- Removed `.worktree/abort-podclient-reader-task-on-drop`.
- Deleted branch `abort-podclient-reader-task-on-drop`.
Residual note:
- `Drop` cannot await task cancellation completion, but reviewer accepted the normal `JoinHandle::abort()` pattern and focused tests verify observable socket cleanup behavior.
---