yoi/.yoi/tickets/open/20260608-015630-abort-podclient-reader-task-on-drop/thread.md

12 KiB

Created

Created by LocalTicketBackend create.


Intake summary

Existing Ticket refined enough for Orchestrator routing. Request is an implementation-ready bug fix: make client::PodClient own/cancel its background Unix-socket reader task on drop so short-lived panel probes do not leak file descriptors. Acceptance focuses on Drop-triggered reader abort, preserving normal event receive behavior while alive, regression coverage for repeated connect/drop cleanup, and validation via focused client tests plus formatting/diff/ticket doctor checks. needs_preflight: false; risk_flags: [async-cancellation, pod-client-lifecycle, fd-leak].


State changed

Intake marked ready: requirements, acceptance criteria, implementation boundary, validation, and escalation surface are clear enough for Orchestrator routing. No implementation is started by Intake.


State changed

Ticket queued for Orchestrator routing.


Decision

Routing decision: implementation_ready

Reason:

  • Intake marked this as ready with needs_preflight: false; the Ticket describes a concrete bug and a narrow lifecycle fix.
  • Requirements and acceptance criteria are observable: PodClient must own/cancel its background reader task, dropping the client must abort the reader and release the read half promptly, and existing send/receive behavior must remain intact.
  • The relevant code map is narrow: crates/client/src/pod_client.rs owns PodClient::connect, the spawned reader task, send/recv methods, and existing client tests; crates/tui/src/pod_list.rs exercises repeated short-lived live status probes.
  • Remaining uncertainty is bounded async test design, not a product/API/authority boundary decision.

Evidence checked:

  • Ticket body: background, goal, requirements, acceptance criteria.
  • Thread: intake summary and ready -> queued event.
  • Workspace state: no matching branch/worktree exists; main has unrelated Ticket-record changes for allow-spawnpod-child-workspace-cwd that are understood and outside the implementation paths.
  • Code map search for PodClient, PodClient::connect, reader task spawn, live status probing, and try_connect_live_pod.
  • Ticket doctor: 0 errors; existing warnings are unrelated legacy closed-Ticket diagnostics.

IntentPacket:

Intent:

  • Fix the PodClient reader-task lifecycle so short-lived clients used by panel/live Pod probing do not leak Unix socket file descriptors.

Binding decisions / invariants:

  • PodClient owns the lifetime of the background reader task created by PodClient::connect().
  • Dropping PodClient must cancel/abort the reader task without relying on remote socket close.
  • Aborting the reader task must drop the read half of the Unix socket promptly.
  • While PodClient is alive, normal event receive behavior must continue to work.
  • Existing one-shot clients, TUI clients, Ticket role launcher communication, and Pod-management tools must continue using PodClient normally.
  • Do not change Pod socket protocol, event format, controller behavior, registry semantics, or panel polling policy unless a focused test shows a minimal caller adjustment is necessary.

Requirements / acceptance criteria:

  • Store the reader task JoinHandle in PodClient or provide equivalent owned cancellation.
  • Implement Drop for PodClient to abort/cancel the reader task.
  • Preserve send, try_recv_event, and recv_event behavior while alive.
  • Add focused regression coverage proving drop aborts/cleans up the reader task or otherwise closes repeated connect/drop clients without leaking stuck reader tasks.
  • Add/adjust live status probe tests if needed.
  • Avoid brittle FD-count-only tests if a deterministic task/connection cleanup assertion is more robust.

Implementation latitude:

  • Coder may choose exact field names, test seams, and whether to expose test-only observability through existing module tests.
  • Coder may use server-side connection closure or task completion probes in tests if they deterministically prove the dropped client no longer holds the socket read half.
  • Coder may add a small helper if it improves readability, but should avoid broad refactors.

Escalate if:

  • Fixing reader-task cleanup requires changing PodClient public API or Pod socket protocol.
  • Drop-based cancellation conflicts with long-lived TUI event consumption.
  • Deterministic regression coverage is impossible without invasive test-only hooks.
  • The observed FD leak appears to come from a different source after inspecting PodClient.

Validation:

  • Focused client tests for pod_client / reader-task drop behavior.
  • Relevant TUI live status probe tests if changed.
  • cargo test -p client ... --lib or focused equivalent selected by coder.
  • cargo test -p tui pod_list if status probing is touched.
  • cargo fmt --check.
  • git diff --check.
  • cargo run -q -p yoi -- ticket doctor.
  • If runtime/client behavior is touched, final merge-completion should include nix build .#yoi.

Current code map:

  • crates/client/src/pod_client.rs: PodClient struct, connect, spawned reader task, send/recv tests.
  • crates/tui/src/pod_list.rs: repeated PodClient::connect live status probing and existing probe tests.
  • crates/tui/src/single_pod.rs and crates/client/src/ticket_role.rs: long-lived / one-shot client users to preserve conceptually.

Critical risks / reviewer focus:

  • Dropping PodClient must abort the reader task even if the server stays open and silent.
  • The fix must not drop or abort the reader while a live client is still expected to receive events.
  • Regression tests should fail against the old behavior rather than only checking that a field exists.
  • Avoid masking leaks by relying only on remote close or timeout behavior.

State changed

Accepted queued implementation after reading the Ticket, workspace state, and PodClient code map. This acceptance precedes worktree creation and coder/reviewer Pod spawning.


Plan

Implementation routing accepted and worktree created.

Worktree plan:

  • Branch: abort-podclient-reader-task-on-drop
  • Worktree: .worktree/abort-podclient-reader-task-on-drop
  • Base: 89b4369 ticket: route podclient reader cleanup
  • Scope: implement PodClient reader-task owned cancellation and focused cleanup tests.

Delegation plan:

  • Spawn sibling coder Pod with write scope limited to .worktree/abort-podclient-reader-task-on-drop.
  • Coder must run Bash commands from the child worktree, not from the main workspace.
  • Coder must not edit main-workspace .yoi / Ticket / workflow / docs records, and must not create generated memory/local/runtime/secret-like files.
  • Reviewer will be spawned read-only after coder reports diff/commits and validation evidence.

Concurrency note:

  • Main workspace has unrelated Ticket-record edits for other queued/intake work. This implementation is isolated to the child worktree and should not edit those records.

Stop/completion policy:

  • Because the human authorized cleanup after completion, this Orchestrator will proceed through reviewer approval, merge-completion validation, Ticket close, Pod stop, worktree removal, and branch deletion if the branch is approved and merge authority remains clear.

Implementation report

Coder completed and reviewer delegated.

Coder result:

  • Pod: coder-abort-podclient-reader
  • Commit: aec75b3 fix: abort PodClient reader task on drop
  • Worktree status: clean after commit.
  • Diff against develop: 1 file changed, 143 insertions, 2 deletions.

Files touched by coder:

  • crates/client/src/pod_client.rs

Coder reported implementation:

  • Added owned reader_task: JoinHandle<()> field to PodClient.
  • Stored the background reader task created by PodClient::connect().
  • Implemented Drop for PodClient to abort the reader task.
  • Preserved send and receive behavior while PodClient is alive.
  • Added focused tests for receiving while alive, sending while alive, repeated connect/drop connection cleanup, and aborting a blocked reader so the Unix socket read half is released.

Coder reported validation:

  • cargo test -p client pod_client --lib
  • cargo test -p client --lib
  • cargo test -p tui pod_list --lib
  • cargo fmt --check
  • git diff --check
  • cargo run -q -p yoi -- ticket doctor
  • nix build .#yoi

Reviewer delegation:

  • Spawned sibling reviewer Pod: reviewer-abort-podclient-reader.
  • Reviewer scope: read-only child worktree plus non-recursive parent-root read required by launch cwd validation.
  • Reviewer was instructed to judge against the recorded Ticket requirements and binding decisions.

Pending:

  • Await reviewer verdict before merge-ready dossier / merge-completion.
  • No merge, close, final approval, or cleanup has occurred for this Ticket.

Implementation report

Merge-ready dossier: PodClient reader task cleanup

Ticket id/slug:

  • 20260608-015630-abort-podclient-reader-task-on-drop / abort-podclient-reader-task-on-drop

Branch/worktree:

  • Branch: abort-podclient-reader-task-on-drop
  • Worktree: .worktree/abort-podclient-reader-task-on-drop
  • Current branch commit:
    • aec75b3 fix: abort PodClient reader task on drop

Intent / invariant check:

  • PodClient now owns the background reader task spawned by PodClient::connect().
  • Dropping PodClient aborts the reader task and releases the socket read half without relying on remote close.
  • Live client send/receive behavior remains intact.
  • No Pod socket protocol, event format, controller behavior, registry semantics, or panel polling policy changes were introduced.

Implementation summary:

  • Added an owned reader_task: JoinHandle<()> field to PodClient.
  • Stored the tokio::spawn handle for the background reader task.
  • Implemented Drop for PodClient to call reader_task.abort().
  • Added focused regression tests for alive receive/send behavior, repeated connect/drop cleanup, and aborting a blocked reader task.

Files touched:

  • crates/client/src/pod_client.rs

Coder / reviewer Pods:

  • Coder: coder-abort-podclient-reader
  • Reviewer: reviewer-abort-podclient-reader

Review evidence:

  • Reviewer verdict: approve.
  • Reviewer inspected develop...aec75b3, confirmed only crates/client/src/pod_client.rs changed, and confirmed the owned JoinHandle + Drop::abort implementation.
  • Reviewer confirmed public client methods remain unchanged and the regression tests target the reported leak mode, including a blocked reader on incomplete JSON.

Validation performed by coder and/or reviewer:

  • cargo test -p client pod_client --lib
  • cargo test -p client --lib
  • cargo test -p tui pod_list --lib
  • cargo fmt --check
  • git diff --check develop...aec75b3
  • cargo run -q -p yoi -- ticket doctor
  • nix build .#yoi

Blockers fixed or rejected findings:

  • No reviewer blockers.

Residual risks:

  • Drop aborts but cannot await task cancellation completion; reviewer judged this normal for Drop, and the observable socket cleanup tests cover the required behavior.

Dirty state:

  • Child worktree is clean at aec75b3.
  • Main workspace has unrelated Ticket-record edits for other active/new Tickets; they are outside this branch's touched paths and are understood.

Parent/human decision needs:

  • User has authorized merge-completion and cleanup after approved work. Proceeding to merge-completion unless post-merge validation fails.