diff --git a/.yoi/tickets/open/20260608-015630-abort-podclient-reader-task-on-drop/artifacts/.gitkeep b/.yoi/tickets/open/20260608-015630-abort-podclient-reader-task-on-drop/artifacts/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/.yoi/tickets/open/20260608-015630-abort-podclient-reader-task-on-drop/item.md b/.yoi/tickets/open/20260608-015630-abort-podclient-reader-task-on-drop/item.md new file mode 100644 index 00000000..7d23b08f --- /dev/null +++ b/.yoi/tickets/open/20260608-015630-abort-podclient-reader-task-on-drop/item.md @@ -0,0 +1,78 @@ +--- +id: '20260608-015630-abort-podclient-reader-task-on-drop' +slug: 'abort-podclient-reader-task-on-drop' +title: 'Abort PodClient reader task on drop' +status: 'open' +kind: 'task' +priority: 'P1' +labels: ['client', 'pod', 'tui', 'fd-leak', 'bug'] +workflow_state: 'inprogress' +created_at: '2026-06-08T01:56:30Z' +updated_at: '2026-06-08T02:42:04Z' +assignee: null +legacy_ticket: null +queued_by: 'workspace-panel' +queued_at: '2026-06-08T02:40:58Z' +--- + +## Background + +When `yoi panel` is left open, file descriptor count can increase steadily even while the user does nothing. Observed behavior: FD count rises by roughly 10 per poll cycle, and eventually panel diagnostics show unrelated-looking errors such as: + +```text +Panel local role registry unavailable: local role session registry I/O error: Too many open files (os error 24) +Ticket config is unusable: failed to read Ticket config .../.yoi/ticket.config.toml: Too many open files (os error 24) +``` + +The likely root cause is `client::PodClient::connect()`: it splits the Unix socket and spawns a background reader task, but `PodClient` does not keep the `JoinHandle` or abort the task on drop. + +Current shape: + +```rust +pub struct PodClient { + writer: JsonLineWriter>, + event_rx: mpsc::Receiver, +} + +pub async fn connect(path: &Path) -> Result { + let stream = UnixStream::connect(path).await?; + let (reader, writer) = tokio::io::split(stream); + ... + tokio::spawn(async move { + let mut reader = JsonLineReader::new(reader); + while let Ok(Some(event)) = reader.next::().await { + if event_tx.send(event).await.is_err() { + break; + } + } + }); + Ok(Self { writer, event_rx }) +} +``` + +Panel live Pod probing repeatedly creates short-lived `PodClient`s. Dropping the `PodClient` drops the writer and receiver, but the spawned reader task can remain blocked on socket read while holding the read half/FD. If there are ~10 live probe targets, each panel poll can leak ~10 FDs. + +## Goal + +Make `PodClient` own and clean up its background reader task so short-lived clients do not leak socket file descriptors. + +## Requirements + +- Store the reader task `JoinHandle` inside `PodClient` or otherwise provide owned cancellation. +- Implement `Drop` for `PodClient` to abort/cancel the reader task. +- Ensure aborting the reader task drops the read half of the Unix socket promptly. +- Preserve normal long-lived TUI client behavior: events should continue to be received while `PodClient` is alive. +- Ensure `PodClient` remains usable for existing one-shot clients and communication tools. +- Avoid relying on remote socket close to terminate the reader task. +- Add regression coverage that repeatedly creates/drops `PodClient` connections and verifies server-side/client-side tasks/connections are closed or at least that the reader task is aborted. +- Add/adjust tests for status probing if needed. + +## Acceptance criteria + +- Dropping a `PodClient` aborts its background reader task. +- Repeated live Pod probing no longer increases open FD count monotonically. +- Panel can remain open through multiple poll cycles without leaking one Unix socket per live Pod per cycle. +- Existing Pod client send/receive behavior still works. +- Focused tests cover reader task cleanup. +- `cargo test -p client ... --lib` and relevant TUI/pod tests pass. +- `cargo fmt --check`, `git diff --check`, and `target/debug/yoi ticket doctor` pass. diff --git a/.yoi/tickets/open/20260608-015630-abort-podclient-reader-task-on-drop/thread.md b/.yoi/tickets/open/20260608-015630-abort-podclient-reader-task-on-drop/thread.md new file mode 100644 index 00000000..ec6ce7e4 --- /dev/null +++ b/.yoi/tickets/open/20260608-015630-abort-podclient-reader-task-on-drop/thread.md @@ -0,0 +1,114 @@ + + +## Created + +Created by LocalTicketBackend create. + +--- + + + +## Intake summary + +Existing Ticket refined enough for Orchestrator routing. Request is an implementation-ready bug fix: make client::PodClient own/cancel its background Unix-socket reader task on drop so short-lived panel probes do not leak file descriptors. Acceptance focuses on Drop-triggered reader abort, preserving normal event receive behavior while alive, regression coverage for repeated connect/drop cleanup, and validation via focused client tests plus formatting/diff/ticket doctor checks. needs_preflight: false; risk_flags: [async-cancellation, pod-client-lifecycle, fd-leak]. + +--- + + + +## State changed + +Intake marked ready: requirements, acceptance criteria, implementation boundary, validation, and escalation surface are clear enough for Orchestrator routing. No implementation is started by Intake. + +--- + + + +## State changed + +Ticket queued for Orchestrator routing. + + +--- + + + +## Decision + +Routing decision: implementation_ready + +Reason: +- Intake marked this as ready with `needs_preflight: false`; the Ticket describes a concrete bug and a narrow lifecycle fix. +- Requirements and acceptance criteria are observable: `PodClient` must own/cancel its background reader task, dropping the client must abort the reader and release the read half promptly, and existing send/receive behavior must remain intact. +- The relevant code map is narrow: `crates/client/src/pod_client.rs` owns `PodClient::connect`, the spawned reader task, send/recv methods, and existing client tests; `crates/tui/src/pod_list.rs` exercises repeated short-lived live status probes. +- Remaining uncertainty is bounded async test design, not a product/API/authority boundary decision. + +Evidence checked: +- Ticket body: background, goal, requirements, acceptance criteria. +- Thread: intake summary and `ready -> queued` event. +- Workspace state: no matching branch/worktree exists; main has unrelated Ticket-record changes for `allow-spawnpod-child-workspace-cwd` that are understood and outside the implementation paths. +- Code map search for `PodClient`, `PodClient::connect`, reader task spawn, live status probing, and `try_connect_live_pod`. +- Ticket doctor: 0 errors; existing warnings are unrelated legacy closed-Ticket diagnostics. + +IntentPacket: + +Intent: +- Fix the `PodClient` reader-task lifecycle so short-lived clients used by panel/live Pod probing do not leak Unix socket file descriptors. + +Binding decisions / invariants: +- `PodClient` owns the lifetime of the background reader task created by `PodClient::connect()`. +- Dropping `PodClient` must cancel/abort the reader task without relying on remote socket close. +- Aborting the reader task must drop the read half of the Unix socket promptly. +- While `PodClient` is alive, normal event receive behavior must continue to work. +- Existing one-shot clients, TUI clients, Ticket role launcher communication, and Pod-management tools must continue using `PodClient` normally. +- Do not change Pod socket protocol, event format, controller behavior, registry semantics, or panel polling policy unless a focused test shows a minimal caller adjustment is necessary. + +Requirements / acceptance criteria: +- Store the reader task `JoinHandle` in `PodClient` or provide equivalent owned cancellation. +- Implement `Drop` for `PodClient` to abort/cancel the reader task. +- Preserve `send`, `try_recv_event`, and `recv_event` behavior while alive. +- Add focused regression coverage proving drop aborts/cleans up the reader task or otherwise closes repeated connect/drop clients without leaking stuck reader tasks. +- Add/adjust live status probe tests if needed. +- Avoid brittle FD-count-only tests if a deterministic task/connection cleanup assertion is more robust. + +Implementation latitude: +- Coder may choose exact field names, test seams, and whether to expose test-only observability through existing module tests. +- Coder may use server-side connection closure or task completion probes in tests if they deterministically prove the dropped client no longer holds the socket read half. +- Coder may add a small helper if it improves readability, but should avoid broad refactors. + +Escalate if: +- Fixing reader-task cleanup requires changing `PodClient` public API or Pod socket protocol. +- Drop-based cancellation conflicts with long-lived TUI event consumption. +- Deterministic regression coverage is impossible without invasive test-only hooks. +- The observed FD leak appears to come from a different source after inspecting `PodClient`. + +Validation: +- Focused client tests for `pod_client` / reader-task drop behavior. +- Relevant TUI live status probe tests if changed. +- `cargo test -p client ... --lib` or focused equivalent selected by coder. +- `cargo test -p tui pod_list` if status probing is touched. +- `cargo fmt --check`. +- `git diff --check`. +- `cargo run -q -p yoi -- ticket doctor`. +- If runtime/client behavior is touched, final merge-completion should include `nix build .#yoi`. + +Current code map: +- `crates/client/src/pod_client.rs`: `PodClient` struct, `connect`, spawned reader task, send/recv tests. +- `crates/tui/src/pod_list.rs`: repeated `PodClient::connect` live status probing and existing probe tests. +- `crates/tui/src/single_pod.rs` and `crates/client/src/ticket_role.rs`: long-lived / one-shot client users to preserve conceptually. + +Critical risks / reviewer focus: +- Dropping `PodClient` must abort the reader task even if the server stays open and silent. +- The fix must not drop or abort the reader while a live client is still expected to receive events. +- Regression tests should fail against the old behavior rather than only checking that a field exists. +- Avoid masking leaks by relying only on remote close or timeout behavior. + +--- + + + +## State changed + +Accepted queued implementation after reading the Ticket, workspace state, and `PodClient` code map. This acceptance precedes worktree creation and coder/reviewer Pod spawning. + +---