ticket: route podclient reader cleanup

Keisuke Hirata 2026-06-08 11:42:09 +09:00
parent 9b3041ec3a
commit 89b4369f2a
3 changed files with 192 additions and 0 deletions

@ -0,0 +1,78 @@
---
id: '20260608-015630-abort-podclient-reader-task-on-drop'
slug: 'abort-podclient-reader-task-on-drop'
title: 'Abort PodClient reader task on drop'
status: 'open'
kind: 'task'
priority: 'P1'
labels: ['client', 'pod', 'tui', 'fd-leak', 'bug']
workflow_state: 'inprogress'
created_at: '2026-06-08T01:56:30Z'
updated_at: '2026-06-08T02:42:04Z'
assignee: null
legacy_ticket: null
queued_by: 'workspace-panel'
queued_at: '2026-06-08T02:40:58Z'
---
## Background
When `yoi panel` is left open, file descriptor count can increase steadily even while the user does nothing. Observed behavior: FD count rises by roughly 10 per poll cycle, and eventually panel diagnostics show unrelated-looking errors such as:
```text
Panel local role registry unavailable: local role session registry I/O error: Too many open files (os error 24)
Ticket config is unusable: failed to read Ticket config .../.yoi/ticket.config.toml: Too many open files (os error 24)
```
The likely root cause is `client::PodClient::connect()`: it splits the Unix socket and spawns a background reader task, but `PodClient` does not keep the `JoinHandle` or abort the task on drop.
Current shape:
```rust
pub struct PodClient {
    writer: JsonLineWriter<tokio::io::WriteHalf<UnixStream>>,
    event_rx: mpsc::Receiver<Event>,
    // No `JoinHandle` is stored, so the reader task cannot be aborted on drop.
}

impl PodClient {
    pub async fn connect(path: &Path) -> Result<Self, io::Error> {
        let stream = UnixStream::connect(path).await?;
        let (reader, writer) = tokio::io::split(stream);
        ...
        // The handle returned by `tokio::spawn` is discarded, so the task is detached.
        tokio::spawn(async move {
            let mut reader = JsonLineReader::new(reader);
            while let Ok(Some(event)) = reader.next::<Event>().await {
                if event_tx.send(event).await.is_err() {
                    break;
                }
            }
        });
        Ok(Self { writer, event_rx })
    }
}
```
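Panel live Pod probing repeatedly creates short-lived `PodClient`s. Dropping a `PodClient` drops the writer and the receiver, but the spawned reader task can stay blocked on the socket read while still holding the read half. Because both halves returned by `tokio::io::split` share one underlying stream, the FD stays open until the read half is dropped as well. With ~10 live probe targets, each panel poll can therefore leak ~10 FDs.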
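One way to watch for this kind of growth from inside a process is to count its open descriptors before and after a batch of probes. The helper below is a rough diagnostic sketch, not part of the codebase: the name `open_fd_count` is hypothetical, and it assumes Linux, where `/proc/self/fd` lists one entry per open descriptor.

```rust
use std::fs;
use std::io;

/// Counts the file descriptors currently open in this process.
///
/// Assumption: Linux, where `/proc/self/fd` has one entry per open descriptor.
/// The directory handle used for the listing is counted too, so the value is
/// best used as a relative measure (compare two readings), not an exact total.
fn open_fd_count() -> io::Result<usize> {
    Ok(fs::read_dir("/proc/self/fd")?.count())
}
```

Comparing a reading taken before a poll cycle with one taken afterwards gives an approximate per-cycle delta. Other threads opening or closing files will add noise, so this suits diagnosis better than a strict test assertion.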
## Goal
Make `PodClient` own and clean up its background reader task so short-lived clients do not leak socket file descriptors.
## Requirements
- Store the reader task `JoinHandle` inside `PodClient` or otherwise provide owned cancellation.
- Implement `Drop` for `PodClient` to abort/cancel the reader task.
- Ensure aborting the reader task drops the read half of the Unix socket promptly.
- Preserve normal long-lived TUI client behavior: events should continue to be received while `PodClient` is alive.
- Ensure `PodClient` remains usable for existing one-shot clients and communication tools.
- Avoid relying on remote socket close to terminate the reader task.
- Add regression coverage that repeatedly creates and drops `PodClient` connections and verifies that the resulting tasks and connections are closed on both the server and client side, or at minimum that the reader task is aborted.
- Add/adjust tests for status probing if needed.
## Acceptance criteria
- Dropping a `PodClient` aborts its background reader task.
- Repeated live Pod probing no longer increases open FD count monotonically.
- Panel can remain open through multiple poll cycles without leaking one Unix socket per live Pod per cycle.
- Existing Pod client send/receive behavior still works.
- Focused tests cover reader task cleanup.
- `cargo test -p client ... --lib` and relevant TUI/pod tests pass.
- `cargo fmt --check`, `git diff --check`, and `target/debug/yoi ticket doctor` pass.

@ -0,0 +1,114 @@
<!-- event: create author: LocalTicketBackend at: 2026-06-08T01:56:30Z -->
## Created
Created by LocalTicketBackend create.
---
<!-- event: intake_summary author: intake at: 2026-06-08T02:40:51Z -->
## Intake summary
Existing Ticket refined enough for Orchestrator routing. Request is an implementation-ready bug fix: make client::PodClient own/cancel its background Unix-socket reader task on drop so short-lived panel probes do not leak file descriptors. Acceptance focuses on Drop-triggered reader abort, preserving normal event receive behavior while alive, regression coverage for repeated connect/drop cleanup, and validation via focused client tests plus formatting/diff/ticket doctor checks. needs_preflight: false; risk_flags: [async-cancellation, pod-client-lifecycle, fd-leak].
---
<!-- event: state_changed author: intake at: 2026-06-08T02:40:51Z from: intake to: ready reason: intake_ready field: workflow_state -->
## State changed
Intake marked ready: requirements, acceptance criteria, implementation boundary, validation, and escalation surface are clear enough for Orchestrator routing. No implementation is started by Intake.
---
<!-- event: state_changed author: workspace-panel at: 2026-06-08T02:40:58Z from: ready to: queued reason: queued field: workflow_state -->
## State changed
Ticket queued for Orchestrator routing.
---
<!-- event: decision author: orchestrator at: 2026-06-08T02:41:57Z -->
## Decision
Routing decision: implementation_ready
Reason:
- Intake marked this as ready with `needs_preflight: false`; the Ticket describes a concrete bug and a narrow lifecycle fix.
- Requirements and acceptance criteria are observable: `PodClient` must own/cancel its background reader task, dropping the client must abort the reader and release the read half promptly, and existing send/receive behavior must remain intact.
- The relevant code map is narrow: `crates/client/src/pod_client.rs` owns `PodClient::connect`, the spawned reader task, send/recv methods, and existing client tests; `crates/tui/src/pod_list.rs` exercises repeated short-lived live status probes.
- Remaining uncertainty is bounded async test design, not a product/API/authority boundary decision.
Evidence checked:
- Ticket body: background, goal, requirements, acceptance criteria.
- Thread: intake summary and `ready -> queued` event.
- Workspace state: no matching branch/worktree exists; main has unrelated Ticket-record changes for `allow-spawnpod-child-workspace-cwd` that are understood and outside the implementation paths.
- Code map search for `PodClient`, `PodClient::connect`, reader task spawn, live status probing, and `try_connect_live_pod`.
- Ticket doctor: 0 errors; existing warnings are unrelated legacy closed-Ticket diagnostics.
IntentPacket:
Intent:
- Fix the `PodClient` reader-task lifecycle so short-lived clients used by panel/live Pod probing do not leak Unix socket file descriptors.
Binding decisions / invariants:
- `PodClient` owns the lifetime of the background reader task created by `PodClient::connect()`.
- Dropping `PodClient` must cancel/abort the reader task without relying on remote socket close.
- Aborting the reader task must drop the read half of the Unix socket promptly.
- While `PodClient` is alive, normal event receive behavior must continue to work.
- Existing one-shot clients, TUI clients, Ticket role launcher communication, and Pod-management tools must continue using `PodClient` normally.
- Do not change Pod socket protocol, event format, controller behavior, registry semantics, or panel polling policy unless a focused test shows a minimal caller adjustment is necessary.
Requirements / acceptance criteria:
- Store the reader task `JoinHandle` in `PodClient` or provide equivalent owned cancellation.
- Implement `Drop` for `PodClient` to abort/cancel the reader task.
- Preserve `send`, `try_recv_event`, and `recv_event` behavior while alive.
- Add focused regression coverage proving drop aborts/cleans up the reader task or otherwise closes repeated connect/drop clients without leaking stuck reader tasks.
- Add/adjust live status probe tests if needed.
- Avoid brittle FD-count-only tests if a deterministic task/connection cleanup assertion is more robust.
Implementation latitude:
- Coder may choose exact field names, test seams, and whether to expose test-only observability through existing module tests.
- Coder may use server-side connection closure or task completion probes in tests if they deterministically prove the dropped client no longer holds the socket read half.
- Coder may add a small helper if it improves readability, but should avoid broad refactors.
Escalate if:
- Fixing reader-task cleanup requires changing `PodClient` public API or Pod socket protocol.
- Drop-based cancellation conflicts with long-lived TUI event consumption.
- Deterministic regression coverage is impossible without invasive test-only hooks.
- The observed FD leak appears to come from a different source after inspecting `PodClient`.
Validation:
- Focused client tests for `pod_client` / reader-task drop behavior.
- Relevant TUI live status probe tests if changed.
- `cargo test -p client ... --lib` or focused equivalent selected by coder.
- `cargo test -p tui pod_list` if status probing is touched.
- `cargo fmt --check`.
- `git diff --check`.
- `cargo run -q -p yoi -- ticket doctor`.
- If runtime/client behavior is touched, final merge-completion should include `nix build .#yoi`.
Current code map:
- `crates/client/src/pod_client.rs`: `PodClient` struct, `connect`, spawned reader task, send/recv tests.
- `crates/tui/src/pod_list.rs`: repeated `PodClient::connect` live status probing and existing probe tests.
- `crates/tui/src/single_pod.rs` and `crates/client/src/ticket_role.rs`: long-lived and one-shot client users whose behavior must be preserved.
Critical risks / reviewer focus:
- Dropping `PodClient` must abort the reader task even if the server stays open and silent.
- The fix must not drop or abort the reader while a live client is still expected to receive events.
- Regression tests should fail against the old behavior rather than only checking that a field exists.
- Avoid masking leaks by relying only on remote close or timeout behavior.
---
<!-- event: state_changed author: orchestrator at: 2026-06-08T02:42:04Z from: queued to: inprogress reason: orchestrator_acceptance field: workflow_state -->
## State changed
Accepted queued implementation after reading the Ticket, workspace state, and `PodClient` code map. This acceptance precedes worktree creation and coder/reviewer Pod spawning.
---