From 97df1a40865c219960dfcffb27a2ae07c81fa835 Mon Sep 17 00:00:00 2001
From: Hare
Date: Sat, 30 May 2026 14:45:39 +0900
Subject: [PATCH] close: multi pod status probes

---
 .../artifacts/.gitkeep |   0
 .../item.md            |   4 +-
 .../resolution.md      |  45 +++++++
 .../thread.md          | 116 ++++++++++++++++++
 .../thread.md          |  63 ----------
 5 files changed, 163 insertions(+), 65 deletions(-)
 rename work-items/{open => closed}/20260530-053259-multi-pod-parallel-status-probes/artifacts/.gitkeep (100%)
 rename work-items/{open => closed}/20260530-053259-multi-pod-parallel-status-probes/item.md (98%)
 create mode 100644 work-items/closed/20260530-053259-multi-pod-parallel-status-probes/resolution.md
 create mode 100644 work-items/closed/20260530-053259-multi-pod-parallel-status-probes/thread.md
 delete mode 100644 work-items/open/20260530-053259-multi-pod-parallel-status-probes/thread.md

diff --git a/work-items/open/20260530-053259-multi-pod-parallel-status-probes/artifacts/.gitkeep b/work-items/closed/20260530-053259-multi-pod-parallel-status-probes/artifacts/.gitkeep
similarity index 100%
rename from work-items/open/20260530-053259-multi-pod-parallel-status-probes/artifacts/.gitkeep
rename to work-items/closed/20260530-053259-multi-pod-parallel-status-probes/artifacts/.gitkeep
diff --git a/work-items/open/20260530-053259-multi-pod-parallel-status-probes/item.md b/work-items/closed/20260530-053259-multi-pod-parallel-status-probes/item.md
similarity index 98%
rename from work-items/open/20260530-053259-multi-pod-parallel-status-probes/item.md
rename to work-items/closed/20260530-053259-multi-pod-parallel-status-probes/item.md
index 68d1b0c9..df01366d 100644
--- a/work-items/open/20260530-053259-multi-pod-parallel-status-probes/item.md
+++ b/work-items/closed/20260530-053259-multi-pod-parallel-status-probes/item.md
@@ -2,12 +2,12 @@
 id: 20260530-053259-multi-pod-parallel-status-probes
 slug: multi-pod-parallel-status-probes
 title: Parallelize multi-Pod live status probes
-status: open
+status: closed
 kind: task
 priority: P2
 labels: [tui, pod-dashboard, performance]
 created_at: 2026-05-30T05:32:59Z
-updated_at: 2026-05-30T05:45:14Z
+updated_at: 2026-05-30T05:45:37Z
 assignee: null
 legacy_ticket: null
 ---
diff --git a/work-items/closed/20260530-053259-multi-pod-parallel-status-probes/resolution.md b/work-items/closed/20260530-053259-multi-pod-parallel-status-probes/resolution.md
new file mode 100644
index 00000000..df01366d
--- /dev/null
+++ b/work-items/closed/20260530-053259-multi-pod-parallel-status-probes/resolution.md
@@ -0,0 +1,45 @@
+---
+id: 20260530-053259-multi-pod-parallel-status-probes
+slug: multi-pod-parallel-status-probes
+title: Parallelize multi-Pod live status probes
+status: closed
+kind: task
+priority: P2
+labels: [tui, pod-dashboard, performance]
+created_at: 2026-05-30T05:32:59Z
+updated_at: 2026-05-30T05:45:37Z
+assignee: null
+legacy_ticket: null
+---
+
+## Background
+
+The `--multi` dashboard frequently shows `[live unknown]` for reachable Pods. Current code probes each runtime-registry socket with a very short `LIVE_STATUS_PROBE_TIMEOUT` of 25ms in `crates/tui/src/pod_list.rs`. A live row becomes `status = None` when the socket connects but no `Event::Snapshot` / `Event::Status` is read before that deadline.
+
+That label is misleading: the Pod is reachable, but status probing timed out or did not receive a status event quickly enough. Raising the timeout alone risks making dashboard reload latency scale linearly with the number of live Pods, because status probes are currently performed sequentially.
+
+## Requirements
+
+- Increase the live status probe timeout to a more realistic value, likely in the 150ms–250ms range.
+- Run live status probes concurrently so reload latency does not become the sum of all per-Pod timeouts.
+- Keep reachable Pods with missing status as live/attachable; do not treat status timeout as unreachable.
+- Keep restoreability separate from live attachability; this ticket must not make runtime-only Pods restorable.
+- Replace or soften the `live unknown` label in `--multi` so it communicates reachable-live-with-unreported-status rather than broken state. Candidate labels: `live`, `live probing`, or similar.
+- Keep the implementation in shared `PodList` / live probe code where possible; avoid duplicating dashboard-specific discovery logic.
+- Preserve existing behavior for explicitly reported `Idle`, `Running`, and `Paused` statuses.
+
+## Non-goals
+
+- Do not redesign Pod notification or run completion delivery.
+- Do not persist last-known status in pod-store.
+- Do not change `AttachOrRestorePod` or restore semantics.
+- Do not make unreachable registry allocations appear attachable.
+
+## Acceptance criteria
+
+- Multiple live Pod status probes wait concurrently, not strictly one after another.
+- The per-Pod timeout is long enough to significantly reduce false `status = None` cases compared to 25ms.
+- A reachable Pod whose status probe times out remains displayed as live and openable/attachable.
+- The multi-Pod row label for `status = None` is less misleading than `live unknown`.
+- Tests cover concurrent probing behavior, timeout/none-status handling, and label rendering.
+- `cargo test -p tui pod_list`, `cargo test -p tui multi_pod`, `cargo test -p tui`, `cargo fmt --check`, and `./tickets.sh doctor` pass.
diff --git a/work-items/closed/20260530-053259-multi-pod-parallel-status-probes/thread.md b/work-items/closed/20260530-053259-multi-pod-parallel-status-probes/thread.md
new file mode 100644
index 00000000..8ba12837
--- /dev/null
+++ b/work-items/closed/20260530-053259-multi-pod-parallel-status-probes/thread.md
@@ -0,0 +1,116 @@
+
+
+## Created
+
+Created by tickets.sh create.
+
+---
+
+
+
+## Plan
+
+## Preflight
+
+Classification: implementation-ready.
+
+The issue is localized to TUI live status probing and dashboard labeling. The product decision is settled: reachable Pods with missing status remain live/attachable, while status probe timeout should be less frequent and less alarming. No restore semantics or Pod runtime protocol changes are required.
+
+Current code map:
+- `crates/tui/src/pod_list.rs`: `LIVE_STATUS_PROBE_TIMEOUT`, `load_live_pods`, `probe_live_status`, `LivePodInfo`, `PodList` merge/sort tests.
+- `crates/tui/src/multi_pod.rs`: row label rendering for live status, currently `live unknown` when `LivePodInfo.status` is `None`.
+
+Critical risks:
+- Do not make reload latency scale as N * timeout.
+- Do not treat `status = None` as unreachable.
+- Do not change restoreability/attachability semantics.
+- Avoid duplicating discovery logic outside shared `PodList`.
+
+
+---
+
+
+
+## Review: approve
+
+Approve.
+
+The implementation addresses the ticket in the shared `PodList` live-probe path. Live status probes now run concurrently with a more realistic per-Pod timeout, reachable sockets without reported status remain live/attachable, and the multi-Pod label is softened from `live unknown` to `live` while explicit statuses remain unchanged.
+
+Blocker findings: none.
+
+Requirement coverage:
+- Per-Pod timeout increased to 200ms.
+- Probes are concurrent and bounded by the slowest bounded probe rather than `N * timeout`.
+- Reachable sockets with no status remain reachable/live/attachable.
+- Restoreability remains separate and unchanged.
+- No last-known status is persisted to pod-store.
+- Explicit `live idle`, `live running`, and `live paused` labels remain.
+- `status = None` renders as `live`.
+- The implementation changes shared `PodList` live-probe logic rather than duplicating dashboard-only logic.
+- Tests cover concurrency, timeout/no-status reachable handling, and label rendering.
+
+Validation reviewed:
+- `cargo fmt --check` — passed.
+- `cargo test -p tui pod_list` — passed.
+- `cargo test -p tui multi_pod` — passed.
+- `cargo test -p tui` — passed.
+- Reviewer additionally ran `./tickets.sh doctor` — passed.
+
+Final verdict: approve.
+
+
+---
+
+
+
+## Closed
+
+---
+id: 20260530-053259-multi-pod-parallel-status-probes
+slug: multi-pod-parallel-status-probes
+title: Parallelize multi-Pod live status probes
+status: closed
+kind: task
+priority: P2
+labels: [tui, pod-dashboard, performance]
+created_at: 2026-05-30T05:32:59Z
+updated_at: 2026-05-30T05:45:37Z
+assignee: null
+legacy_ticket: null
+---
+
+## Background
+
+The `--multi` dashboard frequently shows `[live unknown]` for reachable Pods. Current code probes each runtime-registry socket with a very short `LIVE_STATUS_PROBE_TIMEOUT` of 25ms in `crates/tui/src/pod_list.rs`. A live row becomes `status = None` when the socket connects but no `Event::Snapshot` / `Event::Status` is read before that deadline.
+
+That label is misleading: the Pod is reachable, but status probing timed out or did not receive a status event quickly enough. Raising the timeout alone risks making dashboard reload latency scale linearly with the number of live Pods, because status probes are currently performed sequentially.
+
+## Requirements
+
+- Increase the live status probe timeout to a more realistic value, likely in the 150ms–250ms range.
+- Run live status probes concurrently so reload latency does not become the sum of all per-Pod timeouts.
+- Keep reachable Pods with missing status as live/attachable; do not treat status timeout as unreachable.
+- Keep restoreability separate from live attachability; this ticket must not make runtime-only Pods restorable.
+- Replace or soften the `live unknown` label in `--multi` so it communicates reachable-live-with-unreported-status rather than broken state. Candidate labels: `live`, `live probing`, or similar.
+- Keep the implementation in shared `PodList` / live probe code where possible; avoid duplicating dashboard-specific discovery logic.
+- Preserve existing behavior for explicitly reported `Idle`, `Running`, and `Paused` statuses.
+
+## Non-goals
+
+- Do not redesign Pod notification or run completion delivery.
+- Do not persist last-known status in pod-store.
+- Do not change `AttachOrRestorePod` or restore semantics.
+- Do not make unreachable registry allocations appear attachable.
+
+## Acceptance criteria
+
+- Multiple live Pod status probes wait concurrently, not strictly one after another.
+- The per-Pod timeout is long enough to significantly reduce false `status = None` cases compared to 25ms.
+- A reachable Pod whose status probe times out remains displayed as live and openable/attachable.
+- The multi-Pod row label for `status = None` is less misleading than `live unknown`.
+- Tests cover concurrent probing behavior, timeout/none-status handling, and label rendering.
+- `cargo test -p tui pod_list`, `cargo test -p tui multi_pod`, `cargo test -p tui`, `cargo fmt --check`, and `./tickets.sh doctor` pass.
+
+
+---
diff --git a/work-items/open/20260530-053259-multi-pod-parallel-status-probes/thread.md b/work-items/open/20260530-053259-multi-pod-parallel-status-probes/thread.md
deleted file mode 100644
index 736ad865..00000000
--- a/work-items/open/20260530-053259-multi-pod-parallel-status-probes/thread.md
+++ /dev/null
@@ -1,63 +0,0 @@
-
-
-## Created
-
-Created by tickets.sh create.
-
----
-
-
-
-## Plan
-
-## Preflight
-
-Classification: implementation-ready.
-
-The issue is localized to TUI live status probing and dashboard labeling. The product decision is settled: reachable Pods with missing status remain live/attachable, while status probe timeout should be less frequent and less alarming. No restore semantics or Pod runtime protocol changes are required.
-
-Current code map:
-- `crates/tui/src/pod_list.rs`: `LIVE_STATUS_PROBE_TIMEOUT`, `load_live_pods`, `probe_live_status`, `LivePodInfo`, `PodList` merge/sort tests.
-- `crates/tui/src/multi_pod.rs`: row label rendering for live status, currently `live unknown` when `LivePodInfo.status` is `None`.
-
-Critical risks:
-- Do not make reload latency scale as N * timeout.
-- Do not treat `status = None` as unreachable.
-- Do not change restoreability/attachability semantics.
-- Avoid duplicating discovery logic outside shared `PodList`.
-
-
----
-
-
-
-## Review: approve
-
-Approve.
-
-The implementation addresses the ticket in the shared `PodList` live-probe path. Live status probes now run concurrently with a more realistic per-Pod timeout, reachable sockets without reported status remain live/attachable, and the multi-Pod label is softened from `live unknown` to `live` while explicit statuses remain unchanged.
-
-Blocker findings: none.
-
-Requirement coverage:
-- Per-Pod timeout increased to 200ms.
-- Probes are concurrent and bounded by the slowest bounded probe rather than `N * timeout`.
-- Reachable sockets with no status remain reachable/live/attachable.
-- Restoreability remains separate and unchanged.
-- No last-known status is persisted to pod-store.
-- Explicit `live idle`, `live running`, and `live paused` labels remain.
-- `status = None` renders as `live`.
-- The implementation changes shared `PodList` live-probe logic rather than duplicating dashboard-only logic.
-- Tests cover concurrency, timeout/no-status reachable handling, and label rendering.
-
-Validation reviewed:
-- `cargo fmt --check` — passed.
-- `cargo test -p tui pod_list` — passed.
-- `cargo test -p tui multi_pod` — passed.
-- `cargo test -p tui` — passed.
-- Reviewer additionally ran `./tickets.sh doctor` — passed.
-
-Final verdict: approve.
-
-
----
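The ticket and review describe the fix only in prose. The following is a minimal std-only Rust sketch of the same idea under stated assumptions: each probe is modeled as a blocking closure rather than a runtime-registry socket read, and `probe_all`, `LiveRow`, `Probe`, `PodStatus`, and `live_label` are hypothetical names, not the actual `load_live_pods` / `probe_live_status` code in `crates/tui/src/pod_list.rs`. The sketch starts every probe before waiting on any of them and gives all of them one shared deadline (200ms in `main`, matching the timeout the review reports). A Pod whose status misses the deadline stays a live row with `status = None`, which renders as `live`, while explicit statuses keep `live idle` / `live running` / `live paused`.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the statuses a Pod can report.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PodStatus {
    Idle,
    Running,
    Paused,
}

/// A reachable Pod row; `status` is `None` when no status arrived in time.
#[derive(Debug, Clone, PartialEq, Eq)]
struct LiveRow {
    name: String,
    status: Option<PodStatus>,
}

/// Hypothetical probe: blocks until the Pod reports a status.
type Probe = Box<dyn FnOnce() -> PodStatus + Send + 'static>;

/// Spawns every probe up front, then collects results against one shared
/// deadline, so the total wait is about `timeout` rather than N * `timeout`.
fn probe_all(pods: Vec<(String, Probe)>, timeout: Duration) -> Vec<LiveRow> {
    let pending: Vec<(String, mpsc::Receiver<PodStatus>)> = pods
        .into_iter()
        .map(|(name, probe)| {
            let (tx, rx) = mpsc::channel();
            thread::spawn(move || {
                // If the collector already gave up, this send fails and the
                // late result is simply dropped.
                let _ = tx.send(probe());
            });
            (name, rx)
        })
        .collect();

    let deadline = Instant::now() + timeout;
    pending
        .into_iter()
        .map(|(name, rx)| {
            let remaining = deadline.saturating_duration_since(Instant::now());
            // A timeout keeps the row live; only its status is unreported.
            let status = rx.recv_timeout(remaining).ok();
            LiveRow { name, status }
        })
        .collect()
}

/// Row label: explicit statuses keep their text, `None` renders as plain `live`.
fn live_label(status: Option<PodStatus>) -> &'static str {
    match status {
        Some(PodStatus::Idle) => "live idle",
        Some(PodStatus::Running) => "live running",
        Some(PodStatus::Paused) => "live paused",
        None => "live",
    }
}

fn main() {
    let idle: Probe = Box::new(|| PodStatus::Idle);
    let slow: Probe = Box::new(|| {
        // Stands in for a reachable Pod whose status arrives after the deadline.
        thread::sleep(Duration::from_secs(2));
        PodStatus::Running
    });
    let pods = vec![("pod-a".to_string(), idle), ("pod-b".to_string(), slow)];

    let started = Instant::now();
    let rows = probe_all(pods, Duration::from_millis(200));
    println!("probed {} pods in {:?}", rows.len(), started.elapsed());
    for row in &rows {
        println!("{} [{}]", row.name, live_label(row.status));
    }
}
```

Measuring every wait against a single deadline is what bounds reload latency by roughly one timeout instead of N of them. A probe that misses the deadline is left running on a detached thread and its late result is discarded; with many live Pods, an async runtime or a bounded worker pool may fit better than one OS thread per probe.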