From 84819e2bf77c362ad515762e0a1905f3e231e160 Mon Sep 17 00:00:00 2001
From: Hare
Date: Sun, 31 May 2026 11:14:38 +0900
Subject: [PATCH] review: workspace memory lint cli

---
 ...05-31-child-pod-visibility-restore-loss.md | 65 +++++++++++++++++++
 .../item.md                                   |  2 +-
 .../thread.md                                 | 58 +++++++++++++++++
 3 files changed, 124 insertions(+), 1 deletion(-)
 create mode 100644 docs/report/2026-05-31-child-pod-visibility-restore-loss.md

diff --git a/docs/report/2026-05-31-child-pod-visibility-restore-loss.md b/docs/report/2026-05-31-child-pod-visibility-restore-loss.md
new file mode 100644
index 00000000..4d8d4b0b
--- /dev/null
+++ b/docs/report/2026-05-31-child-pod-visibility-restore-loss.md
@@ -0,0 +1,65 @@
+# Child Pod visibility/restore loss during review flow
+
+Date: 2026-05-31
+
+## Summary
+
+During the `workspace-memory-lint-cli` review flow, a spawned reviewer Pod appeared to stop producing notifications/output and then became impossible to attach/restore from the parent Pod. The parent later saw no spawned Pods at all, while a restore/prune notification reported that missing or unreachable delegated child Pods had been reclaimed.
+
+This looks like a control-plane visibility/restore issue rather than an implementation-review issue. The lost Pod was read-only and the review was safely re-run in a new reviewer Pod, but the incident is worth recording because it undermines long-running multi-agent workflows.
+
+## Observed sequence
+
+1. `workspace-memory-lint-coder-20260531` completed implementation and reported commit `7a717f2 cli: add workspace memory lint`.
+2. A read-only reviewer Pod was spawned:
+   - `workspace-memory-lint-reviewer-20260531`
+   - read scope: main workspace and `.worktree/workspace-memory-lint-cli`
+3. Repeated `ReadPodOutput` calls returned:
+   - `running; no new assistant text`
+4. `InspectPod` still saw the reviewer as live/reachable/running at one point:
+   - socket: `/run/user/1000/insomnia/workspace-memory-lint-reviewer-20260531/sock`
+   - restore impossible only because the segment was locked by that live Pod
+5. Later, after the user asked to restore it, `AttachOrRestorePod` failed:
+   - `pod workspace-memory-lint-reviewer-20260531 is not visible to this Pod`
+6. `ListPods` then reported no spawned Pods, and `ListVisiblePods` only showed the self Pod `insomnia`.
+7. A notification appeared:
+   - `Restored Pod state contained missing or unreachable delegated child Pods; their delegated write scopes were reclaimed before resume.`
+8. The review had to be re-run by spawning a new read-only reviewer:
+   - `workspace-memory-lint-reviewer-rerun-20260531`
+
+## Impact
+
+- Parent-side orchestration lost track of a child reviewer Pod that had previously been visible.
+- The parent could not attach/restore by name because the child was no longer visible to the parent Pod.
+- Any review result already produced by the lost child would have been hard to recover through normal parent tools.
+- Multi-agent workflows that rely on long-running reviewer/coder Pods become less reliable if spawned-child visibility can disappear during parent resume/restore/prune.
+
+In this instance the practical impact was low because the reviewer had read-only scope and the review could be re-run. The incident would be more serious for implementation Pods with unmerged write-scope work or for expensive/long review tasks.
+
+## Why this matters
+
+The current design intent is that Pod metadata is durable current state and spawned child registry persistence reuses Pod metadata. Parent-side tools should be able to inspect/attach/restore visible spawned children where durable state still records them, and pruning should be conservative enough not to erase reachable or recoverable child work prematurely.
+
+This incident suggests at least one of these paths needs inspection:
+
+- parent spawned-child registry persistence/restoration;
+- pruning of unreachable children during parent restore;
+- visibility rules for previously spawned child Pods after parent resume;
+- distinction between live socket reachability, durable pod-store metadata, and parent-visible child registry;
+- notification/read-output cursor behavior when a child is still running but no output arrives.
+
+## Notes for follow-up
+
+- The failure mode was not simply “child stopped”; the parent tool reported “not visible to this Pod,” which is different from stopped/unreachable.
+- `InspectPod` had previously seen the child as live and locked; later `ListPods` returned no spawned Pods.
+- The prune/reclaim notification may have happened after parent restore and may have removed child visibility state.
+- A useful regression test would simulate parent restore with a child that is pending/running/unreachable at different phases and assert whether it remains visible, attachable, or intentionally pruned with a recoverable diagnostic.
+- A workflow-level mitigation is to write important reviewer/coder outputs into ticket threads/artifacts promptly after reading them, and to re-run read-only reviewers if child visibility is lost.
+
+## Current workaround
+
+For `workspace-memory-lint-cli`, a replacement reviewer Pod was spawned with the same read-only task:
+
+- `workspace-memory-lint-reviewer-rerun-20260531`
+
+The original reviewer Pod was treated as lost/unrecoverable from the parent after `AttachOrRestorePod` reported it was not visible.
diff --git a/work-items/open/20260527-000019-workspace-memory-lint-cli/item.md b/work-items/open/20260527-000019-workspace-memory-lint-cli/item.md
index 0276a564..22accb64 100644
--- a/work-items/open/20260527-000019-workspace-memory-lint-cli/item.md
+++ b/work-items/open/20260527-000019-workspace-memory-lint-cli/item.md
@@ -7,7 +7,7 @@ kind: task
 priority: P2
 labels: [migrated, memory, cli]
 created_at: 2026-05-27T00:00:19Z
-updated_at: 2026-05-31T00:51:55Z
+updated_at: 2026-05-31T02:14:28Z
 assignee: null
 legacy_ticket: null
 ---
diff --git a/work-items/open/20260527-000019-workspace-memory-lint-cli/thread.md b/work-items/open/20260527-000019-workspace-memory-lint-cli/thread.md
index d9a46c0c..36937ccc 100644
--- a/work-items/open/20260527-000019-workspace-memory-lint-cli/thread.md
+++ b/work-items/open/20260527-000019-workspace-memory-lint-cli/thread.md
@@ -19,4 +19,62 @@ Planning note:
 - Keep crate rename / single-binary architecture as future cleanup, not part of this ticket.
 
 
+---
+
+
+
+## Implementation report
+
+Implementation report from coder Pod `workspace-memory-lint-coder-20260531`:
+
+- Branch: `workspace-memory-lint-cli`
+- Commit: `7a717f2d259563df562913e0c3ceb388b094b697` (`cli: add workspace memory lint`)
+- Added `insomnia memory lint [--workspace ] [--json] [--warnings-as-errors]` as a headless mode in the existing `tui` crate/user-facing `insomnia` binary.
+- `insomnia memory` alone remains a positional Pod name.
+- The lint command resolves workspace root, collects existing summary/decisions/requests/knowledge records through `memory::WorkspaceLayout`, and lints with existing `memory::Linter` using `WriteMode::Update`.
+- The command prints deterministic human output by default and stable JSON with workspace/files/errors/warnings/counts when `--json` is requested.
+- Exit codes follow the ticket: 0 clean, 1 lint failures or warnings-as-errors, 2 usage/I/O/output/runtime failures.
+- The headless path returns before raw terminal setup or Pod connection/spawn logic.
+
+Validation reported by coder:
+
+- `cargo fmt --check` passed
+- `cargo test -p tui memory_lint -- --nocapture` passed
+- `cargo test -p tui` passed
+- `cargo check -p tui` passed
+- `./tickets.sh doctor` passed
+- `git diff --check` passed
+
+Unresolved issues: none.
+
+
+---
+
+
+
+## Review: approve
+
+External review by reviewer Pod `workspace-memory-lint-reviewer-rerun-20260531`: approve.
+
+The original reviewer Pod `workspace-memory-lint-reviewer-20260531` became non-visible to the parent before output could be recovered; this review was rerun with a replacement read-only reviewer Pod.
+
+Reviewer summary:
+
+- The implementation adds `insomnia memory lint` as a headless mode in the existing user-facing `insomnia` binary.
+- The memory lint path branches before raw terminal setup and Pod connection/spawn logic.
+- Parser tests preserve `insomnia memory` as positional Pod name behavior.
+- The collector targets summary, decisions, requests, and knowledge records while ignoring opaque memory subsystem directories and workflow files.
+- Existing `memory::Linter` and `WriteMode::Update` are used, and the code only reads files / writes reports.
+- Human and JSON outputs are deterministic enough for the ticket, and exit code mapping matches requirements.
+
+Blockers: none.
+
+Non-blocking follow-ups:
+
+- Add broader fixture coverage for `_staging`, `_usage`, knowledge, and decisions if desired.
+- Add process-level exit-code integration tests if a CLI test harness is introduced later.
+
+Validation adequacy: coder-reported validation is sufficient for this ticket. Reviewer additionally checked `git diff --check develop...HEAD` read-only.
+
+
 ---