| id | slug | title | status | kind | priority | labels | created_at | updated_at | assignee | legacy_ticket |
|---|---|---|---|---|---|---|---|---|---|---|
| 20260529-205844-session-pod-state-boundary | session-pod-state-boundary | Clarify session log and Pod metadata persistence boundaries | open | task | P2 | | 2026-05-29T20:58:44Z | 2026-05-29T20:58:44Z | null | null |
Background
The current persistence design intentionally has two durable surfaces:
- append-only session/segment logs, which are the authority for conversation/history state and segment lineage;
- name-keyed Pod metadata, which is supposed to be a thin pointer layer for Pod-name attach/restore and spawned-child bookkeeping.
That boundary has become blurry. The session-store crate is named and documented primarily as session persistence, but it also owns Pod metadata types, the PodMetadataStore trait, validation of Pod names, and the filesystem layout {sessions_root}/pods/{pod_name}/metadata.json. In addition, Pod metadata currently stores spawned_children and resolved_manifest_snapshot, while session logs also store Pod scope snapshots as LogEntry::Extension entries. This creates a risk that session-log authority, Pod-state authority, and runtime mirrors drift or become hard to reason about.
Observed code points:
- `crates/session-store/src/lib.rs` documents session persistence via append-only JSONL logs, but also exports `pod_metadata` types.
- `crates/session-store/src/fs_store.rs` stores segment logs under `{root}/{session_id}/{segment_id}.jsonl` and Pod metadata under `{root}/pods/{pod_name}/metadata.json` in the same `FsStore`.
- `crates/session-store/src/pod_metadata.rs` says metadata is a lightweight name-keyed pointer, but `PodMetadata` also includes `spawned_children` and `resolved_manifest_snapshot`.
- `crates/pod/src/pod.rs` writes Pod metadata from run/restore/fork/compact paths (`write_pod_metadata_active`, `write_pod_metadata_pending`) and preserves existing `spawned_children` via a read-modify-write helper.
- `crates/pod/src/spawn/registry.rs` treats durable spawned-child state as living in Pod metadata and runtime `spawned_pods.json` as a live mirror, while scope snapshots for resume live in the session log.
- `crates/tui/src/pod_list.rs` reads `{store_dir}/pods/*/metadata.json` directly in some paths rather than using only the `PodMetadataStore` trait.
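To make the shared-root layout concrete, here is a minimal sketch of the two path shapes described above. The function names `segment_log_path` and `pod_metadata_path` are hypothetical and are not taken from `fs_store.rs`; IDs and Pod names are shown as plain strings for illustration only.

```rust
use std::path::{Path, PathBuf};

/// Segment log location: `{root}/{session_id}/{segment_id}.jsonl`.
/// Hypothetical helper; the real `FsStore` may build this path differently.
fn segment_log_path(root: &Path, session_id: &str, segment_id: &str) -> PathBuf {
    root.join(session_id).join(format!("{segment_id}.jsonl"))
}

/// Pod metadata location: `{root}/pods/{pod_name}/metadata.json`.
/// Hypothetical helper; Pod-name validation is assumed to happen before this call.
fn pod_metadata_path(root: &Path, pod_name: &str) -> PathBuf {
    root.join("pods").join(pod_name).join("metadata.json")
}
```

Keeping both helpers behind the store abstraction is one way to avoid callers such as the TUI reconstructing `pods/*/metadata.json` paths on their own.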
Goal
Audit and clarify the architectural boundary between session logs, Pod metadata/state, and runtime mirrors. If the current placement is acceptable, document the boundary precisely. If it is not, refactor toward clearer ownership without changing user-visible restore/attach semantics.
Desired boundary
The resulting design should make these responsibilities explicit:
- Session log authority:
  - conversation history and system prompt replay;
  - segment lineage (`forked_from`, `compacted_from`);
  - request config / usage / metrics / memory extension records;
  - Pod runtime scope snapshots required to restore the same session without silently reclaiming delegated writes.
- Pod metadata authority:
  - name-keyed active `(SessionId, SegmentId)` pointer;
  - resolved manifest snapshot needed for Pod-name restore when the source profile/manifest should not be re-evaluated;
  - spawned-child registry state, if retained here, with a documented reason why it is Pod state rather than session state.
- Runtime mirrors:
  - sockets, lock-file allocations, and `spawned_pods.json` are live runtime views, not durable authority.
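One way to read this boundary is that Pod metadata only resolves a name to a pointer, and everything about history comes from the session log. The sketch below illustrates that with in-memory stand-ins. `ActivePointer`, `Stores`, `RestoreError`, and `restore_by_pod_name` are hypothetical names for illustration, not the existing crate API, and log entries are reduced to plain strings.

```rust
use std::collections::HashMap;

type SessionId = String;
type SegmentId = String;

/// Hypothetical thin Pod metadata record; only the active pointer is modeled.
#[derive(Clone, Debug, PartialEq)]
struct ActivePointer {
    session_id: SessionId,
    segment_id: SegmentId,
}

#[derive(Debug, PartialEq)]
enum RestoreError {
    /// No Pod metadata exists for this name.
    UnknownPod,
    /// Metadata exists, but the segment log it points at is missing.
    DanglingPointer(ActivePointer),
}

/// In-memory stand-ins for the two durable surfaces.
struct Stores {
    pod_metadata: HashMap<String, ActivePointer>,
    segment_logs: HashMap<(SessionId, SegmentId), Vec<String>>,
}

impl Stores {
    /// Pod-name restore: resolve the pointer first, then read history
    /// only from the session log.
    fn restore_by_pod_name(&self, pod_name: &str) -> Result<&[String], RestoreError> {
        let ptr = self
            .pod_metadata
            .get(pod_name)
            .ok_or(RestoreError::UnknownPod)?;
        let key = (ptr.session_id.clone(), ptr.segment_id.clone());
        self.segment_logs
            .get(&key)
            .map(|entries| entries.as_slice())
            .ok_or_else(|| RestoreError::DanglingPointer(ptr.clone()))
    }
}
```

Distinguishing `UnknownPod` from `DanglingPointer` makes the case where one surface exists without the other an explicit outcome rather than a generic I/O failure.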
Acceptance criteria
- Audit every public type/function in `session-store` related to Pod metadata and classify whether it is genuinely session-store responsibility or Pod-state responsibility.
- Decide whether Pod metadata should remain inside `session-store`, move to a separate crate/module, or be renamed/split so the session-store boundary is less misleading.
- Document the decision in code comments and/or crate/module docs.
- If keeping Pod metadata in `session-store`, update docs so the crate is explicitly described as the durable store for both session logs and Pod name metadata, and explain why they share a root/backend.
- If moving/splitting, introduce a clear API boundary so session log APIs do not need to expose Pod metadata concepts unnecessarily.
- Remove or justify direct filesystem reads of `pods/*/metadata.json` outside the store abstraction, especially in TUI Pod list/discovery paths.
- Clarify the authority of `resolved_manifest_snapshot`: whether it belongs in Pod metadata, session log, or another Pod-state record, and ensure restore paths follow the documented authority.
- Clarify the authority of `spawned_children`: whether it belongs in Pod metadata, session log, or runtime registry, and ensure restore/prune/reclaim behavior follows the documented authority.
- Ensure read-modify-write preservation of unrelated Pod metadata fields does not silently lose data when active pointer updates and spawned-child updates occur near each other; either make the update semantics explicit or add a safer merge/update API.
- Preserve the current durable behavior unless deliberately changed:
  - Pod-name restore resolves active metadata then restores the session log;
  - session restore uses session log state and scope snapshots;
  - runtime `spawned_pods.json` remains a mirror;
  - stopped or unreachable child Pod metadata is not deleted merely because its socket is gone.
- Add focused tests for whichever boundary is chosen, including active pointer updates preserving spawned children / manifest snapshot, spawned-child updates preserving active pointer / manifest snapshot, and discovery/restore behavior when one surface exists without the other.
- Update any relevant docs or workflow notes if the persistence model changes.
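As a possible shape for the "safer merge/update API" criterion, the sketch below applies field-scoped updates under a single lock, so an active-pointer write cannot overwrite `spawned_children` or the manifest snapshot with a stale copy. Everything here is an assumption for illustration: `PodMetadataUpdate`, `InMemoryPodMetadataStore`, and the field types are hypothetical, and the `Mutex` stands in for whatever serialization the real filesystem store would need.

```rust
use std::collections::{BTreeSet, HashMap};
use std::sync::Mutex;

/// Hypothetical record; field names follow the ticket, types are placeholders.
#[derive(Clone, Debug, Default, PartialEq)]
struct PodMetadata {
    active: Option<(String, String)>, // (SessionId, SegmentId)
    resolved_manifest_snapshot: Option<String>,
    spawned_children: BTreeSet<String>,
}

/// Each variant touches exactly one field of the record.
enum PodMetadataUpdate {
    SetActive { session_id: String, segment_id: String },
    SetResolvedManifestSnapshot(String),
    AddSpawnedChild(String),
    RemoveSpawnedChild(String),
}

impl PodMetadata {
    /// Apply one field-scoped update, leaving all other fields untouched.
    fn apply(&mut self, update: PodMetadataUpdate) {
        match update {
            PodMetadataUpdate::SetActive { session_id, segment_id } => {
                self.active = Some((session_id, segment_id));
            }
            PodMetadataUpdate::SetResolvedManifestSnapshot(snapshot) => {
                self.resolved_manifest_snapshot = Some(snapshot);
            }
            PodMetadataUpdate::AddSpawnedChild(name) => {
                self.spawned_children.insert(name);
            }
            PodMetadataUpdate::RemoveSpawnedChild(name) => {
                self.spawned_children.remove(&name);
            }
        }
    }
}

/// In-memory stand-in for a store; the lock makes read-modify-write one step.
#[derive(Default)]
struct InMemoryPodMetadataStore {
    records: Mutex<HashMap<String, PodMetadata>>,
}

impl InMemoryPodMetadataStore {
    /// Read, modify, and write back under one lock; returns the new record.
    fn update(&self, pod_name: &str, update: PodMetadataUpdate) -> PodMetadata {
        let mut records = self.records.lock().unwrap();
        let record = records.entry(pod_name.to_string()).or_default();
        record.apply(update);
        record.clone()
    }
}
```

Because callers never hand a whole `PodMetadata` back to the store, the focused tests listed above reduce to checking that each update variant leaves the other fields unchanged.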
Non-goals
- Do not redesign the session-log schema unless the audit proves it is necessary.
- Do not add backward compatibility for obsolete persistence layouts unless explicitly required by a chosen migration plan.
- Do not change live Pod registry lock semantics except where necessary to align with the clarified durable authority.
- Do not implement broader database storage or transactional storage in this ticket; if the boundary audit reveals a need for transactions, record it as a follow-up unless a minimal update API suffices.