yoi/work-items/open/20260529-205844-session-pod-state-boundary/item.md

id: 20260529-205844-session-pod-state-boundary
slug: session-pod-state-boundary
title: Clarify session log and Pod metadata persistence boundaries
status: open
kind: task
priority: P2
labels: session-store, pod, persistence, architecture
created_at: 2026-05-29T20:58:44Z
updated_at: 2026-05-29T20:58:44Z
assignee: null
legacy_ticket: null

Background

The current persistence design intentionally has two durable surfaces:

  • append-only session/segment logs, which are the authority for conversation/history state and segment lineage;
  • name-keyed Pod metadata, which is supposed to be a thin pointer layer for Pod-name attach/restore and spawned-child bookkeeping.

That boundary has become blurry. The session-store crate is named and documented primarily as session persistence, but it also owns Pod metadata types, the PodMetadataStore trait, validation of Pod names, and the filesystem layout {sessions_root}/pods/{pod_name}/metadata.json. In addition, Pod metadata currently stores spawned_children and resolved_manifest_snapshot, while session logs also store Pod scope snapshots as LogEntry::Extension entries. The risk is that session-log authority, Pod-state authority, and runtime mirrors drift apart or become hard to reason about.

Observed code points:

  • crates/session-store/src/lib.rs documents session persistence via append-only JSONL logs, but also exports pod_metadata types.
  • crates/session-store/src/fs_store.rs stores segment logs under {root}/{session_id}/{segment_id}.jsonl and Pod metadata under {root}/pods/{pod_name}/metadata.json in the same FsStore.
  • crates/session-store/src/pod_metadata.rs says metadata is a lightweight name-keyed pointer, but PodMetadata also includes spawned_children and resolved_manifest_snapshot.
  • crates/pod/src/pod.rs writes Pod metadata from run/restore/fork/compact paths (write_pod_metadata_active, write_pod_metadata_pending) and preserves existing spawned_children via a read-modify-write helper.
  • crates/pod/src/spawn/registry.rs treats durable spawned-child state as living in Pod metadata and runtime spawned_pods.json as a live mirror, while scope snapshots for resume live in the session log.
  • crates/tui/src/pod_list.rs reads {store_dir}/pods/*/metadata.json directly in some paths rather than using only the PodMetadataStore trait.
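
To make the shared layout concrete, here is a minimal sketch of the two path shapes quoted above. The helper names segment_log_path and pod_metadata_path are hypothetical and are not taken from fs_store.rs; identifiers are plain strings here, whereas the real crate presumably uses dedicated id types and validates Pod names.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: segment log location, following the layout
/// {root}/{session_id}/{segment_id}.jsonl described in this ticket.
fn segment_log_path(root: &Path, session_id: &str, segment_id: &str) -> PathBuf {
    root.join(session_id).join(format!("{segment_id}.jsonl"))
}

/// Hypothetical helper: Pod metadata location, following the layout
/// {root}/pods/{pod_name}/metadata.json described in this ticket.
fn pod_metadata_path(root: &Path, pod_name: &str) -> PathBuf {
    root.join("pods").join(pod_name).join("metadata.json")
}
```

Both surfaces hang off the same root, so the pods directory sits next to the per-session directories. Whether that sharing is deliberate or incidental is one of the things the audit should state.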

Goal

Audit and clarify the architectural boundary between session logs, Pod metadata/state, and runtime mirrors. If the current placement is acceptable, document the boundary precisely. If it is not, refactor toward clearer ownership without changing user-visible restore/attach semantics.

Desired boundary

The resulting design should make these responsibilities explicit:

  • Session log authority:
    • conversation history and system prompt replay;
    • segment lineage (forked_from, compacted_from);
    • request config / usage / metrics / memory extension records;
    • Pod runtime scope snapshots required to restore the same session without silently reclaiming delegated writes.
  • Pod metadata authority:
    • name-keyed active (SessionId, SegmentId) pointer;
    • resolved manifest snapshot needed for Pod-name restore when the source profile/manifest should not be re-evaluated;
    • spawned-child registry state, if retained here, with a documented reason why it is Pod state rather than session state.
  • Runtime mirrors:
    • sockets, lock-file allocations, and spawned_pods.json are live runtime views, not durable authority.
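
The desired boundary can also be written down as a small lookup, which may be useful as a checklist during the audit. This is only a sketch: Authority, StateKind, and authority_of are illustrative names that do not exist in the codebase, and the SpawnedChildren mapping reflects the provisional "if retained here" placement rather than a decision.

```rust
/// Which surface is the durable (or non-durable) authority for a piece of state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Authority {
    SessionLog,
    PodMetadata,
    RuntimeMirror,
}

/// Illustrative inventory of the state kinds listed in the desired boundary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StateKind {
    ConversationHistory,
    SegmentLineage,
    ExtensionRecords,
    ScopeSnapshot,
    ActivePointer,
    ResolvedManifestSnapshot,
    SpawnedChildren,
    Socket,
    LockFileAllocation,
    SpawnedPodsJson,
}

/// Map each state kind to the surface proposed as its authority.
pub fn authority_of(kind: StateKind) -> Authority {
    match kind {
        StateKind::ConversationHistory
        | StateKind::SegmentLineage
        | StateKind::ExtensionRecords
        | StateKind::ScopeSnapshot => Authority::SessionLog,
        // SpawnedChildren is provisional: the ticket leaves open whether it
        // stays in Pod metadata or moves elsewhere.
        StateKind::ActivePointer
        | StateKind::ResolvedManifestSnapshot
        | StateKind::SpawnedChildren => Authority::PodMetadata,
        StateKind::Socket
        | StateKind::LockFileAllocation
        | StateKind::SpawnedPodsJson => Authority::RuntimeMirror,
    }
}
```

Because the match is exhaustive, adding a new state kind forces an explicit authority decision at compile time.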

Acceptance criteria

  • Audit every public type/function in session-store related to Pod metadata and classify whether it is genuinely session-store responsibility or Pod-state responsibility.
  • Decide whether Pod metadata should remain inside session-store, move to a separate crate/module, or be renamed/split so the session-store boundary is less misleading.
  • Document the decision in code comments and/or crate/module docs.
  • If keeping Pod metadata in session-store, update docs so the crate is explicitly described as the durable store for both session logs and Pod name metadata, and explain why they share a root/backend.
  • If moving/splitting, introduce a clear API boundary so session log APIs do not need to expose Pod metadata concepts unnecessarily.
  • Remove or justify direct filesystem reads of pods/*/metadata.json outside the store abstraction, especially in TUI Pod list/discovery paths.
  • Clarify the authority of resolved_manifest_snapshot: whether it belongs in Pod metadata, session log, or another Pod-state record, and ensure restore paths follow the documented authority.
  • Clarify the authority of spawned_children: whether it belongs in Pod metadata, session log, or runtime registry, and ensure restore/prune/reclaim behavior follows the documented authority.
  • Ensure read-modify-write preservation of unrelated Pod metadata fields does not silently lose data when active-pointer updates and spawned-child updates interleave; either make the update semantics explicit or add a safer merge/update API.
  • Preserve the current durable behavior unless deliberately changed:
    • Pod-name restore resolves active metadata then restores the session log;
    • session restore uses session log state and scope snapshots;
    • runtime spawned_pods.json remains a mirror;
    • stopped or unreachable child Pod metadata is not deleted merely because its socket is gone.
  • Add focused tests for whichever boundary is chosen, including active pointer updates preserving spawned children / manifest snapshot, spawned-child updates preserving active pointer / manifest snapshot, and discovery/restore behavior when one surface exists without the other.
  • Update any relevant docs or workflow notes if the persistence model changes.
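
One possible shape for the "safer merge/update API" criterion is a closure-based update that reads, mutates, and writes a record inside a single critical section, so callers touch only the field they own. The sketch below is an assumption-laden illustration: InMemoryPodStore, update, set_active, and add_spawned_child are hypothetical names, the fields are simplified to strings, and an in-process Mutex stands in for whatever serialization the filesystem-backed store would actually need.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Simplified stand-in for the real PodMetadata; field types are assumptions.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct PodMetadata {
    pub active: Option<(String, String)>, // (session id, segment id)
    pub resolved_manifest_snapshot: Option<String>,
    pub spawned_children: Vec<String>,
}

/// Hypothetical in-memory store used only to illustrate update semantics.
#[derive(Default)]
pub struct InMemoryPodStore {
    pods: Mutex<HashMap<String, PodMetadata>>,
}

impl InMemoryPodStore {
    /// Apply `f` to the current record while holding the lock, so the
    /// read-modify-write cannot interleave with another update.
    pub fn update<F>(&self, pod_name: &str, f: F) -> PodMetadata
    where
        F: FnOnce(&mut PodMetadata),
    {
        let mut pods = self.pods.lock().expect("pod store lock poisoned");
        let entry = pods.entry(pod_name.to_string()).or_default();
        f(&mut *entry);
        entry.clone()
    }

    /// Change only the active pointer; other fields are left untouched.
    pub fn set_active(&self, pod_name: &str, session_id: &str, segment_id: &str) -> PodMetadata {
        self.update(pod_name, |m| {
            m.active = Some((session_id.to_string(), segment_id.to_string()));
        })
    }

    /// Record a spawned child once; other fields are left untouched.
    pub fn add_spawned_child(&self, pod_name: &str, child: &str) -> PodMetadata {
        self.update(pod_name, |m| {
            if !m.spawned_children.iter().any(|c| c == child) {
                m.spawned_children.push(child.to_string());
            }
        })
    }
}
```

The focused tests requested above would then reduce to calling set_active after add_spawned_child (and the reverse) and asserting that the untouched fields survive. A filesystem-backed version would additionally need cross-process exclusion, which this sketch does not attempt.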

Non-goals

  • Do not redesign the session-log schema unless the audit proves it is necessary.
  • Do not add backward compatibility for obsolete persistence layouts unless explicitly required by a chosen migration plan.
  • Do not change live Pod registry lock semantics except where necessary to align with the clarified durable authority.
  • Do not implement broader database storage or transactional storage in this ticket; if the boundary audit reveals a need for transactions, record it as a follow-up unless a minimal update API suffices.