docs: memory system spec changes, plus notes on dynamic Tools and VCS

bash tool: done for now
bash tool: implementation
2026-05-01 18:47:52 +09:00 · 2026-05-01 18:47:09 +09:00 · 2026-05-01 18:14:13 +09:00
23 changed files with 1339 additions and 181 deletions
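
A minimal sketch of the inline-versus-tail rule that the new `crates/tools/src/bash.rs` describes (at most 80 lines and 12 KiB inline, otherwise only the last 80 lines). It assumes the whole output is already in memory, whereas the commit works on a file and reads at most 256 KiB from its end. `Rendered`, `render`, and `last_lines` are hypothetical names used only for illustration.

```rust
/// Thresholds mirroring the constants in this commit's `bash.rs`.
const TAIL_LINES: usize = 80;
const INLINE_BYTE_BUDGET: usize = 12 * 1024;

/// Hypothetical result type for this sketch (not part of the commit).
#[derive(Debug, PartialEq)]
enum Rendered {
    /// The whole output is returned unchanged.
    Inline(String),
    /// Only the trailing lines are returned; the full output stays on disk.
    Tail(String),
}

/// Return the last `n` lines of `text`, keeping the original line endings.
fn last_lines(text: &str, n: usize) -> &str {
    if n == 0 {
        return "";
    }
    let total = text.lines().count();
    if total <= n {
        return text;
    }
    // Byte offset just past the newline that ends the last skipped line.
    let skip = total - n;
    let cut = text
        .match_indices('\n')
        .nth(skip - 1)
        .map(|(i, _)| i + 1)
        .unwrap_or(0);
    &text[cut..]
}

/// Decide between inline return and tail-only return.
fn render(output: &str) -> Rendered {
    let fits = output.len() <= INLINE_BYTE_BUDGET && output.lines().count() <= TAIL_LINES;
    if fits {
        Rendered::Inline(output.to_owned())
    } else {
        Rendered::Tail(last_lines(output, TAIL_LINES).to_owned())
    }
}
```

Both limits have to hold for the inline path, so a single very wide line still takes the tail path even though it is far below the line limit.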
--- a/.insomnia/.gitignore
+++ b/.insomnia/.gitignore
@@ -1 +1 @@
-_memory
+_staging
--- a/TODO.md
+++ b/TODO.md
@@ -17,7 +17,6 @@
  - [ ] TUI completion + typed atoms → [tickets/submit-tui-completion.md](tickets/submit-tui-completion.md)
  - [ ] FileRef resolver → [tickets/submit-file-ref-resolver.md](tickets/submit-file-ref-resolver.md)
 - [ ] Memory subsystem
-  - [ ] Phase 2 consolidation → [tickets/memory-phase2-consolidation.md](tickets/memory-phase2-consolidation.md)
+  - [ ] Phase 2 consolidation + cleanup → [tickets/memory-phase2-consolidation.md](tickets/memory-phase2-consolidation.md)
  - [ ] Usage-frequency metrics + Knowledge-candidate report → [tickets/memory-usage-metrics.md](tickets/memory-usage-metrics.md)
-  - [ ] GC (periodic re-evaluation) → [tickets/memory-gc.md](tickets/memory-gc.md)
 - Headless CLI that lints workspace memory
--- a/crates/manifest/src/scope.rs
+++ b/crates/manifest/src/scope.rs
@@ -156,6 +156,23 @@ impl Scope {
            .collect()
    }

+    /// Deny rules with their targets resolved to absolute paths.
+    ///
+    /// Counterpart to [`allow_rules`](Self::allow_rules); together they
+    /// round-trip through [`ScopeConfig`] for callers that need to
+    /// rebuild a scope after layering extra rules on top of an
+    /// already-constructed [`Scope`].
+    pub fn deny_rules(&self) -> Vec<ScopeRule> {
+        self.deny
+            .iter()
+            .map(|r| ScopeRule {
+                target: r.target.clone(),
+                permission: r.permission,
+                recursive: r.recursive,
+            })
+            .collect()
+    }
+
    /// Iterate over absolute paths granted `Write` by an allow rule.
    /// Subset of [`readable_paths`](Self::readable_paths).
    pub fn writable_paths(&self) -> impl Iterator<Item = &Path> {
--- a/crates/pod/src/controller.rs
+++ b/crates/pod/src/controller.rs
@@ -221,20 +221,49 @@ impl PodController {
            });

            // Register the builtin file-manipulation tools (Read / Write /
-            // Edit / Glob / Grep). `ScopedFs` carries the pod-lifetime
-            // scope/pwd; `Tracker` is session-scoped — a fresh instance per
-            // controller spawn ensures state from a previous process
-            // lifetime cannot be reused after a resume. The tracker is
-            // also handed to the Pod itself so Pod-level operations (e.g.
+            // Edit / Glob / Grep / Bash). `ScopedFs` carries the pod-
+            // lifetime scope/pwd; `Tracker` is session-scoped — a fresh
+            // instance per controller spawn ensures state from a previous
+            // process lifetime cannot be reused after a resume. The tracker
+            // is also handed to the Pod itself so Pod-level operations (e.g.
            // context compaction) can ask which files the agent has been
            // touching.
-            let fs = tools::ScopedFs::new(scope_for_tools, pwd_for_tools.clone());
+            //
+            // Bash spills long outputs to a per-pod subdir under the
+            // runtime dir. We layer a recursive `allow(Read)` rule for
+            // that path on top of the user-facing scope so the agent can
+            // `Read` the saved files without polluting the workspace.
+            // Same approach memory takes for its deny rules: round-trip
+            // through `ScopeConfig` and rebuild via `from_config`.
+            let bash_output_dir = runtime_dir.path().join("bash-output");
+            std::fs::create_dir_all(&bash_output_dir).map_err(|e| {
+                std::io::Error::other(format!(
+                    "create bash output dir {}: {e}",
+                    bash_output_dir.display()
+                ))
+            })?;
+            let mut scope_config = manifest::ScopeConfig {
+                allow: scope_for_tools.allow_rules(),
+                deny: scope_for_tools.deny_rules(),
+            };
+            scope_config.allow.push(manifest::ScopeRule {
+                target: bash_output_dir.clone(),
+                permission: manifest::Permission::Read,
+                recursive: true,
+            });
+            let scope_with_bash = manifest::Scope::from_config(&scope_config)
+                .map_err(std::io::Error::other)?;
+            let fs = tools::ScopedFs::new(scope_with_bash, pwd_for_tools.clone());
            let tracker = tools::Tracker::new();
            // The same ScopedFs also powers the IPC `ListCompletions`
            // query — keep a clone for the FS view we attach below,
            // since the tools consume `fs` itself.
            fs_for_view = fs.clone();
-            worker.register_tools(tools::builtin_tools(fs, tracker.clone()));
+            worker.register_tools(tools::builtin_tools(
+                fs,
+                tracker.clone(),
+                bash_output_dir,
+            ));

            // Memory subsystem opt-in. When `[memory]` is present in
            // the manifest, register the memory-specific Read/Write/Edit
--- a/crates/tools/Cargo.toml
+++ b/crates/tools/Cargo.toml
@@ -19,7 +19,7 @@ serde_json = "1.0.149"
 sha2 = "0.11.0"
 tempfile = "3.27.0"
 thiserror = "2.0.18"
-tokio = { version = "1.51.1", features = ["rt"] }
+tokio = { version = "1.51.1", features = ["process", "rt", "time"] }
 tracing = "0.1.44"

 [dev-dependencies]
--- a/crates/tools/src/bash.rs
+++ b/crates/tools/src/bash.rs
@@ -0,0 +1,581 @@
+//! `Bash` tool — execute shell commands in a one-shot, stateless way.
+//!
+//! Each call runs `bash -c <command>` via [`tokio::process::Command`].
+//! The wrapper redirects all output to a file so we never have to read
+//! from a pipe (which would expose us to bg-pipe hangs). There is no
+//! shell session: every call starts fresh at `cwd`, so the agent must
+//! chain `cd <dir> && cmd` when it wants to operate elsewhere. This
+//! mirrors Claude Code's own Bash tool — predictable, no hidden state.
+//!
+//! Output handling: when output is short (≤ 80 lines, ≤ 12 KiB) it is
+//! returned inline and the file is cleaned up. When it is longer the
+//! full output is left on disk and only the **last 80 lines** are
+//! returned, prefixed with the saved file's path. This sidesteps the
+//! Worker's blanket `ToolOutputLimits` (default 16 KiB), which would
+//! otherwise drop the *tail* of the output — usually the most useful
+//! part (errors, exit messages, summary). The saved file lives under
+//! a caller-supplied directory that the parent has added to the
+//! `ScopedFs` allow set, so the agent can inspect it via either Read
+//! or a follow-up Bash call.
+//!
+//! Filesystem and network access are NOT mediated by `ScopedFs`: the
+//! child process can touch any path. Safety is delegated to the
+//! Permission layer (deny/allow rules on the command string).
+
+use std::path::{Path, PathBuf};
+use std::process::Stdio;
+use std::sync::Arc;
+use std::time::Duration;
+
+use async_trait::async_trait;
+use llm_worker::tool::{Tool, ToolDefinition, ToolError, ToolMeta, ToolOutput};
+use serde::Deserialize;
+use tokio::process::Command;
+
+use crate::scoped_fs::ScopedFs;
+
+const DESCRIPTION: &str = "Execute a shell command via bash. Supports the \
+full shell — pipes, redirects, command substitution, `&&`/`||`. Each call \
+runs in a fresh shell rooted at the workspace; chain `cd <subdir> && cmd` \
+when you need to operate elsewhere. stdout and stderr are merged. Default \
+timeout 120s, max 600s.\n\n\
+Output handling: when the command produces more than 80 lines (or ~12 KiB), \
+the full output is saved to a file and only the LAST 80 lines are returned, \
+prefixed with the saved path. The path is readable by Read; you can also \
+inspect it from a follow-up Bash call (`grep ... <path>`, etc.).\n\n\
+Prefer dedicated tools when one fits: Read instead of `cat`/`head`/`tail` \
+on workspace files, Edit instead of `sed`/`awk` rewrites, Glob instead of \
+`find <name>`, Grep instead of `grep`/`rg`. Reach for Bash when the task \
+is shell-shaped: building, testing, version control, package management.";
+
+const DEFAULT_TIMEOUT_SECS: u64 = 120;
+const MAX_TIMEOUT_SECS: u64 = 600;
+
+/// Number of trailing lines returned when output spills to a file.
+const TAIL_LINES: usize = 80;
+
+/// Inline-return budget. Outputs at or below this are returned in full;
+/// above it triggers the spill-to-file path. Sized to leave headroom under
+/// the Worker's 16 KiB default `ToolOutputLimits` cap so the inline path
+/// reliably reaches the model intact.
+const INLINE_BYTE_BUDGET: usize = 12 * 1024;
+
+/// Maximum bytes loaded into memory from the spilled output file. The
+/// file itself can be arbitrarily large; we only ever read the tail end
+/// since that is what we return.
+const TAIL_READ_BUDGET: usize = 256 * 1024;
+
+#[derive(Debug, Deserialize, schemars::JsonSchema)]
+pub(crate) struct BashParams {
+    /// Shell command to execute. Passed verbatim to `bash -c`.
+    pub command: String,
+    /// Timeout in seconds. Defaults to 120, capped at 600.
+    #[serde(default)]
+    pub timeout: Option<u64>,
+}
+
+pub(crate) struct BashTool {
+    /// Workspace root that every invocation starts in. Snapshot of
+    /// `ScopedFs::pwd()` at registration time; never mutated, since we
+    /// don't track `cd` across calls.
+    cwd: PathBuf,
+    /// Directory to spill long outputs into. Caller is expected to have
+    /// added this path to the readable scope so the agent can Read the
+    /// saved files. The directory itself is created lazily.
+    output_dir: PathBuf,
+    /// Files we left on disk for follow-up inspection. Cleaned up on
+    /// `Drop` (= session end). `std::sync::Mutex` because access is
+    /// always synchronous and very brief.
+    spilled_outputs: std::sync::Mutex<Vec<PathBuf>>,
+}
+
+impl Drop for BashTool {
+    fn drop(&mut self) {
+        if let Ok(mut paths) = self.spilled_outputs.lock() {
+            for p in paths.drain(..) {
+                let _ = std::fs::remove_file(&p);
+            }
+        }
+    }
+}
+
+#[async_trait]
+impl Tool for BashTool {
+    async fn execute(&self, input_json: &str) -> Result<ToolOutput, ToolError> {
+        let params: BashParams = serde_json::from_str(input_json)
+            .map_err(|e| ToolError::InvalidArgument(format!("invalid Bash input: {e}")))?;
+        let timeout_secs = params
+            .timeout
+            .unwrap_or(DEFAULT_TIMEOUT_SECS)
+            .clamp(1, MAX_TIMEOUT_SECS);
+
+        // Persistent output file in the caller-supplied directory.
+        // `keep()` opts out of auto-delete so the agent can inspect the
+        // full output later; cleanup is deferred to `Drop` on this tool.
+        std::fs::create_dir_all(&self.output_dir).map_err(|e| {
+            ToolError::Internal(format!(
+                "create bash output dir {}: {e}",
+                self.output_dir.display()
+            ))
+        })?;
+        let output_path: PathBuf = tempfile::Builder::new()
+            .prefix("bash-")
+            .suffix(".log")
+            .tempfile_in(&self.output_dir)
+            .map_err(|e| ToolError::Internal(format!("output tempfile: {e}")))?
+            .into_temp_path()
+            .keep()
+            .map_err(|e| ToolError::Internal(format!("persist output tempfile: {e}")))?;
+
+        let output_path_str = output_path
+            .to_str()
+            .ok_or_else(|| ToolError::Internal("output path is not UTF-8".into()))?;
+
+        // Wrapper:
+        //   exec >file 2>&1       redirect stdout/stderr to the output file
+        //   { user_cmd }          run in a brace group (no subshell), so
+        //                         background jobs stay children of this shell
+        //   __insomnia_exit=$?    preserve the user command's exit code,
+        //   wait 2>/dev/null      since `wait` clobbers $?. Reaping bg jobs
+        //                         guarantees the output file's writers all
+        //                         close before bash itself exits.
+        //   exit $__insomnia_exit propagate the user's exit code
+        let wrapped = format!(
+            "exec >{out} 2>&1\n{{ {user_cmd}\n}}\n__insomnia_exit=$?\nwait 2>/dev/null\nexit $__insomnia_exit\n",
+            out = shell_single_quote(output_path_str),
+            user_cmd = params.command,
+        );
+
+        tracing::debug!(cmd = %params.command, cwd = %self.cwd.display(), timeout_secs, "Bash");
+
+        let mut child = Command::new("bash")
+            .arg("-c")
+            .arg(&wrapped)
+            .current_dir(&self.cwd)
+            .stdin(Stdio::null())
+            .stdout(Stdio::null()) // discarded; the wrapper's `exec` redirects output to the file
+            .stderr(Stdio::null())
+            .kill_on_drop(true)
+            .spawn()
+            .map_err(|e| {
+                let _ = std::fs::remove_file(&output_path);
+                ToolError::ExecutionFailed(format!("spawn bash: {e}"))
+            })?;
+
+        let timeout_dur = Duration::from_secs(timeout_secs);
+        let wait_result = tokio::time::timeout(timeout_dur, child.wait()).await;
+        let (status, timed_out) = match wait_result {
+            Ok(Ok(s)) => (Some(s), false),
+            Ok(Err(e)) => {
+                let _ = std::fs::remove_file(&output_path);
+                return Err(ToolError::ExecutionFailed(format!("bash wait: {e}")));
+            }
+            Err(_) => (None, true),
+        };
+
+        // Inspect the on-disk output: total size first, tail bytes second.
+        let total_bytes = std::fs::metadata(&output_path)
+            .map(|m| m.len() as usize)
+            .unwrap_or(0);
+        let tail_bytes = read_tail_bytes(&output_path, TAIL_READ_BUDGET).unwrap_or_default();
+        let tail_text = String::from_utf8_lossy(&tail_bytes).into_owned();
+
+        let cmd_summary = truncate_for_summary(&params.command);
+
+        if timed_out {
+            // Preserve the partial output file — even cut-short logs help
+            // diagnose hangs.
+            let content = if total_bytes > 0 {
+                let last = take_last_n_lines(&tail_text, TAIL_LINES);
+                self.remember_spilled(&output_path);
+                Some(format!(
+                    "[partial output before timeout — full at {}]\n{last}",
+                    output_path.display()
+                ))
+            } else {
+                let _ = std::fs::remove_file(&output_path);
+                None
+            };
+            return Ok(ToolOutput {
+                summary: format!("$ {cmd_summary} (timed out after {timeout_secs}s)"),
+                content,
+            });
+        }
+
+        let status = status.expect("status set on the success branch");
+        let summary = match status.code() {
+            Some(0) => format!("$ {cmd_summary}"),
+            Some(c) => format!("$ {cmd_summary} (exit {c})"),
+            None => format!("$ {cmd_summary} (terminated by signal)"),
+        };
+
+        if total_bytes == 0 {
+            let _ = std::fs::remove_file(&output_path);
+            return Ok(ToolOutput {
+                summary,
+                content: None,
+            });
+        }
+
+        // Inline if the whole output fits in our tail-read window AND is
+        // small enough to ride under the Worker's default cap.
+        let line_count = tail_text.lines().count();
+        let fully_loaded = total_bytes <= tail_bytes.len();
+        let fits_inline =
+            fully_loaded && total_bytes <= INLINE_BYTE_BUDGET && line_count <= TAIL_LINES;
+
+        let content = if fits_inline {
+            let _ = std::fs::remove_file(&output_path);
+            Some(tail_text)
+        } else {
+            let last = take_last_n_lines(&tail_text, TAIL_LINES);
+            // When `fully_loaded` we know the exact line count; otherwise
+            // the file is bigger than our read window, so we report only
+            // its total size in bytes.
+            let header = if fully_loaded {
+                format!(
+                    "[showing last {TAIL_LINES} of {line_count} lines — full output ({total_bytes} bytes) at {}]",
+                    output_path.display()
+                )
+            } else {
+                format!(
+                    "[showing last {TAIL_LINES} lines (tail of {total_bytes}-byte output) — full at {}]",
+                    output_path.display()
+                )
+            };
+            self.remember_spilled(&output_path);
+            Some(format!("{header}\n{last}"))
+        };
+
+        Ok(ToolOutput { summary, content })
+    }
+}
+
+impl BashTool {
+    fn remember_spilled(&self, path: &Path) {
+        if let Ok(mut v) = self.spilled_outputs.lock() {
+            v.push(path.to_path_buf());
+        }
+    }
+}
+
+/// Read up to `max_bytes` from the end of `path`. If the file is smaller
+/// than `max_bytes`, the entire file is returned.
+fn read_tail_bytes(path: &Path, max_bytes: usize) -> std::io::Result<Vec<u8>> {
+    use std::io::{Read, Seek, SeekFrom};
+    let mut f = std::fs::File::open(path)?;
+    let len = f.seek(SeekFrom::End(0))?;
+    let start = if len > max_bytes as u64 {
+        len - max_bytes as u64
+    } else {
+        0
+    };
+    f.seek(SeekFrom::Start(start))?;
+    let mut buf = Vec::with_capacity((len - start) as usize);
+    f.read_to_end(&mut buf)?;
+    Ok(buf)
+}
+
+/// Return the last `n` lines of `text`. If `text` has `n` or fewer lines
+/// (per [`str::lines`]), the whole input is returned as an owned copy.
+fn take_last_n_lines(text: &str, n: usize) -> String {
+    if text.is_empty() {
+        return String::new();
+    }
+    let total = text.lines().count();
+    if total <= n {
+        return text.to_owned();
+    }
+    let skip = total - n;
+    let mut count = 0usize;
+    for (i, b) in text.bytes().enumerate() {
+        if b == b'\n' {
+            count += 1;
+            if count == skip {
+                return text[i + 1..].to_owned();
+            }
+        }
+    }
+    text.to_owned()
+}
+
+fn truncate_for_summary(command: &str) -> String {
+    let one_line = command.lines().next().unwrap_or("");
+    let mut chars = one_line.chars();
+    let head: String = chars.by_ref().take(80).collect();
+    if chars.next().is_some() {
+        let mut shortened = head;
+        while shortened.chars().count() > 77 {
+            shortened.pop();
+        }
+        shortened.push_str("...");
+        shortened
+    } else {
+        head
+    }
+}
+
+/// Wrap a string in single quotes for safe inclusion in a bash command.
+fn shell_single_quote(s: &str) -> String {
+    let escaped = s.replace('\'', "'\\''");
+    format!("'{escaped}'")
+}
+
+/// Factory for the `Bash` tool.
+///
+/// `output_dir` is where long outputs spill to; the caller is responsible
+/// for arranging that the path is in the agent's readable scope. Every
+/// invocation starts at `fs.pwd()` — the tool is intentionally stateless
+/// w.r.t. the working directory.
+pub fn bash_tool(fs: ScopedFs, output_dir: PathBuf) -> ToolDefinition {
+    Arc::new(move || {
+        let schema = schemars::schema_for!(BashParams);
+        let schema_value = serde_json::to_value(schema).unwrap_or(serde_json::json!({}));
+        let meta = ToolMeta::new("Bash")
+            .description(DESCRIPTION)
+            .input_schema(schema_value);
+        let tool: Arc<dyn Tool> = Arc::new(BashTool {
+            cwd: fs.pwd().to_path_buf(),
+            output_dir: output_dir.clone(),
+            spilled_outputs: std::sync::Mutex::new(Vec::new()),
+        });
+        (meta, tool)
+    })
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use manifest::Scope;
+    use tempfile::TempDir;
+
+    /// Test harness: workspace tempdir + a separate spill tempdir kept
+    /// alive for the test's lifetime. The spill dir is added to the
+    /// scope as readable so callers exercise the production path.
+    struct Harness {
+        _workspace: TempDir,
+        spill: TempDir,
+        fs: ScopedFs,
+    }
+
+    fn setup() -> Harness {
+        let workspace = TempDir::new().unwrap();
+        let spill = TempDir::new().unwrap();
+        let base = Scope::writable(workspace.path()).unwrap();
+        let mut config = manifest::ScopeConfig {
+            allow: base.allow_rules(),
+            deny: base.deny_rules(),
+        };
+        config.allow.push(manifest::ScopeRule {
+            target: spill.path().to_path_buf(),
+            permission: manifest::Permission::Read,
+            recursive: true,
+        });
+        let scope = Scope::from_config(&config).unwrap();
+        let fs = ScopedFs::new(scope, workspace.path().to_path_buf());
+        Harness {
+            _workspace: workspace,
+            spill,
+            fs,
+        }
+    }
+
+    fn make_tool(h: &Harness) -> Arc<dyn Tool> {
+        let def = bash_tool(h.fs.clone(), h.spill.path().to_path_buf());
+        let (_, tool) = def();
+        tool
+    }
+
+    #[tokio::test]
+    async fn runs_simple_command() {
+        let h = setup();
+        let def = bash_tool(h.fs.clone(), h.spill.path().to_path_buf());
+        let (meta, tool) = def();
+        assert_eq!(meta.name, "Bash");
+
+        let inp = serde_json::json!({ "command": "echo hello" });
+        let out = tool.execute(&inp.to_string()).await.unwrap();
+        assert_eq!(out.summary, "$ echo hello");
+        assert_eq!(out.content.as_deref().map(str::trim), Some("hello"));
+    }
+
+    #[tokio::test]
+    async fn merges_stdout_and_stderr() {
+        let h = setup();
+        let tool = make_tool(&h);
+
+        let inp = serde_json::json!({
+            "command": "echo out; echo err 1>&2",
+        });
+        let out = tool.execute(&inp.to_string()).await.unwrap();
+        let body = out.content.unwrap();
+        assert!(body.contains("out"));
+        assert!(body.contains("err"));
+    }
+
+    #[tokio::test]
+    async fn nonzero_exit_is_reported() {
+        let h = setup();
+        let tool = make_tool(&h);
+
+        let inp = serde_json::json!({ "command": "exit 7" });
+        let out = tool.execute(&inp.to_string()).await.unwrap();
+        assert!(out.summary.contains("exit 7"), "summary: {}", out.summary);
+        assert!(
+            out.content.is_none(),
+            "no output expected, got {:?}",
+            out.content
+        );
+    }
+
+    #[tokio::test]
+    async fn cd_does_not_persist_across_calls() {
+        // Stateless: a `cd` in one call must NOT leak into the next.
+        let h = setup();
+        let sub = h._workspace.path().join("nested");
+        std::fs::create_dir(&sub).unwrap();
+        let tool = make_tool(&h);
+
+        tool.execute(
+            &serde_json::json!({
+                "command": format!("cd {}", sub.to_str().unwrap()),
+            })
+            .to_string(),
+        )
+        .await
+        .unwrap();
+
+        let pwd_out = tool
+            .execute(&serde_json::json!({ "command": "pwd" }).to_string())
+            .await
+            .unwrap();
+        let body = pwd_out.content.unwrap();
+        let actual = std::fs::canonicalize(body.trim()).unwrap();
+        let workspace = std::fs::canonicalize(h._workspace.path()).unwrap();
+        assert_eq!(
+            actual, workspace,
+            "second call should start at workspace root, not the previous cd target"
+        );
+    }
+
+    #[tokio::test]
+    async fn timeout_kills_long_command() {
+        let h = setup();
+        let tool = make_tool(&h);
+
+        let inp = serde_json::json!({
+            "command": "sleep 30",
+            "timeout": 1,
+        });
+        let out = tool.execute(&inp.to_string()).await.unwrap();
+        assert!(
+            out.summary.contains("timed out"),
+            "summary: {}",
+            out.summary
+        );
+    }
+
+    #[tokio::test]
+    async fn invalid_json_is_invalid_argument() {
+        let h = setup();
+        let tool = make_tool(&h);
+
+        let err = tool.execute("not json").await.unwrap_err();
+        assert!(matches!(err, ToolError::InvalidArgument(_)));
+    }
+
+    #[tokio::test]
+    async fn long_output_spills_and_returns_tail() {
+        let h = setup();
+        let spill_dir = h.spill.path().to_path_buf();
+        let tool = make_tool(&h);
+
+        // 200 lines: "line 1" .. "line 200". Tail of 80 keeps lines 121-200.
+        let inp = serde_json::json!({
+            "command": "for i in $(seq 1 200); do echo line $i; done",
+        });
+        let out = tool.execute(&inp.to_string()).await.unwrap();
+        let body = out.content.expect("expected content");
+
+        assert!(
+            body.contains(&format!("showing last {TAIL_LINES} of 200 lines")),
+            "tail header missing in: {}",
+            &body[..body.len().min(300)]
+        );
+        assert!(
+            body.contains(spill_dir.to_str().unwrap()),
+            "spill dir path missing: {body}"
+        );
+        // Last 80 lines are 121..200.
+        assert!(body.contains("\nline 200\n"));
+        assert!(body.contains("\nline 121\n"));
+        // line 120 is the last *elided* line.
+        assert!(!body.contains("\nline 120\n"), "elided line leaked: {body}");
+    }
+
+    #[tokio::test]
+    async fn wide_short_output_still_spills_when_byte_budget_exceeded() {
+        let h = setup();
+        let spill_dir = h.spill.path().to_path_buf();
+        let tool = make_tool(&h);
+
+        // One single line of ~20 KiB (over INLINE_BYTE_BUDGET = 12 KiB).
+        let inp = serde_json::json!({
+            "command": "printf 'x%.0s' {1..20480}",
+        });
+        let out = tool.execute(&inp.to_string()).await.unwrap();
+        let body = out.content.unwrap();
+        assert!(
+            body.contains(spill_dir.to_str().unwrap()),
+            "expected spill marker in: {}",
+            &body[..body.len().min(200)]
+        );
+    }
+
+    #[tokio::test]
+    async fn background_job_does_not_hang() {
+        let h = setup();
+        let tool = make_tool(&h);
+
+        // Output goes to a file, not a pipe, so a bg job cannot hang us; `wait` reaps it.
+        let inp = serde_json::json!({
+            "command": "(sleep 0.05; echo bg) &",
+            "timeout": 5,
+        });
+        let out = tool.execute(&inp.to_string()).await.unwrap();
+        assert!(
+            !out.summary.contains("timed out"),
+            "summary: {}",
+            out.summary
+        );
+    }
+
+    #[tokio::test]
+    async fn spilled_files_are_cleaned_up_on_drop() {
+        let h = setup();
+        let spill_dir = h.spill.path().to_path_buf();
+        let tool = make_tool(&h);
+
+        let inp = serde_json::json!({
+            "command": "for i in $(seq 1 200); do echo $i; done",
+        });
+        tool.execute(&inp.to_string()).await.unwrap();
+
+        // The spill dir should now contain exactly one bash-*.log file.
+        let files_before: Vec<_> = std::fs::read_dir(&spill_dir)
+            .unwrap()
+            .filter_map(Result::ok)
+            .map(|e| e.path())
+            .collect();
+        assert_eq!(files_before.len(), 1, "expected one spilled file");
+        let path = files_before.into_iter().next().unwrap();
+        assert!(path.exists());
+
+        drop(tool);
+        // Drop runs synchronously; file should be gone.
+        assert!(
+            !path.exists(),
+            "spilled file should be cleaned up on drop: {path:?}"
+        );
+    }
+}
--- a/crates/tools/src/lib.rs
+++ b/crates/tools/src/lib.rs
@@ -1,8 +1,8 @@
 //! Built-in tools for the Insomnia LLM agent.
 //!
-//! Implements Read / Write / Edit / Glob / Grep on top of the `llm-worker`
-//! `Tool` infrastructure. Filesystem access is mediated by two orthogonal
-//! concerns:
+//! Implements Read / Write / Edit / Glob / Grep / Bash on top of the
+//! `llm-worker` `Tool` infrastructure. Filesystem access is mediated by
+//! two orthogonal concerns:
 //!
 //! - [`ScopedFs`] — pod-lifetime, expresses the write-block boundary for
 //!   the current scope. Derived from the manifest and shareable across
@@ -13,17 +13,23 @@
 //!
 //! The Pod layer owns both instances and passes them to
 //! [`builtin_tools`] when registering tools on a `Worker`.
+//!
+//! `Bash` is the lone exception — its child processes bypass `ScopedFs`
+//! entirely. Safety for arbitrary command execution is delegated to the
+//! Permission layer (deny/allow rules on the command string).

 pub mod error;
 pub mod scoped_fs;
 pub mod tracker;

+mod bash;
 mod edit;
 mod glob;
 mod grep;
 mod read;
 mod write;

+pub use bash::bash_tool;
 pub use edit::edit_tool;
 pub use error::ToolsError;
 pub use glob::glob_tool;
@@ -39,12 +45,22 @@ pub use write::write_tool;
 /// All returned factories share the same tracker instance so that
 /// `Read` / `Write` / `Edit` see a consistent history across tool
 /// invocations within a single session.
-pub fn builtin_tools(fs: ScopedFs, tracker: Tracker) -> Vec<llm_worker::tool::ToolDefinition> {
+///
+/// `bash_output_dir` is where the Bash tool spills long outputs. The
+/// caller is responsible for adding that path to the readable scope
+/// (e.g. by rebuilding via [`manifest::Scope::from_config`]) so the
+/// agent can `Read` the saved files.
+pub fn builtin_tools(
+    fs: ScopedFs,
+    tracker: Tracker,
+    bash_output_dir: std::path::PathBuf,
+) -> Vec<llm_worker::tool::ToolDefinition> {
    vec![
        read_tool(fs.clone(), tracker.clone()),
        write_tool(fs.clone(), tracker.clone()),
-        edit_tool(fs.clone(), tracker.clone()),
+        edit_tool(fs.clone(), tracker),
        glob_tool(fs.clone()),
-        grep_tool(fs),
+        grep_tool(fs.clone()),
+        bash_tool(fs, bash_output_dir),
    ]
 }
--- a/crates/tools/src/tracker.rs
+++ b/crates/tools/src/tracker.rs
@@ -31,7 +31,8 @@
 //! let scope = Scope::writable("/workspace").unwrap();
 //! let fs = ScopedFs::new(scope, PathBuf::from("/workspace")); // pod lifetime
 //! let tracker = Tracker::new();    // session lifetime
-//! let defs = builtin_tools(fs, tracker);
+//! let bash_outputs = PathBuf::from("/run/insomnia/bash-output");
+//! let defs = builtin_tools(fs, tracker, bash_outputs);
 //! ```

 use std::collections::{HashMap, VecDeque};
--- a/crates/tools/tests/edge_cases.rs
+++ b/crates/tools/tests/edge_cases.rs
@@ -3,7 +3,7 @@
 use std::sync::Arc;

 use llm_worker::tool::{Tool, ToolDefinition};
-use manifest::Scope;
+use manifest::{Permission, Scope, ScopeConfig, ScopeRule};
 use serde_json::json;
 use tempfile::TempDir;
 use tools::{ScopedFs, Tracker, builtin_tools};
@@ -27,19 +27,29 @@ impl Registry {
    }
 }

-fn setup() -> (TempDir, Registry) {
+fn setup() -> (TempDir, TempDir, Registry) {
    let dir = TempDir::new().unwrap();
-    let fs = ScopedFs::new(
-        Scope::writable(dir.path()).unwrap(),
-        dir.path().to_path_buf(),
-    );
+    let spill = TempDir::new().unwrap();
+    let base = Scope::writable(dir.path()).unwrap();
+    let mut config = ScopeConfig {
+        allow: base.allow_rules(),
+        deny: base.deny_rules(),
+    };
+    config.allow.push(ScopeRule {
+        target: spill.path().to_path_buf(),
+        permission: Permission::Read,
+        recursive: true,
+    });
+    let scope = Scope::from_config(&config).unwrap();
+    let fs = ScopedFs::new(scope, dir.path().to_path_buf());
    let tracker = Tracker::new();
-    (dir, Registry::new(builtin_tools(fs, tracker)))
+    let reg = Registry::new(builtin_tools(fs, tracker, spill.path().to_path_buf()));
+    (dir, spill, reg)
 }

 #[tokio::test]
 async fn unicode_path_and_content() {
-    let (dir, reg) = setup();
+    let (dir, _spill, reg) = setup();
    let file = dir.path().join("日本語ファイル.txt");
    let content = "こんにちは 🦀 世界\nabc\n";

@@ -70,7 +80,7 @@ async fn unicode_path_and_content() {
 async fn symlink_to_outside_scope_is_rejected_for_write() {
    use std::os::unix::fs::symlink;

-    let (dir, reg) = setup();
+    let (dir, _spill, reg) = setup();
    let outside = TempDir::new().unwrap();
    let outside_target = outside.path().join("secret.txt");
    std::fs::write(&outside_target, "secret").unwrap();
@ -114,7 +124,7 @@ async fn symlink_to_outside_scope_is_rejected_for_write() {

 #[tokio::test]
 async fn empty_file_read_and_edit() {
-    let (dir, reg) = setup();
+    let (dir, _spill, reg) = setup();
    let file = dir.path().join("empty.txt");
    std::fs::write(&file, "").unwrap();

@ -144,7 +154,7 @@ async fn empty_file_read_and_edit() {

 #[tokio::test]
 async fn very_long_single_line() {
-    let (dir, reg) = setup();
+    let (dir, _spill, reg) = setup();
    let file = dir.path().join("long.txt");
    let big: String = "x".repeat(1024 * 1024); // 1 MiB, no newlines
    std::fs::write(&file, &big).unwrap();
@ -160,7 +170,7 @@ async fn very_long_single_line() {

 #[tokio::test]
 async fn relative_path_is_rejected() {
-    let (_dir, reg) = setup();
+    let (_dir, _spill, reg) = setup();
    let read = reg.get("Read");
    let err = read
        .execute(&json!({ "file_path": "relative.txt" }).to_string())
@ -171,7 +181,7 @@ async fn relative_path_is_rejected() {

 #[tokio::test]
 async fn directory_target_is_rejected_for_read() {
-    let (dir, reg) = setup();
+    let (dir, _spill, reg) = setup();
    let read = reg.get("Read");
    let err = read
        .execute(&json!({ "file_path": dir.path().to_str().unwrap() }).to_string())
@ -182,7 +192,7 @@ async fn directory_target_is_rejected_for_read() {

 #[tokio::test]
 async fn deeply_nested_new_file_is_created() {
-    let (dir, reg) = setup();
+    let (dir, _spill, reg) = setup();
    let deep = dir.path().join("a/b/c/d/e/deep.txt");
    let write = reg.get("Write");
    write
@ -200,7 +210,7 @@ async fn deeply_nested_new_file_is_created() {

 #[tokio::test]
 async fn replace_preserves_unicode() {
-    let (dir, reg) = setup();
+    let (dir, _spill, reg) = setup();
    let file = dir.path().join("u.txt");
    std::fs::write(&file, "🦀 rust 🦀\n").unwrap();

@ -225,7 +235,7 @@ async fn replace_preserves_unicode() {

 #[tokio::test]
 async fn grep_handles_unicode_pattern() {
-    let (dir, reg) = setup();
+    let (dir, _spill, reg) = setup();
    let file = dir.path().join("u.txt");
    std::fs::write(&file, "English\n日本語\nрусский\n").unwrap();

--- a/crates/tools/tests/integration.rs
+++ b/crates/tools/tests/integration.rs
@ -8,11 +8,25 @@ use std::path::Path;
 use std::sync::Arc;

 use llm_worker::tool::{Tool, ToolDefinition, ToolMeta};
-use manifest::Scope;
+use manifest::{Permission, Scope, ScopeConfig, ScopeRule};
 use serde_json::json;
 use tempfile::TempDir;
 use tools::{ScopedFs, Tracker, builtin_tools};

+fn scope_with_spill(workspace: &Path, spill: &Path) -> Scope {
+    let base = Scope::writable(workspace).unwrap();
+    let mut config = ScopeConfig {
+        allow: base.allow_rules(),
+        deny: base.deny_rules(),
+    };
+    config.allow.push(ScopeRule {
+        target: spill.to_path_buf(),
+        permission: Permission::Read,
+        recursive: true,
+    });
+    Scope::from_config(&config).unwrap()
+}
+
 struct Registry {
    entries: Vec<(ToolMeta, Arc<dyn Tool>)>,
 }
@ -36,15 +50,14 @@ impl Registry {
    }
 }

-fn setup() -> (TempDir, Registry) {
+fn setup() -> (TempDir, TempDir, Registry) {
    let dir = TempDir::new().unwrap();
-    let fs = ScopedFs::new(
-        Scope::writable(dir.path()).unwrap(),
-        dir.path().to_path_buf(),
-    );
+    let spill = TempDir::new().unwrap();
+    let scope = scope_with_spill(dir.path(), spill.path());
+    let fs = ScopedFs::new(scope, dir.path().to_path_buf());
    let tracker = Tracker::new();
-    let reg = Registry::new(builtin_tools(fs, tracker));
-    (dir, reg)
+    let reg = Registry::new(builtin_tools(fs, tracker, spill.path().to_path_buf()));
+    (dir, spill, reg)
 }

 async fn call(tool: &Arc<dyn Tool>, input: serde_json::Value) -> llm_worker::tool::ToolOutput {
@ -60,16 +73,16 @@ async fn call_err(tool: &Arc<dyn Tool>, input: serde_json::Value) -> llm_worker:
 }

 #[test]
-fn builtin_tools_registers_all_five() {
-    let (_dir, reg) = setup();
+fn builtin_tools_registers_full_set() {
+    let (_dir, _spill, reg) = setup();
    let mut names = reg.names();
    names.sort();
-    assert_eq!(names, vec!["Edit", "Glob", "Grep", "Read", "Write"]);
+    assert_eq!(names, vec!["Bash", "Edit", "Glob", "Grep", "Read", "Write"]);
 }

 #[test]
 fn meta_has_description_and_schema() {
-    let (_dir, reg) = setup();
+    let (_dir, _spill, reg) = setup();
    for (meta, _) in &reg.entries {
        assert!(
            !meta.description.is_empty(),
@ -87,7 +100,7 @@ fn meta_has_description_and_schema() {

 #[tokio::test]
 async fn read_then_edit_then_read_roundtrip() {
-    let (dir, reg) = setup();
+    let (dir, _spill, reg) = setup();
    let file = dir.path().join("a.txt");
    std::fs::write(&file, "hello world\n").unwrap();
    let p = file.to_str().unwrap();
@ -119,7 +132,7 @@ async fn read_then_edit_then_read_roundtrip() {

 #[tokio::test]
 async fn write_then_grep_finds_content() {
-    let (dir, reg) = setup();
+    let (dir, _spill, reg) = setup();
    let write = reg.get("Write");
    let grep = reg.get("Grep");

@ -148,7 +161,7 @@ async fn write_then_grep_finds_content() {

 #[tokio::test]
 async fn glob_finds_written_files() {
-    let (dir, reg) = setup();
+    let (dir, _spill, reg) = setup();
    let write = reg.get("Write");
    let glob = reg.get("Glob");

@ -172,7 +185,7 @@ async fn glob_finds_written_files() {

 #[tokio::test]
 async fn out_of_scope_write_is_rejected() {
-    let (_dir, reg) = setup();
+    let (_dir, _spill, reg) = setup();
    let outside = TempDir::new().unwrap();
    let write = reg.get("Write");

@ -191,7 +204,7 @@ async fn out_of_scope_write_is_rejected() {

 #[tokio::test]
 async fn write_to_existing_without_read_fails() {
-    let (dir, reg) = setup();
+    let (dir, _spill, reg) = setup();
    let file = dir.path().join("exists.txt");
    std::fs::write(&file, "preexisting").unwrap();

@ -212,7 +225,7 @@ async fn write_to_existing_without_read_fails() {
 async fn shared_scoped_fs_across_tools() {
    // The key invariant: all builtin tools share the same ScopedFs instance,
    // so read-history set by Read is visible to Edit and Write.
-    let (dir, reg) = setup();
+    let (dir, _spill, reg) = setup();
    let file = dir.path().join("shared.txt");
    std::fs::write(&file, "one\n").unwrap();

@ -235,7 +248,7 @@ async fn shared_scoped_fs_across_tools() {

 #[tokio::test]
 async fn edit_requires_read_across_tools() {
-    let (dir, reg) = setup();
+    let (dir, _spill, reg) = setup();
    let file = dir.path().join("a.txt");
    std::fs::write(&file, "foo\n").unwrap();

@ -256,17 +269,17 @@ async fn edit_requires_read_across_tools() {

 #[tokio::test]
 async fn deterministic_tool_order_is_registration_order() {
-    let (_dir, reg) = setup();
-    // Registration order from builtin_tools(): Read, Write, Edit, Glob, Grep
+    let (_dir, _spill, reg) = setup();
+    // Registration order from builtin_tools(): Read, Write, Edit, Glob, Grep, Bash
    let names: Vec<&str> = reg.entries.iter().map(|(m, _)| m.name.as_str()).collect();
-    assert_eq!(names, vec!["Read", "Write", "Edit", "Glob", "Grep"]);
+    assert_eq!(names, vec!["Read", "Write", "Edit", "Glob", "Grep", "Bash"]);
 }

 // Regression: tool name capitalization matches Claude Code reference
 #[test]
 fn tool_names_match_reference_spec() {
-    let (_dir, reg) = setup();
-    for expected in ["Read", "Write", "Edit", "Glob", "Grep"] {
+    let (_dir, _spill, reg) = setup();
+    for expected in ["Read", "Write", "Edit", "Glob", "Grep", "Bash"] {
        assert!(
            reg.entries.iter().any(|(m, _)| m.name == expected),
            "missing tool {expected}"
@ -278,12 +291,11 @@ fn tool_names_match_reference_spec() {
 async fn tracker_recent_files_tracks_read_write_edit() {
    // Build a fresh registry that shares a tracker we can query afterwards.
    let dir = TempDir::new().unwrap();
-    let fs = ScopedFs::new(
-        Scope::writable(dir.path()).unwrap(),
-        dir.path().to_path_buf(),
-    );
+    let spill = TempDir::new().unwrap();
+    let scope = scope_with_spill(dir.path(), spill.path());
+    let fs = ScopedFs::new(scope, dir.path().to_path_buf());
    let tracker = Tracker::new();
-    let reg = Registry::new(builtin_tools(fs, tracker.clone()));
+    let reg = Registry::new(builtin_tools(fs, tracker.clone(), spill.path().to_path_buf()));

    let a = dir.path().join("a.txt");
    let b = dir.path().join("b.txt");
@ -324,5 +336,52 @@ async fn tracker_recent_files_tracks_read_write_edit() {
    );
 }

+#[tokio::test]
+async fn bash_inherits_scoped_fs_pwd() {
+    // The Bash tool starts at the ScopedFs's pwd. Without any `cd`, its
+    // `pwd` should canonicalize to the workspace root we set up.
+    let (dir, _spill, reg) = setup();
+    let bash = reg.get("Bash");
+    let out = call(&bash, json!({ "command": "pwd" })).await;
+    let body = out.content.unwrap();
+    let actual = std::fs::canonicalize(body.trim()).unwrap();
+    let expected = std::fs::canonicalize(dir.path()).unwrap();
+    assert_eq!(actual, expected);
+}
+
+#[tokio::test]
+async fn bash_spilled_file_is_readable_via_read_tool() {
+    // Long Bash output spills to a path that the controller has added to
+    // the readable scope. The agent should be able to Read that path
+    // exactly like any in-scope file.
+    let (_dir, spill, reg) = setup();
+    let bash = reg.get("Bash");
+    let out = call(
+        &bash,
+        json!({ "command": "for i in $(seq 1 200); do echo line $i; done" }),
+    )
+    .await;
+    let body = out.content.unwrap();
+    let spill_str = spill.path().to_str().unwrap();
+
+    // Extract the spilled path from the marker line.
+    let marker = body.lines().next().unwrap();
+    let prefix_pos = marker
+        .find(spill_str)
+        .expect("marker should reference the spill dir");
+    let path_end_rel = marker[prefix_pos..]
+        .find(".log")
+        .expect("marker should end the path with .log");
+    let spilled = &marker[prefix_pos..prefix_pos + path_end_rel + 4];
+
+    // Read the file via the Read tool — must succeed (in scope).
+    let read_out = call(&reg.get("Read"), json!({ "file_path": spilled })).await;
+    let read_body = read_out.content.expect("Read returned content");
+    // The full 200 lines should be in the saved file even though Bash
+    // returned only the tail of 80.
+    assert!(read_body.contains("line 1\n"), "missing line 1: {read_body}");
+    assert!(read_body.contains("line 200"), "missing line 200");
+}
+
 // Sanity: unused Path import guard
 const _: fn() -> &'static Path = || Path::new("/");
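The marker-parsing arithmetic in `bash_spilled_file_is_readable_via_read_tool` could be factored into a small helper. The following is a minimal sketch, not code from this commit: `extract_spilled_path` is a hypothetical name, and it assumes only what the test itself assumes, namely that the marker line contains the spill directory and that the spilled path ends in `.log`.

```rust
/// Hypothetical helper (not part of the `tools` crate): returns the spilled
/// file path embedded in `marker`, i.e. the substring that starts at
/// `spill_dir` and ends with the first ".log" found after that prefix.
/// Returns `None` when either part is missing.
fn extract_spilled_path<'a>(marker: &'a str, spill_dir: &str) -> Option<&'a str> {
    const SUFFIX: &str = ".log";
    let start = marker.find(spill_dir)?;
    // Look for the suffix only after the directory prefix, so a ".log"
    // inside the directory name itself cannot end the path early.
    let after_dir = start + spill_dir.len();
    let rel_end = marker[after_dir..].find(SUFFIX)?;
    Some(&marker[start..after_dir + rel_end + SUFFIX.len()])
}
```

Returning `Option` instead of calling `expect` would let a test report a missing marker with its own message, while keeping the same byte-offset logic as the inline version.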
--- a/crates/tui/src/tool.rs
+++ b/crates/tui/src/tool.rs
@ -590,22 +590,31 @@ fn render_default(tc: &ToolCallBlock, mode: Mode) -> Vec<Line<'static>> {
            .add_modifier(Modifier::ITALIC),
    );

-    let summary_source: String = match &tc.state {
+    // Body source: prefer the full output (e.g. Bash's stdout/stderr) so
+    // Detail mode can expose it. Fall back to the summary when the tool
+    // didn't emit any content.
+    let body_source: String = match &tc.state {
+        ToolCallState::Done {
+            output: Some(out), ..
+        }
+        | ToolCallState::Error {
+            output: Some(out), ..
+        } => out.clone(),
        ToolCallState::Done { summary, .. } | ToolCallState::Error { summary, .. } => {
            summary.clone()
        }
        _ => String::new(),
    };
-    let summary_cap = match mode {
+    let body_cap = match mode {
        Mode::Normal => 3,
        Mode::Detail => usize::MAX,
        Mode::Overview => unreachable!(),
    };
-    if !summary_source.is_empty() {
+    if !body_source.is_empty() {
        emit_capped_lines(
            &mut lines,
-            &summary_source,
-            summary_cap,
+            &body_source,
+            body_cap,
            Style::default().fg(Color::Gray),
        );
    }
--- a/docs/plan/memory-prompts.md
+++ b/docs/plan/memory-prompts.md
@ -29,18 +29,31 @@ Phase 1 は「派生物を作る」段階ではなく、「起きたことを抽
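 - Output is schema-compliant structured data only. Do not add out-of-schema information through free-text supplementary explanations
 - Return an empty array when there is nothing to extract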

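-### Phase 2: consolidation prompt
+### Phase 2: consolidation + cleanup prompt

-Phase 2 looks at the existing `memory/*`, `knowledge/*`, and staging, and decides agentically what to add, update, or consolidate:
+Phase 2 looks at the existing `memory/*`, `knowledge/*`, and staging, and runs the consolidation phase and the cleanup phase back to back within a single session. Principles common to both phases:

-- The input includes the staging activity log, the full text of the existing `memory/*` (summary / decisions / requests), and the Knowledge-promotion candidate report
+- The input includes the staging activity log, the full text of the existing `memory/*` (summary / decisions / requests), the Knowledge-promotion candidate report, and the cleanup inputs (usage-frequency metrics, Linter Warn, `replaced` chains, sources-overflow information)
 - Existing `knowledge/*` is not embedded in the prompt; the agent pulls what it needs through the Knowledge search tool. It first searches for slugs close to the candidate report's sources and the staging topics, then inspects the slug / description / kind / `model_invokation` of each hit to find a fitting destination
 - Prefer update over creation; when content fits naturally into an existing slug, do not add a new file
 - Decisions / Requests use the staging `source` as-is; the LLM does not construct `sources`
 - Rewrite the summary only when necessary, always compressing it to roughly 1-5k tokens
-- Do not delete directly; express Decision replacement with `status: replaced` and `replaced_by`
+- Express Decision replacement with `status: replaced` and `replaced_by`
 - Avoid rewrites that visibly conflict with human edits; when a conflict looks likely, consolidate conservatively
+
+Additional instructions for the consolidation phase:
+
+- Distill the staging activity log into decisions / requests / summary / Knowledge updates
+- New Knowledge may be created only from sources listed in the candidate report (details in §Phase 2: Knowledge write prompt)
+
+Additional instructions for the cleanup phase (run with spare capacity after the consolidation phase completes):
+
+- Evaluate existing records as `outdated`, `superseded`, `unused`, or `noisy`, and classify why each one is a cleanup target
+- Records whose explicit-invoke frequency exceeds the protection threshold are exempt from drop / heavy compression
+- Treat `similar-slug`, `sources-overflow`, and lingering `replaced` records mainly as evidence for `superseded` or `noisy`
+- Record the reason for each merge / split / trim / drop in a form readable from the git diff
+- Direct deletion is allowed, but do not lean too heavily on git reversibility; prefer merge / trim for anything prone to misjudgment
+
 ### Phase 2: Knowledge write prompt

 When creating / updating Knowledge, state the following explicitly in addition to the overall Phase 2 principles: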
@ -63,17 +76,6 @@ Knowledge の新規作成 / 更新では、Phase 2 全体の原則に加えて

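 The initial scope has no dedicated audit LLM (see `memory.md` §Write path and Linter / §Future considerations). Preventing semantic damage is left to the information-loss-minimization instructions in the Phase 2 prompt and to git diff review. The inputs, check items, and pass/fail return format for inserting an audit LLM later as a second layer will be worked out at that time.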

-### GC prompt
-
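-GC may clean up more aggressively than Phase 2, but must preserve reversibility and explainability:
-
-- In addition to the records subject to GC, the input includes Linter Warn, usage-frequency metrics, `replaced` chains, and sources-overflow information
-- Records exceeding the explicit-invoke protection threshold are exempt from drop / heavy compression
-- Evaluate each record as `outdated`, `superseded`, `unused`, or `noisy`, and classify why it is a GC target
-- Treat `similar-slug`, `sources-overflow`, and lingering `replaced` records mainly as evidence for `superseded` or `noisy`
-- Record the reason for each merge / split / trim / drop in a form readable from the diff
-- Direct deletion is allowed, but do not lean too heavily on git reversibility; prefer merge / trim for anything prone to misjudgment
-
 ## Related

 - `docs/plan/memory.md`: overall memory policy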
--- a/docs/plan/memory.md
+++ b/docs/plan/memory.md
@ -44,7 +44,7 @@ Knowledge は Phase 2 が自律的に新規作成 / 更新 / フラグ切替を

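 - **Adoption gate**: New Knowledge may be created only when it derives from a source listed in the usage-frequency metrics' Knowledge-promotion candidate report (described below). While below the threshold, the content stays in decisions / requests
 - **Linter**: Watches for structural violations (details below). Automatic detection of semantic damage is not included initially; an audit LLM layer will be added after observing behavior (future consideration)
-- **OS file permissions**: Records a human does not want rewritten are locked as `-r--`. Writes from Phase 2 / GC are rejected at the OS level
+- **OS file permissions**: Records a human does not want rewritten are locked as `-r--`. Writes from Phase 2 are rejected at the OS level

 Workflow uses the same flag specification (see `workflow.md`). An extension providing per-record protection flags is a future consideration; OS permissions suffice initially.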

@ -135,15 +135,16 @@ Workflow 保護は専用 tool schema のトリックではなく Linter ルー
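 - **Ordering with compact**: In the post-run check after a turn completes, run Phase 1 **before compact**. Because compact rearranges history, the extract input range (entry index in the session log) is more stable before compact
 - **Concurrency guard (between Phase 1 runs)**: The `extract_in_flight` flag on the Pod skips new triggers while a run is in flight. If the threshold is exceeded at completion, the next run fires immediately and collects the maximum range after the new pointer (no pending state is kept; re-evaluating the threshold at completion yields coalesce-equivalent behavior)

-#### Phase 2: consolidation into persistent storage
+#### Phase 2: consolidation into persistent storage + cleanup

-- **Trigger**: The cumulative staging file count or byte size exceeds the threshold, or compact fires (always flush)
+- **Trigger**: The cumulative staging file count or byte size exceeds the threshold
 - **Executor**: The pod that finished Phase 1 spawns a consolidation Worker. Concurrent runs are prevented by a progress file under staging (described below)
-- **Input**: The staging entries (activity log + `source`) for the consumed ID list fixed by the startup snapshot + the full text of the existing `memory/*` (summary / decisions / requests) + the **Knowledge-promotion candidate report** (machine-aggregated from the usage-frequency metrics described below; a list of sources over the threshold). The full text of existing `knowledge/*` is not embedded in the prompt; the agent pulls what it needs through the Knowledge search tool
+- **Input**: The staging entries (activity log + `source`) for the consumed ID list fixed by the startup snapshot + the full text of the existing `memory/*` (summary / decisions / requests) + the **Knowledge-promotion candidate report** (machine-aggregated from the usage-frequency metrics described below; a list of sources over the threshold) + the **cleanup inputs** (explicit-invoke usage-frequency metrics, Linter Warn, `replaced` chains, sources-overflow information). The full text of existing `knowledge/*` is not embedded in the prompt; the agent pulls what it needs through the Knowledge search tool
 - **Processing**: The sub-Worker is given the **memory-specific Tools (read / write / edit, with a built-in Linter) + Knowledge search tool + memory search tool** and decides the following autonomously, in an agentic manner:
-  - Add new decisions / requests, one file per item. `sources` is copied from the staging `source` (not inferred by the LLM)
+  - **Consolidation**: Add new decisions / requests, one file per item. `sources` is copied from the staging `source` (not inferred by the LLM)
   - Create new Knowledge derived from the activity log (term definitions / operating policies / rules / facts / know-how) or patch existing Knowledge. **New creation is allowed only when it derives from a source listed in the candidate report**. `kind` is kept in the frontmatter, and `last_sources` is updated
   - Rewrite the summary as needed
+  - **Cleanup (spare-capacity phase)**: Evaluate existing records against §Evaluation categories, and drop / merge / split / trim / rewrite the targets not covered by the protection threshold. Proliferating similar slugs / excessive sources / lingering `replaced` records detected by Linter Warn are converged here
 - **Write destinations**: `memory/*` and `knowledge/*`. The ban on writing Workflow is enforced by the Linter (see `workflow.md`)
 - **Completion**: Clean up only the staging entries in the consumed ID list (entries Phase 1 added during the run are kept). If staging has new arrivals when Phase 2 completes, fire the next run (coalesce)
 - **Model**: `memory.consolidation_model`. A reasoning model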
@ -164,7 +165,9 @@ Workflow 保護は専用 tool schema のトリックではなく Linter ルー

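 - **Rewrite is allowed**. Prioritize raising information density by integrating and restructuring existing content with new information. Avoid plain append (growing only by addition)
 - When rewriting, **minimize information loss**: preserve existing claims, rationale, and sources. Wording may be tidied and shortened, but no contained element may be dropped
-- Express deletion as a replacement record (`status: replaced` + `replaced_by: <slug>`); do not delete directly
+- Express Decision replacement with `status: replaced` + `replaced_by: <slug>`; do not delete directly
+- Drop is allowed in the cleanup phase. However, records exceeding the protection threshold (§Decision rules) are exempt from drop / heavy compression. Prefer merge / trim for anything prone to misjudgment
+- Make the cleanup reason for each record explainable via the `outdated | superseded | unused | noisy` §Evaluation categories, and keep operations at a granularity readable from the git diff
 - For Knowledge, look for a fitting destination starting from the slug / description / kind / `model_invokation` of existing records; if the content integrates naturally, do not add a new slug
 - Human edits are assumed to surface in git diff. Avoid inconsistent rewrites; resolve conflicts with git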

@ -176,11 +179,11 @@ Memory record の書き込みは Phase 2 が自律判断し、Offer は設けな

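 #### Relationship with compact

-Separate by default (memory has its own trigger; compact keeps its existing `input_tokens` threshold). However, **when compact fires, Phase 2 is always flushed at the same time** (so that raw data lost to compact does not slip away).
+Separate by default (memory has its own trigger; compact keeps its existing `input_tokens` threshold). The raw session log lost to compact is **preserved in staging because Phase 1 runs before compact** (see §Phase 1 §Ordering with compact). Phase 2 is not required to synchronize with compact; it fires independently on the staging accumulation threshold.

-### GC (periodic re-evaluation)
+### Handling of cleanup (GC equivalent)

-A periodic job that re-evaluates memory on a path separate from Phase 2. Phase 2 is allowed to rewrite and leans toward information consolidation, but GC serves as the outlet for the following issues that still remain:
+Phase 2 is allowed to rewrite and leans toward information consolidation, but the following issues that still remain are **handled by the same agent in Phase 2's spare-capacity phase** (there is no independent trigger or independent Agent):

 - Low-importance records accumulate
 - Similar slugs proliferate (those detected by Linter Warn are handled together)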
@ -190,25 +193,23 @@ Phase 2 とは別経路で memory を再評価する定期ジョブ。Phase 2

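 For a cross-project comparison of GC designs, see `docs/ref/memory-systems.md` §8.

-#### Permissions and operation granularity
+#### Operation granularity

-The GC Agent **autonomously executes drop / merge / split** (including deletion). No offer is made to a human; results are verified via git diff. Operations come at both of the following granularities:
+The cleanup phase uses the same memory-specific Tools as the Phase 2 consolidation phase (read / write / edit, with an internal pre-write Linter). The operation granularities are supported naturally (no dedicated API is provided):

 - **File level**: Drop a whole file, merge multiple files, or split one file
 - **Partial deletion within a file**: Delete or compress some sections or list items of the body. This includes trimming old entries from the frontmatter `sources`

-Because it uses the same memory-specific Tools as Phase 2 (read / write / edit, with an internal pre-write Linter), the operation granularities are supported naturally (no dedicated API is provided).
+#### Evaluation categories

-#### GC evaluation categories
-
-GC does not treat records uniformly as "stale"; it evaluates them against at least the following four categories:
+Records subject to cleanup are not treated uniformly as "stale"; they are evaluated against at least the following four categories:

 - `outdated`: Was once valid but is now inconsistent with the current implementation, policy, or operations
 - `superseded`: Another record has effectively become the canonical one, and the original is close to already replaced
 - `unused`: Not wrong, but hardly ever referenced through explicit invoke or search, so it has become noise
 - `noisy`: The content itself is valid, but granularity, duplication, verbosity, excessive sources, and the like degrade discovery / retrieval

-These are **a classification of GC reasons, not protection conditions**. Protection conditions are held separately; on top of them, these categories make the choice among `drop / merge / split / trim / rewrite` explainable.
+These are **a classification of cleanup reasons, not protection conditions**. Protection conditions are held separately; on top of them, these categories make the choice among `drop / merge / split / trim / rewrite` explainable.

 #### Usage-frequency metrics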

@ -226,12 +227,12 @@ GC は record を一律に「stale」とみなさず、少なくとも次の 4

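 **Accumulation method** (post-hoc aggregation approach): The invoke records above are filtered and aggregated over a time window running from at most the 10th-most-recent invoke up to the present.

-**Knowledge-promotion candidate report**: A machine aggregation that Phase 2 receives as input, used for the new-Knowledge creation gate. It covers records under `memory/*` (the Phase 1 products decisions / requests / existing knowledge) and lists those whose explicit-invoke frequency exceeds the threshold. To exclude spikes, consecutive references within a single session are rounded to one count, and re-reference across multiple sessions is required. Concrete threshold values are adjusted in operation and tuned via the configuration file.
+**Knowledge-promotion candidate report**: A machine aggregation that the Phase 2 consolidation phase receives as input, used for the new-Knowledge creation gate. It covers records under `memory/*` (the Phase 1 products decisions / requests / existing knowledge) and lists those whose explicit-invoke frequency exceeds the threshold. To exclude spikes, consecutive references within a single session are rounded to one count, and re-reference across multiple sessions is required. Concrete threshold values are adjusted in operation and tuned via the configuration file.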
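The spike-exclusion rule for the candidate report (consecutive references within one session count once, and references must span multiple sessions) can be illustrated as follows. This is a minimal sketch under stated assumptions, not the project's implementation: `collapsed_count`, `is_candidate`, and the plain count threshold are hypothetical, and the document itself defines the gate in terms of explicit-invoke frequency rather than a raw count.

```rust
use std::collections::HashSet;

/// Collapses each run of consecutive invokes from the same session into a
/// single count. `session_ids` lists one session id per explicit invoke,
/// in chronological order (assumed input shape).
fn collapsed_count(session_ids: &[&str]) -> usize {
    let mut count = 0;
    let mut prev: Option<&str> = None;
    for &id in session_ids {
        if prev != Some(id) {
            count += 1;
        }
        prev = Some(id);
    }
    count
}

/// A record qualifies only if it was referenced from at least two distinct
/// sessions and its collapsed count reaches `threshold` (hypothetical unit).
fn is_candidate(session_ids: &[&str], threshold: usize) -> bool {
    let distinct: HashSet<&str> = session_ids.iter().copied().collect();
    distinct.len() >= 2 && collapsed_count(session_ids) >= threshold
}
```

With this shape, a burst of references inside a single session never qualifies on its own, which is the behavior the spike-exclusion wording describes.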

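 #### Decision rules

 - Protection threshold: Records with an **explicit-invoke** `frequency >= 1.0 invokes/Mtoken` are exempt from drop / heavy compression (initial value 1.0, customizable in workspace settings). Residency via `model_invokation` injection is not counted (it is referenced later as a separate metric)
-- GC evaluation categories are `outdated | superseded | unused | noisy`. A single record may fall into multiple categories
+- Cleanup-phase evaluation categories are `outdated | superseded | unused | noisy`. A single record may fall into multiple categories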
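As a worked example of the protection threshold (`frequency >= 1.0 invokes/Mtoken`), the check might look like the sketch below. The function names are hypothetical, and the denominator is assumed to be the number of tokens processed over the aggregation window, which the text above does not spell out.

```rust
/// Explicit-invoke frequency in invokes per million tokens.
/// Returns `None` when no tokens were processed, since the ratio is undefined.
fn invoke_frequency(explicit_invokes: u64, tokens: u64) -> Option<f64> {
    if tokens == 0 {
        return None;
    }
    Some(explicit_invokes as f64 / (tokens as f64 / 1_000_000.0))
}

/// A record is protected from drop / heavy compression when its frequency
/// is at or above `threshold` (initial value 1.0 in the document).
fn is_protected(explicit_invokes: u64, tokens: u64, threshold: f64) -> bool {
    invoke_frequency(explicit_invokes, tokens).map_or(false, |f| f >= threshold)
}
```

For instance, 3 explicit invokes over 2,000,000 tokens gives 1.5 invokes/Mtoken, so the record is protected at the default threshold, while 1 invoke over 4,000,000 tokens gives 0.25 and is not.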

 ### File format

--- a/docs/plan/tool_dispatch.md
+++ b/docs/plan/tool_dispatch.md
@ -0,0 +1,135 @@
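+# Design of a dynamic loading mechanism for multiple tools
+
+## Context
+
+INSOMNIA must anticipate growth in the number of tools an agent handles (built-in tools + MCP servers + user-defined tools). Expanding all of them into the context upfront causes the following problems:
+
+- **Input token consumption**: 30-50 tools can consume 10-20K tokens (per Anthropic's official guide)
+- **Lower tool selection accuracy**: Beyond a few dozen tools, the model's tool selection accuracy drops
+- **KV cache efficiency**: In local inference the prefill cost is heavy, and any change to the prefix triggers recomputation
+
+Comparing the deferred tools mechanism adopted by Claude Code (`docs/ref/claude-code-deferred-tools.md`) with the OpenAI Harmony approach (`docs/ref/tool_approach_comparison.md`) shows that **the same deferred scheme does not work for every model**. An abstraction that can switch strategies per model family is needed.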
+
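+## Decisions
+
+### Two-layer separation: Registry / ContextRenderer
+
+The tool abstraction is split into the following two layers, with their responsibilities strictly separated.
+
+| Layer | Responsibility | Single source of truth |
+|---|---|---|
+| Registry | Tool implementations, schemas, name resolution, argument validation | All tools always registered |
+| ContextRenderer | Which tool definitions go into the prompt passed to the model, in what format, and where | Swapped per model strategy |
+
+**Important**: Validation is done solely by **checking the actual arguments against the registered schema on the Registry side**. Whether the schema text appears in the context is not a validation condition (confirmed by a Claude Code demonstration: `claude-code-deferred-tools.md` §10). As a result, whatever strategy the ContextRenderer uses to show schemas, the registry remains the single source of truth.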
+
+### Mapping of strategies (RenderStrategy) by model family
+
+| Model family | Strategy | Rationale |
+|---|---|---|
+| Claude family (Anthropic API + local Anthropic-compatible) | **Deferred** + ToolSearch equivalent | The XML+JSON text representation allows schema injection from tool_result as well. Stable prefix |
+| OpenAI family (gpt-oss / Harmony / Responses) | **Upfront** + MCP-style dispatcher | The namespace block is structured, so after-the-fact injection is outside the training distribution. Resolved externally through a generic dispatcher |
+| Hermes / Qwen / Llama and other independent families | **Upfront** or **Rolling Developer Message** | Follow each model's chat template, rewriting at a boundary if needed |
+
+### Axes that determine the strategy
+
+`RenderStrategy` is expressed as a combination of the following:
+
+- **Placement**: `SystemPrompt` (fixed) / `DeveloperMessage` (cache boundary) / `ToolResultStream` (Claude-style injection)
+- **Discovery**: `AlwaysVisible` / `MetaTool { search, describe }` / `Static`
+- **Cache impact on change**: `PrefixStable` / `RewindToBoundary`
+
+Claude family = `(ToolResultStream, MetaTool, PrefixStable)`; OpenAI family = `(DeveloperMessage, Static, RewindToBoundary)` or `(SystemPrompt + dispatcher, AlwaysVisible, PrefixStable)`.
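The three axes and the per-family tuples above could be encoded roughly as below. This is an illustrative sketch only: the type and function names other than the variant names quoted in the document are assumptions, `MetaTool` is flattened to a unit variant (the document gives it `search` / `describe` fields), and the two OpenAI options are modeled as separate hypothetical `ModelFamily` variants.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Placement { SystemPrompt, DeveloperMessage, ToolResultStream }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Discovery { AlwaysVisible, MetaTool, Static }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CacheImpact { PrefixStable, RewindToBoundary }

/// One point in the (placement, discovery, cache impact) design space.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RenderStrategy {
    placement: Placement,
    discovery: Discovery,
    cache_impact: CacheImpact,
}

/// Hypothetical family tags; the OpenAI family is split by chosen option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ModelFamily { Claude, OpenAiUpfront, OpenAiDispatcher }

/// Maps a model family to the tuple listed in the document.
fn default_strategy(family: ModelFamily) -> RenderStrategy {
    use CacheImpact::*;
    use Discovery::*;
    use Placement::*;
    let (placement, discovery, cache_impact) = match family {
        ModelFamily::Claude => (ToolResultStream, MetaTool, PrefixStable),
        ModelFamily::OpenAiUpfront => (DeveloperMessage, Static, RewindToBoundary),
        // "SystemPrompt + dispatcher" is reduced to SystemPrompt here.
        ModelFamily::OpenAiDispatcher => (SystemPrompt, AlwaysVisible, PrefixStable),
    };
    RenderStrategy { placement, discovery, cache_impact }
}
```

Keeping the axes as independent enums makes it possible to ask cache-related questions (for example, whether a tool-set change invalidates the prefix) without knowing which renderer is in use.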
+
+## Design details
+
+### Registry interface
+
+```rust
+pub trait ToolRegistry {
+    fn list(&self) -> Vec<ToolMeta>;            // name + description only
+    fn schema(&self, name: &str) -> Option<&ToolSchema>;
+    fn dispatch(&self, name: &str, args: Value) -> Result<ToolResult, DispatchError>;
+}
+```
+
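+- `list()` always returns all tools (strategy is the ContextRenderer's responsibility)
+- `schema()` is used on the ToolSearch-equivalent path
+- `dispatch()` validates against the schema and executes. **It does not look at whether the schema text is in the context**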
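To make the "validation depends only on the registered schema" rule concrete, here is a deliberately simplified, std-only sketch. It is not the planned `ToolRegistry` trait: `InMemoryRegistry`, `ToolFn`, and the "schema = list of required argument names" model are assumptions for illustration, and arguments are plain string maps rather than JSON values.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a tool schema: only required argument names.
struct ToolSchema {
    required: Vec<String>,
}

/// Hypothetical tool body: takes string arguments, returns text.
type ToolFn = fn(&HashMap<String, String>) -> String;

#[derive(Debug, PartialEq)]
enum DispatchError {
    UnknownTool(String),
    MissingArgument(String),
}

struct InMemoryRegistry {
    tools: HashMap<String, (ToolSchema, ToolFn)>,
}

impl InMemoryRegistry {
    fn new() -> Self {
        Self { tools: HashMap::new() }
    }

    fn register(&mut self, name: &str, required: &[&str], run: ToolFn) {
        let schema = ToolSchema {
            required: required.iter().map(|s| s.to_string()).collect(),
        };
        self.tools.insert(name.to_string(), (schema, run));
    }

    /// Validates `args` against the registered schema, then executes.
    /// Nothing here inspects what schema text the model was shown.
    fn dispatch(
        &self,
        name: &str,
        args: &HashMap<String, String>,
    ) -> Result<String, DispatchError> {
        let (schema, run) = self
            .tools
            .get(name)
            .ok_or_else(|| DispatchError::UnknownTool(name.to_string()))?;
        for key in &schema.required {
            if !args.contains_key(key) {
                return Err(DispatchError::MissingArgument(key.clone()));
            }
        }
        Ok(run(args))
    }
}
```

Because `dispatch` takes no context parameter at all, the property "validation does not depend on whether the schema was rendered" holds by construction in this sketch.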
+
+### ContextRenderer interface
+
+```rust
+pub trait ContextRenderer {
+    fn initial_render(&self, registry: &dyn ToolRegistry) -> InitialContext;
+    fn on_tool_load(&self, name: &str, registry: &dyn ToolRegistry) -> Option<ContextDelta>;
+    fn parse_call(&self, raw_output: &str) -> Result<ToolCall, ParseError>;
+    fn format_result(&self, name: &str, result: &ToolResult) -> String;
+}
+```
+
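+The implementation is swapped per strategy:
+
+- `ClaudeDeferredRenderer`: Expands only core tools into the initial prompt, keeps the `tool_search` meta-tool permanently available, and on load emits `<function>{schema}</function>` as a tool_result
+- `HarmonyUpfrontRenderer`: Expands all tools as a namespace in the developer message; has no notion of loading
+- `HarmonyDispatcherRenderer`: The namespace contains only `call_mcp(server, tool, args)`; sub-tool resolution is delegated to external MCP
+- `RollingDeveloperRenderer`: Re-renders the developer message at certain boundaries (compaction, etc.). Cache loss is absorbed at the boundary
+
+### Validation / Retry layer
+
+Failure handling for tool calls is an independent layer placed on top of ContextRenderer / Registry: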
+
+```rust
+pub struct ToolDispatcher<R: ToolRegistry, C: ContextRenderer> {
+    registry: R,
+    renderer: C,
+    retry_policy: RetryPolicy,
+}
+```
+
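+Responsibilities:
+
+1. Parse the model output (`renderer.parse_call`)
+2. Validate against the schema in the registry; if invalid, return an error tool_result
+3. Dispatch, then format the result with `renderer.format_result` and pass it to the model
+4. On malformed output, feed back an error to prompt a correction within the same turn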
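The four responsibilities can be sketched as a small parse, dispatch, format, and retry loop. Everything below is hypothetical scaffolding rather than the planned `ToolDispatcher`: the `call <name> <arg>` wire format, the single built-in `Upper` tool, and the pre-recorded list of attempts (standing in for the model re-answering after an error) exist only for this illustration, and schema validation is folded into the unknown-tool check.

```rust
#[derive(Debug, PartialEq)]
struct ToolCall {
    name: String,
    arg: String,
}

/// Step 1: parse. Hypothetical wire format: `call <name> <arg...>`.
fn parse_call(raw: &str) -> Result<ToolCall, String> {
    let mut parts = raw.trim().splitn(3, ' ');
    match (parts.next(), parts.next(), parts.next()) {
        (Some("call"), Some(name), Some(arg)) => Ok(ToolCall {
            name: name.to_string(),
            arg: arg.to_string(),
        }),
        _ => Err(format!("malformed tool call: {:?}", raw)),
    }
}

/// Steps 2-3: look up the tool and run it. Only one tool exists here.
fn dispatch(call: &ToolCall) -> Result<String, String> {
    match call.name.as_str() {
        "Upper" => Ok(call.arg.to_uppercase()),
        other => Err(format!("unknown tool: {}", other)),
    }
}

/// Formats either outcome as the tool_result text fed back to the model.
fn handle_output(raw: &str) -> String {
    match parse_call(raw).and_then(|call| dispatch(&call)) {
        Ok(result) => format!("ok: {}", result),
        Err(msg) => format!("error: {}", msg),
    }
}

/// Step 4: feed successive attempts through `handle_output`, stopping at
/// the first success or after `max_attempts` attempts.
fn run_turn<'a>(
    attempts: impl IntoIterator<Item = &'a str>,
    max_attempts: usize,
) -> Vec<String> {
    let mut feedback = Vec::new();
    for raw in attempts.into_iter().take(max_attempts) {
        let out = handle_output(raw);
        let done = out.starts_with("ok: ");
        feedback.push(out);
        if done {
            break;
        }
    }
    feedback
}
```

The point of the shape is that parse errors and dispatch errors travel the same path back to the model, so the retry policy does not need to know which renderer produced the failure.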
+
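+This corresponds to the discussion in `tool_approach_comparison.md` §4 of "what the provider side does (format conventions / validation / retry / training investment)": **because (4) does not apply to local models, (1)-(3) are built in-house**.
+
+### KV cache / prompt cache alignment
+
+The choice of strategy directly affects cache efficiency:
+
+- **PrefixStable strategy (Claude Deferred / Dispatcher pattern)**: The initial prefix is fixed. ToolSearch results and dynamic resolution through the dispatcher are appended as **tool_result at the end of the conversation**, so the leading prefix does not shift
+- **RewindToBoundary strategy (Rolling Developer)**: A change to the tool set wipes the entire cache. Synchronize it with compaction boundaries to limit the loss
+
+The Anthropic API's `prompt caching` is Explicit (cache_control), consistent with `CacheStrategy::Explicit { max_breakpoints }` in `llm_providers.md` §Prompt caching. The KV cache in local inference is basically prefix-only, so it corresponds to `Auto`. Both benefit from a "stable prefix design".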
+
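+## Rationale
+
+- **Registry vs Context separation**: A Claude Code demonstration showed that "whether the schema is in the context is irrelevant to validation" (`claude-code-deferred-tools.md` §10). The same abstraction can unify the Claude / OpenAI / local families
+- **Swappable strategies**: Harmony assumes structured tokens + namespaces, so Claude's deferred scheme does not apply directly (`tool_approach_comparison.md` §1, §2). Switching strategies per model family is unavoidable
+- **MCP-style dispatcher**: The direction OpenAI takes for MCP integration. Only a generic entry point is placed in the namespace, and sub-tool resolution is externalized. This avoids expanding everything upfront without departing from the training distribution
+- **Validation as a separate layer**: Failure modes differ between strong model-side formats (Harmony's special tokens) and weak ones (Claude's regex parsing), so validation is handled uniformly in an upper layer rather than confined to the ContextRenderer
+
+## Implementation principles
+
+- **The registry sits in a layer above llm-worker** (policy of keeping llm-worker a low-level foundation: `feedback_llm_worker_scope.md`)
+- **MCP server integration is treated as one of the registry's backends**, sharing its internal implementation with the Harmony-side dispatcher
+- **ContextRenderer selection is one-to-many with ProviderScheme**: `scheme/anthropic` → ClaudeDeferred, `scheme/openai_responses` → HarmonyUpfront, and so on. The renderer designation is listed in the provider catalog in `llm_providers.md`
+- **The ToolSearch equivalent is implemented as a MetaTool**: `__tool_search` / `__tool_describe` are registered on the registry side, and the ContextRenderer strategy decides whether to include them in the core tools
+- **Tests**: A property test guarantees that registry validation does not depend on whether the schema is present in the context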
+
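+## Out of scope
+
+- Individual MCP protocol implementations / details of server integration
+- Model-specific chat template rendering (differences among Hermes / Qwen / Llama etc. are a separate ticket)
+- Schemas for structured tool result output (citation / file reference, etc.)
+- Parallel tool execution, dependency resolution, cancellation
+- Permission management for user-defined tools (separate from sandbox.md)
+
+## References
+
+- `docs/ref/claude-code-deferred-tools.md`: Claude Code's deferred tools mechanism and its verification by demonstration
+- `docs/ref/tool_approach_comparison.md`: comparison of the Anthropic / OpenAI tool-calling approaches
+- `docs/plan/llm_providers.md`: provider abstraction and scheme / capability design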
--- a/docs/ref/claude-code-deferred-tools.md
+++ b/docs/ref/claude-code-deferred-tools.md
@ -266,6 +266,8 @@ system prompt 冒頭に置かれる:

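 When using a local LLM with Pod / insomnia, (4) does not apply. Recent local agent models are trained for tool use, so primitive processing such as XML parsing is unnecessary, but each model has its own trained format (differing across Hermes / Llama / Qwen / Mistral, etc.), and rendering must match it.

+For a comparison of the Anthropic and OpenAI tool-calling approaches (XML tags vs special control tokens, JSON Schema vs TypeScript namespace, thinking block vs three-channel separation), see [tool_approach_comparison.md](./tool_approach_comparison.md). For designs targeting local models, it is natural to draw on Claude-family insights (this document: deferred / registry / context compression) and OpenAI-family insights (Harmony: token-level guarantees / channel separation / public specification) according to their respective roles.
+
 The concrete division of responsibilities is as follows:

 - Rendering tool definitions: match the model-specific template (e.g. the `tools` extension of the chat template)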
--- a/docs/ref/tool_approach_comparison.md
+++ b/docs/ref/tool_approach_comparison.md
@ -0,0 +1,100 @@
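+# Comparison of tool-calling approaches in Anthropic Claude and OpenAI ChatGPT
+
+## Overview
+
+When running an LLM as an agent, the tool-calling mechanism (function calling / tool use) largely determines the model's practicality. ChatGPT and Claude share the fact that what is ultimately returned to the developer is a structured JSON response. Behind the scenes, however, there are clear differences in the token sequence the model actually generates, the syntax for defining tools, and the handling of the thinking process. This report organizes the two approaches from four perspectives.
+
+---
+
+## 1. Internal output format: XML-style tags vs special control tokens
+
+The most fundamental difference lies in what the model outputs as raw text.
+
+**Claude (Anthropic)** expresses tool calls with XML-style tags such as `<function_calls>`, `<invoke>`, and `<parameter>`. From the tokenizer's point of view these are ordinary text tokens, and the API server parses them with regular expressions to convert them into structured data. Anthropic's official documentation also states explicitly that this output is not treated as strict XML and is designed on the premise of regex parsing.
+
+**OpenAI (ChatGPT / gpt-oss)** adopts a response format called Harmony, which delimits message structure with **special control tokens** such as `<|start|>`, `<|message|>`, `<|channel|>`, `<|call|>`, `<|end|>`, and `<|return|>`. These look like tags, but inside the tokenizer each is handled as a single token with its own dedicated ID. In other words, the structure is embedded at the token level rather than parsed out of text afterwards.
+
+This difference reflects a difference in design philosophy. Claude takes a "text-leaning" approach that emphasizes readability and affinity with training data, while OpenAI takes a "protocol-leaning" approach that guarantees structure at the token level.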
+
+---
+
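+## 2. Tool definition syntax: JSON Schema vs TypeScript-style
+
+The way developers describe tools to the model also differs significantly.
+
+In **Claude**, tools are defined with JSON Schema. An array of objects with `name`, `description`, and `input_schema` is passed to the API, and the API server internally builds a system prompt and presents it to the model. Because it builds directly on the existing JSON Schema standard, it fits well with the rest of the JSON processing ecosystem.
+
+**OpenAI** describes tool definitions in TypeScript-style type syntax within the Harmony format. Functions are grouped in a `namespace functions { ... }` block, and each function is defined like `type get_current_weather = (_: { location: string, format?: "celsius" | "fahrenheit" }) => any;`. Comments serve as the description text.
+
+Developers can pass tools in JSON Schema form when calling the OpenAI API, but inside the server the `harmony` library converts them to TypeScript-style syntax before passing them to the model. What the model ultimately "reads" is the TypeScript-style representation. This is thought to rest on the empirical judgment that type signatures are easier to recognize for LLMs that have learned large amounts of TypeScript through code completion tasks.
+
+---
+
+## 3. Argument format
+
+How arguments are passed in a tool call is similar between the two but subtly different.
+
+**Claude** uses a hybrid scheme: strings and scalar values are written directly inside the tag, while composite types such as lists and objects are embedded as JSON. For example, simple values are plain text, as in `<parameter name="city">Tokyo</parameter>`, while arrays and nested structures are expressed in JSON.
+
+**OpenAI (Harmony)** passes all arguments together as JSON. As in `<|channel|>commentary to=functions.get_current_weather <|constrain|>json<|message|>{"location":"San Francisco"}<|call|>`, a whole JSON object is placed after `<|message|>`, and the `<|call|>` token finalizes the execution request. `<|constrain|>json` serves as a hint that the output is constrained to JSON.
+
+From an engineering standpoint, unifying on JSON makes parsing simpler and more predictable, while Claude's hybrid scheme reduces token consumption for simple values.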
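To illustrate the Harmony call layout quoted above, the sketch below splits a decoded call segment into its recipient and JSON payload. It is a text-level simplification for illustration only: in a real runtime the delimiters are dedicated token IDs rather than substrings, and `parse_harmony_call` is a hypothetical name, not part of any OpenAI library.

```rust
/// Splits a decoded Harmony tool-call segment into (recipient, payload).
/// Expects the shape `... to=<recipient> ...<|message|><payload><|call|>`.
/// Returns `None` when any of the three markers is missing.
fn parse_harmony_call(segment: &str) -> Option<(&str, &str)> {
    const MESSAGE: &str = "<|message|>";
    const CALL: &str = "<|call|>";

    // Everything before <|message|> is the header (channel, recipient, hints).
    let header_end = segment.find(MESSAGE)?;
    let header = &segment[..header_end];

    // The recipient follows "to=" and ends at whitespace or the next token.
    let to_pos = header.find("to=")? + "to=".len();
    let recipient = header[to_pos..]
        .split(|c: char| c.is_whitespace() || c == '<')
        .next()
        .filter(|r| !r.is_empty())?;

    // The payload sits between <|message|> and <|call|>.
    let body = &segment[header_end + MESSAGE.len()..];
    let payload_end = body.find(CALL)?;
    Some((recipient, &body[..payload_end]))
}
```

Applied to the example from the text, this yields the recipient `functions.get_current_weather` and the payload `{"location":"San Francisco"}`; a `final`-channel message without `to=` yields `None`.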
+
+---
+
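+## 4. Separation of the thinking process
+
+Agent models need a mechanism to separate thinking, the final response, and tool calls. Both support this, but the degree of structuring differs.
+
+**Claude** has a `thinking` block, and in Extended Thinking mode stores reasoning content in a dedicated block. It is basically a two-layer structure of "thinking" and "response".
+
+**OpenAI (Harmony)** adopts **channels**, a more clearly separated three-layer structure.
+
+- `analysis` channel: raw chain-of-thought. It has not undergone safety training and is normally not shown to users.
+- `commentary` channel: tool calls and plan-like comments that may be shown to users.
+- `final` channel: the final response for the user.
+
+This channel separation makes it easy for developers to keep the reasoning process as logs while showing users only the `final` channel.
+
+---
+
+## 5. The final response as seen by developers
+
+So far we have looked at internal differences, but the response developers receive through the API is JSON in both cases.
+
+- Claude returns tool calls as content blocks of type `tool_use`.
+- OpenAI returns tool calls as a `tool_calls` array.
+
+In other words, from the developer's perspective "send JSON, receive JSON" is the same, and the differences in internal representation do not leak into the application layer. The differences surface only in cases such as fine-tuning, building your own inference server, or observing the model's raw output while debugging.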
+
+---
+
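+## Comparison table
+
+| Aspect | Claude (Anthropic) | ChatGPT (OpenAI / Harmony) |
+|---|---|---|
+| Structural delimiters | XML-style tags (ordinary text tokens) | Special control tokens (single tokens) |
+| Parsing | Regex-based | Structured at the token level |
+| Tool definition representation | JSON Schema | TypeScript-style type syntax + namespace |
+| Argument format | Scalars as-is, composite types as JSON | All JSON |
+| Separation of thinking | thinking block (two layers) | analysis / commentary / final (three-layer channels) |
+| API response | tool_use block (JSON) | tool_calls array (JSON) |
+| Design tendency | Text-leaning, leverages existing standards | Protocol-leaning, dedicated tokens |
+
+---
+
+## Discussion
+
+The differences between the two companies' approaches reflect their respective strengths and trade-offs.
+
+Claude's approach leverages existing notations, JSON Schema and XML-style tags, giving high toolchain interoperability. On the other hand, there is a structural risk of parse failure when the generated output is "broken" (an unclosed tag, for example).
+
+OpenAI's Harmony guarantees structure at the token level through special tokens, so format breakage is less likely. Fine-grained structuring such as channel separation also comes naturally. On the other hand, because it is a proprietary protocol, the stack using the model must handle Harmony correctly, and integration with external tools sometimes requires additional implementation (in fact, problems handling Harmony tokens have been reported in TensorRT-LLM, vLLM, and others).
+
+What matters for agent developers is that these differences are normally hidden behind the API abstraction. However, when stepping into deeper work such as running open-weight models locally, fine-tuning, or debugging agent behavior, understanding the internal representation bears directly on the implementation.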
+
+---
+
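+## Notes
+
+The OpenAI-side details in this report are based mainly on the Harmony format specification published for the open-weight model `gpt-oss`. The internal representations of commercial models such as GPT-4o / GPT-5 are not fully public, but Harmony is designed to mimic OpenAI's Responses API, so commercial models presumably use a broadly similar structure.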
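
As a reading aid for the Harmony tool-call example in section 3 of the report above, here is a minimal Rust sketch that splits the decoded text form of such a message into channel, recipient, constraint, and JSON payload. It assumes the markers arrive as plain text in exactly the order shown in that example; `ToolCall` and `parse_tool_call` are hypothetical names for illustration, not part of any OpenAI library, and a real inference stack would likely work on token ids rather than strings.

```rust
/// Parsed pieces of one Harmony-style tool-call message (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ToolCall<'a> {
    pub channel: &'a str,
    pub recipient: &'a str,
    pub constraint: Option<&'a str>,
    pub payload: &'a str,
}

/// Split a raw message of the assumed shape
/// `<|channel|>CH to=RECIPIENT <|constrain|>KIND<|message|>BODY<|call|>`.
/// Returns `None` when a required marker is missing.
pub fn parse_tool_call(raw: &str) -> Option<ToolCall<'_>> {
    let rest = raw.strip_prefix("<|channel|>")?;
    let (header, rest) = rest.split_once("<|message|>")?;
    // `<|call|>` closes the message; everything before it is the payload.
    let payload = rest.strip_suffix("<|call|>")?;

    // The header may carry an optional `<|constrain|>` hint.
    let (routing, constraint) = match header.split_once("<|constrain|>") {
        Some((routing, kind)) => (routing, Some(kind.trim())),
        None => (header, None),
    };

    // Routing is "<channel> to=<recipient>".
    let (channel, recipient) = routing.trim().split_once(" to=")?;
    Some(ToolCall {
        channel: channel.trim(),
        recipient: recipient.trim(),
        constraint,
        payload,
    })
}
```

The payload is returned as an unparsed string on purpose: since all arguments are one JSON object, handing it to whatever JSON parser the host application already uses is the only remaining step.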
--- a/docs/research/zed-deltadb.md
+++ b/docs/research/zed-deltadb.md
@ -0,0 +1,104 @@
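+# DeltaDB research notes (Zed Industries)
+
+Research date: 2026-05-01
+Sources: primarily the official Zed blog post "Sequoia Backs Zed's Vision for Collaborative Coding", supplemented with secondary sources and related articles.
+
+## 1. What it is
+
+DeltaDB is an **operation-based version control system / synchronization engine** under development at Zed Industries. It was announced together with the Zed editor's Series B (led by Sequoia Capital, $32M) as the core product of the company's next phase.
+
+In one sentence: "a VCS that does not replace Git's commit-based model but keeps **every editing operation between commits** as its unit of granularity". It uses CRDTs to record and synchronize changes in real time and is designed to interoperate with Git.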
+
+> "DeltaDB ... uses CRDTs to incrementally record and synchronize changes as they happen ... designed to interoperate with Git" — Zed Blog
+
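+## 2. Motivation (the problem being solved)
+
+In coding with LLMs and AI agents, Git's granularity is seen as too coarse.
+
+- **Conversation and intent come unglued from the code**: PR comments, chat exchanges, and instructions, corrections, and pivots given to an AI lose their referent when the code changes, and the context is lost with them.
+- **Snapshots cannot follow it**: Git only has discrete points, namely commits. It cannot treat a "continuous dialogue" with an agent, or editing steps smaller than a commit, as history.
+- **Concurrent-edit conflicts**: when multiple AI agents and humans edit in real time, the traditional merge-conflict model works poorly.
+- **Durable references to code locations**: every time code moves through a refactor or rename, URL-style permalinks and pinned discussions break.
+
+These are said to overlap with the concern raised by OpenAI's Sean Grove about how to track evolving specs and prompts.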
+
+## 3. Technical building blocks
+
+### 3.1 Operation-based design (vs. snapshot-based)
+
+- Where Git has **commit = snapshot**, DeltaDB keeps **operation = one edit** as its primary data.
+- It tracks "every operation, not just commits".
+- Snapshots are positioned as derived artifacts that can be computed from the operation log (within VCS, this is the "operational" / "patch theory" lineage: the same family of ideas as darcs and Pijul).
+
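+### 3.2 Synchronization via CRDTs
+
+- Consistency under concurrent edits is handled with CRDTs (Conflict-free Replicated Data Types).
+- This extends technology already used in production in the Zed editor's multiplayer editing:
+  - **Logical location**: edits are expressed with `(insertion id, offset)` anchors rather than absolute offsets, which makes operations commutative.
+  - **Replica ID + sequence number**: once a central authority assigns a replica id, each replica can generate unique IDs independently from then on.
+  - **Tombstone**: deletion does not physically remove text; it leaves a tombstone so that logical positions keep resolving.
+  - **Lamport timestamp / vector timestamp**: respect causal order and control the visibility of concurrent insertions.
+  - **Per-user undo map**: a map of operation ID → count instead of a single stack, giving each user independent undo.
+- DeltaDB can be read as the layer that scales this in-editor CRDT to **cross-file, repository scale, and persistence**.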
+
+### 3.3 Character-level permalink
+
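+- Provides "character-level permalinks that survive any code transformation".
+- Instead of file + line number at snapshot time, a URL-like reference can be issued against a CRDT anchor (insertion-id based), so it does not break under renames, reformatting, or moves.
+- Use cases: pinning discussions, reviews, instructions to an AI, and design notes permanently to a specific character position.
+
+### 3.4 Interoperability with Git
+
+- Interop is the premise, not replacement.
+- The strategy allows incremental adoption while keeping existing Git repositories, aiming to lower the barrier to enterprise adoption.
+- Speculation: there should be a layer that can flatten operation log → Git commit and lift Git commit → operation log (the official implementation spec is not public).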
+
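+## 4. Relationship to the Zed editor
+
+- Within Zed, DeltaDB becomes the collaboration foundation for "human × human", "human × AI agent", and "AI × AI".
+- It is positioned as taking the multiplayer editing (CRDT) Zed already has and **persisting and synchronizing it in a distributed way beyond the lifetime of an editor session**.
+- It is central to Zed's plan to build an integrated GUI that combines the strengths of three families: "terminal interface, local IDE, web-based agent tool".
+- It is the means of realizing the vision of "treating the codebase as a living, navigable history".
+
+## 5. Business / licensing
+
+- Like Zed itself, an **open source + optional paid managed service** model.
+- The Series B ($32M, led by Sequoia) names DeltaDB development as one of its main purposes. Some reports put the cumulative total at around $42M.
+- Some commentary has started comparing it to GitHub as a competitor (for example, Hypeburner calls it a "GitHub Competitor"). Officially, however, the emphasis is on "interop with Git", which makes it an upper-layer strategy rather than a replacement strategy.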
+
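+## 6. Known unknowns (information not public at this time)
+
+- The concrete data model / storage format (physical representation of the log, compression, GC, tombstone reclamation, and so on).
+- Details of bidirectional conversion in the Git ↔ DeltaDB bridge (especially consistency with rebase, cherry-pick, and shallow clone).
+- The branch and merge model (patch-theory-like order-independent merging, or a Git-like branch concept on top).
+- Scalability characteristics (monorepos, huge histories, many replicas).
+- Access control and permission model (an area that is hard to retrofit given the nature of CRDTs).
+- Release timing, API specification, and whether there will be an SDK.
+
+Judgment on these is deferred until a public roadmap appears.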
+
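+## 7. Related technology / lineage
+
+- **Operation-based / patch-based VCS**: darcs, Pijul. In theory, patch theory / category-theoretic merge.
+- **Real-time collaborative editors**: Google Docs (operational transformation), Figma (CRDT-inspired), Zed multiplayer (CRDT). Yjs / Automerge are representative CRDT libraries.
+- **Durable position references**: Tree-sitter-based semantic anchors, Sourcegraph's SCIP, line tracking in `git-blame`. Because DeltaDB uses CRDT identity, it can in theory provide more robust anchors than these.
+- **AI agent collaboration foundations**: OpenAI's "evolving spec" discussion, Anthropic's Computer Use line, and the various vendors' MCP. DeltaDB aims to handle "agent ↔ code ↔ human dialogue" at the VCS layer.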
+
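+## 8. Notes on implications for our own project (insomnia)
+
+Kept as reference material (not a decision on adoption).
+
+- Where we want to tie LLM agent interaction history permanently to code (for example, "which conversation and which prompt did this fix come from"), the character-level permalink idea has room for reuse.
+- When combined with the plan to expose ScopedFs to a scripting language in the future (memory: project_scopedfs_scripting), whether to keep the agent's edit sequence as an operation log is a design decision point.
+- In the short term, taking DeltaDB itself as a dependency is not an option, but it is useful as a reference point for the design discussion of "how to hold granularity finer than a commit".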
+
+## 9. Reference links
+
+- [Sequoia Backs Zed's Vision for Collaborative Coding - Zed Blog](https://zed.dev/blog/sequoia-backs-zed): primary source
+- [How CRDTs make multiplayer text editing part of Zed's DNA - Zed Blog](https://zed.dev/blog/crdts): explanation of Zed's CRDT implementation, the basis for DeltaDB
+- [Partnering with Zed - Sequoia Capital](https://sequoiacap.com/article/partnering-with-zed-the-ai-powered-code-editor-built-from-scratch/): the investor's perspective
+- [Zed Raises $32M in Series B, Pivots to DeltaDB — Hypeburner](https://hypeburner.com/blog/news/zed-deltadb)
+- [Zed Raises $32M ... Unveils DeltaDB — Menlo Times](https://www.menlotimes.com/post/zed-raises-32-million-in-series-b-to-build-next-gen-operation-based-version-control-unveils-deltad)
+- [Zed Industries Raises $32M ... DeltaDB — CXO Digital Pulse](https://www.cxodigitalpulse.com/zed-industries-raises-32-million-to-redefine-ai-powered-code-collaboration-with-deltadb/)
+- [DeltaDB From Zed - shapeof.com (August Mueller)](https://shapeof.com/archives/2025/8/deltadb_from_zed.html): a third party's impressions
+- [Conflict-free replicated data type — Wikipedia](https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type)
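
To make the anchor and tombstone ideas from section 3.2 and 3.3 of the notes above concrete, here is a deliberately small Rust sketch. It is not Zed's or DeltaDB's implementation: it assumes one id per character (rather than `(insertion id, offset)` ranges), applies operations in a single agreed order, and omits the tie-breaking a real CRDT needs for concurrent inserts at the same position. `Doc`, `CharId`, and the method names are hypothetical.

```rust
/// Unique id of one inserted character: (replica id, per-replica sequence number).
pub type CharId = (u32, u64);

struct Cell {
    id: CharId,
    ch: char,
    deleted: bool, // tombstone: the cell stays, only its visibility changes
}

#[derive(Default)]
pub struct Doc {
    cells: Vec<Cell>,
}

impl Doc {
    /// Insert `ch` right after the cell `after` (or at the start when `None`).
    /// Returns false if the anchor id is unknown.
    pub fn insert_after(&mut self, after: Option<CharId>, id: CharId, ch: char) -> bool {
        let index = match after {
            None => 0,
            Some(anchor) => match self.cells.iter().position(|c| c.id == anchor) {
                Some(i) => i + 1,
                None => return false,
            },
        };
        self.cells.insert(index, Cell { id, ch, deleted: false });
        true
    }

    /// Mark a character as deleted without removing its cell.
    pub fn delete(&mut self, id: CharId) {
        if let Some(cell) = self.cells.iter_mut().find(|c| c.id == id) {
            cell.deleted = true;
        }
    }

    /// Visible text, skipping tombstones.
    pub fn text(&self) -> String {
        self.cells.iter().filter(|c| !c.deleted).map(|c| c.ch).collect()
    }

    /// Resolve an anchor to its current visible offset.
    /// A tombstoned anchor still resolves, to the position it used to occupy.
    pub fn resolve(&self, anchor: CharId) -> Option<usize> {
        let index = self.cells.iter().position(|c| c.id == anchor)?;
        Some(self.cells[..index].iter().filter(|c| !c.deleted).count())
    }
}
```

The point of the sketch is the last method: a reference stored as a `CharId` keeps resolving after unrelated inserts and deletes shift the numeric offset, which is the property a character-level permalink relies on.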
--- a/tickets/bash-tool.md
+++ b/tickets/bash-tool.md
@ -10,9 +10,11 @@ Integration with the Permission layer (deny/allow rules) is a prerequisite.

 - Command execution (`tokio::process::Command`)
 - Timeout (`timeout` parameter, default 120 s, max 600 s)
- Persistent working directory (the tool keeps `pwd` state internally; changeable via `cd`)
+- stateless: every call starts fresh from the workspace root. pwd is not persisted across sessions (chain as `cd <dir> && cmd`; same policy as Claude Code)
 - Combined stdout/stderr output
 - ToolOutput summary (command + exit code) + content (output text)
+- Output handling: short output (≤ 80 lines & ≤ 12 KiB) is returned inline. Beyond that, the full output is spilled to a file under `<runtime_dir>/<pod_name>/bash-output/`, and the tail 80 lines + file path are returned. This avoids the Worker's `ToolOutputLimits` (default 16 KiB) truncating the tail
+- The spill file is added to the scope with `allow(Read)`, so it can be read with the `Read` tool. On Pod exit `RuntimeDir::Drop` sweeps everything; on session end `BashTool::Drop` also deletes the files individually

 ## Relationship to Scope

@ -25,3 +27,12 @@ Bash の子プロセスは ScopedFs を経由しない。Scope による保護
 ## Dependency tickets

 - [permission-extension-point.md](permission-extension-point.md): Bash command control via deny/allow rules
+
+## Review
+- Status: Approve with follow-up (Round 2)
+- Review details: [./bash-tool.review.md](./bash-tool.review.md)
+- Date: 2026-05-01 (Round 2: 2026-05-01)
+- Round 2 remaining items:
+  - ~~Update the ticket body's "persistent working directory" wording to the stateless spec~~ (resolved)
+  - Fix the broken intra-doc link at `crates/tools/src/lib.rs:51` (handled in a separate commit)
+  - Split the TUI `render_default` regression test out into a separate ticket
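
The timeout and output-handling rules in the ticket above can be sketched as two pure functions. This is an illustration under assumptions, not the code in `bash.rs`: `effective_timeout`, `plan_output`, and `OutputPlan` are hypothetical names, the actual spill-to-file step is left out, and the sketch does not byte-cap the returned tail.

```rust
pub const INLINE_MAX_LINES: usize = 80;
pub const INLINE_MAX_BYTES: usize = 12 * 1024;
pub const DEFAULT_TIMEOUT_SECS: u64 = 120;
pub const MAX_TIMEOUT_SECS: u64 = 600;

/// What the tool would hand back to the model (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum OutputPlan {
    /// Short output: returned as-is.
    Inline(String),
    /// Long output: the caller writes the full text to a file
    /// and returns this tail together with the file path.
    Spill { tail: String },
}

/// Default 120 s, clamped to the 1..=600 s range.
pub fn effective_timeout(requested: Option<u64>) -> u64 {
    requested
        .unwrap_or(DEFAULT_TIMEOUT_SECS)
        .clamp(1, MAX_TIMEOUT_SECS)
}

/// Inline only when BOTH the line budget and the byte budget hold.
pub fn plan_output(output: &str) -> OutputPlan {
    let lines: Vec<&str> = output.lines().collect();
    if lines.len() <= INLINE_MAX_LINES && output.len() <= INLINE_MAX_BYTES {
        return OutputPlan::Inline(output.to_string());
    }
    // Keep the last 80 lines; earlier lines live only in the spill file.
    let start = lines.len().saturating_sub(INLINE_MAX_LINES);
    OutputPlan::Spill {
        tail: lines[start..].join("\n"),
    }
}
```

Checking both budgets matters because a single very wide line can exceed 12 KiB while staying far below 80 lines, which is the case the byte threshold exists for.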
--- a/tickets/bash-tool.review.md
+++ b/tickets/bash-tool.review.md
@ -0,0 +1,127 @@
+# Review: Bash tool
+
+## Round 1 (initial review)
+
+### Checking assumptions and requirements
+
+- **Command execution (`tokio::process::Command`)**: satisfied. `crates/tools/src/bash.rs:94-103` launches `bash -c <wrapped>`. `stdin(null)` prevents blocking on stdin, and `kill_on_drop(true)` prevents leaks on timeout.
+- **timeout (default 120s / max 600s)**: satisfied. `clamp(1, 600)` at `bash.rs:38-39, 64-67`, `tokio::time::timeout` at `bash.rs:130-144`. Verified by `timeout_kills_long_command`.
+- **Persistent working directory**: satisfied. Rather than parsing `cd`, it obtains the post-command `pwd` through a wrapper script + tempfile (`bash.rs:74-90`). The `cd_persists_across_calls` test confirms that `pwd` reflects the move into `subdir`. Both sides are compared after `canonicalize`, so it also tolerates the macOS `/private/tmp` mismatch.
+- **stdout/stderr merge**: satisfied. Implemented with `exec 2>&1` inside the wrapper; the `merges_stdout_and_stderr` test confirms both are included. `stderr(Stdio::null())` on the child process side is consistent with this.
+- **`ToolOutput` summary (command + exit code) + content (output)**: satisfied. `bash.rs:164-175` distinguishes exit 0 / non-zero / signal. Empty content is returned as `None`, a good implementation that takes `SUMMARY_THRESHOLD` into account.
+
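+### Architecture and scope
+
+- **Layer separation**: contained within the `tools` crate, consistent with the policy of keeping `llm-worker` a low-level foundation (`bash.rs:20` depends only on the `Tool` trait). It is simply added to the factory list in `builtin_tools()`, with no intrusion across layers.
+- **Crate naming/structure**: `bash.rs` is split out as its own module, and `lib.rs` exposes only `pub use bash::bash_tool`. Consistent with `read.rs/write.rs/...`.
+- **Added dependencies**: `process`/`time`/`io-util`/`sync` were added to the tokio features in `Cargo.toml` (`Cargo.toml:22`). `tempfile` was already present. The field additions look like what `cargo add` would produce; nothing out of place.
+- **Relationship to the Permission layer**: as the ticket assumes, protection is delegated to the Permission layer rather than ScopedFs. The doc comment at `lib.rs:18-19` states this explicitly, so the design intent reaches the reader.
+- **Design decision 1 (obtaining pwd via the wrapper)**: reasonable, since it avoids the fragility of parsing `cd` (subshells, variable expansion, `cd` inside function definitions, and so on). If `exec` replaces bash itself the wrapper does not run, but `bash.rs:149-155` fails soft ("if the file cannot be read, keep pwd as is"), which is robust.
+- **Design decision 2 (`wait` in the wrapper)**: a fix the implementation requires for jobs like `(sleep 0.05; echo bg) &`, where stdout never reaches EOF and the call hangs. `background_job_does_not_hang` guards against regression.
+- **Design decision 3 (serializing with `tokio::sync::Mutex`)**: correct given the shared mutable pwd state and the semantics of an "ordered shell session". Holding the lock for the duration of a long-running command is natural for the spec (bash within one session is serial anyway).
+- **Design decision 4 (256KB cap)**: a second line of defense that limits OOM ahead of the worker-side `ToolOutputLimits`. The output is converted to UTF-8 with `String::from_utf8_lossy` after the truncated marker is appended, so a split multibyte character is not lossless but does not panic. Reasonable.
+- **Design decision 5 (summary/content)**: the API shape matches the existing tools. The `SUMMARY_THRESHOLD` boundary is also taken into account.
+- **Design decision 6 (prompt steering in the description)**: the wording that prioritizes Read/Write/Edit/Glob/Grep is consistent with the Claude Code reference and concise enough to work well with local models too.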
+
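+### Findings
+
+#### Non-blocking / Follow-up
+
+- **On bundling the TUI `render_default` fix** (`crates/tui/src/tool.rs:590-619`)
+  - The change itself is a correct bug fix. It resolves the state where a generic tool like Bash showed only the summary even in Detail mode.
+  - Strictly speaking, though, it is outside the scope of the Bash ticket (any existing tool on the "default path" should have had the same problem). On whether bundling is justified: given that adding Bash exposed the bug, that the fix is a replacement of about 5 lines, and that Bash alone would be incomplete as UX, it is a pragmatic call. Next time a similar situation comes up, cutting a separate ticket as a TUI display-spec fix would give a cleaner review unit; that is the level of concern.
+  - Follow-up suggestion: adding one snapshot/unit test under `crates/tui/` that confirms rendering including `output` appears correctly in Detail/Normal would prevent unintended regression toward the `summary` fallback in the future (currently guaranteed only by logic review).
+
+- **Addition to `docs/ref/claude-code-deferred-tools.md`**: adds a literature reference (a mention of the Anthropic vs OpenAI comparison) that is not directly related to the Bash implementation. It is one paragraph and minor, but outside the ticket scope. Next time, documentation updates should also go in a separate commit or ticket.
+
+- **Observation on the robustness of pwd updates (`bash.rs:149-155`)**: when the user command replaces bash with `exec some-program`, or when the wrapper's `pwd > tempfile` fails because of a full disk or similar, pwd is left unchanged. This is reasonable for the spec, but from the user's point of view it may look as if "the `cd` disappeared after `cd foo && exec bar`". The comment explains why the current behavior is reasonable, so this is not blocking, but it is worth revisiting as an edge case when the Permission layer is introduced.
+
+#### Nits
+
+- The `timeout` field of `BashParams` is `Option<u64>` with `#[serde(default)]`, but serde automatically maps a missing `Option` to `None`, so `#[serde(default)]` is redundant (harmless).
+- `let mut child = child; let mut stdout = stdout;` at `bash.rs:111-112` only rebinds as mutable inside the `async move` block. Idiomatic, though `let mut` could also be written at the parameter site. A style difference.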
+
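+### Verdict
+
+**Approve with follow-up**: the ticket requirements are fully met, and every design decision has a reasonable explanation. Test coverage (8 unit + 1 integration) is also adequate. The bundled TUI fix is a correct fix for a bug with real impact, but it oversteps into what should be a separate ticket, so adding a regression test should stay as a follow-up for next time.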
+
+---
+
+## Round 2 (re-review: 2026-05-01)
+
+### Main changes
+
+1. 256KB byte-cap → **when output exceeds 80 lines / 12 KiB, spill to `<runtime_dir>/<pod_name>/bash-output/bash-XXX.log` and return the tail 80 lines + path**
+2. To make spill files readable with Read, `Scope::deny_rules()` was added as the counterpart of `allow_rules()`. The controller now follows the memory approach (build a `ScopeConfig` and rebuild with `Scope::from_config`)
+3. **pwd persistence removed** (made stateless). Every call starts from the workspace root. The tokio `sync` feature was dropped. The test was inverted from `cd_persists_across_calls` to `cd_does_not_persist_across_calls`
+4. The TUI `render_default` fix is bundled unchanged from last time (still no regression test)
+
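+### Re-checking assumptions and requirements (spec changes)
+
+- **The "persistent working directory" requirement was withdrawn**: the ticket body (`tickets/bash-tool.md:13`) still says "Persistent working directory (the tool keeps `pwd` state internally; changeable via `cd`)". The implementation has switched to stateless, so the requirement text and the implementation have diverged. I agree that stateless is the right call for Claude Code compatibility, but **the missing ticket-body update needs fixing from a ticket-lifecycle standpoint** (it falls under the `b. 詳細化や前提の変化` step, i.e. refinement or a change in assumptions).
+- **stdout/stderr merge and timeout/exit handling**: unchanged. Covered by `merges_stdout_and_stderr` / `nonzero_exit_is_reported` / `timeout_kills_long_command`.
+- **Handling of long output**: the new spec (80 lines / 12 KiB thresholds, tail return + spill-path notice) is communicated to the model in the description (`bash.rs:42-45`). `long_output_spills_and_returns_tail` / `wide_short_output_still_spills_when_byte_budget_exceeded` verify both the line-based and byte-based thresholds.
+- **Visibility of spill files**: `bash_spilled_file_is_readable_via_read_tool` (integration:353) verifies end to end that "a file written via spill can be read with the Read tool". A coherent redesign in response to the Worker's 16 KiB cap dropping the tail.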
+
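+### Verifying the design decisions
+
+#### (1) Two-layer cleanup (BashTool::Drop + RuntimeDir::Drop)
+
+- **Appropriate.** The division of roles is clear:
+  - `BashTool::Drop` (`bash.rs:92-100`): at session end, deletes the spill files that session accumulated. It sits on the path where worker/tools are dropped on the controller.
+  - `RuntimeDir::Drop` (`runtime/dir.rs:103-107`): at pod exit, wipes the whole `bash-output/` directory. It acts as a sweeper alongside `SocketServer`, status/history, and so on.
+- The need for two stages also holds up: with BashTool alone, "leftovers from a crash or interruption where Drop never ran" could linger, but the RuntimeDir wipe is the last safety net. Conversely, with RuntimeDir alone there is a risk that "bash-output bloats when many sessions keep being started within the same pod", but BashTool::Drop trims that.
+- It is also good that the design comment (`bash.rs:86-88`) states explicitly that this is "deferred cleanup at session end".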
+
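+#### (2) Adding `Scope::deny_rules()` and unifying on the memory approach
+
+- **Natural as a symmetry.** `allow_rules()` (`scope.rs:148-157`) and `deny_rules()` (`scope.rs:165-174`) are exact twins, so the projections from ResolvedRule to ScopeRule now come as a pair.
+- Withdrawing the originally considered `Scope::with_extra_read(self, PathBuf) -> Self` was the right call. Single-purpose `with_extra_*` APIs risk multiplying with every path addition, and they are less expressive than a round trip through `ScopeConfig`. Adding just one general-purpose accessor avoids growing the API unnecessarily (in line with YAGNI).
+- The way the controller assembles it (`controller.rs:245-255`) has the same shape as memory's `build_scope_with_memory` (`pod.rs:2030-2037`), and the comment (`controller.rs:236-237`) states that it is "the same approach memory takes", so the intent is easy to follow.
+- In symmetry with `Scope::summary()` (`scope.rs:198-230`) not displaying deny rules, `deny_rules()` explicitly allows programmatic access, so the intent of the API design is consistent.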
+
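+#### (3) Stateless design + the immutable `BashTool.cwd: PathBuf` field
+
+- **Going stateless is itself the right call.** Aligning with the Claude Code spec makes sense and brings several benefits: (a) the tokio `sync` feature is dropped and concurrent execution becomes possible, (b) the pwd marker tempfile and the `pwd > file` line disappear from the wrapper, simplifying it, and (c) pwd-extraction edge cases such as subshells and `cd` inside function definitions go away.
+- **Validity of the immutable `cwd: PathBuf` field**: the current implementation is sufficient. On the question of "should it hold `fs` and consult `fs.pwd()` every time", I judge that "**the current snapshot is fine**" for the following reasons:
+  - `ScopedFs::pwd()` is an immutable value for the pod lifetime and does not change after construction with `ScopedFs::new(scope, pwd)`. Holding `fs` as a field would return the same value every time.
+  - `BashTool` does not touch fs directly for child processes (it bypasses ScopedFs), so it has no reason to reference `fs`. Only pwd is needed.
+  - Carrying an unnecessary field adds lifetime and clone cost. `PathBuf` is simple and readable.
+  - If there were a future plan to route `Bash` through ScopedFs, holding `fs` would become necessary, but that does not arise unless the ticket's premise (delegation to the Permission layer) is overturned.
+- The one thing worth noting: although the comment at `bash.rs:78-81` describes it as "a snapshot of `ScopedFs::pwd()`", the factory (`bash.rs:330-344`) only calls `fs.pwd().to_path_buf()` and discards `fs` itself, so if `fs` is ever needed for Permission-layer integration, the API shape will change. No need to worry about it now.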
+
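+#### (4) Handling of the TUI fix
+
+- **Exactly the same as last time. No regression test added either.** The first follow-up from last time, "one snapshot/unit test under `crates/tui/`", has not been addressed.
+- Verdict: **deferring it (to a separate ticket) is appropriate**. Since this round's additional changes do not touch the TUI, making it blocking again would be excessive. Closing the Bash ticket and splitting the TUI `render_default` spec check into a separate ticket keeps the history cleaner.
+- That said, "promising to open a separate ticket" is flaky in practice, so I recommend making either a one-line addition to `TODO.md` (e.g. `- Regression test for output display in TUI tool render_default`) or the creation of a dedicated ticket part of the Round 2 completion criteria.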
+
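+### New findings
+
+#### Blocking
+
+None.
+
+#### Non-blocking / Follow-up
+
+- **Ticket body diverges from the implementation (`tickets/bash-tool.md:13`)**: it says "persistent working directory", but the implementation is stateless. This was deliberately withdrawn in Round 2, so the ticket body should be updated before moving to completion. A possible replacement wording:
+  - `Working directory: every call starts from the workspace root (stateless, Claude Code compatible). To work in multiple directories, chain with "cd <dir> && cmd"`
+- **Broken intra-doc link at `crates/tools/src/lib.rs:51`**: `[\`manifest::Scope::with_extra_read\`]` is still there. `with_extra_read` was withdrawn and does not exist. `cargo doc -p tools --no-deps` emits a warning: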
+  ```
+  warning: unresolved link to `manifest::Scope::with_extra_read`
+   --> crates/tools/src/lib.rs:51:12
+  ```
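+  The only real impact is a doc-generation warning, but a lingering documentation reference to a withdrawn API invites future confusion. Rewriting it to something like `(see [\`manifest::Scope::deny_rules\`] / [\`Scope::from_config\`])`, or removing the link and keeping only the prose, would be appropriate.
+- **TUI `render_default` regression test not yet added**: carried over from Round 1. As noted above, a separate ticket is recommended.
+- **Out-of-scope addition to `docs/ref/claude-code-deferred-tools.md`**: same as Round 1. No new deviation in Round 2.
+- **Robustness note on `exec` (carried over from Round 1)**: going stateless removed the pwd-update problem, so the Round 1 observation that "pwd disappears with `cd foo && exec bar`" is **resolved**. It only needs to be kept in mind as an edge case when the Permission layer is introduced.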
+
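+#### Nits
+
+- **`#[serde(default)]` on `BashParams.timeout`**: redundant as in Round 1 (harmless).
+- **Early-return paths on child errors at `bash.rs:151-163`**: `output_path` cleanup is duplicated in 3 places (`spawn` failure, `wait` failure, and timeout with file size 0). They could be folded into one helper, but the error handlers branch differently (exiting via `?` or not), so there is no need to force unification. As long as the style is consistent, this is fine.
+- **`shell_single_quote`** (`bash.rs:319-322`): in practice the output path reaching this point is an alphanumeric path generated by tempfile, so it will almost never contain `'`. Even so, it is a correct defensive implementation at zero cost.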
+
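+### Verdict
+
+**Approve with follow-up**: the main Round 2 changes (spill, `deny_rules()`, going stateless, two-layer cleanup) are all sound as design, and behavior is backed by test coverage (unit 10 / integration 14 / edge_cases 9 all pass, `cargo check --workspace` clean, TUI 55/55 pass). The remaining work is three items: (a) update the ticket body wording to match stateless, (b) fix the broken doc link at `lib.rs:51`, and (c) split the TUI regression test into a separate ticket. All are minor, and if (a) and (b) are handled as conditions for completing Round 2, this ticket itself can be closed.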
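
The review above mentions a `shell_single_quote` helper and the `cd <dir> && cmd` chaining convention without showing either. The following is a sketch of the conventional POSIX single-quoting technique, written under the assumption that the helper does something equivalent; the body here is not taken from `bash.rs`, and `chain_in_dir` is a hypothetical helper added only to show the two ideas together.

```rust
/// Wrap `raw` in single quotes for a POSIX shell.
/// Inside single quotes nothing is special except `'` itself, so each `'`
/// is emitted as: close quote, escaped quote, reopen quote.
pub fn shell_single_quote(raw: &str) -> String {
    let mut quoted = String::with_capacity(raw.len() + 2);
    quoted.push('\'');
    for ch in raw.chars() {
        if ch == '\'' {
            quoted.push_str("'\\''");
        } else {
            quoted.push(ch);
        }
    }
    quoted.push('\'');
    quoted
}

/// Build a one-shot command line for a stateless tool:
/// the directory change and the command travel in the same call.
/// (Hypothetical helper; `command` is passed through unquoted on purpose.)
pub fn chain_in_dir(dir: &str, command: &str) -> String {
    format!("cd {} && {}", shell_single_quote(dir), command)
}
```

Quoting only the directory keeps paths with spaces or quotes safe while leaving the command itself free to use pipes and redirections.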
--- a/tickets/memory-gc.md
+++ b/tickets/memory-gc.md
@ -1,61 +0,0 @@
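-# Memory mechanism: GC (periodic re-evaluation)
-
-## Background
-
-Implementation of `docs/plan/memory.md` §GC. Phase 2 leans toward information consolidation, but declining importance, proliferating similar slugs, lingering `replaced` records, accumulating sources, and inconsistencies with the current state still remain; this is the periodic re-evaluation path that cleans them up. No offer is made to humans, and results are verified via git diff.
-
-The protection threshold depends on the explicit invoke frequency from the usage-frequency metrics.
-
-## Requirements
-
-### Trigger
-
- Periodic execution (cumulative input token basis recommended; concrete values tuned via settings; left to implementation)
-
-### Executor and tools provided
-
- A GC Agent is spawned
- The same general-purpose CRUD + search tools + post-write Linter Hook as Phase 2
- Input: records targeted for GC + Linter Warn + usage-frequency metrics + `replaced` chain + excess-sources information
-
-### Operation granularity
-
- File level: drop a whole file / merge multiple files / split one file
- Within a file: delete or compress sections and bullets, trim old `sources` entries
-
-### Evaluation categories
-
-`outdated`, `superseded`, `unused`, and `noisy` are used to classify GC reasons. No blanket "stale" flag is put on records. These categories make it possible to explain which of drop / merge / split / trim / rewrite was chosen.
-
-### Decision rules
-
- Protection threshold: records with **explicit invoke** `frequency >= 1.0 invokes/Mtoken` are exempt from drop / heavy compression
-  - Initial value 1.0, customizable in workspace settings
-  - Residency through `model_invokation` injection is not counted (referenced only as a separate indicator)
- A single record may fall into multiple categories
- Direct deletion is allowed, but merge / trim is preferred for cases prone to misjudgment (steered by the prompt)
-
-### prompt
-
- Follows `docs/plan/memory-prompts.md` §GC prompt
-
-## Out of scope
-
- Audit LLM layer (future consideration)
- Vector index / FTS5 adoption (future consideration; does not affect GC decisions)
- Making Workflow a GC target (not touched initially)
-
-## Completion criteria
-
- The GC Agent runs on the trigger and unneeded records are cleaned up
- Similar-slug proliferation and the like detected by Linter Warn converge together through GC
- Records above the protection threshold are excluded from drop / heavy compression
- Replacements remain as `status: replaced` + `replaced_by`, distinguishable from direct deletion
- The reason for each drop / merge / split / trim / rewrite is readable from git diff
-
-## References
-
- `docs/plan/memory.md` §GC / §Usage-frequency metrics / §Decision rules
- `docs/plan/memory-prompts.md` §GC prompt
- `tickets/memory-phase2-consolidation.md` (shared tool configuration)
- `tickets/memory-usage-metrics.md` (dependency for the protection threshold)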
--- a/tickets/memory-phase2-consolidation.md
+++ b/tickets/memory-phase2-consolidation.md
@ -1,17 +1,19 @@
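-# Memory mechanism: Phase 2 consolidation
+# Memory mechanism: Phase 2 consolidation + cleanup

 ## Background

-Implementation of `docs/plan/memory.md` §Phase 2. Taking the staging activity log + existing `memory/*` + the Knowledge candidate report as input, the consolidation Worker integrates them agentically using general-purpose CRUD + search tools + the Linter Hook. The pod that finished Phase 1 spawns it, and concurrent runs are prevented by a progress file under staging.
+Implementation of `docs/plan/memory.md` §Automation mechanism / §Handling of cleanup (GC equivalent). Taking the staging activity log + existing `memory/*` + the Knowledge candidate report + cleanup material as input, the consolidation Worker runs the **consolidation phase (writing the activity log into memory/knowledge)** and the **cleanup phase (drop / merge / split / trim / rewrite of existing records)** in order within one session. The pod that finished Phase 1 spawns it, and concurrent runs are prevented by a progress file under staging.

-New Knowledge creation is limited to "cases derived from a source listed in the candidate report". While the usage-frequency metrics (the aggregation source for the candidate report) are incomplete, the report acts as empty input, and Phase 2 performs only decisions / requests / summary / existing Knowledge updates.
+The cleanup phase has no trigger separate from Phase 2. The same Worker and the same tool surface suffice, so no separate Agent or spawn path is provided.
+
+New Knowledge creation is limited to "cases derived from a source listed in the candidate report". While the usage-frequency metrics (the aggregation source for the candidate report and the protection threshold) are incomplete, the report and cleanup material act as empty input; the consolidation phase performs only decisions / requests / summary / existing Knowledge updates, and the cleanup phase only looks at Linter Warn, the `replaced` chain, and sources accumulation (drop suppression by the protection threshold is enabled once the metrics are complete).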

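 ## Requirements

 ### Trigger

 - Cumulative file count or bytes threshold in staging (tuned via settings)
- Always flush when compact fires (so that raw data lost to compact is not dropped)
+- No synchronous firing on compact (raw data is preserved by Phase 1 running before compact)

 ### Executor and input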

@ -21,6 +23,7 @@ Knowledge 新規作成は「候補レポート掲載の source から派生す
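   - staging entries for the consumed IDs (activity log + `source`)
   - Full text of existing `memory/*` (summary / decisions / requests)
   - Knowledge candidate report (deliverable of the metrics ticket; empty while incomplete)
+  - Cleanup material (usage-frequency metrics, Linter Warn, `replaced` chain, excess-sources information; while the metrics are incomplete, only Linter Warn / `replaced` / sources)
   - Existing `knowledge/*` is not embedded in the prompt; the agent retrieves it through the Knowledge search tool

 ### Tools provided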
@ -38,12 +41,23 @@ Knowledge 新規作成は「候補レポート掲載の source から派生す

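 ### Processing

+#### Consolidation phase
+
 - Add new decisions / requests as one file per item; `sources` is copied from the staging `source` (not LLM inference)
 - Create new Knowledge derived from the activity log, or patch existing Knowledge. **New creation is limited to sources listed in the candidate report**
 - Rewrite the summary as needed (1-5k tokens as a guide)
- Deletion is recorded as a replacement with `status: replaced` + `replaced_by: <slug>`; no direct deletion
+- Decision replacement uses `status: replaced` + `replaced_by: <slug>`; no direct deletion
 - Write targets: `memory/*`, `knowledge/*`. `memory/workflow/` is rejected by the Linter

+#### Cleanup phase (runs with spare capacity after the consolidation phase completes)
+
+- Classify existing records by the evaluation categories `outdated | superseded | unused | noisy`
+- Operation granularity is file level (drop / merge / split) and within a file (deleting sections and bullets, trimming old `sources` entries)
+- **Protection threshold**: records with explicit invoke `frequency >= 1.0 invokes/Mtoken` are exempt from drop / heavy compression (initial value 1.0, customizable in workspace settings; while the metrics are incomplete, the threshold check is skipped and behavior stays conservative)
+- A single record may fall into multiple categories
+- Direct deletion is allowed, but merge / trim is preferred for cases prone to misjudgment (steered by the prompt)
+- Similar-slug proliferation / excess sources / lingering `replaced` records detected by Linter Warn converge here
+
 ### Concurrency prevention

 - One file under staging (Pod identifier + consumed ID list)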
@ -59,28 +73,30 @@ Knowledge 新規作成は「候補レポート掲載の source から派生す

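 ### prompt

- Follows `docs/plan/memory-prompts.md` §Common principles / §Phase 2: consolidation prompt / §Phase 2: Knowledge writing prompt
+- Follows `docs/plan/memory-prompts.md` §Common principles / §Phase 2: consolidation + cleanup prompt / §Phase 2: Knowledge writing prompt

 ## Out of scope

- Aggregation of usage-frequency metrics and the Knowledge candidate report (separate ticket; runs with an empty report while incomplete)
- GC (separate ticket)
+- Aggregation of usage-frequency metrics and the Knowledge candidate report / protection threshold (separate ticket; runs with empty input while incomplete)
 - Workflow-related offers (separate ticket; the Notification path comes first)
 - Audit LLM layer for detecting semantic breakage (future consideration)

 ## Completion criteria

- Phase 2 consolidates the activity log that Phase 1 left in staging into `memory/*` / `knowledge/*`
+- The consolidation phase consolidates the activity log that Phase 1 left in staging into `memory/*` / `knowledge/*`
+- After the consolidation phase completes, the cleanup phase continues within the same agent session and cleans up existing records
 - On a Linter violation the turn comes back and the sub-Worker corrects itself
 - The concurrency-prevention file works as intended and prevents duplicate launches of multiple Phase 2 runs
 - Coalesce carries additions made during a run over to the next run
- Phase 2 is flushed when compact fires
- Works without creating new Knowledge even with an empty report (only decisions / requests / summary / existing Knowledge updates)
+- Works without creating new Knowledge even with an empty report (consolidation performs only decisions / requests / summary / existing Knowledge updates)
+- The cleanup phase records the reason (evaluation category) for each drop / merge / split / trim / rewrite in a form readable from git diff
+- Replacements remain as `status: replaced` + `replaced_by`, distinguishable from direct deletion

 ## References

- `docs/plan/memory.md` §Phase 2 / §Principles for the Phase 2 agent / §Relationship to Compact
+- `docs/plan/memory.md` §Automation mechanism / §Handling of cleanup (GC equivalent) / §Relationship to Compact
 - `docs/plan/memory-prompts.md` §Phase 2 related
 - `tickets/memory-file-format.md` (Linter)
 - `tickets/memory-search-tools.md` (search tools)
 - `tickets/memory-phase1-extract.md` (produces staging)
+- `tickets/memory-usage-metrics.md` (supplies the candidate report / protection threshold)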
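
The protection-threshold rule in the cleanup phase above reduces to a small calculation: explicit invokes normalized per million cumulative input tokens, compared against a configurable threshold (initial value 1.0). The sketch below is an assumption about how that could look; `invokes_per_mtoken`, `classify`, and `Protection` are hypothetical names, and the `Unknown` case models the ticket's "skip the threshold check and stay conservative while metrics are incomplete".

```rust
pub const DEFAULT_PROTECT_THRESHOLD: f64 = 1.0;

/// Explicit invokes per million cumulative input tokens.
/// Returns `None` when no tokens have been counted yet (metrics unavailable).
pub fn invokes_per_mtoken(explicit_invokes: u64, cumulative_input_tokens: u64) -> Option<f64> {
    if cumulative_input_tokens == 0 {
        return None;
    }
    Some(explicit_invokes as f64 * 1_000_000.0 / cumulative_input_tokens as f64)
}

#[derive(Debug, PartialEq)]
pub enum Protection {
    /// At or above the threshold: exempt from drop / heavy compression.
    Protected,
    /// Below the threshold: cleanup may drop, merge, or trim it.
    Eligible,
    /// No metrics yet: the threshold check is skipped; the caller should be conservative.
    Unknown,
}

pub fn classify(frequency: Option<f64>, threshold: f64) -> Protection {
    match frequency {
        None => Protection::Unknown,
        Some(f) if f >= threshold => Protection::Protected,
        Some(_) => Protection::Eligible,
    }
}
```

Keeping `Unknown` distinct from `Eligible` avoids treating "not measured yet" as "never used", which would otherwise let the cleanup phase drop records before the metrics ticket lands.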
--- a/tickets/memory-usage-metrics.md
+++ b/tickets/memory-usage-metrics.md
@ -2,7 +2,7 @@

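 ## Background

-Implementation of `docs/plan/memory.md` §Usage-frequency metrics. Invoke measurement hooks are added inside the memory search tool / Knowledge search tool, and frequency is computed normalized by cumulative input tokens rather than by time. Used by both the Phase 2 Knowledge-creation gate and the GC protection threshold.
+Implementation of `docs/plan/memory.md` §Usage-frequency metrics. Invoke measurement hooks are added inside the memory search tool / Knowledge search tool, and frequency is computed normalized by cumulative input tokens rather than by time. Used by both the Knowledge-creation gate in the Phase 2 consolidation phase and the protection threshold in the Phase 2 cleanup phase.

 ## Requirements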

@ -36,12 +36,12 @@

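 ### Consumers

- Pass the candidate report as input to the Phase 2 Worker
- Protection-threshold check by the GC Agent (explicit invoke frequency >= 1.0 invokes/Mtoken)
+- Pass the candidate report as input to the consolidation phase of the Phase 2 Worker
+- Used for the protection-threshold check (explicit invoke frequency >= 1.0 invokes/Mtoken) in the cleanup phase of the Phase 2 Worker

 ## Out of scope

- The GC implementation itself (separate ticket; this ticket goes only as far as providing the metrics needed for the protection-threshold check)
+- The Phase 2 cleanup phase implementation itself (in `memory-phase2-consolidation`; this ticket goes only as far as providing the metrics needed for the protection-threshold check)
 - Automatic ON/OFF decision logic for `model_invokation` (future consideration)
 - Automatic exclusion of shallow requests (future consideration)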

@ -57,4 +57,4 @@

 - `docs/plan/memory.md` §Usage-frequency metrics / §Decision rules / §retrieval path
 - `tickets/memory-search-tools.md` (hook insertion point)
- `tickets/memory-phase2-consolidation.md` (consumer)
+- `tickets/memory-phase2-consolidation.md` (consumer in both the consolidation and cleanup phases)
--- a/tickets/workflow.md
+++ b/tickets/workflow.md
@ -4,7 +4,7 @@

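 Make the "constrained, enforced work flow" defined in `docs/plan/workflow.md` invocable via `/<slug>`. By having a path that can inject Knowledge (`#<slug>`) as a dependency, procedural capabilities are fixed into reusable units.

-It can start independently of the memory mechanism (`docs/plan/memory.md`): Workflows are created only by humans writing them or via a consolidation offer, and automatic writes are prohibited, so it does not depend on Phase 2 / GC as a prerequisite. It is mutually dependent with the Knowledge resolver as the inject path for `requires`.
+It can start independently of the memory mechanism (`docs/plan/memory.md`): Workflows are created only by humans writing them or via a consolidation offer, and automatic writes are prohibited, so it does not depend on Phase 2 as a prerequisite. It is mutually dependent with the Knowledge resolver as the inject path for `requires`.

 agent-skills (agentskills.io format) reuses this ticket's ingest path and becomes a consumer that loads them as Workflows (see `tickets/agent-skills.md`).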
Author	SHA1	Message	Date
Hare	0070aabd26	docs: memory system spec changes, plus notes on dynamic tools and VCS	2026-05-01 18:47:52 +09:00
Hare	3e2c9ee32b	bash tool done for now	2026-05-01 18:47:09 +09:00
Hare	97f9b14ceb	implement bash tool	2026-05-01 18:14:13 +09:00