llm_worker_rs/docs/research/2026-01-02-llm-streaming.md

# 2026-01-02 LLM Streaming & Hooks Research

## Summary Table
| Topic | Key Takeaways | Sources |
| --- | --- | --- |
| Fine-grained tool streaming | Anthropic beta header `fine-grained-tool-streaming-2025-05-14` streams tool parameters without intermediate JSON validation; reduces latency but may emit invalid/partial JSON that callers must sanitize. | Anthropic Docs – Fine-grained tool streaming (https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/fine-grained-tool-streaming) [turn1search0]; AWS Bedrock Anthropic tool-use reference (https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-tool-use.html) [turn1search5]; Anthropic release notes (https://docs.anthropic.com/en/release-notes/api) [turn1search3]
| Streaming SSE events | Anthropic SSE stream emits typed events (`message_start`, `content_block_start`, `content_block_delta`, etc.) that clients must map to internal state machines for deterministic playback. | Anthropic Streaming Messages (https://docs.anthropic.com/en/docs/build-with-claude/streaming) [turn1search6]
| Tool streaming ergonomics | LangChain Anthropic integration exposes `betas=["fine-grained-tool-streaming-2025-05-14"]` and warns about invalid JSON, reinforcing need for resilient parsers. | LangChain Anthropic integration guide (https://docs.langchain.com/oss/python/integrations/chat/anthropic) [turn1search8]
| Hook architectures | Claude Code hook lifecycle (SessionStart, UserPromptSubmit, Tool Use, etc.) keeps hooks non-blocking, context-injecting, and failure-isolated—useful template for worker hooks/routers. | Claude-Mem hook architecture overview (https://docs.claude-mem.ai/hooks-architecture) [turn0search0]; Claude Blog – configuring hooks (https://claude.com/blog/how-to-configure-hooks) [turn0search2]

## Detailed Notes

### Fine-grained Tool Streaming
- Anthropic’s beta header `fine-grained-tool-streaming-2025-05-14` enables parameter streams, shrinking first-byte latency from ~15s to ~3s in their example. Clients must prepare for partial JSON and wrap invalid payloads before echoing them back. [turn1search0]
- AWS Bedrock mirrors the same header, confirming availability on Claude Sonnet 4.5/4 and Opus 4. Their docs explicitly caution about invalid/partial JSON and show the request schema. [turn1search5]
- Anthropic’s June 11, 2025 release notes document the beta launch, signaling freshness and likely API stability requirements. [turn1search3]

### SSE Event Model
- Anthropic exposes SSE event names plus JSON `type` fields; implementers should parse events like `content_block_start`, `content_block_delta`, and `message_stop` to drive a deterministic timeline. This justifies a dedicated timeline router/state machine as in the spec. [turn1search6]

### Tool Streaming Ergonomics in SDKs
- LangChain’s Anthropic integration demonstrates how third-party SDKs surface the beta header and reiterates the need for error handling when incomplete JSON arrives because `max_tokens` can stop a stream mid-parameter. This informs library-level abstractions for `scheme` and `llm_client` layers. [turn1search8]

### Hook / Lifecycle Design Patterns
- Claude-Mem’s architecture treats hooks as lifecycle-triggered closures that must stay non-blocking, degrade gracefully, and respect security constraints (frozen configs, permission prompts). This maps closely to the proposed `Tools/Hooks` system. [turn0search0]
- Claude’s official hook configuration guide enumerates eight hook types (PreToolUse, PostToolUse, PermissionRequest, SessionStart, Stop, etc.) and their contexts, reinforcing the need for a trait/macro system to statically describe hooks and route events. [turn0search2]