llm_worker_rs/docs/research/2026-01-02-llm-streaming.md
2026-01-05 23:03:48 +09:00

2026-01-02 LLM Streaming & Hooks Research

Summary Table

| Topic | Key Takeaways | Sources |
| --- | --- | --- |
| Fine-grained tool streaming | Anthropic beta header fine-grained-tool-streaming-2025-05-14 streams tool parameters without intermediate JSON validation; reduces latency but may emit invalid/partial JSON that callers must sanitize. | Anthropic Docs Fine-grained tool streaming (https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/fine-grained-tool-streaming) [turn1search0]; AWS Bedrock Anthropic tool-use reference (https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages-tool-use.html) [turn1search5]; Anthropic release notes (https://docs.anthropic.com/en/release-notes/api) [turn1search3] |
| Streaming SSE events | Anthropic SSE stream emits typed events (message_start, content_block_start, content_block_delta, etc.) that clients must map to internal state machines for deterministic playback. | Anthropic Streaming Messages (https://docs.anthropic.com/en/docs/build-with-claude/streaming) [turn1search6] |
| Tool streaming ergonomics | LangChain Anthropic integration exposes betas=["fine-grained-tool-streaming-2025-05-14"] and warns about invalid JSON, reinforcing need for resilient parsers. | LangChain Anthropic integration guide (https://docs.langchain.com/oss/python/integrations/chat/anthropic) [turn1search8] |
| Hook architectures | Claude Code hook lifecycle (SessionStart, UserPromptSubmit, Tool Use, etc.) keeps hooks non-blocking, context-injecting, and failure-isolated; a useful template for worker hooks/routers. | Claude-Mem hook architecture overview (https://docs.claude-mem.ai/hooks-architecture) [turn0search0]; Claude Blog configuring hooks (https://claude.com/blog/how-to-configure-hooks) [turn0search2] |

Detailed Notes

Fine-grained Tool Streaming

  • Anthropic's beta header fine-grained-tool-streaming-2025-05-14 enables parameter streams, shrinking first-byte latency from ~15s to ~3s in their example. Clients must prepare for partial JSON and wrap invalid payloads in valid JSON before echoing them back to the model. [turn1search0]
  • AWS Bedrock mirrors the same header, confirming availability on Claude Sonnet 4.5/4 and Opus 4. Their docs explicitly caution about invalid/partial JSON and show the request schema. [turn1search5]
  • Anthropic's June 11, 2025 release notes document the beta launch, indicating that the feature is recent and, being a beta, should not be assumed stable. [turn1search3]
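
The sanitizing step described above could look roughly like the sketch below. It is a std-only illustration under several assumptions: `ToolInputBuffer`, `looks_complete`, and `escape_json` are hypothetical names, the completeness check is a bracket-balancing heuristic and not a JSON validator (a real implementation would more likely call a proper parser), and the `INVALID_JSON` wrapper key is illustrative.

```rust
/// Hypothetical accumulator for the streamed input fragments of one tool call.
#[derive(Default)]
pub struct ToolInputBuffer {
    raw: String,
}

impl ToolInputBuffer {
    /// Append one streamed fragment exactly as received.
    pub fn push(&mut self, fragment: &str) {
        self.raw.push_str(fragment);
    }

    /// Return the payload to echo back: the raw text if it looks complete,
    /// otherwise a valid JSON object carrying the raw text as a string.
    pub fn finish(self) -> String {
        if looks_complete(&self.raw) {
            self.raw
        } else {
            // Wrapper key is an assumption made for this sketch.
            format!("{{\"INVALID_JSON\": \"{}\"}}", escape_json(&self.raw))
        }
    }
}

/// Heuristic only: the text must start with `{`, brackets must balance
/// outside string literals, and no string may be left open.
pub fn looks_complete(s: &str) -> bool {
    let t = s.trim();
    if !t.starts_with('{') {
        return false;
    }
    let mut stack: Vec<char> = Vec::new();
    let mut in_string = false;
    let mut escaped = false;
    for c in t.chars() {
        if in_string {
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' | '[' => stack.push(c),
            '}' => {
                if stack.pop() != Some('{') {
                    return false;
                }
            }
            ']' => {
                if stack.pop() != Some('[') {
                    return false;
                }
            }
            _ => {}
        }
    }
    !in_string && stack.is_empty()
}

/// Escape a string so it can sit inside a JSON string literal.
pub fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}
```

Keeping the raw fragments untouched until `finish` means the worker can still log or replay exactly what the stream delivered, and only the echoed payload is rewritten.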

SSE Event Model

  • Anthropic exposes SSE event names plus JSON type fields; implementers should parse events like content_block_start, content_block_delta, and message_stop to drive a deterministic timeline. This justifies a dedicated timeline router/state machine as in the spec. [turn1search6]
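
A minimal sketch of such a state machine follows, assuming a simplified event order (message_start, then content blocks, then message_stop, with ping allowed anywhere). `StreamState`, `Timeline`, and `parse_sse` are hypothetical names, and the data payload is carried as an opaque string, not parsed.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StreamState {
    Idle,
    InMessage,
    InBlock,
    Done,
}

#[derive(Debug, PartialEq, Eq)]
pub struct UnexpectedEvent {
    pub state: StreamState,
    pub event: String,
}

/// Hypothetical timeline that only accepts events in a legal order.
pub struct Timeline {
    state: StreamState,
}

impl Timeline {
    pub fn new() -> Self {
        Timeline { state: StreamState::Idle }
    }

    pub fn state(&self) -> StreamState {
        self.state
    }

    /// Advance the state machine by one SSE event name.
    pub fn apply(&mut self, event: &str) -> Result<(), UnexpectedEvent> {
        use StreamState::*;
        let next = match (self.state, event) {
            (_, "ping") => self.state, // keep-alive, no transition
            (Idle, "message_start") => InMessage,
            (InMessage, "content_block_start") => InBlock,
            (InBlock, "content_block_delta") => InBlock,
            (InBlock, "content_block_stop") => InMessage,
            (InMessage, "message_delta") => InMessage,
            (InMessage, "message_stop") => Done,
            _ => {
                return Err(UnexpectedEvent {
                    state: self.state,
                    event: event.to_string(),
                })
            }
        };
        self.state = next;
        Ok(())
    }
}

/// Split raw SSE text into (event name, data) frames.
/// Frames end at a blank line; multiple data lines are joined with '\n'.
pub fn parse_sse(raw: &str) -> Vec<(String, String)> {
    let mut frames = Vec::new();
    let mut event = String::new();
    let mut data = String::new();
    for line in raw.lines() {
        if line.is_empty() {
            if !event.is_empty() || !data.is_empty() {
                frames.push((std::mem::take(&mut event), std::mem::take(&mut data)));
            }
        } else if let Some(v) = line.strip_prefix("event:") {
            event = v.trim().to_string();
        } else if let Some(v) = line.strip_prefix("data:") {
            if !data.is_empty() {
                data.push('\n');
            }
            data.push_str(v.trim_start());
        }
    }
    // Flush a trailing frame that was not closed by a blank line.
    if !event.is_empty() || !data.is_empty() {
        frames.push((event, data));
    }
    frames
}
```

Rejecting out-of-order events explicitly, instead of ignoring them, is what makes replay deterministic: the same byte stream always yields the same sequence of states or the same error.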

Tool Streaming Ergonomics in SDKs

  • LangChain's Anthropic integration demonstrates how third-party SDKs surface the beta header and reiterates the need for error handling when incomplete JSON arrives, because max_tokens can stop a stream mid-parameter. This informs library-level abstractions for the scheme and llm_client layers. [turn1search8]
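
One way a client layer might surface this distinction is sketched below. `ToolInput` and `classify_tool_input` are hypothetical names, the JSON validator is passed in as a closure so that no particular parser is assumed, and the only API detail relied on is the `max_tokens` stop reason mentioned above.

```rust
/// Hypothetical outcome of finalizing one streamed tool input.
#[derive(Debug, PartialEq, Eq)]
pub enum ToolInput {
    /// The accumulated text passed validation.
    Complete(String),
    /// Validation failed and the stream stopped because of max_tokens.
    Truncated { partial: String },
    /// Validation failed for some other reason.
    Malformed { raw: String },
}

/// Classify accumulated tool input using the stream's stop reason.
/// `is_valid_json` is supplied by the caller (for example a real parser).
pub fn classify_tool_input<F>(raw: String, stop_reason: Option<&str>, is_valid_json: F) -> ToolInput
where
    F: Fn(&str) -> bool,
{
    if is_valid_json(&raw) {
        ToolInput::Complete(raw)
    } else if stop_reason == Some("max_tokens") {
        ToolInput::Truncated { partial: raw }
    } else {
        ToolInput::Malformed { raw }
    }
}
```

Separating `Truncated` from `Malformed` lets callers retry with a larger token budget in the first case and report a parse error in the second, without inspecting the raw text themselves.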

Hook / Lifecycle Design Patterns

  • Claude-Mem's architecture treats hooks as lifecycle-triggered closures that must stay non-blocking, degrade gracefully, and respect security constraints (frozen configs, permission prompts). This maps closely to the proposed Tools/Hooks system. [turn0search0]
  • Claude's official hook configuration guide enumerates eight hook types (PreToolUse, PostToolUse, PermissionRequest, SessionStart, Stop, etc.) and their contexts, reinforcing the need for a trait/macro system to statically describe hooks and route events. [turn0search2]
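
As a rough illustration of the trait side of such a system, the sketch below routes events to registered hooks and isolates failures so that one failing hook does not stop the others. All names (`HookEvent`, `HookContext`, `Hook`, `HookRouter`, and the two sample hooks) are hypothetical, only a subset of the hook types is modeled, and the dispatch is synchronous; the non-blocking and macro aspects are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookEvent {
    SessionStart,
    UserPromptSubmit,
    PreToolUse,
    PostToolUse,
    Stop,
}

/// Context that hooks may enrich (context injection).
pub struct HookContext {
    pub injected: Vec<String>,
}

pub trait Hook {
    fn name(&self) -> &str;
    /// Static description of which events this hook wants.
    fn handles(&self, event: HookEvent) -> bool;
    fn run(&self, event: HookEvent, ctx: &mut HookContext) -> Result<(), String>;
}

pub struct HookRouter {
    hooks: Vec<Box<dyn Hook>>,
}

impl HookRouter {
    pub fn new() -> Self {
        HookRouter { hooks: Vec::new() }
    }

    pub fn register(&mut self, hook: Box<dyn Hook>) {
        self.hooks.push(hook);
    }

    /// Run every interested hook in registration order. Failures are
    /// collected as (hook name, error) and never abort the remaining hooks.
    pub fn dispatch(&self, event: HookEvent, ctx: &mut HookContext) -> Vec<(String, String)> {
        let mut failures = Vec::new();
        for hook in self.hooks.iter().filter(|h| h.handles(event)) {
            if let Err(e) = hook.run(event, ctx) {
                failures.push((hook.name().to_string(), e));
            }
        }
        failures
    }
}

/// Sample hook that injects a note when a session starts.
pub struct NoteHook;

impl Hook for NoteHook {
    fn name(&self) -> &str {
        "note"
    }
    fn handles(&self, event: HookEvent) -> bool {
        event == HookEvent::SessionStart
    }
    fn run(&self, _event: HookEvent, ctx: &mut HookContext) -> Result<(), String> {
        ctx.injected.push("session started".to_string());
        Ok(())
    }
}

/// Sample hook that always fails, to exercise failure isolation.
pub struct FailingHook;

impl Hook for FailingHook {
    fn name(&self) -> &str {
        "failing"
    }
    fn handles(&self, event: HookEvent) -> bool {
        event == HookEvent::SessionStart
    }
    fn run(&self, _event: HookEvent, _ctx: &mut HookContext) -> Result<(), String> {
        Err("boom".to_string())
    }
}
```

Returning the failures to the caller, instead of panicking or short-circuiting, keeps the degrade-gracefully property visible in the type signature.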