# 2026-01-02 Provider Event & Tool Streaming Specs

## Snapshot

| Provider | Stream Transport | Events/Deltas | Tool Call Streaming | References |
| --- | --- | --- | --- | --- |
| OpenAI Responses API | SSE (`stream=true`) with typed semantic events | `response.created`, `response.output_text.delta/done`, `response.function_call_arguments.delta/done`, `response.refusal.*`, `response.code_interpreter.*`, etc. | Tool/function arguments stream as JSON string deltas; supports MCP tool calls and built-ins. | OpenAI Streaming guide & API reference (2025-12) |
| Anthropic Messages API | SSE (`stream=true`) enumerating `message_*` + `content_block_*` events | Content blocks are typed (`text_delta`, `input_json_delta`, `thinking_delta`, `signature_delta`, tool results); `ping` and `error` events possible. | Fine-grained tool streaming via beta header `anthropic-beta: fine-grained-tool-streaming-2025-05-14`; tool inputs stream as `partial_json`. | Anthropic Streaming docs (2025-08) + Fine-grained Tool Streaming beta (2025-05) |
| Google Gemini API | SSE via `:streamGenerateContent?alt=sse` returning repeated `GenerateContentResponse` objects | Each chunk is a partial `candidates[]` payload; structured output streaming returns partial JSON; Live API adds WebSocket session events. | Function declarations via `tools.functionDeclarations`; `streamFunctionCallArguments=true` yields `partialArgs` records with `jsonPath` + `willContinue`. | Gemini REST docs + Function Calling guide + Vertex AI streaming args (updated 2025-12) |

## OpenAI Responses API (2025-12 docs)

- **Lifecycle events**: `response.created`, `response.queued`, `response.in_progress`, `response.completed`, `response.failed`, `error`. Each event carries a `sequence_number` for ordering.
- **Content events**: `response.output_text.delta/done`, `response.output_text.annotation.added`, `response.refusal.delta/done`, plus `response.output_item.added/done` for multi-part outputs.
- **Tool events**: `response.function_call_arguments.delta/done`, `response.file_search_call.*`, `response.code_interpreter_call.*`, `response.custom_tool_call_input.delta`, and `response.mcp_call_arguments.delta/done` for MCP connectors.
- **Design takeaways**: the timeline layer must preserve ordering by `sequence_number`, correlate `item_id`/`output_index`, and buffer partial JSON until the `.done` event arrives (see the accumulator sketch after the Anthropic section). Provide parsing helpers for MCP/custom tool deltas and code interpreter outputs so built-in and custom tools share one path.
- **Primary sources**: OpenAI Streaming Responses guide + Streaming Events API reference (retrieved Jan 2 2026).

## Anthropic Messages API

- **Event flow**: always begins with `message_start`, then repeated blocks of (`content_block_start` → `content_block_delta`* → `content_block_stop`), optional `message_delta`, and `message_stop`. `ping` and `error` events may interleave.
- **Content block delta types**: `text_delta`, `input_json_delta` (tool inputs), `thinking_delta`, `signature_delta` (integrity signature for thinking), plus specialized tool result payloads (e.g., `code_execution_tool_result`).
- **Fine-grained tool streaming**: the beta header `fine-grained-tool-streaming-2025-05-14` streams tool parameters chunk by chunk without JSON validation; clients must handle invalid or partial JSON when a `max_tokens` stop reason fires mid-argument. Recommended strategy: accumulate the raw string and, if needed, repair or wrap invalid JSON before echoing it back (a sketch follows this section).
- **Operational cautions**: tool input streaming may pause between keys (the docs warn of delays). Provide timeouts/heartbeat detection for UIs (issue reports note >3 min stalls with no `ping`).
- **Primary sources**: Anthropic Streaming Messages guide & Fine-grained Tool Streaming beta docs (retrieved Jan 2 2026).
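Both OpenAI's `response.function_call_arguments.delta/done` and Anthropic's `input_json_delta` → `content_block_stop` flows reduce to the same client pattern: buffer raw argument fragments per tool call and parse only on completion. The TypeScript below is a minimal sketch of that pattern under the event shapes summarized above; the trimmed event types and the `ToolArgAccumulator` class are illustrative, not any provider SDK.

```typescript
// Shared accumulator for streamed tool-call argument JSON. Keyed by a
// provider-scoped id (OpenAI item_id, Anthropic content block index), it
// buffers raw fragments and only attempts a parse once the call completes.

type ToolArgState = { raw: string; done: boolean };

class ToolArgAccumulator {
  private buffers = new Map<string, ToolArgState>();

  /** Append one raw JSON fragment for an in-flight tool call. */
  append(key: string, fragment: string): void {
    const state = this.buffers.get(key) ?? { raw: "", done: false };
    state.raw += fragment;
    this.buffers.set(key, state);
  }

  /** Mark the call finished and try to parse the accumulated JSON. Returns
   *  undefined when nothing was buffered or the JSON is still invalid
   *  (e.g. the stream stopped mid-argument on a max_tokens stop reason). */
  finish(key: string): Record<string, unknown> | undefined {
    const state = this.buffers.get(key);
    if (!state) return undefined;
    state.done = true;
    try {
      return JSON.parse(state.raw) as Record<string, unknown>;
    } catch {
      return undefined; // caller decides whether to repair, retry, or surface the raw text
    }
  }
}

// Event shapes reduced to the fields used here (illustrative, not full schemas).
type OpenAIToolEvent =
  | { type: "response.function_call_arguments.delta"; item_id: string; delta: string }
  | { type: "response.function_call_arguments.done"; item_id: string };

type AnthropicToolEvent =
  | { type: "content_block_delta"; index: number; delta: { type: "input_json_delta"; partial_json: string } }
  | { type: "content_block_stop"; index: number };

const acc = new ToolArgAccumulator();

function onOpenAIEvent(event: OpenAIToolEvent): void {
  if (event.type === "response.function_call_arguments.delta") {
    acc.append(`openai:${event.item_id}`, event.delta);
  } else {
    console.log("openai tool args", acc.finish(`openai:${event.item_id}`));
  }
}

function onAnthropicEvent(event: AnthropicToolEvent): void {
  if (event.type === "content_block_delta") {
    acc.append(`anthropic:${event.index}`, event.delta.partial_json);
  } else {
    const args = acc.finish(`anthropic:${event.index}`);
    if (args) console.log("anthropic tool args", args);
  }
}
```

A production timeline layer would additionally key ordering on `sequence_number`/`output_index` for OpenAI and fall back to repairing truncated JSON when Anthropic's fine-grained beta stops mid-argument.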
## Google Gemini API

- **REST streaming**: `POST https://generativelanguage.googleapis.com/v1beta/models/{model}:streamGenerateContent?alt=sse` returns SSE where each `data:` line is a partial `GenerateContentResponse` (with `candidates`, `usageMetadata`, `modelVersion`). The Chrome DevDoc example shows chunked `text` fields.
- **Structured output streaming**: when `response_mime_type=application/json`, streamed chunks are partial JSON strings that concatenate into the final valid object.
- **Function calling**: tools are defined via `functionDeclarations`; `function_calling_config` supports `AUTO`/`ANY`/`NONE` modes. The latest docs confirm JSON Schema + Pydantic/Zod compatibility (2025-12 update) and compositional calls.
- **Function call argument streaming**: set `toolConfig.functionCallingConfig.streamFunctionCallArguments=true`; responses include `functionCall.partialArgs` with `jsonPath`, `delta` values (string/number/bool/null), and `willContinue`. The Vertex AI content schema documents a `PartialArg` message for decoding (see the assembly sketch below).
- **Transports**: SSE for REST, WebSockets for Live API sessions (bidirectional audio/text); the abstraction layer needs to cover both.
- **Primary sources**: Gemini API overview + `streamGenerateContent` reference + function calling guide + Vertex AI streaming arguments doc + Chrome streaming example (retrieved Jan 2 2026).
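As a concrete sketch of the `partialArgs` flow, the TypeScript below folds streamed fragments into one arguments object by `jsonPath`. The field names (`jsonPath`, `delta`, `willContinue`) follow the note above; the `PartialArg` shape, the dot/`[n]` path grammar, and the `GeminiArgAssembler` class are illustrative assumptions rather than the official SDK surface.

```typescript
// Assumed fragment shape for one streamed function-call argument piece.
type PartialArg = {
  jsonPath: string;                          // e.g. "$.query" or "$.items[0].qty"
  delta?: string | number | boolean | null;  // value fragment for this path
  willContinue?: boolean;                    // true if more fragments follow for this path
};

type Args = Record<string, unknown>;

/** Split "$.items[0].qty" into ["items", 0, "qty"] (simplified path grammar). */
function parsePath(jsonPath: string): (string | number)[] {
  return jsonPath
    .replace(/^\$\.?/, "")
    .split(/\.|\[|\]/)
    .filter((part) => part.length > 0)
    .map((part) => (/^\d+$/.test(part) ? Number(part) : part));
}

class GeminiArgAssembler {
  private args: Args = {};
  private continuing = new Set<string>(); // paths whose last fragment had willContinue=true

  /** Apply one fragment: create intermediate objects/arrays along the path,
   *  then either continue a split string or assign the new value. */
  apply(partial: PartialArg): void {
    const path = parsePath(partial.jsonPath);
    if (path.length === 0 || partial.delta === undefined) return;
    let node: any = this.args;
    for (let i = 0; i < path.length - 1; i++) {
      const segment = path[i];
      if (node[segment] === undefined) {
        node[segment] = typeof path[i + 1] === "number" ? [] : {};
      }
      node = node[segment];
    }
    const leaf = path[path.length - 1];
    if (this.continuing.has(partial.jsonPath) && typeof partial.delta === "string") {
      node[leaf] += partial.delta; // continue a string split across fragments
    } else {
      node[leaf] = partial.delta;
    }
    if (partial.willContinue) this.continuing.add(partial.jsonPath);
    else this.continuing.delete(partial.jsonPath);
  }

  snapshot(): Args {
    return this.args;
  }
}

// Usage with made-up fragments: one string argument arriving in two pieces.
const assembler = new GeminiArgAssembler();
const fragments: PartialArg[] = [
  { jsonPath: "$.query", delta: "weather in Par", willContinue: true },
  { jsonPath: "$.query", delta: "is" },
  { jsonPath: "$.units", delta: "metric" },
];
fragments.forEach((f) => assembler.apply(f));
console.log(assembler.snapshot()); // { query: "weather in Paris", units: "metric" }
```

Keeping the assembler stateful per function call mirrors the `ToolArgAccumulator` above, so the timeline layer can expose one "partial tool call" abstraction across all three providers.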