2026-01-02 Provider Event & Tool Streaming Specs
Snapshot
| Provider | Stream Transport | Event/Deltas | Tool Call Streaming | References |
|---|---|---|---|---|
| OpenAI Responses API | SSE (`stream=true`) with typed semantic events | `response.created`, `response.output_text.delta/done`, `response.function_call_arguments.delta/done`, `response.refusal.*`, `response.code_interpreter_call.*`, etc. | Tool/function arguments stream as JSON string deltas; supports MCP tool calls and built-ins. | OpenAI Streaming guide & API reference (2025-12) |
| Anthropic Messages API | SSE (`stream=true`) enumerating `message_*` + `content_block_*` events | Content blocks typed (`text_delta`, `input_json_delta`, `thinking_delta`, `signature_delta`, tool results). `ping` and `error` events possible. | Fine-grained tool streaming via beta header `anthropic-beta: fine-grained-tool-streaming-2025-05-14`; tool inputs stream as `partial_json`. | Anthropic Streaming docs (2025-08) + Fine-grained Tool Streaming beta (2025-05) |
| Google Gemini API | SSE via `:streamGenerateContent?alt=sse` returning repeated `GenerateContentResponse` objects | Each chunk is a partial `candidates[]` payload; structured output streaming returns partial JSON; Live API adds WebSocket session events. | Function declarations via `tools.functionDeclarations`. `streamFunctionCallArguments=true` yields `partialArgs` records with `jsonPath` + `willContinue`. | Gemini REST docs + Function Calling guide + Vertex AI streaming args (updated 2025-12) |
OpenAI Responses API (2025-12 docs)
- Lifecycle events: `response.created`, `response.queued`, `response.in_progress`, `response.completed`, `response.failed`, `error`. Each emits `sequence_number` for ordering.
- Content events: `response.output_text.delta/done`, `response.output_text.annotation.added`, `response.refusal.delta/done`, plus `response.output_item.added/done` for multi-part outputs.
- Tool events: `response.function_call_arguments.delta/done`, `response.file_search_call.*`, `response.code_interpreter_call.*`, `response.custom_tool_call_input.delta`, `response.mcp_call_arguments.delta/done` for MCP connectors.
- Design takeaways: the timeline layer must preserve ordering by `sequence_number`, correlate `item_id`/`output_index`, and buffer partial JSON until `.done`. Provide parsing helpers for MCP/custom tool deltas and code interpreter outputs to unify built-in + custom tools.
- Primary sources: OpenAI Streaming Responses Guide + Streaming Events API reference (retrieved Jan 2 2026).
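The design takeaways above can be sketched roughly as follows. This is a minimal illustration rather than the OpenAI SDK: events are modeled as plain dicts carrying the `type`, `sequence_number`, `item_id`, and `delta` fields named in the bullets, while `ArgumentBuffer` and `replay` are hypothetical helper names. It also assumes the `.done` event may carry a final `arguments` string, which the sketch prefers when present.

```python
import json
from collections import defaultdict


class ArgumentBuffer:
    """Hypothetical helper: buffers streamed tool-argument deltas per item_id."""

    DELTA = "response.function_call_arguments.delta"
    DONE = "response.function_call_arguments.done"

    def __init__(self):
        self._chunks = defaultdict(list)  # item_id -> JSON string fragments
        self.completed = {}               # item_id -> parsed arguments

    def feed(self, event):
        """Consume one event dict; return parsed arguments when a call finishes."""
        item_id = event.get("item_id")
        if event["type"] == self.DELTA:
            self._chunks[item_id].append(event["delta"])
        elif event["type"] == self.DONE:
            raw = "".join(self._chunks.pop(item_id, []))
            # Assumption: use the final `arguments` field if the event has one.
            raw = event.get("arguments", raw)
            self.completed[item_id] = json.loads(raw)
            return self.completed[item_id]
        return None


def replay(events):
    """Order events by sequence_number, then buffer partial JSON until .done."""
    buffer = ArgumentBuffer()
    for event in sorted(events, key=lambda e: e["sequence_number"]):
        buffer.feed(event)
    return buffer.completed


# Deliberately out of order to show why sequence_number matters.
events = [
    {"type": "response.function_call_arguments.delta", "sequence_number": 3,
     "item_id": "fc_1", "delta": '"Paris"}'},
    {"type": "response.function_call_arguments.delta", "sequence_number": 2,
     "item_id": "fc_1", "delta": '{"city": '},
    {"type": "response.function_call_arguments.done", "sequence_number": 4,
     "item_id": "fc_1"},
]
result = replay(events)
print(result)
```

Parsing only on `.done` keeps the UI free to render the raw fragments as they arrive while never handing half-built JSON to tool execution.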
Anthropic Messages API
- Event flow: Always begins with `message_start`, then repeated blocks of (`content_block_start` → one or more `content_block_delta` → `content_block_stop`), one or more `message_delta` events, and a final `message_stop`. `ping` and `error` events may interleave.
- Content block delta types: `text_delta`, `input_json_delta` (tool inputs), `thinking_delta`, `signature_delta` (integrity for thinking), plus specialized tool result payloads (e.g., `code_execution_tool_result`).
- Fine-grained tool streaming: Beta header `fine-grained-tool-streaming-2025-05-14` streams tool parameters chunk-by-chunk without JSON validation; clients must handle invalid/partial JSON when the `max_tokens` stop reason fires mid-argument. Recommended strategy: accumulate the raw string, and optionally wrap invalid JSON in a valid JSON object before echoing it back.
- Operational cautions: Tool input streaming may pause between keys (docs warn of delays). Provide timeouts/heartbeat detection for UIs (issue reports note >3 min stalls with no `ping`).
- Primary sources: Anthropic Streaming Messages guide & Fine-grained Tool Streaming beta docs (retrieved Jan 2 2026).
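The accumulate-then-wrap strategy described above might look like the following sketch. Events are modeled as plain dicts shaped like the documented `content_block_*` events. The function name `collect_tool_inputs` and the `INVALID_JSON` wrapper key are assumptions made for illustration, so check the beta docs for the wrapper shape you actually want to echo back.

```python
import json


def collect_tool_inputs(events):
    """Accumulate input_json_delta fragments per content block index."""
    raw = {}      # block index -> accumulated raw JSON string
    names = {}    # block index -> tool name
    results = []  # one dict per finished tool_use block

    def finish(index):
        text = raw.pop(index)
        try:
            parsed = json.loads(text) if text else {}
        except json.JSONDecodeError:
            # Assumed wrapper key: keeps the payload valid JSON when echoed back.
            parsed = {"INVALID_JSON": text}
        results.append({"name": names.pop(index), "input": parsed})

    for event in events:
        kind = event["type"]
        index = event.get("index")
        if kind == "content_block_start":
            block = event["content_block"]
            if block["type"] == "tool_use":
                raw[index] = ""
                names[index] = block["name"]
        elif kind == "content_block_delta" and index in raw:
            delta = event["delta"]
            if delta["type"] == "input_json_delta":
                raw[index] += delta["partial_json"]
        elif kind == "content_block_stop" and index in raw:
            finish(index)
        # ping and all other event types are ignored in this sketch

    # Flush blocks that never received content_block_stop (truncated stream).
    for index in sorted(raw):
        finish(index)
    return results


# A stream cut off mid-argument, as can happen when max_tokens fires.
events = [
    {"type": "message_start"},
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "tool_use", "id": "toolu_1", "name": "get_weather"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "input_json_delta", "partial_json": '{"city": "Par'}},
    {"type": "ping"},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "max_tokens"}},
    {"type": "message_stop"},
]
tool_calls = collect_tool_inputs(events)
print(tool_calls)
```

Keeping the raw string until the block closes means the client never has to guess whether a fragment is complete. Stall detection (the `ping` caution above) is left out here and would sit in the transport loop that feeds this function.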
Google Gemini API
- REST streaming: `POST https://generativelanguage.googleapis.com/v1beta/models/{model}:streamGenerateContent?alt=sse` returns SSE where each `data:` line is a partial `GenerateContentResponse` (with `candidates`, `usageMetadata`, `modelVersion`). Chrome DevDoc example shows chunked `text` fields.
- Structured output streaming: When `response_mime_type=application/json`, streamed chunks are valid partial JSON strings that concatenate to the final object.
- Function calling: Tools defined via `functionDeclarations`; `function_calling_config` modes `AUTO`/`ANY`/`NONE`. Latest docs confirm JSON Schema + Pydantic/Zod compatibility (2025-12 update) and compositional calls.
- Function call argument streaming: Set `toolConfig.functionCallingConfig.streamFunctionCallArguments=true`; responses include `functionCall.partialArgs` with `jsonPath`, delta values (string/number/bool/null) and `willContinue`. Vertex AI content schema documents the `PartialArg` message for decoding.
- Transports: SSE for REST, WebSockets for Live API sessions (bidirectional audio/text). Need to abstract both.
- Primary sources: Gemini API overview + streamGenerateContent reference + function calling guide + Vertex AI streaming arguments doc + Chrome streaming example (retrieved Jan 2 2026).
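The REST streaming and structured output bullets above can be illustrated with a small parser. This is a sketch under stated assumptions: each SSE event fits on a single `data:` line (as the notes describe), the chunk layout follows `candidates[].content.parts[].text`, and the helper names `iter_sse_chunks` and `collect_text` are hypothetical. The sample lines are hand-written stand-ins, not captured API output.

```python
import json


def iter_sse_chunks(lines):
    """Yield one decoded JSON object per SSE `data:` line."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload:
            yield json.loads(payload)


def collect_text(lines):
    """Concatenate the text parts of the first candidate across all chunks."""
    pieces = []
    for chunk in iter_sse_chunks(lines):
        for candidate in chunk.get("candidates", [])[:1]:
            for part in candidate.get("content", {}).get("parts", []):
                if "text" in part:
                    pieces.append(part["text"])
    return "".join(pieces)


# Two chunks whose text fields are fragments of one JSON object.
sample = [
    'data: {"candidates": [{"content": {"parts": [{"text": "{\\"name\\": "}]}}]}',
    "",
    'data: {"candidates": [{"content": {"parts": [{"text": "\\"Ada\\"}"}]}}]}',
    "",
]
text = collect_text(sample)
parsed = json.loads(text)  # only valid once the stream has finished
print(parsed)
```

The same `iter_sse_chunks` generator could feed a `partialArgs` decoder for streamed function call arguments; that part is omitted because the field layout should be taken from the Vertex AI schema rather than guessed.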