# 2026-01-02 Provider Event & Tool Streaming Specs
## Snapshot
| Provider | Stream Transport | Events/Deltas | Tool Call Streaming | References |
| --- | --- | --- | --- | --- |
| OpenAI Responses API | SSE (`stream=true`) with typed semantic events | `response.created`, `response.output_text.delta/done`, `response.function_call_arguments.delta/done`, `response.refusal.*`, `response.code_interpreter_call.*`, etc. | Tool/function arguments stream as JSON string deltas; supports MCP tool calls and built-ins. | OpenAI Streaming guide & API reference (2025-12) |
| Anthropic Messages API | SSE (`stream=true`) enumerating `message_*` + `content_block_*` events | Content blocks typed (`text_delta`, `input_json_delta`, `thinking_delta`, `signature_delta`, tool results). `ping` and `error` events possible. | Fine-grained tool streaming via beta header `anthropic-beta: fine-grained-tool-streaming-2025-05-14`; tool inputs stream as `partial_json`. | Anthropic Streaming docs (2025-08) + Fine-grained Tool Streaming beta (2025-05) |
| Google Gemini API | SSE via `:streamGenerateContent?alt=sse` returning repeated `GenerateContentResponse` objects | Each chunk is a partial `candidates[]` payload; structured output streaming returns partial JSON; Live API adds WebSocket session events. | Function declarations via `tools.functionDeclarations`. `streamFunctionCallArguments=true` yields `partialArgs` records with `jsonPath` + `willContinue`. | Gemini REST docs + Function Calling guide + Vertex AI streaming args (updated 2025-12) |
## OpenAI Responses API (2025-12 docs)
- **Lifecycle events**: `response.created`, `response.queued`, `response.in_progress`, `response.completed`, `response.failed`, `error`. Each emits `sequence_number` for ordering.
- **Content events**: `response.output_text.delta/done`, `response.output_text.annotation.added`, `response.refusal.delta/done`, plus `response.output_item.added/done` for multi-part outputs.
- **Tool events**: `response.function_call_arguments.delta/done`, `response.file_search_call.*`, `response.code_interpreter_call.*`, `response.custom_tool_call_input.delta`, `response.mcp_call_arguments.delta/done` for MCP connectors.
- **Design takeaways**: timeline layer must preserve ordering by `sequence_number`, correlate `item_id`/`output_index`, and buffer partial JSON until `.done`. Provide parsing helpers for MCP/custom tool deltas and code interpreter outputs to unify built-in + custom tools.
- **Primary sources**: OpenAI Streaming Responses Guide + Streaming Events API reference (retrieved Jan 2 2026).
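The design takeaways above can be sketched as a small buffering helper. This is a minimal illustration, not a definitive implementation: it assumes events have already been decoded from SSE into dicts carrying `type`, `sequence_number`, `item_id`, and `delta` (or an optional final `arguments` string on the done event). The function name `collect_function_calls` is hypothetical, and it sorts a complete batch for simplicity where a live timeline layer would reorder incrementally.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, List

DELTA = "response.function_call_arguments.delta"
DONE = "response.function_call_arguments.done"


def collect_function_calls(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Return {item_id: parsed_arguments} from already-decoded stream events.

    Hypothetical helper: field names are assumptions based on the event
    descriptions in these notes.
    """
    buffers: Dict[str, List[str]] = defaultdict(list)
    completed: Dict[str, Any] = {}
    # Sort defensively so out-of-order delivery cannot scramble the JSON text.
    for event in sorted(events, key=lambda e: e["sequence_number"]):
        if event["type"] == DELTA:
            # Partial JSON is only buffered here, never parsed.
            buffers[event["item_id"]].append(event["delta"])
        elif event["type"] == DONE:
            raw = "".join(buffers.pop(event["item_id"], []))
            # Prefer the final payload when the done event carries one.
            completed[event["item_id"]] = json.loads(event.get("arguments") or raw)
    return completed
```

Keying the buffers by `item_id` keeps interleaved tool calls separate; the same pattern would apply to MCP and custom tool deltas with different event type constants.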
## Anthropic Messages API
- **Event flow**: Always begins with `message_start`, then repeated blocks of (`content_block_start` → `content_block_delta`* → `content_block_stop`), one or more `message_delta` events, and a final `message_stop`. `ping` and `error` events may interleave.
- **Content block delta types**: `text_delta`, `input_json_delta` (tool inputs), `thinking_delta`, `signature_delta` (integrity for thinking), plus specialized tool result payloads (e.g., `code_execution_tool_result`).
- **Fine-grained tool streaming**: The beta header `anthropic-beta: fine-grained-tool-streaming-2025-05-14` streams tool parameters chunk-by-chunk without JSON validation, so clients must handle invalid or partial JSON when the `max_tokens` stop reason fires mid-argument. Recommended strategy: accumulate the raw string and, if it fails to parse, optionally wrap it in a valid JSON object before echoing it back.
- **Operational cautions**: Tool input streaming may pause between keys (docs warn of delays). Provide timeouts/heartbeat detection for UIs (issue reports note >3 min stalls with no `ping`).
- **Primary sources**: Anthropic Streaming Messages guide & Fine-grained Tool Streaming beta docs (retrieved Jan 2 2026).
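The accumulate-then-wrap strategy described above might look like the following sketch. It assumes events arrive as decoded dicts shaped like the `content_block_*` events in these notes (`index`, `content_block`, `delta.partial_json`). The function name `collect_tool_inputs` and the `INVALID_JSON` wrapper key are illustrative assumptions, not part of any SDK.

```python
import json
from typing import Any, Dict, Iterable, List


def collect_tool_inputs(events: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Accumulate `input_json_delta` chunks per content block (hypothetical helper)."""
    open_blocks: Dict[int, Dict[str, Any]] = {}
    finished: List[Dict[str, Any]] = []

    def close(block: Dict[str, Any]) -> None:
        raw = "".join(block.pop("chunks"))
        try:
            block["input"] = json.loads(raw) if raw else {}
        except json.JSONDecodeError:
            # Truncated or unvalidated arguments: keep the raw text inside
            # a valid JSON object (wrapper key is an assumption).
            block["input"] = {"INVALID_JSON": raw}
        finished.append(block)

    for event in events:
        kind = event.get("type")
        if kind == "content_block_start" and event["content_block"]["type"] == "tool_use":
            cb = event["content_block"]
            open_blocks[event["index"]] = {"id": cb["id"], "name": cb["name"], "chunks": []}
        elif kind == "content_block_delta" and event["delta"]["type"] == "input_json_delta":
            block = open_blocks.get(event["index"])
            if block is not None:
                block["chunks"].append(event["delta"]["partial_json"])
        elif kind == "content_block_stop" and event["index"] in open_blocks:
            close(open_blocks.pop(event["index"]))
        # ping, message_* and other delta types are ignored in this sketch.

    # Flush blocks that never saw content_block_stop (stream cut short).
    for index in sorted(open_blocks):
        close(open_blocks[index])
    return finished
```

Parsing is deferred until the block closes, so a stall between keys only delays the result; stall detection itself would need a separate timer around the event source.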
## Google Gemini API
- **REST streaming**: `POST https://generativelanguage.googleapis.com/v1beta/models/{model}:streamGenerateContent?alt=sse` returns SSE where each `data:` line is a partial `GenerateContentResponse` (with `candidates`, `usageMetadata`, `modelVersion`). Chrome DevDoc example shows chunked `text` fields.
- **Structured output streaming**: When `response_mime_type=application/json`, streamed chunks are valid partial JSON strings that concatenate to the final object.
- **Function calling**: Tools defined via `functionDeclarations`; `function_calling_config` modes `AUTO/ANY/NONE`. Latest docs confirm JSON Schema + Pydantic/Zod compatibility (2025-12 update) and compositional calls.
- **Function call argument streaming**: Set `toolConfig.functionCallingConfig.streamFunctionCallArguments=true`; responses include `functionCall.partialArgs` with `jsonPath`, `delta` values (string/number/bool/null) and `willContinue`. Vertex AI content schema documents `PartialArg` message for decoding.
- **Transports**: SSE for REST, WebSockets for Live API sessions (bidirectional audio/text). The client layer needs to abstract both.
- **Primary sources**: Gemini API overview + streamGenerateContent reference + function calling guide + Vertex AI streaming arguments doc + Chrome streaming example (retrieved Jan 2 2026).
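As a rough illustration of the REST streaming and structured-output points above, the sketch below decodes `data:` lines into `GenerateContentResponse` dicts and concatenates their text parts. It assumes each SSE event fits on a single `data:` line and that text lives under `candidates[0].content.parts[].text`. The helper names `iter_sse_payloads` and `concat_text` are hypothetical, and no network call is made.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one decoded JSON object per non-empty `data:` line."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload:
                yield json.loads(payload)


def concat_text(chunks: Iterable[Dict[str, Any]]) -> str:
    """Join the text parts of the first candidate across all chunks."""
    pieces = []
    for chunk in chunks:
        # Only the first candidate is followed in this sketch.
        for candidate in chunk.get("candidates", [])[:1]:
            for part in candidate.get("content", {}).get("parts", []):
                pieces.append(part.get("text", ""))
    return "".join(pieces)
```

With JSON output requested, the concatenated string can be passed to `json.loads` once the stream ends; parsing earlier would fail on the partial text.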