2026-01-02 Provider Event & Tool Streaming Specs

Snapshot

| Provider | Stream Transport | Event/Deltas | Tool Call Streaming | References |
| --- | --- | --- | --- | --- |
| OpenAI Responses API | SSE (stream=true) with typed semantic events | response.created, response.output_text.delta/done, response.function_call_arguments.delta/done, response.refusal.*, response.code_interpreter.*, etc. | Tool/function arguments stream as JSON string deltas; supports MCP tool calls and built-ins. | OpenAI Streaming guide & API reference (2025-12) |
| Anthropic Messages API | SSE (stream=true) enumerating message_* + content_block_* events | Content blocks are typed (text_delta, input_json_delta, thinking_delta, signature_delta, tool results); ping and error events possible. | Fine-grained tool streaming via beta header anthropic-beta: fine-grained-tool-streaming-2025-05-14; tool inputs stream as partial_json. | Anthropic Streaming docs (2025-08) + Fine-grained Tool Streaming beta (2025-05) |
| Google Gemini API | SSE via :streamGenerateContent?alt=sse returning repeated GenerateContentResponse objects | Each chunk is a partial candidates[] payload; structured output streaming returns partial JSON; the Live API adds WebSocket session events. | Function declarations via tools.functionDeclarations; streamFunctionCallArguments=true yields partialArgs records with jsonPath + willContinue. | Gemini REST docs + Function Calling guide + Vertex AI streaming args (updated 2025-12) |
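
As a rough illustration of the normalization the sections below argue for, a provider-agnostic event shape could fold all three rows into one enum. This is only a sketch; the type and field names are hypothetical and not taken from the llm_worker_rs codebase.

```rust
// Hypothetical provider-agnostic event shape for the timeline layer;
// variant and field names are illustrative, not from llm_worker_rs.
#[derive(Debug, Clone)]
pub enum StreamEvent {
    // Plain text tokens: OpenAI response.output_text.delta, Anthropic
    // text_delta, Gemini candidate text parts.
    TextDelta { item_id: String, text: String },
    // Partial tool/function arguments; `fragment` stays unvalidated JSON
    // until the provider signals completion.
    ToolArgsDelta { item_id: String, tool_name: Option<String>, fragment: String },
    // Terminal marker per output item: .done, content_block_stop, or a
    // partialArgs record with willContinue == false.
    ItemDone { item_id: String },
    // Stream-level lifecycle: created/completed/failed, message_stop, errors.
    Lifecycle { kind: String },
}

fn main() {
    let ev = StreamEvent::ToolArgsDelta {
        item_id: "fc_1".into(),
        tool_name: Some("get_weather".into()),
        fragment: "{\"city\":".into(),
    };
    println!("{ev:?}");
}
```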

OpenAI Responses API (2025-12 docs)

  • Lifecycle events: response.created, response.queued, response.in_progress, response.completed, response.failed, error. Every event carries a sequence_number for ordering.
  • Content events: response.output_text.delta/done, response.output_text.annotation.added, response.refusal.delta/done, plus response.output_item.added/done for multi-part outputs.
  • Tool events: response.function_call_arguments.delta/done, response.file_search_call.*, response.code_interpreter_call.*, response.custom_tool_call_input.delta, response.mcp_call_arguments.delta/done for MCP connectors.
  • Design takeaways: the timeline layer must preserve ordering by sequence_number, correlate item_id/output_index, and buffer partial JSON until the matching .done event (see the sketch after this list). Provide parsing helpers for MCP/custom tool deltas and code interpreter outputs so built-in and custom tools are handled uniformly.
  • Primary sources: OpenAI Streaming Responses Guide + Streaming Events API reference (retrieved Jan 2 2026).
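
A minimal sketch of the buffering rule from the design-takeaways bullet, assuming serde_json as a dependency and using simplified, hand-rolled event structs (real events arrive as SSE JSON payloads with more fields):

```rust
use std::collections::BTreeMap;

// Simplified stand-in for the subset of streaming-event fields used here;
// real events carry more (type-specific payloads, output_index, ...).
struct SseEvent<'a> {
    sequence_number: u64,
    kind: &'a str,  // e.g. "response.function_call_arguments.delta"
    item_id: &'a str,
    delta: &'a str, // JSON string fragment; empty for ".done"
}

fn main() {
    // Hypothetical out-of-order arrival; sequence_number restores ordering.
    let mut events = vec![
        SseEvent { sequence_number: 2, kind: "response.function_call_arguments.delta", item_id: "fc_1", delta: "\"Tokyo\"}" },
        SseEvent { sequence_number: 1, kind: "response.function_call_arguments.delta", item_id: "fc_1", delta: "{\"city\":" },
        SseEvent { sequence_number: 3, kind: "response.function_call_arguments.done", item_id: "fc_1", delta: "" },
    ];
    events.sort_by_key(|e| e.sequence_number);

    // One buffer per item_id, since several tool calls can interleave.
    let mut buffers: BTreeMap<&str, String> = BTreeMap::new();
    for ev in &events {
        match ev.kind {
            "response.function_call_arguments.delta" => {
                buffers.entry(ev.item_id).or_default().push_str(ev.delta);
            }
            "response.function_call_arguments.done" => {
                // Arguments are only guaranteed to be complete JSON at .done.
                let raw = buffers.remove(ev.item_id).unwrap_or_default();
                let args: serde_json::Value =
                    serde_json::from_str(&raw).expect("complete JSON at .done");
                println!("{} -> {args}", ev.item_id);
            }
            _ => {} // text/lifecycle events handled elsewhere
        }
    }
}
```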

Anthropic Messages API

  • Event flow: Always begins with message_start, then repeated blocks of (content_block_start → content_block_delta* → content_block_stop), optional message_delta, and message_stop. ping and error events may interleave.
  • Content block delta types: text_delta, input_json_delta (tool inputs), thinking_delta, signature_delta (integrity for thinking), plus specialized tool result payloads (e.g., code_execution_tool_result).
  • Fine-grained tool streaming: Beta header fine-grained-tool-streaming-2025-05-14 streams tool parameters chunk-by-chunk without JSON validation; clients must handle invalid/partial JSON when the max_tokens stop reason fires mid-argument. Recommended strategy: accumulate the raw string and optionally wrap invalid JSON before echoing it back (sketched after this list).
  • Operational cautions: Tool input streaming may pause between keys (docs warn of delays). Provide timeouts/heartbeat detection for UIs (issue reports note >3 min stalls with no ping).
  • Primary sources: Anthropic Streaming Messages guide & Fine-grained Tool Streaming beta docs (retrieved Jan 2 2026).
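
A sketch of that accumulation strategy, again assuming serde_json and simplified event types; the truncated-JSON branch mirrors the max_tokens caveat above:

```rust
use std::collections::HashMap;

// Simplified content-block events; real SSE payloads carry more fields.
enum BlockEvent {
    Start { index: usize, tool_name: String },
    InputJsonDelta { index: usize, partial_json: String },
    Stop { index: usize },
}

#[derive(Debug)]
enum ToolInput {
    Parsed(serde_json::Value), // complete, valid JSON
    Raw(String),               // truncated/invalid; surface or repair upstream
}

fn main() {
    let events = vec![
        BlockEvent::Start { index: 1, tool_name: "get_weather".into() },
        BlockEvent::InputJsonDelta { index: 1, partial_json: "{\"location\": \"Par".into() },
        // Stream cut off here (stop_reason = max_tokens): the JSON never closes.
        BlockEvent::Stop { index: 1 },
    ];

    // One buffer per content block index.
    let mut buffers: HashMap<usize, String> = HashMap::new();
    for ev in events {
        match ev {
            BlockEvent::Start { index, tool_name } => {
                println!("block {index}: tool {tool_name}");
                buffers.insert(index, String::new());
            }
            BlockEvent::InputJsonDelta { index, partial_json } => {
                buffers.entry(index).or_default().push_str(&partial_json);
            }
            BlockEvent::Stop { index } => {
                let raw = buffers.remove(&index).unwrap_or_default();
                // Try strict parsing first; fall back to the raw string so the
                // caller can decide whether to wrap or repair it.
                let input = match serde_json::from_str(&raw) {
                    Ok(v) => ToolInput::Parsed(v),
                    Err(_) => ToolInput::Raw(raw),
                };
                println!("block {index} input: {input:?}");
            }
        }
    }
}
```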

Google Gemini API

  • REST streaming: POST https://generativelanguage.googleapis.com/v1beta/models/{model}:streamGenerateContent?alt=sse returns SSE where each data: line is a partial GenerateContentResponse (with candidates, usageMetadata, modelVersion). Chrome DevDoc example shows chunked text fields.
  • Structured output streaming: When response_mime_type=application/json, streamed chunks carry partial JSON text that concatenates into the final, valid object.
  • Function calling: Tools defined via functionDeclarations; function_calling_config modes AUTO/ANY/NONE. Latest docs confirm JSON Schema + Pydantic/Zod compatibility (2025-12 update) and compositional calls.
  • Function call argument streaming: Set toolConfig.functionCallingConfig.streamFunctionCallArguments=true; responses include functionCall.partialArgs with jsonPath, delta values (string/number/bool/null), and willContinue. The Vertex AI content schema documents a PartialArg message for decoding (see the sketch at the end of this section).
  • Transports: SSE for REST, WebSockets for Live API sessions (bidirectional audio/text). Need to abstract both.
  • Primary sources: Gemini API overview + streamGenerateContent reference + function calling guide + Vertex AI streaming arguments doc + Chrome streaming example (retrieved Jan 2 2026).
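
A sketch of assembling streamed arguments by jsonPath until willContinue flips to false. The record shape is simplified from the bullet above (real records are nested under the functionCall part of a candidate, and non-string deltas would need their own handling):

```rust
use std::collections::BTreeMap;

// Simplified partialArgs record; real PartialArg messages also carry
// number/bool/null deltas, not just string fragments.
#[derive(Debug)]
struct PartialArg {
    json_path: String,    // e.g. "$.location"
    string_delta: String, // string fragment to append for this path
    will_continue: bool,  // false once this path's value is complete
}

fn main() {
    // Hypothetical chunks for a single streamed function call.
    let chunks = vec![
        PartialArg { json_path: "$.location".into(), string_delta: "San Fra".into(), will_continue: true },
        PartialArg { json_path: "$.location".into(), string_delta: "ncisco".into(), will_continue: false },
        PartialArg { json_path: "$.unit".into(), string_delta: "celsius".into(), will_continue: false },
    ];

    // Accumulate fragments per JSON path; a path is final once a chunk with
    // will_continue == false has been applied.
    let mut args: BTreeMap<String, (String, bool)> = BTreeMap::new();
    for chunk in chunks {
        let entry = args.entry(chunk.json_path).or_insert_with(|| (String::new(), false));
        entry.0.push_str(&chunk.string_delta);
        entry.1 = !chunk.will_continue;
    }

    for (path, (value, complete)) in &args {
        println!("{path} = {value:?} (complete: {complete})");
    }
}
```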