Created
Created by tickets.sh create.
Plan
Preflight
Classification: research-first / implementation-ready after sources are recorded.
The work is mostly data/catalog maintenance. It should begin with current provider documentation/model-list research and a short source note before editing the catalog. Implementation should be limited to resources/models/builtin.toml and directly related docs/tests unless research proves a provider definition is wrong.
Critical risks:
- Do not guess model IDs or context windows from memory.
- Do not add models that the current provider client cannot address.
- Do not churn provider definitions unless needed.
- If changing the default profile model, explain the product reason and verify compaction/effective window metadata.
Decision
Research note for builtin catalog refresh:
Sources checked:
- Anthropic Models overview (https://docs.anthropic.com/en/docs/about-claude/models/overview, redirected to https://platform.claude.com/docs/en/about-claude/models/overview): current comparison lists Claude Opus 4.8, Claude Sonnet 4.6, and Claude Haiku 4.5. API IDs: `claude-opus-4-8`, `claude-sonnet-4-6`, `claude-haiku-4-5-20251001`; aliases include `claude-haiku-4-5`. Context windows: Opus 4.8 1M, Sonnet 4.6 1M, Haiku 4.5 200k. Opus 4.8 is described as the starting point for most complex tasks, but the table says Extended thinking: No, so the catalog gives it an explicit capability without `reasoning = "budget_tokens"`.
- OpenAI Models overview (https://platform.openai.com/docs/models, redirected to https://developers.openai.com/api/docs/models): recommends `gpt-5.5` for complex reasoning/coding, with `gpt-5.4` and `gpt-5.4-mini` as lower latency/cost variants. `gpt-5.5` and `gpt-5.4` have 1.05M context windows and 128k max output.
- OpenAI model detail pages:
  - https://developers.openai.com/api/docs/models/gpt-5.5: model ID `gpt-5.5`, 1,050,000 context window, xhigh reasoning support, notes prompts over 272K input tokens are charged differently; local catalog retains `max_context_window = 272000` for the existing backend/effective-window clamp decision.
  - https://developers.openai.com/api/docs/models/gpt-5.4: model ID `gpt-5.4`, 1,050,000 context window.
  - https://developers.openai.com/api/docs/models/gpt-5-codex: model ID `gpt-5-codex`, 400,000 context window, Responses API only, optimized for agentic coding in Codex/similar environments.
- OpenRouter model list endpoint (https://openrouter.ai/api/v1/models): confirmed `anthropic/claude-opus-4.8` (1M), `anthropic/claude-sonnet-4.6` (1M), and `openai/gpt-5.5` (1.05M) with tools/structured output/reasoning parameters. Dynamic `~...latest` router aliases exist, but the builtin catalog uses concrete IDs to avoid unstable default behavior.
- Ollama Library:
  - https://ollama.com/library/llama3.3: `llama3.3` latest/70b has 128K context.
  - https://ollama.com/library/qwen3-coder: `qwen3-coder` latest/30b has 256K context and is positioned for agentic/coding tasks.
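The OpenRouter note above prefers concrete IDs over dynamic router aliases. A minimal Rust sketch of such a guard follows. It assumes aliases can be recognized by a leading `~`, which is the only detail the ticket gives about them. The function `is_concrete_model_id` and the sample alias string are hypothetical and are not part of the project.

```rust
/// Returns true when `id` looks like a concrete model ID rather than a
/// dynamic router alias.
/// Assumption: aliases are recognizable by a leading `~`; the ticket does not
/// spell out the full alias format.
fn is_concrete_model_id(id: &str) -> bool {
    !id.is_empty() && !id.starts_with('~')
}

fn main() {
    // OpenRouter IDs recorded in the research note.
    let ids = [
        "anthropic/claude-opus-4.8",
        "anthropic/claude-sonnet-4.6",
        "openai/gpt-5.5",
    ];
    for id in ids {
        assert!(is_concrete_model_id(id), "{} should be concrete", id);
    }

    // Made-up alias for illustration only; not a real OpenRouter ID.
    assert!(!is_concrete_model_id("~example/latest"));
    println!("all catalog IDs are concrete");
}
```

A check like this could run in a catalog test so that an alias cannot become a default by accident.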
Selected changes:
- Anthropic direct: replace stale `claude-sonnet-4-5`/`claude-opus-4-1` with `claude-opus-4-8`, `claude-sonnet-4-6`, and `claude-haiku-4-5`; update Sonnet context to 1M.
- Codex OAuth/OpenAI: keep default `codex-oauth/gpt-5.5`, update advertised context to 1.05M while retaining the existing 272K effective clamp; replace older plain `gpt-5` entry with `gpt-5.4`; keep `gpt-5-codex` because OpenAI documents it as a Codex/similar-environment Responses model.
- OpenRouter: replace stale `anthropic/claude-sonnet-4`/`openai/gpt-5` with concrete current IDs `anthropic/claude-opus-4.8`, `anthropic/claude-sonnet-4.6`, and `openai/gpt-5.5`.
- Ollama: replace `llama3.1`/`qwen2.5-coder` with current generic local placeholders `llama3.3` and `qwen3-coder`.
- Provider definitions unchanged; no provider-level source indicated that `resources/providers/builtin.toml` is stale.
- Default profile remains `codex-oauth/gpt-5.5`; this remains aligned with OpenAI’s current model recommendation and the existing effective-context clamp used by compaction safety.
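The advertised-versus-effective window distinction for `gpt-5.5` can be illustrated with a small Rust sketch. It assumes the effective window is the smaller of the advertised window and an optional clamp. The function name `effective_window` and its parameters are hypothetical. Only the values 1,050,000 and 272,000 and the field name `max_context_window` come from the ticket.

```rust
/// Effective context window for compaction decisions.
/// Assumption: the clamp (the catalog's `max_context_window`) caps the
/// provider-advertised window when present; otherwise the advertised
/// value is used unchanged.
fn effective_window(advertised: u32, clamp: Option<u32>) -> u32 {
    clamp.map_or(advertised, |c| advertised.min(c))
}

fn main() {
    // gpt-5.5: advertised 1.05M, clamp retained at 272K.
    let gpt_5_5 = effective_window(1_050_000, Some(272_000));
    // A model without a clamp keeps its advertised window.
    let unclamped = effective_window(1_050_000, None);

    println!("gpt-5.5 effective window: {}", gpt_5_5);
    println!("unclamped effective window: {}", unclamped);
}
```

Under this assumption, raising the advertised context to 1.05M does not change compaction behavior as long as the clamp stays at 272,000.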
Review: approve
External review by reviewer Pod builtin-catalog-reviewer-20260530: approve.
Reviewer summary:
- Sources were recorded in the ticket thread before/with the catalog changes.
- Catalog changes align with the recorded sources:
  - Anthropic entries now use Opus 4.8 / Sonnet 4.6 / Haiku 4.5, with Sonnet and Opus 1M context and Haiku 200k.
  - Codex OAuth/OpenAI entries use `gpt-5.5`, `gpt-5.4`, and `gpt-5-codex`; `gpt-5.5` retains the existing `max_context_window = 272000` effective clamp while advertising 1.05M.
  - OpenRouter uses concrete current IDs, avoiding unstable `~...latest` aliases.
  - Ollama entries use `llama3.3` and `qwen3-coder`.
- Provider definitions and default profile were reasonably left unchanged.
- Reported validation was adequate.
Blockers: none.
Non-blocking note addressed after review:
- The provider catalog test name/assertion still implied provider-default capability fallback for a model that now has explicit capability. It was renamed/updated to describe provider+model catalog merge semantics.
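The ticket does not spell out the provider+model merge semantics mentioned above. One plausible reading, sketched in Rust, is that an explicit model-level capability replaces the provider default as a whole, and the default applies only when the model has none. The `Capability` struct, its single `reasoning` field, and `resolve_capability` are hypothetical names for illustration, not the project's actual types.

```rust
/// Hypothetical capability record. Only the `reasoning = "budget_tokens"`
/// setting is taken from the ticket; the struct shape is an assumption.
#[derive(Debug, Clone, PartialEq)]
struct Capability {
    reasoning: Option<String>,
}

/// Assumed merge rule: an explicit model capability wins outright;
/// the provider default is used only when the model has none.
fn resolve_capability(
    provider_default: &Capability,
    model_explicit: Option<&Capability>,
) -> Capability {
    model_explicit.unwrap_or(provider_default).clone()
}

fn main() {
    let provider_default = Capability {
        reasoning: Some("budget_tokens".to_string()),
    };
    // Explicit capability without `reasoning`, as described for Opus 4.8.
    let explicit = Capability { reasoning: None };

    let cataloged = resolve_capability(&provider_default, Some(&explicit));
    let fallback = resolve_capability(&provider_default, None);

    println!("explicit model: {:?}", cataloged);
    println!("no model entry: {:?}", fallback);
}
```

Whole-record replacement matters here: a field-by-field merge would let the provider default `reasoning` value leak back into a model that was cataloged without it.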
Implementation report
Implementation report:
- Refreshed `resources/models/builtin.toml` from recorded official/semiofficial sources.
- Anthropic direct entries now cover `claude-opus-4-8`, `claude-sonnet-4-6`, and `claude-haiku-4-5`; Sonnet/Opus context windows are 1M and Haiku is 200k. Opus has explicit capability without `reasoning = "budget_tokens"` because the source table says Extended thinking is not supported.
- Ollama local placeholders are now `llama3.3` (128K) and `qwen3-coder` (256K).
- Codex OAuth/OpenAI entries now cover `gpt-5.5`, `gpt-5.4`, and `gpt-5-codex`; `gpt-5.5` advertises 1.05M while retaining the existing `max_context_window = 272000` effective clamp.
- OpenRouter entries now use concrete current IDs `anthropic/claude-opus-4.8`, `anthropic/claude-sonnet-4.6`, and `openai/gpt-5.5`; dynamic latest aliases were intentionally not added.
- `resources/providers/builtin.toml` and `resources/profiles/default.lua` were left unchanged.
- Updated provider catalog test expectations and renamed the affected test to avoid implying provider-default capability fallback for an explicitly cataloged model.
External review:
- Reviewer Pod `builtin-catalog-reviewer-20260530` approved with no blockers.
- Reviewer non-blocking note about the stale test name/assert message was addressed.
Validation:
- `cargo fmt --check` passed
- `cargo test -p provider` passed
- `cargo test -p manifest model` passed
- `cargo test -p manifest profile -- --nocapture` passed
- `cargo check -p provider -p manifest` passed
- `./tickets.sh doctor` passed
- `git diff --check` passed
Closed
Refreshed the builtin model catalog from recorded official/semiofficial sources. Anthropic, OpenAI/Codex OAuth, OpenRouter, and Ollama entries now point at current concrete model IDs; default profile remains codex-oauth/gpt-5.5; provider definitions were unchanged.
External review approved and validation passed:
- `cargo fmt --check`
- `cargo test -p provider`
- `cargo test -p manifest model`
- `cargo test -p manifest profile -- --nocapture`
- `cargo check -p provider -p manifest`
- `./tickets.sh doctor`
- `git diff --check`