ticket: close web search fetch tools
parent fed461036a
commit 56a5e47d63
@@ -1,102 +0,0 @@
---
id: 20260528-152959-web-search-fetch-tools
slug: web-search-fetch-tools
title: Add WebSearch and WebFetch tools
status: open
kind: task
priority: P2
labels: [tools, web, llm]
created_at: 2026-05-28T15:29:59Z
updated_at: 2026-05-28T15:29:59Z
assignee: null
legacy_ticket: null
---

## Background

Insomnia currently has strong local filesystem / shell / memory tools, but the agent cannot directly consult current web information except through user-provided excerpts or shell commands. Add first-class WebSearch and WebFetch tools so the model can gather public web information through bounded, observable tool calls.

These should be implemented as normal built-in tools, not as hidden context injection. Tool calls and results must remain visible in history, subject to manifest permission policy, and bounded by output limits.

## Requirement

- Add `WebSearch` tool.
  - Input includes query string and optional result limit.
  - Output returns structured results: title, URL, snippet/summary, source/search provider metadata where available.
  - Search provider must be configurable. If no provider/API key is configured, the tool should fail with a clear diagnostic instead of falling back to scraping arbitrary search pages.
- Add `WebFetch` tool.
  - Input includes URL and optional mode/limits.
  - Output returns normalized text content plus metadata such as final URL, status, content type, title if available, and byte/token truncation indication.
  - HTML should be converted to readable text. Non-text content should be rejected or summarized only when a safe explicit handler exists.
- Add manifest configuration for web tools.
  - Enable/disable controls.
  - Search provider/API key configuration.
  - Fetch timeout, max response bytes, max output bytes/tokens, redirect limit.
  - Allowed/denied URL schemes and host policy.
- Integrate with built-in tool registration and manifest permission policy.
  - Web tools are normal tool calls and should go through the existing tool permission mechanism.
  - No implicit network access should happen outside a tool call.
- Add security and reliability protections.
  - Only `http`/`https` by default.
  - Reject local/private/link-local/loopback addresses by default unless explicitly configured.
  - Bound redirects and re-check final URLs.
  - Bound download size and output size.
  - Provide clear errors for timeout, DNS/network failure, unsupported content, blocked host/scheme, and truncation.
- Prompts/tool descriptions should tell the model when to use WebSearch vs WebFetch and that fetched content may be stale/untrusted.

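The default scheme and address policy listed under the security protections could be sketched roughly as below. This is a minimal illustration using only the standard library; the function names `scheme_allowed` and `is_blocked_ip` are hypothetical and not taken from the Insomnia codebase. It assumes the check is applied to every resolved IP address (not only to the hostname string) and repeated for each redirect target.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Hypothetical helper: allow only `http` and `https` URLs by default.
pub fn scheme_allowed(url: &str) -> bool {
    match url.split_once("://") {
        Some((scheme, _)) => {
            let scheme = scheme.to_ascii_lowercase();
            scheme == "http" || scheme == "https"
        }
        None => false,
    }
}

/// IPv4 ranges rejected by default: loopback, private, link-local,
/// unspecified, and broadcast.
fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_loopback()
        || ip.is_private()
        || ip.is_link_local()
        || ip.is_unspecified()
        || ip.is_broadcast()
}

/// IPv6 ranges rejected by default. IPv4-mapped addresses are unwrapped
/// first so that `::ffff:127.0.0.1` cannot bypass the IPv4 rules.
fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    if let Some(v4) = ip.to_ipv4_mapped() {
        return is_blocked_v4(v4);
    }
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
        || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
}

/// Hypothetical helper: true when the address must be rejected unless the
/// manifest explicitly allows local/private hosts.
pub fn is_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => is_blocked_v6(v6),
    }
}
```

Checking the resolved address rather than the hostname matters because a public-looking name can resolve to `127.0.0.1` or a cloud metadata address such as `169.254.169.254`.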
## Brave Search API notes

`https://github.com/brave/brave-search-mcp-server` was checked as the reference implementation. It is an MCP server around Brave Search APIs, not a general page fetcher.

Useful details to mirror for `WebSearch`:

- Use `https://api.search.brave.com/res/v1/web/search` for the first provider.
- Authenticate with `X-Subscription-Token: <api key>`; Insomnia should read the key from a configured environment variable such as `BRAVE_SEARCH_API_KEY` rather than storing raw secrets in the manifest.
- Send `q` for the query. Validate query length up front: Brave's MCP server caps at 400 characters and 50 words.
- Expose a small initial subset of Brave parameters rather than the full API surface:
  - `count` / result limit: 1-20.
  - `offset`: 0-9 if pagination is included.
  - `country`, `search_lang`, `ui_lang` as optional config/defaults, not necessarily per-call in the first version.
  - `safesearch`: default `moderate`.
  - `freshness`: optional (`pd`, `pw`, `pm`, `py`, or date range) can be added if easy, but is not required for the first cut.
- Format output conservatively. The MCP server reduces web results to `{ url, title, description, extra_snippets }` and optionally emits FAQ, discussions, news, and videos. Insomnia's first version should return the core web result fields plus provider metadata, and may ignore non-web result buckets unless explicitly requested later.
- Brave's public docs/reference include an LLM Context endpoint (`/res/v1/summarizer/llm_context`) that returns extracted snippets/content with token and per-URL limits. This is useful as a future `WebContext`/enhanced search provider, but should not replace `WebFetch` in the first implementation because it is provider-specific and not a direct URL fetch tool.
- Brave MCP defines a nominal free-plan rate limit of 1 request/second and 15,000/month and does not implement robust self-throttling. Insomnia should at least surface HTTP 429/rate-limit errors clearly; local throttling can be a follow-up unless implementation is cheap.

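The up-front validation described in these notes (400 characters, 50 words, `count` 1-20, `offset` 0-9) might look roughly like the sketch below. `SearchArgs`, `validate_search_args`, and the default count of 10 are assumptions made for illustration; the ticket does not name them or fix a default.

```rust
/// Query limits quoted in the notes above.
const MAX_QUERY_CHARS: usize = 400;
const MAX_QUERY_WORDS: usize = 50;
/// Assumed default result count; the ticket does not specify one.
const DEFAULT_COUNT: u32 = 10;

/// Hypothetical validated argument set for a `WebSearch` call.
#[derive(Debug, PartialEq)]
pub struct SearchArgs {
    pub query: String,
    pub count: u8,
    pub offset: u8,
}

/// Validate raw tool input before any network request is made.
pub fn validate_search_args(
    query: &str,
    limit: Option<u32>,
    offset: Option<u32>,
) -> Result<SearchArgs, String> {
    let query = query.trim();
    if query.is_empty() {
        return Err("query must not be empty".to_string());
    }
    let chars = query.chars().count();
    if chars > MAX_QUERY_CHARS {
        return Err(format!("query has {chars} characters; limit is {MAX_QUERY_CHARS}"));
    }
    let words = query.split_whitespace().count();
    if words > MAX_QUERY_WORDS {
        return Err(format!("query has {words} words; limit is {MAX_QUERY_WORDS}"));
    }
    let count = limit.unwrap_or(DEFAULT_COUNT);
    if !(1..=20).contains(&count) {
        return Err(format!("limit must be between 1 and 20, got {count}"));
    }
    let offset = offset.unwrap_or(0);
    if offset > 9 {
        return Err(format!("offset must be between 0 and 9, got {offset}"));
    }
    Ok(SearchArgs {
        query: query.to_string(),
        count: count as u8, // safe: range-checked above
        offset: offset as u8,
    })
}
```

Rejecting out-of-range values with a diagnostic, rather than silently clamping them, keeps the tool result honest about what was actually sent to the provider.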
## Implementation plan

- First implementation should be provider-independent normal function tools, not OpenAI/Codex hosted `web_search`.
- Add a small typed web config to manifest resolution and pass it into built-in tool registration.
  - Default disabled/no provider should produce clear tool errors, not implicit network access.
  - Prefer environment-variable secret references for API keys; do not require raw keys in manifest files.
- Implement `WebSearch` with Brave as the first provider.
  - Keep the first-call schema small: `query`, optional `limit`, and optional `offset` only unless a broader parameter is trivial and well-tested.
  - Use deterministic JSON output with provider metadata and a bounded result list.
- Implement `WebFetch` independently from Brave.
  - Use an HTTP client with timeout, redirect limits, response byte limits, and content-type checks.
  - Enforce URL/host/IP policy before fetch and on every redirect.
  - Convert HTML to readable text with a scoped dependency or a small internal extractor; reject unsupported binary content.
- Register both tools through the existing built-in tool path so existing permission policy and history/logging behavior applies naturally.
- Do not implement provider-hosted OpenAI/Codex `web_search` in this ticket; that is a separate model/provider hosted-tool feature.

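The "small typed web config" with fail-closed defaults and an environment-variable key reference could be sketched as follows. The struct name, field names, and default values here are hypothetical placeholders, not the schema the implementation uses. The environment lookup is passed in as a closure so the logic can be tested without touching the real process environment.

```rust
/// Hypothetical typed web configuration resolved from the manifest.
#[derive(Debug, Clone)]
pub struct WebConfig {
    pub enabled: bool,
    /// Name of the environment variable holding the API key, never the key itself.
    pub search_api_key_env: Option<String>,
    pub fetch_timeout_secs: u64,
    pub max_response_bytes: usize,
    pub max_redirects: u8,
}

impl Default for WebConfig {
    /// Fail closed: web tools are off and no provider is configured.
    /// The numeric defaults are illustrative assumptions.
    fn default() -> Self {
        WebConfig {
            enabled: false,
            search_api_key_env: None,
            fetch_timeout_secs: 20,
            max_response_bytes: 1 << 20,
            max_redirects: 5,
        }
    }
}

/// Resolve the search API key, returning a clear diagnostic instead of
/// attempting any network access when the tool is disabled or unconfigured.
pub fn resolve_search_key(
    cfg: &WebConfig,
    lookup: impl Fn(&str) -> Option<String>,
) -> Result<String, String> {
    if !cfg.enabled {
        return Err("web tools are disabled in the manifest".to_string());
    }
    let var = cfg
        .search_api_key_env
        .as_deref()
        .ok_or_else(|| "no search API key environment variable is configured".to_string())?;
    match lookup(var) {
        Some(key) if !key.trim().is_empty() => Ok(key),
        _ => Err(format!("environment variable {var} is not set or is empty")),
    }
}
```

In production code the lookup would presumably be `|name| std::env::var(name).ok()`.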
## Acceptance criteria

- `WebSearch` and `WebFetch` are registered built-in tools when enabled/configured.
- Tool schemas are typed and validated.
- Manifest docs/config examples describe how to enable/configure web tools.
- Permission policy can allow/deny/ask these tools like other tools.
- Tool results are bounded and visible in history; no hidden web context is injected.
- Unit tests cover input validation, disabled/unconfigured errors, URL policy, redirect/final URL policy, output truncation, and representative HTML-to-text conversion.
- At least one integration-style test uses a local test HTTP server or mock provider rather than the public internet.
- `cargo fmt --check`
- `cargo check -p tools -p manifest -p pod`
- Relevant focused tests for tools/manifest.

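As an illustration of the output truncation criterion, a bounded-output helper that never splits a multi-byte UTF-8 character might look like this. `truncate_utf8` is a hypothetical name, and this is only one plausible shape for such a helper; the returned flag corresponds to the truncation indication the ticket asks tool results to carry.

```rust
/// Hypothetical helper: cut `text` to at most `max_bytes` bytes, backing up
/// to the nearest character boundary. The boolean reports whether anything
/// was dropped.
pub fn truncate_utf8(text: &str, max_bytes: usize) -> (&str, bool) {
    if text.len() <= max_bytes {
        return (text, false);
    }
    let mut end = max_bytes;
    // Index 0 is always a character boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    (&text[..end], true)
}
```

A unit test for this criterion would typically include a multi-byte case, since slicing a `&str` at a non-boundary index panics.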
## Out of scope

- Browser automation.
- Authenticated browsing / cookies / sessions.
- JavaScript rendering.
- File downloads as attachments.
- Using arbitrary shell commands as the primary web access path.
- Hidden pre-request browsing or automatic web context injection.

@@ -1,37 +0,0 @@
---
id: 20260528-152959-web-search-fetch-tools-review
slug: web-search-fetch-tools
title: Review for WebSearch and WebFetch tools
status: reviewed
kind: review
created_at: 2026-05-28T15:29:59Z
updated_at: 2026-05-29T09:28:00Z
reviewer: insomnia-system
---

## Review summary

Reviewed implementation branch `work/web-search-fetch-tools` in worktree `/home/hare/Projects/insomnia/.worktree/web-search-fetch-tools`.

The implementation adds normal built-in function tools `WebSearch` and `WebFetch`, not provider-hosted OpenAI/Codex tools. `WebSearch` uses Brave Search API with environment-variable API key configuration, query/limit/offset validation, and bounded JSON output. `WebFetch` uses an independent HTTP client with URL/scheme/host/IP policy, redirect revalidation, timeout and byte limits, content-type checks, and HTML/text/JSON/XML-ish rendering. Both tools are registered through the existing built-in tool path and fail closed when web access is disabled or search is unconfigured.

One blocking issue was found and fixed: Brave WebSearch initially had no request timeout and read the provider response body without a size bound. The amendment adds typed search timeout configuration and bounded response reading.

The implementation keeps Codex hosted web search out of scope, which matches the ticket decision.

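The bounded response reading mentioned in the amendment can be illustrated with a standard-library sketch. This is not the code under review; `read_bounded` is a hypothetical helper showing one common way to cap a body read with `Read::take`.

```rust
use std::io::{self, Read};

/// Hypothetical helper: read at most `max_bytes` from `reader`.
/// Returns the bytes read and whether the source held more than the limit.
pub fn read_bounded<R: Read>(reader: R, max_bytes: usize) -> io::Result<(Vec<u8>, bool)> {
    let mut buf = Vec::new();
    // Read one byte past the limit so "exactly at the limit" and
    // "over the limit" can be told apart.
    let cap = (max_bytes as u64).saturating_add(1);
    reader.take(cap).read_to_end(&mut buf)?;
    let truncated = buf.len() > max_bytes;
    buf.truncate(max_bytes);
    Ok((buf, truncated))
}
```

Because `take` stops pulling from the underlying reader once the cap is reached, an oversized or never-ending provider response cannot grow the buffer without bound. A request timeout is still needed separately to cover a stalled connection.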
## Validation

Reviewer ran:

- `cargo fmt --check`
- `cargo test -p tools --no-default-features`
- `cargo test -p manifest --no-default-features`
- `cargo check -p pod --no-default-features`
- `cargo check -p tui --no-default-features`
- `git diff --check develop...HEAD`

All passed. The only compiler warnings observed were pre-existing dead-code warnings under the `--no-default-features` checks.

## Judgment

Approved after amendment.

@@ -1,7 +0,0 @@
<!-- event: create author: tickets.sh at: 2026-05-28T15:29:59Z -->

## Created

Created by tickets.sh create.

---