Created
Created by tickets.sh create.
Plan
Implementation plan:
- Replace the current `readability-rs` adapter with a local DOM-based reader extractor scoped to `crates/tools`.
- Add `include_navigation` to `WebFetchInput`, default false, and thread it only through the HTML render path.
- Render readable content as Markdown-ish text so inline links remain followable.
- Detect navigation generically and omit it by default while reporting a notice; include bounded navigation links only when requested.
- Remove the direct `readability-rs` dependency and update Cargo/Nix lock data.
- Validate with focused web tests, tools check, doctor, diff check, and Nix build/hash handling.
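The navigation behaviour in the plan (omit by default with a notice, include a bounded list only on request) could look roughly like the sketch below. It is an illustration under assumptions, not the code in `crates/tools/src/web.rs`: the trimmed-down `WebFetchInput`, the `NavLink` type, the `render_navigation` helper, the `MAX_NAV_LINKS` cap, and the notice wording are all hypothetical.

```rust
/// Hypothetical, trimmed-down stand-in for the real `WebFetchInput`.
#[derive(Debug, Clone, Default)]
pub struct WebFetchInput {
    pub url: String,
    /// `None` is treated the same as `Some(false)`: navigation is omitted.
    pub include_navigation: Option<bool>,
}

/// A navigation link detected by the reader (label plus absolute URL).
#[derive(Debug, Clone)]
pub struct NavLink {
    pub label: String,
    pub url: String,
}

/// Assumed cap on rendered navigation links; the ticket does not state the real bound.
pub const MAX_NAV_LINKS: usize = 3;

/// Returns the text appended after the readable body: an omission notice by
/// default, or a bounded `## Navigation` section when explicitly requested.
pub fn render_navigation(input: &WebFetchInput, links: &[NavLink]) -> String {
    if links.is_empty() {
        return String::new();
    }
    if !input.include_navigation.unwrap_or(false) {
        // Hypothetical notice wording.
        return format!(
            "[notice] {} navigation link(s) omitted; set include_navigation=true to include them.\n",
            links.len()
        );
    }
    let mut out = String::from("## Navigation\n");
    for link in links.iter().take(MAX_NAV_LINKS) {
        out.push_str(&format!("- [{}]({})\n", link.label, link.url));
    }
    if links.len() > MAX_NAV_LINKS {
        // Report truncation so the caller knows the list is incomplete.
        out.push_str(&format!(
            "- ... {} more link(s) truncated\n",
            links.len() - MAX_NAV_LINKS
        ));
    }
    out
}
```

Resolving the `Option<bool>` with `unwrap_or(false)` at a single point keeps the default in one place, so only the HTML render path has to know about the flag.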
Implementation report
Implementation report from coder Pod webfetch-local-reader-coder-20260530:
- Branch: `webfetch-local-reader-markdown`
- Commits:
  - `2a3208b96e151d748fd80d5a1c4236ba616b3c6e` web: render readable html as markdown
  - `44262c42365cbba1dad87629c06ce539cbd25105` web: harden reader navigation fallbacks
- Changed files: `crates/tools/src/web.rs`, `crates/tools/Cargo.toml`, `Cargo.lock`, `package.nix`
- Removed direct `readability-rs` dependency and replaced it with a local `html5ever`/rcdom reader implementation.
- Added `include_navigation: Option<bool>` to `WebFetchInput`.
- Renders readable HTML as Markdown-ish text, preserving inline body links as `[label](absolute-url)`.
- Detects generic navigation/sidebar/toc/menu/breadcrumb/pagination/prev-next content, omits it by default with metadata/notice, and includes bounded `## Navigation` when requested.
- Reader failure/fallback is reported with `readable=false`/fallback metadata; fallback output omits detected navigation by default so metadata and text agree.
- Added/strengthened regression tests for link preservation, navigation omission/inclusion, link-heavy main failure, fallback navigation consistency, and navigation truncation.
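As an illustration of the `[label](absolute-url)` rendering described above, here is a minimal std-only sketch. The helper names `split_origin`, `resolve_href`, and `render_link` are hypothetical, and the resolution is deliberately simplified: it does not normalize `..` segments, treats scheme names case-sensitively, and assumes that only `http`/`https` targets are worth keeping as links. A real implementation would more likely rely on a full URL parser.

```rust
/// Splits an absolute URL into (origin, rest), e.g.
/// "https://a.example/docs/x.html?q=1" -> ("https://a.example", "/docs/x.html?q=1").
fn split_origin(base: &str) -> Option<(&str, &str)> {
    let authority_start = base.find("://")? + 3;
    let path_start = base[authority_start..]
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .map(|i| authority_start + i)
        .unwrap_or(base.len());
    Some((&base[..path_start], &base[path_start..]))
}

/// Simplified href resolution against `base` (not a complete RFC 3986 resolver).
pub fn resolve_href(base: &str, href: &str) -> Option<String> {
    let href = href.trim();
    if href.is_empty() {
        return None;
    }
    // A ':' before any '/', '?' or '#' means the href carries its own scheme.
    if let Some(i) = href.find(|c: char| matches!(c, ':' | '/' | '?' | '#')) {
        if href.as_bytes()[i] == b':' {
            let is_web = href.starts_with("http://") || href.starts_with("https://");
            // Assumption: mailto:, javascript:, etc. are not rendered as links.
            return if is_web { Some(href.to_string()) } else { None };
        }
    }
    let (origin, rest) = split_origin(base)?;
    if let Some(stripped) = href.strip_prefix("//") {
        // Scheme-relative: reuse the base scheme.
        let scheme = &origin[..origin.find("://")?];
        return Some(format!("{}://{}", scheme, stripped));
    }
    if href.starts_with('/') {
        return Some(format!("{}{}", origin, href));
    }
    if href.starts_with('#') {
        // Fragment-only: keep the base path and query, replace the fragment.
        let without_fragment = base.split('#').next().unwrap_or(base);
        return Some(format!("{}{}", without_fragment, href));
    }
    // Base path without query or fragment.
    let path = rest.split(|c: char| c == '?' || c == '#').next().unwrap_or("");
    if href.starts_with('?') {
        return Some(format!("{}{}{}", origin, path, href));
    }
    // Relative path: resolve against the directory of the base path.
    let dir = match path.rfind('/') {
        Some(i) => &path[..=i],
        None => "/",
    };
    Some(format!("{}{}{}", origin, dir, href))
}

/// Renders an inline link as Markdown, or falls back to the plain label.
pub fn render_link(base: &str, label: &str, href: &str) -> String {
    // Collapse internal whitespace so the label stays on one line.
    let label = label.split_whitespace().collect::<Vec<_>>().join(" ");
    match resolve_href(base, href) {
        Some(url) if !label.is_empty() => {
            let escaped = label.replace('[', "\\[").replace(']', "\\]");
            format!("[{}]({})", escaped, url)
        }
        _ => label,
    }
}
```

For example, with the base `https://docs.example.com/guide/intro.html`, calling `render_link(base, "Next page", "setup.html")` yields `[Next page](https://docs.example.com/guide/setup.html)`, while a `mailto:` target degrades to its plain label.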
Validation reported by coder:
- `cargo fmt --check` passed
- `cargo test -p tools web` passed (14 passed)
- `cargo check -p tools` passed
- `./tickets.sh doctor` passed
- `git diff --check` passed
- `nix build .#insomnia` passed
Unresolved issues: none.
Review: approve
External review by reviewer Pod webfetch-local-reader-reviewer-20260530: approve.
First review requested changes for two blockers:
- link-heavy `body`/`main` could be accepted as readable main content;
- fallback could claim navigation omission while returning detected navigation text.
Follow-up commit 44262c42365cbba1dad87629c06ce539cbd25105 resolved both:
- `candidate_score` rejects high link density for all candidate tags, including `body` and `main`;
- fallback text is generated through the DOM reader path so detected navigation is omitted by default when `include_navigation=false`;
- metadata aligns with included/omitted navigation state;
- tests cover link-heavy main, fallback nav omission consistency, strengthened omitted nav labels, and navigation truncation metadata.
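The link-density fix can be pictured with the sketch below. It is not the actual `candidate_score` from `web.rs`: the `Candidate` struct, the `score_candidate` name, both thresholds, and the tag bonuses are assumptions chosen for illustration. The one point taken from the review is that the density check applies to every tag, so `body` and `main` get no exemption.

```rust
/// Text statistics for one candidate element (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Candidate<'a> {
    pub tag: &'a str,
    /// Characters of all visible text inside the element.
    pub text_chars: usize,
    /// Characters of text that sit inside `<a>` elements.
    pub link_text_chars: usize,
}

/// Assumed thresholds; the ticket does not state the real values.
const MAX_LINK_DENSITY: f64 = 0.5;
const MIN_TEXT_CHARS: usize = 40;

/// Fraction of the element's text that is link text (1.0 for empty elements).
pub fn link_density(c: &Candidate) -> f64 {
    if c.text_chars == 0 {
        1.0
    } else {
        c.link_text_chars as f64 / c.text_chars as f64
    }
}

/// Returns `None` when the candidate is rejected, otherwise a score where
/// higher means "more likely to be the readable main content".
pub fn score_candidate(c: &Candidate) -> Option<f64> {
    if c.text_chars < MIN_TEXT_CHARS {
        return None;
    }
    let density = link_density(c);
    // The density check runs before any tag-based bonus, so a link-heavy
    // `body` or `main` is rejected like any other element.
    if density > MAX_LINK_DENSITY {
        return None;
    }
    let tag_bonus = match c.tag {
        "article" | "main" => 1.5,
        "body" => 0.8,
        _ => 1.0,
    };
    Some(c.text_chars as f64 * (1.0 - density) * tag_bonus)
}
```

When every candidate is rejected this way, the caller would take the fallback path and report `readable=false`, which is the situation the second blocker was about.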
Reviewer found no new blocker. Reported validation is adequate.
Implementation report
Main workspace validation after merge:
- `cargo fmt --check` passed
- `cargo test -p tools web` passed (14 passed)
- `cargo check -p tools` passed with existing `llm-worker` `dead_code` warning
- `./tickets.sh doctor` passed
- `git diff --check` passed
- `nix build .#insomnia` passed (with dirty tree warning due to existing `.insomnia/workflow/multi-agent-workflow.md` local modification and open ticket lifecycle files)
Closed
Replaced the `readability-rs` WebFetch HTML extraction path with a local pure-Rust DOM reader that renders Markdown-ish main content and preserves inline links as absolute Markdown links. Added optional `include_navigation`, default navigation omission notices, bounded navigation inclusion, readable/fallback metadata, and regression coverage. External review approved after blocker fixes; validation passed including focused tools tests and Nix build.