yoi/thread.md at c306339e0a143e1cc4486d03a02aca31ebebd0a6

Hare/yoi

Hare c306339e0a

ticket: record panel and analytics implementations

2026-06-09 19:55:50 +09:00

7.5 KiB

Created

Created by LocalTicketBackend.

Intake summary

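The existing Ticket is already concrete enough as a follow-up to the session-analytics foundation added in the closed Ticket 20260609-032533-001. Its goal is to add assistant-response-level tool batching and edit round-trip metrics to the JSON report. It specifies the metrics to implement, a diagnostics policy that avoids definitive claims, non-goals, acceptance criteria, and test considerations. No product/API/authority boundary remains undecided, so the Orchestrator can route it as implementation_ready. Reviewer focus: the validity of the response/tool-result cycle inference, preservation of the existing analytics privacy boundary that emits no raw content, avoiding over-assertion about consecutive edit-only streaks, and consistency with the existing CLI/JSON schema.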

State changed

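During Intake, the existing Ticket's body, thread, and artifacts were reviewed together with the related closed Ticket 20260609-032533-001. The requirements are already organized at a granularity that can be implemented, reviewed, and verified, so the state moves from planning to ready.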

State changed

workspace-panel set the Ticket to queued.

State changed

Accepted queued implementation under the updated parallel-capacity policy. This Ticket extends the already-landed session analytics crate and is independent from the active ToolExecutionContext, TicketList, and Panel worktrees.

Decision

Routing decision: implementation_ready_parallel

Reason:

This Ticket extends session-analytics, a recently landed and currently inactive area.
It is independent from active ToolExecutionContext, TicketList output, and Panel UX work.
User asked to prefer parallel work when conflicts are absent or expected small.

IntentPacket:

Intent:

Add assistant-response-level tool batching and edit round-trip metrics to session-analytics so tool-use speed patterns can be evaluated in structured JSON.

Binding decisions / invariants:

Infer assistant response / tool-result cycle units from session JSONL as best-effort analytics.
Add response-level tool call metrics: total responses, tool-call responses, total tool calls, avg/p50/p90/max, histogram, top responses by tool call count.
Add edit batching metrics: Edit calls per response, responses containing Edit, same-file multiple Edit calls in one response, files touched per edit response, and correlation with large edit args where possible.
Add consecutive edit-only response streak metrics by file/path and response range, distinguishing pure edit-only streaks from sequences broken/annotated by Read/Bash/test/result-dependent steps.
Diagnostics must be observations such as possible batching opportunity, not blame or policy enforcement.
Preserve existing privacy boundary: no raw user input, raw tool args, raw file contents, raw session snippets, or raw tool output content in default output.
Do not change prompt/workflow behavior, implement EditBatch/ordered patch tools, or force batching policy.
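
The Edit batching decisions above can be illustrated with a minimal sketch. It is not the crate's code: `ToolCall`, `EditSummary`, and `summarize_edits` are hypothetical names, and the input is assumed to be already reduced to a tool name plus an optional target path so that no raw tool arguments are carried around.

```rust
use std::collections::BTreeMap;

/// Hypothetical privacy-preserving view of one tool call: only the tool
/// name and an optional target path are kept, never raw arguments.
#[derive(Debug, Clone)]
pub struct ToolCall {
    pub tool: String,
    pub path: Option<String>,
}

/// Hypothetical per-response Edit summary.
#[derive(Debug, PartialEq, Eq)]
pub struct EditSummary {
    pub edit_calls: usize,
    pub files_touched: usize,
    /// Number of files that received more than one Edit in this response.
    pub same_file_multi_edit_files: usize,
}

/// Summarizes the Edit calls of a single assistant response.
pub fn summarize_edits(calls: &[ToolCall]) -> EditSummary {
    let mut per_file: BTreeMap<&str, usize> = BTreeMap::new();
    let mut edit_calls = 0;
    for call in calls.iter().filter(|c| c.tool == "Edit") {
        edit_calls += 1;
        // Edits without a known path still count as calls, but not as files.
        if let Some(path) = call.path.as_deref() {
            *per_file.entry(path).or_insert(0) += 1;
        }
    }
    EditSummary {
        edit_calls,
        files_touched: per_file.len(),
        same_file_multi_edit_files: per_file.values().filter(|&&n| n > 1).count(),
    }
}
```

A response with two Edit calls to one file and one Edit to another would report three Edit calls, two files touched, and one file with multiple edits. Such a result is an observation about a possible batching opportunity, not a verdict.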

Validation:

Tests for single response with multiple tools.
Tests for same-response same-file multiple Edit.
Tests for consecutive edit-only responses to same file.
Tests for interleaved Read/test step breaking or annotating streak.
Tests for sessions with no edits.
cargo fmt --check, git diff --check, cargo run -q -p yoi -- ticket doctor, cargo check --workspace, nix build .#yoi.

Implementation report

Implementation routing started in parallel with the current active work set.

Worktree/branch:

Worktree: .worktree/session-analytics-response-batching
Branch: session-analytics-response-batching
Base/routing commit: 5d9e227 ticket: route analytics batching and parallel guidance

Spawned sibling Coder Pod:

coder-session-analytics-response-batching
Scope: non-recursive read on parent workspace root plus recursive write limited to the child worktree.

Parallel-routing rationale:

This extends the isolated session-analytics crate/CLI surface and is independent from active ToolExecutionContext/TicketList/Panel work.
Coder was instructed to use synthetic fixtures and preserve the privacy boundary.

Implementation report

Coder implementation completed and is ready for sibling review.

Coder Pod:

coder-session-analytics-response-batching
Commit: 963db29d960d23b40e32d6b07db88f52b914a72d analytics: add response batching metrics
Worktree status before review: clean branch session-analytics-response-batching
Stopped after collecting output to reclaim delegated worktree scope.

Response boundary design:

Best-effort response grouping uses consecutive top-level kind: "assistant_item" JSONL entries as one assistant response.
Non-assistant entries such as tool_result, turn_end, or segment_start close the current response group.
Seeded segment_start.history is excluded from response-level metrics because exact original response boundaries are not explicit; a response_boundary_approximation diagnostic records this limitation.
Metrics live under response_batches and remain distinct from user-turn metrics.
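
The grouping rule described above can be sketched roughly as follows. This is an illustration under assumptions, not the landed implementation: `Entry` is a hypothetical, already-parsed stand-in for a JSONL line, and `tool_calls_per_response` is an invented helper name.

```rust
/// Hypothetical simplified view of one top-level session JSONL entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Entry {
    /// An `assistant_item` entry; `tool` is `Some(name)` for a tool call.
    AssistantItem { tool: Option<String> },
    /// Any other kind (tool_result, turn_end, segment_start, ...).
    Other,
}

/// Returns the tool-call count of each inferred assistant response.
/// Consecutive `AssistantItem` entries form one response; any `Other`
/// entry closes the response that is currently open.
pub fn tool_calls_per_response(entries: &[Entry]) -> Vec<usize> {
    let mut responses = Vec::new();
    let mut current: Option<usize> = None;
    for entry in entries {
        match entry {
            Entry::AssistantItem { tool } => {
                // Open a response on the first assistant item of a run.
                let count = current.get_or_insert(0);
                if tool.is_some() {
                    *count += 1;
                }
            }
            Entry::Other => {
                if let Some(count) = current.take() {
                    responses.push(count);
                }
            }
        }
    }
    // A trailing run of assistant items still counts as one response.
    if let Some(count) = current {
        responses.push(count);
    }
    responses
}
```

In this sketch a response made only of text items yields a count of 0, so it still contributes to the total number of responses while staying out of the tool-call responses.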

Implementation summary:

Added response-level tool metrics: total responses, tool-call responses, total tool calls, avg/p50/p90/max tools per response, histogram, and top tool-call responses.
Added Edit batching metrics: responses containing Edit, total Edit calls, calls per response, same-file multi-Edit responses, files touched per Edit response, large-argument summary fields, and replace_all count.
Added consecutive edit round-trip metrics: pure same-file edit-only streaks and interrupted/annotated sequences when Read/Bash/test-like steps intervene.
Preserved privacy boundary: no raw user input, raw tool args, raw file contents, raw session snippets, or raw tool output content in default JSON output.
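
As a rough illustration of the pure same-file edit-only streak metric, the sketch below scans a sequence of pre-classified responses. `ResponseKind`, `Streak`, and `pure_edit_streaks` are hypothetical names, and the sketch assumes each response has already been reduced to "edit-only on one path" or "anything else". It covers only pure streaks; the interrupted/annotated sequences mentioned above are not modeled.

```rust
/// Hypothetical classification of one assistant response.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ResponseKind {
    /// All tool calls in the response are Edit calls against one file.
    EditOnly { path: String },
    /// Anything else (Read, Bash, mixed tools, no tools, ...).
    Other,
}

/// A run of at least two consecutive edit-only responses on one file.
#[derive(Debug, PartialEq, Eq)]
pub struct Streak {
    pub path: String,
    pub first_response: usize,
    pub last_response: usize,
}

/// Closes the open streak, keeping it only if it spans two or more responses.
fn close(open: &mut Option<Streak>, out: &mut Vec<Streak>) {
    if let Some(streak) = open.take() {
        if streak.last_response > streak.first_response {
            out.push(streak);
        }
    }
}

pub fn pure_edit_streaks(responses: &[ResponseKind]) -> Vec<Streak> {
    let mut streaks = Vec::new();
    let mut open: Option<Streak> = None;
    for (index, kind) in responses.iter().enumerate() {
        let path = match kind {
            ResponseKind::EditOnly { path } => Some(path),
            ResponseKind::Other => None,
        };
        // Extend the open streak when this response edits the same file.
        if let (Some(streak), Some(path)) = (open.as_mut(), path) {
            if streak.path == *path {
                streak.last_response = index;
                continue;
            }
        }
        // Otherwise the run is broken: close it and maybe start a new one.
        close(&mut open, &mut streaks);
        if let Some(path) = path {
            open = Some(Streak {
                path: path.clone(),
                first_response: index,
                last_response: index,
            });
        }
    }
    close(&mut open, &mut streaks);
    streaks
}
```

Reporting the response index range rather than any content keeps the output inside the privacy boundary, and a streak is only a hint that the edits might have been batched.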

Changed files:

crates/session-analytics/src/lib.rs
crates/yoi/src/session_cli.rs

Coder validation reported passed:

cargo test -p session-analytics
cargo test -p yoi run_session_analyze_outputs_json
cargo fmt --check
git diff --check
cargo run -q -p yoi -- ticket doctor
cargo check --workspace
nix build .#yoi

Focused tests covered multiple tools in one response, same-response same-file edits, consecutive edit-only responses, interleaved Read/test-like Bash interruption/annotation, sessions with no edits, existing analytics behavior, and CLI JSON shape.

Residual notes:

Response boundaries are best-effort for current JSONL shape.
Percentile/avg output is count-based and avg_milli avoids floating-point JSON instability.
Bash test detection is heuristic and only annotates interrupted edit sequences; it is not blame/policy classification.
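
The note about count-based percentiles and `avg_milli` can be made concrete with a small integer-only sketch. The `CountStats` struct and `count_stats` function are hypothetical, and the nearest-rank percentile method is an assumption: the thread does not state which percentile definition the crate uses.

```rust
/// Hypothetical integer-only summary of per-response tool-call counts.
#[derive(Debug, PartialEq, Eq)]
pub struct CountStats {
    pub total: u64,
    /// Average multiplied by 1000 and rounded, so JSON carries no floats.
    pub avg_milli: u64,
    pub p50: u64,
    pub p90: u64,
    pub max: u64,
}

/// Nearest-rank percentile over a sorted, non-empty slice (assumed method).
fn nearest_rank(sorted: &[u64], percent: u64) -> u64 {
    let n = sorted.len() as u64;
    // rank = ceil(percent * n / 100), clamped to at least 1.
    let rank = ((percent * n + 99) / 100).max(1);
    sorted[(rank - 1) as usize]
}

/// Returns `None` for an empty input instead of dividing by zero.
pub fn count_stats(counts: &[u64]) -> Option<CountStats> {
    if counts.is_empty() {
        return None;
    }
    let mut sorted = counts.to_vec();
    sorted.sort_unstable();
    let total: u64 = sorted.iter().sum();
    let n = sorted.len() as u64;
    Some(CountStats {
        total,
        avg_milli: (total * 1000 + n / 2) / n,
        p50: nearest_rank(&sorted, 50),
        p90: nearest_rank(&sorted, 90),
        max: *sorted.last().unwrap(),
    })
}
```

With counts `[1, 2, 3, 4, 10]` this sketch gives a total of 20, `avg_milli` of 4000, p50 of 3, and p90 and max of 10. Every field is an integer, so the serialized report is byte-stable across platforms.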
