ticket: add responses reasoning context safety

2026-05-29 15:13:44 +09:00 · 2026-05-29 15:13:44 +09:00 · b14a141341
commit b14a141341
parent ef09a75fba
3 changed files with 57 additions and 0 deletions
--- a/work-items/open/20260529-061224-responses-reasoning-context-safety/artifacts/.gitkeep
+++ b/work-items/open/20260529-061224-responses-reasoning-context-safety/artifacts/.gitkeep
--- a/work-items/open/20260529-061224-responses-reasoning-context-safety/item.md
+++ b/work-items/open/20260529-061224-responses-reasoning-context-safety/item.md
@@ -0,0 +1,50 @@
+---
+id: 20260529-061224-responses-reasoning-context-safety
+slug: responses-reasoning-context-safety
+title: Fix context safety accounting for Responses reasoning
+status: open
+kind: bug
+priority: P1
+labels: [llm-worker, pod, compact, reasoning]
+created_at: 2026-05-29T06:12:24Z
+updated_at: 2026-05-29T06:12:24Z
+assignee: null
+legacy_ticket: null
+---
+
+## Background
+
+A long-running `gpt-5.5` session hit `context_length_exceeded` while the TUI still showed roughly `190k/400k`. The failing request was in session `019e6bcf-fc62-7f93-b117-39369699c2c3`, segment `019e6e18-c777-7be0-af32-9a2585e19ff7`, `turn=1195`, `llm_call=9`.
+
+The immediate trace showed that the last successful usage event reported `input_tokens=197700`, while the failed request returned no usage. The request diagnostics also showed `reasoning.context="current_turn"` and a large request body (`items_len=2617`, `items_json_bytes=1775947`, `raw_json_bytes=1834360`, `wire_bytes=686528`). The same segment contained hundreds of persisted reasoning items with substantial `encrypted_content`.
+
+Two implementation areas need to be corrected together so context safety checks match what the Responses backend actually receives:
+
+1. `openai_responses` request construction appears to project persisted `Item::Reasoning` entries, including `encrypted_content`, back into the next request without enforcing the intended `reasoning.context` / current-turn / function-call adjacency policy documented in `docs/ref/model-reasoning-context.md`.
+2. Pod request-threshold safety checks appear to use persisted usage history and can miss in-flight usage records from earlier LLM calls in the same run, so a long tool loop can keep issuing requests based on stale token occupancy.
+
+## Requirements
+
+- Reconcile `docs/ref/model-reasoning-context.md` with `crates/llm-worker/src/llm_client/scheme/openai_responses/request.rs`.
+  - Define exactly which reasoning items may be sent for `reasoning.context="current_turn"`.
+  - Preserve the provider requirements for tool/function-call continuity.
+  - Do not silently resend old reasoning `encrypted_content` outside the documented policy.
+- Update request construction so persisted reasoning items are included only when required by the documented policy.
+  - Add focused tests covering old reasoning items, current-turn reasoning, function-call adjacency, and encrypted reasoning content.
+- Update Pod context safety accounting so request-threshold / pre-request checks include in-flight `UsageTracker` records from the current run, not only persisted session-log usage history.
+  - Ensure long same-run tool loops can trigger compact/prune/stop decisions using the latest successful usage before the next request is sent.
+- Preserve the existing principle that `Usage.input_tokens` is request prompt occupancy, while acknowledging failed `context_length_exceeded` responses may not include usage.
+- Improve diagnostics for context overflow and near-overflow cases.
+  - Record at least items count, item JSON bytes, raw/wire request bytes, reasoning item count, reasoning encrypted-content bytes, and whether provider usage was absent.
+  - Keep diagnostics out of model context unless they are intentionally logged as normal visible events.
+
+## Acceptance criteria
+
+- `reasoning.context="current_turn"` no longer causes old persisted reasoning `encrypted_content` to be resent outside the documented policy.
+- Function/tool-call continuity still works for Responses models that require adjacent reasoning/function-call state.
+- Request safety checks include current-run in-flight usage before sending subsequent LLM calls.
+- A focused regression test covers a single run with multiple LLM calls where later calls would exceed the threshold if in-flight usage were ignored.
+- A focused regression test covers a history containing old reasoning items and verifies request input contains only the allowed reasoning subset.
+- Context overflow diagnostics make it clear when provider usage is absent and expose request-size/reasoning-size counters.
+- `cargo fmt --check` passes.
+- Relevant `cargo test` / `cargo check` for `llm-worker` and `pod` pass.
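
Note on the in-flight usage requirement in `item.md`: the pre-request check could be sketched roughly as below. This is a minimal illustration, not the Pod implementation. `UsageRecord`, `Decision`, `latest_occupancy`, and `pre_request_check` are hypothetical names, and only `input_tokens` is taken from the ticket. The sketch assumes the newest in-flight record from the current run is the best estimate of prompt occupancy, with persisted session-log usage as a fallback.

```rust
/// Hypothetical usage record; only `input_tokens` mirrors a name from the ticket.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UsageRecord {
    pub input_tokens: u64,
}

/// Hypothetical outcome of a pre-request safety check.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Proceed,
    CompactFirst,
}

/// Latest known prompt occupancy: prefer the newest in-flight record from the
/// current run, then fall back to the newest persisted record.
pub fn latest_occupancy(persisted: &[UsageRecord], in_flight: &[UsageRecord]) -> Option<u64> {
    in_flight
        .last()
        .or_else(|| persisted.last())
        .map(|u| u.input_tokens)
}

/// Decide whether the next request may be sent. `threshold` is an assumed,
/// caller-supplied token limit; the ticket does not specify its value.
pub fn pre_request_check(
    persisted: &[UsageRecord],
    in_flight: &[UsageRecord],
    threshold: u64,
) -> Decision {
    match latest_occupancy(persisted, in_flight) {
        Some(tokens) if tokens >= threshold => Decision::CompactFirst,
        _ => Decision::Proceed,
    }
}
```

With a persisted record of 120000 tokens and an assumed threshold of 180000, the check proceeds when in-flight records are ignored, but asks for compaction once an in-flight record of 197700 tokens from the same run is included. That is the shape of the regression the acceptance criteria ask for.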
--- a/work-items/open/20260529-061224-responses-reasoning-context-safety/thread.md
+++ b/work-items/open/20260529-061224-responses-reasoning-context-safety/thread.md
@@ -0,0 +1,7 @@
+<!-- event: create author: tickets.sh at: 2026-05-29T06:12:24Z -->
+
+## Created
+
+Created by tickets.sh create.
+
+---
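
Note on the diagnostics requirement in `item.md`: the reasoning-size counters could be collected along the following lines. This is a sketch under stated assumptions. The `Item` enum here is a simplified, hypothetical stand-in for the real type in `llm-worker`, and `OverflowDiagnostics` and `collect_diagnostics` are illustrative names. Byte counters for the serialized request (item JSON, raw, and wire bytes) are left out because they depend on the actual request serializer.

```rust
/// Hypothetical, simplified request item; the real `Item` type is not shown in the ticket.
pub enum Item {
    Message(String),
    Reasoning { encrypted_content: Option<String> },
    FunctionCall(String),
}

/// Hypothetical counters for context overflow and near-overflow diagnostics.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct OverflowDiagnostics {
    pub items_len: usize,
    pub reasoning_items: usize,
    pub reasoning_encrypted_bytes: usize,
    pub provider_usage_absent: bool,
}

/// Count reasoning items and their encrypted payload size, and record whether
/// the provider returned usage (`None` models a failed response without usage).
pub fn collect_diagnostics(
    items: &[Item],
    provider_input_tokens: Option<u64>,
) -> OverflowDiagnostics {
    let mut diag = OverflowDiagnostics {
        items_len: items.len(),
        provider_usage_absent: provider_input_tokens.is_none(),
        ..Default::default()
    };
    for item in items {
        if let Item::Reasoning { encrypted_content } = item {
            diag.reasoning_items += 1;
            // Byte length of the encrypted payload, or 0 when none is attached.
            diag.reasoning_encrypted_bytes +=
                encrypted_content.as_ref().map_or(0, |s| s.len());
        }
    }
    diag
}
```

Keeping the counters in a plain struct separate from the request items is one way to keep diagnostics out of model context unless they are deliberately logged.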