close: prompt occupancy estimator

Keisuke Hirata 2026-06-01 10:10:06 +09:00
parent e51944f045
commit 623f45a254
4 changed files with 40 additions and 2 deletions


@@ -2,12 +2,12 @@
id: 20260601-001616-prompt-occupancy-token-estimator
slug: prompt-occupancy-token-estimator
title: Token estimator must keep prompt occupancy accounting whole
-status: open
+status: closed
kind: task
priority: P1
labels: [compaction, token-accounting]
created_at: 2026-06-01T00:16:16Z
-updated_at: 2026-06-01T00:59:20Z
+updated_at: 2026-06-01T01:10:06Z
assignee: null
legacy_ticket: null
---


@@ -0,0 +1,15 @@
Merged and completed.
Implementation:
- Merged branch `prompt-occupancy-token-estimator` into `develop` with `merge: prompt occupancy estimator`.
- `llm-worker` token counter extrapolation now keeps exact measured prompt occupancy authoritative and no longer extrapolates one-measurement growth via `total_input_tokens / history_bytes`.
- Extrapolation past the latest measurement uses a measured incremental span rate when available; otherwise it adds a conservative byte fallback for the unmeasured delta.
- Added pod interceptor regression coverage for the fresh-session / one-measurement overestimation case.
Validation after merge:
- `cargo test -p llm-worker token_counter` passed.
- `cargo test -p pod pre_llm_request_does_not_yield_from_single_measurement_history_rate_projection` passed.
- `./tickets.sh doctor` passed.
Review:
- External reviewer approved with no blockers.


@@ -122,4 +122,27 @@ Non-blocking follow-up:
- Some comments still describe extrapolation as a latest/final measurement rate even though the implementation is now latest measured incremental span or byte fallback. Reviewer classified this as documentation drift only, not a blocker.
---
<!-- event: close author: hare at: 2026-06-01T01:10:06Z status: closed -->
## Closed
Merged and completed.
Implementation:
- Merged branch `prompt-occupancy-token-estimator` into `develop` with `merge: prompt occupancy estimator`.
- `llm-worker` token counter extrapolation now keeps exact measured prompt occupancy authoritative and no longer extrapolates one-measurement growth via `total_input_tokens / history_bytes`.
- Extrapolation past the latest measurement uses a measured incremental span rate when available; otherwise it adds a conservative byte fallback for the unmeasured delta.
- Added pod interceptor regression coverage for the fresh-session / one-measurement overestimation case.
Validation after merge:
- `cargo test -p llm-worker token_counter` passed.
- `cargo test -p pod pre_llm_request_does_not_yield_from_single_measurement_history_rate_projection` passed.
- `./tickets.sh doctor` passed.
Review:
- External reviewer approved with no blockers.
---
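
Reader's note: the extrapolation rule in the close summary can be sketched roughly as below. This is not the `llm-worker` code. `Measurement`, `estimate_prompt_tokens`, `ceil_div`, and the `FALLBACK_BYTES_PER_TOKEN` value are hypothetical names and numbers chosen for illustration. The sketch assumes each measurement pairs an exact prompt token count with the history byte length at that moment.

```rust
/// Hypothetical record of one exact measurement: the prompt token count
/// reported when the serialized history was `history_bytes` long.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Measurement {
    pub history_bytes: u64,
    pub prompt_tokens: u64,
}

/// Assumed placeholder for the byte fallback; the real constant is not
/// stated in the ticket.
pub const FALLBACK_BYTES_PER_TOKEN: u64 = 4;

/// Integer division rounded up, so a partial token still counts as one.
fn ceil_div(numerator: u128, denominator: u128) -> u64 {
    ((numerator + denominator - 1) / denominator) as u64
}

/// Estimates prompt occupancy for a history of `current_bytes` bytes.
/// Measurements are assumed to be ordered oldest to newest.
pub fn estimate_prompt_tokens(measurements: &[Measurement], current_bytes: u64) -> Option<u64> {
    let latest = *measurements.last()?;

    // Bytes appended since the latest exact measurement.
    let unmeasured = current_bytes.saturating_sub(latest.history_bytes);
    if unmeasured == 0 {
        // The exact measurement stays authoritative; nothing to extrapolate.
        return Some(latest.prompt_tokens);
    }

    // Incremental span between the two most recent measurements, if any.
    // A single measurement yields no span, so `total / history_bytes`
    // is never used as a growth rate.
    let span = measurements
        .len()
        .checked_sub(2)
        .map(|i| measurements[i])
        .and_then(|prev| {
            let bytes = latest.history_bytes.checked_sub(prev.history_bytes)?;
            let tokens = latest.prompt_tokens.checked_sub(prev.prompt_tokens)?;
            (bytes > 0).then_some((tokens, bytes))
        });

    let extra = match span {
        // Scale the unmeasured delta by the measured incremental rate.
        Some((tokens, bytes)) => ceil_div(unmeasured as u128 * tokens as u128, bytes as u128),
        // No usable span: apply the byte fallback to the delta only.
        None => ceil_div(unmeasured as u128, FALLBACK_BYTES_PER_TOKEN as u128),
    };
    Some(latest.prompt_tokens.saturating_add(extra))
}
```

With a single measurement of 9000 tokens at 1000 history bytes and 400 new bytes, the sketch adds 100 tokens for the delta. Projecting the whole-prompt ratio of 9 tokens per byte would have added 3600, which is the fresh-session overestimation the regression test targets.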