Mirror of https://github.com/QwenLM/qwen-code.git, synced 2026-04-28 11:41:04 +00:00
* feat(core): adaptive output token escalation (8K default + 64K retry)

  99% of model responses are under 5K tokens, but we previously reserved 32K for every request, over-reserving GPU slot capacity by roughly 4x. The default output limit is now 8K. When a response hits this cap (stop_reason=max_tokens), the request automatically retries once at 64K, so only the ~1% of requests that actually need more tokens pay the escalation cost.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add design doc and user doc for adaptive output token escalation

  - Add a design doc covering the problem, architecture, token limit determination, escalation mechanism, and design decisions
  - Document the QWEN_CODE_MAX_OUTPUT_TOKENS env var in settings.md
  - Add an explanation of the adaptive max_tokens behavior to the model config section

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
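The escalation flow described above is simple enough to sketch. The TypeScript below is a minimal, hypothetical rendering of that behavior, not qwen-code's actual internals: the `ModelResponse` shape, the `generate` callback, and the function names are illustrative assumptions, while the 8K/64K budgets, the stop_reason=max_tokens trigger, and the QWEN_CODE_MAX_OUTPUT_TOKENS override come from the commit and docs changes listed here.

```ts
// Minimal sketch of the adaptive escalation described in the commit above.
// `ModelResponse` and the `generate` callback are hypothetical stand-ins for
// qwen-code's real model-call plumbing.

interface ModelResponse {
  text: string;
  // 'max_tokens' means the model was cut off by the output budget.
  stopReason: 'stop' | 'max_tokens';
}

const DEFAULT_MAX_OUTPUT_TOKENS = 8_192;    // covers ~99% of responses
const ESCALATED_MAX_OUTPUT_TOKENS = 65_536; // one-shot retry budget for the rest

// Resolve the default budget, honoring the QWEN_CODE_MAX_OUTPUT_TOKENS
// override documented in settings.md (assumes a Node.js runtime).
function defaultMaxOutputTokens(): number {
  const fromEnv = Number(process.env.QWEN_CODE_MAX_OUTPUT_TOKENS);
  return Number.isFinite(fromEnv) && fromEnv > 0
    ? fromEnv
    : DEFAULT_MAX_OUTPUT_TOKENS;
}

// Try the small budget first; if the response was truncated
// (stop_reason=max_tokens), retry exactly once at the escalated budget.
async function generateWithEscalation(
  prompt: string,
  generate: (prompt: string, maxOutputTokens: number) => Promise<ModelResponse>,
): Promise<ModelResponse> {
  const first = await generate(prompt, defaultMaxOutputTokens());
  if (first.stopReason !== 'max_tokens') {
    return first; // common case: the response fit within the default budget
  }
  return generate(prompt, ESCALATED_MAX_OUTPUT_TOKENS);
}
```

The single-retry design keeps the worst case bounded: a truncated request costs one extra round trip, while the common case holds only the small 8K reservation.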
| Name |
|---|
| _meta.ts |
| auth.md |
| memory.md |
| model-providers.md |
| qwen-ignore.md |
| settings.md |
| themes.md |
| trusted-folders.md |