mirror of
https://github.com/QwenLM/qwen-code.git
synced 2026-04-28 19:52:02 +00:00
feat(core): adaptive output token escalation (8K default + 64K retry) (#2898)
* feat(core): adaptive output token escalation (8K default + 64K retry) 99% of model responses are under 5K tokens, but we previously reserved 32K for every request. This wastes GPU slot capacity by ~4x. Now the default output limit is 8K. When a response hits this cap (stop_reason=max_tokens), it automatically retries once at 64K — only the ~1% of requests that actually need more tokens pay the cost. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add design doc and user doc for adaptive output token escalation - Add design doc covering problem, architecture, token limit determination, escalation mechanism, and design decisions - Document QWEN_CODE_MAX_OUTPUT_TOKENS env var in settings.md - Add max_tokens adaptive behavior explanation in model config section --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
3c23952ef7
commit
1e8bc031cc
11 changed files with 299 additions and 57 deletions
|
|
@ -168,6 +168,18 @@ Settings are organized into categories. All settings should be placed within the
|
|||
}
|
||||
```
|
||||
|
||||
**max_tokens (adaptive output tokens):**
|
||||
|
||||
When `samplingParams.max_tokens` is not set, Qwen Code uses an adaptive output token strategy to optimize GPU resource usage:
|
||||
|
||||
1. Requests start with a default limit of **8K** output tokens
|
||||
2. If the response is truncated (the model hits the limit), Qwen Code automatically retries with **64K** tokens
|
||||
3. The partial output is discarded and replaced with the full response from the retry
|
||||
|
||||
This is transparent to users — you may briefly see a retry indicator if escalation occurs. Since 99% of responses are under 5K tokens, the retry happens rarely (<1% of requests).
|
||||
|
||||
To override this behavior, either set `samplingParams.max_tokens` in your settings or use the `QWEN_CODE_MAX_OUTPUT_TOKENS` environment variable.
|
||||
|
||||
**contextWindowSize:**
|
||||
|
||||
Overrides the default context window size for the selected model. Qwen Code determines the context window using built-in defaults based on model name matching, with a constant fallback value. Use this setting when a provider's effective context limit differs from Qwen Code's default. This value defines the model's assumed maximum context capacity, not a per-request token limit.
|
||||
|
|
@ -491,22 +503,23 @@ For authentication-related variables (like `OPENAI_*`) and the recommended `.qwe
|
|||
|
||||
### Environment Variables Table
|
||||
|
||||
| Variable | Description | Notes |
|
||||
| ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `QWEN_TELEMETRY_ENABLED` | Set to `true` or `1` to enable telemetry. Any other value is treated as disabling it. | Overrides the `telemetry.enabled` setting. |
|
||||
| `QWEN_TELEMETRY_TARGET` | Sets the telemetry target (`local` or `gcp`). | Overrides the `telemetry.target` setting. |
|
||||
| `QWEN_TELEMETRY_OTLP_ENDPOINT` | Sets the OTLP endpoint for telemetry. | Overrides the `telemetry.otlpEndpoint` setting. |
|
||||
| `QWEN_TELEMETRY_OTLP_PROTOCOL` | Sets the OTLP protocol (`grpc` or `http`). | Overrides the `telemetry.otlpProtocol` setting. |
|
||||
| `QWEN_TELEMETRY_LOG_PROMPTS` | Set to `true` or `1` to enable or disable logging of user prompts. Any other value is treated as disabling it. | Overrides the `telemetry.logPrompts` setting. |
|
||||
| `QWEN_TELEMETRY_OUTFILE` | Sets the file path to write telemetry to when the target is `local`. | Overrides the `telemetry.outfile` setting. |
|
||||
| `QWEN_TELEMETRY_USE_COLLECTOR` | Set to `true` or `1` to enable or disable using an external OTLP collector. Any other value is treated as disabling it. | Overrides the `telemetry.useCollector` setting. |
|
||||
| `QWEN_SANDBOX` | Alternative to the `sandbox` setting in `settings.json`. | Accepts `true`, `false`, `docker`, `podman`, or a custom command string. |
|
||||
| `SEATBELT_PROFILE` | (macOS specific) Switches the Seatbelt (`sandbox-exec`) profile on macOS. | `permissive-open`: (Default) Restricts writes to the project folder (and a few other folders, see `packages/cli/src/utils/sandbox-macos-permissive-open.sb`) but allows other operations. `strict`: Uses a strict profile that declines operations by default. `<profile_name>`: Uses a custom profile. To define a custom profile, create a file named `sandbox-macos-<profile_name>.sb` in your project's `.qwen/` directory (e.g., `my-project/.qwen/sandbox-macos-custom.sb`). |
|
||||
| `DEBUG` or `DEBUG_MODE` | (often used by underlying libraries or the CLI itself) Set to `true` or `1` to enable verbose debug logging, which can be helpful for troubleshooting. | **Note:** These variables are automatically excluded from project `.env` files by default to prevent interference with the CLI behavior. Use `.qwen/.env` files if you need to set these for Qwen Code specifically. |
|
||||
| `NO_COLOR` | Set to any value to disable all color output in the CLI. | |
|
||||
| `CLI_TITLE` | Set to a string to customize the title of the CLI. | |
|
||||
| `CODE_ASSIST_ENDPOINT` | Specifies the endpoint for the code assist server. | This is useful for development and testing. |
|
||||
| `TAVILY_API_KEY` | Your API key for the Tavily web search service. | Used to enable the `web_search` tool functionality. Example: `export TAVILY_API_KEY="tvly-your-api-key-here"` |
|
||||
| Variable | Description | Notes |
|
||||
| ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `QWEN_TELEMETRY_ENABLED` | Set to `true` or `1` to enable telemetry. Any other value is treated as disabling it. | Overrides the `telemetry.enabled` setting. |
|
||||
| `QWEN_TELEMETRY_TARGET` | Sets the telemetry target (`local` or `gcp`). | Overrides the `telemetry.target` setting. |
|
||||
| `QWEN_TELEMETRY_OTLP_ENDPOINT` | Sets the OTLP endpoint for telemetry. | Overrides the `telemetry.otlpEndpoint` setting. |
|
||||
| `QWEN_TELEMETRY_OTLP_PROTOCOL` | Sets the OTLP protocol (`grpc` or `http`). | Overrides the `telemetry.otlpProtocol` setting. |
|
||||
| `QWEN_TELEMETRY_LOG_PROMPTS` | Set to `true` or `1` to enable or disable logging of user prompts. Any other value is treated as disabling it. | Overrides the `telemetry.logPrompts` setting. |
|
||||
| `QWEN_TELEMETRY_OUTFILE` | Sets the file path to write telemetry to when the target is `local`. | Overrides the `telemetry.outfile` setting. |
|
||||
| `QWEN_TELEMETRY_USE_COLLECTOR` | Set to `true` or `1` to enable or disable using an external OTLP collector. Any other value is treated as disabling it. | Overrides the `telemetry.useCollector` setting. |
|
||||
| `QWEN_SANDBOX` | Alternative to the `sandbox` setting in `settings.json`. | Accepts `true`, `false`, `docker`, `podman`, or a custom command string. |
|
||||
| `SEATBELT_PROFILE` | (macOS specific) Switches the Seatbelt (`sandbox-exec`) profile on macOS. | `permissive-open`: (Default) Restricts writes to the project folder (and a few other folders, see `packages/cli/src/utils/sandbox-macos-permissive-open.sb`) but allows other operations. `strict`: Uses a strict profile that declines operations by default. `<profile_name>`: Uses a custom profile. To define a custom profile, create a file named `sandbox-macos-<profile_name>.sb` in your project's `.qwen/` directory (e.g., `my-project/.qwen/sandbox-macos-custom.sb`). |
|
||||
| `DEBUG` or `DEBUG_MODE` | (often used by underlying libraries or the CLI itself) Set to `true` or `1` to enable verbose debug logging, which can be helpful for troubleshooting. | **Note:** These variables are automatically excluded from project `.env` files by default to prevent interference with the CLI behavior. Use `.qwen/.env` files if you need to set these for Qwen Code specifically. |
|
||||
| `NO_COLOR` | Set to any value to disable all color output in the CLI. | |
|
||||
| `CLI_TITLE` | Set to a string to customize the title of the CLI. | |
|
||||
| `CODE_ASSIST_ENDPOINT` | Specifies the endpoint for the code assist server. | This is useful for development and testing. |
|
||||
| `QWEN_CODE_MAX_OUTPUT_TOKENS` | Overrides the default maximum output tokens per response. When not set, Qwen Code uses an adaptive strategy: starts with 8K tokens and automatically retries with 64K if the response is truncated. Set this to a specific value (e.g., `16000`) to use a fixed limit instead. | Takes precedence over the capped default (8K) but is overridden by `samplingParams.max_tokens` in settings. Disables automatic escalation when set. Example: `export QWEN_CODE_MAX_OUTPUT_TOKENS=16000` |
|
||||
| `TAVILY_API_KEY` | Your API key for the Tavily web search service. | Used to enable the `web_search` tool functionality. Example: `export TAVILY_API_KEY="tvly-your-api-key-here"` |
|
||||
|
||||
## Command-Line Arguments
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue