feat(core): adaptive output token escalation (8K default + 64K retry) (#2898)

* feat(core): adaptive output token escalation (8K default + 64K retry)

  99% of model responses are under 5K tokens, but we previously reserved 32K for every request. This wastes GPU slot capacity by ~4x.

  Now the default output limit is 8K. When a response hits this cap (stop_reason=max_tokens), it automatically retries once at 64K, so only the ~1% of requests that actually need more tokens pay the cost.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add design doc and user doc for adaptive output token escalation

  - Add design doc covering problem, architecture, token limit determination, escalation mechanism, and design decisions
  - Document QWEN_CODE_MAX_OUTPUT_TOKENS env var in settings.md
  - Add max_tokens adaptive behavior explanation in model config section

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
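The behavior described in this commit message, together with the precedence rules in the `settings.md` changes, can be sketched roughly as follows. This is a minimal illustration and not the code from this commit: every name here (`resolveLimit`, `sendWithEscalation`, `SendFn`, `ModelResponse`) is hypothetical, the exact values behind "8K" and "64K" are assumed to be 8000 and 64000, and the handling of an invalid environment value is a guess.

```typescript
// Hypothetical sketch; names and exact constants are assumptions, not the commit's code.
const CAPPED_DEFAULT_MAX_TOKENS = 8_000; // "8K" in the commit message (exact value assumed)
const ESCALATED_MAX_TOKENS = 64_000; // "64K" in the commit message (exact value assumed)

interface ModelResponse {
  text: string;
  stopReason: 'end_turn' | 'max_tokens';
}

// Stand-in for whatever actually sends a request with a given output limit.
type SendFn = (maxTokens: number) => Promise<ModelResponse>;

interface ResolvedLimit {
  maxTokens: number;
  canEscalate: boolean;
}

// Precedence as documented: samplingParams.max_tokens, then
// QWEN_CODE_MAX_OUTPUT_TOKENS, then the capped default.
// Only the capped default is eligible for escalation.
function resolveLimit(settingMaxTokens?: number, envValue?: string): ResolvedLimit {
  if (settingMaxTokens !== undefined) {
    return { maxTokens: settingMaxTokens, canEscalate: false };
  }
  const parsed = envValue ? Number.parseInt(envValue, 10) : NaN;
  if (Number.isFinite(parsed) && parsed > 0) {
    return { maxTokens: parsed, canEscalate: false };
  }
  // Assumption: an unset or unparsable env value falls back to the adaptive default.
  return { maxTokens: CAPPED_DEFAULT_MAX_TOKENS, canEscalate: true };
}

async function sendWithEscalation(send: SendFn, limit: ResolvedLimit): Promise<ModelResponse> {
  const first = await send(limit.maxTokens);
  if (first.stopReason !== 'max_tokens' || !limit.canEscalate) {
    return first;
  }
  // Truncated at the default cap: discard the partial output and retry once at the higher limit.
  return send(ESCALATED_MAX_TOKENS);
}
```

In this sketch, a fixed limit from either override is never escalated, which matches the documented statement that setting `QWEN_CODE_MAX_OUTPUT_TOKENS` disables automatic escalation.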
2026-04-28 19:52:02 +00:00 · 2026-04-08 17:30:39 +08:00 · 2026-04-08 17:30:39 +08:00 · 1e8bc031cc
commit 1e8bc031cc
parent 3c23952ef7
11 changed files with 299 additions and 57 deletions
--- a/docs/users/configuration/settings.md
+++ b/docs/users/configuration/settings.md
@@ -168,6 +168,18 @@ Settings are organized into categories. All settings should be placed within the
 }
 ```

+**max_tokens (adaptive output tokens):**
+
+When `samplingParams.max_tokens` is not set, Qwen Code uses an adaptive output token strategy to optimize GPU resource usage:
+
+1. Requests start with a default limit of **8K** output tokens
+2. If the response is truncated (the model hits the limit), Qwen Code automatically retries with **64K** tokens
+3. The partial output is discarded and replaced with the full response from the retry
+
+This is transparent to users — you may briefly see a retry indicator if escalation occurs. Since 99% of responses are under 5K tokens, the retry happens rarely (<1% of requests).
+
+To override this behavior, either set `samplingParams.max_tokens` in your settings or use the `QWEN_CODE_MAX_OUTPUT_TOKENS` environment variable.
+
 **contextWindowSize:**

 Overrides the default context window size for the selected model. Qwen Code determines the context window using built-in defaults based on model name matching, with a constant fallback value. Use this setting when a provider's effective context limit differs from Qwen Code's default. This value defines the model's assumed maximum context capacity, not a per-request token limit.
@@ -491,22 +503,23 @@ For authentication-related variables (like `OPENAI_*`) and the recommended `.qwe

 ### Environment Variables Table

-| Variable                       | Description                                                                                                                                            | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
-| ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `QWEN_TELEMETRY_ENABLED`       | Set to `true` or `1` to enable telemetry. Any other value is treated as disabling it.                                                                  | Overrides the `telemetry.enabled` setting.                                                                                                                                                                                                                                                                                                                                                                                                                                         |
-| `QWEN_TELEMETRY_TARGET`        | Sets the telemetry target (`local` or `gcp`).                                                                                                          | Overrides the `telemetry.target` setting.                                                                                                                                                                                                                                                                                                                                                                                                                                          |
-| `QWEN_TELEMETRY_OTLP_ENDPOINT` | Sets the OTLP endpoint for telemetry.                                                                                                                  | Overrides the `telemetry.otlpEndpoint` setting.                                                                                                                                                                                                                                                                                                                                                                                                                                    |
-| `QWEN_TELEMETRY_OTLP_PROTOCOL` | Sets the OTLP protocol (`grpc` or `http`).                                                                                                             | Overrides the `telemetry.otlpProtocol` setting.                                                                                                                                                                                                                                                                                                                                                                                                                                    |
-| `QWEN_TELEMETRY_LOG_PROMPTS`   | Set to `true` or `1` to enable or disable logging of user prompts. Any other value is treated as disabling it.                                         | Overrides the `telemetry.logPrompts` setting.                                                                                                                                                                                                                                                                                                                                                                                                                                      |
-| `QWEN_TELEMETRY_OUTFILE`       | Sets the file path to write telemetry to when the target is `local`.                                                                                   | Overrides the `telemetry.outfile` setting.                                                                                                                                                                                                                                                                                                                                                                                                                                         |
-| `QWEN_TELEMETRY_USE_COLLECTOR` | Set to `true` or `1` to enable or disable using an external OTLP collector. Any other value is treated as disabling it.                                | Overrides the `telemetry.useCollector` setting.                                                                                                                                                                                                                                                                                                                                                                                                                                    |
-| `QWEN_SANDBOX`                 | Alternative to the `sandbox` setting in `settings.json`.                                                                                               | Accepts `true`, `false`, `docker`, `podman`, or a custom command string.                                                                                                                                                                                                                                                                                                                                                                                                           |
-| `SEATBELT_PROFILE`             | (macOS specific) Switches the Seatbelt (`sandbox-exec`) profile on macOS.                                                                              | `permissive-open`: (Default) Restricts writes to the project folder (and a few other folders, see `packages/cli/src/utils/sandbox-macos-permissive-open.sb`) but allows other operations. `strict`: Uses a strict profile that declines operations by default. `<profile_name>`: Uses a custom profile. To define a custom profile, create a file named `sandbox-macos-<profile_name>.sb` in your project's `.qwen/` directory (e.g., `my-project/.qwen/sandbox-macos-custom.sb`). |
-| `DEBUG` or `DEBUG_MODE`        | (often used by underlying libraries or the CLI itself) Set to `true` or `1` to enable verbose debug logging, which can be helpful for troubleshooting. | **Note:** These variables are automatically excluded from project `.env` files by default to prevent interference with the CLI behavior. Use `.qwen/.env` files if you need to set these for Qwen Code specifically.                                                                                                                                                                                                                                                               |
-| `NO_COLOR`                     | Set to any value to disable all color output in the CLI.                                                                                               |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
-| `CLI_TITLE`                    | Set to a string to customize the title of the CLI.                                                                                                     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
-| `CODE_ASSIST_ENDPOINT`         | Specifies the endpoint for the code assist server.                                                                                                     | This is useful for development and testing.                                                                                                                                                                                                                                                                                                                                                                                                                                        |
-| `TAVILY_API_KEY`               | Your API key for the Tavily web search service.                                                                                                        | Used to enable the `web_search` tool functionality. Example: `export TAVILY_API_KEY="tvly-your-api-key-here"`                                                                                                                                                                                                                                                                                                                                                                      |
+| Variable                       | Description                                                                                                                                                                                                                                                                    | Notes                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+| ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `QWEN_TELEMETRY_ENABLED`       | Set to `true` or `1` to enable telemetry. Any other value is treated as disabling it.                                                                                                                                                                                          | Overrides the `telemetry.enabled` setting.                                                                                                                                                                                                                                                                                                                                                                                                                                         |
+| `QWEN_TELEMETRY_TARGET`        | Sets the telemetry target (`local` or `gcp`).                                                                                                                                                                                                                                  | Overrides the `telemetry.target` setting.                                                                                                                                                                                                                                                                                                                                                                                                                                          |
+| `QWEN_TELEMETRY_OTLP_ENDPOINT` | Sets the OTLP endpoint for telemetry.                                                                                                                                                                                                                                          | Overrides the `telemetry.otlpEndpoint` setting.                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+| `QWEN_TELEMETRY_OTLP_PROTOCOL` | Sets the OTLP protocol (`grpc` or `http`).                                                                                                                                                                                                                                     | Overrides the `telemetry.otlpProtocol` setting.                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+| `QWEN_TELEMETRY_LOG_PROMPTS`   | Set to `true` or `1` to enable or disable logging of user prompts. Any other value is treated as disabling it.                                                                                                                                                                 | Overrides the `telemetry.logPrompts` setting.                                                                                                                                                                                                                                                                                                                                                                                                                                      |
+| `QWEN_TELEMETRY_OUTFILE`       | Sets the file path to write telemetry to when the target is `local`.                                                                                                                                                                                                           | Overrides the `telemetry.outfile` setting.                                                                                                                                                                                                                                                                                                                                                                                                                                         |
+| `QWEN_TELEMETRY_USE_COLLECTOR` | Set to `true` or `1` to enable or disable using an external OTLP collector. Any other value is treated as disabling it.                                                                                                                                                        | Overrides the `telemetry.useCollector` setting.                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+| `QWEN_SANDBOX`                 | Alternative to the `sandbox` setting in `settings.json`.                                                                                                                                                                                                                       | Accepts `true`, `false`, `docker`, `podman`, or a custom command string.                                                                                                                                                                                                                                                                                                                                                                                                           |
+| `SEATBELT_PROFILE`             | (macOS specific) Switches the Seatbelt (`sandbox-exec`) profile on macOS.                                                                                                                                                                                                      | `permissive-open`: (Default) Restricts writes to the project folder (and a few other folders, see `packages/cli/src/utils/sandbox-macos-permissive-open.sb`) but allows other operations. `strict`: Uses a strict profile that declines operations by default. `<profile_name>`: Uses a custom profile. To define a custom profile, create a file named `sandbox-macos-<profile_name>.sb` in your project's `.qwen/` directory (e.g., `my-project/.qwen/sandbox-macos-custom.sb`). |
+| `DEBUG` or `DEBUG_MODE`        | (often used by underlying libraries or the CLI itself) Set to `true` or `1` to enable verbose debug logging, which can be helpful for troubleshooting.                                                                                                                         | **Note:** These variables are automatically excluded from project `.env` files by default to prevent interference with the CLI behavior. Use `.qwen/.env` files if you need to set these for Qwen Code specifically.                                                                                                                                                                                                                                                               |
+| `NO_COLOR`                     | Set to any value to disable all color output in the CLI.                                                                                                                                                                                                                       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+| `CLI_TITLE`                    | Set to a string to customize the title of the CLI.                                                                                                                                                                                                                             |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+| `CODE_ASSIST_ENDPOINT`         | Specifies the endpoint for the code assist server.                                                                                                                                                                                                                             | This is useful for development and testing.                                                                                                                                                                                                                                                                                                                                                                                                                                        |
+| `QWEN_CODE_MAX_OUTPUT_TOKENS`  | Overrides the default maximum output tokens per response. When not set, Qwen Code uses an adaptive strategy: starts with 8K tokens and automatically retries with 64K if the response is truncated. Set this to a specific value (e.g., `16000`) to use a fixed limit instead. | Takes precedence over the capped default (8K) but is overridden by `samplingParams.max_tokens` in settings. Disables automatic escalation when set. Example: `export QWEN_CODE_MAX_OUTPUT_TOKENS=16000`                                                                                                                                                                                                                                                                            |
+| `TAVILY_API_KEY`               | Your API key for the Tavily web search service.                                                                                                                                                                                                                                | Used to enable the `web_search` tool functionality. Example: `export TAVILY_API_KEY="tvly-your-api-key-here"`                                                                                                                                                                                                                                                                                                                                                                      |

 ## Command-Line Arguments