qwen-code/docs/users/configuration
Shaojin Wen efb7351d58
feat(core): support reasoning effort 'max' tier (DeepSeek extension) (#3800)
* feat(core): support reasoning effort 'max' tier (DeepSeek extension)

DeepSeek's chat-completions endpoint added an extra-strong `max` tier
to `reasoning_effort` (per
https://api-docs.deepseek.com/zh-cn/api/create-chat-completion ; valid
values are now `high` and `max`, with `low`/`medium` mapping to `high`
for backward compat). Plumb it end-to-end:

- `ContentGeneratorConfig.reasoning.effort` union now includes 'max'.
- DeepSeek OpenAI-compat provider: translate the standard nested
  `reasoning: { effort }` shape into DeepSeek's flat `reasoning_effort`
  body parameter so user-configured effort actually takes effect (the
  nested shape was previously sent verbatim and silently ignored,
  defaulting to `high`). low/medium → high mirrors the documented
  server-side behavior so dashboards / logs match wire reality.
  An explicit top-level `reasoning_effort` (via samplingParams or
  extra_body) wins over the nested form.
- Anthropic converter: pass 'max' through to `output_config.effort`
  unchanged and bump the `thinking.budget_tokens` budget for the new
  tier (low 16k / medium 32k / high 64k / max 128k).
- Gemini converter: clamp 'max' to HIGH since Gemini has no higher
  thinking level. Without this, 'max' would silently fall through to
  THINKING_LEVEL_UNSPECIFIED.

Live verification against api.deepseek.com:
- `reasoning_effort: high` → 200
- `reasoning_effort: max`  → 200 (the new tier)
- `reasoning_effort: bogus`→ 400 with valid-set list confirming
  [high, low, medium, max, xhigh]

108 anthropic/openai-deepseek/gemini tests pass; full core suite
(6601 tests) green; lint + typecheck clean.

* fix(core): map xhigh→max + clamp max on non-DeepSeek anthropic + docs

Address PR review (copilot × 2) and add missing user docs:

1. (J698) `translateReasoningEffort` claimed in the PR description that
   it surfaces the DeepSeek backward-compat mapping client-side, but
   only handled `low`/`medium` → `high`. Add `xhigh` → `max` to match
   the doc and stay symmetric with the low/medium branch.

2. (J6-A) `output_config.effort: 'max'` would have been emitted on
   any anthropic-protocol provider whenever a user configured it, even
   when the baseURL points at real `api.anthropic.com` (which only
   accepts low/medium/high and would 400). Reuse the existing
   `isDeepSeekAnthropicProvider` detector to clamp `'max'` → `'high'`
   on non-DeepSeek anthropic backends, with a debugLogger.warn so the
   downgrade is visible. DeepSeek anthropic-compatible endpoints still
   pass through unchanged.

3. New docs:
   - `docs/users/configuration/model-providers.md`: a "Reasoning /
     thinking configuration" section under generationConfig — single
     example targeting DeepSeek + a per-provider behavior table
     (OpenAI/DeepSeek flat reasoning_effort, OpenAI passthrough for
     other servers, real Anthropic clamp, Anthropic-compatible
     DeepSeek passthrough, Gemini thinkingLevel mapping).
   - `docs/users/configuration/settings.md`: extend the
     `model.generationConfig` description to mention `reasoning`
     (the field was undocumented before this PR even though it
     already existed as a typed field) and link to the new section.

96 anthropic + deepseek tests pass; lint + typecheck clean.

* refactor(core): single-source effort normalization for anthropic + doc fix

Address PR review round 2 (copilot × 2):

1. (J8aG) The `contentGenerator.ts` comment claimed passing
   `reasoning.effort: 'max'` to real Anthropic was "up to the user",
   but commit b5b05ae actively clamps 'max' → 'high' (with a debug
   log) on non-DeepSeek anthropic backends. Update the comment to
   describe current runtime behavior.

2. (J8aL) The clamp ran inside `buildOutputConfig()` only — the effort
   label was downgraded but `buildThinkingConfig()` still used the
   raw user value to size the budget, so a non-DeepSeek anthropic
   request could end up with `output_config.effort: 'high'` paired
   with a 'max'-sized 128K thinking budget. Inconsistent label vs.
   budget on the wire.

   Refactor: hoist the normalization into a single
   `resolveEffectiveEffort()` helper that runs once per request in
   `buildRequest()`. Both `buildThinkingConfig` and `buildOutputConfig`
   now consume the same clamped value, so the budget ladder and the
   effort label stay aligned. The debug log fires once per request.

Add a regression test asserting that on a non-DeepSeek anthropic
provider with `effort: 'max'` configured, the wire request carries
both `output_config.effort: 'high'` AND `thinking.budget_tokens:
64_000` (the 'high' tier), not the 128K 'max' budget.

96 tests pass; lint + typecheck clean.

* fix(core): tighten 'max' clamp + warn-once + strip reasoning_effort on side queries

Address PR review round 3 (copilot × 3):

1. (J-2v) When request.config.thinkingConfig.includeThoughts is false,
   pipeline.buildRequest's post-processing only deleted the nested
   `reasoning` key. The DeepSeek provider's translateReasoningEffort
   may have already flattened an extra_body-injected reasoning into
   top-level `reasoning_effort` by that point, so a side query (e.g.
   suggestionGenerator) could still ship reasoning_effort on the wire.
   Extend the post-processing to also delete `reasoning_effort`.

2. (J-2z) The warn for clamping 'max' on non-DeepSeek anthropic ran on
   every request needing the downgrade — the docstring claimed "first
   time only" but the implementation didn't latch. Add a private
   `effortClampWarned` boolean on the generator so the warning fires
   once per generator lifetime.

3. (J-23) `resolveEffectiveEffort` used the broad
   `isDeepSeekAnthropicProvider` detector for the clamp decision, but
   that helper falls back to model-name matching to cover sglang/vllm
   self-hosted DeepSeek deployments. A model configured as e.g.
   "deepseek-distill" but routed to real api.anthropic.com would
   bypass the clamp and trigger HTTP 400. Split the detector: keep
   `isDeepSeekAnthropicProvider` (broad) for the thinking-block
   injection workaround where false-positives are harmless, and add
   `isDeepSeekAnthropicHostname` (hostname-only) for decisions where
   a model-name false-positive would route DeepSeek-only behavior to
   a stricter backend. The clamp now uses the hostname-only check.

New regression test: a config with model name containing "deepseek"
but baseURL pointing at api.anthropic.com still clamps `'max'` to
`'high'`. Existing "passes max through" test updated to set a
DeepSeek baseURL since model name alone no longer suffices for the
clamp bypass.

385 tests pass; lint + typecheck clean.

* docs(core): correct pipeline timing comment + samplingParams caveat

Address PR review round 4 (copilot × 3) — three documentation accuracy
fixes, no behavior change:

1. (KBcw) The post-processing comment in pipeline.ts misdescribed the
   call order ("after this branch already ran during the same
   buildRequest pass") — provider.buildRequest actually runs BEFORE
   the includeThoughts=false post-processing in the same pass.
   Reword to match the actual order: provider hook flattens nested
   reasoning to reasoning_effort first, this cleanup runs after and
   strips both shapes.

2. (KBdC, KBdE) The "Reasoning / thinking configuration" section in
   model-providers.md and the model.generationConfig description in
   settings.md both implied `reasoning` is honored on every provider.
   For OpenAI-compatible providers, when `generationConfig.samplingParams`
   is set, `ContentGenerationPipeline.buildGenerateContentConfig()`
   ships samplingParams verbatim and skips the separate `reasoning`
   injection entirely. Configs like
   `{ samplingParams: { temperature: 0.5 }, reasoning: { effort: 'max' } }`
   would silently drop the reasoning field on OpenAI/DeepSeek
   requests.

   Add an explicit "Interaction with samplingParams" warning section
   in model-providers.md and a parenthetical note in settings.md
   directing users to put `reasoning_effort` inside `samplingParams`
   (or `extra_body`) when both are configured.

385 tests pass; lint + typecheck clean.

* docs(core): clarify explicit budget_tokens bypasses 'max' effort clamp

When user sets `{ effort: 'max', budget_tokens: N }` on a non-DeepSeek
anthropic backend, the effort label gets clamped to 'high' (otherwise
the server 400s on the unknown enum) but the explicit budget_tokens is
preserved verbatim. The wire-shape mismatch is intentional, not a bug:
the clamp only protects the enum field, while budget is a free integer
the server accepts within the context window, so an explicit override
stays explicit. Document the contract on the early-return and add a
regression test that locks it in.

* docs(deepseek): fix comments to match flatten + reasoning-strip behavior

Two doc-only nits called out in review:

1. `buildRequest` JSDoc said non-text parts are "rejected", but
   `flattenContentParts` actually substitutes a textual placeholder
   (`[Unsupported content type: <type>]`) so the request still goes
   through with a breadcrumb. Reword the JSDoc accordingly.

2. `translateReasoningEffort`'s strip comment claimed it strips the
   nested form to avoid shipping both shapes, but it only drops the
   duplicated `effort` key when other keys (e.g. `budget_tokens`) are
   present. Reword to describe the actual selective behavior and why
   keeping orthogonal keys is intentional.

Behavior unchanged.

* fix(deepseek): gate reasoning_effort translation on actual DeepSeek hostname

The provider class is selected via the broader `isDeepSeekProvider`
check, which falls back to model-name matching to cover self-hosted
DeepSeek deployments (sglang/vllm/ollama, see #3613). That fallback is
the right call for content-part flattening — it's a model-format
constraint baked into the model itself, not the API surface.

But the same broad detection was also gating
`translateReasoningEffort`, which rewrites the standard
`reasoning: { effort }` config into DeepSeek's flat `reasoning_effort`
body parameter. That's a wire-shape decision, not a model-format one:
strict OpenAI-compat backends in self-hosted setups may not accept the
DeepSeek extension and would have happily handled the original shape.

Split the two decisions: keep `isDeepSeekProvider` (broad) for
flattening, add a hostname-only `isDeepSeekHostname` and gate the body
rewrite on it. Self-hosted DeepSeek users who actually want the
translation can either use a baseUrl containing api.deepseek.com or
inject `reasoning_effort` directly via `samplingParams`/`extra_body`.

Regression tests:
  - self-hosted (sglang) with deepseek-named model + nested
    `reasoning.effort` → flattening still runs, body shape preserved
  - `isDeepSeekHostname` matches api.deepseek.com but not custom hosts

* fix(deepseek): use URL parsing in isDeepSeekHostname; fix log-level docs

CodeQL flagged a high-severity URL substring sanitization issue on the
new `isDeepSeekHostname` helper. The naive
`baseUrl.includes('api.deepseek.com')` check would false-positive on
hostile hosts like `https://api.deepseek.com.evil.com/v1` and
incorrectly inject the DeepSeek-only `reasoning_effort` body parameter
into requests routed elsewhere. Switch to `new URL(...).hostname` with
exact match against `api.deepseek.com` (and `.api.deepseek.com`
subdomains), mirroring `isDeepSeekAnthropicHostname` on the Anthropic
side. Invalid URLs treated as non-DeepSeek.

`isDeepSeekProvider` already routes through `isDeepSeekHostname`, so
the hardening applies to both decision paths.

Regression tests cover:
  - subdomain match (us.api.deepseek.com)
  - hostile substrings (api.deepseek.com.evil.com,
    evil.com/api.deepseek.com/v1, api.deepseek.comevil.com,
    api-deepseek-com.example.com)
  - invalid / empty baseUrl

Also fix two doc-level mismatches: the `'max'` clamp on Anthropic logs
via `debugLogger.warn` (warning level, once per generator), not "with
a debug log". Update both `ContentGeneratorConfig.reasoning` JSDoc and
the per-provider behavior table in model-providers.md.

* feat(deepseek): emit thinking:disabled signal when reasoning is off

DeepSeek V4+ defaults `thinking.type` to `'enabled'`, so just stripping
`reasoning_effort` from the request leaves the server happily thinking
on side queries — paying full thinking latency/cost without an effort
configured. Per yiliang114's review, emit the explicit
`thinking: { type: 'disabled' }` field on the wire whenever reasoning
is disabled.

Triggered when either:
  - `request.config.thinkingConfig.includeThoughts === false` (forked
    queries, e.g. suggestion generation)
  - `contentGeneratorConfig.reasoning === false` (config-level opt-out)

The previous post-processing block only fired on the per-request opt-out
path, so the config-level case was already leaking. Unify both under a
single `reasoningDisabled` predicate that runs the same strip + signal
logic.

Hostname-gated to `api.deepseek.com` (and subdomains): self-hosted
DeepSeek behind sglang/vllm/ollama, or older DeepSeek versions, may
not accept the V4 thinking parameter — pushing it there could trip an
unknown-key 400. Mirrors the round-7 decision to gate
`reasoning_effort` translation on hostname.

Regression tests cover all four matrix points:
  - DeepSeek hostname + includeThoughts false → emits disabled
  - DeepSeek hostname + reasoning false → emits disabled
  - non-DeepSeek hostname + includeThoughts false → does not emit
  - self-hosted DeepSeek (model-name fallback only) → does not emit

Docs: extend the `reasoning: false` section with the new behavior and
the self-hosted/non-DeepSeek caveat.

* refactor(deepseek): expose isDeepSeek* as free functions; clarify docs

Two doc/coupling nits from review:

1. The pipeline post-processing block was importing the concrete
   `DeepSeekOpenAICompatibleProvider` class just to reach
   `isDeepSeekHostname`. That couples the generic OpenAI pipeline to a
   specific provider implementation. Promote the helper (and its broad
   `isDeepSeekProvider` sibling) to free `export function`s in
   `provider/deepseek.ts` and import them by name. The class keeps thin
   static delegates for backward compat with existing callers and tests.

2. The per-provider behavior table on `model-providers.md` said
   `'low'/'medium' → 'high'` and `'xhigh' → 'max'` "client-side", but
   that normalization only fires inside `translateReasoningEffort`,
   which runs on the nested `reasoning.effort` config path. Explicit
   top-level overrides via `samplingParams.reasoning_effort` or
   `extra_body.reasoning_effort` skip the rewrite and ship verbatim.
   Reword the row to reflect that.

Behavior unchanged.
2026-05-04 22:42:23 +08:00
..
_meta.ts feat: update docs 2025-12-15 22:12:34 +08:00
auth.md fix(cli): add API Key option to qwen auth interactive menu (#3624) 2026-04-27 22:01:47 +08:00
model-providers.md feat(core): support reasoning effort 'max' tier (DeepSeek extension) (#3800) 2026-05-04 22:42:23 +08:00
qwen-ignore.md Merge pull request #1266 from QwenLM/docs-fix 2025-12-17 22:04:27 +08:00
settings.md feat(core): support reasoning effort 'max' tier (DeepSeek extension) (#3800) 2026-05-04 22:42:23 +08:00
themes.md Merge pull request #1266 from QwenLM/docs-fix 2025-12-17 22:04:27 +08:00
trusted-folders.md docs: updated all links, click and open in vscode, new showcase video in overview 2025-12-17 11:10:31 +08:00