* feat(core): support reasoning effort 'max' tier (DeepSeek extension)
DeepSeek's chat-completions endpoint added an extra-strong `max` tier
to `reasoning_effort` (per
https://api-docs.deepseek.com/zh-cn/api/create-chat-completion; valid
values are now `high` and `max`, with `low`/`medium` mapping to `high`
for backward compat). Plumb it end-to-end:
- `ContentGeneratorConfig.reasoning.effort` union now includes 'max'.
- DeepSeek OpenAI-compat provider: translate the standard nested
`reasoning: { effort }` shape into DeepSeek's flat `reasoning_effort`
body parameter so user-configured effort actually takes effect (the
nested shape was previously sent verbatim and silently ignored,
defaulting to `high`). low/medium → high mirrors the documented
server-side behavior so dashboards / logs match wire reality.
An explicit top-level `reasoning_effort` (via samplingParams or
extra_body) wins over the nested form.
- Anthropic converter: pass 'max' through to `output_config.effort`
unchanged and bump the `thinking.budget_tokens` budget for the new
tier (low 16k / medium 32k / high 64k / max 128k).
- Gemini converter: clamp 'max' to HIGH since Gemini has no higher
thinking level. Without this, 'max' would silently fall through to
THINKING_LEVEL_UNSPECIFIED.
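A minimal sketch of the provider-side translation described above (types
and shapes are illustrative; only the `translateReasoningEffort` name, the
low/medium → high mapping, and the explicit-override precedence come from
this change):

```typescript
type Effort = 'low' | 'medium' | 'high' | 'max';

interface RequestBody {
  reasoning?: { effort?: Effort };
  reasoning_effort?: string;
}

function translateReasoningEffort(body: RequestBody): RequestBody {
  // An explicit top-level reasoning_effort (samplingParams / extra_body) wins.
  if (body.reasoning_effort !== undefined) return body;
  const effort = body.reasoning?.effort;
  if (effort === undefined) return body;
  const out: RequestBody = { ...body };
  delete out.reasoning; // don't ship both shapes on the wire
  // low/medium collapse to high, mirroring DeepSeek's documented behavior.
  out.reasoning_effort =
    effort === 'low' || effort === 'medium' ? 'high' : effort;
  return out;
}
```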
Live verification against api.deepseek.com:
- `reasoning_effort: high` → 200
- `reasoning_effort: max` → 200 (the new tier)
- `reasoning_effort: bogus` → 400 with valid-set list confirming
[high, low, medium, max, xhigh]
108 anthropic/openai-deepseek/gemini tests pass; full core suite
(6601 tests) green; lint + typecheck clean.
* fix(core): map xhigh→max + clamp max on non-DeepSeek anthropic + docs
Address PR review (copilot × 2) and add missing user docs:
1. (J698) The PR description claimed `translateReasoningEffort`
   surfaces the DeepSeek backward-compat mapping client-side, but the
   code only handled `low`/`medium` → `high`. Add `xhigh` → `max` to
   match the doc and stay symmetric with the low/medium branch.
2. (J6-A) `output_config.effort: 'max'` would have been emitted on
any anthropic-protocol provider whenever a user configured it, even
when the baseURL points at real `api.anthropic.com` (which only
accepts low/medium/high and would 400). Reuse the existing
`isDeepSeekAnthropicProvider` detector to clamp `'max'` → `'high'`
on non-DeepSeek anthropic backends, with a debugLogger.warn so the
downgrade is visible. DeepSeek anthropic-compatible endpoints still
pass through unchanged.
3. New docs:
- `docs/users/configuration/model-providers.md`: a "Reasoning /
thinking configuration" section under generationConfig — single
example targeting DeepSeek + a per-provider behavior table
(OpenAI/DeepSeek flat reasoning_effort, OpenAI passthrough for
other servers, real Anthropic clamp, Anthropic-compatible
DeepSeek passthrough, Gemini thinkingLevel mapping).
- `docs/users/configuration/settings.md`: extend the
`model.generationConfig` description to mention `reasoning`
(the field was undocumented before this PR even though it
already existed as a typed field) and link to the new section.
96 anthropic + deepseek tests pass; lint + typecheck clean.
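The clamp in item 2 boils down to a few lines; this sketch uses a plain
boolean where the real code calls `isDeepSeekAnthropicProvider`, and an
injected `warn` in place of `debugLogger.warn` (both substitutions are
assumptions for self-containment):

```typescript
type Effort = 'low' | 'medium' | 'high' | 'max';

function clampEffortForBackend(
  effort: Effort,
  isDeepSeekBackend: boolean,
  warn: (msg: string) => void,
): Effort {
  if (effort === 'max' && !isDeepSeekBackend) {
    // Real api.anthropic.com only accepts low/medium/high; 'max' would 400.
    warn("reasoning effort 'max' is DeepSeek-only; downgrading to 'high'");
    return 'high';
  }
  return effort;
}
```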
* refactor(core): single-source effort normalization for anthropic + doc fix
Address PR review round 2 (copilot × 2):
1. (J8aG) The `contentGenerator.ts` comment claimed passing
`reasoning.effort: 'max'` to real Anthropic was "up to the user",
but commit b5b05ae actively clamps 'max' → 'high' (with a debug
log) on non-DeepSeek anthropic backends. Update the comment to
describe current runtime behavior.
2. (J8aL) The clamp ran inside `buildOutputConfig()` only — the effort
label was downgraded but `buildThinkingConfig()` still used the
raw user value to size the budget, so a non-DeepSeek anthropic
request could end up with `output_config.effort: 'high'` paired
with a 'max'-sized 128K thinking budget. Inconsistent label vs.
budget on the wire.
Refactor: hoist the normalization into a single
`resolveEffectiveEffort()` helper that runs once per request in
`buildRequest()`. Both `buildThinkingConfig` and `buildOutputConfig`
now consume the same clamped value, so the budget ladder and the
effort label stay aligned. The debug log fires once per request.
Add a regression test asserting that on a non-DeepSeek anthropic
provider with `effort: 'max'` configured, the wire request carries
both `output_config.effort: 'high'` AND `thinking.budget_tokens:
64_000` (the 'high' tier), not the 128K 'max' budget.
96 tests pass; lint + typecheck clean.
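The shape of the refactor, sketched: one normalization per request feeding
both consumers. The budget ladder (16k/32k/64k/128k) and the helper name
come from the commits above; everything else is illustrative:

```typescript
type Effort = 'low' | 'medium' | 'high' | 'max';

const BUDGET_TOKENS: Record<Effort, number> = {
  low: 16_000,
  medium: 32_000,
  high: 64_000,
  max: 128_000,
};

// Runs once per request in buildRequest(); both the effort label and the
// thinking budget derive from the same clamped value.
function resolveEffectiveEffort(effort: Effort, deepseekBackend: boolean): Effort {
  return effort === 'max' && !deepseekBackend ? 'high' : effort;
}

function buildRequest(effort: Effort, deepseekBackend: boolean) {
  const effective = resolveEffectiveEffort(effort, deepseekBackend);
  return {
    output_config: { effort: effective },
    thinking: { budget_tokens: BUDGET_TOKENS[effective] },
  };
}
```

With this shape, the inconsistent `'high'` label / 128K budget pairing the
review flagged cannot be produced.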
* fix(core): tighten 'max' clamp + warn-once + strip reasoning_effort on side queries
Address PR review round 3 (copilot × 3):
1. (J-2v) When request.config.thinkingConfig.includeThoughts is false,
pipeline.buildRequest's post-processing only deleted the nested
`reasoning` key. The DeepSeek provider's translateReasoningEffort
may have already flattened an extra_body-injected reasoning into
top-level `reasoning_effort` by that point, so a side query (e.g.
suggestionGenerator) could still ship reasoning_effort on the wire.
Extend the post-processing to also delete `reasoning_effort`.
2. (J-2z) The warn for clamping 'max' on non-DeepSeek anthropic ran on
every request needing the downgrade — the docstring claimed "first
time only" but the implementation didn't latch. Add a private
`effortClampWarned` boolean on the generator so the warning fires
once per generator lifetime.
3. (J-23) `resolveEffectiveEffort` used the broad
`isDeepSeekAnthropicProvider` detector for the clamp decision, but
that helper falls back to model-name matching to cover sglang/vllm
self-hosted DeepSeek deployments. A model configured as e.g.
"deepseek-distill" but routed to real api.anthropic.com would
bypass the clamp and trigger HTTP 400. Split the detector: keep
`isDeepSeekAnthropicProvider` (broad) for the thinking-block
injection workaround where false-positives are harmless, and add
`isDeepSeekAnthropicHostname` (hostname-only) for decisions where
a model-name false-positive would route DeepSeek-only behavior to
a stricter backend. The clamp now uses the hostname-only check.
New regression test: a config with model name containing "deepseek"
but baseURL pointing at api.anthropic.com still clamps `'max'` to
`'high'`. Existing "passes max through" test updated to set a
DeepSeek baseURL since model name alone no longer suffices for the
clamp bypass.
385 tests pass; lint + typecheck clean.
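The latch in item 2 is a one-field change; a minimal sketch (class and
method names are placeholders, only the `effortClampWarned` field name is
from the commit):

```typescript
class AnthropicGenerator {
  // Latches after the first downgrade so the warning fires once per
  // generator lifetime, matching the docstring's "first time only".
  private effortClampWarned = false;
  private readonly warn: (msg: string) => void;

  constructor(warn: (msg: string) => void) {
    this.warn = warn;
  }

  clampMax(effort: string): string {
    if (effort !== 'max') return effort;
    if (!this.effortClampWarned) {
      this.effortClampWarned = true;
      this.warn("clamping reasoning effort 'max' to 'high' (non-DeepSeek backend)");
    }
    return 'high';
  }
}
```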
* docs(core): correct pipeline timing comment + samplingParams caveat
Address PR review round 4 (copilot × 3) — three documentation accuracy
fixes, no behavior change:
1. (KBcw) The post-processing comment in pipeline.ts misdescribed the
call order ("after this branch already ran during the same
buildRequest pass") — provider.buildRequest actually runs BEFORE
the includeThoughts=false post-processing in the same pass.
Reword to match the actual order: provider hook flattens nested
reasoning to reasoning_effort first, this cleanup runs after and
strips both shapes.
2. (KBdC, KBdE) The "Reasoning / thinking configuration" section in
model-providers.md and the model.generationConfig description in
settings.md both implied `reasoning` is honored on every provider.
For OpenAI-compatible providers, when `generationConfig.samplingParams`
is set, `ContentGenerationPipeline.buildGenerateContentConfig()`
ships samplingParams verbatim and skips the separate `reasoning`
injection entirely. Configs like
`{ samplingParams: { temperature: 0.5 }, reasoning: { effort: 'max' } }`
would silently drop the reasoning field on OpenAI/DeepSeek
requests.
Add an explicit "Interaction with samplingParams" warning section
in model-providers.md and a parenthetical note in settings.md
directing users to put `reasoning_effort` inside `samplingParams`
(or `extra_body`) when both are configured.
385 tests pass; lint + typecheck clean.
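The workaround the new warning section directs users to looks roughly
like this (key names as described above; exact schema per the settings
docs):

```json
{
  "model": {
    "generationConfig": {
      "samplingParams": {
        "temperature": 0.5,
        "reasoning_effort": "max"
      }
    }
  }
}
```

i.e. once `samplingParams` is present, `reasoning_effort` must live inside
it (or in `extra_body`), not in a sibling `reasoning` block that would be
silently skipped.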
* docs(core): clarify explicit budget_tokens bypasses 'max' effort clamp
When the user sets `{ effort: 'max', budget_tokens: N }` on a non-DeepSeek
anthropic backend, the effort label gets clamped to 'high' (otherwise
the server 400s on the unknown enum) but the explicit budget_tokens is
preserved verbatim. The wire-shape mismatch is intentional, not a bug:
the clamp only protects the enum field, while budget is a free integer
the server accepts within the context window, so an explicit override
stays explicit. Document the contract on the early-return and add a
regression test that locks it in.
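The contract, sketched (function and field names illustrative; the
64K/128K ladder and the clamp rule are from the commits above):

```typescript
function buildWireConfig(
  effort: 'high' | 'max',
  budgetTokens: number | undefined,
  deepseekBackend: boolean,
) {
  // The clamp only protects the enum field...
  const effective = effort === 'max' && !deepseekBackend ? 'high' : effort;
  const ladder = { high: 64_000, max: 128_000 } as const;
  return {
    output_config: { effort: effective },
    // ...while an explicit budget_tokens override ships verbatim; the
    // ladder is only a fallback.
    thinking: { budget_tokens: budgetTokens ?? ladder[effective] },
  };
}
```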
* docs(deepseek): fix comments to match flatten + reasoning-strip behavior
Two doc-only nits called out in review:
1. `buildRequest` JSDoc said non-text parts are "rejected", but
`flattenContentParts` actually substitutes a textual placeholder
(`[Unsupported content type: <type>]`) so the request still goes
through with a breadcrumb. Reword the JSDoc accordingly.
2. `translateReasoningEffort`'s strip comment claimed it strips the
nested form to avoid shipping both shapes, but it only drops the
duplicated `effort` key when other keys (e.g. `budget_tokens`) are
present. Reword to describe the actual selective behavior and why
keeping orthogonal keys is intentional.
Behavior unchanged.
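The substitution described in item 1, sketched (part shape is an
assumption; the placeholder string is quoted from the commit):

```typescript
interface Part {
  type: string;
  text?: string;
}

// Non-text parts become a textual breadcrumb instead of rejecting the
// whole request.
function flattenContentParts(parts: Part[]): string {
  return parts
    .map((p) =>
      p.type === 'text' ? p.text ?? '' : `[Unsupported content type: ${p.type}]`,
    )
    .join('\n');
}
```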
* fix(deepseek): gate reasoning_effort translation on actual DeepSeek hostname
The provider class is selected via the broader `isDeepSeekProvider`
check, which falls back to model-name matching to cover self-hosted
DeepSeek deployments (sglang/vllm/ollama, see #3613). That fallback is
the right call for content-part flattening — it's a model-format
constraint baked into the model itself, not the API surface.
But the same broad detection was also gating
`translateReasoningEffort`, which rewrites the standard
`reasoning: { effort }` config into DeepSeek's flat `reasoning_effort`
body parameter. That's a wire-shape decision, not a model-format one:
strict OpenAI-compat backends in self-hosted setups may not accept the
DeepSeek extension and would have happily handled the original shape.
Split the two decisions: keep `isDeepSeekProvider` (broad) for
flattening, add a hostname-only `isDeepSeekHostname` and gate the body
rewrite on it. Self-hosted DeepSeek users who actually want the
translation can either use a baseUrl containing api.deepseek.com or
inject `reasoning_effort` directly via `samplingParams`/`extra_body`.
Regression tests:
- self-hosted (sglang) with deepseek-named model + nested
`reasoning.effort` → flattening still runs, body shape preserved
- `isDeepSeekHostname` matches api.deepseek.com but not custom hosts
* fix(deepseek): use URL parsing in isDeepSeekHostname; fix log-level docs
CodeQL flagged a high-severity URL substring sanitization issue on the
new `isDeepSeekHostname` helper. The naive
`baseUrl.includes('api.deepseek.com')` check would false-positive on
hostile hosts like `https://api.deepseek.com.evil.com/v1` and
incorrectly inject the DeepSeek-only `reasoning_effort` body parameter
into requests routed elsewhere. Switch to `new URL(...).hostname` with
exact match against `api.deepseek.com` (and `.api.deepseek.com`
subdomains), mirroring `isDeepSeekAnthropicHostname` on the Anthropic
side. Invalid URLs are treated as non-DeepSeek.
`isDeepSeekProvider` already routes through `isDeepSeekHostname`, so
the hardening applies to both decision paths.
Regression tests cover:
- subdomain match (us.api.deepseek.com)
- hostile substrings (api.deepseek.com.evil.com,
evil.com/api.deepseek.com/v1, api.deepseek.comevil.com,
api-deepseek-com.example.com)
- invalid / empty baseUrl
Also fix two doc-level mismatches: the `'max'` clamp on Anthropic logs
via `debugLogger.warn` (warning level, once per generator), not "with
a debug log". Update both `ContentGeneratorConfig.reasoning` JSDoc and
the per-provider behavior table in model-providers.md.
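The hardened check, sketched to match the behavior and test matrix above
(the helper name is from the commit; the body is illustrative):

```typescript
function isDeepSeekHostname(baseUrl: string): boolean {
  try {
    const { hostname } = new URL(baseUrl);
    // Exact match plus dotted subdomains; substring tricks like
    // api.deepseek.com.evil.com parse to a different hostname and fail.
    return (
      hostname === 'api.deepseek.com' || hostname.endsWith('.api.deepseek.com')
    );
  } catch {
    return false; // invalid / empty baseUrl → non-DeepSeek
  }
}
```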
* feat(deepseek): emit thinking:disabled signal when reasoning is off
DeepSeek V4+ defaults `thinking.type` to `'enabled'`, so just stripping
`reasoning_effort` from the request leaves the server happily thinking
on side queries — paying full thinking latency/cost without an effort
configured. Per yiliang114's review, emit the explicit
`thinking: { type: 'disabled' }` field on the wire whenever reasoning
is disabled.
Triggered when either:
- `request.config.thinkingConfig.includeThoughts === false` (forked
queries, e.g. suggestion generation)
- `contentGeneratorConfig.reasoning === false` (config-level opt-out)
The previous post-processing block only fired on the per-request opt-out
path, so the config-level case was already leaking. Unify both under a
single `reasoningDisabled` predicate that runs the same strip + signal
logic.
Hostname-gated to `api.deepseek.com` (and subdomains): self-hosted
DeepSeek behind sglang/vllm/ollama, or older DeepSeek versions, may
not accept the V4 thinking parameter — pushing it there could trip an
unknown-key 400. Mirrors the round-7 decision to gate
`reasoning_effort` translation on hostname.
Regression tests cover all four matrix points:
- DeepSeek hostname + includeThoughts false → emits disabled
- DeepSeek hostname + reasoning false → emits disabled
- non-DeepSeek hostname + includeThoughts false → does not emit
- self-hosted DeepSeek (model-name fallback only) → does not emit
Docs: extend the `reasoning: false` section with the new behavior and
the self-hosted/non-DeepSeek caveat.
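The unified opt-out path, sketched against the four matrix points (field
names follow the commit text; the context shape is an assumption):

```typescript
interface OptOutContext {
  includeThoughts?: boolean;        // per-request opt-out (forked side queries)
  reasoningConfig?: object | false; // config-level opt-out (reasoning: false)
  deepseekHostname: boolean;        // exact-hostname gate for the V4 signal
}

function applyReasoningOptOut(
  body: Record<string, unknown>,
  ctx: OptOutContext,
): Record<string, unknown> {
  const reasoningDisabled =
    ctx.includeThoughts === false || ctx.reasoningConfig === false;
  if (!reasoningDisabled) return body;
  const rest = { ...body };
  delete rest.reasoning;        // nested shape
  delete rest.reasoning_effort; // flattened shape
  // Only real api.deepseek.com gets the explicit V4 signal; self-hosted
  // or older servers might 400 on the unknown key.
  return ctx.deepseekHostname
    ? { ...rest, thinking: { type: 'disabled' } }
    : rest;
}
```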
* refactor(deepseek): expose isDeepSeek* as free functions; clarify docs
Two doc/coupling nits from review:
1. The pipeline post-processing block was importing the concrete
`DeepSeekOpenAICompatibleProvider` class just to reach
`isDeepSeekHostname`. That couples the generic OpenAI pipeline to a
specific provider implementation. Promote the helper (and its broad
`isDeepSeekProvider` sibling) to free `export function`s in
`provider/deepseek.ts` and import them by name. The class keeps thin
static delegates for backward compat with existing callers and tests.
2. The per-provider behavior table in `model-providers.md` said
`'low'/'medium' → 'high'` and `'xhigh' → 'max'` "client-side", but
that normalization only fires inside `translateReasoningEffort`,
which runs on the nested `reasoning.effort` config path. Explicit
top-level overrides via `samplingParams.reasoning_effort` or
`extra_body.reasoning_effort` skip the rewrite and ship verbatim.
Reword the row to reflect that.
Behavior unchanged.