From b5b05ae2194ba195f88e08ceebc9e33536f69cf0 Mon Sep 17 00:00:00 2001
From: wenshao
Date: Sun, 3 May 2026 09:17:19 +0800
Subject: [PATCH] =?UTF-8?q?fix(core):=20map=20xhigh=E2=86=92max=20+=20clam?=
 =?UTF-8?q?p=20max=20on=20non-DeepSeek=20anthropic=20+=20docs?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Address PR review (copilot × 2) and add missing user docs:

1. (J698) The PR description claimed `translateReasoningEffort` surfaces
   the DeepSeek backward-compat mapping client-side, but it only handled
   `low`/`medium` → `high`. Add `xhigh` → `max` to match the doc and stay
   symmetric with the low/medium branch.

2. (J6-A) `output_config.effort: 'max'` would have been emitted on any
   anthropic-protocol provider whenever a user configured it, even when
   the baseURL points at real `api.anthropic.com` (which only accepts
   low/medium/high and would 400). Reuse the existing
   `isDeepSeekAnthropicProvider` detector to clamp `'max'` → `'high'` on
   non-DeepSeek anthropic backends, with a debugLogger.warn so the
   downgrade is visible. DeepSeek anthropic-compatible endpoints still
   pass through unchanged.

3. New docs:

   - `docs/users/configuration/model-providers.md`: a "Reasoning /
     thinking configuration" section under generationConfig — single
     example targeting DeepSeek + a per-provider behavior table
     (OpenAI/DeepSeek flat reasoning_effort, OpenAI passthrough for other
     servers, real Anthropic clamp, Anthropic-compatible DeepSeek
     passthrough, Gemini thinkingLevel mapping).
   - `docs/users/configuration/settings.md`: extend the
     `model.generationConfig` description to mention `reasoning` (the
     field was undocumented before this PR even though it already existed
     as a typed field) and link to the new section.

96 anthropic + deepseek tests pass; lint + typecheck clean.
---
 docs/users/configuration/model-providers.md   | 51 +++++++++++++++++++
 docs/users/configuration/settings.md          | 22 ++++----
 .../anthropicContentGenerator.test.ts         | 37 ++++++++++++++
 .../anthropicContentGenerator.ts              | 18 ++++++-
 .../provider/deepseek.test.ts                 | 13 +++++
 .../provider/deepseek.ts                      | 12 +++--
 6 files changed, 137 insertions(+), 16 deletions(-)

diff --git a/docs/users/configuration/model-providers.md b/docs/users/configuration/model-providers.md
index 83a66e8de..68219b02e 100644
--- a/docs/users/configuration/model-providers.md
+++ b/docs/users/configuration/model-providers.md
@@ -481,6 +481,57 @@ When using a raw model via `--model gpt-4` (not from modelProviders, creates a R
 
 The merge strategy for `modelProviders` itself is REPLACE: the entire `modelProviders` from project settings will override the corresponding section in user settings, rather than merging the two.
 
+## Reasoning / thinking configuration
+
+The optional `reasoning` field under `generationConfig` controls how aggressively the model reasons before responding. It is plumbed through every supported provider — set it once and the converter for each protocol does the right thing.
+
+```jsonc
+{
+  "modelProviders": {
+    "openai": [
+      {
+        "id": "deepseek-v4-pro",
+        "name": "DeepSeek V4 Pro",
+        "baseUrl": "https://api.deepseek.com/v1",
+        "envKey": "DEEPSEEK_API_KEY",
+        "generationConfig": {
+          // The four-tier scale:
+          //   'low' | 'medium' — server-mapped to 'high' on DeepSeek
+          //   'high'           — default reasoning intensity
+          //   'max'            — DeepSeek-specific extra-strong tier
+          // Or set `false` to disable reasoning entirely.
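+          // Legacy 'xhigh' from older configs is also accepted and is
+          // mapped to 'max' client-side, mirroring DeepSeek's server-side
+          // back-compat handling.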
+          "reasoning": { "effort": "max" },
+        },
+      },
+    ],
+  },
+}
+```
+
+### Per-provider behavior
+
+| Protocol / provider | Wire shape | Notes |
+| --- | --- | --- |
+| **OpenAI / DeepSeek** (`api.deepseek.com`) | Flat `reasoning_effort` body parameter | `'low'`/`'medium'` map to `'high'` and `'xhigh'` to `'max'` client-side, mirroring DeepSeek's [server-side back-compat](https://api-docs.deepseek.com/zh-cn/api/create-chat-completion). Top-level overrides via `samplingParams.reasoning_effort` or `extra_body.reasoning_effort` win. |
+| **OpenAI** (other compatible servers) | `reasoning: { effort, ... }` passed through verbatim | Set via `samplingParams` (e.g. `samplingParams.reasoning_effort` for GPT-5/o-series) when the provider expects a different shape. |
+| **Anthropic** (real `api.anthropic.com`) | `output_config: { effort }` plus the `effort-2025-11-24` beta header | Real Anthropic accepts `'low'`/`'medium'`/`'high'` only. `'max'` is **clamped to `'high'`** with a debug log; if you want max effort, switch the baseURL to a DeepSeek-compatible endpoint that supports it. |
+| **Anthropic** (`api.deepseek.com/anthropic`) | Same `output_config: { effort }` + beta header | `'max'` is passed through unchanged. |
+| **Gemini** (`@google/genai`) | `thinkingConfig: { includeThoughts: true, thinkingLevel }` | `'low'` → `LOW`, `'high'`/`'max'` → `HIGH`, others → `THINKING_LEVEL_UNSPECIFIED` (Gemini has no `MAX` tier). |
+
+### `reasoning: false`
+
+Setting `reasoning: false` (the literal boolean) explicitly disables thinking on every provider — useful for cheap side queries that don't benefit from reasoning. This is honored at the request level too via `request.config.thinkingConfig.includeThoughts: false` for one-off calls (e.g. suggestion generation).
+
+### `budget_tokens`
+
+You can pin an exact thinking-token budget by including `budget_tokens` alongside `effort`:
+
+```jsonc
+"reasoning": { "effort": "high", "budget_tokens": 50000 }
+```
+
+For Anthropic this becomes `thinking.budget_tokens`. For OpenAI/DeepSeek the field is preserved but currently ignored by the server — `reasoning_effort` is the load-bearing knob.
+
 ## Provider Models vs Runtime Models
 
 Qwen Code distinguishes between two types of model configurations:
diff --git a/docs/users/configuration/settings.md b/docs/users/configuration/settings.md
index e7b23de5d..8e9e35be4 100644
--- a/docs/users/configuration/settings.md
+++ b/docs/users/configuration/settings.md
@@ -140,17 +140,17 @@ Settings are organized into categories. Most settings should be placed within th
 
 #### model
 
-| Setting | Type | Description | Default |
-| --- | --- | --- | --- |
-| `model.name` | string | The Qwen model to use for conversations. | `undefined` |
-| `model.maxSessionTurns` | number | Maximum number of user/model/tool turns to keep in a session. -1 means unlimited. | `-1` |
-| `model.generationConfig` | object | Advanced overrides passed to the underlying content generator. Supports request controls such as `timeout`, `maxRetries`, `enableCacheControl`, `splitToolMedia` (set `true` for strict OpenAI-compatible servers like LM Studio that reject non-text content on `role: "tool"` messages — splits media into a follow-up user message), `contextWindowSize` (override model's context window size), `modalities` (override auto-detected input modalities), `customHeaders` (custom HTTP headers for API requests), and `extra_body` (additional body parameters for OpenAI-compatible API requests only), along with fine-tuning knobs under `samplingParams` (for example `temperature`, `top_p`, `max_tokens`). Leave unset to rely on provider defaults. | `undefined` |
-| `model.chatCompression.contextPercentageThreshold` | number | Sets the threshold for chat history compression as a percentage of the model's total token limit. This is a value between 0 and 1 that applies to both automatic compression and the manual `/compress` command. For example, a value of `0.6` will trigger compression when the chat history exceeds 60% of the token limit. Use `0` to disable compression entirely. | `0.7` |
-| `model.skipNextSpeakerCheck` | boolean | Skip the next speaker check. | `false` |
-| `model.skipLoopDetection` | boolean | Disables loop detection checks. Loop detection prevents infinite loops in AI responses but can generate false positives that interrupt legitimate workflows. Enable this option if you experience frequent false positive loop detection interruptions. | `false` |
-| `model.skipStartupContext` | boolean | Skips sending the startup workspace context (environment summary and acknowledgement) at the beginning of each session. Enable this if you prefer to provide context manually or want to save tokens on startup. | `false` |
-| `model.enableOpenAILogging` | boolean | Enables logging of OpenAI API calls for debugging and analysis. When enabled, API requests and responses are logged to JSON files. | `false` |
-| `model.openAILoggingDir` | string | Custom directory path for OpenAI API logs. If not specified, defaults to `logs/openai` in the current working directory. Supports absolute paths, relative paths (resolved from current working directory), and `~` expansion (home directory). | `undefined` |
+| Setting | Type | Description | Default |
+| --- | --- | --- | --- |
+| `model.name` | string | The Qwen model to use for conversations. | `undefined` |
+| `model.maxSessionTurns` | number | Maximum number of user/model/tool turns to keep in a session. -1 means unlimited. | `-1` |
+| `model.generationConfig` | object | Advanced overrides passed to the underlying content generator. Supports request controls such as `timeout`, `maxRetries`, `enableCacheControl`, `splitToolMedia` (set `true` for strict OpenAI-compatible servers like LM Studio that reject non-text content on `role: "tool"` messages — splits media into a follow-up user message), `contextWindowSize` (override model's context window size), `modalities` (override auto-detected input modalities), `customHeaders` (custom HTTP headers for API requests), `extra_body` (additional body parameters for OpenAI-compatible API requests only), and `reasoning` (`{ effort: 'low' \| 'medium' \| 'high' \| 'max', budget_tokens?: number }` to control thinking intensity, or `false` to disable; `'max'` is a DeepSeek extension — see [Reasoning / thinking configuration](./model-providers.md#reasoning--thinking-configuration) for per-provider behavior), along with fine-tuning knobs under `samplingParams` (for example `temperature`, `top_p`, `max_tokens`). Leave unset to rely on provider defaults. | `undefined` |
+| `model.chatCompression.contextPercentageThreshold` | number | Sets the threshold for chat history compression as a percentage of the model's total token limit. This is a value between 0 and 1 that applies to both automatic compression and the manual `/compress` command. For example, a value of `0.6` will trigger compression when the chat history exceeds 60% of the token limit. Use `0` to disable compression entirely. | `0.7` |
+| `model.skipNextSpeakerCheck` | boolean | Skip the next speaker check. | `false` |
+| `model.skipLoopDetection` | boolean | Disables loop detection checks. Loop detection prevents infinite loops in AI responses but can generate false positives that interrupt legitimate workflows. Enable this option if you experience frequent false positive loop detection interruptions. | `false` |
+| `model.skipStartupContext` | boolean | Skips sending the startup workspace context (environment summary and acknowledgement) at the beginning of each session. Enable this if you prefer to provide context manually or want to save tokens on startup. | `false` |
+| `model.enableOpenAILogging` | boolean | Enables logging of OpenAI API calls for debugging and analysis. When enabled, API requests and responses are logged to JSON files. | `false` |
+| `model.openAILoggingDir` | string | Custom directory path for OpenAI API logs. If not specified, defaults to `logs/openai` in the current working directory. Supports absolute paths, relative paths (resolved from current working directory), and `~` expansion (home directory). | `undefined` |
 
 **Example model.generationConfig:**
 
diff --git a/packages/core/src/core/anthropicContentGenerator/anthropicContentGenerator.test.ts b/packages/core/src/core/anthropicContentGenerator/anthropicContentGenerator.test.ts
index 77ad7ef6b..6150dd2c3 100644
--- a/packages/core/src/core/anthropicContentGenerator/anthropicContentGenerator.test.ts
+++ b/packages/core/src/core/anthropicContentGenerator/anthropicContentGenerator.test.ts
@@ -523,6 +523,43 @@ describe('AnthropicContentGenerator', () => {
     );
   });
 
+  it("clamps effort: 'max' to 'high' on a non-DeepSeek anthropic provider", async () => {
+    // 'max' is a DeepSeek extension; real Anthropic only accepts
+    // low/medium/high. Clamp so a config targeting DeepSeek doesn't 400
+    // when reused against a stricter Anthropic backend.
+    const { AnthropicContentGenerator } = await importGenerator();
+    anthropicState.createImpl.mockResolvedValue({
+      id: 'anthropic-1',
+      model: 'claude-test',
+      content: [{ type: 'text', text: 'hi' }],
+    });
+
+    const generator = new AnthropicContentGenerator(
+      {
+        model: 'claude-test',
+        apiKey: 'test-key',
+        baseUrl: 'https://api.anthropic.com',
+        timeout: 10_000,
+        maxRetries: 2,
+        samplingParams: { max_tokens: 500 },
+        schemaCompliance: 'auto',
+        reasoning: { effort: 'max' },
+      },
+      mockConfig,
+    );
+
+    await generator.generateContent({
+      model: 'models/ignored',
+      contents: 'Hello',
+    } as unknown as GenerateContentParameters);
+
+    const [anthropicRequest] =
+      anthropicState.lastCreateArgs as AnthropicCreateArgs;
+    expect(anthropicRequest).toEqual(
+      expect.objectContaining({ output_config: { effort: 'high' } }),
+    );
+  });
+
   it('omits thinking when request.config.thinkingConfig.includeThoughts is false', async () => {
     const { AnthropicContentGenerator } = await importGenerator();
     anthropicState.createImpl.mockResolvedValue({
diff --git a/packages/core/src/core/anthropicContentGenerator/anthropicContentGenerator.ts b/packages/core/src/core/anthropicContentGenerator/anthropicContentGenerator.ts
index 4af331f83..a79ad5317 100644
--- a/packages/core/src/core/anthropicContentGenerator/anthropicContentGenerator.ts
+++ b/packages/core/src/core/anthropicContentGenerator/anthropicContentGenerator.ts
@@ -429,7 +429,23 @@ export class AnthropicContentGenerator implements ContentGenerator {
       return undefined;
     }
 
-    return { effort: reasoning.effort };
+    // 'max' is a DeepSeek-specific extension; real Anthropic accepts only
+    // low/medium/high. Clamp on non-DeepSeek anthropic-compatible providers
+    // so configurations targeting DeepSeek don't 400 when the user later
+    // switches the same auth profile to a stricter Anthropic backend.
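+    // DeepSeek's anthropic-compatible endpoints accept 'max' natively, so
+    // isDeepSeekAnthropicProvider lets those pass through unchanged.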
+    let effort = reasoning.effort;
+    if (
+      effort === 'max' &&
+      !isDeepSeekAnthropicProvider(this.contentGeneratorConfig)
+    ) {
+      debugLogger.warn(
+        "reasoning.effort='max' is a DeepSeek extension; clamping to 'high' " +
+          'for non-DeepSeek anthropic providers to avoid HTTP 400.',
+      );
+      effort = 'high';
+    }
+
+    return { effort };
   }
 
   private async *processStream(
diff --git a/packages/core/src/core/openaiContentGenerator/provider/deepseek.test.ts b/packages/core/src/core/openaiContentGenerator/provider/deepseek.test.ts
index 97da87c83..b0f807518 100644
--- a/packages/core/src/core/openaiContentGenerator/provider/deepseek.test.ts
+++ b/packages/core/src/core/openaiContentGenerator/provider/deepseek.test.ts
@@ -294,6 +294,19 @@ describe('DeepSeekOpenAICompatibleProvider', () => {
     expect(r['reasoning']).toBeUndefined();
   });
 
+  it("maps backward-compat 'xhigh' effort to 'max' (DeepSeek doc behavior)", () => {
+    const originalRequest = {
+      model: 'deepseek-v4-pro',
+      messages: [{ role: 'user', content: 'hi' }],
+      reasoning: { effort: 'xhigh' },
+    } as unknown as OpenAI.Chat.ChatCompletionCreateParams;
+
+    const result = provider.buildRequest(originalRequest, userPromptId);
+    const r = result as unknown as Record<string, unknown>;
+
+    expect(r['reasoning_effort']).toBe('max');
+  });
+
   it('maps backward-compat `low`/`medium` effort to `high` (DeepSeek doc behavior)', () => {
     for (const effort of ['low', 'medium'] as const) {
       const originalRequest = {
diff --git a/packages/core/src/core/openaiContentGenerator/provider/deepseek.ts b/packages/core/src/core/openaiContentGenerator/provider/deepseek.ts
index 82144d7ba..3d5919043 100644
--- a/packages/core/src/core/openaiContentGenerator/provider/deepseek.ts
+++ b/packages/core/src/core/openaiContentGenerator/provider/deepseek.ts
@@ -141,10 +141,14 @@ function translateReasoningEffort(
     typeof next['reasoning_effort'] !== 'string' ||
     !next['reasoning_effort']
   ) {
-    next['reasoning_effort'] =
-      nestedEffort === 'low' || nestedEffort === 'medium'
-        ? 'high'
-        : nestedEffort;
+    // Backward-compat mapping per the doc: low/medium → high, xhigh → max.
+    // Surface it client-side so logs reflect the wire value the server will
+    // actually act on (the server would apply the same mapping if we passed
+    // the raw value through, but explicit is better for observability).
+    let normalized = nestedEffort;
+    if (normalized === 'low' || normalized === 'medium') normalized = 'high';
+    else if (normalized === 'xhigh') normalized = 'max';
+    next['reasoning_effort'] = normalized;
   }
 
   // Strip the nested form so we don't ship both shapes on the wire.
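+  // For example, a request built with `reasoning: { effort: 'xhigh' }` ends
+  // up carrying `reasoning_effort: 'max'` with no nested `reasoning` object
+  // on the wire (covered by the provider tests above).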