fix(core): map xhigh→max + clamp max on non-DeepSeek anthropic + docs

Address PR review (copilot × 2) and add missing user docs:

1. (J698) The PR description claimed `translateReasoningEffort`
   surfaces the DeepSeek backward-compat mapping client-side, but it
   only handled `low`/`medium` → `high`. Add `xhigh` → `max` to match
   the doc and stay symmetric with the low/medium branch.

2. (J6-A) `output_config.effort: 'max'` would have been emitted on
   any anthropic-protocol provider whenever a user configured it, even
   when the baseURL points at real `api.anthropic.com` (which only
   accepts low/medium/high and would 400). Reuse the existing
   `isDeepSeekAnthropicProvider` detector to clamp `'max'` → `'high'`
   on non-DeepSeek anthropic backends, with a debugLogger.warn so the
   downgrade is visible. DeepSeek anthropic-compatible endpoints still
   pass through unchanged.

3. New docs:
   - `docs/users/configuration/model-providers.md`: a "Reasoning /
     thinking configuration" section under generationConfig — single
     example targeting DeepSeek + a per-provider behavior table
     (OpenAI/DeepSeek flat reasoning_effort, OpenAI passthrough for
     other servers, real Anthropic clamp, Anthropic-compatible
     DeepSeek passthrough, Gemini thinkingLevel mapping).
   - `docs/users/configuration/settings.md`: extend the
     `model.generationConfig` description to mention `reasoning`
     (the field was undocumented before this PR even though it
     already existed as a typed field) and link to the new section.

96 anthropic + deepseek tests pass; lint + typecheck clean.
This commit is contained in:
wenshao 2026-05-03 09:17:19 +08:00
parent 86d687d4cf
commit b5b05ae219
6 changed files with 137 additions and 16 deletions


@@ -481,6 +481,57 @@ When using a raw model via `--model gpt-4` (not from modelProviders, creates a R
The merge strategy for `modelProviders` itself is REPLACE: the entire `modelProviders` from project settings will override the corresponding section in user settings, rather than merging the two.
## Reasoning / thinking configuration
The optional `reasoning` field under `generationConfig` controls how aggressively the model reasons before responding. It is plumbed through every supported provider — set it once and the converter for each protocol does the right thing.
```jsonc
{
  "modelProviders": {
    "openai": [
      {
        "id": "deepseek-v4-pro",
        "name": "DeepSeek V4 Pro",
        "baseUrl": "https://api.deepseek.com/v1",
        "envKey": "DEEPSEEK_API_KEY",
        "generationConfig": {
          // The four-tier scale:
          //   'low' | 'medium' — server-mapped to 'high' on DeepSeek
          //   'high'           — default reasoning intensity
          //   'max'            — DeepSeek-specific extra-strong tier
          // Or set `false` to disable reasoning entirely.
          "reasoning": { "effort": "max" },
        },
      },
    ],
  },
}
```
### Per-provider behavior
| Protocol / provider | Wire shape | Notes |
| -------------------------------------------- | -------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **OpenAI / DeepSeek** (`api.deepseek.com`) | Flat `reasoning_effort: <effort>` body parameter | `'low'`/`'medium'` map to `'high'` and `'xhigh'` to `'max'` client-side, mirroring DeepSeek's [server-side back-compat](https://api-docs.deepseek.com/zh-cn/api/create-chat-completion). Top-level overrides via `samplingParams.reasoning_effort` or `extra_body.reasoning_effort` win. |
| **OpenAI** (other compatible servers) | `reasoning: { effort, ... }` passed through verbatim | Set via `samplingParams` (e.g. `samplingParams.reasoning_effort` for GPT-5/o-series) when the provider expects a different shape. |
| **Anthropic** (real `api.anthropic.com`) | `output_config: { effort }` plus the `effort-2025-11-24` beta header | Real Anthropic accepts `'low'`/`'medium'`/`'high'` only. `'max'` is **clamped to `'high'`** with a debug log; if you want max effort, switch the baseURL to a DeepSeek-compatible endpoint that supports it. |
| **Anthropic** (`api.deepseek.com/anthropic`) | Same `output_config: { effort }` + beta header | `'max'` is passed through unchanged. |
| **Gemini** (`@google/genai`) | `thinkingConfig: { includeThoughts: true, thinkingLevel }` | `'low'` → `LOW`, `'high'`/`'max'` → `HIGH`, others → `THINKING_LEVEL_UNSPECIFIED` (Gemini has no `MAX` tier). |
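For illustration, the Gemini column above amounts to a mapping like this (a standalone sketch — the function and type names are hypothetical, not the actual converter in the Gemini content generator):

```typescript
// Hypothetical sketch of the effort → Gemini thinkingLevel mapping
// described in the table above; names here are illustrative only.
type ReasoningEffort = 'low' | 'medium' | 'high' | 'max';

function toGeminiThinkingLevel(effort: ReasoningEffort): string {
  if (effort === 'low') return 'LOW';
  // Gemini has no MAX tier, so 'max' shares the HIGH level.
  if (effort === 'high' || effort === 'max') return 'HIGH';
  return 'THINKING_LEVEL_UNSPECIFIED';
}
```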
### `reasoning: false`
Setting `reasoning: false` (the literal boolean) explicitly disables thinking on every provider — useful for cheap side queries that don't benefit from reasoning. This is honored at the request level too via `request.config.thinkingConfig.includeThoughts: false` for one-off calls (e.g. suggestion generation).
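A minimal fragment, reusing the `generationConfig` shape from the example above (note it is the literal boolean, not an object):

```jsonc
"generationConfig": {
  // Literal `false` — turns thinking off on every provider.
  "reasoning": false
}
```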
### `budget_tokens`
You can pin an exact thinking-token budget by including `budget_tokens` alongside `effort`:
```jsonc
"reasoning": { "effort": "high", "budget_tokens": 50000 }
```
For Anthropic this becomes `thinking.budget_tokens`. For OpenAI/DeepSeek the field is preserved but currently ignored by the server — `reasoning_effort` is the load-bearing knob.
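Assuming Anthropic's extended-thinking request shape, the resulting body would carry something like the following (a sketch of the wire format, not captured output; field placement may differ in practice):

```jsonc
{
  "model": "claude-test",
  // `budget_tokens` lands under `thinking`; `effort` under `output_config`.
  "thinking": { "type": "enabled", "budget_tokens": 50000 },
  "output_config": { "effort": "high" }
}
```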
## Provider Models vs Runtime Models
Qwen Code distinguishes between two types of model configurations:


@@ -140,17 +140,17 @@ Settings are organized into categories. Most settings should be placed within th
#### model
| Setting | Type | Description | Default |
| -------------------------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------- |
| `model.name` | string | The Qwen model to use for conversations. | `undefined` |
| `model.maxSessionTurns` | number | Maximum number of user/model/tool turns to keep in a session. -1 means unlimited. | `-1` |
| `model.generationConfig` | object | Advanced overrides passed to the underlying content generator. Supports request controls such as `timeout`, `maxRetries`, `enableCacheControl`, `splitToolMedia` (set `true` for strict OpenAI-compatible servers like LM Studio that reject non-text content on `role: "tool"` messages — splits media into a follow-up user message), `contextWindowSize` (override model's context window size), `modalities` (override auto-detected input modalities), `customHeaders` (custom HTTP headers for API requests), and `extra_body` (additional body parameters for OpenAI-compatible API requests only), along with fine-tuning knobs under `samplingParams` (for example `temperature`, `top_p`, `max_tokens`). Leave unset to rely on provider defaults. | `undefined` |
| `model.chatCompression.contextPercentageThreshold` | number | Sets the threshold for chat history compression as a percentage of the model's total token limit. This is a value between 0 and 1 that applies to both automatic compression and the manual `/compress` command. For example, a value of `0.6` will trigger compression when the chat history exceeds 60% of the token limit. Use `0` to disable compression entirely. | `0.7` |
| `model.skipNextSpeakerCheck` | boolean | Skip the next speaker check. | `false` |
| `model.skipLoopDetection` | boolean | Disables loop detection checks. Loop detection prevents infinite loops in AI responses but can generate false positives that interrupt legitimate workflows. Enable this option if you experience frequent false positive loop detection interruptions. | `false` |
| `model.skipStartupContext` | boolean | Skips sending the startup workspace context (environment summary and acknowledgement) at the beginning of each session. Enable this if you prefer to provide context manually or want to save tokens on startup. | `false` |
| `model.enableOpenAILogging` | boolean | Enables logging of OpenAI API calls for debugging and analysis. When enabled, API requests and responses are logged to JSON files. | `false` |
| `model.openAILoggingDir` | string | Custom directory path for OpenAI API logs. If not specified, defaults to `logs/openai` in the current working directory. Supports absolute paths, relative paths (resolved from current working directory), and `~` expansion (home directory). | `undefined` |
| Setting | Type | Description | Default |
| -------------------------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- |
| `model.name` | string | The Qwen model to use for conversations. | `undefined` |
| `model.maxSessionTurns` | number | Maximum number of user/model/tool turns to keep in a session. -1 means unlimited. | `-1` |
| `model.generationConfig` | object | Advanced overrides passed to the underlying content generator. Supports request controls such as `timeout`, `maxRetries`, `enableCacheControl`, `splitToolMedia` (set `true` for strict OpenAI-compatible servers like LM Studio that reject non-text content on `role: "tool"` messages — splits media into a follow-up user message), `contextWindowSize` (override model's context window size), `modalities` (override auto-detected input modalities), `customHeaders` (custom HTTP headers for API requests), `extra_body` (additional body parameters for OpenAI-compatible API requests only), and `reasoning` (`{ effort: 'low' \| 'medium' \| 'high' \| 'max', budget_tokens?: number }` to control thinking intensity, or `false` to disable; `'max'` is a DeepSeek extension — see [Reasoning / thinking configuration](./model-providers.md#reasoning--thinking-configuration) for per-provider behavior), along with fine-tuning knobs under `samplingParams` (for example `temperature`, `top_p`, `max_tokens`). Leave unset to rely on provider defaults. | `undefined` |
| `model.chatCompression.contextPercentageThreshold` | number | Sets the threshold for chat history compression as a percentage of the model's total token limit. This is a value between 0 and 1 that applies to both automatic compression and the manual `/compress` command. For example, a value of `0.6` will trigger compression when the chat history exceeds 60% of the token limit. Use `0` to disable compression entirely. | `0.7` |
| `model.skipNextSpeakerCheck` | boolean | Skip the next speaker check. | `false` |
| `model.skipLoopDetection` | boolean | Disables loop detection checks. Loop detection prevents infinite loops in AI responses but can generate false positives that interrupt legitimate workflows. Enable this option if you experience frequent false positive loop detection interruptions. | `false` |
| `model.skipStartupContext` | boolean | Skips sending the startup workspace context (environment summary and acknowledgement) at the beginning of each session. Enable this if you prefer to provide context manually or want to save tokens on startup. | `false` |
| `model.enableOpenAILogging` | boolean | Enables logging of OpenAI API calls for debugging and analysis. When enabled, API requests and responses are logged to JSON files. | `false` |
| `model.openAILoggingDir` | string | Custom directory path for OpenAI API logs. If not specified, defaults to `logs/openai` in the current working directory. Supports absolute paths, relative paths (resolved from current working directory), and `~` expansion (home directory). | `undefined` |
**Example model.generationConfig:**


@@ -523,6 +523,43 @@ describe('AnthropicContentGenerator', () => {
      );
    });
    it("clamps effort: 'max' to 'high' on a non-DeepSeek anthropic provider", async () => {
      // 'max' is a DeepSeek extension; real Anthropic only accepts
      // low/medium/high. Clamp so a config targeting DeepSeek doesn't 400
      // when reused against a stricter Anthropic backend.
      const { AnthropicContentGenerator } = await importGenerator();
      anthropicState.createImpl.mockResolvedValue({
        id: 'anthropic-1',
        model: 'claude-test',
        content: [{ type: 'text', text: 'hi' }],
      });
      const generator = new AnthropicContentGenerator(
        {
          model: 'claude-test',
          apiKey: 'test-key',
          baseUrl: 'https://api.anthropic.com',
          timeout: 10_000,
          maxRetries: 2,
          samplingParams: { max_tokens: 500 },
          schemaCompliance: 'auto',
          reasoning: { effort: 'max' },
        },
        mockConfig,
      );
      await generator.generateContent({
        model: 'models/ignored',
        contents: 'Hello',
      } as unknown as GenerateContentParameters);
      const [anthropicRequest] =
        anthropicState.lastCreateArgs as AnthropicCreateArgs;
      expect(anthropicRequest).toEqual(
        expect.objectContaining({ output_config: { effort: 'high' } }),
      );
    });
    it('omits thinking when request.config.thinkingConfig.includeThoughts is false', async () => {
      const { AnthropicContentGenerator } = await importGenerator();
      anthropicState.createImpl.mockResolvedValue({


@@ -429,7 +429,23 @@ export class AnthropicContentGenerator implements ContentGenerator {
      return undefined;
    }
    return { effort: reasoning.effort };
    // 'max' is a DeepSeek-specific extension; real Anthropic accepts only
    // low/medium/high. Clamp on non-DeepSeek anthropic-compatible providers
    // so configurations targeting DeepSeek don't 400 when the user later
    // switches the same auth profile to a stricter Anthropic backend.
    let effort = reasoning.effort;
    if (
      effort === 'max' &&
      !isDeepSeekAnthropicProvider(this.contentGeneratorConfig)
    ) {
      debugLogger.warn(
        "reasoning.effort='max' is a DeepSeek extension; clamping to 'high' " +
          'for non-DeepSeek anthropic provider to avoid HTTP 400.',
      );
      effort = 'high';
    }
    return { effort };
  }
  private async *processStream(


@@ -294,6 +294,19 @@ describe('DeepSeekOpenAICompatibleProvider', () => {
    expect(r['reasoning']).toBeUndefined();
  });
  it("maps backward-compat 'xhigh' effort to 'max' (DeepSeek doc behavior)", () => {
    const originalRequest = {
      model: 'deepseek-v4-pro',
      messages: [{ role: 'user', content: 'hi' }],
      reasoning: { effort: 'xhigh' },
    } as unknown as OpenAI.Chat.ChatCompletionCreateParams;
    const result = provider.buildRequest(originalRequest, userPromptId);
    const r = result as unknown as Record<string, unknown>;
    expect(r['reasoning_effort']).toBe('max');
  });
  it('maps backward-compat `low`/`medium` effort to `high` (DeepSeek doc behavior)', () => {
    for (const effort of ['low', 'medium'] as const) {
      const originalRequest = {


@@ -141,10 +141,14 @@ function translateReasoningEffort(
    typeof next['reasoning_effort'] !== 'string' ||
    !next['reasoning_effort']
  ) {
    next['reasoning_effort'] =
      nestedEffort === 'low' || nestedEffort === 'medium'
        ? 'high'
        : nestedEffort;
    // Backward-compat mapping per the doc: low/medium → high, xhigh → max.
    // Surface it client-side so logs reflect the wire value the server will
    // actually act on (the server does the same mapping if we passed the
    // raw value through, but explicit is better for observability).
    let normalized = nestedEffort;
    if (normalized === 'low' || normalized === 'medium') normalized = 'high';
    else if (normalized === 'xhigh') normalized = 'max';
    next['reasoning_effort'] = normalized;
  }
  // Strip the nested form so we don't ship both shapes on the wire.
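The normalization in this hunk can be exercised standalone as follows (a sketch — the helper name is hypothetical; the real logic lives inline in `translateReasoningEffort`):

```typescript
// Hypothetical standalone version of the DeepSeek backward-compat
// normalization above: low/medium collapse to 'high', the legacy
// 'xhigh' alias becomes 'max', and anything else passes through.
function normalizeDeepSeekEffort(effort: string): string {
  if (effort === 'low' || effort === 'medium') return 'high';
  if (effort === 'xhigh') return 'max';
  return effort;
}
```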