* feat(core): support reasoning effort 'max' tier (DeepSeek extension)
DeepSeek's chat-completions endpoint added an extra-strong `max` tier
to `reasoning_effort` (per
https://api-docs.deepseek.com/zh-cn/api/create-chat-completion; valid
values are now `high` and `max`, with `low`/`medium` mapping to `high`
for backward compat). Plumb it end-to-end:
- `ContentGeneratorConfig.reasoning.effort` union now includes 'max'.
- DeepSeek OpenAI-compat provider: translate the standard nested
`reasoning: { effort }` shape into DeepSeek's flat `reasoning_effort`
body parameter so user-configured effort actually takes effect (the
nested shape was previously sent verbatim and silently ignored,
defaulting to `high`). low/medium → high mirrors the documented
server-side behavior so dashboards / logs match wire reality.
An explicit top-level `reasoning_effort` (via samplingParams or
extra_body) wins over the nested form.
- Anthropic converter: pass 'max' through to `output_config.effort`
unchanged and bump the `thinking.budget_tokens` budget for the new
tier (low 16k / medium 32k / high 64k / max 128k).
- Gemini converter: clamp 'max' to HIGH since Gemini has no higher
thinking level. Without this, 'max' would silently fall through to
THINKING_LEVEL_UNSPECIFIED.
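A minimal sketch of the provider-side translation described above (types
and shapes are illustrative; only the `translateReasoningEffort` name, the
low/medium → high mapping, and the explicit-override precedence come from
this change):

```typescript
type Effort = 'low' | 'medium' | 'high' | 'max';

interface RequestBody {
  reasoning?: { effort?: Effort };
  reasoning_effort?: string;
}

function translateReasoningEffort(body: RequestBody): RequestBody {
  // An explicit top-level reasoning_effort (samplingParams / extra_body) wins.
  if (body.reasoning_effort !== undefined) return body;
  const effort = body.reasoning?.effort;
  if (effort === undefined) return body;
  const out: RequestBody = { ...body };
  delete out.reasoning; // don't ship both shapes on the wire
  // low/medium collapse to high, mirroring DeepSeek's documented behavior.
  out.reasoning_effort =
    effort === 'low' || effort === 'medium' ? 'high' : effort;
  return out;
}
```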
Live verification against api.deepseek.com:
- `reasoning_effort: high` → 200
- `reasoning_effort: max` → 200 (the new tier)
- `reasoning_effort: bogus` → 400 with valid-set list confirming
[high, low, medium, max, xhigh]
108 anthropic/openai-deepseek/gemini tests pass; full core suite
(6601 tests) green; lint + typecheck clean.
* fix(core): map xhigh→max + clamp max on non-DeepSeek anthropic + docs
Address PR review (copilot × 2) and add missing user docs:
1. (J698) The PR description claimed `translateReasoningEffort`
   surfaces the DeepSeek backward-compat mapping client-side, but the
   code only handled `low`/`medium` → `high`. Add `xhigh` → `max` to
   match the doc and stay symmetric with the low/medium branch.
2. (J6-A) `output_config.effort: 'max'` would have been emitted on
any anthropic-protocol provider whenever a user configured it, even
when the baseURL points at real `api.anthropic.com` (which only
accepts low/medium/high and would 400). Reuse the existing
`isDeepSeekAnthropicProvider` detector to clamp `'max'` → `'high'`
on non-DeepSeek anthropic backends, with a debugLogger.warn so the
downgrade is visible. DeepSeek anthropic-compatible endpoints still
pass through unchanged.
3. New docs:
- `docs/users/configuration/model-providers.md`: a "Reasoning /
thinking configuration" section under generationConfig — single
example targeting DeepSeek + a per-provider behavior table
(OpenAI/DeepSeek flat reasoning_effort, OpenAI passthrough for
other servers, real Anthropic clamp, Anthropic-compatible
DeepSeek passthrough, Gemini thinkingLevel mapping).
- `docs/users/configuration/settings.md`: extend the
`model.generationConfig` description to mention `reasoning`
(the field was undocumented before this PR even though it
already existed as a typed field) and link to the new section.
96 anthropic + deepseek tests pass; lint + typecheck clean.
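The clamp in item 2 boils down to a few lines; this sketch uses a plain
boolean where the real code calls `isDeepSeekAnthropicProvider`, and an
injected `warn` in place of `debugLogger.warn` (both substitutions are
assumptions for self-containment):

```typescript
type Effort = 'low' | 'medium' | 'high' | 'max';

function clampEffortForBackend(
  effort: Effort,
  isDeepSeekBackend: boolean,
  warn: (msg: string) => void,
): Effort {
  if (effort === 'max' && !isDeepSeekBackend) {
    // Real api.anthropic.com only accepts low/medium/high; 'max' would 400.
    warn("reasoning effort 'max' is DeepSeek-only; downgrading to 'high'");
    return 'high';
  }
  return effort;
}
```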
* refactor(core): single-source effort normalization for anthropic + doc fix
Address PR review round 2 (copilot × 2):
1. (J8aG) The `contentGenerator.ts` comment claimed passing
`reasoning.effort: 'max'` to real Anthropic was "up to the user",
but commit b5b05ae actively clamps 'max' → 'high' (with a debug
log) on non-DeepSeek anthropic backends. Update the comment to
describe current runtime behavior.
2. (J8aL) The clamp ran inside `buildOutputConfig()` only — the effort
label was downgraded but `buildThinkingConfig()` still used the
raw user value to size the budget, so a non-DeepSeek anthropic
request could end up with `output_config.effort: 'high'` paired
with a 'max'-sized 128K thinking budget. Inconsistent label vs.
budget on the wire.
Refactor: hoist the normalization into a single
`resolveEffectiveEffort()` helper that runs once per request in
`buildRequest()`. Both `buildThinkingConfig` and `buildOutputConfig`
now consume the same clamped value, so the budget ladder and the
effort label stay aligned. The debug log fires once per request.
Add a regression test asserting that on a non-DeepSeek anthropic
provider with `effort: 'max'` configured, the wire request carries
both `output_config.effort: 'high'` AND `thinking.budget_tokens:
64_000` (the 'high' tier), not the 128K 'max' budget.
96 tests pass; lint + typecheck clean.
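The shape of the refactor, sketched: one normalization per request feeding
both consumers. The budget ladder (16k/32k/64k/128k) and the helper name
come from the commits above; everything else is illustrative:

```typescript
type Effort = 'low' | 'medium' | 'high' | 'max';

const BUDGET_TOKENS: Record<Effort, number> = {
  low: 16_000,
  medium: 32_000,
  high: 64_000,
  max: 128_000,
};

// Runs once per request in buildRequest(); both the effort label and the
// thinking budget derive from the same clamped value.
function resolveEffectiveEffort(effort: Effort, deepseekBackend: boolean): Effort {
  return effort === 'max' && !deepseekBackend ? 'high' : effort;
}

function buildRequest(effort: Effort, deepseekBackend: boolean) {
  const effective = resolveEffectiveEffort(effort, deepseekBackend);
  return {
    output_config: { effort: effective },
    thinking: { budget_tokens: BUDGET_TOKENS[effective] },
  };
}
```

With this shape, the inconsistent `'high'` label / 128K budget pairing the
review flagged cannot be produced.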
* fix(core): tighten 'max' clamp + warn-once + strip reasoning_effort on side queries
Address PR review round 3 (copilot × 3):
1. (J-2v) When request.config.thinkingConfig.includeThoughts is false,
pipeline.buildRequest's post-processing only deleted the nested
`reasoning` key. The DeepSeek provider's translateReasoningEffort
may have already flattened an extra_body-injected reasoning into
top-level `reasoning_effort` by that point, so a side query (e.g.
suggestionGenerator) could still ship reasoning_effort on the wire.
Extend the post-processing to also delete `reasoning_effort`.
2. (J-2z) The warn for clamping 'max' on non-DeepSeek anthropic ran on
every request needing the downgrade — the docstring claimed "first
time only" but the implementation didn't latch. Add a private
`effortClampWarned` boolean on the generator so the warning fires
once per generator lifetime.
3. (J-23) `resolveEffectiveEffort` used the broad
`isDeepSeekAnthropicProvider` detector for the clamp decision, but
that helper falls back to model-name matching to cover sglang/vllm
self-hosted DeepSeek deployments. A model configured as e.g.
"deepseek-distill" but routed to real api.anthropic.com would
bypass the clamp and trigger HTTP 400. Split the detector: keep
`isDeepSeekAnthropicProvider` (broad) for the thinking-block
injection workaround where false-positives are harmless, and add
`isDeepSeekAnthropicHostname` (hostname-only) for decisions where
a model-name false-positive would route DeepSeek-only behavior to
a stricter backend. The clamp now uses the hostname-only check.
New regression test: a config with model name containing "deepseek"
but baseURL pointing at api.anthropic.com still clamps `'max'` to
`'high'`. Existing "passes max through" test updated to set a
DeepSeek baseURL since model name alone no longer suffices for the
clamp bypass.
385 tests pass; lint + typecheck clean.
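The latch in item 2 is a one-field change; a minimal sketch (class and
method names are placeholders, only the `effortClampWarned` field name is
from the commit):

```typescript
class AnthropicGenerator {
  // Latches after the first downgrade so the warning fires once per
  // generator lifetime, matching the docstring's "first time only".
  private effortClampWarned = false;
  private readonly warn: (msg: string) => void;

  constructor(warn: (msg: string) => void) {
    this.warn = warn;
  }

  clampMax(effort: string): string {
    if (effort !== 'max') return effort;
    if (!this.effortClampWarned) {
      this.effortClampWarned = true;
      this.warn("clamping reasoning effort 'max' to 'high' (non-DeepSeek backend)");
    }
    return 'high';
  }
}
```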
* docs(core): correct pipeline timing comment + samplingParams caveat
Address PR review round 4 (copilot × 3) — three documentation accuracy
fixes, no behavior change:
1. (KBcw) The post-processing comment in pipeline.ts misdescribed the
call order ("after this branch already ran during the same
buildRequest pass") — provider.buildRequest actually runs BEFORE
the includeThoughts=false post-processing in the same pass.
Reword to match the actual order: provider hook flattens nested
reasoning to reasoning_effort first, this cleanup runs after and
strips both shapes.
2. (KBdC, KBdE) The "Reasoning / thinking configuration" section in
model-providers.md and the model.generationConfig description in
settings.md both implied `reasoning` is honored on every provider.
For OpenAI-compatible providers, when `generationConfig.samplingParams`
is set, `ContentGenerationPipeline.buildGenerateContentConfig()`
ships samplingParams verbatim and skips the separate `reasoning`
injection entirely. Configs like
`{ samplingParams: { temperature: 0.5 }, reasoning: { effort: 'max' } }`
would silently drop the reasoning field on OpenAI/DeepSeek
requests.
Add an explicit "Interaction with samplingParams" warning section
in model-providers.md and a parenthetical note in settings.md
directing users to put `reasoning_effort` inside `samplingParams`
(or `extra_body`) when both are configured.
385 tests pass; lint + typecheck clean.
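The workaround the new warning section directs users to looks roughly
like this (key names as described above; exact schema per the settings
docs):

```json
{
  "model": {
    "generationConfig": {
      "samplingParams": {
        "temperature": 0.5,
        "reasoning_effort": "max"
      }
    }
  }
}
```

i.e. once `samplingParams` is present, `reasoning_effort` must live inside
it (or in `extra_body`), not in a sibling `reasoning` block that would be
silently skipped.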
* docs(core): clarify explicit budget_tokens bypasses 'max' effort clamp
When the user sets `{ effort: 'max', budget_tokens: N }` on a non-DeepSeek
anthropic backend, the effort label gets clamped to 'high' (otherwise
the server 400s on the unknown enum) but the explicit budget_tokens is
preserved verbatim. The wire-shape mismatch is intentional, not a bug:
the clamp only protects the enum field, while budget is a free integer
the server accepts within the context window, so an explicit override
stays explicit. Document the contract on the early-return and add a
regression test that locks it in.
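The contract, sketched (function and field names illustrative; the
64K/128K ladder and the clamp rule are from the commits above):

```typescript
function buildWireConfig(
  effort: 'high' | 'max',
  budgetTokens: number | undefined,
  deepseekBackend: boolean,
) {
  // The clamp only protects the enum field...
  const effective = effort === 'max' && !deepseekBackend ? 'high' : effort;
  const ladder = { high: 64_000, max: 128_000 } as const;
  return {
    output_config: { effort: effective },
    // ...while an explicit budget_tokens override ships verbatim; the
    // ladder is only a fallback.
    thinking: { budget_tokens: budgetTokens ?? ladder[effective] },
  };
}
```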
* docs(deepseek): fix comments to match flatten + reasoning-strip behavior
Two doc-only nits called out in review:
1. `buildRequest` JSDoc said non-text parts are "rejected", but
`flattenContentParts` actually substitutes a textual placeholder
(`[Unsupported content type: <type>]`) so the request still goes
through with a breadcrumb. Reword the JSDoc accordingly.
2. `translateReasoningEffort`'s strip comment claimed it strips the
nested form to avoid shipping both shapes, but it only drops the
duplicated `effort` key when other keys (e.g. `budget_tokens`) are
present. Reword to describe the actual selective behavior and why
keeping orthogonal keys is intentional.
Behavior unchanged.
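The substitution described in item 1, sketched (part shape is an
assumption; the placeholder string is quoted from the commit):

```typescript
interface Part {
  type: string;
  text?: string;
}

// Non-text parts become a textual breadcrumb instead of rejecting the
// whole request.
function flattenContentParts(parts: Part[]): string {
  return parts
    .map((p) =>
      p.type === 'text' ? p.text ?? '' : `[Unsupported content type: ${p.type}]`,
    )
    .join('\n');
}
```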
* fix(deepseek): gate reasoning_effort translation on actual DeepSeek hostname
The provider class is selected via the broader `isDeepSeekProvider`
check, which falls back to model-name matching to cover self-hosted
DeepSeek deployments (sglang/vllm/ollama, see #3613). That fallback is
the right call for content-part flattening — it's a model-format
constraint baked into the model itself, not the API surface.
But the same broad detection was also gating
`translateReasoningEffort`, which rewrites the standard
`reasoning: { effort }` config into DeepSeek's flat `reasoning_effort`
body parameter. That's a wire-shape decision, not a model-format one:
strict OpenAI-compat backends in self-hosted setups may not accept the
DeepSeek extension and would have happily handled the original shape.
Split the two decisions: keep `isDeepSeekProvider` (broad) for
flattening, add a hostname-only `isDeepSeekHostname` and gate the body
rewrite on it. Self-hosted DeepSeek users who actually want the
translation can either use a baseUrl containing api.deepseek.com or
inject `reasoning_effort` directly via `samplingParams`/`extra_body`.
Regression tests:
- self-hosted (sglang) with deepseek-named model + nested
`reasoning.effort` → flattening still runs, body shape preserved
- `isDeepSeekHostname` matches api.deepseek.com but not custom hosts
* fix(deepseek): use URL parsing in isDeepSeekHostname; fix log-level docs
CodeQL flagged a high-severity URL substring sanitization issue on the
new `isDeepSeekHostname` helper. The naive
`baseUrl.includes('api.deepseek.com')` check would false-positive on
hostile hosts like `https://api.deepseek.com.evil.com/v1` and
incorrectly inject the DeepSeek-only `reasoning_effort` body parameter
into requests routed elsewhere. Switch to `new URL(...).hostname` with
exact match against `api.deepseek.com` (and `.api.deepseek.com`
subdomains), mirroring `isDeepSeekAnthropicHostname` on the Anthropic
side. Invalid URLs are treated as non-DeepSeek.
`isDeepSeekProvider` already routes through `isDeepSeekHostname`, so
the hardening applies to both decision paths.
Regression tests cover:
- subdomain match (us.api.deepseek.com)
- hostile substrings (api.deepseek.com.evil.com,
evil.com/api.deepseek.com/v1, api.deepseek.comevil.com,
api-deepseek-com.example.com)
- invalid / empty baseUrl
Also fix two doc-level mismatches: the `'max'` clamp on Anthropic logs
via `debugLogger.warn` (warning level, once per generator), not "with
a debug log". Update both `ContentGeneratorConfig.reasoning` JSDoc and
the per-provider behavior table in model-providers.md.
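The hardened check, sketched to match the behavior and test matrix above
(the helper name is from the commit; the body is illustrative):

```typescript
function isDeepSeekHostname(baseUrl: string): boolean {
  try {
    const { hostname } = new URL(baseUrl);
    // Exact match plus dotted subdomains; substring tricks like
    // api.deepseek.com.evil.com parse to a different hostname and fail.
    return (
      hostname === 'api.deepseek.com' || hostname.endsWith('.api.deepseek.com')
    );
  } catch {
    return false; // invalid / empty baseUrl → non-DeepSeek
  }
}
```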
* feat(deepseek): emit thinking:disabled signal when reasoning is off
DeepSeek V4+ defaults `thinking.type` to `'enabled'`, so just stripping
`reasoning_effort` from the request leaves the server happily thinking
on side queries — paying full thinking latency/cost without an effort
configured. Per yiliang114's review, emit the explicit
`thinking: { type: 'disabled' }` field on the wire whenever reasoning
is disabled.
Triggered when either:
- `request.config.thinkingConfig.includeThoughts === false` (forked
queries, e.g. suggestion generation)
- `contentGeneratorConfig.reasoning === false` (config-level opt-out)
The previous post-processing block only fired on the per-request opt-out
path, so the config-level case was already leaking. Unify both under a
single `reasoningDisabled` predicate that runs the same strip + signal
logic.
Hostname-gated to `api.deepseek.com` (and subdomains): self-hosted
DeepSeek behind sglang/vllm/ollama, or older DeepSeek versions, may
not accept the V4 thinking parameter — pushing it there could trip an
unknown-key 400. Mirrors the round-7 decision to gate
`reasoning_effort` translation on hostname.
Regression tests cover all four matrix points:
- DeepSeek hostname + includeThoughts false → emits disabled
- DeepSeek hostname + reasoning false → emits disabled
- non-DeepSeek hostname + includeThoughts false → does not emit
- self-hosted DeepSeek (model-name fallback only) → does not emit
Docs: extend the `reasoning: false` section with the new behavior and
the self-hosted/non-DeepSeek caveat.
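The unified opt-out path, sketched against the four matrix points (field
names follow the commit text; the context shape is an assumption):

```typescript
interface OptOutContext {
  includeThoughts?: boolean;        // per-request opt-out (forked side queries)
  reasoningConfig?: object | false; // config-level opt-out (reasoning: false)
  deepseekHostname: boolean;        // exact-hostname gate for the V4 signal
}

function applyReasoningOptOut(
  body: Record<string, unknown>,
  ctx: OptOutContext,
): Record<string, unknown> {
  const reasoningDisabled =
    ctx.includeThoughts === false || ctx.reasoningConfig === false;
  if (!reasoningDisabled) return body;
  const rest = { ...body };
  delete rest.reasoning;        // nested shape
  delete rest.reasoning_effort; // flattened shape
  // Only real api.deepseek.com gets the explicit V4 signal; self-hosted
  // or older servers might 400 on the unknown key.
  return ctx.deepseekHostname
    ? { ...rest, thinking: { type: 'disabled' } }
    : rest;
}
```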
* refactor(deepseek): expose isDeepSeek* as free functions; clarify docs
Two doc/coupling nits from review:
1. The pipeline post-processing block was importing the concrete
`DeepSeekOpenAICompatibleProvider` class just to reach
`isDeepSeekHostname`. That couples the generic OpenAI pipeline to a
specific provider implementation. Promote the helper (and its broad
`isDeepSeekProvider` sibling) to free `export function`s in
`provider/deepseek.ts` and import them by name. The class keeps thin
static delegates for backward compat with existing callers and tests.
2. The per-provider behavior table in `model-providers.md` said
`'low'/'medium' → 'high'` and `'xhigh' → 'max'` "client-side", but
that normalization only fires inside `translateReasoningEffort`,
which runs on the nested `reasoning.effort` config path. Explicit
top-level overrides via `samplingParams.reasoning_effort` or
`extra_body.reasoning_effort` skip the rewrite and ship verbatim.
Reword the row to reflect that.
Behavior unchanged.