Previously, pipeline.ts always hardcoded max_tokens as the output-token parameter name on the OpenAI-compatible path, falling back from samplingParams.max_tokens to request.config.maxOutputTokens to provider defaults. This broke GPT-5 and o-series models on OpenAI and Azure OpenAI, which require max_completion_tokens and reject max_tokens with a 400 error.
Fix: when the user provides samplingParams explicitly, treat it as the complete source of truth for the wire shape and pass its keys through verbatim. No client-injected defaults, no request fallbacks, no hardcoded parameter names. The user describes what the provider wants; the client trusts them.
When samplingParams is absent, the historical default behavior (request fallback through temperature/top_p/.../max_tokens plus provider defaults) is preserved unchanged; existing users see no difference.
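A minimal sketch of the two paths described above. The names samplingParams, maxOutputTokens, temperature, top_p, and max_tokens come from this description; RequestConfig, buildSamplingBody, the provider default value, and the exact shapes are illustrative assumptions, not the real pipeline.ts code:

    // Sketch only; types and defaults are assumed, not taken from pipeline.ts.
    interface RequestConfig {
      samplingParams?: Record<string, unknown>; // user-described wire shape
      temperature?: number;
      top_p?: number;
      maxOutputTokens?: number;
    }

    const PROVIDER_DEFAULT_MAX_TOKENS = 2048; // hypothetical provider default

    function buildSamplingBody(config: RequestConfig): Record<string, unknown> {
      if (config.samplingParams !== undefined) {
        // Explicit samplingParams: pass every key through verbatim. No injected
        // defaults, no request fallbacks, no hardcoded output-token parameter
        // name, so max_completion_tokens (GPT-5 / o-series) survives untouched.
        return { ...config.samplingParams };
      }
      // samplingParams absent: keep the historical fallback chain, including
      // the hardcoded max_tokens parameter name.
      return {
        ...(config.temperature !== undefined && { temperature: config.temperature }),
        ...(config.top_p !== undefined && { top_p: config.top_p }),
        max_tokens: config.maxOutputTokens ?? PROVIDER_DEFAULT_MAX_TOKENS,
      };
    }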
Concretely, users can now set any of:
samplingParams: { max_tokens: 4096 } # GPT-4 / Qwen / DeepSeek
samplingParams: { max_completion_tokens: 4096 } # GPT-5 / o-series
samplingParams: { reasoning_effort: 'medium' } # future knobs
without waiting for a qwen-code release that adds model-specific branches.
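Under the hypothetical sketch above, the explicit and legacy paths would serialize like this:

    // Hypothetical usage of the sketch above.
    buildSamplingBody({ samplingParams: { max_completion_tokens: 4096 } });
    // => { max_completion_tokens: 4096 }  (no max_tokens key; GPT-5 / o-series accept it)

    buildSamplingBody({ maxOutputTokens: 4096 });
    // => { max_tokens: 4096 }  (legacy path, unchanged)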
Signed-off-by: Gordon Lam (SH) <yeelam@microsoft.com>