Shaojin Wen efb7351d58
feat(core): support reasoning effort 'max' tier (DeepSeek extension) (#3800)
* feat(core): support reasoning effort 'max' tier (DeepSeek extension)

DeepSeek's chat-completions endpoint added an extra-strong `max` tier
to `reasoning_effort` (per
https://api-docs.deepseek.com/zh-cn/api/create-chat-completion ; valid
values are now `high` and `max`, with `low`/`medium` mapping to `high`
for backward compat). Plumb it end-to-end:

- `ContentGeneratorConfig.reasoning.effort` union now includes 'max'.
- DeepSeek OpenAI-compat provider: translate the standard nested
  `reasoning: { effort }` shape into DeepSeek's flat `reasoning_effort`
  body parameter so user-configured effort actually takes effect (the
  nested shape was previously sent verbatim and silently ignored,
  defaulting to `high`). The low/medium → high mapping mirrors the
  documented server-side behavior so dashboards and logs match wire
  reality. An explicit top-level `reasoning_effort` (via samplingParams
  or extra_body) wins over the nested form. See the sketch after this
  list.
- Anthropic converter: pass 'max' through to `output_config.effort`
  unchanged and bump the `thinking.budget_tokens` budget for the new
  tier (low 16k / medium 32k / high 64k / max 128k).
- Gemini converter: clamp 'max' to HIGH since Gemini has no higher
  thinking level. Without this, 'max' would silently fall through to
  THINKING_LEVEL_UNSPECIFIED.
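
A minimal sketch of the translation and the budget ladder described above, assuming a simplified request-body shape (names here are illustrative, not the actual provider code):

type Effort = 'low' | 'medium' | 'high' | 'max';

interface RequestBody {
  reasoning?: { effort?: Effort }; // standard nested shape
  reasoning_effort?: string;       // DeepSeek's flat extension field
  [key: string]: unknown;
}

function translateReasoningEffortSketch(body: RequestBody): RequestBody {
  // An explicit top-level reasoning_effort (samplingParams / extra_body) wins.
  if (body.reasoning_effort !== undefined) return body;
  const effort = body.reasoning?.effort;
  if (effort === undefined) return body;
  // Mirror the documented server-side collapse: low/medium → high.
  body.reasoning_effort =
    effort === 'low' || effort === 'medium' ? 'high' : effort;
  // Don't ship both shapes (later commits in this series refine this
  // to keep orthogonal keys such as budget_tokens).
  delete body.reasoning;
  return body;
}

// Anthropic thinking budget ladder per effort tier, per the bullet above:
const THINKING_BUDGET: Record<Effort, number> = {
  low: 16_000,
  medium: 32_000,
  high: 64_000,
  max: 128_000,
};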

Live verification against api.deepseek.com:
- `reasoning_effort: high` → 200
- `reasoning_effort: max`  → 200 (the new tier)
- `reasoning_effort: bogus`→ 400 with valid-set list confirming
  [high, low, medium, max, xhigh]

108 anthropic/openai-deepseek/gemini tests pass; full core suite
(6601 tests) green; lint + typecheck clean.

* fix(core): map xhigh→max + clamp max on non-DeepSeek anthropic + docs

Address PR review (copilot × 2) and add missing user docs:

1. (J698) The PR description claimed `translateReasoningEffort`
   surfaces the DeepSeek backward-compat mapping client-side, but the
   helper only handled `low`/`medium` → `high`. Add `xhigh` → `max` to
   match the doc and stay symmetric with the low/medium branch.

2. (J6-A) `output_config.effort: 'max'` would have been emitted on
   any anthropic-protocol provider whenever a user configured it, even
   when the baseURL points at real `api.anthropic.com` (which only
   accepts low/medium/high and would 400). Reuse the existing
   `isDeepSeekAnthropicProvider` detector to clamp `'max'` → `'high'`
   on non-DeepSeek anthropic backends, with a debugLogger.warn so the
   downgrade is visible. DeepSeek anthropic-compatible endpoints still
   pass through unchanged.

3. New docs:
   - `docs/users/configuration/model-providers.md`: a "Reasoning /
     thinking configuration" section under generationConfig — single
     example targeting DeepSeek + a per-provider behavior table
     (OpenAI/DeepSeek flat reasoning_effort, OpenAI passthrough for
     other servers, real Anthropic clamp, Anthropic-compatible
     DeepSeek passthrough, Gemini thinkingLevel mapping).
   - `docs/users/configuration/settings.md`: extend the
     `model.generationConfig` description to mention `reasoning`
     (the field was undocumented before this PR even though it
     already existed as a typed field) and link to the new section.

96 anthropic + deepseek tests pass; lint + typecheck clean.

* refactor(core): single-source effort normalization for anthropic + doc fix

Address PR review round 2 (copilot × 2):

1. (J8aG) The `contentGenerator.ts` comment claimed passing
   `reasoning.effort: 'max'` to real Anthropic was "up to the user",
   but commit b5b05ae actively clamps 'max' → 'high' (with a debug
   log) on non-DeepSeek anthropic backends. Update the comment to
   describe current runtime behavior.

2. (J8aL) The clamp ran inside `buildOutputConfig()` only — the effort
   label was downgraded but `buildThinkingConfig()` still used the
   raw user value to size the budget, so a non-DeepSeek anthropic
   request could end up with `output_config.effort: 'high'` paired
   with a 'max'-sized 128K thinking budget. Inconsistent label vs.
   budget on the wire.

   Refactor: hoist the normalization into a single
   `resolveEffectiveEffort()` helper that runs once per request in
   `buildRequest()`. Both `buildThinkingConfig` and `buildOutputConfig`
   now consume the same clamped value, so the budget ladder and the
   effort label stay aligned. The debug log fires once per request.
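
A sketch of the hoisted helper, under the simplifying assumption that the backend check arrives as a plain boolean (the actual detector evolves over later commits in this series):

type Effort = 'low' | 'medium' | 'high' | 'max';

const THINKING_BUDGET: Record<Effort, number> = {
  low: 16_000, medium: 32_000, high: 64_000, max: 128_000,
};

// Runs once per request in buildRequest(); both consumers read the result.
function resolveEffectiveEffort(
  effort: Effort,
  isDeepSeekBackend: boolean,
): Effort {
  // Real api.anthropic.com only accepts low/medium/high.
  return effort === 'max' && !isDeepSeekBackend ? 'high' : effort;
}

// buildThinkingConfig and buildOutputConfig both consume the same value,
// so the label and budget stay aligned on the wire:
//   const effective = resolveEffectiveEffort(userEffort, isDeepSeek);
//   output_config.effort   === effective
//   thinking.budget_tokens === THINKING_BUDGET[effective]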

Add a regression test asserting that on a non-DeepSeek anthropic
provider with `effort: 'max'` configured, the wire request carries
both `output_config.effort: 'high'` AND `thinking.budget_tokens:
64_000` (the 'high' tier), not the 128K 'max' budget.

96 tests pass; lint + typecheck clean.

* fix(core): tighten 'max' clamp + warn-once + strip reasoning_effort on side queries

Address PR review round 3 (copilot × 3):

1. (J-2v) When request.config.thinkingConfig.includeThoughts is false,
   pipeline.buildRequest's post-processing only deleted the nested
   `reasoning` key. The DeepSeek provider's translateReasoningEffort
   may have already flattened an extra_body-injected reasoning into
   top-level `reasoning_effort` by that point, so a side query (e.g.
   suggestionGenerator) could still ship reasoning_effort on the wire.
   Extend the post-processing to also delete `reasoning_effort`.

2. (J-2z) The warning for clamping 'max' on non-DeepSeek anthropic ran
   on every request that needed the downgrade; the docstring claimed
   "first time only" but the implementation didn't latch. Add a private
   `effortClampWarned` boolean on the generator so the warning fires
   once per generator lifetime (sketched after this list).

3. (J-23) `resolveEffectiveEffort` used the broad
   `isDeepSeekAnthropicProvider` detector for the clamp decision, but
   that helper falls back to model-name matching to cover sglang/vllm
   self-hosted DeepSeek deployments. A model configured as e.g.
   "deepseek-distill" but routed to real api.anthropic.com would
   bypass the clamp and trigger HTTP 400. Split the detector: keep
   `isDeepSeekAnthropicProvider` (broad) for the thinking-block
   injection workaround where false-positives are harmless, and add
   `isDeepSeekAnthropicHostname` (hostname-only) for decisions where
   a model-name false-positive would route DeepSeek-only behavior to
   a stricter backend. The clamp now uses the hostname-only check.
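
A sketch of the warn-once latch from item 2 (the surrounding class and logger call are stand-ins, not the real generator):

class AnthropicGeneratorSketch {
  // Latches after the first clamp warning for this generator's lifetime.
  private effortClampWarned = false;

  protected warnClampOnce(): void {
    if (this.effortClampWarned) return;
    this.effortClampWarned = true;
    // debugLogger.warn(...) in the real code; console stands in here.
    console.warn("reasoning effort 'max' clamped to 'high' on this backend");
  }
}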

New regression test: a config with model name containing "deepseek"
but baseURL pointing at api.anthropic.com still clamps `'max'` to
`'high'`. Existing "passes max through" test updated to set a
DeepSeek baseURL since model name alone no longer suffices for the
clamp bypass.

385 tests pass; lint + typecheck clean.

* docs(core): correct pipeline timing comment + samplingParams caveat

Address PR review round 4 (copilot × 3) — three documentation accuracy
fixes, no behavior change:

1. (KBcw) The post-processing comment in pipeline.ts misdescribed the
   call order ("after this branch already ran during the same
   buildRequest pass") — provider.buildRequest actually runs BEFORE
   the includeThoughts=false post-processing in the same pass.
   Reword to match the actual order: provider hook flattens nested
   reasoning to reasoning_effort first, this cleanup runs after and
   strips both shapes.

2. (KBdC, KBdE) The "Reasoning / thinking configuration" section in
   model-providers.md and the model.generationConfig description in
   settings.md both implied `reasoning` is honored on every provider.
   For OpenAI-compatible providers, when `generationConfig.samplingParams`
   is set, `ContentGenerationPipeline.buildGenerateContentConfig()`
   ships samplingParams verbatim and skips the separate `reasoning`
   injection entirely. Configs like
   `{ samplingParams: { temperature: 0.5 }, reasoning: { effort: 'max' } }`
   would silently drop the reasoning field on OpenAI/DeepSeek
   requests.

   Add an explicit "Interaction with samplingParams" warning section
   in model-providers.md and a parenthetical note in settings.md
   directing users to put `reasoning_effort` inside `samplingParams`
   (or `extra_body`) when both are configured.
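
Illustrative generationConfig fragments showing the interaction (written as TypeScript literals here; the real settings live in settings.json):

// With samplingParams present, the nested reasoning field is skipped
// entirely on OpenAI-compatible providers:
const silentlyDropped = {
  samplingParams: { temperature: 0.5 },
  reasoning: { effort: 'max' }, // never reaches the wire in this combination
};

// The documented workaround: put the flat field inside samplingParams
// (or extra_body) so it ships verbatim:
const honored = {
  samplingParams: { temperature: 0.5, reasoning_effort: 'max' },
};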

385 tests pass; lint + typecheck clean.

* docs(core): clarify explicit budget_tokens bypasses 'max' effort clamp

When user sets `{ effort: 'max', budget_tokens: N }` on a non-DeepSeek
anthropic backend, the effort label gets clamped to 'high' (otherwise
the server 400s on the unknown enum) but the explicit budget_tokens is
preserved verbatim. The wire-shape mismatch is intentional, not a bug:
the clamp only protects the enum field, while budget is a free integer
the server accepts within the context window, so an explicit override
stays explicit. Document the contract on the early-return and add a
regression test that locks it in.
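
A self-contained sketch of that early-return contract, shown for the non-DeepSeek anthropic path (tier values from earlier in this series; the helper name is illustrative):

type EffortTier = 'low' | 'medium' | 'high' | 'max';
const BUDGET_LADDER: Record<EffortTier, number> = {
  low: 16_000, medium: 32_000, high: 64_000, max: 128_000,
};

function budgetTokensSketch(effort: EffortTier, explicitBudget?: number): number {
  // An explicit budget_tokens bypasses the ladder entirely; only the
  // effort enum is clamped, since the budget is a free integer the
  // server validates against the context window.
  if (explicitBudget !== undefined) return explicitBudget;
  return BUDGET_LADDER[effort === 'max' ? 'high' : effort];
}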

* docs(deepseek): fix comments to match flatten + reasoning-strip behavior

Two doc-only nits called out in review:

1. `buildRequest` JSDoc said non-text parts are "rejected", but
   `flattenContentParts` actually substitutes a textual placeholder
   (`[Unsupported content type: <type>]`) so the request still goes
   through with a breadcrumb (see the sketch after this list). Reword
   the JSDoc accordingly.

2. `translateReasoningEffort`'s strip comment claimed it strips the
   nested form to avoid shipping both shapes, but it only drops the
   duplicated `effort` key when other keys (e.g. `budget_tokens`) are
   present. Reword to describe the actual selective behavior and why
   keeping orthogonal keys is intentional.
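
A sketch of the placeholder substitution described in item 1, using a simplified stand-in for the real content-part type:

type Part = { type: string; text?: string };

function flattenContentPartsSketch(parts: Part[]): string {
  return parts
    .map((p) =>
      p.type === 'text' && p.text !== undefined
        ? p.text
        : `[Unsupported content type: ${p.type}]`, // breadcrumb, not a rejection
    )
    .join('\n');
}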

Behavior unchanged.

* fix(deepseek): gate reasoning_effort translation on actual DeepSeek hostname

The provider class is selected via the broader `isDeepSeekProvider`
check, which falls back to model-name matching to cover self-hosted
DeepSeek deployments (sglang/vllm/ollama, see #3613). That fallback is
the right call for content-part flattening — it's a model-format
constraint baked into the model itself, not the API surface.

But the same broad detection was also gating
`translateReasoningEffort`, which rewrites the standard
`reasoning: { effort }` config into DeepSeek's flat `reasoning_effort`
body parameter. That's a wire-shape decision, not a model-format one:
strict OpenAI-compat backends in self-hosted setups may not accept the
DeepSeek extension and would have happily handled the original shape.

Split the two decisions: keep `isDeepSeekProvider` (broad) for
flattening, add a hostname-only `isDeepSeekHostname` and gate the body
rewrite on it. Self-hosted DeepSeek users who actually want the
translation can either use a baseUrl containing api.deepseek.com or
inject `reasoning_effort` directly via `samplingParams`/`extra_body`.

Regression tests:
  - self-hosted (sglang) with deepseek-named model + nested
    `reasoning.effort` → flattening still runs, body shape preserved
  - `isDeepSeekHostname` matches api.deepseek.com but not custom hosts

* fix(deepseek): use URL parsing in isDeepSeekHostname; fix log-level docs

CodeQL flagged a high-severity URL substring sanitization issue on the
new `isDeepSeekHostname` helper. The naive
`baseUrl.includes('api.deepseek.com')` check would false-positive on
hostile hosts like `https://api.deepseek.com.evil.com/v1` and
incorrectly inject the DeepSeek-only `reasoning_effort` body parameter
into requests routed elsewhere. Switch to `new URL(...).hostname` with
exact match against `api.deepseek.com` (and `.api.deepseek.com`
subdomains), mirroring `isDeepSeekAnthropicHostname` on the Anthropic
side. Invalid URLs treated as non-DeepSeek.

`isDeepSeekProvider` already routes through `isDeepSeekHostname`, so
the hardening applies to both decision paths.
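
A sketch of the hardened check, matching the behavior described above (the real helper may differ in details):

export function isDeepSeekHostname(baseUrl: string): boolean {
  try {
    const { hostname } = new URL(baseUrl);
    return (
      hostname === 'api.deepseek.com' ||
      hostname.endsWith('.api.deepseek.com') // e.g. us.api.deepseek.com
    );
  } catch {
    return false; // invalid / empty baseUrl → treated as non-DeepSeek
  }
}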

Regression tests cover:
  - subdomain match (us.api.deepseek.com)
  - hostile substrings (api.deepseek.com.evil.com,
    evil.com/api.deepseek.com/v1, api.deepseek.comevil.com,
    api-deepseek-com.example.com)
  - invalid / empty baseUrl

Also fix two doc-level mismatches: the `'max'` clamp on Anthropic logs
via `debugLogger.warn` (warning level, once per generator), not "with
a debug log". Update both `ContentGeneratorConfig.reasoning` JSDoc and
the per-provider behavior table in model-providers.md.

* feat(deepseek): emit thinking:disabled signal when reasoning is off

DeepSeek V4+ defaults `thinking.type` to `'enabled'`, so just stripping
`reasoning_effort` from the request leaves the server happily thinking
on side queries — paying full thinking latency/cost without an effort
configured. Per yiliang114's review, emit the explicit
`thinking: { type: 'disabled' }` field on the wire whenever reasoning
is disabled.

Triggered when either:
  - `request.config.thinkingConfig.includeThoughts === false` (forked
    queries, e.g. suggestion generation)
  - `contentGeneratorConfig.reasoning === false` (config-level opt-out)

The previous post-processing block only fired on the per-request opt-out
path, so the config-level case was already leaking. Unify both under a
single `reasoningDisabled` predicate that runs the same strip + signal
logic.

Hostname-gated to `api.deepseek.com` (and subdomains): self-hosted
DeepSeek behind sglang/vllm/ollama, or older DeepSeek versions, may
not accept the V4 thinking parameter — pushing it there could trip an
unknown-key 400. Mirrors the round-7 decision to gate
`reasoning_effort` translation on hostname.
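
A sketch of the unified predicate and signal (shapes illustrative; isDeepSeekHostname as sketched earlier):

declare function isDeepSeekHostname(baseUrl: string): boolean; // earlier sketch

interface StripCtx {
  includeThoughts?: boolean; // per-request opt-out (forked side queries)
  reasoningConfig?: unknown; // false means config-level opt-out
  baseUrl: string;
}

function applyReasoningDisabled(
  body: Record<string, unknown>,
  ctx: StripCtx,
): void {
  const reasoningDisabled =
    ctx.includeThoughts === false || ctx.reasoningConfig === false;
  if (!reasoningDisabled) return;
  delete body['reasoning'];
  delete body['reasoning_effort'];
  // V4+ defaults thinking to enabled, so stripping alone still pays
  // thinking cost; only real api.deepseek.com is known to accept the
  // explicit disable signal, hence the hostname gate.
  if (isDeepSeekHostname(ctx.baseUrl)) {
    body['thinking'] = { type: 'disabled' };
  }
}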

Regression tests cover all four matrix points:
  - DeepSeek hostname + includeThoughts false → emits disabled
  - DeepSeek hostname + reasoning false → emits disabled
  - non-DeepSeek hostname + includeThoughts false → does not emit
  - self-hosted DeepSeek (model-name fallback only) → does not emit

Docs: extend the `reasoning: false` section with the new behavior and
the self-hosted/non-DeepSeek caveat.

* refactor(deepseek): expose isDeepSeek* as free functions; clarify docs

Two doc/coupling nits from review:

1. The pipeline post-processing block was importing the concrete
   `DeepSeekOpenAICompatibleProvider` class just to reach
   `isDeepSeekHostname`. That couples the generic OpenAI pipeline to a
   specific provider implementation. Promote the helper (and its broad
   `isDeepSeekProvider` sibling) to free `export function`s in
   `provider/deepseek.ts` and import them by name. The class keeps thin
   static delegates for backward compat with existing callers and tests.

2. The per-provider behavior table on `model-providers.md` said
   `'low'/'medium' → 'high'` and `'xhigh' → 'max'` "client-side", but
   that normalization only fires inside `translateReasoningEffort`,
   which runs on the nested `reasoning.effort` config path. Explicit
   top-level overrides via `samplingParams.reasoning_effort` or
   `extra_body.reasoning_effort` skip the rewrite and ship verbatim.
   Reword the row to reflect that.

Behavior unchanged.


An open-source AI agent that lives in your terminal.

中文 | Deutsch | français | 日本語 | Русский | Português (Brasil)

🎉 News

  • 2026-04-15: Qwen OAuth free tier has been discontinued. To continue using Qwen Code, switch to Alibaba Cloud Coding Plan, OpenRouter, Fireworks AI, or bring your own API key. Run qwen auth to configure.

  • 2026-04-13: Qwen OAuth free tier policy update: daily quota adjusted to 100 requests/day (from 1,000).

  • 2026-04-02: Qwen3.6-Plus is now live! Get an API key from Alibaba Cloud ModelStudio to access it through the OpenAI-compatible API.

  • 2026-02-16: Qwen3.5-Plus is now live!

Why Qwen Code?

Qwen Code is an open-source AI agent for the terminal, optimized for Qwen series models. It helps you understand large codebases, automate tedious work, and ship faster.

  • Multi-protocol, flexible providers: use OpenAI / Anthropic / Gemini-compatible APIs, Alibaba Cloud Coding Plan, OpenRouter, Fireworks AI, or bring your own API key.
  • Open-source, co-evolving: both the framework and the Qwen3-Coder model are open-source—and they ship and evolve together.
  • Agentic workflow, feature-rich: rich built-in tools (Skills, SubAgents) for a full agentic workflow and a Claude Code-like experience.
  • Terminal-first, IDE-friendly: built for developers who live in the command line, with optional integration for VS Code, Zed, and JetBrains IDEs.

Installation

Linux / macOS

bash -c "$(curl -fsSL https://qwen-code-assets.oss-cn-hangzhou.aliyuncs.com/installation/install-qwen.sh)"

Windows (Run as Administrator)

Works in both Command Prompt and PowerShell:

powershell -Command "Invoke-WebRequest 'https://qwen-code-assets.oss-cn-hangzhou.aliyuncs.com/installation/install-qwen.bat' -OutFile (Join-Path $env:TEMP 'install-qwen.bat'); & (Join-Path $env:TEMP 'install-qwen.bat')"

Note: It's recommended to restart your terminal after installation to ensure environment variables take effect.

Manual Installation

Prerequisites

Make sure you have Node.js 20 or later installed. Download it from nodejs.org.

NPM

npm install -g @qwen-code/qwen-code@latest

Homebrew (macOS, Linux)

brew install qwen-code

Quick Start

# Start Qwen Code (interactive)
qwen

# Then, in the session:
/help
/auth

On first use, you'll be prompted to sign in. You can run /auth anytime to switch authentication methods.

Example prompts:

What does this project do?
Explain the codebase structure.
Help me refactor this function.
Generate unit tests for this module.

🦞 Use Qwen Code for Coding Tasks in Claw

Copy the prompt below and paste it into your agent:

Use Qwen Code as a coding agent. First, fetch and read this skill reference:
https://raw.githubusercontent.com/QwenLM/qwen-code/refs/heads/main/.qwen/skills/qwen-code-claw/SKILL.md
Then follow the instructions inside to install, authenticate, and use Qwen Code via acpx for all coding tasks.

Authentication

Qwen Code supports the following authentication methods:

  • API Key (recommended): use an API key from Alibaba Cloud Model Studio (Beijing / intl) or any supported provider (OpenAI, Anthropic, Google GenAI, and other compatible endpoints).
  • Coding Plan: subscribe to the Alibaba Cloud Coding Plan (Beijing / intl) for a fixed monthly fee with higher quotas.

⚠️ Qwen OAuth was discontinued on April 15, 2026. If you were previously using Qwen OAuth, please switch to one of the methods above. Run qwen and then /auth to reconfigure.

Use an API key to connect to Alibaba Cloud Model Studio or any supported provider. Supports multiple protocols:

  • OpenAI-compatible: Alibaba Cloud ModelStudio, ModelScope, OpenAI, OpenRouter, and other OpenAI-compatible providers
  • Anthropic: Claude models
  • Google GenAI: Gemini models

The recommended way to configure models and providers is by editing ~/.qwen/settings.json (create it if it doesn't exist). This file lets you define all available models, API keys, and default settings in one place.

Quick Setup in 3 Steps

Step 1: Create or edit ~/.qwen/settings.json

Here is a complete example:

{
  "modelProviders": {
    "openai": [
      {
        "id": "qwen3.6-plus",
        "name": "qwen3.6-plus",
        "baseUrl": "https://dashscope.aliyuncs.com/compatible-mode/v1",
        "description": "Qwen3-Coder via Dashscope",
        "envKey": "DASHSCOPE_API_KEY"
      }
    ]
  },
  "env": {
    "DASHSCOPE_API_KEY": "sk-xxxxxxxxxxxxx"
  },
  "security": {
    "auth": {
      "selectedType": "openai"
    }
  },
  "model": {
    "name": "qwen3.6-plus"
  }
}

Step 2: Understand each field

  • modelProviders - Declares which models are available and how to connect to them. Keys like openai, anthropic, gemini represent the API protocol.
  • modelProviders[].id - The model ID sent to the API (e.g. qwen3.6-plus, gpt-4o).
  • modelProviders[].envKey - The name of the environment variable that holds your API key.
  • modelProviders[].baseUrl - The API endpoint URL (required for non-default endpoints).
  • env - A fallback place to store API keys (lowest priority; prefer .env files or export for sensitive keys).
  • security.auth.selectedType - The protocol to use on startup (openai, anthropic, gemini, vertex-ai).
  • model.name - The default model to use when Qwen Code starts.

Step 3: Start Qwen Code — your configuration takes effect automatically:

qwen

Use the /model command at any time to switch between all configured models.

More Examples
Coding Plan (Alibaba Cloud ModelStudio) — fixed monthly fee, higher quotas
{
  "modelProviders": {
    "openai": [
      {
        "id": "qwen3.6-plus",
        "name": "qwen3.6-plus (Coding Plan)",
        "baseUrl": "https://coding.dashscope.aliyuncs.com/v1",
        "description": "qwen3.6-plus from ModelStudio Coding Plan",
        "envKey": "BAILIAN_CODING_PLAN_API_KEY"
      },
      {
        "id": "qwen3.5-plus",
        "name": "qwen3.5-plus (Coding Plan)",
        "baseUrl": "https://coding.dashscope.aliyuncs.com/v1",
        "description": "qwen3.5-plus with thinking enabled from ModelStudio Coding Plan",
        "envKey": "BAILIAN_CODING_PLAN_API_KEY",
        "generationConfig": {
          "extra_body": {
            "enable_thinking": true
          }
        }
      },
      {
        "id": "glm-4.7",
        "name": "glm-4.7 (Coding Plan)",
        "baseUrl": "https://coding.dashscope.aliyuncs.com/v1",
        "description": "glm-4.7 with thinking enabled from ModelStudio Coding Plan",
        "envKey": "BAILIAN_CODING_PLAN_API_KEY",
        "generationConfig": {
          "extra_body": {
            "enable_thinking": true
          }
        }
      },
      {
        "id": "kimi-k2.5",
        "name": "kimi-k2.5 (Coding Plan)",
        "baseUrl": "https://coding.dashscope.aliyuncs.com/v1",
        "description": "kimi-k2.5 with thinking enabled from ModelStudio Coding Plan",
        "envKey": "BAILIAN_CODING_PLAN_API_KEY",
        "generationConfig": {
          "extra_body": {
            "enable_thinking": true
          }
        }
      }
    ]
  },
  "env": {
    "BAILIAN_CODING_PLAN_API_KEY": "sk-xxxxxxxxxxxxx"
  },
  "security": {
    "auth": {
      "selectedType": "openai"
    }
  },
  "model": {
    "name": "qwen3.6-plus"
  }
}

Subscribe to the Coding Plan and get your API key at Alibaba Cloud ModelStudio (Beijing) or Alibaba Cloud ModelStudio (intl).

Multiple providers (OpenAI + Anthropic + Gemini)
{
  "modelProviders": {
    "openai": [
      {
        "id": "gpt-4o",
        "name": "GPT-4o",
        "envKey": "OPENAI_API_KEY",
        "baseUrl": "https://api.openai.com/v1"
      }
    ],
    "anthropic": [
      {
        "id": "claude-sonnet-4-20250514",
        "name": "Claude Sonnet 4",
        "envKey": "ANTHROPIC_API_KEY"
      }
    ],
    "gemini": [
      {
        "id": "gemini-2.5-pro",
        "name": "Gemini 2.5 Pro",
        "envKey": "GEMINI_API_KEY"
      }
    ]
  },
  "env": {
    "OPENAI_API_KEY": "sk-xxxxxxxxxxxxx",
    "ANTHROPIC_API_KEY": "sk-ant-xxxxxxxxxxxxx",
    "GEMINI_API_KEY": "AIzaxxxxxxxxxxxxx"
  },
  "security": {
    "auth": {
      "selectedType": "openai"
    }
  },
  "model": {
    "name": "gpt-4o"
  }
}
Enable thinking mode (for supported models like qwen3.5-plus)
{
  "modelProviders": {
    "openai": [
      {
        "id": "qwen3.5-plus",
        "name": "qwen3.5-plus (thinking)",
        "envKey": "DASHSCOPE_API_KEY",
        "baseUrl": "https://dashscope.aliyuncs.com/compatible-mode/v1",
        "generationConfig": {
          "extra_body": {
            "enable_thinking": true
          }
        }
      }
    ]
  },
  "env": {
    "DASHSCOPE_API_KEY": "sk-xxxxxxxxxxxxx"
  },
  "security": {
    "auth": {
      "selectedType": "openai"
    }
  },
  "model": {
    "name": "qwen3.5-plus"
  }
}

Tip: You can also set API keys via export in your shell or .env files, which take higher priority than the env block in settings.json. See the authentication guide for full details.

Security note: Never commit API keys to version control. The ~/.qwen/settings.json file is in your home directory and should stay private.

Local Model Setup (Ollama / vLLM)

You can also run models locally — no API key or cloud account needed. This is not an authentication method; instead, configure your local model endpoint in ~/.qwen/settings.json using the modelProviders field.

Ollama setup
  1. Install Ollama from ollama.com
  2. Pull a model: ollama pull qwen3:32b
  3. Configure ~/.qwen/settings.json:
{
  "modelProviders": {
    "openai": [
      {
        "id": "qwen3:32b",
        "name": "Qwen3 32B (Ollama)",
        "baseUrl": "http://localhost:11434/v1",
        "description": "Qwen3 32B running locally via Ollama"
      }
    ]
  },
  "security": {
    "auth": {
      "selectedType": "openai"
    }
  },
  "model": {
    "name": "qwen3:32b"
  }
}
vLLM setup
  1. Install vLLM: pip install vllm
  2. Start the server: vllm serve Qwen/Qwen3-32B
  3. Configure ~/.qwen/settings.json:
{
  "modelProviders": {
    "openai": [
      {
        "id": "Qwen/Qwen3-32B",
        "name": "Qwen3 32B (vLLM)",
        "baseUrl": "http://localhost:8000/v1",
        "description": "Qwen3 32B running locally via vLLM"
      }
    ]
  },
  "security": {
    "auth": {
      "selectedType": "openai"
    }
  },
  "model": {
    "name": "Qwen/Qwen3-32B"
  }
}

Usage

As an open-source terminal agent, you can use Qwen Code in four primary ways:

  1. Interactive mode (terminal UI)
  2. Headless mode (scripts, CI)
  3. IDE integration (VS Code, Zed)
  4. SDKs (TypeScript, Python, Java)

Interactive mode

cd your-project/
qwen

Run qwen in your project folder to launch the interactive terminal UI. Use @ to reference local files (for example @src/main.ts).

Headless mode

cd your-project/
qwen -p "your question"

Use -p to run Qwen Code without the interactive UI—ideal for scripts, automation, and CI/CD. Learn more: Headless mode.

IDE integration

Use Qwen Code inside your editor (VS Code, Zed, and JetBrains IDEs).

SDKs

Build on top of Qwen Code with the available SDKs:

Python SDK example:

import asyncio

from qwen_code_sdk import is_sdk_result_message, query


async def main() -> None:
    result = query(
        "Summarize the repository layout.",
        {
            "cwd": "/path/to/project",
            "path_to_qwen_executable": "qwen",
        },
    )

    async for message in result:
        if is_sdk_result_message(message):
            print(message["result"])


asyncio.run(main())

Commands & Shortcuts

Session Commands

  • /help - Display available commands
  • /clear - Clear conversation history
  • /compress - Compress history to save tokens
  • /stats - Show current session information
  • /bug - Submit a bug report
  • /exit or /quit - Exit Qwen Code

Keyboard Shortcuts

  • Ctrl+C - Cancel current operation
  • Ctrl+D - Exit (on empty line)
  • Up/Down - Navigate command history

Learn more about Commands

Tip: In YOLO mode (--yolo), vision switching happens automatically without prompts when images are detected. Learn more about Approval Mode

Configuration

Qwen Code can be configured via settings.json, environment variables, and CLI flags.

  • ~/.qwen/settings.json (user, global) - Applies to all your Qwen Code sessions. Recommended for modelProviders and env.
  • .qwen/settings.json (project) - Applies only when running Qwen Code in this project. Overrides user settings.

The most commonly used top-level fields in settings.json:

  • modelProviders - Define available models per protocol (openai, anthropic, gemini, vertex-ai).
  • env - Fallback environment variables (e.g. API keys). Lower priority than shell export and .env files.
  • security.auth.selectedType - The protocol to use on startup (e.g. openai).
  • model.name - The default model to use when Qwen Code starts.

See the Authentication section above for complete settings.json examples, and the settings reference for all available options.

Benchmark Results

Terminal-Bench Performance

  • Qwen Code with Qwen3-Coder-480A35: 37.5% accuracy
  • Qwen Code with Qwen3-Coder-30BA3B: 31.3% accuracy

Ecosystem

Looking for a graphical interface?

  • AionUi - A modern GUI for command-line AI tools, including Qwen Code
  • Gemini CLI Desktop - A cross-platform desktop/web/mobile UI for Qwen Code

Troubleshooting

If you encounter issues, check the troubleshooting guide.

Common issues:

  • Qwen OAuth free tier was discontinued on 2026-04-15: Qwen OAuth is no longer available. Run qwen, then /auth, and switch to API Key or Coding Plan. See the Authentication section above for setup instructions.

To report a bug from within the CLI, run /bug and include a short title and repro steps.

Connect with Us

Acknowledgments

This project is based on Google Gemini CLI. We acknowledge and appreciate the excellent work of the Gemini CLI team. Our main contribution focuses on parser-level adaptations to better support Qwen-Coder models.