* docs: add user + design docs for --json-schema structured output Follows up #3598 (cli/core feature shipped to main, no docs). **User doc** `docs/users/features/structured-output.md` — covers quick-start, schema input forms (inline + `@path`), output shapes per `--output-format`, parse-time restrictions, retry/failure modes, privacy redaction, permission gating, MCP shadow-tool handling, and a worked `jq`-piped pipeline example. Registered under the existing `features/_meta.ts` so it shows up in the docs sidebar between "Headless Mode" and "Dual Output". **Design doc** `docs/design/structured-output/structured-output.md` — why the synthetic-tool-whose-param-schema-is-the-user-schema approach, the four-stage parse-time validation pipeline, `schemaRootAcceptsObject`'s decided-vs-deferred boundaries, main-turn vs drain-turn parity via `processToolCallBatch`, the structured- success terminal block, the cross-surface privacy redaction sharing `STRUCTURED_OUTPUT_REDACTED_ARGS`, subagent context handling (`forSubAgent`), MCP shadow-tool guard, the compatibility surface, alternatives considered (and why rejected), and a file-by-file index. Both docs are English-only — repo convention is English-only for both `docs/users/features/` (zero zh-CN siblings) and `docs/design/` (only `customize-banner-area/` has a zh-CN twin). Open to adding zh-CN translations as a separate PR if there's demand. * docs(structured-output): address PR review feedback User doc: - explicit stdout-vs-stderr contract and `{}`-schema behavior. - 500 ms shutdown-holdback latency note. - ReDoS warning for user-supplied `pattern` keywords. - root `$ref` rejection + `allOf` workaround. - per-retry token cost note. - sibling-suppression success vs retry paths split out. - numeric exit codes (1 / 53 / 130) for every failure mode. - new "Session resumption" section for --continue / --resume. Design doc: - gloss the ToolSearch on-demand-loading reference. - `not` row: drop the array-indexing-lookalike `[…]`. - 500 ms holdback is best-effort, not guaranteed. - redaction rationale extends to validation-failure retries. - `CORE_TOOLS` phrasing: structured_output is excluded FROM the set; skill is in a separate dynamically-discovered category. - subagent suppression maintainer note (single brittle call path). - `--bare` parenthetical lists the three retained core tools. - PR #4001 status (closed 2026-05-11, superseded). * docs(structured-output): correct empty-schema / holdback / SIGINT claims Three doc claims were stronger than the actual code behaviour: - **Empty schema produces `{}`, not `null`.** `turn.ts` normalises the tool args via `(fnCall.args || {})` before they land in `structuredSubmission`, so a zero-arg call against `{}` is emitted as `{}` on stdout. The `?? null` in the adapter is defence-in-depth for the strictly-undefined case, which the upstream path doesn't produce. - **Holdback is a cap, not a fixed wait.** The loop guard is `Date.now() < deadline && registry.hasUnfinalizedTasks()`, so it exits immediately when nothing is in flight. Reword as "capped at ~500 ms" with an early-exit note. - **SIGINT can still flush a captured result.** The holdback loop does not poll the abort signal, so a SIGINT after the structured call is captured but before `adapter.emitResult` finishes may still land on stdout. Treat exit code 130 as the source of truth. Also addresses the new auto-review summary suggestion about per-turn schema cost: pull the cost callout up out of the bullet list (so it covers both retry cost and schema-embedded-every-turn cost), since the schema-embedding cost isn't retry-specific. * docs(structured-output): correct stdout/stderr + json-mode envelope claims Two doc claims didn't match `JsonOutputAdapter.emitResult`: - **Model prose doesn't go to stderr in text mode.** Only error messages and log lines do. Successful runs emit just the JSON-stringified payload on stdout; accumulated assistant prose is discarded entirely (not mirrored to stderr). Point users at `--output-format json` / `stream-json` when they need the prose. - **`--output-format json` emits a JSON array, not a single document with top-level fields.** The adapter calls `JSON.stringify(this.messages)` where `messages` is an array of message objects. `structured_result` lives on the final `type: "result"` element of that array, not at the document root, so consumers must read `.[-1].structured_result` rather than `.structured_result`. * docs(structured-output): note schema-itself reaches the provider The Privacy section so far only described `structured_output` *args* being redacted from local on-device surfaces (telemetry + chat recording). The schema body is a separate exposure surface — it ships as the function declaration's `parameters` block on every model request, so `enum`, `const`, `default`, `examples`, `description`, `$comment`, etc. travel to the provider in cleartext. Users defaulting to "redaction covers everything" could legitimately leak secrets via schema-literal fields. Add a callout in the user doc, plus a parallel paragraph in the design doc explaining why the redaction stops at on-device surfaces (the model needs the schema to satisfy the tool-call contract, so provider-side redaction isn't possible). * docs(structured-output): correct stdout-on-failure / ReDoS example / hooks / --bare deny / typo Five issues from the latest /qreview pass: - **stdout-vs-stderr is text-mode only.** In `--output-format json` and `stream-json`, the failure result message is emitted on stdout (final element of the JSON array, or the terminating `result` line on the JSONL stream). Wrappers in those modes must switch on `is_error`, not on whether stdout is empty. - **ReDoS example didn't actually demonstrate the threat.** JSON Schema `pattern` only fires on string instances, and tool args are always objects, so the bare `{"pattern": "(a+)+b"}` schema doesn't constrain anything the model can supply. Move the pattern inside a string-typed property. - **Hooks see raw `tool_input`.** `PreToolUse` / `PostToolUse` / `PostToolUseFailure` receive the unredacted args — including HTTP hooks that can forward off-device. Call this out explicitly so users with audit-style catch-all hooks know to filter or add hook-side redaction. - **`--bare` drops settings-level deny.** Bare mode builds `mergedDeny` as `[...(bareMode ? [] : settings.permissions.deny), …]` — settings-level denies are skipped while the synthetic tool stays registered. Argv-level `--exclude-tools` still applies. Document this exception in the user doc and the design doc. - **`maxSessionTurns` hint typo.** The hint points at "schema is unsatisfiable" — the original text inverted the polarity. * feat(core): PR-2.5 — post-promote stream redirect + natural-exit registry settle Closes the two limitations PR-2 (#3894) deferred for the Phase D part (b) Ctrl+B promote flow (#3831): 1. **Post-promote stream redirect**: today the `bg_xxx.output` file is frozen at promote time because `ShellExecutionService` detaches its data listener as part of PR-1's ownership-transfer contract. PR-2.5 wires a caller-side `onPostPromoteData` callback so bytes from the still-running child append to the file via an `fs.createWriteStream` opened in `handlePromotedForeground`. 2. **Natural-exit registry settle**: today the registry entry stays `'running'` until `task_stop` / session-end `abortAll` fires its abort listener. PR-2.5 wires `onPostPromoteSettle` so natural child exit transitions the entry to `'completed'` / `'failed'` with the right exitCode / signal / error message. ## Service (`shellExecutionService.ts`) - New exported types: `ShellExecuteOptions`, `ShellPostPromoteHandlers`, `ShellPostPromoteSettleInfo`. - `execute()` options bag now accepts `postPromote?: { onData, onSettle }`. Threaded through to both `executeWithPty` and `childProcessFallback`. - PTY's `performBackgroundPromote` (line ~1159): after disposing the foreground data + exit + error listeners, RE-ATTACH minimal forwarders that call `postPromote.onData` / `postPromote.onSettle` when the caller opted in. Backwards compat: when `postPromote` is unset the PR-2 detach-everything contract is preserved (the re-attach is gated on each callback being defined). - `childProcessFallback`'s `performBackgroundPromote` (line ~706): same pattern — re-attach `stdout.on('data', ...)`, `stderr.on('data', ...)`, `child.once('exit', ...)`, `child.once('error', ...)` when the caller opted in. `error` listener routes through `onSettle` with `error` populated, so spawn-side errors after the foreground errorHandler detached don't crash the daemon via the default unhandled `'error'` event. - Both paths wrap caller callbacks in try/catch so a thrown handler doesn't crash the child's data loop / unhandled-rejection the service. ## Shell tool (`shell.ts`) - New `PromoteArtifacts` type — slots shared between the foreground `execute()` postPromote handlers (which fire on the service side as soon as promote happens) and the post-resolve `handlePromotedForeground` finalizer (which runs after `await resultPromise` returns). The two race; the buffer + settle-queue absorb that race so neither chunks nor the eventual exit info are lost. - `executeForeground` wires `postPromote` handlers that route data to either `promoteArtifacts.stream` (if open) or `promoteArtifacts.buffer` (drained when the stream opens), and queue settle info if the wired handler isn't yet installed. - `handlePromotedForeground` opens `fs.createWriteStream(outputPath, { flags: 'w' })`, writes the initial snapshot first, drains the buffer, then registers the entry and wires `onSettleWired` with the full registry decision table: - `error` set → `registry.fail(shellId, error.message, endTime)` - `exitCode === 0` → `registry.complete(shellId, 0, endTime)` - non-zero exitCode → `registry.fail(shellId, "Exited with code N", endTime)` - signal !== null → `registry.fail(shellId, "Terminated by signal N", endTime)` - all-null fallback → `registry.fail(shellId, "Exited with unknown status", endTime)` - Fires queued settle synchronously after wiring so a fast command that exits between promote and finalizer doesn't get lost. - Self-audit catch: closes the output stream on the `registry.register` throw path so the FD doesn't leak past the orphan-child kill. ## Tests - 3 new in `shellExecutionService.test.ts`: - `post-promote bytes route to postPromote.onData when callback provided` - `postPromote.onSettle fires on natural child exit after promote` - `backwards compat: without postPromote, listeners stay fully detached` - 3 new in `shell.test.ts` under a `foreground → background promote PR-2.5` describe block: - `post-promote bytes APPEND to bg_xxx.output via write stream` - `natural child exit transitions registry entry to "completed"` - `non-zero exit / signal / error → "failed" with descriptive message` - Bulk-replaced 50 prior `{},` (empty 6th-arg shellExecutionConfig) with `expect.objectContaining({}),` + added `expect.objectContaining({ postPromote: expect.any(Object) }),` as the 7th-arg expectation for the foreground execute call. - Updated the existing `registers a bg_xxx entry on result.promoted` test to assert on `fs.createWriteStream` + `stream.write` instead of the now-removed `fs.writeFileSync` snapshot path. 182/182 shell.test.ts pass + 73/73 shellExecutionService.test.ts pass + 111/111 coreToolScheduler.test.ts pass + 60/60 AppContainer.test.tsx pass; tsc + ESLint clean. Self-audit: 3 rounds (positive / reverse / cross-file) found one issue — output stream FD leak on `registry.register` throw — and fixed it before flagging complete. All flagged edge cases (stream errors, child-exits-before-wire-up race, task_stop during natural- exit window, promote-never-happens cleanup, backwards compat without callbacks) have explicit handling and / or test pinning. * fix(core): #4102 review wave — 3 Critical + UTF-8 + tests 3 Critical race/correctness issues + 1 multibyte-corruption suggestion + 3 test coverage gaps addressed: **Critical 1 — child_process late-chunk drop (service)** Settle was fired on 'exit', but stdout/stderr can emit buffered data between 'exit' and 'close'. Late chunks landed in `promoteArtifacts.buffer` after shell.ts had already closed the stream + transitioned the registry → silently dropped → truncated `bg_xxx.output`. Switched to listening on 'close' which guarantees all stdio is fully drained. (code, signal) payload is identical to 'exit', just with proper ordering. **Critical 2 — stream-flush wait before registry transition (shell)** `stream.end()` is asynchronous; pending writes can still be in the libuv queue when it returns. The old code transitioned the registry immediately after `.end()`, so a /tasks consumer could observe a `completed` entry and read the output file BEFORE the trailing bytes were on disk. Fixed: wired settle now `stream.once('finish', ...)` BEFORE calling `registry.complete/fail`. `error` event also short-circuits to the transition so a late ENOSPC doesn't hang the settle path forever. **Critical 3 — stream-open-fail buffer leak (shell)** If `fs.createWriteStream` threw, the catch path set `stream = null` but the foreground `onData` handler would still take the `stream === null` branch and push chunks into `promoteArtifacts.buffer` — unbounded growth under a sustained child whose output file couldn't be opened. Added a `streamFailed: boolean` latch on `PromoteArtifacts`. When set, `onData` drops chunks (with a debug log) instead of buffering. The catch branch sets the latch. **Suggestion — shared TextDecoder corrupts multibyte UTF-8 (service)** child_process post-promote used ONE TextDecoder for both stdout AND stderr. The decoder's continuation-byte state machine assumes one byte source; interleaved multibyte chunks corrupted. Now uses separate decoders + flushes both with `decode()` (no `stream: true`) on settle so trailing bytes surface as their final characters. **Suggestion — llmContent reflects already-settled status (shell)** When the queued-settle drain transitions the registry synchronously (fast-exit race), the model-facing copy was still saying "Status: running. … task_stop({...})". Updated to branch on `postPromoteAlreadySettled` / `postPromoteFinalStatus` — when the process is already gone, the copy says "Status: completed/failed" and replaces the `task_stop` suggestion with "Process has already exited; no `task_stop` needed". **Suggestion — test coverage gaps** Added: (a) `queued-settle race: onSettle BEFORE handlePromotedForeground completes` — custom service impl fires onSettle synchronously before resolving the promote promise, pins the drain path. (b) child_process post-promote tests for stdout/stderr forwarding + 'close'-not-'exit' settle + spawn-error settle. **Self-audit**: Round 1 + reverse audit. Stream.once mock added to fire 'finish' synchronously so existing tests don't hang on the new flush wait. 76/76 shellExecutionService.test.ts (+3) + 183/183 shell.test.ts (+1) pass; tsc + ESLint clean. * fix(core): #4102 review wave-2 — 3 more from gpt-5.5 C1 (shell.ts:2227): the WriteStream `'error'` event handler only logged. `fs.createWriteStream` reports common open failures (ENOENT / EACCES / ENOSPC) asynchronously via that event rather than throwing. Result: `promoteArtifacts.stream` kept pointing at the failed stream; `onSettleWired` attached a `.once('finish')` listener that would never fire → registry stuck on `running` forever. Latch the failure (null the shared `stream` slot, set `streamFailed`); `onSettleWired`'s existing `if (!stream)` branch then transitions the registry immediately. C2 (shellExecutionService.ts:1468): the promote handoff removes the foreground `ptyErrorHandler` and only re-attaches data + exit listeners. A subsequent PTY `error` event had no listener — Node treats an unhandled `error` from an EventEmitter as a fatal exception that takes the whole CLI down. Attach a post-promote forwarder that ignores expected PTY read-exit codes (EIO / EAGAIN, same filter the foreground handler uses) and routes unexpected errors through `postPromote.onSettle` with `error` populated. Single-fire latch shared with `onExit` so settle never fires twice. C3 (shell.ts:2503): `onSettleWired` waits for the stream's asynchronous `'finish'` event before flipping `postPromoteAlreadySettled`, but the model-facing `statusLine` was built immediately after invoking `onSettleWired` on the queued settle. A fast-exited promoted command could therefore land "Status: running" + a `task_stop` instruction in production even though settle was already observed. Split into two flags: `postPromoteSettleObserved` (set synchronously when settle is classified) drives the model copy; the registry transition stays behind the stream flush. Tests: +1 PR-2.5 wave-2 PTY error-routing test; +2 shell.ts tests (stream open async error → registry still transitions; async `'finish'` after queued-settle drain → llmContent says 'completed' before registry transition fires). * fix(core): #4102 review wave-3 — 4 actionable from deepseek-v4-pro T2 (shell.ts:2456) — Critical buffer-leak race `onSettleWired` previously set `promoteArtifacts.stream = null` BEFORE calling `stream.end()`. Any `postPromote.onData` chunk that landed between that null assignment and the actual flush completing saw `stream === null && streamFailed === false` and pushed into `promoteArtifacts.buffer` — a buffer that has no further drain path (the foreground finalizer has already returned). Result: chunks stranded indefinitely; PTY mode in particular hits this because `onExit` can fire while kernel buffers still hold data. Fix drains the pre-settle buffer to the stream BEFORE nulling AND latches `streamFailed = true` so any subsequent chunk drops via the existing `else if (streamFailed)` arm in `onData` instead of leaking. Updates the `streamFailed` doc to cover both setters (open-fail and settle-done) so the dual semantic is explicit. T3 (shell.ts:2262) — silent chunk-drop in catch path When `fs.createWriteStream` throws synchronously (rare: ENOENT on a vanished tmpdir), chunks already in `promoteArtifacts.buffer` were silently lost with no observability — oncall reading a truncated `bg_xxx.output` had no way to distinguish "stream open failed" from "child produced nothing." Logs the dropped chunk count and empties the buffer. T5 (shell.ts:2443) — opaque all-null fallback The "Exited with unknown status" fallback fired the registry to 'failed' without any context about which fields were null. This branch is meant to be unreachable; hitting it indicates the service emitted a defective settle info object. Includes the field values in both the fail message and a warn log so the oncall engineer can tell this path apart from the other "failed" branches. T6 (shellExecutionService.ts:1452) — leaked PTY post-promote listeners `ptyProcess.onData(...)` returns an `IDisposable` that was being discarded; same for `onExit`. The `'error'` listener function was also not captured (no way to `removeListener` it). EventEmitter holds refs to listener closures, which transitively hold refs to `onPostData` / `onPostSettle` / the caller's `promoteArtifacts`. While bounded by the PTY's lifetime, the closures keep the caller's state pinned for the post-settle delay window. Captures all three handles into `postPromoteDataDisposable` / `postPromoteExitDisposable` / `postPromoteErrorListener`, then releases them via a shared `disposePostPromoteListeners()` call from `firePostSettle` (idempotent — each slot null-checked and nulled after disposal). Tests: +1 service test for IDisposable + error-listener cleanup; +2 shell.ts tests for buffer drain race and catch-path snapshot fallback. Existing tests stay green (262 → 265 in the touched suites; 7819 → 7822 across the core package). * fix(core/test): drop unused 'registry' in wave-3 T2 test (TS6133) CI build failed across all platforms with src/tools/shell.test.ts(4395,15): error TS6133. The variable was a leftover from copying the queued-settle test pattern; the wave-3 T2 test inspects writeStreamMock.write call history directly and never reads the registry, so the assignment is dead code. Drop it. * fix(core): #4102 review wave-4 — 6 actionable from gpt-5.5 + deepseek-v4-pro T1 (Critical, shellExecutionService.ts:860 child_process onSettle exactly-once) The PTY path used a `firePostSettle` latch but child_process wired `close` and `error` independently to `onPostSettle`. A spawn-side error followed by Node's auto-emitted `'close'` would call the caller's settle TWICE, racing the registry transition. Added the same single-fire latch on the child_process path. T2 (Critical, shell.ts:2264 handoff race reorder) Original order was `write(snapshot) -> drain buffer -> assign stream`. Synchronous today (no race in current code), but assign-after-drain leaves a hazard for any future refactor that adds an `await` inside the drain loop — a chunk arriving in that window would land in `promoteArtifacts.buffer`, then post-assign chunks would write to the stream first, producing out-of-order bytes until the settle drain. Reordered to `write(snapshot) -> assign stream -> drain buffer`, which closes the hazard regardless of future async additions. T3 (Suggestion, shellExecutionService.ts:816 decoder flush gated on onSettle) The trailing-multibyte flush ran inside the `child.once('close', ...)` handler, which was only installed when `onSettle` was set. An `onData`-only caller (no onSettle) lost trailing continuation bytes silently. Hoisted flush into `flushPostPromoteDecoders` called from `firePostSettle`, and made `firePostSettle` available on the `'close'` path independent of onSettle (T6 install). T4 (Suggestion, shell.ts:1700 promoted ANSI passthrough) The regular `executeBackground` path strips ANSI before writing to `bg_xxx.output`; the promoted-foreground onData path appended raw chunks. Reading `bg_xxx.output` after Ctrl+B showed plain text up to the snapshot then raw `\x1b[31m` / cursor-move / clear-screen sequences for the post-promote tail — unreadable. Apply `stripAnsi(rawChunk)` before write/buffer, matching the executeBackground contract. T5 (Suggestion, shellExecutionService.ts:786 UTF-8 hardcoded) The post-promote child_process decoders were hard-coded to `new TextDecoder('utf-8')`, but the foreground decoder runs encoding detection via `getCachedEncodingForBuffer`. On a non-UTF-8 child (e.g. GBK on a Chinese Windows shell), the snapshot decoded correctly but the post-promote tail was mojibake. Capture the foreground decoder's `.encoding` property and reuse it for post-promote (with utf-8 fallback if foreground hadn't seen any bytes yet, and a try/catch around `new TextDecoder` for the rare unsupported-encoding case). T6 (Suggestion, shellExecutionService.ts:1540 `error` listener gated on onSettle) The post-promote `error` listener was attached only when `onSettle` was set. An `onData`-only caller still had the foreground errorHandler detached; a post-promote spawn error would then crash the CLI via Node's unhandled-error default. Hoisted the close + error listeners into `if (postPromote)` so any caller opting into post-promote gets crash protection; if `onSettle` is absent the listeners log + drop instead of routing. T7 (Suggestion, shellExecutionService.ts:791 onSettle-only pipe-block deadlock) Same root cause as T6: when only `onSettle` is set, the foreground `stdout`/`stderr` 'data' listeners are detached and no post-promote listener replaces them. The Readables stay paused, the OS pipe buffer fills (~64KB on Linux), the child blocks on `stdout.write`, 'close' never fires, onSettle never fires. Added `child.stdout?.resume()` and `child.stderr?.resume()` in the no-onData branch so the child can drain its pipes and reach exit. T8 (Suggestion, shell.ts:2614 dead inspectLine ternary) `inspectLine`'s ternary returned the same string on both sides — copy-paste leftover from when the other two adjacent ternaries (statusLine / stopLine) were correctly varied. Collapsed to a single string assignment. Tests: +5 regression tests (4 child_process: T1 double-fire latch, T3 onData-only flush, T6 onData-only error survives, T7 onSettle- only resume; +1 shell.ts: T4 ANSI strip). 265 -> 270 in the touched suites; 7822 -> 7827 across the core package; full suite green. * fix(core/test): use ShellOutputEvent type in wave-4 onData callbacks (TS2345) CI lint failed on the wave-4 (T3 / T6) tests with TS2345: pushing ShellOutputEvent into Array<{type:string;chunk:unknown}> narrows incompatibly. Switch to ShellOutputEvent[] (matches earlier helpers at lines 758/966) and discriminate the union via .type === 'data' when reading .chunk so the narrowed multibyte assertion still type-checks. * docs(structured-output): address doudouOUC's four review findings - Tighten JSON/stream-json paragraph: not all failures emit a result to stdout (exit 53 / exit 130 are stderr-only); check exit code first - Fix suppressed-sibling retry guidance: re-issue in a separate turn that does not include structured_output (avoids re-suppression) - Distinguish settings-deny (exit 53) from --exclude-tools (exit 1) in Permission gating section - Replace <projectDir> placeholder with actual path ~/.qwen/projects/<sanitized-cwd>/chats/<sessionId>.jsonl in both docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs(structured-output): fix Permission gating — both deny paths strip registration Forward audit against source code found that the Permission gating section incorrectly distinguished settings.permissions.deny (claiming tool stays visible, exit 53) from --exclude-tools (claiming declaration stripped, exit 1). Both go through the same mergedDeny → isToolEnabled path and both prevent registration — the model never sees the tool. Corrected both docs to reflect the actual mechanism: typical outcome is plain text (exit 1), with maxSessionTurns (exit 53) as the fallback if the model loops through other tools. * docs(structured-output): address doudouOUC's May 17 review (5 items) - Clarify validation is client-side Ajv, not provider-side - Qualify "same way" with DeclarativeTool abstraction parenthetical - Match symptom→cause structure for maxSessionTurns hint - Expand $ref workaround with concrete $defs example - Clarify Dual Output See Also doesn't require --json-schema * docs(structured-output): address 2 unresolved design-doc suggestions 1. Privacy/redaction section: note hooks as intentionally non-redacted surface (matches user-doc "Hooks see raw args" callout). 2. Dual call-site section: clarify differing post-helper termination flow between main-turn (direct return) and drain-turn (sentinel hop). * docs(structured-output): address doudouOUC's May 17 review (2 nits) 1. Failure-paths table: align "three common causes" cell with the symptom→cause framing already used at parse-time validation pipeline section ("common stuck-run symptom and its two likely causes"). 2. Dual call-site section: fix factual inaccuracy from prior commit — `drainOneItem` is `async (): Promise<void>` and returns nothing. The two-hop termination is via closure-mutated `structuredSubmission` (set by `processToolCallBatch`, checked by `drainLocalQueue` and the holdback loop), not a return-value sentinel. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
16 KiB
Structured Output (--json-schema)
Constrain the model's final answer to a JSON Schema you supply. Qwen Code registers a synthetic terminal tool the model is required to call, parses the call's arguments against your schema, and exposes the validated payload on stdout (or in the JSON / stream-json result envelope). The first valid call ends the run.
Headless only — works with qwen -p, a positional prompt, or a prompt
piped via stdin.
Quick start
qwen --prompt "Summarize the changes in HEAD with risk_level" \
--json-schema '{
"type": "object",
"properties": {
"summary": { "type": "string" },
"risk_level": { "type": "string", "enum": ["low", "medium", "high"] }
},
"required": ["summary", "risk_level"],
"additionalProperties": false
}'
Output on stdout (default --output-format text):
{ "summary": "…", "risk_level": "low" }
The line is exactly the JSON-stringified payload + newline — no
envelope, no event log. Pipe it straight into jq or another consumer.
In text mode, stdout is reserved for the JSON payload on success
and is empty on failure; error messages and log lines go to stderr.
That makes $(qwen --json-schema …) || exit 1 capture patterns safe
under text mode — failures land in stderr, not mixed into the captured
variable. The model's incidental prose during planning is not
mirrored to stderr either — text mode discards it; reach for
--output-format json or stream-json if you need to see it.
In --output-format json and stream-json, the failure result
message is emitted on stdout alongside the success path (as the
final element of the JSON array, or the terminating result line on
the JSONL stream). Not all failure modes emit a result to stdout —
max-session-turns (exit 53) and signal interrupts (exit 130) exit with
stderr output only. Check the exit code first; is_error on the
result object disambiguates within the subset of failures that do
produce a result event.
Empty schema: Passing
{}produces{}(an empty JSON object) on stdout. The model callsstructured_outputwith no arguments; the upstream argument-normalisation path turns the empty function call into an empty-object payload, which passes validation against the empty schema and is emitted verbatim.
Supplying the schema
Two equivalent forms:
# Inline JSON literal
qwen -p "…" --json-schema '{"type":"object", "properties":{…}}'
# Read from a file
qwen -p "…" --json-schema @./schemas/summary.json
The @path form expands ~, normalizes the path, and reads the file
with utf8 encoding.
Latency note: Successful runs incur a shutdown holdback capped at ~500 ms while in-flight background agents flush their final notifications before the result is emitted. The holdback exits early if no background tasks are pending, so simple runs barely notice it; batch pipelines that fan out hundreds of
--json-schemainvocations against busy agents should account for this upper bound.
Security note: Schemas may contain user-supplied regular expressions in
patternkeywords. Ajv compiles these with the ECMAScript regex engine, which is vulnerable to catastrophic backtracking. Because tool arguments are always objects, thepatternkeyword only fires inside string properties — a malicious schema like{"type":"object","properties":{"value":{"type":"string","pattern":"(a+)+b"}}}can hang the CLI when the model supplies a moderately long matching value. Only run--json-schemawith schemas from sources you trust.
Validation at parse time:
- The file must be a regular file (no FIFOs, character devices, or directories).
- File size is capped at 4 MiB. Real-world JSON schemas are well under this; multi-MiB files almost always indicate a wrong-path mistake.
- The schema must be valid JSON. For
@pathinput, the parse error is generic ("content of<path>is not valid JSON") rather than echoing the SyntaxError detail, so a wrapping process that surfaces stderr can't read a prefix of the file's contents back from the error. - The schema must compile under the strict Ajv configuration —
typos like
properteesare surfaced, but spec-valid patterns (e.g.requiredwithout listing every key inproperties) are accepted. - The schema root must accept object-typed values. Function-calling APIs (Gemini, OpenAI, Anthropic) all require tool arguments to be JSON objects, so a non-object root would register an unusable tool.
The root-acceptance check walks type, const, enum, anyOf,
oneOf, allOf, not, and if/then/else (best-effort for the
decidable cases). When in doubt it defers to Ajv at runtime.
Root
$refis rejected by the parse-time check. If your schema reuses a definition via$ref, wrap it inallOf:// Rejected: { "$ref": "#/$defs/MyObj", "$defs": { "MyObj": { "type": "object", "properties": { "name": { "type": "string" } } } } } // Accepted (root accepts objects via the allOf branch): { "allOf": [{ "$ref": "#/$defs/MyObj" }], "$defs": { "MyObj": { "type": "object", "properties": { "name": { "type": "string" } } } } }
$refinsideanyOf/oneOf/allOfis deferred to Ajv at runtime, so the wrapped form passes the root-acceptance check.
Output shape per format
--output-format |
What goes to stdout |
|---|---|
text (default) |
JSON.stringify(payload) + "\n" — one line, the validated object. |
json |
A single JSON array of message objects (the full event log). The final element is the type: "result" message, which carries both result (JSON.stringify(payload)) and structured_result (the raw object). |
stream-json |
Each event on its own line as JSONL. The terminating result line carries result (stringified) and structured_result (raw object). |
In both JSON formats, prefer reading structured_result over result
when you want the object; result is the stringified form provided for
consumers that always expect a string in that field. For --output-format json, read the last element of the array and pull structured_result
from there (e.g. jq '.[-1].structured_result'); for stream-json,
read the final type: "result" line on the stream.
Restrictions
| Combination | Behavior |
|---|---|
--json-schema + -i / --prompt-interactive |
Rejected at parse time. The synthetic tool's "session ends now" message has no terminator in the TUI loop. |
--json-schema + --input-format stream-json |
Rejected at parse time. The single-shot terminal contract is incompatible with the long-lived stream-json input protocol. |
--json-schema + --acp / --experimental-acp |
Rejected at parse time. ACP runs its own turn loop that doesn't honor the synthetic-tool terminal contract. |
--json-schema with no prompt and no piped stdin |
Rejected at parse time. Headless mode needs a prompt — pass -p, a positional argument, or pipe one in. |
--bare + --json-schema |
Supported. The synthetic tool is registered alongside the bare three (read_file, edit, run_shell_command). |
--json-schema inside a subagent |
Tool is NOT registered. Only the main / drain turns of the top-level run honor the terminal contract; a subagent calling the tool would receive "session ends now" and then keep running because its loop has no terminator. |
Retry and failure modes
Cost note. Two things multiply token spend in a
--json-schemarun, both worth designing for:
- Schema embedded in every turn. The schema ships as the
structured_outputfunction declaration'sparametersblock on every model request, not just the first. Large schemas (up to the 4 MiB parse cap) proportionally increase per-turn input tokens for the entire run.- Each validation retry is a full model turn. A schema the model misses repeatedly is multiplied per failure (request + inference + response). Keep schemas constrained enough to guide the model and simple enough to nail on the first try; raise
--max-session-turnswhen retries are expected.
The session ends on the first valid call. Until then:
- Args fail validation.
structured_outputreturns a tool-result error with Ajv's message, the model sees it on the next turn, and may correct the arguments and call again. - Model calls a side-effecting tool in the same turn as
structured_output. The pre-scan suppresses the sibling — it never runs, regardless of whether the structured call ultimately validates. The two paths split on what the model sees next:- Validation succeeds: the run ends immediately, and the model never gets another turn — the suppressed sibling is silently discarded.
- Validation fails: the model gets another turn and sees a
synthesised "Skipped:"
tool_resultfor the suppressed call, so it can re-issue that call in a separate turn (one that does not includestructured_output).
- Model emits plain text instead of calling
structured_output. Exit code1. The error message includes the turn count and a truncated preview of the model's output so you can see what it actually said. - Run reaches
maxSessionTurns. Exit code53. Standard "Reached max session turns" exit, plus a--json-schema-specific hint that points at the three common stuck-run causes: model never called the tool,structured_outputis denied by permission rules, or the schema is unsatisfiable. - Run is interrupted (SIGINT / Ctrl-C). Exit code
130. The structured result is normally not emitted, but the shutdown holdback loop does not poll the abort signal, so a SIGINT that arrives after a successful call has been captured but before the result reaches stdout may still land on stdout. Treat the exit code as the source of truth.
Privacy
The args you submit through structured_output ARE the structured
payload — already emitted on stdout. To avoid persisting the same
payload a second time into on-device surfaces that may be exported off
the machine, args are redacted with the placeholder
{ __redacted: 'structured_output payload (see stdout result)' } on:
- The
ToolCallEventtelemetry path (OTLP exports, QwenLogger, ui-telemetry stream, chat-recording UI event mirror). - The on-disk chat-recording JSONL at
~/.qwen/projects/<sanitized-cwd>/chats/<sessionId>.jsonl(re-fed into model context on--continue/--resume), including every validation-failure retry.
Tool-call metrics (duration, success, decision) and surrounding event metadata are preserved.
Schema is sent to the model provider. Redaction covers the call arguments on local surfaces only. The schema itself rides on every model request as the
structured_outputfunction declaration'sparametersblock — so any literal values you put inside it (enum,const,default,examples,description,$comment, etc.) reach the provider in cleartext just like prompt text. Schemas should describe shape and constraints; treat them as public toward the provider and keep secrets, customer records, and other sensitive payloads out of the schema body.
Hooks see raw args. The redaction described above only applies to telemetry and chat-recording.
PreToolUse,PostToolUse, andPostToolUseFailurehooks (including HTTP hooks that can forward payloads off-device) receive the unredactedtool_inputforstructured_output, since the hook contract is "see what the tool sees." If you operate audit-style catch-all hooks, either disable them forstructured_output(filter ontool_name) or add hook-side redaction before running--json-schemaagainst sensitive data.
Session resumption (--continue / --resume)
--json-schema is a per-run flag, not a per-session property. The
synthetic tool is registered when the CLI parses its arguments, so:
- Re-pass
--json-schemaon every--continue/--resumeyou want the terminal contract to apply to. The same schema as the original run is the safe default — a mid-session schema swap is allowed but changes the contract the model is being held to. - If you
--continuewithout--json-schema, the resumed run is an ordinary headless session:structured_outputsimply doesn't exist as a tool, and the model will respond in free-form text. - The
__redactedplaceholder in the resumed chat-recording does not affect resumability in practice. A successfulstructured_outputcall terminates the session immediately, so the only redacted args a resumed run could see are from failed attempts. The model still has each attempt's Ajv validation error in the recordedtool_resultand the live parameter schema (re-registered from--json-schema), which is enough to retry.
Permission gating
structured_output deliberately bypasses the --core-tools allowlist:
the tool only exists when --json-schema is set, so excluding it
would leave the run with no terminal contract.
Explicit permissions.deny rules and --exclude-tools settings DO
take effect — both use the same deny mechanism and both prevent
structured_output from being registered, so the model never sees
the tool declaration. The typical result is that the model answers in
plain text (exit 1). If the model loops through other tools without
ever producing text, it will eventually hit maxSessionTurns
(exit 53) and the --json-schema hint in the error message tells you
where to look.
--barecaveat. Bare mode ignores most settings-derived inputs, including settings-levelpermissions.denyandtools.exclude. The synthetic tool stays registered, so a settings-only deny ofstructured_outputwill silently no-op under--bare. Argv-level--exclude-tools structured_outputstill applies in bare mode — use the flag rather than settings if you need to lock down a bare run.
Conflict with MCP tools
If an MCP server registers a tool literally named structured_output,
the tool-registry collision check renames the MCP tool to
mcp__<server-name>__structured_output so the synthetic tool keeps
the bare name. The user-supplied schema is always the one the model
sees.
Example: gating a multi-step run on the structured output
RESULT=$(qwen --prompt "Audit this diff and rate its risk." \
--json-schema @./schemas/audit.json) || exit 1
risk=$(jq -r '.risk_level' <<<"$RESULT")
if [ "$risk" = "high" ]; then
echo "High-risk diff; pausing pipeline." >&2
exit 2
fi
See also
- Headless Mode — the
-p-based flow--json-schemabuilds on. - Dual Output — records a JSON-event sidecar
alongside the TUI (a different approach to machine-readable output;
does not require
--json-schema).