qwen-code/docs/design
jinye 64401e1d17
feat(telemetry): support custom resource attributes and add metric cardinality controls (#4367)
* feat(telemetry): support custom resource attributes and add metric cardinality controls

Resolves #4365.

Adds two coupled OpenTelemetry capabilities to make qwen-code's telemetry
production-ready in multi-team / multi-tenant deployments:

1. Custom resource attributes via standard `OTEL_RESOURCE_ATTRIBUTES` and
   `OTEL_SERVICE_NAME` env vars and a new `telemetry.resourceAttributes`
   setting. Operators can now tag every span / log / metric with `team`,
   `env`, `cost_center`, or anything else their backend needs.
2. Metric cardinality controls. `session.id` is moved off the OpenTelemetry
   Resource (where it auto-attached to every metric data point and caused
   unbounded time-series fan-out on Prometheus / ARMS Metric / etc.) and
   gated behind a new opt-in `telemetry.metrics.includeSessionId` toggle.
   Spans and logs still carry `session.id` for trace and log correlation.

Reserved keys (`service.version`, `session.id`) are stripped from both env
and settings sources with a `diag.warn`. `OTEL_SERVICE_NAME` follows the
OTel spec precedence (highest priority for `service.name`). Settings JSON
values are runtime-coerced to strings as defense against hand-edited
non-conforming JSON.
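
A minimal sketch of the strip-and-coerce step described above. The helper names (`sanitize`, `withServiceName`) and the warning text are hypothetical. Only the two reserved keys, the string coercion, and the `OTEL_SERVICE_NAME` precedence come from this message.

```typescript
// Hypothetical helpers; only the reserved keys and the OTEL_SERVICE_NAME
// precedence are taken from the commit message.
const RESERVED_KEYS = new Set(['service.version', 'session.id']);

type Attrs = Record<string, string>;

/** Drop reserved keys and coerce every remaining value to a string. */
function sanitize(
  input: Record<string, unknown>,
  warn: (msg: string) => void,
): Attrs {
  const out: Attrs = {};
  for (const [key, value] of Object.entries(input)) {
    if (RESERVED_KEYS.has(key)) {
      warn(`dropping reserved resource attribute "${key}"`);
      continue;
    }
    // Guards against numbers or booleans in hand-edited settings JSON.
    out[key] = String(value);
  }
  return out;
}

/** OTEL_SERVICE_NAME, when set, overrides any other source of service.name. */
function withServiceName(attrs: Attrs, otelServiceName: string | undefined): Attrs {
  return otelServiceName
    ? { ...attrs, 'service.name': otelServiceName }
    : { ...attrs };
}
```

With this shape, `sanitize({ team: 'infra', 'service.version': '2.0', retries: 3 }, warn)` keeps `team`, stringifies `retries`, and reports one dropped key.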

Breaking change: metrics no longer carry `session.id` by default. Operators
who need it can restore the previous behavior with
`QWEN_TELEMETRY_METRICS_INCLUDE_SESSION_ID=true` or
`telemetry.metrics.includeSessionId: true` in settings.json; recommended
only for short-term debugging since it re-introduces the cardinality
problem. For long-term session-level analysis, prefer trace and log
backends which handle per-event data without cardinality pressure.
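
The opt-in gate can be pictured roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and both the env-over-settings precedence and the case-insensitive `true` parsing are guesses, since this message only names the two switches.

```typescript
type Attributes = Record<string, string>;

/**
 * Decide whether metrics should carry session.id.
 * Assumption: the env var, when set to a non-empty value, wins over settings.json.
 */
function shouldIncludeSessionId(
  env: Record<string, string | undefined>,
  setting: boolean | undefined,
): boolean {
  const raw = env['QWEN_TELEMETRY_METRICS_INCLUDE_SESSION_ID'];
  if (raw !== undefined && raw.trim() !== '') {
    return raw.trim().toLowerCase() === 'true';
  }
  return setting ?? false; // default: off, to keep metric cardinality bounded
}

/** Attach session.id to a metric data point only when explicitly enabled. */
function metricAttributes(
  base: Attributes,
  sessionId: string,
  include: boolean,
): Attributes {
  return include ? { ...base, 'session.id': sessionId } : { ...base };
}
```

Because the attribute is added per data point rather than on the Resource, spans and logs can keep `session.id` without every metric series inheriting it.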

Design doc: docs/design/telemetry-resource-attributes-design.md

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* docs(telemetry): align reserved-key descriptions with implementation

Round 1 review fixes (#4367). After session.id was added to
RESERVED_RESOURCE_ATTRIBUTE_KEYS in Codex review, four user-facing
descriptions still claimed only service.version was reserved:

- packages/core/src/telemetry/config.ts (merge comment)
- packages/core/src/config/config.ts (TelemetrySettings JSDoc)
- packages/cli/src/config/settingsSchema.ts (schema description)
- packages/vscode-ide-companion/schemas/settings.schema.json (regenerated)

Also corrects the scope claim: resource attributes apply to every signal
the SDK exports, not just OTLP, because the OTLP and file (outfile)
exporters share the same Resource.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* docs(telemetry): clarify warning destination and surface percent-encoding hint

Round 2 self-review fixes (#4367). Two small but real UX gaps:

1. Reserved-key, malformed-pair, and coercion warnings route to the debug
   log (per #3986), not the console, so a user who sets
   `OTEL_RESOURCE_ATTRIBUTES=service.version=2.0` gets no feedback that
   the value was silently dropped. Adds a "Troubleshooting" section to
   telemetry.md telling users where to look, and a note in the parser
   docstring documenting where the warnings go.

2. A literal (unencoded) comma in an env var value is a common foot-gun:
   the parser splits on it, producing a malformed second half that is
   silently dropped. Updates the warn text to include a "hint:
   percent-encode literal commas as %2C" callout, and adds the same
   guidance to the docs.
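
The comma foot-gun from item 2 can be reproduced with a small sketch. `splitResourceAttributes` is a hypothetical stand-in for the real parser, reduced to the comma split and the value decoding. It does not guard against invalid percent-encoding.

```typescript
// Hypothetical helper; the real parser's name and shape may differ.
interface SplitResult {
  pairs: Array<[string, string]>;
  malformed: string[];
}

function splitResourceAttributes(raw: string): SplitResult {
  const result: SplitResult = { pairs: [], malformed: [] };
  for (const piece of raw.split(',')) {
    if (piece.trim() === '') continue;
    const eq = piece.indexOf('=');
    if (eq <= 0) {
      // No key/value separator: this is what the tail of an
      // unencoded comma looks like to the parser.
      result.malformed.push(piece);
      continue;
    }
    result.pairs.push([
      piece.slice(0, eq),
      decodeURIComponent(piece.slice(eq + 1)),
    ]);
  }
  return result;
}

// Unencoded comma: the value is cut short and " Jane" is dropped as malformed.
const broken = splitResourceAttributes('owner=Doe, Jane');

// Percent-encoded comma: the value survives intact as "Doe, Jane".
const encoded = splitResourceAttributes('owner=Doe%2C Jane');
```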

Deferred to a follow-up: startup-time stderr summary of dropped
attributes. Stderr during TUI render could break Ink rendering, so the
right surface needs separate design.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* test(telemetry): cover first-`=` split contract in OTEL_RESOURCE_ATTRIBUTES parser

Per review feedback on #4367. The parser uses `indexOf('=')` so
the first `=` separates key and value while subsequent `=` stay in
the value. The behavior was correct but untested; a future refactor
to `split('=')` would silently break base64-padded, JWT, or
connection-string values.
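
To illustrate the contract, here is a small comparison of the two splitting strategies. Both function names are hypothetical. Only the use of `indexOf('=')` is taken from this message.

```typescript
/** Split on the first "=" only, so later "=" characters stay in the value. */
function splitFirstEquals(pair: string): [string, string] | undefined {
  const eq = pair.indexOf('=');
  if (eq <= 0) return undefined; // no separator, or empty key
  return [pair.slice(0, eq), pair.slice(eq + 1)];
}

/** The tempting refactor: split on every "=", which truncates the value. */
function splitNaive(pair: string): [string, string] {
  const parts = pair.split('=');
  return [parts[0] ?? '', parts[1] ?? ''];
}

const padded = 'token=YWJjZA=='; // base64 value with "==" padding

const kept = splitFirstEquals(padded); // value keeps its padding
const lost = splitNaive(padded); // value loses its padding
```

The same truncation would hit connection strings such as `dsn=host=db;port=5432`, where everything after the second `=` would be discarded.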

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* feat(telemetry): tighten resource-attribute input validation + startup summary

Adopts review feedback from #4367 (wenshao via Qwen Code /review).

Five accepted suggestions, bundled because they all touch the same
parse/coerce/strip pipeline:

1. Key percent-decoding (CRITICAL). `parseOtelResourceAttributes` now
   percent-decodes both keys and values per the OTel / W3C Baggage spec.
   Without this, `OTEL_RESOURCE_ATTRIBUTES=service%2Eversion=99` lands
   on Resource as the literal key `service%2Eversion`, bypassing the
   reserved-key filter; a collector that decodes keys downstream could
   then resurrect `service.version` and spoof the version label.

2. Startup summary of dropped attributes. Every `diag.warn` in
   resource-attributes.ts routes only to the OTel debug log (per
   #3986), giving operators zero feedback when their attributes are
   silently dropped. Helpers now optionally accumulate diagnostics
   into a `ResourceAttributeWarnings` array; the resolver collects
   them and the SDK emits a one-time console summary at init (before
   Ink renders, so no TUI conflict).

3. `||` instead of `??` for the service.name fallback. `??` only falls
   back on null/undefined, so an empty string from settings passes
   through and produces a blank `service.name` that some backends
   reject. `||` falls through to the default.

4. `coerceStringResourceAttributes` now trims keys and skips
   empty/whitespace-only keys, matching `parseOtelResourceAttributes`.
   Previously `{"  ": "x"}` or `{"team ": "y"}` from settings.json
   would land as malformed Resource attributes.

5. `OTEL_SERVICE_NAME` is trimmed before the truthy check, so values
   like `'  '` or `'\t'` are treated as unset rather than producing
   a whitespace-only service name on Resource.
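
Items 1 and 2 can be sketched together: decode the key before checking it against the reserved set, and push each diagnostic into a caller-supplied array so it can be summarized later. The names `parseAttributes` and `safeDecode`, the warning strings, and the plain `string[]` shape are illustrative assumptions, not the actual `parseOtelResourceAttributes` or `ResourceAttributeWarnings` definitions.

```typescript
const RESERVED = new Set(['service.version', 'session.id']);

/** Percent-decode, returning undefined instead of throwing on bad input. */
function safeDecode(text: string): string | undefined {
  try {
    return decodeURIComponent(text);
  } catch {
    return undefined; // e.g. "key%ZZ" raises URIError
  }
}

function parseAttributes(
  raw: string,
  warnings: string[] = [],
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const piece of raw.split(',')) {
    const pair = piece.trim();
    if (pair === '') continue;
    const eq = pair.indexOf('=');
    if (eq <= 0) {
      warnings.push(`malformed pair "${pair}"`);
      continue;
    }
    const key = safeDecode(pair.slice(0, eq).trim());
    const value = safeDecode(pair.slice(eq + 1).trim());
    if (key === undefined || value === undefined) {
      warnings.push(`invalid percent-encoding in "${pair}"`);
      continue;
    }
    // The reserved check runs on the DECODED key, so an encoded
    // "service%2Eversion" cannot slip past the filter.
    if (RESERVED.has(key)) {
      warnings.push(`reserved key "${key}" dropped`);
      continue;
    }
    out[key] = value;
  }
  return out;
}
```

Calling `parseAttributes('service%2Eversion=99,team=infra', warnings)` yields only `team`, with one entry in `warnings` that a startup summary could print.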

One suggestion declined (in-thread reply on PR):

- "Redundant `?? {}` in sdk.ts:160" — intentional defense-in-depth for
  `vi.mock('../config/config.js')` callers in `telemetry.test.ts` where
  auto-stub returns undefined. The reviewer is right that production
  code paths never hit it, but tests do.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(telemetry): trim whitespace-only service.name + add invalid-key-encoding test

Adopts two review suggestions on #4367 (wenshao via Qwen Code /review):

1. `service.name` fallback uses `.trim() || SERVICE_NAME` instead of plain
   `||`. Plain `||` lets whitespace-only values (`" "`, `"\t"`) through as
   truthy, producing a blank service name on Resource that some backends
   reject. Both settings (no value trimming) and env (`%20` decodes to `" "`)
   can deliver such values. Test added.

2. Adds `key%ZZ=val` to the parameterized parser test to cover the
   invalid-percent-encoding-on-key catch branch. Previously only the
   value-side catch was tested.
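
The difference between the three fallback forms in item 1 is easiest to see side by side. This is a sketch: the function names are hypothetical, and the `SERVICE_NAME` value below is a placeholder, since the real default is not stated in this message.

```typescript
// Placeholder default; the actual SERVICE_NAME constant is not shown here.
const SERVICE_NAME = 'default-service';

/** `??` only replaces null/undefined, so "" and " " pass through. */
function viaNullish(candidate: string | undefined): string {
  return candidate ?? SERVICE_NAME;
}

/** `||` also replaces "", but " " and "\t" are truthy and pass through. */
function viaOr(candidate: string | undefined): string {
  return candidate || SERVICE_NAME;
}

/** `.trim() ||` replaces undefined, "", and whitespace-only values. */
function viaTrimOr(candidate: string | undefined): string {
  return candidate?.trim() || SERVICE_NAME;
}

// A value such as "%20" from the environment decodes to a single space.
const fromEnv = decodeURIComponent('%20');
const resolved = viaTrimOr(fromEnv); // falls back to SERVICE_NAME
```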

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
2026-05-21 13:54:37 +08:00
| Name | Last commit | Date |
| --- | --- | --- |
| adaptive-output-token-escalation | fix(core): recover from truncated tool calls via multi-turn continuation (#3313) | 2026-04-21 17:04:24 +08:00 |
| auth | refactor(cli): provider-first auth registry with unified install pipeline (#3864) | 2026-05-08 12:19:28 +08:00 |
| auto-memory | feat(memory): managed auto-memory and auto-dream system (#3087) | 2026-04-16 20:05:45 +08:00 |
| channels | docs(channels): consolidate design docs into single file | 2026-04-02 11:17:37 +08:00 |
| compact-mode | feat: optimize compact mode UX — shortcuts, settings sync, and safety (#3100) | 2026-04-16 09:29:24 +08:00 |
| compaction-image-stripping | feat(core): strip inline media before chat compaction summary (#4101) | 2026-05-14 10:20:11 +08:00 |
| customize-banner-area | feat(cli): customize banner area (logo, title, hide) (#3710) | 2026-05-07 10:17:53 +08:00 |
| fork-subagent | feat(core): implement fork subagent for context sharing (#2936) | 2026-04-14 14:27:38 +08:00 |
| prompt-suggestion | fix(followup): prevent tool call UI leak and Enter accept buffer race (#2872) | 2026-04-09 00:07:03 +08:00 |
| session-recap | fix(cli): rework session recap rendering and add blur threshold setting (#3482) | 2026-04-21 14:39:13 +08:00 |
| session-title | feat(session): auto-title sessions via fast model, add /rename --auto (#3540) | 2026-04-23 20:37:05 +08:00 |
| skill-nudge | feat(memory): add autoSkill background project skill extraction (#3673) | 2026-05-09 14:25:02 +08:00 |
| slash-command | feat(cli): improve slash command discovery (#3736) | 2026-05-09 14:25:44 +08:00 |
| structured-output | docs: user + design docs for --json-schema structured output (#4051) | 2026-05-17 23:10:34 +08:00 |
| tool-use-summary | fix(cli): add API Key option to qwen auth interactive menu (#3624) | 2026-04-27 22:01:47 +08:00 |
| 2026-05-15-async-memory-recall-design.md | fix(core): decouple auto-memory recall from main-agent request path (#4172) | 2026-05-19 13:58:58 +08:00 |
| auto-compaction-threshold-redesign.md | fix(core): replace structuredClone with shallow copy to prevent OOM in long sessions (#4286) | 2026-05-21 10:28:59 +08:00 |
| custom-api-key-auth-wizard-prd.md | docs(auth): add custom API key wizard PRD (#3583) | 2026-05-13 14:04:41 +08:00 |
| markdown-syntax-extension.md | feat(cli): expand TUI markdown rendering (#3680) | 2026-05-07 16:24:13 +08:00 |
| openrouter-auth-and-models.md | refactor(cli): remove legacy qwen auth CLI subcommand, redirect to /auth TUI dialog (#3959) | 2026-05-11 16:44:09 +08:00 |
| telemetry-resource-attributes-design.md | feat(telemetry): support custom resource attributes and add metric cardinality controls (#4367) | 2026-05-21 13:54:37 +08:00 |
| workflow-tracing-gaps.md | feat(telemetry): unify span creation paths for hierarchical trace tree (#4126) | 2026-05-16 22:29:55 +08:00 |
| worktree.md | feat(worktree): Phase C — session persistence, hooksPath, Footer + WorktreeExitDialog, three-mode --resume restore (#4174) | 2026-05-19 13:59:35 +08:00 |