* feat(telemetry): support custom resource attributes and add metric cardinality controls
Resolves #4365.
Adds two coupled OpenTelemetry capabilities to make qwen-code's telemetry
production-ready in multi-team / multi-tenant deployments:
1. Custom resource attributes via standard `OTEL_RESOURCE_ATTRIBUTES` and
`OTEL_SERVICE_NAME` env vars and a new `telemetry.resourceAttributes`
setting. Operators can now tag every span / log / metric with `team`,
`env`, `cost_center`, or anything else their backend needs.
2. Metric cardinality controls. `session.id` is moved off the OpenTelemetry
Resource (where it auto-attached to every metric data point and caused
unbounded time-series fan-out on Prometheus / ARMS Metric / etc.) and
gated behind a new opt-in `telemetry.metrics.includeSessionId` toggle.
Spans and logs still carry `session.id` for trace and log correlation.
Reserved keys (`service.version`, `session.id`) are stripped from both env
and settings sources with a `diag.warn`. `OTEL_SERVICE_NAME` follows the
OTel spec precedence (highest priority for `service.name`). Settings JSON
values are runtime-coerced to strings as defense against hand-edited
non-conforming JSON.
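As a rough illustration of the strip-and-coerce step for settings values described above, here is a minimal sketch. The function name `sanitizeSettingsAttributes`, the `warn` callback, and the warning text are hypothetical and are not taken from the actual implementation; only the two reserved keys come from this commit message.

```typescript
// Hypothetical sketch: names and warning text are illustrative assumptions.
const RESERVED_SETTING_KEYS = new Set(['service.version', 'session.id']);

function sanitizeSettingsAttributes(
  raw: Record<string, unknown>,
  warn: (msg: string) => void = () => {},
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(raw)) {
    if (RESERVED_SETTING_KEYS.has(key)) {
      // Reserved keys are dropped with a warning rather than failing startup.
      warn(`resource attribute "${key}" is reserved and was dropped`);
      continue;
    }
    // Hand-edited JSON may hold numbers or booleans; coerce them to strings.
    out[key] = typeof value === 'string' ? value : String(value);
  }
  return out;
}
```

With this sketch, `{ team: 'infra', replicas: 3, 'session.id': 'abc' }` would come out as `{ team: 'infra', replicas: '3' }` plus one warning.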
Breaking change: metrics no longer carry `session.id` by default. Operators
who need it can restore the previous behavior with
`QWEN_TELEMETRY_METRICS_INCLUDE_SESSION_ID=true` or
`telemetry.metrics.includeSessionId: true` in settings.json; recommended
only for short-term debugging since it re-introduces the cardinality
problem. For long-term session-level analysis, prefer trace and log
backends which handle per-event data without cardinality pressure.
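The opt-in can be pictured as a small gate on the attributes attached to each metric data point. This is a sketch under assumptions: `metricAttributes` and its parameters are hypothetical names, and the real code may attach attributes differently.

```typescript
// Hypothetical helper: builds the attribute set for one metric data point.
function metricAttributes(
  sessionId: string,
  includeSessionId: boolean,
  extra: Record<string, string> = {},
): Record<string, string> {
  // session.id is high-cardinality: every session creates a new time series,
  // so it is attached only when the operator explicitly opts in.
  return includeSessionId
    ? { ...extra, 'session.id': sessionId }
    : { ...extra };
}
```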
Design doc: docs/design/telemetry-resource-attributes-design.md
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
* docs(telemetry): align reserved-key descriptions with implementation
Round 1 review fixes (#4367). After session.id was added to
RESERVED_RESOURCE_ATTRIBUTE_KEYS in Codex review, four user-facing
descriptions still claimed only service.version was reserved:
- packages/core/src/telemetry/config.ts (merge comment)
- packages/core/src/config/config.ts (TelemetrySettings JSDoc)
- packages/cli/src/config/settingsSchema.ts (schema description)
- packages/vscode-ide-companion/schemas/settings.schema.json (regenerated)
Also corrects the scope claim: resource attributes apply to every signal
the SDK exports (the OTLP and file `outfile` exporters share the same
Resource), not just OTLP.
* docs(telemetry): clarify warning destination and surface percent-encoding hint
Round 2 self-review fixes (#4367). Two small but real UX gaps:
1. Reserved-key / malformed-pair / coerce warnings route to the debug
log (per #3986), not the console — so a user who types
`OTEL_RESOURCE_ATTRIBUTES=service.version=2.0` sees no feedback that
the value was silently dropped. Adds a "Troubleshooting" section in
telemetry.md telling users where to look, and a note in the parser
docstring documenting where warns go.
2. A literal (unencoded) comma in an env var value is a common foot-gun:
the parser splits on it, producing a malformed second half that is
silently dropped. Updates the warn text to include a "hint:
percent-encode literal commas as %2C" callout, and adds the same
guidance to the docs.
Deferred to a follow-up: startup-time stderr summary of dropped
attributes. Stderr during TUI render could break Ink rendering, so the
right surface needs separate design.
* test(telemetry): cover first-`=` split contract in OTEL_RESOURCE_ATTRIBUTES parser
Per review feedback on #4367. The parser uses `indexOf('=')` so
the first `=` separates key and value while subsequent `=` stay in
the value. The behavior was correct but untested; a future refactor
to `split('=')` would silently break base64-padded, JWT, or
connection-string values.
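The contract under test can be sketched as follows. `splitPair` is a hypothetical name used only for illustration; the point is the `indexOf('=')` split, not the exact shape of the real parser.

```typescript
// Hypothetical sketch of the first-`=` split contract.
function splitPair(pair: string): [string, string] | undefined {
  const idx = pair.indexOf('=');
  // No '=' at all, or an empty key before it: treat the pair as malformed.
  if (idx <= 0) return undefined;
  // Everything after the first '=' belongs to the value, including later '='.
  return [pair.slice(0, idx), pair.slice(idx + 1)];
}
```

For `token=YWJj==` this keeps the value `YWJj==` intact, whereas `'token=YWJj=='.split('=')` yields `['token', 'YWJj', '', '']` and loses the padding.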
* feat(telemetry): tighten resource-attribute input validation + startup summary
Adopts review feedback from #4367 (wenshao via Qwen Code /review).
Five accepted suggestions, bundled because they all touch the same
parse/coerce/strip pipeline:
1. Key percent-decoding (CRITICAL). `parseOtelResourceAttributes` now
percent-decodes both keys and values per the OTel / W3C Baggage spec.
Without this, `OTEL_RESOURCE_ATTRIBUTES=service%2Eversion=99` lands
on Resource as the literal key `service%2Eversion`, bypassing the
reserved-key filter; a collector that decodes keys downstream could
then resurrect `service.version` and spoof the version label.
2. Startup summary of dropped attributes. Every `diag.warn` in
resource-attributes.ts routes only to the OTel debug log (per
#3986), giving operators zero feedback when their attributes are
silently dropped. Helpers now optionally accumulate diagnostics
into a `ResourceAttributeWarnings` array; the resolver collects
them and the SDK emits a one-time console summary at init (before
Ink renders, so no TUI conflict).
3. `||` instead of `??` for service.name fallback. Settings can put
an empty string through `??`, producing a blank `service.name`
that some backends reject. `||` falls through to the default.
4. `coerceStringResourceAttributes` now trims keys and skips
empty/whitespace-only keys, matching `parseOtelResourceAttributes`.
Previously `{" ": "x"}` or `{"team ": "y"}` from settings.json
would land as malformed Resource attributes.
5. `OTEL_SERVICE_NAME` is trimmed before the truthy check, so values
like `' '` or `'\t'` are treated as unset rather than producing
a whitespace-only service name on Resource.
One suggestion declined (in-thread reply on PR):
- "Redundant `?? {}` in sdk.ts:160" — intentional defense-in-depth for
`vi.mock('../config/config.js')` callers in `telemetry.test.ts` where
auto-stub returns undefined. The reviewer is right that production
code paths never hit it, but tests do.
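To make item 1 concrete, the sketch below decodes keys and values before applying the reserved-key filter. It is an assumption-laden illustration, not the real `parseOtelResourceAttributes`: the function name `parseResourceAttributesSketch`, the silent `continue` branches (the real code warns), and the exact trimming rules are all hypothetical.

```typescript
// Hypothetical sketch: decode first, then filter reserved keys.
const RESERVED_ENV_KEYS = new Set(['service.version', 'session.id']);

function parseResourceAttributesSketch(raw: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const pair of raw.split(',')) {
    const idx = pair.indexOf('=');
    if (idx <= 0) continue; // malformed pair: no '=' or empty key
    let key: string;
    let value: string;
    try {
      key = decodeURIComponent(pair.slice(0, idx).trim());
      value = decodeURIComponent(pair.slice(idx + 1).trim());
    } catch {
      continue; // invalid percent-encoding such as "key%ZZ"
    }
    // The filter runs on the decoded key, so "service%2Eversion" is caught.
    if (key === '' || RESERVED_ENV_KEYS.has(key)) continue;
    out[key] = value;
  }
  return out;
}
```

Under these assumptions, `team=infra,service%2Eversion=99,note=a%2Cb` resolves to `{ team: 'infra', note: 'a,b' }`: the encoded reserved key is dropped and the encoded comma survives inside the value.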
* fix(telemetry): trim whitespace-only service.name + add invalid-key-encoding test
Adopts two review suggestions on #4367 (wenshao via Qwen Code /review):
1. `service.name` fallback uses `.trim() || SERVICE_NAME` instead of plain
`||`. Plain `||` lets whitespace-only values (`" "`, `"\t"`) through as
truthy, producing a blank service name on Resource that some backends
reject. Both settings (no value trimming) and env (`%20` decodes to `" "`)
can deliver such values. Test added.
2. Adds `key%ZZ=val` to the parameterized parser test to cover the
invalid-percent-encoding-on-key catch branch. Previously only the
value-side catch was tested.
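The difference between the three fallback operators is easiest to see side by side. In this sketch `DEFAULT_SERVICE_NAME` stands in for the real `SERVICE_NAME` constant, and its value is an assumption; `resolveServiceName` is a hypothetical name.

```typescript
// Assumed default; the real SERVICE_NAME constant may differ.
const DEFAULT_SERVICE_NAME = 'qwen-code';

function resolveServiceName(candidate: string | undefined): string {
  // `??` would keep '' and ' '; plain `||` would keep ' ' and '\t'.
  // Trimming first makes whitespace-only input fall through to the default.
  return candidate?.trim() || DEFAULT_SERVICE_NAME;
}
```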
Layer detailed content attributes onto the existing hierarchical spans
(qwen-code.interaction / qwen-code.llm_request / qwen-code.tool) gated
by includeSensitiveSpanAttributes:
- Interaction span: user prompt (new_context)
- LLM request span: system prompt + hash + preview + length (full text
deduped per session via SHA-256), tool schemas (per-tool tool_schema
events, also hash-deduped), model output
- Tool span: tool input, tool result on every exit path (success +
pre-hook block + post-hook stop + tool error + try-block cancel +
catch-block cancel + execution exception)
All large content truncated at 60KB with *_truncated and
*_original_length metadata. Heavy serialization (safeJsonStringify on
tool I/O, partToString on user prompt) is guarded by the sensitive
flag at the call site so it doesn't run when telemetry is off.
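A minimal sketch of the truncation convention follows. The helper name `truncatedAttributes` is hypothetical, and reading "60KB" as 60 * 1024 characters (rather than bytes) is an assumption.

```typescript
// Assumption: the 60KB limit is applied as 60 * 1024 string characters.
const MAX_CONTENT_LENGTH = 60 * 1024;

type AttributeValue = string | number | boolean;

function truncatedAttributes(
  name: string,
  content: string,
): Record<string, AttributeValue> {
  if (content.length <= MAX_CONTENT_LENGTH) {
    return { [name]: content };
  }
  return {
    [name]: content.slice(0, MAX_CONTENT_LENGTH),
    // Metadata lets consumers detect truncation and see the original size.
    [`${name}_truncated`]: true,
    [`${name}_original_length`]: content.length,
  };
}
```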
Also adds:
- getActiveInteractionSpan() helper for client.ts to attach prompt
attributes to the interaction span.
- Updated config schema description and docs (telemetry.md +
settings.md) to reflect expanded scope and add security/cost notes.
- 28 unit tests for detailed-span-attributes, 4 tests for
getActiveInteractionSpan, integration mocks updated.
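The per-session SHA-256 deduplication mentioned for system prompts and tool schemas could look roughly like this. `ContentDeduper` and its `record` method are hypothetical names; the real code may structure the cache differently.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical sketch: one instance per session.
class ContentDeduper {
  private readonly seen = new Set<string>();

  /** Returns the hash always, and the full text only on first sight. */
  record(text: string): { hash: string; fullText?: string } {
    const hash = createHash('sha256').update(text).digest('hex');
    if (this.seen.has(hash)) {
      return { hash }; // already emitted once in this session
    }
    this.seen.add(hash);
    return { hash, fullText: text };
  }
}
```

Later spans can then carry only the hash, which is enough to join back to the first span that recorded the full text.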
* docs(telemetry): align config and docs semantics for target, outfile, and CLI flags
- Remove stale warning note "This feature requires corresponding code
changes" — the OTLP implementation is now complete (#3779, #4061)
- Clarify that `target` is an informational destination label and does
not control exporter routing; `otlpEndpoint` or `outfile` must be set
to configure where data is sent
- Mark `--telemetry-target` CLI flag as deprecated in the configuration
table to match the deprecateOption() call in cli/src/config/config.ts
- Fix `outfile` / `QWEN_TELEMETRY_OUTFILE` descriptions: remove the
incorrect "when target is local" qualifier — outfile overrides OTLP
export regardless of the target value
- Simplify the file-based output example by removing the now-redundant
`"target": "local"` and `"otlpEndpoint": ""` fields
Closes the "Align telemetry config and docs semantics for target,
useCollector, otlpEndpoint, otlpProtocol, and outfile" checklist item
in #3731.
* docs(telemetry): address Copilot review comments on outfile and target descriptions
- Fix outfile table row in telemetry.md: "overrides `otlpEndpoint`" →
"overrides OTLP export" (outfile disables all OTLP exporting, not
just the base endpoint)
- Use fully-qualified setting names (`telemetry.otlpEndpoint`,
`telemetry.outfile`) in the target description in settings.md for
consistency with the rest of the table
* docs(telemetry): update QWEN_TELEMETRY_TARGET env var description and add outfile note
- Align QWEN_TELEMETRY_TARGET env var description with the updated
telemetry.target setting semantics (informational label, not routing)
- Add a note after the file-based output example clarifying that outfile
automatically disables OTLP export
useCollector was plumbed through config (interface, constructor, getter,
env var resolution) but never consumed by the telemetry SDK — the setting
had no runtime effect. TelemetryTarget.QWEN existed in the enum but
parseTelemetryTargetValue() only accepted 'local' and 'gcp', making
'qwen' unreachable (it would throw FatalConfigError).
Remove both dead code paths along with their tests and documentation.
Part of #3731
* feat(telemetry): add sensitive span attribute opt-in
Add a telemetry setting and environment override for including sensitive attributes in spans created by the log-to-span bridge. Keep the default filtering behavior for prompt, function_args, and response_text unless explicitly enabled.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(telemetry): clarify span bridge options
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* feat(telemetry): populate api response text
Populate response_text on API response telemetry events for non-internal prompts so opted-in bridge spans can include model response bodies.
Exclude thought text from the recorded response text and keep internal prompt responses omitted.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* docs(telemetry): clarify sensitive span attribute scope
Clarify that the sensitive span attribute setting only controls log-to-span bridge spans, while response text may still reach other telemetry sinks from API response events.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(telemetry): cap recorded response text
Limit response_text captured for API response telemetry to a bounded length and mark truncated values to avoid oversized OTLP attributes.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
---------
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* feat(telemetry): define HTTP OTLP endpoint behavior and signal routing
- Add resolveHttpOtlpUrl() that appends /v1/traces, /v1/logs, /v1/metrics
to base HTTP OTLP endpoints per the OpenTelemetry specification
- Add per-signal endpoint overrides (otlpTracesEndpoint, otlpLogsEndpoint,
otlpMetricsEndpoint) for backends with non-standard paths (e.g. Alibaba Cloud)
- Add LogToSpanProcessor that bridges OTel log records to spans for
traces-only backends, with session-based traceId correlation and
error status propagation
- Auto-wire LogToSpanProcessor when traces URL exists but logs URL doesn't
- Validate per-signal URLs gracefully (log error + skip, don't crash)
- Preserve query strings when appending signal paths to URLs
- Guard gRPC branch against missing base endpoint with per-signal config
- Update telemetry documentation with signal routing semantics and
Alibaba Cloud HTTP per-signal endpoint examples
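The path-appending behavior, including query-string preservation, can be sketched with the standard `URL` class. `appendSignalPath` is a hypothetical stand-in for `resolveHttpOtlpUrl`, whose actual signature is not shown in this message.

```typescript
type Signal = 'traces' | 'logs' | 'metrics';

// Hypothetical sketch of appending the per-signal path to a base endpoint.
function appendSignalPath(base: string, signal: Signal): string {
  const url = new URL(base);
  // Drop trailing slashes so "/otlp/" and "/otlp" resolve identically.
  const path = url.pathname.replace(/\/+$/, '');
  // Only the pathname changes; the query string stays attached to the URL.
  url.pathname = `${path}/v1/${signal}`;
  return url.toString();
}
```

For example, `appendSignalPath('https://example.com/otlp/?token=abc', 'logs')` returns `https://example.com/otlp/v1/logs?token=abc`.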
Closes #3734
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(telemetry): fix TS noPropertyAccessFromIndexSignature errors in tests
Use typed ExportedSpan interface and bracket notation for index signature
properties to satisfy strict TypeScript checks in CI.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(telemetry): replace MD5 with SHA-256 for traceId derivation
CodeQL flagged MD5 as a weak cryptographic algorithm when used with
session.id (considered sensitive data). Switch to SHA-256 truncated
to 32 hex chars to satisfy CodeQL while maintaining the same traceId
format required by the OTel specification.
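A minimal sketch of the derivation, assuming the session ID is hashed as a plain UTF-8 string; `traceIdFromSessionId` is a hypothetical name.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical sketch: derive a 16-byte (32 hex char) traceId from session.id.
function traceIdFromSessionId(sessionId: string): string {
  return createHash('sha256')
    .update(sessionId)
    .digest('hex') // 64 lowercase hex characters
    .slice(0, 32); // keep the first 128 bits, the width of an OTel traceId
}
```

The result is deterministic, so every bridged span from the same session lands in the same trace.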
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(telemetry): address review feedback for LogToSpanProcessor robustness
- Wrap JSON.stringify in try/catch to handle circular refs and BigInt
- Add export timeout (30s) and try/catch to prevent hung shutdown
- Track in-flight exports to avoid interval-vs-shutdown race condition
- Fix deriveSpanStatus: use truthy checks (!!), drop success===false
heuristic since declined tool calls are normal, not errors
- Enforce http(s) scheme in validateUrl to reject file:/javascript: URLs
- Change DiagLogLevel from ERROR to WARN to preserve operational diagnostics
- Preserve logRecord.instrumentationScope instead of hardcoding
- Forward severityNumber/severityText as span attributes
- Add tests for circular refs, error status edge cases, severity
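Two of the hardening items above, guarded serialization and scheme validation, can be illustrated as follows. Both function names and the `'[unserializable]'` placeholder are assumptions made for this sketch, not the actual code.

```typescript
// Hypothetical sketch: never let attribute serialization throw.
function safeStringify(value: unknown): string {
  try {
    // JSON.stringify throws on circular references and BigInt values,
    // and returns undefined for values such as `undefined` itself.
    return JSON.stringify(value) ?? '[unserializable]';
  } catch {
    return '[unserializable]';
  }
}

// Hypothetical sketch: accept only http(s) endpoints.
function isHttpUrl(raw: string): boolean {
  try {
    const { protocol } = new URL(raw);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false; // not parseable as a URL at all
  }
}
```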
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(telemetry): flush sdk shutdown through cleanup
Remove async process exit handlers from telemetry initialization and route SDK shutdown through Config cleanup so normal CLI exit paths await pending telemetry exports. Keep shutdown idempotent while an SDK shutdown is in flight.
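Idempotent shutdown while a flush is in flight is commonly done by caching the promise. The sketch below assumes an SDK-like object with a `shutdown(): Promise<void>` method; `TelemetryLifecycle` is a hypothetical wrapper, not the actual Config cleanup code.

```typescript
interface ShutdownCapable {
  shutdown(): Promise<void>;
}

// Hypothetical sketch: repeated calls share one in-flight shutdown.
class TelemetryLifecycle {
  private readonly sdk: ShutdownCapable;
  private shutdownPromise: Promise<void> | undefined;

  constructor(sdk: ShutdownCapable) {
    this.sdk = sdk;
  }

  shutdown(): Promise<void> {
    // The first caller starts the flush; later callers await the same promise.
    if (this.shutdownPromise === undefined) {
      this.shutdownPromise = this.sdk.shutdown();
    }
    return this.shutdownPromise;
  }
}
```

A cleanup routine can then `await lifecycle.shutdown()` on every exit path without risking a double flush.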
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(telemetry): harden bridged log shutdown
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(telemetry): address review follow-ups
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
---------
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
The `tool_token_count` field was sourced from `toolUsePromptTokenCount`
on the GenAI usage metadata, but none of the providers we adapt
(OpenAI/DashScope, Anthropic) populate it, and Google's Gemini API only
emits it for built-in server-side tools that qwen-code does not use.
The metric was therefore always zero in practice, so the dedicated
counter, telemetry field, UI row, and supporting plumbing are removed
end-to-end (telemetry types, OTEL counter type, UI aggregation, model
stats display, qwen-logger payload, VS Code session schema, and docs).
- Remove deprecated fields: embedding_model, api_key_enabled, vertex_ai_enabled, log_prompts_enabled
- Add new fields: truncate_tool_output_threshold, truncate_tool_output_lines, hooks, ide_enabled, interactive_shell_enabled
This aligns telemetry data with the current CLI configuration options.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>