- Add cachedContentTokenCount tracking in uiTelemetry service
- Collect cached_content_token_count from streaming usage metadata
- Use cached tokens instead of estimated overhead when available
- Fix messages token calculation to avoid 'messages = 0' issue
This improves context window display accuracy when using providers
that support prefix caching (e.g., DashScope).
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Add comprehensive documentation for the Agent Arena feature, covering
usage, configuration, best practices, troubleshooting, and limitations.
Update navigation metadata to include the new page.
This enables users to discover and learn about the multi-model comparison
capability for competitive task execution.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Move getErrorStatus from retry.ts to errors.ts for better organization
- Add getErrorType utility to extract error class/category names
- Enhance getErrorMessage to include cause chain for better debugging
- Refactor ApiErrorEvent to use options object pattern (more readable)
- Rename 'error' to 'error_message' in ApiErrorEvent for clarity
- Make isQwenQuotaExceededError more precise: requires status=429,
code='insufficient_quota', and 'free allocated quota exceeded' message
- Update all tests to match new error detection behavior
This improves error telemetry and makes quota detection more reliable.
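The stricter quota check and the cause-chain message can be sketched as follows. This is only an illustration under stated assumptions: the `ApiErrorLike` shape, the helper names `describeErrorChain` and `isQuotaExceededSketch`, the separator string, and the case-insensitive substring match are hypothetical and are not taken from errors.ts.

```typescript
// Hypothetical error shape; real provider errors may carry these fields elsewhere.
interface ApiErrorLike {
  status?: number;
  code?: string;
  message?: string;
}

// Join an error's message with the messages of its `cause` chain.
// The depth limit guards against cyclic cause references.
function describeErrorChain(error: unknown, maxDepth = 5): string {
  const parts: string[] = [];
  let current: unknown = error;
  for (let depth = 0; depth < maxDepth && current != null; depth++) {
    if (current instanceof Error) {
      parts.push(current.message);
      current = (current as { cause?: unknown }).cause;
    } else {
      parts.push(String(current));
      current = undefined;
    }
  }
  return parts.join(' <- caused by: ');
}

// All three conditions must hold; a bare 429 is treated as an ordinary rate limit.
function isQuotaExceededSketch(error: unknown): boolean {
  if (typeof error !== 'object' || error === null) return false;
  const e = error as ApiErrorLike;
  return (
    e.status === 429 &&
    e.code === 'insufficient_quota' &&
    typeof e.message === 'string' &&
    e.message.toLowerCase().includes('free allocated quota exceeded')
  );
}
```

Requiring all three fields means a plain rate-limit response (429 without the quota code and message) is no longer classified as quota exhaustion.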
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Rename skippableDelay to delay and thread AbortSignal into all three
retry delay sites (rate-limit, invalid-stream, content-error). When the
request's AbortSignal fires during any countdown, the delay promise
rejects, the generator exits via its finally block, and sendPromise is
released so the next sendMessageStream call can proceed immediately.
Previously, cancelling during a rate-limit countdown (up to 60s) would
leave the generator blocked on the delay, preventing any subsequent
request from starting due to sendPromise serialization.
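A minimal sketch of a delay that rejects when an AbortSignal fires, assuming standard `AbortSignal` and `setTimeout` semantics. The name `abortableDelay` and the error message are hypothetical; the actual delay helper in the codebase may differ.

```typescript
// Resolve after `ms` milliseconds, or reject as soon as `signal` aborts.
function abortableDelay(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error('Delay aborted'));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer); // stop the countdown so nothing lingers
      reject(new Error('Delay aborted'));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort);
      resolve();
    }, ms);
    signal?.addEventListener('abort', onAbort, { once: true });
  });
}
```

Because the promise rejects on abort, an `await` inside a generator's `try`/`finally` unwinds through the `finally` block right away instead of waiting out the remaining countdown.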
Add skippableDelay utility that resolves early via a skip() callback,
replacing the plain setTimeout in rate-limit retry paths. The skipDelay
callback is included in RetryInfo, flowing naturally through turn.ts to
the UI via startRetryCountdown/clearRetryCountdown with symmetric
lifecycle management. When the user presses Ctrl+Y during a rate-limit
countdown, retryLastPrompt calls skipDelay() to resolve the delay
promise, letting the generator continue its retry loop naturally
without aborting or re-submitting the query.
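A minimal sketch of a delay that can be resolved early through a skip() callback. The return shape (`promise` plus `skip`) and the name `skippableDelaySketch` are assumptions for illustration, not the actual signature.

```typescript
interface SkippableDelay {
  promise: Promise<void>;
  skip: () => void;
}

// Resolve after `ms` milliseconds, or immediately when skip() is called.
function skippableDelaySketch(ms: number): SkippableDelay {
  let skip: () => void = () => {};
  const promise = new Promise<void>((resolve) => {
    const timer = setTimeout(resolve, ms);
    // The executor runs synchronously, so `skip` is set before we return.
    skip = () => {
      clearTimeout(timer);
      resolve();
    };
  });
  return { promise, skip };
}
```

Skipping resolves the promise instead of rejecting it, so the awaiting retry loop simply proceeds to its next attempt as if the countdown had elapsed.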
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add truncateToolOutput helper in truncation.ts to centralize threshold reading, file saving, and telemetry logging
- Refactor shell.ts to use the new helper, removing duplicate code
- Add truncation support for MCP tool output while preserving non-text content (images, audio, resources)
- Refactor getDisplayFromParts to work on transformed Part[] instead of raw MCP response
This reduces code duplication and ensures consistent truncation behavior across shell and MCP tools.
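The idea of truncating text while leaving non-text content untouched can be sketched as below. The `PartSketch` type, the character budget, and the `[truncated]` marker are hypothetical; the real helper also reads the threshold from config, saves the full output to a file, and logs telemetry, none of which is shown here.

```typescript
// Hypothetical, simplified part shape: either text or opaque inline data.
type PartSketch =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

// Apply one shared character budget across all text parts, in order.
// Non-text parts (images, audio, resources) pass through unchanged.
function truncateTextParts(parts: PartSketch[], maxChars: number): PartSketch[] {
  let remaining = maxChars;
  return parts.map((part) => {
    if (!('text' in part)) return part;
    const kept = part.text.slice(0, remaining);
    remaining -= kept.length;
    return kept.length < part.text.length
      ? { text: kept + '\n[truncated]' }
      : part;
  });
}
```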
- Add candidatesTokens prop to LoadingIndicator for displaying token counts
- Update formatting to show elapsed time and token counts inline
- Refactor tests to validate the new token display and formatting
- Add formatTokenCount utility for consistent token count representation
This improves user feedback during loading by showing token usage alongside elapsed time.
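A compact token formatter might look like the sketch below. The thresholds, the one-decimal rounding, and the `k`/`M` suffixes are assumptions; the source does not specify the output format of formatTokenCount, so the function here carries a different name.

```typescript
// Hypothetical formatting rules: plain integers below 1000,
// one decimal place with a "k" or "M" suffix above that.
function formatTokenCountSketch(count: number): string {
  if (count < 1000) return String(count);
  if (count < 1000000) return (count / 1000).toFixed(1) + 'k';
  return (count / 1000000).toFixed(1) + 'M';
}
```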
- Add hasExplicitOutputLimit() to detect models with defined output limits
- For known models: cap user max_tokens to model limit (avoid API errors)
- For unknown models (deployment aliases, self-hosted): respect user config
- Update tests to cover new behavior
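The known-versus-unknown distinction can be sketched as follows. The lookup table, its single entry, and the function names other than the idea of hasExplicitOutputLimit() are hypothetical; real limits come from the project's model definitions.

```typescript
// Hypothetical table of models with a defined output limit.
const KNOWN_OUTPUT_LIMITS: Record<string, number> = {
  'example-model': 8192, // illustrative entry, not a real limit
};

function hasExplicitOutputLimitSketch(model: string): boolean {
  return Object.prototype.hasOwnProperty.call(KNOWN_OUTPUT_LIMITS, model);
}

// Known model: cap the user's value at the model limit.
// Unknown model (deployment alias, self-hosted): pass the user's value through.
function capUserMaxTokens(model: string, userMaxTokens: number): number {
  return hasExplicitOutputLimitSketch(model)
    ? Math.min(userMaxTokens, KNOWN_OUTPUT_LIMITS[model])
    : userMaxTokens;
}
```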
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add os.tmpdir() to allowed paths in read-file tool
- Add tests for reading files from OS temp directory
- Add terminal capture scenario for PR review testing
This supports the PR review workflow which saves context to temp files.
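A path check that admits the OS temp directory alongside workspace roots could look like the sketch below. The function names and the workspace-root parameter are hypothetical, and the sketch does not resolve symlinks, which a real implementation may need to do.

```typescript
import * as os from 'node:os';
import * as path from 'node:path';

// True when `filePath` is `root` itself or lies somewhere beneath it.
function isWithinRoot(filePath: string, root: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(filePath));
  return (
    rel === '' ||
    (rel !== '..' && !rel.startsWith('..' + path.sep) && !path.isAbsolute(rel))
  );
}

// Allow reads inside any workspace root or inside os.tmpdir().
function isReadAllowedSketch(filePath: string, workspaceRoots: string[]): boolean {
  return [...workspaceRoots, os.tmpdir()].some((root) => isWithinRoot(filePath, root));
}
```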
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Increase DEFAULT_OUTPUT_TOKEN_LIMIT from 16K to 32K
- Remove auto-detection from modelsConfig, apply at provider level
- Use conservative default (min of model limit and 32K) when user hasn't configured max_tokens
- Respect user configuration but cap at model's max output limit to avoid API errors
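The resolution rule above can be sketched in a few lines. The constant's exact value is an assumption (the source says only "32K", so it could be 32000 or 32768), and `resolveMaxTokensSketch` is a hypothetical name.

```typescript
// Assumed numeric value for "32K"; the source does not give the exact number.
const DEFAULT_OUTPUT_TOKEN_LIMIT = 32000;

// No user setting: min(model limit, default).
// User setting present: min(user value, model limit).
function resolveMaxTokensSketch(modelLimit: number, userMaxTokens?: number): number {
  const requested = userMaxTokens ?? DEFAULT_OUTPUT_TOKEN_LIMIT;
  return Math.min(requested, modelLimit);
}
```

Both cases reduce to a single `Math.min`, so the request never asks for more output than the model allows.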
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Check out the PR branch instead of viewing it remotely, for full file access
- Save PR context to temp file to avoid repeating in agent prompts
- Add guidance to prevent 4x diff duplication across agents
- Include environment restoration step after review
This enables agents to read files directly and use git diff against base branch,
improving review quality and reducing prompt bloat.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>