GLM/ZhipuAI rejects system-role messages with a 422 'Input should be
user or assistant' error. When memory injection adds a system-role
message, GLM combo targets fail because the system message survives
into the upstream request.
Fix:
- injection.ts: add glm, glmt, glm-cn, zai, qianfan to
PROVIDERS_WITHOUT_SYSTEM_MESSAGE so memory is injected as user role
- roleNormalizer.ts: add exact 'glm' model match to
MODELS_WITHOUT_SYSTEM_ROLE for Pollinations and bare model ids
Test: 22 new unit tests covering all GLM variants + regression checks
for openai/anthropic providers.
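A minimal sketch of the downgrade, with illustrative names rather than the actual injection.ts/roleNormalizer.ts shapes:

```typescript
// Sketch only: provider set and function shape are illustrative,
// not the real injection.ts / roleNormalizer.ts API.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

const PROVIDERS_WITHOUT_SYSTEM_MESSAGE = new Set([
  "glm", "glmt", "glm-cn", "zai", "qianfan",
]);

// Downgrade system messages to user role for providers that reject
// them with a 422 "Input should be user or assistant" error.
function normalizeRoles(provider: string, messages: ChatMessage[]): ChatMessage[] {
  if (!PROVIDERS_WITHOUT_SYSTEM_MESSAGE.has(provider)) return messages;
  return messages.map((m) =>
    m.role === "system" ? { ...m, role: "user" as const } : m
  );
}
```

Providers outside the set (openai, anthropic, and so on) pass through untouched, which is what the regression tests cover.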
Closes #1701
Add 'Invalid signature in thinking block' to COMBO_BAD_REQUEST_FALLBACK_PATTERNS
so combo routing falls through to the next target instead of returning 400 directly.
This error occurs when extended thinking signatures expire between turns,
which is a model-specific issue that won't be fixed by retrying the same provider.
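The fall-through decision can be sketched as a simple pattern check; the pattern list below is truncated to the new entry and the function name is an assumption, not the real combo.ts API:

```typescript
// Illustrative sketch of the bad-request fallback check.
const COMBO_BAD_REQUEST_FALLBACK_PATTERNS: RegExp[] = [
  /Invalid signature in thinking block/i,
];

// A 400 whose body matches a fallback pattern is model-specific, so the
// combo loop tries the next target instead of returning the 400 directly.
function shouldFallThrough(status: number, body: string): boolean {
  return (
    status === 400 &&
    COMBO_BAD_REQUEST_FALLBACK_PATTERNS.some((re) => re.test(body))
  );
}
```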
Closes #1696
- Add getCombosCached() with 10s TTL in chatCore.ts to avoid per-request DB lookups
- Pass allCombosData to resolveComboTargets() instead of null for nested combo resolution
- Consolidate COMBO_BAD_REQUEST_FALLBACK_PATTERNS with CONTEXT_OVERFLOW_REGEX from errorClassifier.ts
- Remove 10 duplicated context overflow patterns from combo.ts
- Export clearCombosCache() for cache invalidation
- Fix rebase artifact (>) from contributor's branch
Co-authored-by: Javier Ardila <hjasgr@gmail.com>
Closes #1470
Bottleneck v2.19.5 does not support a `maxWait` limiter/constructor option — it
was silently ignored, causing queued jobs to wait indefinitely when no 429 response
triggered the drop mechanism.
Replace it with Bottleneck's supported `expiration` job option, which rejects
any job whose combined wait and execution time exceeds maxWaitMs. Also log
expiration rejections so they are observable in production.
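The intended semantics can be sketched without Bottleneck: reject any job whose total wait plus execution time exceeds maxWaitMs, and log the rejection. This standalone wrapper is only an illustration; the actual fix relies on Bottleneck's `expiration` job option rather than a manual race.

```typescript
// Standalone illustration of the expiration semantics (not Bottleneck's API).
class JobExpiredError extends Error {}

function withExpiration<T>(job: () => Promise<T>, maxWaitMs: number): Promise<T> {
  return Promise.race([
    job(),
    new Promise<never>((_, reject) =>
      setTimeout(() => {
        // Log so expirations are observable in production.
        console.warn(`job expired after ${maxWaitMs}ms`);
        reject(new JobExpiredError(`job expired after ${maxWaitMs}ms`));
      }, maxWaitMs)
    ),
  ]);
}
```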
- Replace polynomial regex /\/+$/ with loop-based stripTrailingSlashes()
across 8 enterprise provider configs (azure-openai, azureAi, bedrock,
datarobot, oci, sap, watsonx, audioSpeech) — fixes js/polynomial-redos
- Add prototype-pollution denylist guard in usageHistory.ts to reject
__proto__/constructor/prototype as model keys — fixes
js/prototype-polluting-assignment (#167, #168)
- Suppress 3 false-positive js/insufficient-password-hash alerts in
chatgpt-web.ts and builtins.ts where SHA-256 is used for cache-key
derivation, not password storage (#176, #177, #178)
- Add stripTrailingSlashes unit tests with ReDoS regression check
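A sketch of the two helpers, in the spirit of (but not copied from) the actual fixes:

```typescript
// Loop-based trailing-slash stripper: linear time on any input, unlike
// an anchored /\/+$/ regex flagged by js/polynomial-redos.
function stripTrailingSlashes(url: string): string {
  let end = url.length;
  while (end > 0 && url.charCodeAt(end - 1) === 0x2f /* '/' */) end--;
  return url.slice(0, end);
}

// Denylist guard against prototype pollution when untrusted model names
// are used as object keys (mirrors the usageHistory.ts fix in spirit).
const FORBIDDEN_KEYS = new Set(["__proto__", "constructor", "prototype"]);

function safeModelKey(key: string): boolean {
  return !FORBIDDEN_KEYS.has(key);
}
```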
- Treat status 499 as a terminal, non-retryable error in both the priority
  and round-robin combo loops — no fallback to other models when the client
  is gone
- Propagate AbortSignal from request into handleComboChat so the combo
loop can detect client disconnects before starting new model attempts
- Make retry/fallback delays abort-aware via signal.addEventListener
- Add 5 unit tests covering 499 early-exit, signal.aborted pre-check,
multi-model abort, 502 contrast behavior, and abort-during-wait
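The abort-aware delay can be sketched as follows; the function name is illustrative, but the signal.addEventListener wiring matches the approach described above:

```typescript
// Abort-aware delay: rejects as soon as the client disconnects, so the
// combo loop never sleeps through a retry/fallback wait after an abort.
function abortableDelay(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) return reject(new Error("aborted"));
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    const onAbort = () => {
      clearTimeout(timer);
      reject(new Error("aborted"));
    };
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}
```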
Integrated into release/v3.7.2 — implements conversation continuity for the muse-spark-web executor with SHA-256 prefix hashing, a TTL cache, and eviction-on-error.
Centralize remote image downloads behind a shared helper that
validates outbound URLs, enforces redirect and size limits, and
applies request timeouts before bytes are read.
Wire the helper into image generation and vision bridge flows so
remote image inputs and result URLs follow the same fetch policy and
block redirects to private hosts. Update key management routes to use
structured logging and document the WebSocket bridge secret in the
example environment file.
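The URL-validation step of the shared helper can be sketched as below; the host patterns are illustrative, and the real helper additionally enforces redirect hops, a size cap, and a timeout before bytes are read:

```typescript
// Sketch of outbound URL validation: block non-HTTP(S) schemes and
// obvious loopback / private hosts before any fetch is attempted.
function isBlockedHost(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return true; // unparseable URLs are rejected outright
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return true;
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host === "0.0.0.0" || host === "[::1]") return true;
  // Loopback, RFC 1918 private ranges, and link-local 169.254/16.
  return /^(127\.|10\.|192\.168\.|169\.254\.|172\.(1[6-9]|2\d|3[01])\.)/.test(host);
}
```

Running the same check on every redirect hop is what blocks redirects to private hosts.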
The billing header fingerprint was computed from the first user message text
via computeFingerprint(), which changes every conversation turn. This mutated
the system[] prefix on each request, invalidating Anthropic's prompt-cache
prefix and forcing ~100% cache_create (vs 96% cache_read with stable prefix).
Now uses a per-day SHA-256 hash of the date + ccVersion, keeping the billing
header format while preserving prompt-cache prefix stability across turns.
Includes 6 unit tests.
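A minimal sketch of the stable fingerprint, assuming a UTC day boundary; the exact hash input and function name are assumptions:

```typescript
import { createHash } from "node:crypto";

// Date-stable fingerprint: identical for every request on a given UTC
// day, so the system[] prefix (and Anthropic's prompt-cache prefix)
// stays byte-identical across conversation turns.
function dailyFingerprint(ccVersion: string, now = new Date()): string {
  const day = now.toISOString().slice(0, 10); // e.g. "2024-05-01"
  return createHash("sha256").update(`${day}:${ccVersion}`).digest("hex");
}
```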
The prompt_cache_key was derived from the account-wide workspaceId, meaning
all conversations from the same OAuth account shared one cache partition.
The official Codex CLI uses conversation_id (a unique UUID per session).
Priority: body.session_id > body.conversation_id > workspaceId.
Session IDs are captured BEFORE deletion from the body.
Includes 10 unit tests.
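The priority chain can be sketched as follows; the body shape and function name are assumptions:

```typescript
// Sketch of the cache-key resolution. Session fields are captured
// BEFORE they are deleted from the body sent upstream.
interface RequestBody {
  session_id?: string;
  conversation_id?: string;
  [key: string]: unknown;
}

function resolvePromptCacheKey(body: RequestBody, workspaceId: string): string {
  // Priority: session_id > conversation_id > account-wide workspaceId.
  const key = body.session_id ?? body.conversation_id ?? workspaceId;
  delete body.session_id; // stripped only after capture
  delete body.conversation_id;
  return key;
}
```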
When a provider returns 429 with a quota_exhausted reason, set the cooldown
until tomorrow 00:00 instead of applying exponential backoff. Includes
isDailyQuotaExhausted() detection in the chat handler, plus unit tests.
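A sketch of the two pieces; the reason string comes from the upstream 429 body, and the detection details here are an assumption:

```typescript
// Detection: a 429 whose reason marks the daily quota as spent.
function isDailyQuotaExhausted(status: number, reason?: string): boolean {
  return status === 429 && reason === "quota_exhausted";
}

// Cooldown: next local midnight rather than an exponential backoff step.
function cooldownUntilTomorrow(now = new Date()): Date {
  const next = new Date(now);
  next.setHours(24, 0, 0, 0); // setHours(24) rolls over to next day 00:00
  return next;
}
```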
Co-authored-by: clousky2020 <clousky2020@users.noreply.github.com>
Prevent upstream 400 failures when clients omit prior
reasoning_content in multi-turn tool-calling conversations.
Capture reasoning_content from streaming and non-streaming assistant
responses, persist it in a memory-plus-SQLite cache keyed by
tool_call_id, and re-inject it on later requests when available.
Add the reasoning cache migration, service layer, authenticated API
endpoints, dashboard tab, and unit coverage to support inspection,
cleanup, and crash recovery.
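The in-memory half of the cache can be sketched as below; names are illustrative, and the real implementation also persists entries to SQLite for crash recovery:

```typescript
// Reasoning cache keyed by tool_call_id (memory tier only in this sketch).
const reasoningCache = new Map<string, string>();

// Capture reasoning_content from a streamed or non-streamed response.
function captureReasoning(toolCallId: string, reasoningContent: string): void {
  reasoningCache.set(toolCallId, reasoningContent);
}

// Re-inject cached reasoning_content into follow-up messages that arrive
// without it, preventing the upstream 400 on multi-turn tool calls.
function reinject(message: {
  tool_call_id?: string;
  reasoning_content?: string;
}): void {
  if (message.tool_call_id && message.reasoning_content === undefined) {
    const cached = reasoningCache.get(message.tool_call_id);
    if (cached !== undefined) message.reasoning_content = cached;
  }
}
```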