Why:
- Add a real bidirectional streaming TTS path: raw LLM tokens are
forwarded to the upstream model (Volcengine v3 via the unspeech ws
bridge) without client-side segmentation, so the model owns sentence
splitting and audio chunks play as they arrive.
- Move audio endpoints out of /api/v1/openai/. `/audio/voices`,
`/audio/models`, `/audio/voices/streaming` are not real OpenAI public
APIs, and the streaming TTS surface has nothing to do with OpenAI —
keeping them under /openai/ mislabelled the contract.
- Introduce `capabilities.speech.transport` on ProviderDefinition so
future streaming providers (ElevenLabs / Cartesia / OpenAI Realtime)
opt in without touching Stage.vue or the session factory.
- Unify Stage.vue's TTS path through a single StageTtsSession so the
chat-orchestrator hooks no longer branch on provider id.

What:
- apps/server: new ws proxy /api/v1/audio/speech/ws bridges client ↔
unspeech with auth, pre-flight flux check, billing from upstream
session.finished.usage, OTel spans.
- apps/server: audio routes moved from /api/v1/openai/audio/* to
/api/v1/audio/* (hard cutover; 404 sentinel tests added).
- apps/server: new /api/v1/audio/voices/streaming proxy reads voices
from unspeech /api/voices?provider=volcengine.
- apps/server: new STREAMING_TTS_UPSTREAM configKV entry +
scripts/seed-streaming-tts.ts.
- stage-ui: new libs/speech/streaming-pipeline.ts opens one ws per LLM
intent (appendText / finish / cancel + onSentence / onError / onDone).
- stage-ui: new libs/speech/tts-session.ts — StageTtsSession interface
with segmenter and streaming adapters; factory dispatches by
capabilities.speech.transport instead of hard-coded provider id.
- stage-ui: providerOfficialSpeechStreaming with capabilities.speech =
{ transport: 'bidirectional-ws' }; settings page with model/voice
picker + ws-based preview.
- stage-ui: Stage.vue chat hooks collapsed to a single currentSession;
hot-swap watcher cancels mid-session on provider/voice/model change;
unmount cancels and drains playback.
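
The transport-based dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the actual tts-session.ts: only `capabilities.speech.transport`, the `'bidirectional-ws'` value, and the appendText / finish / cancel verbs come from the notes above. The `'segmenter'` transport name, the `sink` array, and `createSession` are hypothetical stand-ins.

```typescript
// Hypothetical shapes for illustration; the real ProviderDefinition is richer.
type SpeechTransport = 'segmenter' | 'bidirectional-ws' // 'segmenter' is an assumed name

interface ProviderDefinition {
  id: string
  capabilities?: { speech?: { transport?: SpeechTransport } }
}

interface StageTtsSession {
  readonly kind: SpeechTransport
  appendText(text: string): void
  finish(): void
  cancel(): void
}

// Stand-in adapter: records text into `sink` instead of opening a ws or segmenting.
function createSession(kind: SpeechTransport, sink: string[]): StageTtsSession {
  let open = true
  return {
    kind,
    appendText(text: string): void {
      if (open) sink.push(text)
    },
    finish(): void {
      open = false
    },
    cancel(): void {
      open = false
      sink.length = 0 // drop anything still queued
    },
  }
}

// The factory looks only at the declared capability, never at provider.id.
function createTtsSession(provider: ProviderDefinition, sink: string[] = []): StageTtsSession {
  const transport = provider.capabilities?.speech?.transport ?? 'segmenter'
  return createSession(transport, sink)
}
```

Under these assumptions a future streaming provider opts in by declaring `transport: 'bidirectional-ws'`; callers such as the chat hooks only ever see `StageTtsSession`.
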

Tests:
- 9 streaming-pipeline tests (happy path / buffered / error / cancel /
truncation)
- 11 tts-session tests (factory branch coverage + adapter contracts)
- 4 audio-speech-ws route tests (forwarding / billing / pre-flight /
config-missing)
- 3 legacy-path 404 sentinels in v1 route tests
- Verification doc updated to reflect automated coverage.

The `/v1/openai/{chat,audio}` handlers used to be silent past
`hono/logger`'s `<-- POST` / `--> 502` lines — no userId, no model,
no token counts, no flux billed. Operators looking at a real gateway
incident had to cross-reference traces, request-log rows, and billing
ledger entries by timestamp alone. For a gateway whose value is
auditable per-request metering, that's not enough.
Hoists `requestId = nanoid()` to handler entry so the same
correlation id flows through:
- the inbound log line (model / stream / messageCount or inputChars
for TTS)
- the per-stream / non-streaming delivered log (status / durationMs /
promptTokens / completionTokens / fluxConsumed)
- the upstream-error degraded path (warn level)
- the partial-debit and debit-failure paths (already used requestId)
- `billingService.consumeFluxForLLM` / `ttsMeter.accumulate` for
DB-level idempotency (replaces the previous local nanoid() calls)
handleListVoices gets a `debug`-level line — the route is high-
frequency from UI voice pickers and we don't want it in the regular
audit feed, but it's useful when chasing voice-picker drift bugs.
No new schemas, no metric emission changes; this is purely logger
output. Pairs with the cause-propagation change so errors carry the
upstream snippet AND the request can be traced end-to-end by id.
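
A minimal sketch of the hoisting pattern, assuming a synchronous handler and injected dependencies so the block stays self-contained. `handleChat`, the `deps` shape, and the log message names are hypothetical; in the change above the id comes from `nanoid()` and the debit goes through `billingService.consumeFluxForLLM`.

```typescript
type LogFields = Record<string, unknown>

interface Logger {
  info(msg: string, fields: LogFields): void
  warn(msg: string, fields: LogFields): void
}

interface ChatRequest { model: string, stream: boolean, messageCount: number }
interface Usage { promptTokens: number, completionTokens: number, fluxConsumed: number }

interface HandlerDeps {
  newId: () => string // nanoid() in the real handler
  log: Logger
  callUpstream: (req: ChatRequest) => Usage
  debit: (requestId: string, flux: number) => void
}

function handleChat(req: ChatRequest, deps: HandlerDeps): string {
  // Generated once at handler entry, then threaded through every log line and the debit.
  const requestId = deps.newId()
  const startedAt = Date.now()
  deps.log.info('inbound', {
    requestId,
    model: req.model,
    stream: req.stream,
    messageCount: req.messageCount,
  })
  try {
    const usage = deps.callUpstream(req)
    // The same id doubles as the idempotency key for billing.
    deps.debit(requestId, usage.fluxConsumed)
    deps.log.info('delivered', {
      requestId,
      status: 200,
      durationMs: Date.now() - startedAt,
      ...usage,
    })
  }
  catch (err) {
    // Degraded path logs at warn level, still keyed by the same id.
    deps.log.warn('upstream-error', { requestId, error: String(err) })
  }
  return requestId
}
```

Because the id is created before any work starts, an operator can grep one value across the inbound line, the delivered line, and the billing ledger.
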

When `mapUpstreamError` produced the final 502/503/504, it only carried
`{triedKeys, triedUpstreams, lastStatusCode}` in `details`. The upstream
response body was `.cancel()`'d on the wire and the network error
message vanished into the catch arm — operators staring at a 502 had no
way to tell "OpenRouter region-blocked us" from "key revoked" from
"DNS failed" without re-probing the upstream by hand.
Now each recorded failure carries the diagnostic snippet:
- chat upstreams read at most 256 bytes of the failed body via a
drain-aware reader before cancelling the rest (socket still returns
to the pool, no fallback-storm pool exhaustion).
- TTS upstreams reuse `errorMessageFrom(err)` — adapters already bake
the status + body snippet into `err.message`, so one field carries
both.
- network / timeout attempts record `errorMessageFromUnknown(err)` so
"attempt-timeout" vs "ECONNRESET" vs "DNS failed" stays
distinguishable.
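
The bounded body read in the first bullet could look roughly like this. It is a sketch under assumptions: `readSnippet` is a hypothetical name, it relies on the WHATWG `ReadableStream` / `TextDecoder` globals available in recent Node versions, and it is not the adapter's actual drain-aware reader.

```typescript
// Read at most `maxBytes` of a failed response body, then cancel the remainder
// so the connection is released instead of being left half-read.
async function readSnippet(
  body: ReadableStream<Uint8Array> | null,
  maxBytes = 256,
): Promise<string> {
  if (!body) return ''
  const reader = body.getReader()
  const chunks: Uint8Array[] = []
  let total = 0
  try {
    while (total < maxBytes) {
      const { done, value } = await reader.read()
      if (done || !value) break
      // Keep only what still fits under the cap.
      const slice = value.subarray(0, maxBytes - total)
      chunks.push(slice)
      total += slice.byteLength
    }
  }
  finally {
    // Cancel whatever is left; ignore cancel errors, the snippet is best-effort.
    await reader.cancel().catch(() => {})
  }
  const merged = new Uint8Array(total)
  let offset = 0
  for (const chunk of chunks) {
    merged.set(chunk, offset)
    offset += chunk.byteLength
  }
  // A multi-byte character cut at the cap decodes to a replacement character.
  return new TextDecoder().decode(merged)
}
```

A caller would pass `response.body` from the failed upstream fetch and store the returned string on the recorded attempt.
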
The collected `UpstreamAttempt[]` is attached to `ApiError.cause`
rather than `details`. SEC-5 (no upstream content in client-facing
response body) still holds — only the server-side logger + OTel pick
the cause up. `app.onError` now logs `{details, cause}` together so a
single log line tells the operator both the contract-level summary
and the actual upstream payload.
Adds a router.test.ts regression covering both shapes (HTTP 401 body
snippet + network ECONNRESET errorMessage), with an explicit assertion
that `details` does NOT contain the body text so SEC-5 doesn't drift.
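
The `details` / `cause` split can be illustrated with plain objects. This is a sketch, not the real `ApiError` class: `ApiErrorLike`, `toClientBody`, and `toLogRecord` are hypothetical names, `triedKeys` is omitted, and the status is fixed at 502 for brevity.

```typescript
interface UpstreamAttempt {
  upstream: string
  statusCode: number | null
  errorMessage: string // body snippet or network error text; server-side only
}

interface ApiErrorLike {
  statusCode: number
  message: string
  details: { triedUpstreams: string[], lastStatusCode: number | null }
  cause: UpstreamAttempt[]
}

function mapUpstreamError(attempts: UpstreamAttempt[]): ApiErrorLike {
  const last = attempts[attempts.length - 1]
  return {
    statusCode: 502,
    message: 'all upstreams failed',
    // Contract-level summary: safe to return to the client.
    details: {
      triedUpstreams: attempts.map(a => a.upstream),
      lastStatusCode: last?.statusCode ?? null,
    },
    // Raw upstream diagnostics: never serialized into the response.
    cause: attempts,
  }
}

// What the client sees: message + details only.
function toClientBody(err: ApiErrorLike) {
  return { error: err.message, details: err.details }
}

// What the logger sees: details and cause together in one record.
function toLogRecord(err: ApiErrorLike) {
  return { details: err.details, cause: err.cause }
}
```

Keeping the upstream text out of `details` by construction is what lets a regression test assert that the client body never contains it.
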

DashScope dropped cosyvoice-v1 from its REST-supported model list. v2
(and v3+) speak a different shape: voice / format / sample_rate live
under `input`, not `parameters`; non-streaming responses return
`output.audio.url` (signed OSS URL) instead of inline `output.audio.data`
base64. The previous adapter sent v1-shaped bodies to a bare
`https://dashscope-intl.aliyuncs.com/api/v1` baseURL and parsed
`audio.data`, which 404'd before the migration and would 200-with-no-
audio after — both invisible regressions for the gateway.
Adapter changes:
- Rewrite request body to v2 schema (voice/format under input).
- Add follow-up GET against `output.audio.url`; stream into ArrayBuffer
with a 25 MB hard cap and explicit drain-tracking finally, so a
misbehaving URL cannot exhaust memory and a half-read body cannot
hang a connection.
- Re-document baseURL contract: adapters do NOT append path; ops must
configure the FULL endpoint URL (root cause of the original 404
storm). DEFAULT_COSYVOICE_MODEL bumped to `cosyvoice-v2`, default
voice to `longxiaochun_v2`.
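
A rough sketch of the v2 body shape and envelope parsing described above. The field placement (voice / format / sample_rate under `input`, audio at `output.audio.url`) follows this message; the `text` field name, the `'mp3'` and `22050` defaults, and the helper names `buildV2Body` / `extractAudioUrl` are assumptions for illustration.

```typescript
const DEFAULT_COSYVOICE_MODEL = 'cosyvoice-v2'
const DEFAULT_COSYVOICE_VOICE = 'longxiaochun_v2'

interface SpeechRequest {
  text: string
  voice?: string
  format?: string
  sampleRate?: number
}

// v2 shape: everything lives under `input`; there is no `parameters` object.
function buildV2Body(req: SpeechRequest) {
  return {
    model: DEFAULT_COSYVOICE_MODEL,
    input: {
      text: req.text, // assumed field name
      voice: req.voice ?? DEFAULT_COSYVOICE_VOICE,
      format: req.format ?? 'mp3', // assumed default
      sample_rate: req.sampleRate ?? 22050, // assumed default
    },
  }
}

// v2 non-streaming responses carry a signed URL, not inline base64.
// Returns null for an empty envelope so the caller can treat it as recoverable.
function extractAudioUrl(envelope: unknown): string | null {
  const url = (envelope as { output?: { audio?: { url?: unknown } } } | null)
    ?.output?.audio?.url
  return typeof url === 'string' && url.length > 0 ? url : null
}
```

The adapter would then issue the follow-up GET against the returned URL, enforcing the size cap while streaming it into a buffer.
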
Voice catalog: regenerated with 19 representative cosyvoice-v2 voices
(assistant / customer-service / child / en-US / en-GB / ja-JP / ko-KR)
so the frontend voice picker is no longer a 2-entry stub. Full catalog
(100+) remains on the Alibaba docs page — we'll sync on demand rather
than scrape.
Seed script: `--dashscope-region intl|cn` (default `intl`),
`--dashscope-upstream-model cosyvoice-v2`, baseURL now resolves to
`https://<host>/api/v1/services/audio/tts/SpeechSynthesizer` so a
mis-typed region or path cannot reintroduce the 404.
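
The region-to-endpoint resolution might be sketched like this. The `intl` host and the path come from this message; the `cn` host (`dashscope.aliyuncs.com`) and the function name are assumptions.

```typescript
type DashscopeRegion = 'intl' | 'cn'

const DASHSCOPE_HOSTS: Record<DashscopeRegion, string> = {
  intl: 'dashscope-intl.aliyuncs.com',
  cn: 'dashscope.aliyuncs.com', // assumed mainland host
}

const COSYVOICE_PATH = '/api/v1/services/audio/tts/SpeechSynthesizer'

function isDashscopeRegion(value: string): value is DashscopeRegion {
  return value === 'intl' || value === 'cn'
}

// Always emits the FULL endpoint URL, since adapters do not append a path.
function resolveDashscopeBaseUrl(region: string = 'intl'): string {
  if (!isDashscopeRegion(region)) {
    throw new Error(`Unknown --dashscope-region "${region}" (expected intl or cn)`)
  }
  return `https://${DASHSCOPE_HOSTS[region]}${COSYVOICE_PATH}`
}
```

Rejecting unknown regions at seed time moves the failure from a runtime 404 to an immediate CLI error.
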
Tests: new dashscope-cosyvoice.test.ts covers v2 body shape (asserts
`parameters` absent — regression), audio.url follow-up fetch, 401
propagation with `.status`, empty-envelope falling back into the
router's recoverable-error path, and catalog freshness (no leftover v1
ids). Verified locally against the staging DashScope key: 200 +
playable mp3 end to end.

End-state of the multi-step KTD-5 / KTD-6 / U8 work. The knoway sidecar
is no longer reachable from server code; the router is required at boot
and now owns chat completions, TTS synthesis, and voice catalog listing.
Highlights:
- LLM_ROUTER_MASTER_KEY becomes required; app.ts drops the graceful-
skip branch and the chat fallback fetch path is gone.
- /audio/speech and /audio/voices route through new routeTts /
listTtsVoices entries that reuse the chat key-rotator + per-attempt
timeout + abort propagation.
- DEFAULT_CHAT_MODEL / DEFAULT_TTS_MODEL move from env to configKV so
default-model swaps are hot-reloadable via Pub/Sub.
- GATEWAY_BASE_URL removed from env schema, .env, .env.local, smoke,
verification harness. Redis upstream-voices cache deleted — catalogs
come from in-process adapter JSON.
- routeTts splits the adapter error contract by ApiError statusCode:
a 4xx propagates without fallback; a 5xx folds into the network-failure
fallback path. handleTTS wraps billing and the span attribute in
try/finally to plug a span leak when ttsMeter.accumulate() throws.
- seed-router-config.ts rewritten with --merge (default) / --reset /
--dry-run modes and env-var key handoff (OPENROUTER_KEY / AZURE_KEY /
DASHSCOPE_KEY) so prod seed flows never put plaintext on the CLI.
Adds DashScope CosyVoice seeding.
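
The 4xx / 5xx split in routeTts can be sketched as a small fallback loop. This version is synchronous and generic to stay self-contained; the real router is presumably async and also handles key rotation, timeouts, and abort propagation. `routeWithFallback` and `hasStatusCode` are hypothetical names.

```typescript
function hasStatusCode(err: unknown): err is { statusCode: number } {
  return typeof err === 'object'
    && err !== null
    && typeof (err as { statusCode?: unknown }).statusCode === 'number'
}

// Try each upstream in order. A 4xx is the caller's problem and is rethrown
// immediately; a 5xx or a non-HTTP failure moves on to the next upstream.
function routeWithFallback<T>(upstreams: Array<() => T>): T {
  let lastError: unknown = new Error('no upstreams configured')
  for (const attempt of upstreams) {
    try {
      return attempt()
    }
    catch (err) {
      if (hasStatusCode(err) && err.statusCode >= 400 && err.statusCode < 500) {
        throw err // no fallback: retrying elsewhere would not help
      }
      lastError = err // 5xx / network: recoverable, keep going
    }
  }
  throw lastError
}
```

For example, a 503 from the first upstream falls through to the second, while a 401 from the first is surfaced without touching the second.
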
Docs (CLAUDE.md, architecture-overview.md, transport-and-routes.md)
reflect the new boundary. verifications/llm-router.md replaces the
overstated "U1-U9 shipped" line with an evidence-vs-pending table.
Tests: full 40-file / 343-case server suite green. New regressions pin
ApiError 4xx → no-fallback, ApiError 5xx → fallback, TTS billing
failure → span closed and error propagated.

- Added a new PostHog client for capturing server-side business events such as Stripe webhooks and subscription state changes.
- Implemented various tracking functions for pricing funnel steps, character creation, and chat session starts.
- Enhanced the flux meter tests to handle partial charges and report unbilled flux correctly.
- Updated the CharacterDialog and Flux settings pages to track user interactions with analytics events.
- Introduced a mechanism to identify users on PostHog based on authentication state to ensure accurate funnel tracking.
- Added necessary dependencies for PostHog integration in the project.

## Description
Adds optional env **`ADDITIONAL_TRUSTED_ORIGINS`**: comma-separated
browser origins that are trusted for **CORS (`/api/*`)**, **Stripe
return URLs**, **Better Auth `trustedOrigins`**, and **dynamic web OIDC
redirect URIs**.
LAN / non-localhost Capacitor dev (e.g. Pocket + Vite on
`https://10.x:5273`) no longer relies on a broad private-IP regex;
operators list exact origins in `.env.local` and restart the API server
after changes.
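
A minimal sketch of how such a comma-separated allowlist could be parsed and checked, assuming exact-origin matching via the standard `URL` class. `parseTrustedOrigins` and `isTrustedOrigin` are hypothetical names, not necessarily what `utils/origin.ts` exports, and `10.0.0.5` is an illustrative address.

```typescript
// Parse "a, b, c" into a de-duplicated list of normalized origins.
// Malformed entries throw, so a typo surfaces at boot rather than as a CORS failure.
function parseTrustedOrigins(raw: string | undefined): string[] {
  if (!raw) return []
  const origins = new Set<string>()
  for (const entry of raw.split(',')) {
    const trimmed = entry.trim()
    if (!trimmed) continue
    origins.add(new URL(trimmed).origin) // strips path, query, trailing slash
  }
  return Array.from(origins)
}

// Exact match on scheme + host + port; no wildcard or private-IP regex.
function isTrustedOrigin(candidate: string, allowlist: string[]): boolean {
  try {
    return allowlist.includes(new URL(candidate).origin)
  }
  catch {
    return false // unparsable Origin / redirect_uri is never trusted
  }
}
```

With `ADDITIONAL_TRUSTED_ORIGINS=https://10.0.0.5:5273`, a redirect to `https://10.0.0.5:5273/callback` passes while any other LAN address is rejected.
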
## Linked Issues
<!-- N/A -->
## Additional Context
Pocket iOS dev workflow: `cap`/`capacitor.config` often points at a LAN
HTTPS origin; without this allowlist the API rejects those
`Origin`/`Referer`/`redirect_uri` bases. Review can stay focused on
**`apps/server/src/libs/env.ts`**,
**`apps/server/src/utils/origin.ts`**, and the wiring in **`app.ts`**,
the **Stripe** routes, and the **auth routes**.

- Moved RateLimitMetrics import path to a more centralized location.
- Introduced a new file for active sessions gauge to track user sessions in the database.
- Updated index.ts to include new metrics and ensure proper initialization of observability metrics.
- Modified various routes and services to utilize the new observability structure.
- Added smoke tests for HTTP and WebSocket metrics to ensure proper metric registration and functionality.
- Enhanced error handling for metrics reading failures to improve observability.
- Updated the EngagementMetrics interface to use ObservableGauge for tracking active WebSocket connections.
- Added detailed comments explaining the rationale for this change, highlighting the benefits of using a pull-based gauge over a delta-based counter.
- Implemented the ObservableGauge in the createChatWsHandlers function, ensuring it accurately reflects the live count of active connections.
- Removed the previous UpDownCounter logic, whose count drifted whenever a process crash or network interruption skipped the matching decrement.
- Added new metrics for email service including send, failures, and duration tracking.
- Introduced rate limit metrics to monitor blocked requests and improve abuse detection.
- Enhanced billing metrics to track credited and unbilled flux, as well as TTS character processing.
- Updated OpenAI and Stripe routes to utilize new metrics for better revenue tracking and rate limiting.
- Implemented a smoke test for OpenTelemetry metrics registration to ensure visibility at startup.
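
The pull-based gauge for active WebSocket connections mentioned above can be sketched as follows. To stay dependency-free, the block declares minimal local interfaces shaped like the OpenTelemetry `addCallback` / `observe` pattern instead of importing `@opentelemetry/api`; `trackActiveConnections` and `createFakeGauge` are hypothetical names, and the fake gauge exists only to make the sketch runnable.

```typescript
// Minimal local stand-ins for the callback-style observable gauge API.
interface ObservableResultLike { observe(value: number): void }
type GaugeCallback = (result: ObservableResultLike) => void
interface ObservableGaugeLike { addCallback(callback: GaugeCallback): void }

// The gauge reports the live size of the connection set at collection time,
// so there is no increment/decrement pair that can get out of step.
function trackActiveConnections(gauge: ObservableGaugeLike) {
  const active = new Set<string>()
  gauge.addCallback((result) => { result.observe(active.size) })
  return {
    onOpen(connectionId: string): void { active.add(connectionId) },
    onClose(connectionId: string): void { active.delete(connectionId) }, // idempotent
  }
}

// Fake gauge: `collect()` plays the role of a metrics export cycle.
function createFakeGauge() {
  const callbacks: GaugeCallback[] = []
  const gauge: ObservableGaugeLike = {
    addCallback(callback: GaugeCallback): void { callbacks.push(callback) },
  }
  const collect = (): number => {
    let observed = 0
    for (const callback of callbacks) {
      callback({ observe(value: number): void { observed = value } })
    }
    return observed
  }
  return { gauge, collect }
}
```

A duplicate close event leaves the reported value correct, and a restarted process simply reports its own live count from zero.
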

The Redis Stream `billing-events` + `worker` Railway role +
advisory-lock poller layered together didn't actually buy us reliability
— `debitFlux` swallowed XADD failures, leaving the door open to "balance
updated, ledger row never written". Collapse the whole thing back to:
`creditFlux` and `debitFlux` write `flux_transaction` ledger rows inline
within the same DB transaction that mutates `user_flux`, and `(user_id,
request_id)` remains the partial unique index that keeps retries safe.
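
An in-memory model of the inline-ledger idea, as a sketch only: a `Map` keyed by `userId:requestId` stands in for the `(user_id, request_id)` unique index, and updating both maps in one synchronous method stands in for the shared DB transaction. `InMemoryBilling` and its method signatures are hypothetical.

```typescript
interface LedgerRow { userId: string, requestId: string, delta: number }

class InMemoryBilling {
  private balances = new Map<string, number>() // stands in for user_flux
  private ledger = new Map<string, LedgerRow>() // stands in for flux_transaction

  private apply(userId: string, requestId: string, delta: number) {
    const key = `${userId}:${requestId}` // plays the role of the unique index
    const balance = this.balances.get(userId) ?? 0
    if (this.ledger.has(key)) {
      // Retry of an already-recorded request: no second mutation.
      return { applied: false, balance }
    }
    if (balance + delta < 0) throw new Error('insufficient flux')
    // Balance and ledger row change together; in a real DB both writes
    // would sit inside the same transaction.
    this.balances.set(userId, balance + delta)
    this.ledger.set(key, { userId, requestId, delta })
    return { applied: true, balance: balance + delta }
  }

  creditFlux(userId: string, requestId: string, amount: number) {
    return this.apply(userId, requestId, amount)
  }

  debitFlux(userId: string, requestId: string, amount: number) {
    return this.apply(userId, requestId, -amount)
  }

  ledgerSize(): number {
    return this.ledger.size
  }
}
```

Because the ledger row is written in the same step as the balance change, there is no window in which one exists without the other.
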
Concrete changes:
- Inline ledger inserts in `BillingService.{debitFlux, creditFlux,
creditFluxFromStripeCheckout, creditFluxFromInvoice}`; drop `billingMq`
and `publishEvent` plumbing entirely.
- `routes/openai/v1` writes `llm_request_log` synchronously via the
existing `requestLogService`; the duplicate `llm-request-log.ts` service
module is removed.
- `bin/run-worker.ts`, `libs/mq/*`,
`services/billing/billing-events.ts`,
`services/billing/billing-consumer-handler.ts`, and matching tests are
deleted. CLI now exposes only `api`.
- `BILLING_EVENTS_*` env vars and the `DEFAULT_BILLING_EVENTS_STREAM`
helper are dropped; `docker-compose.yml` no longer ships a worker
service.
- `docs/ai-context/{workers-and-runtime, billing-architecture,
redis-boundaries-and-pubsub, data-model-and-state,
architecture-overview, README}.md`, `CLAUDE.md`, and the existing
verification docs are updated to describe the single-process synchronous
pipeline.
Tests: 29 files / 247 cases pass. Production deployments need to drop
the worker Railway service after this lands.

Without a stable secret, better-auth generates a random one per process,
which invalidates every session cookie and JWKS private key on redeploy
and across multi-instance deployments. Make the secret a required env var
and wire it into betterAuth({ secret }) explicitly so missing config fails
fast at boot instead of silently rotating keys.
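
A minimal sketch of the fail-fast check, assuming a plain helper rather than the project's actual env schema. `requireEnv` is a hypothetical name, and `BETTER_AUTH_SECRET` is used as an illustrative variable name; this message does not state which name the server reads.

```typescript
// Throws at boot when a required variable is missing or blank,
// instead of letting the auth library fall back to a per-process random secret.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name]?.trim()
  if (!value) {
    throw new Error(`Missing required env var ${name}`)
  }
  return value
}

// Usage at boot (illustrative variable name):
//   const secret = requireEnv(process.env, 'BETTER_AUTH_SECRET')
//   betterAuth({ secret, /* ...other options */ })
```

Treating a whitespace-only value as missing avoids a deploy where the variable is defined but empty.
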