Why:
- Add a real bidirectional streaming TTS path: raw LLM tokens are
forwarded to the upstream model (Volcengine v3 via the unspeech ws
bridge) without client-side segmentation, so the model owns sentence
splitting and audio chunks play as they arrive.
- Move audio endpoints out of /api/v1/openai/. `/audio/voices`,
`/audio/models`, `/audio/voices/streaming` are not real OpenAI public
APIs, and the streaming TTS surface has nothing to do with OpenAI —
keeping them under /openai/ mislabelled the contract.
- Introduce `capabilities.speech.transport` on ProviderDefinition so
future streaming providers (ElevenLabs / Cartesia / OpenAI Realtime)
opt in without touching Stage.vue or the session factory.
- Unify Stage.vue's TTS path through a single StageTtsSession so the
chat-orchestrator hooks no longer branch on provider id.
What:
- apps/server: new ws proxy /api/v1/audio/speech/ws bridges client ↔
unspeech with auth, pre-flight flux check, billing from upstream
session.finished.usage, OTel spans.
- apps/server: audio routes moved from /api/v1/openai/audio/* to
/api/v1/audio/* (hard cutover; 404 sentinel tests added).
- apps/server: new /api/v1/audio/voices/streaming proxy reads voices
from unspeech /api/voices?provider=volcengine.
- apps/server: new STREAMING_TTS_UPSTREAM configKV entry +
scripts/seed-streaming-tts.ts.
- stage-ui: new libs/speech/streaming-pipeline.ts opens one ws per LLM
intent (appendText / finish / cancel + onSentence / onError / onDone).
- stage-ui: new libs/speech/tts-session.ts — StageTtsSession interface
with segmenter and streaming adapters; factory dispatches by
capabilities.speech.transport instead of hard-coded provider id.
- stage-ui: providerOfficialSpeechStreaming with capabilities.speech =
{ transport: 'bidirectional-ws' }; settings page with model/voice
picker + ws-based preview.
- stage-ui: Stage.vue chat hooks collapsed to a single currentSession;
hot-swap watcher cancels mid-session on provider/voice/model change;
unmount cancels and drains playback.
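The capability-based dispatch in the factory bullet above can be sketched as follows. This is an illustrative stand-in, not the real stage-ui code: the interface and function names are assumptions; only the `capabilities.speech.transport` field and the "dispatch by capability, not provider id" rule come from the change itself.

```typescript
// Hypothetical sketch of the StageTtsSession factory described above.
type SpeechTransport = 'segmenter' | 'bidirectional-ws'

interface ProviderDefinition {
  id: string
  capabilities?: { speech?: { transport?: SpeechTransport } }
}

interface StageTtsSession {
  appendText: (chunk: string) => void
  finish: () => void
  cancel: () => void
}

function createTtsSession(
  provider: ProviderDefinition,
  make: Record<SpeechTransport, () => StageTtsSession>,
): StageTtsSession {
  // Dispatch on the declared capability, never on a hard-coded provider id,
  // so a future streaming provider opts in via its definition alone.
  const transport = provider.capabilities?.speech?.transport ?? 'segmenter'
  return make[transport]()
}
```

With this shape, adding ElevenLabs/Cartesia streaming later means declaring `capabilities.speech = { transport: 'bidirectional-ws' }` on the provider; Stage.vue and the factory stay untouched.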
Tests:
- 9 streaming-pipeline tests (happy path / buffered / error / cancel /
truncation)
- 11 tts-session tests (factory branch coverage + adapter contracts)
- 4 audio-speech-ws route tests (forwarding / billing / pre-flight /
config-missing)
- 3 legacy-path 404 sentinels in v1 route tests
- Verification doc updated to reflect automated coverage.
The `/v1/openai/{chat,audio}` handlers used to be silent past
`hono/logger`'s `<-- POST` / `--> 502` lines — no userId, no model,
no token counts, no flux billed. Operators looking at a real gateway
incident had to cross-reference traces, request-log rows, and billing
ledger entries by timestamp alone. For a gateway whose value is
auditable per-request metering, that's not enough.
Hoists `requestId = nanoid()` to handler entry so the same
correlation id flows through:
- the inbound log line (model / stream / messageCount or inputChars
for TTS)
- the per-stream / non-streaming delivered log (status / durationMs /
promptTokens / completionTokens / fluxConsumed)
- the upstream-error degraded path (warn level)
- the partial-debit and debit-failure paths (already used requestId)
- `billingService.consumeFluxForLLM` / `ttsMeter.accumulate` for
DB-level idempotency (replaces the previous local nanoid() calls)
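The hoisting pattern above can be sketched roughly like this. The handler shape, log fields, and the `nanoidStub` are illustrative stand-ins (the real code uses `nanoid()` and Hono handlers); only the one-id-per-request threading is from the change itself.

```typescript
// Stand-in for nanoid(), to keep the sketch dependency-free.
function nanoidStub(): string {
  return Math.random().toString(36).slice(2, 10)
}

function handleChatCompletion(log: (line: Record<string, unknown>) => void): string {
  // Hoisted to handler entry: every downstream log line and the billing
  // idempotency key see the same id, instead of each call site minting its own.
  const requestId = nanoidStub()
  log({ requestId, phase: 'inbound', model: 'example-model', stream: true })
  // ... proxy to upstream, count tokens ...
  log({ requestId, phase: 'delivered', status: 200, promptTokens: 12, completionTokens: 34 })
  return requestId // also handed to billing as the DB-level idempotency key
}
```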
handleListVoices gets a `debug`-level line — the route is high-
frequency from UI voice pickers and we don't want it in the regular
audit feed, but it's useful when chasing voice-picker drift bugs.
No new schemas, no metric emission changes; this is purely logger
output. Pairs with the cause-propagation change so errors carry the
upstream snippet AND the request can be traced end-to-end by id.
When `mapUpstreamError` produced the final 502/503/504, it only carried
`{triedKeys, triedUpstreams, lastStatusCode}` in `details`. The upstream
response body was `.cancel()`'d on the wire and the network error
message vanished into the catch arm — operators staring at a 502 had no
way to tell "OpenRouter region-blocked us" from "key revoked" from
"DNS failed" without re-probing the upstream by hand.
Now each recorded failure carries the diagnostic snippet:
- chat upstreams read at most 256 bytes of the failed body via a
drain-aware reader before cancelling the rest (socket still returns
to the pool, no fallback-storm pool exhaustion).
- TTS upstreams reuse `errorMessageFrom(err)` — adapters already bake
the status + body snippet into `err.message`, so one field carries
both.
- network / timeout attempts record `errorMessageFromUnknown(err)` so
"attempt-timeout" vs "ECONNRESET" vs "DNS failed" stays
distinguishable.
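The drain-aware snippet read in the first bullet can be sketched with the standard web-streams API. This is a minimal sketch under the assumptions stated in the bullets (256-byte cap, cancel the remainder so the socket returns to the pool); the real reader in the router may differ in detail.

```typescript
// Read at most `limit` bytes of a failed upstream body for diagnostics,
// then cancel the rest of the stream.
async function readBodySnippet(body: ReadableStream<Uint8Array>, limit = 256): Promise<string> {
  const reader = body.getReader()
  const chunks: Uint8Array[] = []
  let total = 0
  try {
    while (total < limit) {
      const { done, value } = await reader.read()
      if (done || !value) break
      chunks.push(value)
      total += value.byteLength
    }
  } finally {
    // Cancel whatever was not read; a half-read body left open can pin the
    // connection and exhaust the pool under a fallback storm.
    await reader.cancel().catch(() => {})
  }
  const joined = new Uint8Array(Math.min(total, limit))
  let offset = 0
  for (const chunk of chunks) {
    const slice = chunk.subarray(0, limit - offset)
    joined.set(slice, offset)
    offset += slice.byteLength
    if (offset >= limit) break
  }
  return new TextDecoder().decode(joined.subarray(0, offset))
}
```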
The collected `UpstreamAttempt[]` is attached to `ApiError.cause`
rather than `details`. SEC-5 (no upstream content in client-facing
response body) still holds — only the server-side logger + OTel pick
the cause up. `app.onError` now logs `{details, cause}` together so a
single log line tells the operator both the contract-level summary
and the actual upstream payload.
Adds a router.test.ts regression covering both shapes (HTTP 401 body
snippet + network ECONNRESET errorMessage), with an explicit assertion
that `details` does NOT contain the body text so SEC-5 doesn't drift.
DashScope dropped cosyvoice-v1 from its REST-supported model list. v2
(and v3+) speak a different shape: voice / format / sample_rate live
under `input`, not `parameters`; non-streaming responses return
`output.audio.url` (signed OSS URL) instead of inline `output.audio.data`
base64. The previous adapter sent v1-shaped bodies to a bare
`https://dashscope-intl.aliyuncs.com/api/v1` baseURL and parsed
`audio.data`, which 404'd before the migration and would 200-with-no-
audio after — both invisible regressions for the gateway.
Adapter changes:
- Rewrite request body to v2 schema (voice/format under input).
- Add follow-up GET against `output.audio.url`; stream into ArrayBuffer
with a 25 MB hard cap and explicit drain-tracking finally, so a
misbehaving URL cannot exhaust memory and a half-read body cannot
hang a connection.
- Re-document baseURL contract: adapters do NOT append path; ops must
configure the FULL endpoint URL (root cause of the original 404
storm). DEFAULT_COSYVOICE_MODEL bumped to `cosyvoice-v2`, default
voice to `longxiaochun_v2`.
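The capped follow-up download in the second bullet can be sketched like this. In the real adapter the stream comes from fetching `output.audio.url`; here the body is parameterised so the cap and drain-tracking logic are visible without network access. The helper name is illustrative.

```typescript
const MAX_AUDIO_BYTES = 25 * 1024 * 1024 // 25 MB hard cap from the change above

async function downloadCapped(body: ReadableStream<Uint8Array>, cap = MAX_AUDIO_BYTES): Promise<ArrayBuffer> {
  const reader = body.getReader()
  const chunks: Uint8Array[] = []
  let total = 0
  let drained = false
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done || !value) { drained = true; break }
      total += value.byteLength
      // Fail fast: a misbehaving signed URL cannot exhaust memory.
      if (total > cap) throw new Error(`audio body exceeded ${cap} byte cap`)
      chunks.push(value)
    }
  } finally {
    // Explicit drain tracking: if we bailed early, cancel the remainder so
    // a half-read body cannot hang the connection.
    if (!drained) await reader.cancel().catch(() => {})
  }
  const out = new Uint8Array(total)
  let offset = 0
  for (const chunk of chunks) { out.set(chunk, offset); offset += chunk.byteLength }
  return out.buffer
}
```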
Voice catalog: regenerated with 19 representative cosyvoice-v2 voices
(assistant / customer-service / child / en-US / en-GB / ja-JP / ko-KR)
so the frontend voice picker is no longer a 2-entry stub. Full catalog
(100+) remains on the Alibaba docs page — we'll sync on demand rather
than scrape.
Seed script: `--dashscope-region intl|cn` (default `intl`),
`--dashscope-upstream-model cosyvoice-v2`, baseURL now resolves to
`https://<host>/api/v1/services/audio/tts/SpeechSynthesizer` so a
mis-typed region or path cannot reintroduce the 404.
Tests: new dashscope-cosyvoice.test.ts covers v2 body shape (asserts
`parameters` absent — regression), audio.url follow-up fetch, 401
propagation with `.status`, empty-envelope falling back into the
router's recoverable-error path, and catalog freshness (no leftover v1
ids). Verified locally against the staging DashScope key: 200 +
playable mp3 end to end.
End-state of the multi-step KTD-5 / KTD-6 / U8 work. The knoway sidecar
is no longer reachable from server code; the router is required at boot
and now owns chat completions, TTS synthesis, and voice catalog listing.
Highlights:
- LLM_ROUTER_MASTER_KEY becomes required; app.ts drops the graceful-
skip branch and the chat fallback fetch path is gone.
- /audio/speech and /audio/voices route through new routeTts /
listTtsVoices entries that reuse the chat key-rotator + per-attempt
timeout + abort propagation.
- DEFAULT_CHAT_MODEL / DEFAULT_TTS_MODEL move from env to configKV so
default-model swaps are hot-reloadable via Pub/Sub.
- GATEWAY_BASE_URL removed from env schema, .env, .env.local, smoke,
verification harness. Redis upstream-voices cache deleted — catalogs
come from in-process adapter JSON.
- routeTts splits adapter error contract by ApiError statusCode:
4xx propagates without fallback; 5xx folds into the network-failure
fallback path. handleTTS wraps billing + span attribute in try/finally
to plug a span leak when ttsMeter.accumulate() throws.
- seed-router-config.ts rewritten with --merge (default) / --reset /
--dry-run modes and env-var key handoff (OPENROUTER_KEY / AZURE_KEY /
DASHSCOPE_KEY) so prod seed flows never put plaintext on the CLI.
Adds DashScope CosyVoice seeding.
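The 4xx/5xx split in the routeTts bullet can be sketched as follows. `ApiError` and both callbacks are stubs; only the fallback contract (4xx propagates, 5xx folds into fallback) is from the change above.

```typescript
// Minimal stand-in for the server's ApiError.
class ApiError extends Error {
  constructor(public statusCode: number, message: string) { super(message) }
}

async function routeTtsAttempt<T>(attempt: () => Promise<T>, fallback: () => Promise<T>): Promise<T> {
  try {
    return await attempt()
  } catch (err) {
    // 4xx is a contract-level rejection (bad voice, revoked key): a
    // different upstream cannot fix it, so propagate without fallback.
    if (err instanceof ApiError && err.statusCode < 500) throw err
    // 5xx and network failures fold into the fallback path.
    return fallback()
  }
}
```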
Docs (CLAUDE.md, architecture-overview.md, transport-and-routes.md)
reflect the new boundary. verifications/llm-router.md replaces the
overstated "U1-U9 shipped" line with an evidence-vs-pending table.
Tests: full 40-file / 343-case server suite green. New regressions pin
ApiError 4xx → no-fallback, ApiError 5xx → fallback, TTS billing
failure → span closed and error propagated.
- Added a new PostHog client for capturing server-side business events such as Stripe webhooks and subscription state changes.
- Implemented various tracking functions for pricing funnel steps, character creation, and chat session starts.
- Enhanced the flux meter tests to handle partial charges and report unbilled flux correctly.
- Updated the CharacterDialog and Flux settings pages to track user interactions with analytics events.
- Introduced a mechanism to identify users on PostHog based on authentication state to ensure accurate funnel tracking.
- Added the PostHog dependencies required for the integration.
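A hedged sketch of the server-side capture path described above. `PostHogClient` loosely mirrors the shape of posthog-node's `capture()` but is stubbed here; the event name and properties are illustrative, not the real tracking schema.

```typescript
interface PostHogClient {
  capture: (e: { distinctId: string, event: string, properties?: Record<string, unknown> }) => void
}

function trackCheckoutCompleted(client: PostHogClient, userId: string, plan: string): void {
  // Business events key off the authenticated userId so that anonymous
  // funnel steps and the Stripe webhook join into one identity.
  client.capture({
    distinctId: userId,
    event: 'checkout_completed',
    properties: { plan, source: 'stripe_webhook' },
  })
}
```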
## Summary
Adds the experimental Godot stage sidecar path for `stage-tamagotchi`.
This PR wires the existing Tamagotchi model selection flow into an
external Godot runtime window. The renderer gates Godot scene input to
VRM models, Electron main materialises the selected model bytes to a
local file, and the Godot sidecar receives the native path over a local
WebSocket bridge before importing and displaying the avatar at runtime.
## What Changed
- Added a typed Godot scene input contract with `format: "vrm"`.
- Added renderer-side VRM-only gating before sending selected model data
to Electron main.
- Added Electron main sidecar management for:
- launching Godot
- starting the local WebSocket bridge
- materialising selected VRM bytes under app `userData`
- forwarding scene apply messages to Godot
- optional remote debugging support
- Added Godot runtime scripts for:
- sidecar startup and WebSocket orchestration
- message envelope parsing
- avatar import and atomic replacement
- runtime VRM import through Godot `GLTFDocument`
- Added engine-local docs for runtime import, live debugging, vendor
patches, and current VRM support boundaries.
- Removed temporary tests after using them to verify the glue behaviour
locally, to keep the review surface smaller.
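The renderer-side gate and the typed scene input contract can be sketched like this. The field names beyond `format: "vrm"` are assumptions for illustration, not the real eventa contract.

```typescript
interface SceneInput {
  format: 'vrm'
  path: string // native path materialised by Electron main under userData
}

function gateSceneInput(fileName: string, nativePath: string): SceneInput | null {
  // Only VRM models are forwarded to the Godot sidecar; everything else
  // stays on the existing web renderer path.
  if (!fileName.toLowerCase().endsWith('.vrm'))
    return null
  return { format: 'vrm', path: nativePath }
}
```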
## Vendor Code Note
A large part of this PR is vendored Godot add-on code, not AIRI business
logic.
The bulk of the added files under:
- `engines/stage-tamagotchi-godot/addons/vrm/**`
- `engines/stage-tamagotchi-godot/addons/Godot-MToon-Shader/**`
comes from V-Sekai Godot VRM / MToon add-ons. These files are required
because Godot plugins are project-local source/assets rather than
package-manager dependencies.
The intended review scope for vendor code is limited to:
- source baseline metadata
- license/plugin config
- Godot-generated metadata notes
- the documented local patch in `addons/vrm/vrm_extension.gd`
The application/runtime code to review is mainly under:
- `apps/stage-tamagotchi/src/shared/eventa/index.ts`
- `apps/stage-tamagotchi/src/renderer/pages/settings/models/`
- `apps/stage-tamagotchi/src/main/services/airi/godot-stage/`
- `engines/stage-tamagotchi-godot/scripts/`
## Current Boundary
This is still an experimental G1 Godot sidecar path.
The runtime scene input contract accepts `.vrm` files only. The current
Godot runtime importer covers the VRM 0.x path used by the local fixture
through AIRI’s runtime bridge over the vendored VRM extension. VRM 1.0
editor import support exists in the vendored add-on, but the sidecar
runtime importer does not yet register the full `VRMC_*` extension set,
so this PR does not claim full VRM 1.0 runtime support.
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
## Description
Adds optional env **`ADDITIONAL_TRUSTED_ORIGINS`**: comma-separated
browser origins that are trusted for **CORS (`/api/*`)**, **Stripe
return URLs**, **Better Auth `trustedOrigins`**, and **dynamic web OIDC
redirect URIs**.
LAN / non-localhost Capacitor dev (e.g. Pocket + Vite on
`https://10.x:5273`) no longer relies on broad private-IP regex;
operators list exact origins in `.env.local` and restart the API server
after changes.
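Parsing the allowlist can be sketched as follows; this is a minimal sketch, and the real implementation in `apps/server/src/utils/origin.ts` may differ in how it validates entries.

```typescript
function parseAdditionalTrustedOrigins(raw: string | undefined): string[] {
  if (!raw)
    return []
  return raw
    .split(',')
    .map(origin => origin.trim())
    .filter(origin => origin.length > 0)
    // Keep only well-formed http(s) origins with no path or trailing slash;
    // a typo'd entry should be dropped rather than silently widening CORS.
    .filter((origin) => {
      try {
        const url = new URL(origin)
        return (url.protocol === 'http:' || url.protocol === 'https:') && url.origin === origin
      }
      catch { return false }
    })
}
```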
## Linked Issues
<!-- N/A -->
## Additional Context
Pocket iOS dev workflow: `cap`/`capacitor.config` often points at a LAN
HTTPS origin; without this allowlist the API rejects those
`Origin`/`Referer`/`redirect_uri` bases. Review can stay focused on
**`apps/server/src/libs/env.ts`**,
**`apps/server/src/utils/origin.ts`**, and wiring in **`app.ts`**,
**Stripe**, **auth routes**.
- Moved RateLimitMetrics import path to a more centralized location.
- Introduced a new file for active sessions gauge to track user sessions in the database.
- Updated index.ts to include new metrics and ensure proper initialization of observability metrics.
- Modified various routes and services to use the new observability structure.
- Added smoke tests for HTTP and WebSocket metrics to ensure proper metric registration and functionality.
- Enhanced error handling for metrics reading failures to improve observability.
1. Introduce the global shortcut service
1. Add more concrete failure reasons for shortcut registration attempts
1. Add a devtool page to test registering, unregistering, and triggering shortcuts
---
<img width="1174" height="921" alt="Screenshot 2026-05-10 at 19 33 45"
src="https://github.com/user-attachments/assets/10712013-fd49-4285-bdc9-4e6955d9c3a7"
/>
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
- Updated the EngagementMetrics interface to use ObservableGauge for tracking active WebSocket connections.
- Added detailed comments explaining the rationale for this change, highlighting the benefits of using a pull-based gauge over a delta-based counter.
- Implemented the ObservableGauge in the createChatWsHandlers function, ensuring it accurately reflects the live count of active connections.
- Removed the previous UpDownCounter logic to prevent issues with connection drift during process crashes or network interruptions.
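The drift-free property described above can be sketched with a stub meter. `MeterLike` loosely mirrors `@opentelemetry/api`'s `createObservableGauge`/`addCallback` shape but is simplified here so the rationale is visible without the SDK; names are illustrative.

```typescript
type GaugeCallback = (observe: (value: number) => void) => void

interface MeterLike {
  createObservableGauge: (name: string) => { addCallback: (cb: GaugeCallback) => void }
}

function registerActiveConnectionsGauge(meter: MeterLike, connections: Set<unknown>): void {
  // The callback reads the authoritative live set at every collection, so a
  // process crash or dropped socket can never leave a phantom +1 behind the
  // way a delta-based UpDownCounter can.
  meter.createObservableGauge('chat_ws_active_connections')
    .addCallback(observe => observe(connections.size))
}
```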
- Added new metrics for email service including send, failures, and duration tracking.
- Introduced rate limit metrics to monitor blocked requests and improve abuse detection.
- Enhanced billing metrics to track credited and unbilled flux, as well as TTS character processing.
- Updated OpenAI and Stripe routes to use the new metrics for revenue tracking and rate limiting.
- Implemented a smoke test for OpenTelemetry metrics registration to ensure visibility at startup.
The Redis Stream `billing-events` + `worker` Railway role +
advisory-lock poller layered together didn't actually buy us reliability
— `debitFlux` swallowed XADD failures, leaving the door open to "balance
updated, ledger row never written". Collapse the whole thing back to:
`creditFlux` and `debitFlux` write `flux_transaction` ledger rows inline
within the same DB transaction that mutates `user_flux`, and `(user_id,
request_id)` remains the partial unique index that keeps retries safe.
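The invariant above can be modeled with a simplified in-memory sketch: the balance mutation and the ledger row commit together, and a retry with the same `(user_id, request_id)` pair is a no-op. The real code runs one SQL transaction with a partial unique index; this stub only demonstrates the contract, and the names are illustrative.

```typescript
interface LedgerRow { userId: string, requestId: string, amount: number }

class BillingStore {
  balances = new Map<string, number>()
  ledger: LedgerRow[] = []

  debitFlux(userId: string, requestId: string, amount: number): void {
    // Idempotency: the (user_id, request_id) unique index makes a retried
    // debit observe the existing row instead of double-charging.
    if (this.ledger.some(r => r.userId === userId && r.requestId === requestId))
      return
    const balance = this.balances.get(userId) ?? 0
    if (balance < amount)
      throw new Error('insufficient flux')
    // Balance update and ledger insert happen together: there is no path
    // where the balance moved but the ledger row was never written.
    this.balances.set(userId, balance - amount)
    this.ledger.push({ userId, requestId, amount })
  }
}
```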
Concrete changes:
- Inline ledger inserts in `BillingService.{debitFlux, creditFlux,
creditFluxFromStripeCheckout, creditFluxFromInvoice}`; drop `billingMq`
and `publishEvent` plumbing entirely.
- `routes/openai/v1` writes `llm_request_log` synchronously via the
existing `requestLogService`; the duplicate `llm-request-log.ts` service
module is removed.
- `bin/run-worker.ts`, `libs/mq/*`,
`services/billing/billing-events.ts`,
`services/billing/billing-consumer-handler.ts`, and matching tests are
deleted. CLI now exposes only `api`.
- `BILLING_EVENTS_*` env vars and the `DEFAULT_BILLING_EVENTS_STREAM`
helper are dropped; `docker-compose.yml` no longer ships a worker
service.
- `docs/ai-context/{workers-and-runtime, billing-architecture,
redis-boundaries-and-pubsub, data-model-and-state,
architecture-overview, README}.md`, `CLAUDE.md`, and the existing
verification docs are updated to describe the single-process synchronous
pipeline.
Tests: 29 files / 247 cases pass. Production deployments need to drop
the worker Railway service after this lands.