Commit graph

4 commits

Author SHA1 Message Date
RainbowBird
9a182e0b68
feat(server,stage-ui): bidirectional streaming TTS + audio path refactor
Some checks are pending
Cloudflare Workers (server-dev) / Deploy - stage-web (server-dev) (push) Waiting to run
Why:
- Add a real bidirectional streaming TTS path: raw LLM tokens are
  forwarded to the upstream model (Volcengine v3 via the unspeech ws
  bridge) without client-side segmentation, so the model owns sentence
  splitting and audio chunks play as they arrive.
- Move audio endpoints out of /api/v1/openai/. `/audio/voices`,
  `/audio/models`, `/audio/voices/streaming` are not real OpenAI public
  APIs, and the streaming TTS surface has nothing to do with OpenAI —
  keeping them under /openai/ mislabelled the contract.
- Introduce `capabilities.speech.transport` on ProviderDefinition so
  future streaming providers (ElevenLabs / Cartesia / OpenAI Realtime)
  opt in without touching Stage.vue or the session factory.
- Unify Stage.vue's TTS path through a single StageTtsSession so the
  chat-orchestrator hooks no longer branch on provider id.

What:
- apps/server: new ws proxy /api/v1/audio/speech/ws bridges client ↔
  unspeech with auth, pre-flight flux check, billing from upstream
  session.finished.usage, OTel spans.
- apps/server: audio routes moved from /api/v1/openai/audio/* to
  /api/v1/audio/* (hard cutover; 404 sentinel tests added).
- apps/server: new /api/v1/audio/voices/streaming proxy reads voices
  from unspeech /api/voices?provider=volcengine.
- apps/server: new STREAMING_TTS_UPSTREAM configKV entry +
  scripts/seed-streaming-tts.ts.
- stage-ui: new libs/speech/streaming-pipeline.ts opens one ws per LLM
  intent (appendText / finish / cancel + onSentence / onError / onDone).
- stage-ui: new libs/speech/tts-session.ts — StageTtsSession interface
  with segmenter and streaming adapters; factory dispatches by
  capabilities.speech.transport instead of hard-coded provider id.
- stage-ui: providerOfficialSpeechStreaming with capabilities.speech =
  { transport: 'bidirectional-ws' }; settings page with model/voice
  picker + ws-based preview.
- stage-ui: Stage.vue chat hooks collapsed to a single currentSession;
  hot-swap watcher cancels mid-session on provider/voice/model change;
  unmount cancels and drains playback.

Tests:
- 9 streaming-pipeline tests (happy path / buffered / error / cancel /
  truncation)
- 11 tts-session tests (factory branch coverage + adapter contracts)
- 4 audio-speech-ws route tests (forwarding / billing / pre-flight /
  config-missing)
- 3 legacy-path 404 sentinels in v1 route tests
- Verification doc updated to reflect automated coverage.
2026-05-17 01:35:40 +08:00
RainbowBird
adaddb58eb
feat(server/tts): upgrade dashscope-cosyvoice adapter to v2 two-step REST
DashScope dropped cosyvoice-v1 from its REST-supported model list. v2
(and v3+) speak a different shape: voice / format / sample_rate live
under `input`, not `parameters`; non-streaming responses return
`output.audio.url` (signed OSS URL) instead of inline `output.audio.data`
base64. The previous adapter sent v1-shaped bodies to a bare
`https://dashscope-intl.aliyuncs.com/api/v1` baseURL and parsed
`audio.data`, which 404'd before the migration and would 200-with-no-
audio after — both invisible regressions for the gateway.

Adapter changes:
- Rewrite request body to v2 schema (voice/format under input).
- Add follow-up GET against `output.audio.url`; stream into ArrayBuffer
  with a 25 MB hard cap and explicit drain-tracking finally, so a
  misbehaving URL cannot exhaust memory and a half-read body cannot
  hang a connection.
- Re-document baseURL contract: adapters do NOT append path; ops must
  configure the FULL endpoint URL (root cause of the original 404
  storm). DEFAULT_COSYVOICE_MODEL bumped to `cosyvoice-v2`, default
  voice to `longxiaochun_v2`.

Voice catalog: regenerated with 19 representative cosyvoice-v2 voices
(assistant / customer-service / child / en-US / en-GB / ja-JP / ko-KR)
so the frontend voice picker is no longer a 2-entry stub. Full catalog
(100+) remains on the Alibaba docs page — we'll sync on demand rather
than scrape.

Seed script: `--dashscope-region intl|cn` (default `intl`),
`--dashscope-upstream-model cosyvoice-v2`, baseURL now resolves to
`https://<host>/api/v1/services/audio/tts/SpeechSynthesizer` so a
mis-typed region or path cannot reintroduce the 404.

Tests: new dashscope-cosyvoice.test.ts covers v2 body shape (asserts
`parameters` absent — regression), audio.url follow-up fetch, 401
propagation with `.status`, empty-envelope falling back into the
router's recoverable-error path, and catalog freshness (no leftover v1
ids). Verified locally against the staging DashScope key: 200 +
playable mp3 end to end.
2026-05-16 17:22:13 +08:00
RainbowBird
fb0e149c4a
feat(server): finalize in-process LLM/TTS router cutover
Some checks are pending
Cloudflare Workers (server-dev) / Deploy - stage-web (server-dev) (push) Waiting to run
End-state of the multi-step KTD-5 / KTD-6 / U8 work. The knoway sidecar
is no longer reachable from server code; the router is required at boot
and now owns chat completions, TTS synthesis, and voice catalog listing.

Highlights:
- LLM_ROUTER_MASTER_KEY becomes required; app.ts drops the graceful-
  skip branch and the chat fallback fetch path is gone.
- /audio/speech and /audio/voices route through new routeTts /
  listTtsVoices entries that reuse the chat key-rotator + per-attempt
  timeout + abort propagation.
- DEFAULT_CHAT_MODEL / DEFAULT_TTS_MODEL move from env to configKV so
  default-model swaps are hot-reloadable via Pub/Sub.
- GATEWAY_BASE_URL removed from env schema, .env, .env.local, smoke,
  verification harness. Redis upstream-voices cache deleted — catalogs
  come from in-process adapter JSON.
- routeTts splits adapter error contract by ApiError statusCode:
  4xx propagates without fallback; 5xx folds into the network-failure
  fallback path. handleTTS wraps billing + span attribute in try/finally
  to plug a span leak when ttsMeter.accumulate() throws.
- seed-router-config.ts rewritten with --merge (default) / --reset /
  --dry-run modes and env-var key handoff (OPENROUTER_KEY / AZURE_KEY /
  DASHSCOPE_KEY) so prod seed flows never put plaintext on the CLI.
  Adds DashScope CosyVoice seeding.

Docs (CLAUDE.md, architecture-overview.md, transport-and-routes.md)
reflect the new boundary. verifications/llm-router.md replaces the
overstated "U1-U9 shipped" line with an evidence-vs-pending table.

Tests: full 40-file / 343-case server suite green. New regressions pin
ApiError 4xx → no-fallback, ApiError 5xx → fallback, TTS billing
failure → span closed and error propagated.
2026-05-16 03:21:24 +08:00
RainbowBird
a28dcedede
feat(server): llm & tts gateway (#1837) 2026-05-15 19:00:38 +08:00