Mirror of https://github.com/moeru-ai/airi.git
Synced 2026-05-17 04:20:26 +00:00
Why:
- Add a real bidirectional streaming TTS path: raw LLM tokens are
forwarded to the upstream model (Volcengine v3 via the unspeech ws
bridge) without client-side segmentation, so the model owns sentence
splitting and audio chunks play as they arrive.
- Move audio endpoints out of /api/v1/openai/. `/audio/voices`,
`/audio/models`, `/audio/voices/streaming` are not real OpenAI public
APIs, and the streaming TTS surface has nothing to do with OpenAI —
keeping them under /openai/ mislabelled the contract.
- Introduce `capabilities.speech.transport` on ProviderDefinition so
future streaming providers (ElevenLabs / Cartesia / OpenAI Realtime)
opt in without touching Stage.vue or the session factory.
- Unify Stage.vue's TTS path through a single StageTtsSession so the
chat-orchestrator hooks no longer branch on provider id.
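The transport-based opt-in described above could look roughly like the sketch below. Only `ProviderDefinition`, `StageTtsSession`, `capabilities.speech.transport`, and the `'bidirectional-ws'` value come from this commit message; the `'segmented'` value, the `kind` field, and the `makeAdapter` / `createTtsSession` helpers are hypothetical names used for illustration, and the real types in stage-ui may differ.

```typescript
// Assumed transport values: 'bidirectional-ws' is from the commit message,
// 'segmented' is a hypothetical label for the existing client-side path.
type SpeechTransport = 'segmented' | 'bidirectional-ws'

interface ProviderDefinition {
  id: string
  capabilities?: { speech?: { transport?: SpeechTransport } }
}

// Hypothetical minimal session surface; `kind` exists only to make the
// dispatch result observable in this sketch.
interface StageTtsSession {
  readonly kind: 'segmenter' | 'streaming'
  appendText: (text: string) => void
  finish: () => void
  cancel: () => void
}

// Stand-in for the real segmenter and streaming adapters.
function makeAdapter(kind: StageTtsSession['kind']): StageTtsSession {
  return { kind, appendText: () => {}, finish: () => {}, cancel: () => {} }
}

// Dispatch on the declared capability, never on the provider id.
function createTtsSession(provider: ProviderDefinition): StageTtsSession {
  const transport = provider.capabilities?.speech?.transport ?? 'segmented'
  if (transport === 'bidirectional-ws')
    return makeAdapter('streaming')
  return makeAdapter('segmenter')
}
```

Under this shape, a new streaming provider would only need to declare the capability in its definition; callers such as Stage.vue would keep talking to the same session interface.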
What:
- apps/server: new ws proxy /api/v1/audio/speech/ws bridges client ↔
unspeech with auth, pre-flight flux check, billing from upstream
session.finished.usage, OTel spans.
- apps/server: audio routes moved from /api/v1/openai/audio/* to
/api/v1/audio/* (hard cutover; 404 sentinel tests added).
- apps/server: new /api/v1/audio/voices/streaming proxy reads voices
from unspeech /api/voices?provider=volcengine.
- apps/server: new STREAMING_TTS_UPSTREAM configKV entry +
scripts/seed-streaming-tts.ts.
- stage-ui: new libs/speech/streaming-pipeline.ts opens one ws per LLM
intent (appendText / finish / cancel + onSentence / onError / onDone).
- stage-ui: new libs/speech/tts-session.ts — StageTtsSession interface
with segmenter and streaming adapters; factory dispatches by
capabilities.speech.transport instead of hard-coded provider id.
- stage-ui: providerOfficialSpeechStreaming with capabilities.speech =
{ transport: 'bidirectional-ws' }; settings page with model/voice
picker + ws-based preview.
- stage-ui: Stage.vue chat hooks collapsed to a single currentSession;
hot-swap watcher cancels mid-session on provider/voice/model change;
unmount cancels and drains playback.
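As a rough illustration of the streaming-pipeline surface listed above (appendText / finish / cancel plus onSentence / onError / onDone), here is a minimal sketch. The JSON frame shapes (`text`, `finish`, `sentence`, `error`, `done`), the `SocketLike` interface, the `handleMessage` entry point, and the string payload of `onSentence` are all assumptions made so the example runs without a network; they are not taken from streaming-pipeline.ts or the unspeech protocol.

```typescript
// Hypothetical socket abstraction so the sketch can run without a real WebSocket.
interface SocketLike {
  send: (data: string) => void
  close: () => void
}

interface PipelineCallbacks {
  onSentence?: (text: string) => void // payload type is an assumption
  onError?: (error: Error) => void
  onDone?: () => void
}

// One pipeline instance wraps one socket, i.e. one LLM intent.
function createStreamingPipeline(socket: SocketLike, callbacks: PipelineCallbacks) {
  let state: 'open' | 'finishing' | 'closed' = 'open'

  const close = () => {
    state = 'closed'
    socket.close()
  }

  return {
    // Forward raw LLM tokens upstream without client-side segmentation.
    appendText(text: string) {
      if (state !== 'open')
        return
      socket.send(JSON.stringify({ type: 'text', text }))
    },
    // Signal end of input; upstream may still deliver remaining audio.
    finish() {
      if (state !== 'open')
        return
      state = 'finishing'
      socket.send(JSON.stringify({ type: 'finish' }))
    },
    // Abort immediately; later upstream messages are ignored.
    cancel() {
      if (state !== 'closed')
        close()
    },
    // Feed one upstream frame into the pipeline (assumed JSON frames).
    handleMessage(raw: string) {
      if (state === 'closed')
        return
      const msg = JSON.parse(raw) as { type: string, text?: string, message?: string }
      if (msg.type === 'sentence') {
        callbacks.onSentence?.(msg.text ?? '')
      }
      else if (msg.type === 'error') {
        close()
        callbacks.onError?.(new Error(msg.message ?? 'upstream error'))
      }
      else if (msg.type === 'done') {
        close()
        callbacks.onDone?.()
      }
    },
  }
}
```

Keeping the state machine explicit makes the hot-swap and unmount cases simple in this sketch: `cancel()` is idempotent, and anything arriving after it is dropped.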
Tests:
- 9 streaming-pipeline tests (happy path / buffered / error / cancel /
truncation)
- 11 tts-session tests (factory branch coverage + adapter contracts)
- 4 audio-speech-ws route tests (forwarding / billing / pre-flight /
config-missing)
- 3 legacy-path 404 sentinels in v1 route tests
- Verification doc updated to reflect automated coverage.
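The idea behind the legacy-path 404 sentinels can be sketched as a pure lookup, independent of the actual server framework. The four paths below are the ones named in this commit message; the `AUDIO_ROUTES` set and the `statusFor` helper are hypothetical and do not reflect how the v1 route tests are actually written.

```typescript
// Audio routes named in the commit message, after the move out of /openai/.
const AUDIO_ROUTES = new Set<string>([
  '/api/v1/audio/voices',
  '/api/v1/audio/models',
  '/api/v1/audio/voices/streaming',
  '/api/v1/audio/speech/ws',
])

// Hypothetical resolver: known routes answer, everything else is a 404.
function statusFor(path: string): 200 | 404 {
  return AUDIO_ROUTES.has(path) ? 200 : 404
}

// Sentinel-style checks: the legacy prefix must no longer resolve (hard cutover).
const legacyPaths = [
  '/api/v1/openai/audio/voices',
  '/api/v1/openai/audio/models',
  '/api/v1/openai/audio/voices/streaming',
]
const legacyStillServed = legacyPaths.filter(path => statusFor(path) !== 404)
```

A sentinel of this kind fails loudly if someone later re-adds a compatibility alias under the old prefix.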
| Name |
|---|
| airi-screenshot |
| audio |
| audio-pipelines-transcribe |
| cap-vite |
| ccc |
| core-agent |
| core-character |
| drizzle-duckdb-wasm |
| duckdb-wasm |
| electron-eventa |
| electron-screen-capture |
| electron-vueuse |
| font-chillroundm |
| font-cjkfonts-allseto |
| font-departure-mono |
| font-xiaolai |
| i18n |
| memory-pgvector |
| model-driver-lipsync |
| model-driver-mediapipe |
| pipelines-audio |
| plugin-protocol |
| plugin-sdk |
| plugin-sdk-tamagotchi |
| scenarios-stage-tamagotchi-browser |
| scenarios-stage-tamagotchi-electron |
| server-runtime |
| server-schema |
| server-sdk |
| server-sdk-shared |
| server-shared |
| stage-layouts |
| stage-pages |
| stage-shared |
| stage-ui |
| stage-ui-live2d |
| stage-ui-three |
| stream-kit |
| ui |
| ui-loading-screens |
| ui-transitions |
| unocss-preset-fonts |
| vishot-runner-browser |
| vishot-runner-electron |
| vishot-runtime |