## Problem
The WebGPU inference pipeline had several structural issues:
1. **No unified protocol** — Kokoro TTS, Whisper ASR, and Background
Removal workers each used their own ad-hoc message formats. Adding a new
model meant reinventing worker communication from scratch.
2. **Infrastructure existed but was disconnected** —
`GPUResourceCoordinator`, `LoadQueue`, `InferenceWorkerManager`, and
`protocol.ts` were all implemented but had zero consumers. The adapters
duplicated the same lifecycle/timeout/mutex patterns independently.
3. **Performance gaps** — Kokoro only offered fp32 on WebGPU (no fp16),
Whisper warm-up compiled shaders for 187.5s of dummy audio, audio
transfer went through unnecessary WAV blob encode/decode, and
`listVoices` reloaded the model every time.
4. **Silent failures** — Whisper worker's `generate()` had no try-catch;
errors were swallowed and the main thread waited until timeout.
5. **No graceful degradation** — Whisper and Background Removal workers
hardcoded `device: 'webgpu'` with no WASM fallback.
6. **No observability** — Only Kokoro had performance tracing. No
adapter reported status to `useInferenceStatus`. No cache management UI
existed.
7. **Dead code accumulation** — Old `KokoroWorkerManager` (232 lines),
legacy Whisper message types, and scattered duplicate constants.
## Changes
### Phase 0 — Critical Performance & Bugs
- Add `fp16-webgpu` dtype for Kokoro TTS (~2x inference speed on
supported GPUs)
- Fix Whisper warm-up tensor shape from `[1, 128, 3000]` to `[1, 128, 1]`
(warm-up still compiles the shaders, but against a minimal dummy input)
- Fix Whisper worker silent error bug (add try-catch to `generate()` and
`load()`)
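The silent-error fix can be sketched as follows. `safeGenerate` and the `post` callback are illustrative stand-ins for the worker's actual handler and `self.postMessage`; only the `inference-result`/`error` message names come from the PR's protocol:

```typescript
// Illustrative sketch of wrapping the worker's generate() in a try-catch
// so failures are reported to the main thread instead of swallowed.
type WorkerReply =
  | { type: 'inference-result', data: unknown }
  | { type: 'error', message: string }

async function safeGenerate(
  generate: () => Promise<unknown>,
  post: (msg: WorkerReply) => void,
): Promise<void> {
  try {
    const data = await generate()
    post({ type: 'inference-result', data })
  }
  catch (err) {
    // Previously this path did not exist: the error was swallowed and the
    // main thread waited until its timeout expired.
    post({ type: 'error', message: err instanceof Error ? err.message : String(err) })
  }
}
```

The same wrapping applies to `load()`, so load-time failures surface immediately as well.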
### Phase 1 — Data Transfer & Caching
- Switch Kokoro audio transfer to a transferable Float32Array (skip the
WAV blob encode/decode round-trip in the worker; do a lightweight WAV
encode on the main thread instead)
- Cache `listVoices` results (skip redundant model reload when adapter
state is `ready`)
- Normalize progress reporting to 0-100 across all adapters and report
the `warmup` phase distinctly from downloading
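The lightweight main-thread WAV encode might look like this minimal sketch. The function name and the mono/16-bit assumptions are illustrative; the 44-byte header layout follows the standard RIFF/WAVE format:

```typescript
// Wrap a raw Float32Array of samples (received zero-copy from the worker
// via the postMessage transfer list) in a 16-bit PCM mono WAV container.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const n = samples.length
  const buf = new ArrayBuffer(44 + n * 2)
  const view = new DataView(buf)
  const writeStr = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(off + i, s.charCodeAt(i))
  }
  writeStr(0, 'RIFF')
  view.setUint32(4, 36 + n * 2, true) // remaining file size
  writeStr(8, 'WAVE')
  writeStr(12, 'fmt ')
  view.setUint32(16, 16, true) // fmt chunk size
  view.setUint16(20, 1, true) // PCM
  view.setUint16(22, 1, true) // mono
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, sampleRate * 2, true) // byte rate
  view.setUint16(32, 2, true) // block align
  view.setUint16(34, 16, true) // bits per sample
  writeStr(36, 'data')
  view.setUint32(40, n * 2, true)
  for (let i = 0; i < n; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]))
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7FFF, true)
  }
  return buf
}
```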
### Phase 2 — Protocol Unification & Infrastructure
- Migrate all 3 workers + 3 adapters to unified `protocol.ts` message
types (`load-model`, `run-inference`, `model-ready`, `inference-result`,
`progress`, `error`)
- Wire `GPUResourceCoordinator` into all adapters (VRAM allocation
tracking, LRU ordering, memory pressure events)
- Wire `LoadQueue` into all adapters (priority-based sequential model
loading: TTS=10 > ASR=5 > BG_REMOVAL=1)
- Add `coordinator.ts` global singleton for GPU coordinator + load queue
- Add WebGPU detection + WASM fallback in Whisper and Background Removal
workers
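The unified messages could take a shape like the sketch below, using the message types listed above as a discriminated union; the payload fields are assumptions, not the actual `protocol.ts` schema:

```typescript
// Illustrative sketch of the unified worker protocol. Switching on `type`
// gives full type narrowing on both sides of the postMessage boundary.
type WorkerRequest =
  | { type: 'load-model', modelId: string, dtype?: string }
  | { type: 'run-inference', requestId: number, input: unknown }

type WorkerResponse =
  | { type: 'model-ready', modelId: string }
  | { type: 'inference-result', requestId: number, output: unknown }
  | { type: 'progress', phase: 'download' | 'warmup', percent: number } // normalized 0-100
  | { type: 'error', message: string }

// Runtime guard for messages arriving on the adapter side.
function isResponse(msg: unknown): msg is WorkerResponse {
  return typeof msg === 'object' && msg !== null
    && ['model-ready', 'inference-result', 'progress', 'error']
      .includes((msg as { type?: string }).type ?? '')
}
```

One union shared by all three workers is what makes adding a fourth model a matter of implementing handlers rather than inventing a new message format.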
### Phase 3 — Error Recovery & Observability
- Add restart logic with exponential backoff to Whisper adapter
(matching Kokoro's existing pattern)
- Integrate `classifyError()` (OOM / DEVICE_LOST / TIMEOUT
classification) in Whisper adapter
- Extend `defaultPerfTracer` to Whisper `transcribe()` and Background
Removal `processImage()`
- Wire `useInferenceStatus` into all 3 adapters (downloading → ready →
terminated lifecycle)
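The restart policy reduces to a pure delay function, sketched below. `MAX_RESTARTS = 3` matches the "max 3 attempts" stated later in this PR; the 1 s base delay is an assumption (the real values live in `constants.ts`):

```typescript
// Illustrative exponential-backoff schedule for worker restarts.
const MAX_RESTARTS = 3
const BASE_DELAY_MS = 1000

// attempt is 1-based; null means give up and surface the error.
function restartDelay(attempt: number): number | null {
  if (attempt > MAX_RESTARTS)
    return null
  return BASE_DELAY_MS * 2 ** (attempt - 1) // 1 s, 2 s, 4 s
}
```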
### Phase 4 — Tests
- Add unit tests for `AsyncMutex` (4 tests), `LoadQueue` (4 tests),
`GPUResourceCoordinator` (7 tests) — all 15 passing
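For context, a minimal `AsyncMutex` of the shape these tests exercise might look like this (illustrative only; the real implementation lives under `packages/stage-ui/src/libs/inference/`):

```typescript
// Serialize async critical sections by chaining them on a promise tail.
class AsyncMutex {
  private tail: Promise<void> = Promise.resolve()

  runExclusive<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn)
    // Advance the tail regardless of success so one rejected task
    // doesn't wedge every task queued after it.
    this.tail = result.then(() => undefined, () => undefined)
    return result
  }
}
```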
### Phase 5 — Cleanup & Features
- Delete old `KokoroWorkerManager` (232 lines, zero consumers)
- Delete orphaned `libs/workers/types.ts` (old Whisper message types)
- Clean up `workers/kokoro/types.ts` (remove legacy message types, keep
domain types)
- Create centralized `constants.ts` (MODEL_IDS, MODEL_NAMES, TIMEOUTS,
MAX_RESTARTS)
- Remove hardcoded WebGPU check from background-removal devtools pages
(worker auto-detects)
- Add `useModelPreload` composable for generic idle-time preloading
- Add `useInferencePreload` composable that reads provider config and
preloads configured local models
- Wire preloading into both `stage-web` and `stage-tamagotchi` App.vue
(Kokoro TTS preloads 3s after init)
- Add `ModelCacheManager.vue` settings component (cache size display,
per-model status, clear cache)
- Document GPU Device isolation architecture in protocol.ts
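The centralized `constants.ts` might look like the sketch below. The timeout values and constant names are placeholders; `MAX_RESTARTS = 3` and the 3 s preload delay are the values this PR describes:

```typescript
// Illustrative sketch of the centralized inference constants.
export const TIMEOUTS = {
  LOAD_MS: 120_000, // model download + shader compile budget (placeholder)
  INFERENCE_MS: 60_000, // single-request budget (placeholder)
} as const

export const MAX_RESTARTS = 3 // worker crash restarts before giving up
export const PRELOAD_DELAY_MS = 3_000 // Kokoro preloads 3 s after app init
```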
## After This PR
- All inference workers speak the same protocol → adding a new model
adapter is straightforward
- GPU memory is tracked across all models with automatic pressure
warnings at 80%/95% of VRAM budget
- Models load sequentially via priority queue → no bandwidth/VRAM
contention
- Workers auto-detect WebGPU and fall back to WASM → works on browsers
without WebGPU
- Kokoro TTS preloads during idle time → "instant" first use for
configured users
- All adapters auto-restart on worker crashes (max 3 attempts,
exponential backoff)
- 15 unit tests cover core infrastructure (mutex, queue, coordinator)
- Zero dead code remains in the inference pipeline
## Test Plan
- [x] `pnpm exec vitest run packages/stage-ui/src/libs/inference/` — 15
tests pass
- [x] `pnpm -F @proj-airi/stage-ui exec tsc --noEmit` — no TypeScript
errors
- [x] `pnpm lint:fix` — no lint errors in changed files
- [ ] Manual: verify Kokoro TTS works with fp16-webgpu on a supported
browser
- [ ] Manual: verify Whisper ASR loads and transcribes correctly
- [ ] Manual: verify Background Removal works in devtools page
- [ ] Manual: verify preloading triggers in console (`[Preload] Loading
kokoro-...`)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* feat(unocss): centralize font configuration and fix cyrillic font handling (#389)
This commit refactors the UnoCSS font configuration to establish a single source of truth and resolve issues with Cyrillic character rendering.
- Centralizes all font definitions within the root uno.config.ts.
- Updates the `theme.fontFamily` to use a correct fallback chain, ensuring "Comfortaa" is used for Cyrillic characters.
- Restricts the "Kiwi Maru" font to Latin and Japanese character subsets to prevent incorrect application.
- Simplifies application-specific uno.config.ts files by removing redundant local font overrides.
* fix(ui): ensure provider models reload reactively (#407)
This commit resolves a persistent reactivity issue where the model list would not update automatically after changing provider credentials. This was a recurring problem reported by users on Discord.
Previously, the model list was only fetched when the consciousness settings page was first mounted. This meant that if a user updated an API key and navigated back, the page would still display the old, stale model list until the application was fully restarted.
The fix introduces two key changes:
1. A `watch` effect has been added to the `providers.ts` store. It now monitors `providerCredentials` and automatically refetches the model list for a specific provider whenever its credentials change.
2. The `consciousness.vue` component now uses the `onActivated` lifecycle hook in addition to `onMounted`. This ensures that the model list is refreshed every time the component becomes active, such as when navigating back to it. A `watch` on the `activeProvider` was also added to handle provider switching within the page.
These changes ensure that the model list is always synchronized with the current provider configuration, providing a much smoother and more intuitive user experience.
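Stripped of Vue specifics, the refetch-on-credential-change behaviour reduces to the dependency-free sketch below (the real fix uses a `watch` in the providers.ts store; all names here are illustrative):

```typescript
// Minimal model of the store-side fix: a credential change for a provider
// invalidates its cached model list and refetches for that provider only.
type Fetcher = (providerId: string) => Promise<string[]>

class ProviderModels {
  private credentials = new Map<string, string>()
  models = new Map<string, string[]>()

  constructor(private fetchModels: Fetcher) {}

  async setCredential(providerId: string, apiKey: string): Promise<void> {
    const changed = this.credentials.get(providerId) !== apiKey
    this.credentials.set(providerId, apiKey)
    if (changed)
      this.models.set(providerId, await this.fetchModels(providerId))
  }
}
```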
Closes #407
---------
Co-authored-by: Neko <neko@ayaka.moe>