mirror of https://github.com/moeru-ai/airi.git (synced 2026-04-26 13:40:42 +00:00)
## Problem

The WebGPU inference pipeline had several structural issues:

1. **No unified protocol** — Kokoro TTS, Whisper ASR, and Background Removal workers each used their own ad-hoc message formats. Adding a new model meant reinventing worker communication from scratch.
2. **Infrastructure existed but was disconnected** — `GPUResourceCoordinator`, `LoadQueue`, `InferenceWorkerManager`, and `protocol.ts` were all implemented but had zero consumers. The adapters duplicated the same lifecycle/timeout/mutex patterns independently.
3. **Performance gaps** — Kokoro only offered fp32 on WebGPU (no fp16), Whisper warm-up compiled shaders for 187.5s of dummy audio, audio transfer went through unnecessary WAV blob encode/decode, and `listVoices` reloaded the model every time.
4. **Silent failures** — Whisper worker's `generate()` had no try-catch; errors were swallowed and the main thread waited until timeout.
5. **No graceful degradation** — Whisper and Background Removal workers hardcoded `device: 'webgpu'` with no WASM fallback.
6. **No observability** — Only Kokoro had performance tracing. No adapter reported status to `useInferenceStatus`. No cache management UI existed.
7. **Dead code accumulation** — Old `KokoroWorkerManager` (232 lines), legacy Whisper message types, and scattered duplicate constants.
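For illustration, a unified worker protocol along the lines this PR describes can be sketched as a discriminated union. The message type names (`load-model`, `run-inference`, `model-ready`, `inference-result`, `progress`, `error`) come from the PR description; the payload shapes and the `describe` helper are purely hypothetical, not the repo's actual `protocol.ts`:

```typescript
// Hypothetical sketch of a unified worker protocol. Only the message
// type strings come from the PR description; payload fields are assumed.
type WorkerRequest =
  | { type: 'load-model', modelId: string, dtype?: 'fp32' | 'fp16' }
  | { type: 'run-inference', requestId: string, payload: unknown }

type WorkerResponse =
  | { type: 'model-ready', modelId: string }
  | { type: 'progress', phase: 'download' | 'warmup', percent: number } // normalized 0-100
  | { type: 'inference-result', requestId: string, payload: unknown }
  | { type: 'error', message: string }

// With one discriminated union, every adapter can share a single
// response handler instead of maintaining ad-hoc per-worker formats:
function describe(msg: WorkerResponse): string {
  switch (msg.type) {
    case 'model-ready': return `ready: ${msg.modelId}`
    case 'progress': return `${msg.phase} ${msg.percent}%`
    case 'inference-result': return `result for ${msg.requestId}`
    case 'error': return `error: ${msg.message}`
  }
}
```

The discriminated union also lets TypeScript verify exhaustiveness: adding a new message type produces compile errors at every handler that does not yet cover it.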
## Changes

### Phase 0 — Critical Performance & Bugs

- Add `fp16-webgpu` dtype for Kokoro TTS (~2x inference speed on supported GPUs)
- Fix Whisper warm-up tensor from `[1, 128, 3000]` → `[1, 128, 1]` (minimal shader compilation)
- Fix Whisper worker silent error bug (add try-catch to `generate()` and `load()`)

### Phase 1 — Data Transfer & Caching

- Switch Kokoro audio to Float32Array transferable (skip WAV blob encode in worker, lightweight WAV encode on main thread)
- Cache `listVoices` results (skip redundant model reload when adapter state is `ready`)
- Normalize progress reporting to 0-100 across all adapters, differentiate `warmup` phase

### Phase 2 — Protocol Unification & Infrastructure

- Migrate all 3 workers + 3 adapters to unified `protocol.ts` message types (`load-model`, `run-inference`, `model-ready`, `inference-result`, `progress`, `error`)
- Wire `GPUResourceCoordinator` into all adapters (VRAM allocation tracking, LRU ordering, memory pressure events)
- Wire `LoadQueue` into all adapters (priority-based sequential model loading: TTS=10 > ASR=5 > BG_REMOVAL=1)
- Add `coordinator.ts` global singleton for GPU coordinator + load queue
- Add WebGPU detection + WASM fallback in Whisper and Background Removal workers

### Phase 3 — Error Recovery & Observability

- Add restart logic with exponential backoff to Whisper adapter (matching Kokoro's existing pattern)
- Integrate `classifyError()` (OOM / DEVICE_LOST / TIMEOUT classification) in Whisper adapter
- Extend `defaultPerfTracer` to Whisper `transcribe()` and Background Removal `processImage()`
- Wire `useInferenceStatus` into all 3 adapters (downloading → ready → terminated lifecycle)

### Phase 4 — Tests

- Add unit tests for `AsyncMutex` (4 tests), `LoadQueue` (4 tests), `GPUResourceCoordinator` (7 tests) — all 15 passing

### Phase 5 — Cleanup & Features

- Delete old `KokoroWorkerManager` (232 lines, zero consumers)
- Delete orphaned `libs/workers/types.ts` (old Whisper message types)
- Clean up `workers/kokoro/types.ts` (remove legacy message types, keep domain types)
- Create centralized `constants.ts` (MODEL_IDS, MODEL_NAMES, TIMEOUTS, MAX_RESTARTS)
- Remove hardcoded WebGPU check from background-removal devtools pages (worker auto-detects)
- Add `useModelPreload` composable for generic idle-time preloading
- Add `useInferencePreload` composable that reads provider config and preloads configured local models
- Wire preloading into both `stage-web` and `stage-tamagotchi` App.vue (Kokoro TTS preloads 3s after init)
- Add `ModelCacheManager.vue` settings component (cache size display, per-model status, clear cache)
- Document GPU Device isolation architecture in protocol.ts

## After This PR

- All inference workers speak the same protocol → adding a new model adapter is straightforward
- GPU memory is tracked across all models with automatic pressure warnings at 80%/95% of VRAM budget
- Models load sequentially via priority queue → no bandwidth/VRAM contention
- Workers auto-detect WebGPU and fall back to WASM → works on browsers without WebGPU
- Kokoro TTS preloads during idle time → "instant" first use for configured users
- All adapters auto-restart on worker crashes (max 3 attempts, exponential backoff)
- 15 unit tests cover core infrastructure (mutex, queue, coordinator)
- Zero dead code remains in the inference pipeline

## Test Plan

- [x] `pnpm exec vitest run packages/stage-ui/src/libs/inference/` — 15 tests pass
- [x] `pnpm -F @proj-airi/stage-ui exec tsc --noEmit` — no TypeScript errors
- [x] `pnpm lint:fix` — no lint errors in changed files
- [ ] Manual: verify Kokoro TTS works with fp16-webgpu on a supported browser
- [ ] Manual: verify Whisper ASR loads and transcribes correctly
- [ ] Manual: verify Background Removal works in devtools page
- [ ] Manual: verify preloading triggers in console (`[Preload] Loading kokoro-...`)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
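The priority-based sequential loading described in Phase 2 (TTS=10 > ASR=5 > BG_REMOVAL=1) can be sketched as a small queue that runs one loader at a time, highest priority first. This `SimpleLoadQueue` is an illustrative sketch under those assumptions, not the repo's actual `LoadQueue`:

```typescript
// Illustrative sketch of priority-based sequential model loading.
// The priorities (TTS=10 > ASR=5 > BG_REMOVAL=1) come from the PR
// description; everything else here is assumed.
interface QueueEntry {
  priority: number
  run: () => Promise<void>
}

class SimpleLoadQueue {
  private entries: QueueEntry[] = []

  enqueue(priority: number, run: () => Promise<void>): void {
    this.entries.push({ priority, run })
  }

  // Load queued models strictly one at a time, highest priority first,
  // so concurrent downloads never contend for bandwidth or VRAM.
  async drain(): Promise<void> {
    this.entries.sort((a, b) => b.priority - a.priority)
    while (this.entries.length > 0) {
      const next = this.entries.shift()!
      await next.run()
    }
  }
}
```

Usage: enqueue loads in any order, then drain; the TTS loader (priority 10) runs before ASR (5), which runs before background removal (1).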
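The WebGPU detection with WASM fallback added to the Whisper and Background Removal workers presumably amounts to probing `navigator.gpu` before choosing a device. A minimal sketch (the `pickDevice` helper and its injected `nav` parameter are illustrative; only `navigator.gpu.requestAdapter()` is a real browser API):

```typescript
// Illustrative WebGPU probe with WASM fallback; the workers in this
// PR may implement the check differently.
type InferenceDevice = 'webgpu' | 'wasm'

interface GpuLike {
  gpu?: { requestAdapter: () => Promise<unknown | null> }
}

async function pickDevice(nav: GpuLike): Promise<InferenceDevice> {
  // `navigator.gpu` is simply absent on browsers without WebGPU.
  if (nav.gpu == null)
    return 'wasm'
  try {
    // Even when the API exists, adapter acquisition can still fail or
    // return null (e.g. blocklisted GPU), so probe before committing.
    const adapter = await nav.gpu.requestAdapter()
    return adapter != null ? 'webgpu' : 'wasm'
  }
  catch {
    return 'wasm'
  }
}
```

Inside a worker this would be called once at load time, replacing a hardcoded `device: 'webgpu'` with the probed value.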
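The auto-restart behavior (max 3 attempts, exponential backoff) can be sketched as a retry wrapper around a worker restart function. The attempt cap mirrors the PR description; the base delay and the `sleep` injection are assumptions for illustration:

```typescript
// Illustrative restart helper with exponential backoff. "Max 3
// attempts" comes from the PR description; delay values are assumed.
async function restartWithBackoff(
  restart: () => Promise<void>,
  maxRestarts = 3,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = ms => new Promise(r => setTimeout(r, ms)),
): Promise<void> {
  let lastError: unknown
  for (let attempt = 0; attempt < maxRestarts; attempt++) {
    try {
      await restart()
      return // worker came back up
    }
    catch (error) {
      lastError = error
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await sleep(baseDelayMs * 2 ** attempt)
    }
  }
  throw lastError // give up after maxRestarts attempts
}
```

Injecting `sleep` keeps the helper unit-testable without real timers, which is likely why the PR's infrastructure tests can cover this kind of logic quickly.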
36 lines
1.4 KiB
TypeScript
/// <reference types="vite/client" />

import type { PostHogConfig } from 'posthog-js'

function isEnvFlagEnabled(value: string | undefined): boolean {
  if (value == null)
    return false

  return /^(?:1|true|t|yes|y|on)$/i.test(value.trim())
}

// For Release workflows set `VITE_ENABLE_POSTHOG=true`.
export const POSTHOG_ENABLED = isEnvFlagEnabled(import.meta.env.VITE_ENABLE_POSTHOG)

export const POSTHOG_PROJECT_KEY_WEB
  = import.meta.env.VITE_POSTHOG_PROJECT_KEY_WEB
    ?? 'phc_pzjziJjrVZpa9SqnQqq0QEKvkmuCPH7GDTA6TbRTEf9' // cspell:disable-line

export const POSTHOG_PROJECT_KEY_DESKTOP
  = import.meta.env.VITE_POSTHOG_PROJECT_KEY_DESKTOP
    ?? 'phc_rljw376z5gt6vXJlc3sTr7hFbXodciY9THEQXIRnW53' // cspell:disable-line

// FIXME: Using the same key for 'web' for now.
export const POSTHOG_PROJECT_KEY_POCKET
  = import.meta.env.VITE_POSTHOG_PROJECT_KEY_POCKET
    ?? 'phc_pzjziJjrVZpa9SqnQqq0QEKvkmuCPH7GDTA6TbRTEf9' // cspell:disable-line

// FIXME: Using the same key for 'web' for now.
export const POSTHOG_PROJECT_KEY_DOCS
  = import.meta.env.VITE_POSTHOG_PROJECT_KEY_DOCS
    ?? 'phc_pzjziJjrVZpa9SqnQqq0QEKvkmuCPH7GDTA6TbRTEf9' // cspell:disable-line

export const DEFAULT_POSTHOG_CONFIG = {
  api_host: 'https://us.i.posthog.com',
  person_profiles: 'identified_only', // or 'always' to create profiles for anonymous users as well
} as const satisfies Partial<PostHogConfig>
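Since `import.meta.env` only exists inside a Vite build, the flag-parsing logic is easiest to check via a standalone copy. This reproduces `isEnvFlagEnabled` verbatim so its accepted spellings can be verified outside Vite:

```typescript
// Standalone copy of this module's flag parser, runnable without Vite.
// Accepted values (case-insensitive, whitespace-trimmed):
// 1, true, t, yes, y, on. Everything else, including undefined, is off.
function isEnvFlagEnabled(value: string | undefined): boolean {
  if (value == null)
    return false

  return /^(?:1|true|t|yes|y|on)$/i.test(value.trim())
}
```

So `VITE_ENABLE_POSTHOG=true`, `=1`, or even `= On ` all enable PostHog, while an unset variable, `=0`, or `=false` leave it disabled.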