airi/posthog.config.ts
NJX 147d077aa7
feat(inference): unify and optimize WebGPU inference pipeline (#1622)
## Problem

The WebGPU inference pipeline had several structural issues:

1. **No unified protocol** — Kokoro TTS, Whisper ASR, and Background
Removal workers each used their own ad-hoc message formats. Adding a new
model meant reinventing worker communication from scratch.
2. **Infrastructure existed but was disconnected** —
`GPUResourceCoordinator`, `LoadQueue`, `InferenceWorkerManager`, and
`protocol.ts` were all implemented but had zero consumers. The adapters
duplicated the same lifecycle/timeout/mutex patterns independently.
3. **Performance gaps** — Kokoro only offered fp32 on WebGPU (no fp16),
Whisper warm-up compiled shaders for 187.5s of dummy audio, audio
transfer went through unnecessary WAV blob encode/decode, and
`listVoices` reloaded the model every time.
4. **Silent failures** — Whisper worker's `generate()` had no try-catch;
errors were swallowed and the main thread waited until timeout.
5. **No graceful degradation** — Whisper and Background Removal workers
hardcoded `device: 'webgpu'` with no WASM fallback.
6. **No observability** — Only Kokoro had performance tracing. No
adapter reported status to `useInferenceStatus`. No cache management UI
existed.
7. **Dead code accumulation** — Old `KokoroWorkerManager` (232 lines),
legacy Whisper message types, and scattered duplicate constants.

## Changes

### Phase 0 — Critical Performance & Bugs
- Add `fp16-webgpu` dtype for Kokoro TTS (~2x inference speed on
supported GPUs)
- Fix Whisper warm-up tensor from `[1, 128, 3000]` → `[1, 128, 1]`
(minimal shader compilation)
- Fix Whisper worker silent error bug (add try-catch to `generate()` and
`load()`)
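The silent-error fix boils down to never letting a rejection die inside the worker. The sketch below shows the shape of that fix, not the worker's actual code: `runReporting` and the message fields are illustrative, standing in for the try-catch added around `generate()` and `load()`.

```typescript
// Result shape mirroring the unified protocol's success/error split.
type WorkerResult<T>
  = | { type: 'inference-result', data: T }
    | { type: 'error', message: string }

// Wrap a worker operation so a thrown error becomes a structured message
// (previously the rejection was swallowed and the main thread hit its timeout).
async function runReporting<T>(op: () => Promise<T>): Promise<WorkerResult<T>> {
  try {
    return { type: 'inference-result', data: await op() }
  }
  catch (err) {
    return { type: 'error', message: err instanceof Error ? err.message : String(err) }
  }
}
```

In the real worker the returned object would be handed to `postMessage`, so the adapter can reject its pending promise immediately instead of waiting out the timeout.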

### Phase 1 — Data Transfer & Caching
- Switch Kokoro audio to Float32Array transferable (skip WAV blob encode
in worker, lightweight WAV encode on main thread)
- Cache `listVoices` results (skip redundant model reload when adapter
state is `ready`)
- Normalize progress reporting to 0-100 across all adapters,
differentiate `warmup` phase
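To make the transfer change concrete, here is a minimal main-thread WAV wrap for mono 16-bit PCM, assuming the worker posts raw samples as a transferable Float32Array. This is a sketch of the "lightweight WAV encode on main thread" idea, not the code the PR ships.

```typescript
// Wrap raw mono PCM samples in a 44-byte WAV header (16-bit little-endian).
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const dataLength = samples.length * 2 // 16-bit PCM
  const buffer = new ArrayBuffer(44 + dataLength)
  const view = new DataView(buffer)
  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++)
      view.setUint8(offset + i, s.charCodeAt(i))
  }
  writeString(0, 'RIFF')
  view.setUint32(4, 36 + dataLength, true)
  writeString(8, 'WAVE')
  writeString(12, 'fmt ')
  view.setUint32(16, 16, true) // fmt chunk size
  view.setUint16(20, 1, true) // PCM format
  view.setUint16(22, 1, true) // mono
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, sampleRate * 2, true) // byte rate
  view.setUint16(32, 2, true) // block align
  view.setUint16(34, 16, true) // bits per sample
  writeString(36, 'data')
  view.setUint32(40, dataLength, true)
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]))
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7FFF, true)
  }
  return buffer
}
```

The worker side then only does `postMessage({ samples }, [samples.buffer])`, moving the buffer instead of copying or blob-encoding it.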

### Phase 2 — Protocol Unification & Infrastructure
- Migrate all 3 workers + 3 adapters to unified `protocol.ts` message
types (`load-model`, `run-inference`, `model-ready`, `inference-result`,
`progress`, `error`)
- Wire `GPUResourceCoordinator` into all adapters (VRAM allocation
tracking, LRU ordering, memory pressure events)
- Wire `LoadQueue` into all adapters (priority-based sequential model
loading: TTS=10 > ASR=5 > BG_REMOVAL=1)
- Add `coordinator.ts` global singleton for GPU coordinator + load queue
- Add WebGPU detection + WASM fallback in Whisper and Background Removal
workers
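The unified message types above can be sketched as a discriminated union plus a guard for narrowing untyped `MessageEvent.data`. The `type` tags come from the PR; the other field names (`requestId`, `payload`, etc.) are assumptions, as the exact shapes live in `protocol.ts`.

```typescript
// Hypothetical field shapes for the six protocol message types.
type WorkerMessage
  = | { type: 'load-model', modelId: string }
    | { type: 'run-inference', requestId: string, payload: unknown }
    | { type: 'model-ready' }
    | { type: 'inference-result', requestId: string, result: unknown }
    | { type: 'progress', phase: 'download' | 'warmup', progress: number } // 0-100
    | { type: 'error', message: string }

const WORKER_MESSAGE_TYPES = new Set([
  'load-model',
  'run-inference',
  'model-ready',
  'inference-result',
  'progress',
  'error',
])

// Narrow an untyped MessageEvent.data before dispatching on `type`.
function isWorkerMessage(value: unknown): value is WorkerMessage {
  return typeof value === 'object' && value !== null
    && WORKER_MESSAGE_TYPES.has((value as { type?: string }).type ?? '')
}
```

With every worker speaking this union, one adapter-side `switch (msg.type)` handles lifecycle, progress, and errors uniformly.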

### Phase 3 — Error Recovery & Observability
- Add restart logic with exponential backoff to Whisper adapter
(matching Kokoro's existing pattern)
- Integrate `classifyError()` (OOM / DEVICE_LOST / TIMEOUT
classification) in Whisper adapter
- Extend `defaultPerfTracer` to Whisper `transcribe()` and Background
Removal `processImage()`
- Wire `useInferenceStatus` into all 3 adapters (downloading → ready →
terminated lifecycle)
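The error-recovery pieces fit together roughly as below. The three classification buckets are from the PR; the matching heuristics and the backoff schedule are illustrative assumptions, not the adapter's exact code or constants.

```typescript
type InferenceErrorKind = 'OOM' | 'DEVICE_LOST' | 'TIMEOUT' | 'UNKNOWN'

// Bucket a worker error message so the adapter can decide whether a
// restart is worthwhile (e.g. OOM may want a smaller dtype, not a retry).
function classifyError(message: string): InferenceErrorKind {
  if (/out of memory|oom/i.test(message))
    return 'OOM'
  if (/device.*lost/i.test(message))
    return 'DEVICE_LOST'
  if (/timed? ?out/i.test(message))
    return 'TIMEOUT'
  return 'UNKNOWN'
}

// Exponential backoff between restart attempts (capped by MAX_RESTARTS = 3).
function restartDelayMs(attempt: number, baseMs = 1000): number {
  return baseMs * 2 ** attempt // attempts 0, 1, 2 -> 1s, 2s, 4s
}
```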

### Phase 4 — Tests
- Add unit tests for `AsyncMutex` (4 tests), `LoadQueue` (4 tests),
`GPUResourceCoordinator` (7 tests) — all 15 passing
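For context on what those tests exercise, an `AsyncMutex` of the kind used here can be written as a promise chain; this is a minimal sketch of the pattern, not necessarily the repo's implementation.

```typescript
class AsyncMutex {
  private tail: Promise<unknown> = Promise.resolve()

  // Queue `fn` behind all previously queued work. A rejected task must not
  // wedge the queue, so the stored tail swallows errors while the returned
  // promise still propagates them to the caller.
  runExclusive<T>(fn: () => Promise<T> | T): Promise<T> {
    const result = this.tail.then(fn)
    this.tail = result.catch(() => {})
    return result
  }
}
```

The unit tests for this kind of primitive typically assert FIFO ordering, error propagation, and that a throwing task does not block the next one.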

### Phase 5 — Cleanup & Features
- Delete old `KokoroWorkerManager` (232 lines, zero consumers)
- Delete orphaned `libs/workers/types.ts` (old Whisper message types)
- Clean up `workers/kokoro/types.ts` (remove legacy message types, keep
domain types)
- Create centralized `constants.ts` (MODEL_IDS, MODEL_NAMES, TIMEOUTS,
MAX_RESTARTS)
- Remove hardcoded WebGPU check from background-removal devtools pages
(worker auto-detects)
- Add `useModelPreload` composable for generic idle-time preloading
- Add `useInferencePreload` composable that reads provider config and
preloads configured local models
- Wire preloading into both `stage-web` and `stage-tamagotchi` App.vue
(Kokoro TTS preloads 3s after init)
- Add `ModelCacheManager.vue` settings component (cache size display,
per-model status, clear cache)
- Document GPU Device isolation architecture in protocol.ts
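The scheduling idea behind the preload composables can be reduced to "delay, then yield to idle time". The sketch below strips out the Vue wiring; `schedulePreload` and its signature are hypothetical, and the 3s default mirrors the App.vue delay mentioned above.

```typescript
// Run `load` after `delayMs`, deferring to requestIdleCallback when the
// environment provides it; returns a cancel function for unmount/teardown.
function schedulePreload(load: () => Promise<void>, delayMs = 3000): () => void {
  const timer = setTimeout(() => {
    const ric = (globalThis as { requestIdleCallback?: (cb: () => void) => void }).requestIdleCallback
    const run = () => { void load() }
    if (ric)
      ric(run)
    else
      run()
  }, delayMs)
  return () => clearTimeout(timer)
}
```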

## After This PR

- All inference workers speak the same protocol → adding a new model
adapter is straightforward
- GPU memory is tracked across all models with automatic pressure
warnings at 80%/95% of VRAM budget
- Models load sequentially via priority queue → no bandwidth/VRAM
contention
- Workers auto-detect WebGPU and fall back to WASM → works on browsers
without WebGPU
- Kokoro TTS preloads during idle time → "instant" first use for
configured users
- All adapters auto-restart on worker crashes (max 3 attempts,
exponential backoff)
- 15 unit tests cover core infrastructure (mutex, queue, coordinator)
- Zero dead code remains in the inference pipeline
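The 80%/95% thresholds are the ones stated above; the function below is only an illustrative sketch of that classification, not `GPUResourceCoordinator`'s actual API.

```typescript
// Map current VRAM usage against the budget to a pressure level:
// below 80% is fine, 80-95% warns, at or above 95% is critical.
function pressureLevel(allocatedBytes: number, budgetBytes: number): 'ok' | 'warning' | 'critical' {
  const ratio = allocatedBytes / budgetBytes
  if (ratio >= 0.95)
    return 'critical'
  if (ratio >= 0.80)
    return 'warning'
  return 'ok'
}
```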

## Test Plan

- [x] `pnpm exec vitest run packages/stage-ui/src/libs/inference/` — 15
tests pass
- [x] `pnpm -F @proj-airi/stage-ui exec tsc --noEmit` — no TypeScript
errors
- [x] `pnpm lint:fix` — no lint errors in changed files
- [ ] Manual: verify Kokoro TTS works with fp16-webgpu on a supported
browser
- [ ] Manual: verify Whisper ASR loads and transcribes correctly
- [ ] Manual: verify Background Removal works in devtools page
- [ ] Manual: verify preloading triggers in console (`[Preload] Loading
kokoro-...`)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
2026-04-13 23:19:27 +08:00


/// <reference types="vite/client" />

import type { PostHogConfig } from 'posthog-js'

function isEnvFlagEnabled(value: string | undefined): boolean {
  if (value == null)
    return false

  return /^(?:1|true|t|yes|y|on)$/i.test(value.trim())
}

// For Release workflows set `VITE_ENABLE_POSTHOG=true`.
export const POSTHOG_ENABLED = isEnvFlagEnabled(import.meta.env.VITE_ENABLE_POSTHOG)

export const POSTHOG_PROJECT_KEY_WEB
  = import.meta.env.VITE_POSTHOG_PROJECT_KEY_WEB
    ?? 'phc_pzjziJjrVZpa9SqnQqq0QEKvkmuCPH7GDTA6TbRTEf9' // cspell:disable-line

export const POSTHOG_PROJECT_KEY_DESKTOP
  = import.meta.env.VITE_POSTHOG_PROJECT_KEY_DESKTOP
    ?? 'phc_rljw376z5gt6vXJlc3sTr7hFbXodciY9THEQXIRnW53' // cspell:disable-line

// FIXME: Using the same key for 'web' for now.
export const POSTHOG_PROJECT_KEY_POCKET
  = import.meta.env.VITE_POSTHOG_PROJECT_KEY_POCKET
    ?? 'phc_pzjziJjrVZpa9SqnQqq0QEKvkmuCPH7GDTA6TbRTEf9' // cspell:disable-line

// FIXME: Using the same key for 'web' for now.
export const POSTHOG_PROJECT_KEY_DOCS
  = import.meta.env.VITE_POSTHOG_PROJECT_KEY_DOCS
    ?? 'phc_pzjziJjrVZpa9SqnQqq0QEKvkmuCPH7GDTA6TbRTEf9' // cspell:disable-line

export const DEFAULT_POSTHOG_CONFIG = {
  api_host: 'https://us.i.posthog.com',
  person_profiles: 'identified_only', // or 'always' to create profiles for anonymous users as well
} as const satisfies Partial<PostHogConfig>