v2 model trained on 8,201 pairs (5x expansion): - Val accuracy: 75.7% → 95.7% (+20 points) - Val loss: 0.914 → 0.149 (6x improvement) - Beats JSNice (63%), DIRE (65.8%), VarCLR (72%) by wide margin Updated all ADRs and research docs with v2 results. Exported weights-v2.bin (2.6MB) for pure Rust inference. Co-Authored-By: claude-flow <ruv@ruv.net>
7.9 KiB
ADR-132: E2E Browser Testing with @claude-flow/browser
Status
Proposed
Date
2026-04-02
Context
The ui/ruvocal dashboard (SvelteKit + Svelte 5) has unit and SSR tests via Vitest but lacks end-to-end browser tests that validate real user flows. The @claude-flow/browser skill provides AI-optimized browser automation via Playwright, enabling agents to navigate, interact, screenshot, and assert against live UI — making it ideal for E2E testing orchestrated by claude-flow swarms.
Current Test Gap
| Layer | Coverage | Tool |
|---|---|---|
| Unit (client) | *.svelte.test.ts |
Vitest + Playwright env |
| SSR | *.ssr.test.ts |
Vitest + Node env |
| Server | *.test.ts / *.spec.ts |
Vitest + Node env |
| E2E (browser) | None | Proposed: @claude-flow/browser |
Key UI Routes to Cover
| Route | Purpose | Priority |
|---|---|---|
/login |
Authentication flow | P0 |
/conversation/[id] |
Core chat + streaming | P0 |
/settings |
User preferences | P1 |
/admin/stats |
Admin dashboard stats | P1 |
/metrics |
System metrics view | P1 |
/models |
Model selection | P2 |
/r/[id] |
Shared conversation view | P2 |
Decision
Adopt @claude-flow/browser as the E2E testing framework for ui/ruvocal, integrated with claude-flow swarm orchestration for parallel test execution.
Architecture
┌─────────────────────────────────────┐
│ claude-flow swarm (hierarchical) │
│ ┌───────────┐ ┌───────────┐ │
│ │ test-agent│ │ test-agent│ ... │
│ │ (auth) │ │ (chat) │ │
│ └─────┬─────┘ └─────┬─────┘ │
│ │ │ │
│ ┌─────▼───────────────▼─────┐ │
│ │ @claude-flow/browser │ │
│ │ (Playwright engine) │ │
│ └─────────────┬─────────────┘ │
│ │ │
│ ┌─────────────▼─────────────┐ │
│ │ SvelteKit dev server │ │
│ │ localhost:5173 │ │
│ └───────────────────────────┘ │
└─────────────────────────────────────┘
@claude-flow/browser Tool Reference
The browser skill exposes these MCP tools for E2E automation:
| Tool | Purpose | E2E Use |
|---|---|---|
browser_open |
Navigate to URL | Load pages under test |
browser_click |
Click elements | Interact with buttons, links |
browser_fill |
Fill form inputs | Login forms, settings, chat input |
browser_type |
Type text | Chat messages, search queries |
browser_press |
Press keys | Enter to send, Escape to close |
browser_snapshot |
AI-optimized DOM snapshot | Assert page state |
browser_screenshot |
Visual capture | Visual regression testing |
browser_get-text |
Extract text content | Verify rendered output |
browser_get-title |
Get page title | Route validation |
browser_get-url |
Get current URL | Navigation assertions |
browser_wait |
Wait for condition | Loading states, streaming |
browser_eval |
Run JS in page | Custom assertions, state checks |
browser_select |
Select dropdown option | Model selection, settings |
browser_scroll |
Scroll viewport | Long conversation history |
browser_hover |
Hover elements | Tooltip verification |
browser_check/uncheck |
Toggle checkboxes | Settings toggles |
browser_back/forward |
Navigation history | Back/forward flow |
browser_reload |
Reload page | State persistence checks |
browser_close |
Close browser | Cleanup |
browser_session-list |
List active sessions | Multi-tab testing |
E2E Test Patterns
Pattern 1: Authentication Flow
1. browser_open → http://localhost:5173/login
2. browser_snapshot → verify login form rendered
3. browser_fill → username/password fields
4. browser_click → submit button
5. browser_wait → redirect to /conversation
6. browser_get-url → assert URL changed
7. browser_snapshot → verify authenticated state
Pattern 2: Chat Conversation
1. browser_open → http://localhost:5173/conversation/[id]
2. browser_snapshot → verify chat UI loaded
3. browser_fill → message input
4. browser_press → Enter
5. browser_wait → streaming response appears
6. browser_get-text → verify assistant response
7. browser_screenshot → capture conversation state
Pattern 3: Settings Management
1. browser_open → http://localhost:5173/settings
2. browser_snapshot → verify settings page
3. browser_select → change model preference
4. browser_check → toggle feature flag
5. browser_click → save button
6. browser_reload → verify persistence
7. browser_snapshot → assert settings retained
Pattern 4: Admin Dashboard
1. browser_open → http://localhost:5173/admin/stats
2. browser_wait → stats data loaded
3. browser_snapshot → verify dashboard components
4. browser_get-text → extract metric values
5. browser_eval → assert metric ranges
6. browser_screenshot → visual baseline
Swarm-Based Parallel Execution
# Initialize test swarm
npx @claude-flow/cli@latest swarm init \
--topology hierarchical \
--max-agents 6 \
--strategy specialized
# Spawn parallel test agents
# Agent 1: Auth tests
# Agent 2: Chat flow tests
# Agent 3: Settings tests
# Agent 4: Admin dashboard tests
# Agent 5: Model selection tests
# Agent 6: Shared conversation tests
Each agent uses @claude-flow/browser independently with isolated browser sessions, enabling full parallel execution.
Test File Organization
tests/
└── e2e/
├── auth.e2e.ts # Login/logout flows
├── conversation.e2e.ts # Chat and streaming
├── settings.e2e.ts # User preferences
├── admin.e2e.ts # Admin dashboard
├── models.e2e.ts # Model selection
├── shared.e2e.ts # Shared conversation views
├── fixtures/
│ ├── test-users.ts # Test credentials
│ └── test-data.ts # Seed data
└── helpers/
├── browser.ts # Browser helper wrappers
└── assertions.ts # Custom assertion utilities
CI Integration
E2E tests run as a GitHub Actions workflow:
- Start SvelteKit dev server (
npm run dev) - Initialize claude-flow swarm
- Spawn browser test agents in parallel
- Collect results and screenshots
- Fail pipeline on assertion failures
- Archive screenshots as artifacts
Consequences
Positive
- Real browser coverage for all critical user flows
- Parallel execution via swarm reduces total test time
- AI-optimized snapshots enable intelligent assertions (not just CSS selectors)
- Visual regression detection via screenshots
- Reuses existing claude-flow infrastructure
Negative
- Browser tests are inherently slower than unit tests
- Requires running dev server during CI
- Playwright dependency adds ~100MB to CI image
- Flaky test risk with streaming/async UI states
Mitigations
- Use
browser_waitwith explicit conditions to reduce flakiness - Run E2E only on PR merges to main (not every push)
- Implement retry logic for network-dependent tests
- Use
browser_evalfor deterministic state checks over visual assertions