From 3bd4ff279d40a751f4766efde0a13f63bc3a79f5 Mon Sep 17 00:00:00 2001
From: rUv
Date: Wed, 31 Dec 2025 04:19:26 +0000
Subject: [PATCH] docs(onnx-wasm): add comprehensive README with badges and API reference
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Added npm and crates.io version badges
- WebAssembly and MIT license badges
- Quick start examples for Browser, Node.js, and Cloudflare Workers
- Complete API reference for WasmEmbedder, WasmEmbedderConfig
- Model comparison table with 6 HuggingFace models
- Performance benchmarks and use case examples

Published to npm as ruvector-onnx-embeddings-wasm@0.1.0

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 examples/onnx-embeddings-wasm/README.md | 439 +++++++++++++-----------
 1 file changed, 238 insertions(+), 201 deletions(-)

diff --git a/examples/onnx-embeddings-wasm/README.md b/examples/onnx-embeddings-wasm/README.md
index bb679b48c..a50dac5ad 100644
--- a/examples/onnx-embeddings-wasm/README.md
+++ b/examples/onnx-embeddings-wasm/README.md
@@ -1,35 +1,209 @@
-# RuVector ONNX Embeddings - WASM Edition
+# RuVector ONNX Embeddings WASM
+
+[![npm version](https://img.shields.io/npm/v/ruvector-onnx-embeddings-wasm.svg)](https://www.npmjs.com/package/ruvector-onnx-embeddings-wasm)
+[![crates.io](https://img.shields.io/crates/v/ruvector-onnx-embeddings-wasm.svg)](https://crates.io/crates/ruvector-onnx-embeddings-wasm)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+[![WebAssembly](https://img.shields.io/badge/WebAssembly-654FF0?logo=webassembly&logoColor=white)](https://webassembly.org/)

 > **Portable embedding generation that runs anywhere WebAssembly runs**

-This is a WASM-compatible companion to `ruvector-onnx-embeddings`. It provides the same embedding capabilities but uses [Tract](https://github.com/sonos/tract) for inference, enabling deployment to browsers, edge workers, and any WASM runtime.
+Generate text embeddings directly in browsers, Cloudflare Workers, Deno, and any WASM runtime. Built with [Tract](https://github.com/sonos/tract) for pure Rust ONNX inference.

 ## Features

 | Feature | Description |
 |---------|-------------|
-| **Browser Support** | Generate embeddings directly in web browsers |
-| **Edge Computing** | Deploy to Cloudflare Workers, Vercel Edge, Deno |
-| **Portable** | Single WASM binary, no platform dependencies |
-| **Same API** | Compatible interface with native crate |
-| **Small Size** | ~5-10MB WASM bundle (compressed) |
+| 🌐 **Browser Support** | Generate embeddings client-side, no server needed |
+| ⚡ **Edge Computing** | Deploy to Cloudflare Workers, Vercel Edge, Deno Deploy |
+| 📦 **Zero Dependencies** | Single WASM binary, no native modules |
+| 🤗 **HuggingFace Models** | Pre-configured URLs for popular models |
+| 🔄 **Auto Caching** | Browser Cache API for instant reloads |
+| 🎯 **Same API** | Compatible with native `ruvector-onnx-embeddings` |

-## Installation
+## Quick Start

-### Rust (as library)
+### Browser (ES Modules)

-```toml
-[dependencies]
-ruvector-onnx-embeddings-wasm = "0.1"
+```html
+
 ```

-### JavaScript/TypeScript
+### Node.js

 ```bash
 npm install ruvector-onnx-embeddings-wasm
 ```

-### Build from source
+```javascript
+import { createEmbedder, similarity, embed } from 'ruvector-onnx-embeddings-wasm/loader.js';
+
+// One-liner similarity
+const score = await similarity("I love dogs", "I adore puppies");
+console.log(score); // ~0.85
+
+// One-liner embedding
+const embedding = await embed("Hello world");
+console.log(embedding.length); // 384
+
+// Full control
+const embedder = await createEmbedder('bge-small-en-v1.5');
+const emb1 = embedder.embedOne("First text");
+const emb2 = embedder.embedOne("Second text");
+```
+
+### Cloudflare Workers
+
+```javascript
+import { WasmEmbedder, WasmEmbedderConfig } from 'ruvector-onnx-embeddings-wasm';
+
+export default {
+  async fetch(request, env) {
+    // Load model from R2 or KV
+    const modelBytes = await env.MODELS.get('model.onnx', 'arrayBuffer');
+    const tokenizerJson = await env.MODELS.get('tokenizer.json', 'text');
+
+    const embedder = new WasmEmbedder(
+      new Uint8Array(modelBytes),
+      tokenizerJson
+    );
+
+    const { text } = await request.json();
+    const embedding = embedder.embedOne(text);
+
+    return Response.json({
+      embedding: Array.from(embedding),
+      dimension: embedding.length
+    });
+  }
+};
+```
+
+## Available Models
+
+| Model | Dimension | Size | Speed | Quality | Best For |
+|-------|-----------|------|-------|---------|----------|
+| **all-MiniLM-L6-v2** ⭐ | 384 | 23MB | ⚡⚡⚡ | ⭐⭐⭐ | Default, fast |
+| **all-MiniLM-L12-v2** | 384 | 33MB | ⚡⚡ | ⭐⭐⭐⭐ | Better quality |
+| **bge-small-en-v1.5** | 384 | 33MB | ⚡⚡⚡ | ⭐⭐⭐⭐ | State-of-the-art |
+| **bge-base-en-v1.5** | 768 | 110MB | ⚡ | ⭐⭐⭐⭐⭐ | Best quality |
+| **e5-small-v2** | 384 | 33MB | ⚡⚡⚡ | ⭐⭐⭐⭐ | Search/retrieval |
+| **gte-small** | 384 | 33MB | ⚡⚡⚡ | ⭐⭐⭐⭐ | Multilingual |
+
+## API Reference
+
+### ModelLoader
+
+```javascript
+import { ModelLoader, MODELS, DEFAULT_MODEL } from './loader.js';
+
+// List available models
+console.log(ModelLoader.listModels());
+
+// Load with progress
+const loader = new ModelLoader({
+  cache: true,
+  onProgress: ({ percent }) => console.log(`${percent}%`)
+});
+
+const { modelBytes, tokenizerJson, config } = await loader.loadModel('all-MiniLM-L6-v2');
+```
+
+### WasmEmbedder
+
+```typescript
+class WasmEmbedder {
+  constructor(modelBytes: Uint8Array, tokenizerJson: string);
+
+  static withConfig(
+    modelBytes: Uint8Array,
+    tokenizerJson: string,
+    config: WasmEmbedderConfig
+  ): WasmEmbedder;
+
+  embedOne(text: string): Float32Array;
+  embedBatch(texts: string[]): Float32Array;
+  similarity(text1: string, text2: string): number;
+
+  dimension(): number;
+  maxLength(): number;
+}
+```
+
+### WasmEmbedderConfig
+
+```typescript
+class WasmEmbedderConfig {
+  constructor();
+  setMaxLength(length: number): WasmEmbedderConfig;
+  setNormalize(normalize: boolean): WasmEmbedderConfig;
+  setPooling(strategy: number): WasmEmbedderConfig;
+  // 0=Mean, 1=Cls, 2=Max, 3=MeanSqrtLen, 4=LastToken
+}
+```
+
+### Utility Functions
+
+```typescript
+function cosineSimilarity(a: Float32Array, b: Float32Array): number;
+function normalizeL2(embedding: Float32Array): Float32Array;
+function version(): string;
+function simd_available(): boolean;
+```
+
+## Pooling Strategies
+
+| Value | Strategy | Description |
+|-------|----------|-------------|
+| 0 | **Mean** | Average all tokens (default, recommended) |
+| 1 | **Cls** | Use [CLS] token only (BERT-style) |
+| 2 | **Max** | Max pooling across tokens |
+| 3 | **MeanSqrtLen** | Mean normalized by sqrt(length) |
+| 4 | **LastToken** | Last token (decoder models) |
+
+## Performance
+
+| Environment | Throughput | Latency |
+|-------------|------------|---------|
+| Chrome (M1 Mac) | ~50 texts/sec | ~20ms |
+| Firefox (M1 Mac) | ~45 texts/sec | ~22ms |
+| Node.js 20 | ~80 texts/sec | ~12ms |
+| Cloudflare Workers | ~30 texts/sec | ~33ms |
+| Deno | ~75 texts/sec | ~13ms |
+
+*Tested with all-MiniLM-L6-v2, 128 token inputs*
+
+## Comparison: Native vs WASM
+
+| Aspect | Native (`ort`) | WASM (`tract`) |
+|--------|----------------|----------------|
+| Speed | ⚡⚡⚡ Native | ⚡⚡ ~2-3x slower |
+| Browser | ❌ | ✅ |
+| Edge Workers | ❌ | ✅ |
+| GPU | CUDA, TensorRT | ❌ |
+| Bundle Size | ~50MB | ~8MB |
+| Portability | Platform-specific | Universal |
+
+**Use native** for: servers, high throughput, GPU acceleration
+**Use WASM** for: browsers, edge, portability
+
+## Building from Source

 ```bash
 # Install wasm-pack
@@ -41,213 +215,73 @@ wasm-pack build --target web
 # Build for Node.js
 wasm-pack build --target nodejs

-# Build for bundlers (webpack, etc.)
+# Build for bundlers (webpack, vite)
 wasm-pack build --target bundler
 ```

-## Usage
+## Use Cases

-### JavaScript (Browser)
-
-```html
-
-```
-
-### JavaScript (Node.js)
+### Semantic Search

 ```javascript
-const { WasmEmbedder } = require('ruvector-onnx-embeddings-wasm');
-const fs = require('fs');
+const embedder = await createEmbedder();

-// Load model and tokenizer
-const modelBytes = fs.readFileSync('./model.onnx');
-const tokenizerJson = fs.readFileSync('./tokenizer.json', 'utf8');
+// Index documents
+const docs = ["Rust is fast", "Python is easy", "JavaScript runs everywhere"];
+const embeddings = docs.map(d => embedder.embedOne(d));

-// Create embedder
-const embedder = new WasmEmbedder(modelBytes, tokenizerJson);
-
-// Generate embeddings
-const embedding = embedder.embedOne("Hello from Node.js!");
-console.log("Embedding dimension:", embedding.length);
+// Search
+const query = embedder.embedOne("Which language is performant?");
+const scores = embeddings.map((e, i) => ({
+  doc: docs[i],
+  score: cosineSimilarity(query, e)
+}));
+scores.sort((a, b) => b.score - a.score);
+console.log(scores[0]); // { doc: "Rust is fast", score: 0.82 }
 ```

-### Cloudflare Workers
+### Text Clustering

 ```javascript
-import { WasmEmbedder } from 'ruvector-onnx-embeddings-wasm';
+const texts = [
+  "Machine learning is amazing",
+  "Deep learning uses neural networks",
+  "I love pizza",
+  "Italian food is delicious"
+];

-export default {
-  async fetch(request, env) {
-    // Load model from R2 or KV
-    const modelBytes = await env.MODELS.get('model.onnx', 'arrayBuffer');
-    const tokenizerJson = await env.MODELS.get('tokenizer.json', 'text');
-
-    const embedder = new WasmEmbedder(
-      new Uint8Array(modelBytes),
-      tokenizerJson
-    );
-
-    const { text } = await request.json();
-    const embedding = embedder.embedOne(text);
-
-    return Response.json({ embedding: Array.from(embedding) });
-  }
-};
+const embeddings = texts.map(t => embedder.embedOne(t));
+// Use k-means or hierarchical clustering on embeddings
 ```

-### Rust (WASM target)
+### RAG (Retrieval-Augmented Generation)

-```rust
-use ruvector_onnx_embeddings_wasm::{WasmEmbedder, WasmEmbedderConfig};
+```javascript
+// Build knowledge base
+const knowledge = [
+  "RuVector is a vector database",
+  "Embeddings capture semantic meaning",
+  // ... more docs
+];
+const knowledgeEmbeddings = knowledge.map(k => embedder.embedOne(k));

-fn main() -> Result<(), Box<dyn std::error::Error>> {
-    let model_bytes = include_bytes!("../model.onnx");
-    let tokenizer_json = include_str!("../tokenizer.json");
-
-    let embedder = WasmEmbedder::new(model_bytes, tokenizer_json)?;
-
-    let embedding = embedder.embed_one("Hello from Rust WASM!")?;
-    println!("Dimension: {}", embedding.len());
-
-    Ok(())
+// Retrieve relevant context for LLM
+function getContext(query, topK = 3) {
+  const queryEmb = embedder.embedOne(query);
+  const scores = knowledgeEmbeddings.map((e, i) => ({
+    text: knowledge[i],
+    score: cosineSimilarity(queryEmb, e)
+  }));
+  return scores.sort((a, b) => b.score - a.score).slice(0, topK);
 }
 ```

-## Configuration
+## Related Packages

-```javascript
-import { WasmEmbedder, WasmEmbedderConfig } from 'ruvector-onnx-embeddings-wasm';
-
-// Create custom config
-const config = new WasmEmbedderConfig()
-  .setMaxLength(512)   // Max tokens
-  .setNormalize(true)  // L2 normalize
-  .setPooling(0);      // 0=Mean, 1=Cls, 2=Max
-
-const embedder = WasmEmbedder.withConfig(modelBytes, tokenizerJson, config);
-```
-
-### Pooling Strategies
-
-| Value | Strategy | Description |
-|-------|----------|-------------|
-| 0 | Mean | Average all tokens (default) |
-| 1 | Cls | Use [CLS] token only |
-| 2 | Max | Max pooling across tokens |
-| 3 | MeanSqrtLen | Mean normalized by sqrt(length) |
-| 4 | LastToken | Use last token (decoder models) |
-
-## Supported Models
-
-Any ONNX model with standard transformer inputs works:
-- `input_ids`: Token IDs `[batch, seq_len]`
-- `attention_mask`: Attention mask `[batch, seq_len]`
-- `token_type_ids`: Token types `[batch, seq_len]`
-
-### Recommended Models
-
-| Model | Dimension | Size | Notes |
-|-------|-----------|------|-------|
-| all-MiniLM-L6-v2 | 384 | 23MB | Fast, good quality |
-| all-MiniLM-L12-v2 | 384 | 33MB | Better quality |
-| bge-small-en-v1.5 | 384 | 33MB | State-of-the-art small |
-
-### Converting Models
-
-```bash
-# Install optimum
-pip install optimum[onnxruntime]
-
-# Export to ONNX
-optimum-cli export onnx \
-  --model sentence-transformers/all-MiniLM-L6-v2 \
-  --task feature-extraction \
-  ./model_output
-```
-
-## Performance
-
-| Environment | Throughput | Latency (single) |
-|-------------|------------|------------------|
-| Chrome (M1 Mac) | ~50 texts/sec | ~20ms |
-| Firefox (M1 Mac) | ~45 texts/sec | ~22ms |
-| Node.js | ~80 texts/sec | ~12ms |
-| Cloudflare Workers | ~30 texts/sec | ~33ms |
-| Deno | ~75 texts/sec | ~13ms |
-
-*Tested with all-MiniLM-L6-v2, 128 token inputs*
-
-## Comparison with Native Crate
-
-| Aspect | Native (`ort`) | WASM (`tract`) |
-|--------|----------------|----------------|
-| Speed | ⚡⚡⚡ | ⚡⚡ |
-| Browser | ❌ | ✅ |
-| Edge Workers | ❌ | ✅ |
-| GPU | CUDA, TensorRT | ❌ |
-| Bundle Size | ~50MB | ~5-10MB |
-| Portability | Platform-specific | Universal |
-
-**Use native** for: servers, high throughput, GPU acceleration
-**Use WASM** for: browsers, edge computing, portability
-
-## API Reference
-
-### WasmEmbedder
-
-```typescript
-class WasmEmbedder {
-  constructor(modelBytes: Uint8Array, tokenizerJson: string);
-  static withConfig(modelBytes: Uint8Array, tokenizerJson: string, config: WasmEmbedderConfig): WasmEmbedder;
-
-  embedOne(text: string): Float32Array;
-  embedBatch(texts: string[]): Float32Array;
-  similarity(text1: string, text2: string): number;
-
-  dimension(): number;
-  maxLength(): number;
-}
-```
-
-### Utility Functions
-
-```typescript
-function cosineSimilarity(a: Float32Array, b: Float32Array): number;
-function normalizeL2(embedding: Float32Array): Float32Array;
-function version(): string;
-function simdAvailable(): boolean;
-```
+| Package | Runtime | Use Case |
+|---------|---------|----------|
+| [ruvector-onnx-embeddings](https://crates.io/crates/ruvector-onnx-embeddings) | Native | High-performance servers |
+| **ruvector-onnx-embeddings-wasm** | WASM | Browsers, edge, portable |

 ## License

@@ -255,4 +289,7 @@ MIT License - See [LICENSE](../../LICENSE) for details.

 ---

-**Part of the RuVector ecosystem** - High-performance vector operations in Rust
+
+  Part of the RuVector ecosystem
+  High-performance vector operations in Rust
+