mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-25 23:24:03 +00:00
feat: Add quantum simulation engine ADR series (QE-001 to QE-012) and DDD design documents
Comprehensive architecture decision records and domain-driven design documentation for integrating a Rust-based quantum simulation engine (ruQu) into the ruVector stack. ADR Series (12 documents): - QE-001: Core Architecture - pure Rust state-vector simulator decision - QE-002: Crate Structure - three-crate architecture (ruqu-core, ruqu-wasm, ruqu-algorithms) - QE-003: WASM Compilation - WebAssembly strategy with 25-qubit limit enforcement - QE-004: Performance Optimization - SIMD, multithreading, gate fusion, benchmarks - QE-005: VQE Algorithm - variational eigensolver with exact expectation values - QE-006: Grover Search - O(1) oracle optimization via direct state vector access - QE-007: QAOA MaxCut - graph-based optimization with Rzz native gates - QE-008: Surface Code Error Correction - mid-circuit measurement, syndrome extraction - QE-009: Tensor Network Evaluation - MPS/contraction for shallow circuits - QE-010: Observability & Monitoring - metrics, tracing, health checks integration - QE-011: Memory Gating & Power Management - zero-idle, on-demand allocation - QE-012: Min-Cut Coherence Integration - syndrome-to-decoder bridge with ruQu DDD Design (3 documents): - Strategic Design: 6 bounded contexts, context map, ubiquitous language - Tactical Design: 6 aggregates, 20+ value objects, 15+ domain events, services - Integration Patterns: anti-corruption layers, shared kernel, event flows https://claude.ai/code/session_01B1NkbLDWYPaacS9miKsnvW
This commit is contained in:
parent
6de9c3d41f
commit
f27616ead1
15 changed files with 10087 additions and 0 deletions
|
|
@ -0,0 +1,305 @@
|
|||
# ADR-QE-001: Quantum Engine Core Architecture
|
||||
|
||||
**Status**: Proposed
|
||||
**Date**: 2026-02-06
|
||||
**Authors**: ruv.io, RuVector Team
|
||||
**Deciders**: Architecture Review Board
|
||||
|
||||
## Context
|
||||
|
||||
### Problem Statement
|
||||
|
||||
ruVector needs a quantum simulation engine for on-device quantum algorithm
|
||||
experimentation. The platform runs on distributed edge systems, primarily
|
||||
targeting Cognitum's 256-core low-power processors, and emphasizes ultra-low-power
|
||||
event-driven computing. Quantum simulation is a natural extension of ruVector's
|
||||
mathematical computation capabilities: the same SIMD-optimized linear algebra
|
||||
that powers vector search and neural inference can drive state-vector manipulation
|
||||
for quantum circuits.
|
||||
|
||||
### Requirements
|
||||
|
||||
The engine must support gate-model quantum circuit simulation up to approximately
|
||||
25 qubits, covering the following algorithm families:
|
||||
|
||||
| Algorithm Family | Use Case | Typical Qubits | Gate Depth |
|
||||
|------------------|----------|-----------------|------------|
|
||||
| VQE (Variational Quantum Eigensolver) | Molecular simulation, optimization | 8-20 | 50-500 per iteration |
|
||||
| Grover's Search | Unstructured database search | 8-25 | O(sqrt(2^n)) |
|
||||
| QAOA (Quantum Approximate Optimization) | Combinatorial optimization | 10-25 | O(p * edges) |
|
||||
| Quantum Error Correction | Surface code, stabilizer circuits | 9-25 (logical + ancilla) | Repetitive syndrome rounds |
|
||||
|
||||
### Memory Scaling Analysis
|
||||
|
||||
Quantum state-vector simulation stores the full amplitude vector of 2^n complex
|
||||
numbers. Each amplitude is a pair of f64 values (real + imaginary = 16 bytes).
|
||||
Memory grows exponentially:
|
||||
|
||||
```
|
||||
Qubits Amplitudes State Size With Scratch Buffer
|
||||
------ ----------- ---------- -------------------
|
||||
10 1,024 16 KB 32 KB
|
||||
15 32,768 512 KB 1 MB
|
||||
20 1,048,576 16 MB 32 MB
|
||||
22 4,194,304 64 MB 128 MB
|
||||
24 16,777,216 256 MB 512 MB
|
||||
25 33,554,432 512 MB 1.07 GB
|
||||
26 67,108,864 1.07 GB 2.14 GB
|
||||
28 268,435,456 4.29 GB 8.59 GB
|
||||
30 1,073,741,824 17.18 GB 34.36 GB
|
||||
```
|
||||
|
||||
At 25 qubits the state vector requires approximately 512 MB (1.07 GB with a
|
||||
scratch buffer for intermediate calculations). This is the practical ceiling
|
||||
for WebAssembly's 32-bit address space. Native execution with sufficient RAM
|
||||
can push to 30+ qubits.
|
||||
|
||||
### Edge Computing Constraints
|
||||
|
||||
Cognitum's 256-core processors operate under strict power and memory budgets:
|
||||
|
||||
- **Power envelope**: Event-driven activation; cores idle at near-zero draw
|
||||
- **Memory**: Shared pool, typically 2-8 GB per node
|
||||
- **Interconnect**: Low-latency mesh between cores, suitable for parallel simulation
|
||||
- **Workload model**: Burst computation triggered by agent events, not continuous
|
||||
|
||||
The quantum engine must respect this model: allocate state only when a simulation
|
||||
is triggered, execute the circuit, return results, and immediately release all
|
||||
memory.
|
||||
|
||||
## Decision
|
||||
|
||||
Implement a **pure Rust state-vector quantum simulator** as a new crate family
|
||||
(`ruQu` quantum engine) within the ruVector workspace. The following architectural
|
||||
decisions define the engine.
|
||||
|
||||
### 1. Pure Rust Implementation (No C/C++ FFI)
|
||||
|
||||
The entire simulation engine is written in Rust with no foreign function interface
|
||||
dependencies. This ensures:
|
||||
|
||||
- Compilation to `wasm32-unknown-unknown` without emscripten or C toolchains
|
||||
- Memory safety guarantees throughout the simulation pipeline
|
||||
- Unified build system via Cargo across all targets
|
||||
- No external library version conflicts or platform-specific linking issues
|
||||
|
||||
### 2. State-Vector Simulation as Primary Backend
|
||||
|
||||
The engine uses explicit full-amplitude state-vector representation as its
|
||||
primary simulation mode. Each gate application transforms the full 2^n
|
||||
amplitude vector via matrix-vector multiplication.
|
||||
|
||||
```
|
||||
Circuit Execution Model:
|
||||
|
||||
|psi_0> ──[H]──[CNOT]──[Rz(theta)]──[Measure]── classical bits
|
||||
| | | |
|
||||
v v v v
|
||||
[init] [apply_H] [apply_CNOT] [apply_Rz] [sample]
|
||||
| | | | |
|
||||
2^n f64 2^n f64 2^n f64 2^n f64 collapse
|
||||
complex complex complex complex to basis
|
||||
```
|
||||
|
||||
Gate application follows the standard decomposition:
|
||||
|
||||
- **Single-qubit gates**: Iterate amplitude pairs (i, i XOR 2^target), apply 2x2
|
||||
unitary. O(2^n) operations per gate.
|
||||
- **Two-qubit gates**: Iterate amplitude quadruples, apply 4x4 unitary.
|
||||
O(2^n) operations per gate.
|
||||
- **Multi-qubit gates**: Decompose into single and two-qubit gates, or apply
|
||||
directly via 2^k x 2^k matrix on k target qubits.
|
||||
|
||||
### 3. Qubit Limits and Precision
|
||||
|
||||
| Parameter | WASM Target | Native Target |
|
||||
|-----------|-------------|---------------|
|
||||
| Max qubits (default) | 25 | 30+ (RAM-dependent) |
|
||||
| Max qubits (hard limit) | 26 (with f32) | Memory-limited |
|
||||
| Precision (default) | Complex f64 | Complex f64 |
|
||||
| Precision (optional) | Complex f32 | Complex f32 |
|
||||
| State size at max | ~1.07 GB | ~17 GB at 30 qubits |
|
||||
|
||||
Complex f64 is the default precision, providing approximately 15 decimal digits
|
||||
of accuracy -- sufficient for quantum chemistry applications and deep circuits
|
||||
where accumulated floating-point error matters. An optional f32 mode halves
|
||||
memory usage at the cost of precision, suitable for shallow circuits and
|
||||
approximate optimization.
|
||||
|
||||
### 4. Event-Driven Activation Model
|
||||
|
||||
The engine follows ruVector's event-driven philosophy:
|
||||
|
||||
```
|
||||
Agent Context ruQu Engine Memory
|
||||
| | |
|
||||
|-- trigger(circuit) ->| |
|
||||
| |-- allocate(2^n) ---->|
|
||||
| |<---- state_ptr ------|
|
||||
| | |
|
||||
| |-- [execute gates] -->|
|
||||
| |-- [measure] -------->|
|
||||
| | |
|
||||
|<-- results ---------| |
|
||||
| |-- deallocate() ----->|
|
||||
| | |
|
||||
(idle) (inert) (freed)
|
||||
```
|
||||
|
||||
- **Inert by default**: No background threads, no persistent allocations
|
||||
- **Allocate on demand**: State vector created when circuit execution begins
|
||||
- **Free immediately**: All simulation memory released upon result delivery
|
||||
- **No global state**: Multiple concurrent simulations supported via independent
|
||||
state handles (no shared mutable global)
|
||||
|
||||
### 5. Dual-Target Compilation
|
||||
|
||||
The crate supports two compilation targets from a single codebase:
|
||||
|
||||
```
|
||||
ruqu-core
|
||||
|
|
||||
+----------+----------+
|
||||
| |
|
||||
[native target] [wasm32-unknown-unknown]
|
||||
| |
|
||||
- Full SIMD (AVX2, - WASM SIMD128
|
||||
AVX-512, NEON) - 4GB address limit
|
||||
- Rayon threading - Optional SharedArrayBuffer
|
||||
- Optional GPU (wgpu) - No GPU
|
||||
- 30+ qubits - 25 qubit ceiling
|
||||
- Full OS integration - Sandboxed
|
||||
```
|
||||
|
||||
Conditional compilation via Cargo feature flags controls target-specific code
|
||||
paths. The public API surface is identical across targets.
|
||||
|
||||
### 6. Optional Tensor Network Mode
|
||||
|
||||
For circuits with limited entanglement (e.g., shallow QAOA, certain VQE
|
||||
ansatze), the engine offers an optional tensor network backend:
|
||||
|
||||
- Represents the quantum state as a network of tensors rather than a single
|
||||
exponential vector
|
||||
- Memory scales as O(n * chi^2) where chi is the bond dimension (maximum
|
||||
entanglement width)
|
||||
- Efficient for circuits where entanglement grows slowly or remains bounded
|
||||
- Falls back to full state-vector when bond dimension exceeds threshold
|
||||
- Enabled via the `tensor-network` feature flag
|
||||
|
||||
## Alternatives Considered
|
||||
|
||||
### Alternative 1: Qukit (Rust, WASM-ready)
|
||||
|
||||
A pre-1.0 Rust quantum simulator with WASM support.
|
||||
|
||||
| Criterion | Assessment |
|
||||
|-----------|------------|
|
||||
| Maturity | Pre-1.0, limited community |
|
||||
| WASM support | Present but untested at scale |
|
||||
| Optimization | Basic; no SIMD, no gate fusion |
|
||||
| Integration | Would require adapter layer |
|
||||
| Maintenance | External dependency risk |
|
||||
|
||||
**Rejected**: Insufficient optimization depth and maturity for production use.
|
||||
|
||||
### Alternative 2: QuantRS2 (Rust, Python-focused)
|
||||
|
||||
A Rust quantum simulator primarily targeting Python bindings via PyO3.
|
||||
|
||||
| Criterion | Assessment |
|
||||
|-----------|------------|
|
||||
| Performance | Good benchmarks on native |
|
||||
| WASM support | Not a design target |
|
||||
| Dependencies | Heavy; Python-oriented build |
|
||||
| API design | Python-first, Rust API secondary |
|
||||
| Integration | Significant impedance mismatch |
|
||||
|
||||
**Rejected**: Python-centric design creates unnecessary weight and integration
|
||||
friction for a Rust-native edge system.
|
||||
|
||||
### Alternative 3: roqoqo + QuEST (Rust frontend, C backend)
|
||||
|
||||
roqoqo provides a Rust circuit description layer; QuEST is a high-performance
|
||||
C/C++ state-vector simulator.
|
||||
|
||||
| Criterion | Assessment |
|
||||
|-----------|------------|
|
||||
| Performance | Excellent (QuEST is highly optimized) |
|
||||
| WASM support | QuEST's C code breaks WASM compilation |
|
||||
| Maintenance | External C library maintenance burden |
|
||||
| Memory safety | C backend outside Rust safety guarantees |
|
||||
|
||||
**Rejected**: C dependency is incompatible with WASM target requirement.
|
||||
|
||||
### Alternative 4: Quant-Iron (Rust + OpenCL)
|
||||
|
||||
A Rust simulator leveraging OpenCL for GPU acceleration.
|
||||
|
||||
| Criterion | Assessment |
|
||||
|-----------|------------|
|
||||
| Performance | Excellent on GPU-equipped hardware |
|
||||
| WASM support | OpenCL incompatible with WASM |
|
||||
| Edge deployment | Most edge nodes lack discrete GPUs |
|
||||
| Complexity | OpenCL runtime adds operational burden |
|
||||
|
||||
**Rejected**: OpenCL dependency incompatible with WASM and edge deployment model.
|
||||
|
||||
### Alternative 5: No Simulator (Cloud Quantum APIs)
|
||||
|
||||
Delegate all quantum computation to cloud-based quantum simulators or hardware.
|
||||
|
||||
| Criterion | Assessment |
|
||||
|-----------|------------|
|
||||
| Performance | Network-bound latency |
|
||||
| Offline support | None; requires connectivity |
|
||||
| Cost | Per-execution charges |
|
||||
| Privacy | Circuit data sent to third party |
|
||||
| Edge philosophy | Violates offline-first design |
|
||||
|
||||
**Rejected**: Fundamentally incompatible with ruVector's offline-first edge
|
||||
computing philosophy.
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
- **Full control**: Complete ownership of the simulation pipeline, enabling
|
||||
deep integration with ruVector's math, SIMD, and memory subsystems
|
||||
- **WASM portable**: Single codebase compiles to any WASM runtime, enabling
|
||||
browser-based quantum experimentation
|
||||
- **No external dependencies**: Eliminates supply chain risk from C/C++ or
|
||||
Python library dependencies
|
||||
- **Edge-aligned**: Event-driven activation model matches Cognitum's power
|
||||
architecture
|
||||
- **Extensible**: Gate set, noise models, and backends can evolve independently
|
||||
|
||||
### Negative
|
||||
|
||||
- **Development effort**: Building a competitive quantum simulator from scratch
|
||||
requires significant engineering investment
|
||||
- **Maintenance burden**: Team must benchmark, optimize, and maintain the
|
||||
simulation engine alongside the rest of ruVector
|
||||
- **Classical simulation limits**: Exponential scaling is a fundamental physics
|
||||
constraint; the engine cannot exceed ~30 qubits on practical hardware
|
||||
|
||||
### Risks and Mitigations
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|------|-----------|--------|------------|
|
||||
| Performance below competitors | Medium | High | Benchmark-driven development against QuantRS2/Qukit |
|
||||
| Floating-point accuracy drift | Low | Medium | Comprehensive numerical tests, optional f64 enforcement |
|
||||
| WASM memory exhaustion | Medium | Medium | Hard qubit limit with clear error messages (ADR-QE-003) |
|
||||
| Scope creep into hardware simulation | Low | Low | Strict scope: gate-model only, no analog/pulse simulation |
|
||||
|
||||
## References
|
||||
|
||||
- [ADR-005: WASM Runtime Integration](/docs/adr/ADR-005-wasm-runtime-integration.md)
|
||||
- [ADR-003: SIMD Optimization Strategy](/docs/adr/ADR-003-simd-optimization-strategy.md)
|
||||
- [ADR-006: Memory Management](/docs/adr/ADR-006-memory-management.md)
|
||||
- [ADR-014: Coherence Engine](/docs/adr/ADR-014-coherence-engine.md)
|
||||
- [ADR-QE-002: Crate Structure & Integration](./ADR-QE-002-crate-structure-integration.md)
|
||||
- [ADR-QE-003: WASM Compilation Strategy](./ADR-QE-003-wasm-compilation-strategy.md)
|
||||
- [ADR-QE-004: Performance Optimization & Benchmarks](./ADR-QE-004-performance-optimization-benchmarks.md)
|
||||
- Nielsen & Chuang, "Quantum Computation and Quantum Information" (2010)
|
||||
- Aaronson & Gottesman, "Improved simulation of stabilizer circuits" (2004)
|
||||
|
|
@ -0,0 +1,474 @@
|
|||
# ADR-QE-002: Crate Structure & ruVector Integration
|
||||
|
||||
**Status**: Proposed
|
||||
**Date**: 2026-02-06
|
||||
**Authors**: ruv.io, RuVector Team
|
||||
**Deciders**: Architecture Review Board
|
||||
|
||||
## Context
|
||||
|
||||
### Problem Statement
|
||||
|
||||
The quantum engine must fit within the ruVector workspace, which currently
|
||||
comprises 73+ crates following a consistent modular architecture. The existing
|
||||
`ruQu` crate handles classical coherence monitoring -- specifically min-cut
|
||||
analysis and MWPM (Minimum Weight Perfect Matching) decoding for error
|
||||
correction analysis. The new quantum simulation capability requires clear
|
||||
separation from this classical functionality while integrating deeply with
|
||||
ruVector's shared infrastructure.
|
||||
|
||||
### Existing Workspace Patterns
|
||||
|
||||
The ruVector workspace follows established conventions that the quantum engine
|
||||
must respect:
|
||||
|
||||
```
|
||||
ruvector/
|
||||
crates/
|
||||
ruvector-math/ # SIMD-optimized linear algebra
|
||||
ruvector-hnsw/ # Vector similarity search
|
||||
ruvector-metrics/ # Observability and telemetry
|
||||
ruvector-router-wasm/ # WASM bindings for routing
|
||||
ruQu/ # Classical coherence (min-cut, MWPM)
|
||||
...73+ crates
|
||||
Cargo.toml # Workspace root
|
||||
```
|
||||
|
||||
Key conventions observed:
|
||||
|
||||
- **`no_std` + `alloc`** for maximum portability
|
||||
- **Feature flags** for optional capabilities (parallel, gpu, etc.)
|
||||
- **Separate WASM crates** for browser-facing bindings (e.g., `ruvector-router-wasm`)
|
||||
- **Metrics integration** via `ruvector-metrics` for observability
|
||||
- **SIMD reuse** via `ruvector-math` for hot-path computations
|
||||
|
||||
### Integration Points
|
||||
|
||||
The quantum engine must interact with several existing subsystems:
|
||||
|
||||
```
|
||||
+-------------------+
|
||||
| Agent Framework |
|
||||
+--------+----------+
|
||||
|
|
||||
trigger circuit execution
|
||||
|
|
||||
+--------v----------+
|
||||
| ruqu-core |
|
||||
| (quantum sim) |
|
||||
+---+------+--------+
|
||||
| |
|
||||
+----------+ +----------+
|
||||
| |
|
||||
+--------v--------+ +-----------v---------+
|
||||
| ruvector-math | | ruvector-metrics |
|
||||
| (SIMD, linalg) | | (telemetry) |
|
||||
+-----------------+ +---------------------+
|
||||
|
|
||||
+--------v--------+
|
||||
| ruQu (existing) |
|
||||
| (min-cut, MWPM) |
|
||||
+-----------------+
|
||||
```
|
||||
|
||||
## Decision
|
||||
|
||||
Adopt a **three-crate architecture** for the quantum engine, each with a
|
||||
clearly defined responsibility boundary.
|
||||
|
||||
### Crate 1: `ruqu-core` -- Pure Rust Simulation Library
|
||||
|
||||
The core simulation engine, containing all quantum computation logic.
|
||||
|
||||
**Responsibilities**:
|
||||
- `QuantumCircuit`: Circuit representation and manipulation
|
||||
- `QuantumState`: State-vector storage and operations
|
||||
- `Gate` enum: Full gate set (Pauli, Hadamard, CNOT, Toffoli, parametric rotations, etc.)
|
||||
- Measurement operations (computational basis, Pauli basis, mid-circuit)
|
||||
- Circuit optimization passes (gate fusion, cancellation)
|
||||
- Noise model application (optional)
|
||||
- Entanglement tracking for state splitting
|
||||
|
||||
**Design constraints**:
|
||||
- `#![no_std]` with `alloc` for embedded/WASM portability
|
||||
- Zero required external dependencies beyond `alloc`
|
||||
- All platform-specific code behind feature flags
|
||||
|
||||
**Feature flags**:
|
||||
|
||||
| Flag | Default | Description |
|
||||
|------|---------|-------------|
|
||||
| `std` | off | Enable std library features (file I/O, advanced error types) |
|
||||
| `parallel` | off | Enable Rayon-based multi-threaded gate application |
|
||||
| `gpu` | off | Enable wgpu-based GPU acceleration for large states |
|
||||
| `tensor-network` | off | Enable tensor network backend for shallow circuits |
|
||||
| `noise-model` | off | Enable depolarizing, amplitude damping, and custom noise channels |
|
||||
| `f32` | off | Use f32 precision instead of f64 (halves memory, reduces accuracy) |
|
||||
| `serde` | off | Enable serialization of circuits and states |
|
||||
|
||||
**Module structure**:
|
||||
|
||||
```
|
||||
ruqu-core/
|
||||
src/
|
||||
lib.rs # Crate root, feature flag gating
|
||||
state.rs # QuantumState: amplitude storage, initialization
|
||||
circuit.rs # QuantumCircuit: gate sequence, metadata
|
||||
gates/
|
||||
mod.rs # Gate enum and dispatch
|
||||
single.rs # Single-qubit gates (H, X, Y, Z, S, T, Rx, Ry, Rz, U3)
|
||||
two.rs # Two-qubit gates (CNOT, CZ, SWAP, Rxx, Ryy, Rzz)
|
||||
multi.rs # Multi-qubit gates (Toffoli, Fredkin, custom unitaries)
|
||||
parametric.rs # Parameterized gate support for variational algorithms
|
||||
execution/
|
||||
mod.rs # Execution engine dispatch
|
||||
statevector.rs # Full state-vector simulation engine
|
||||
tensor.rs # Tensor network backend (feature-gated)
|
||||
noise.rs # Noise channel application (feature-gated)
|
||||
measurement.rs # Measurement: sampling, expectation values
|
||||
optimize/
|
||||
mod.rs # Circuit optimization pipeline
|
||||
fusion.rs # Gate fusion pass
|
||||
cancel.rs # Gate cancellation (HH=I, XX=I, etc.)
|
||||
commute.rs # Commutation-based reordering
|
||||
entanglement.rs # Entanglement tracking and state splitting
|
||||
types.rs # Complex number types, precision configuration
|
||||
error.rs # Error types (QubitOverflow, InvalidGate, etc.)
|
||||
Cargo.toml
|
||||
benches/
|
||||
statevector.rs # Criterion benchmarks for core operations
|
||||
```
|
||||
|
||||
**Public API surface**:
|
||||
|
||||
```rust
|
||||
// Core types
|
||||
pub struct QuantumState { /* ... */ }
|
||||
pub struct QuantumCircuit { /* ... */ }
|
||||
pub enum Gate { H, X, Y, Z, S, T, CNOT, CZ, Rx(f64), Ry(f64), Rz(f64), /* ... */ }
|
||||
|
||||
// Circuit construction
|
||||
impl QuantumCircuit {
|
||||
pub fn new(num_qubits: usize) -> Result<Self, QubitOverflow>;
|
||||
pub fn gate(&mut self, gate: Gate, targets: &[usize]) -> &mut Self;
|
||||
pub fn measure(&mut self, qubit: usize) -> &mut Self;
|
||||
pub fn measure_all(&mut self) -> &mut Self;
|
||||
pub fn barrier(&mut self) -> &mut Self;
|
||||
pub fn depth(&self) -> usize;
|
||||
pub fn gate_count(&self) -> usize;
|
||||
pub fn optimize(&mut self) -> &mut Self;
|
||||
}
|
||||
|
||||
// Execution
|
||||
impl QuantumState {
|
||||
pub fn new(num_qubits: usize) -> Result<Self, QubitOverflow>;
|
||||
pub fn execute(&mut self, circuit: &QuantumCircuit) -> ExecutionResult;
|
||||
pub fn sample(&self, shots: usize) -> Vec<BitString>;
|
||||
pub fn expectation(&self, observable: &Observable) -> f64;
|
||||
pub fn probabilities(&self) -> Vec<f64>;
|
||||
pub fn amplitude(&self, basis_state: usize) -> Complex<f64>;
|
||||
}
|
||||
```
|
||||
|
||||
### Crate 2: `ruqu-wasm` -- WebAssembly Bindings
|
||||
|
||||
WASM-specific bindings exposing the quantum engine to JavaScript environments.
|
||||
|
||||
**Responsibilities**:
|
||||
- wasm-bindgen annotated wrapper types
|
||||
- JavaScript-friendly API (string-based circuit construction, JSON results)
|
||||
- Memory limit enforcement (reject circuits exceeding WASM address space)
|
||||
- Optional multi-threading via wasm-bindgen-rayon
|
||||
|
||||
**Design constraints**:
|
||||
- Mirrors the `ruvector-router-wasm` crate pattern
|
||||
- Thin wrapper; all logic delegated to `ruqu-core`
|
||||
- TypeScript type definitions auto-generated
|
||||
|
||||
**Module structure**:
|
||||
|
||||
```
|
||||
ruqu-wasm/
|
||||
src/
|
||||
lib.rs # wasm-bindgen entry points
|
||||
circuit.rs # JS-facing QuantumCircuit wrapper
|
||||
state.rs # JS-facing QuantumState wrapper
|
||||
types.rs # JS-compatible type conversions
|
||||
limits.rs # WASM memory limit checks
|
||||
Cargo.toml
|
||||
pkg/ # wasm-pack output (generated)
|
||||
tests/
|
||||
web.rs # wasm-bindgen-test browser tests
|
||||
```
|
||||
|
||||
**JavaScript API**:
|
||||
|
||||
```javascript
|
||||
import { QuantumCircuit, QuantumState } from 'ruqu-wasm';
|
||||
|
||||
// Construct circuit
|
||||
const circuit = new QuantumCircuit(4);
|
||||
circuit.h(0);
|
||||
circuit.cnot(0, 1);
|
||||
circuit.cnot(1, 2);
|
||||
circuit.cnot(2, 3);
|
||||
circuit.measureAll();
|
||||
|
||||
// Execute
|
||||
const state = new QuantumState(4);
|
||||
const result = state.execute(circuit);
|
||||
|
||||
// Sample measurement outcomes
|
||||
const counts = state.sample(1024);
|
||||
console.log(counts); // { "0000": 512, "1111": 512 }
|
||||
|
||||
// Get probabilities
|
||||
const probs = state.probabilities();
|
||||
```
|
||||
|
||||
**Memory limit enforcement**:
|
||||
|
||||
```rust
|
||||
const WASM_MAX_QUBITS: usize = 25;
|
||||
const WASM_MAX_STATE_BYTES: usize = 1 << 30; // 1 GB
|
||||
|
||||
pub fn check_wasm_limits(num_qubits: usize) -> Result<(), WasmLimitError> {
|
||||
if num_qubits > WASM_MAX_QUBITS {
|
||||
return Err(WasmLimitError::QubitOverflow {
|
||||
requested: num_qubits,
|
||||
maximum: WASM_MAX_QUBITS,
|
||||
estimated_bytes: 16 * (1usize << num_qubits),
|
||||
});
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
```
|
||||
|
||||
### Crate 3: `ruqu-algorithms` -- High-Level Algorithm Implementations
|
||||
|
||||
Quantum algorithm implementations built on top of `ruqu-core`.
|
||||
|
||||
**Responsibilities**:
|
||||
- VQE (Variational Quantum Eigensolver) with classical optimizer integration
|
||||
- Grover's search with oracle construction helpers
|
||||
- QAOA (Quantum Approximate Optimization Algorithm)
|
||||
- Quantum error correction (surface codes, stabilizer codes)
|
||||
- Hamiltonian simulation primitives (Trotterization)
|
||||
|
||||
**Module structure**:
|
||||
|
||||
```
|
||||
ruqu-algorithms/
|
||||
src/
|
||||
lib.rs
|
||||
vqe/
|
||||
mod.rs # VQE orchestration
|
||||
ansatz.rs # Parameterized ansatz circuits (UCCSD, HEA)
|
||||
hamiltonian.rs # Hamiltonian representation and decomposition
|
||||
optimizer.rs # Classical optimizer trait + implementations
|
||||
grover/
|
||||
mod.rs # Grover's algorithm orchestration
|
||||
oracle.rs # Oracle construction utilities
|
||||
diffusion.rs # Diffusion operator
|
||||
qaoa/
|
||||
mod.rs # QAOA orchestration
|
||||
mixer.rs # Mixer Hamiltonian circuits
|
||||
cost.rs # Cost function encoding
|
||||
qec/
|
||||
mod.rs # QEC framework
|
||||
surface.rs # Surface code implementation
|
||||
stabilizer.rs # Stabilizer formalism
|
||||
decoder.rs # Bridge to ruQu's MWPM decoder
|
||||
trotter.rs # Trotterization for Hamiltonian simulation
|
||||
utils.rs # Shared utilities (state preparation, etc.)
|
||||
Cargo.toml
|
||||
```
|
||||
|
||||
**VQE example**:
|
||||
|
||||
```rust
|
||||
use ruqu_core::{QuantumCircuit, QuantumState};
|
||||
use ruqu_algorithms::vqe::{VqeSolver, Hamiltonian, HardwareEfficientAnsatz};
|
||||
|
||||
let hamiltonian = Hamiltonian::from_pauli_sum(&[
|
||||
(0.5, "ZZ", &[0, 1]),
|
||||
(0.3, "X", &[0]),
|
||||
(0.3, "X", &[1]),
|
||||
]);
|
||||
|
||||
let ansatz = HardwareEfficientAnsatz::new(2, depth: 3);
|
||||
|
||||
let solver = VqeSolver::new(hamiltonian, ansatz)
|
||||
.optimizer(NelderMead::default())
|
||||
.max_iterations(200)
|
||||
.convergence_threshold(1e-6);
|
||||
|
||||
let result = solver.solve();
|
||||
println!("Ground state energy: {:.6}", result.energy);
|
||||
```
|
||||
|
||||
### Integration Points
|
||||
|
||||
#### Agent Activation
|
||||
|
||||
Quantum circuits are triggered via the ruVector agent context system. An agent
|
||||
can invoke simulation through graph query extensions:
|
||||
|
||||
```
|
||||
Agent Query: "Simulate VQE for H2 molecule at bond length 0.74 A"
|
||||
|
|
||||
v
|
||||
Agent Framework --> ruqu-algorithms::vqe::VqeSolver
|
||||
| |
|
||||
| +--> ruqu-core (multiple circuit executions)
|
||||
| |
|
||||
|<-- VqeResult ------+
|
||||
|
|
||||
v
|
||||
Agent Response: { energy: -1.137, parameters: [...], iterations: 47 }
|
||||
```
|
||||
|
||||
#### Memory Gating
|
||||
|
||||
Following ruVector's memory discipline (ADR-006):
|
||||
|
||||
- State vectors allocated exclusively within `QuantumState::new()` scope
|
||||
- All amplitudes dropped when `QuantumState` goes out of scope
|
||||
- No lazy or cached allocations persist between simulations
|
||||
- Peak memory tracked and reported via `ruvector-metrics`
|
||||
|
||||
#### Observability
|
||||
|
||||
Every simulation reports metrics through the existing `ruvector-metrics` pipeline:
|
||||
|
||||
| Metric | Type | Description |
|
||||
|--------|------|-------------|
|
||||
| `ruqu.simulation.qubits` | Gauge | Number of qubits in current simulation |
|
||||
| `ruqu.simulation.gates` | Counter | Total gates applied |
|
||||
| `ruqu.simulation.depth` | Gauge | Circuit depth after optimization |
|
||||
| `ruqu.simulation.duration_ns` | Histogram | Wall-clock simulation time |
|
||||
| `ruqu.simulation.peak_memory_bytes` | Gauge | Peak memory during simulation |
|
||||
| `ruqu.optimization.gates_eliminated` | Counter | Gates removed by optimization passes |
|
||||
| `ruqu.measurement.shots` | Counter | Total measurement shots taken |
|
||||
|
||||
#### Coherence Bridge
|
||||
|
||||
The existing `ruQu` crate's min-cut analysis and MWPM decoders remain in place
|
||||
and become accessible from `ruqu-algorithms` for quantum error correction:
|
||||
|
||||
```
|
||||
ruqu-algorithms::qec::surface
|
||||
|
|
||||
+-- build syndrome graph
|
||||
|
|
||||
+-- invoke ruQu::mwpm::decode(syndrome)
|
||||
|
|
||||
+-- apply corrections to ruqu-core::QuantumState
|
||||
```
|
||||
|
||||
This avoids duplicating decoding logic and leverages the existing, tested
|
||||
classical infrastructure.
|
||||
|
||||
#### Math Reuse
|
||||
|
||||
`ruqu-core` depends on `ruvector-math` for SIMD-optimized operations:
|
||||
|
||||
- Complex number arithmetic (add, multiply, conjugate) using SIMD lanes
|
||||
- Aligned memory allocation for state vectors
|
||||
- Batch operations on amplitude arrays
|
||||
- Norm calculation for state normalization
|
||||
|
||||
```rust
|
||||
// In ruqu-core, gate application uses ruvector-math SIMD utilities
|
||||
use ruvector_math::simd::{complex_mul_f64x4, complex_add_f64x4};
|
||||
|
||||
fn apply_single_qubit_gate(
|
||||
state: &mut [Complex<f64>],
|
||||
target: usize,
|
||||
matrix: [[Complex<f64>; 2]; 2],
|
||||
) {
|
||||
let step = 1 << target;
|
||||
for block in (0..state.len()).step_by(2 * step) {
|
||||
for i in block..block + step {
|
||||
let (a, b) = (state[i], state[i + step]);
|
||||
state[i] = matrix[0][0] * a + matrix[0][1] * b;
|
||||
state[i + step] = matrix[1][0] * a + matrix[1][1] * b;
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Dependency Graph
|
||||
|
||||
```
|
||||
ruqu-algorithms
|
||||
|
|
||||
+---> ruqu-core
|
||||
| |
|
||||
| +---> ruvector-math (SIMD utilities)
|
||||
| +---> ruvector-metrics (optional, behind "metrics" feature)
|
||||
|
|
||||
+---> ruQu (existing, for MWPM decoders in QEC)
|
||||
|
||||
ruqu-wasm
|
||||
|
|
||||
+---> ruqu-core
|
||||
+---> wasm-bindgen
|
||||
+---> wasm-bindgen-rayon (optional, behind "threads" feature)
|
||||
```
|
||||
|
||||
### Workspace Cargo.toml Additions
|
||||
|
||||
```toml
|
||||
[workspace]
|
||||
members = [
|
||||
# ... existing 73+ crates ...
|
||||
"crates/ruqu-core",
|
||||
"crates/ruqu-wasm",
|
||||
"crates/ruqu-algorithms",
|
||||
]
|
||||
```
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
- **Clean separation of concerns**: Each crate has a single, well-defined
|
||||
responsibility -- simulation, WASM bindings, and algorithms respectively
|
||||
- **Independent testing**: Each crate can be tested in isolation with its own
|
||||
benchmark suite
|
||||
- **Minimal WASM surface**: `ruqu-wasm` remains a thin wrapper, keeping the
|
||||
compiled `.wasm` module small
|
||||
- **Reuse of infrastructure**: SIMD, metrics, and classical decoders are shared,
|
||||
not duplicated
|
||||
- **Follows workspace conventions**: Same patterns as existing crates, reducing
|
||||
onboarding friction for contributors
|
||||
|
||||
### Negative
|
||||
|
||||
- **Three crates to maintain**: Each requires its own CI, documentation, and
|
||||
version management
|
||||
- **Cross-crate API stabilization**: Changes to `ruqu-core`'s public API affect
|
||||
both `ruqu-wasm` and `ruqu-algorithms`
|
||||
- **Feature flag combinatorics**: Multiple feature flags across three crates
|
||||
create a testing matrix that must be validated
|
||||
|
||||
### Risks and Mitigations
|
||||
|
||||
| Risk | Mitigation |
|
||||
|------|------------|
|
||||
| API churn in ruqu-core destabilizing dependents | Semver discipline; stabilize core types before 1.0 |
|
||||
| Feature flag combinations causing compilation failures | CI matrix testing all supported flag combinations |
|
||||
| Coherence bridge creating tight coupling with ruQu | Trait-based decoder interface; ruQu dependency optional |
|
||||
| WASM crate size exceeding 2MB target | Regular binary size audits; aggressive dead code elimination |
|
||||
|
||||
## References
|
||||
|
||||
- [ADR-QE-001: Quantum Engine Core Architecture](./ADR-QE-001-quantum-engine-core-architecture.md)
|
||||
- [ADR-QE-003: WASM Compilation Strategy](./ADR-QE-003-wasm-compilation-strategy.md)
|
||||
- [ADR-QE-004: Performance Optimization & Benchmarks](./ADR-QE-004-performance-optimization-benchmarks.md)
|
||||
- [Workspace Cargo.toml](/Cargo.toml)
|
||||
- [ruvector-router-wasm pattern](/crates/ruvector-router-wasm/)
|
||||
- [ruQu crate](/crates/ruQu/)
|
||||
- [ruvector-math crate](/crates/ruvector-math/)
|
||||
- [ruvector-metrics crate](/crates/ruvector-metrics/)
|
||||
459
docs/adr/quantum-engine/ADR-QE-003-wasm-compilation-strategy.md
Normal file
459
docs/adr/quantum-engine/ADR-QE-003-wasm-compilation-strategy.md
Normal file
|
|
@ -0,0 +1,459 @@
|
|||
# ADR-QE-003: WebAssembly Compilation Strategy
|
||||
|
||||
**Status**: Proposed
|
||||
**Date**: 2026-02-06
|
||||
**Authors**: ruv.io, RuVector Team
|
||||
**Deciders**: Architecture Review Board
|
||||
|
||||
## Context
|
||||
|
||||
### Problem Statement
|
||||
|
||||
ruVector targets browsers, embedded/edge runtimes, and IoT devices via
|
||||
WebAssembly. The quantum simulation engine must compile to
|
||||
`wasm32-unknown-unknown` and run correctly in these constrained environments.
|
||||
WASM introduces fundamental constraints that differ significantly from native
|
||||
execution and must be addressed at the architectural level rather than
|
||||
worked around at runtime.
|
||||
|
||||
### WASM Execution Environment Constraints
|
||||
|
||||
| Constraint | Detail | Impact on Quantum Simulation |
|
||||
|------------|--------|------------------------------|
|
||||
| 32-bit address space | ~4 GB theoretical max, ~2 GB practical | Hard ceiling on state vector size |
|
||||
| Memory model | Linear memory, grows in 64 KB pages | Allocation must be page-aware |
|
||||
| No native threads | Web Workers required for parallelism | Requires SharedArrayBuffer + COOP/COEP headers |
|
||||
| No direct GPU | WebGPU is separate API, not WASM-native | GPU acceleration unavailable in WASM path |
|
||||
| No OS syscalls | Sandboxed execution, no file/network | All I/O must go through host bindings |
|
||||
| JIT compilation | V8/SpiderMonkey JIT, not AOT | ~1.5-3x slower than native, variable warmup |
|
||||
| SIMD support | 128-bit SIMD proposal (widely supported since 2021) | 4 f32 or 2 f64 per vector lane |
|
||||
| Stack size | Default ~1 MB, configurable | Deep recursion limited |
|
||||
|
||||
### Memory Budget Analysis for Quantum Simulation
|
||||
|
||||
The critical constraint is WASM's 32-bit address space. With a practical
|
||||
usable limit of approximately 2 GB (due to browser memory allocation
|
||||
behavior and address space fragmentation), the maximum feasible state vector
|
||||
size is bounded:
|
||||
|
||||
```
|
||||
Available WASM Memory Budget:
|
||||
|
||||
Total addressable: 4,294,967,296 bytes (4 GB theoretical)
|
||||
Practical usable: ~2,147,483,648 bytes (2 GB, browser-dependent)
|
||||
WASM overhead: ~100,000,000 bytes (module, stack, heap metadata)
|
||||
Application overhead: ~50,000,000 bytes (circuit data, scratch buffers)
|
||||
-------------------------------------------------
|
||||
Available for state: ~2,000,000,000 bytes (1.86 GB)
|
||||
|
||||
State vector sizes:
|
||||
24 qubits: 268,435,456 bytes (256 MB) -- comfortable
|
||||
25 qubits: 536,870,912 bytes (512 MB) -- feasible
|
||||
25 + scratch: ~1,073,741,824 bytes -- tight but within budget
|
||||
26 qubits: 1,073,741,824 bytes (1 GB) -- state alone, no scratch room
|
||||
27 qubits: 2,147,483,648 bytes (2 GB) -- exceeds practical limit
|
||||
```
|
||||
|
||||
### Existing WASM Patterns in ruVector
|
||||
|
||||
The `ruvector-router-wasm` crate establishes conventions for WASM compilation:
|
||||
|
||||
- `wasm-pack build` as the compilation tool
|
||||
- `wasm-bindgen` for JavaScript interop
|
||||
- TypeScript definition generation
|
||||
- Feature-flag controlled inclusion/exclusion of capabilities
|
||||
- Dedicated test suites using `wasm-bindgen-test`
|
||||
|
||||
## Decision
|
||||
|
||||
### 1. Target and Toolchain
|
||||
|
||||
**Target triple**: `wasm32-unknown-unknown`
|
||||
|
||||
**Build toolchain**: `wasm-pack` with `wasm-bindgen`
|
||||
|
||||
```bash
|
||||
# Development build
|
||||
wasm-pack build crates/ruqu-wasm --target web --dev
|
||||
|
||||
# Release build with size optimization
|
||||
wasm-pack build crates/ruqu-wasm --target web --release
|
||||
|
||||
# Node.js target (for server-side WASM)
|
||||
wasm-pack build crates/ruqu-wasm --target nodejs --release
|
||||
```
|
||||
|
||||
**Cargo profile for WASM release**:
|
||||
|
||||
```toml
|
||||
[profile.wasm-release]
|
||||
inherits = "release"
|
||||
opt-level = "z" # Optimize for binary size
|
||||
lto = true # Link-time optimization
|
||||
codegen-units = 1 # Single codegen unit for maximum optimization
|
||||
strip = true # Strip debug symbols
|
||||
panic = "abort" # Smaller panic handling
|
||||
```
|
||||
|
||||
### 2. Memory Limit Enforcement
|
||||
|
||||
`ruqu-wasm` enforces qubit limits before any allocation occurs. This is a hard
|
||||
gate, not a soft warning.
|
||||
|
||||
**Enforcement strategy**:
|
||||
|
||||
```
|
||||
User requests N qubits
|
||||
|
|
||||
v
|
||||
[N <= 25?] ---NO---> Return WasmLimitError {
|
||||
| requested: N,
|
||||
YES maximum: 25,
|
||||
| estimated_memory: 16 * 2^N,
|
||||
v suggestion: "Use native build for >25 qubits"
|
||||
[Estimate total }
|
||||
memory needed]
|
||||
|
|
||||
v
|
||||
[< 1.5 GB?] ---NO---> Return WasmLimitError::InsufficientMemory
|
||||
|
|
||||
YES
|
||||
|
|
||||
v
|
||||
Proceed with allocation
|
||||
```
|
||||
|
||||
**Qubit limits by precision**:
|
||||
|
||||
| Precision | Max Qubits (WASM) | State Size | With Scratch |
|
||||
|-----------|--------------------|------------|--------------|
|
||||
| Complex f64 (default) | 25 | 512 MB | ~1.07 GB |
|
||||
| Complex f32 (optional) | 26 | 512 MB | ~1.07 GB |
|
||||
|
||||
**Error reporting**:
|
||||
|
||||
```rust
|
||||
#[wasm_bindgen]
|
||||
#[derive(Debug)]
|
||||
pub struct WasmLimitError {
|
||||
pub requested_qubits: usize,
|
||||
pub maximum_qubits: usize,
|
||||
pub estimated_bytes: usize,
|
||||
pub message: String,
|
||||
}
|
||||
|
||||
impl WasmLimitError {
|
||||
pub fn qubit_overflow(requested: usize) -> Self {
|
||||
let max = if cfg!(feature = "f32") { 26 } else { 25 };
|
||||
let bytes_per_amplitude = if cfg!(feature = "f32") { 8 } else { 16 };
|
||||
Self {
|
||||
requested_qubits: requested,
|
||||
maximum_qubits: max,
|
||||
estimated_bytes: bytes_per_amplitude * (1usize << requested),
|
||||
message: format!(
|
||||
"Cannot simulate {} qubits in WASM: requires {} bytes, \
|
||||
exceeds WASM address space. Maximum: {} qubits. \
|
||||
Use native build for larger simulations.",
|
||||
requested,
|
||||
bytes_per_amplitude * (1usize << requested),
|
||||
max
|
||||
),
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 3. Threading Strategy
|
||||
|
||||
WASM multi-threading requires SharedArrayBuffer, which in turn requires
|
||||
specific HTTP security headers (Cross-Origin-Opener-Policy and
|
||||
Cross-Origin-Embedder-Policy). Not all deployment environments support these.
|
||||
|
||||
**Strategy**: Optional multi-threading with graceful fallback.
|
||||
|
||||
```
|
||||
ruqu-wasm execution
|
||||
|
|
||||
v
|
||||
[SharedArrayBuffer
|
||||
available?]
|
||||
/ \
|
||||
YES NO
|
||||
/ \
|
||||
[wasm-bindgen-rayon] [single-threaded
|
||||
parallel execution] execution]
|
||||
| |
|
||||
Split state vector Sequential gate
|
||||
across Web Workers application
|
||||
| |
|
||||
v v
|
||||
Fast (N cores) Slower (1 core)
|
||||
```
|
||||
|
||||
**Compile-time configuration**:
|
||||
|
||||
```toml
|
||||
# In ruqu-wasm/Cargo.toml
|
||||
[features]
|
||||
default = []
|
||||
threads = ["wasm-bindgen-rayon", "ruqu-core/parallel"]
|
||||
```
|
||||
|
||||
**Runtime detection**:
|
||||
|
||||
```rust
|
||||
#[wasm_bindgen]
|
||||
pub fn threading_available() -> bool {
|
||||
// Check if SharedArrayBuffer is available in this environment
|
||||
js_sys::eval("typeof SharedArrayBuffer !== 'undefined'")
|
||||
.ok()
|
||||
.and_then(|v| v.as_bool())
|
||||
.unwrap_or(false)
|
||||
}
|
||||
```
|
||||
|
||||
**Required HTTP headers for threading**:
|
||||
|
||||
```
|
||||
Cross-Origin-Opener-Policy: same-origin
|
||||
Cross-Origin-Embedder-Policy: require-corp
|
||||
```
|
||||
|
||||
### 4. SIMD Utilization
|
||||
|
||||
The WASM SIMD proposal (128-bit vectors) is widely supported in modern browsers
|
||||
and runtimes. The quantum engine uses SIMD for amplitude manipulation when
|
||||
available.
|
||||
|
||||
**WASM SIMD capabilities**:
|
||||
|
||||
| Operation | WASM SIMD Instruction | Use in Quantum Sim |
|
||||
|-----------|-----------------------|--------------------|
|
||||
| f64x2 multiply | `f64x2.mul` | Complex multiplication (real part) |
|
||||
| f64x2 add | `f64x2.add` | Amplitude accumulation |
|
||||
| f64x2 sub | `f64x2.sub` | Complex multiplication (cross terms) |
|
||||
| f64x2 shuffle | `i64x2.shuffle` | Swapping real/imaginary parts |
|
||||
| f32x4 multiply | `f32x4.mul` | f32 mode complex multiply |
|
||||
| f32x4 fma | emulated | Fused multiply-add for accuracy |
|
||||
|
||||
**Conditional compilation**:
|
||||
|
||||
```rust
|
||||
// In ruqu-core, WASM SIMD path
|
||||
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
|
||||
mod wasm_simd {
|
||||
use core::arch::wasm32::*;
|
||||
|
||||
/// Apply 2x2 unitary to a pair of amplitudes using WASM SIMD
|
||||
#[inline(always)]
|
||||
pub fn apply_gate_2x2_simd(
|
||||
a_re: f64, a_im: f64,
|
||||
b_re: f64, b_im: f64,
|
||||
u00_re: f64, u00_im: f64,
|
||||
u01_re: f64, u01_im: f64,
|
||||
u10_re: f64, u10_im: f64,
|
||||
u11_re: f64, u11_im: f64,
|
||||
) -> (f64, f64, f64, f64) {
|
||||
// Pack amplitude pair into SIMD lanes
|
||||
let a = f64x2(a_re, a_im);
|
||||
let b = f64x2(b_re, b_im);
|
||||
|
||||
// Complex multiply-accumulate for output amplitudes
|
||||
// c0 = u00*a + u01*b
|
||||
// c1 = u10*a + u11*b
|
||||
// (expanded for complex arithmetic)
|
||||
// ...
|
||||
todo!()
|
||||
}
|
||||
}
|
||||
|
||||
// Fallback scalar path
|
||||
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
|
||||
mod scalar {
|
||||
// Pure scalar complex arithmetic
|
||||
}
|
||||
```
|
||||
|
||||
**Comparison of SIMD widths across targets**:
|
||||
|
||||
```
|
||||
Native (AVX-512): 512-bit = 8 f64 = 4 complex f64 per instruction
|
||||
Native (AVX2): 256-bit = 4 f64 = 2 complex f64 per instruction
|
||||
Native (NEON): 128-bit = 2 f64 = 1 complex f64 per instruction
|
||||
WASM SIMD: 128-bit = 2 f64 = 1 complex f64 per instruction
|
||||
```
|
||||
|
||||
WASM SIMD matches ARM NEON width but is slower due to JIT overhead. The engine
|
||||
uses the same algorithmic structure as the NEON path, adapted for WASM SIMD
|
||||
intrinsics.
|
||||
|
||||
### 5. No GPU in WASM
|
||||
|
||||
GPU acceleration is exclusively available in native builds. The WASM path
|
||||
uses CPU-only simulation.
|
||||
|
||||
**Rationale**:
|
||||
- WebGPU is a separate browser API, not accessible from WASM linear memory
|
||||
- Bridging WASM to WebGPU would require complex JavaScript glue code
|
||||
- WebGPU compute shader support varies across browsers
|
||||
- The performance benefit is uncertain for the 25-qubit WASM ceiling
|
||||
|
||||
**Future consideration**: If WebGPU stabilizes and WASM-WebGPU interop matures,
|
||||
a `ruqu-webgpu` crate could provide browser-side GPU acceleration. This is out
|
||||
of scope for the initial release.
|
||||
|
||||
### 6. API Parity
|
||||
|
||||
`ruqu-wasm` exposes an API that is functionally identical to `ruqu-core` native.
|
||||
The same circuit description produces the same measurement results (within
|
||||
floating-point tolerance). Only performance and capacity differ.
|
||||
|
||||
**Parity guarantee**:
|
||||
|
||||
```
|
||||
Same Circuit
|
||||
|
|
||||
+------------+------------+
|
||||
| |
|
||||
ruqu-core (native) ruqu-wasm (browser)
|
||||
| |
|
||||
- 30+ qubits - 25 qubits max
|
||||
- AVX2/AVX-512 SIMD - WASM SIMD128
|
||||
- Rayon threading - Optional Web Workers
|
||||
- Optional GPU - CPU only
|
||||
- ~17.5M gates/sec - ~5-12M gates/sec
|
||||
| |
|
||||
+------------+------------+
|
||||
|
|
||||
Same Results
|
||||
(within fp tolerance)
|
||||
```
|
||||
|
||||
**Verified by**: Shared test suite that runs against both native and WASM targets,
|
||||
comparing outputs bitwise (for deterministic operations) or statistically (for
|
||||
measurement sampling).
|
||||
|
||||
### 7. Module Size Target
|
||||
|
||||
Target `.wasm` binary size: **< 2 MB** for the default feature set.
|
||||
|
||||
**Size budget**:
|
||||
|
||||
| Component | Estimated Size |
|
||||
|-----------|---------------|
|
||||
| Core simulation engine | ~800 KB |
|
||||
| Gate implementations | ~200 KB |
|
||||
| Measurement and sampling | ~100 KB |
|
||||
| wasm-bindgen glue | ~50 KB |
|
||||
| Circuit optimization | ~150 KB |
|
||||
| Error handling and validation | ~50 KB |
|
||||
| **Total (default features)** | **~1.35 MB** |
|
||||
| + noise-model feature | +200 KB |
|
||||
| + tensor-network feature | +400 KB |
|
||||
| **Total (all features)** | **~1.95 MB** |
|
||||
|
||||
**Size reduction techniques**:
|
||||
- `opt-level = "z"` for size-optimized compilation
|
||||
- LTO (Link-Time Optimization) for dead code elimination
|
||||
- `wasm-opt` post-processing pass (binaryen)
|
||||
- Feature flags to exclude unused capabilities
|
||||
- `panic = "abort"` to eliminate unwinding machinery
|
||||
- Avoid `format!` and `std::fmt` where possible in hot paths
|
||||
|
||||
**Build pipeline**:
|
||||
|
||||
```bash
|
||||
# Build with wasm-pack
|
||||
wasm-pack build crates/ruqu-wasm --target web --release
|
||||
|
||||
# Post-process with wasm-opt for additional size reduction
|
||||
wasm-opt -Oz --enable-simd \
|
||||
crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm \
|
||||
-o crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm
|
||||
|
||||
# Verify size
|
||||
ls -lh crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm
|
||||
# Expected: < 2 MB
|
||||
```
|
||||
|
||||
### 8. Future: wasm64 (Memory64 Proposal)
|
||||
|
||||
The WebAssembly Memory64 proposal extends the address space to 64 bits,
|
||||
removing the 4 GB limitation. When this proposal reaches broad runtime support:
|
||||
|
||||
- Recompile `ruqu-wasm` targeting `wasm64-unknown-unknown`
|
||||
- Lift the 25-qubit ceiling to match native limits
|
||||
- Maintain backward compatibility with wasm32 via conditional compilation
|
||||
|
||||
**Current status**: Memory64 is at Phase 4 (standardized) in the WASM
|
||||
specification process. Browser support is emerging but not yet universal.
|
||||
|
||||
**Migration path**:
|
||||
|
||||
```toml
|
||||
# Future Cargo.toml
|
||||
[features]
|
||||
wasm64 = [] # Enable when targeting wasm64
|
||||
|
||||
# In code
|
||||
#[cfg(feature = "wasm64")]
|
||||
const MAX_QUBITS_WASM: usize = 30;
|
||||
|
||||
#[cfg(not(feature = "wasm64"))]
|
||||
const MAX_QUBITS_WASM: usize = 25;
|
||||
```
|
||||
|
||||
## Trade-offs Accepted
|
||||
|
||||
| Trade-off | Accepted Limitation | Justification |
|
||||
|-----------|---------------------|---------------|
|
||||
| Performance | ~1.5-3x slower than native | Universal deployment outweighs raw speed |
|
||||
| Qubit ceiling | 25 qubits in WASM vs 30+ native | Sufficient for most educational and research workloads |
|
||||
| Threading | Requires specific browser headers | Graceful fallback ensures always-works baseline |
|
||||
| No GPU | CPU-only in browser | GPU simulation at 25 qubits shows minimal benefit |
|
||||
| Binary size | ~1.35 MB module | Acceptable for a quantum simulation library |
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
- **Universal deployment**: Any modern browser or WASM runtime can execute
|
||||
quantum simulations without installation
|
||||
- **Security sandboxing**: WASM's memory isolation prevents quantum simulation
|
||||
code from accessing host resources
|
||||
- **Edge-aligned**: Matches ruVector's philosophy of computation at the edge
|
||||
- **Testable**: WASM builds can be tested in CI via headless browsers and
|
||||
wasm-bindgen-test
|
||||
- **Progressive enhancement**: Single-threaded baseline with optional threading
|
||||
ensures broad compatibility
|
||||
|
||||
### Negative
|
||||
|
||||
- **Performance ceiling**: JIT overhead and narrower SIMD limit throughput
|
||||
- **Memory limits**: 25-qubit hard ceiling until wasm64 adoption
|
||||
- **Threading complexity**: SharedArrayBuffer requirement adds deployment
|
||||
configuration burden
|
||||
- **Debugging difficulty**: WASM debugging tools are less mature than native
|
||||
debuggers
|
||||
|
||||
### Mitigations
|
||||
|
||||
| Issue | Mitigation |
|
||||
|-------|------------|
|
||||
| Performance gap | Document native vs WASM trade-offs; recommend native for >20 qubits |
|
||||
| Memory exhaustion | Hard limit enforcement with informative error messages |
|
||||
| Threading failures | Automatic fallback to single-threaded; no silent degradation |
|
||||
| Debug difficulty | Source maps via wasm-pack; comprehensive logging to console |
|
||||
| Binary size creep | CI size gate: fail build if .wasm exceeds 2 MB |
|
||||
|
||||
## References
|
||||
|
||||
- [ADR-QE-001: Quantum Engine Core Architecture](./ADR-QE-001-quantum-engine-core-architecture.md)
|
||||
- [ADR-QE-002: Crate Structure & Integration](./ADR-QE-002-crate-structure-integration.md)
|
||||
- [ADR-QE-004: Performance Optimization & Benchmarks](./ADR-QE-004-performance-optimization-benchmarks.md)
|
||||
- [ADR-005: WASM Runtime Integration](/docs/adr/ADR-005-wasm-runtime-integration.md)
|
||||
- [ruvector-router-wasm crate](/crates/ruvector-router-wasm/)
|
||||
- [WebAssembly SIMD Proposal](https://github.com/WebAssembly/simd)
|
||||
- [WebAssembly Memory64 Proposal](https://github.com/WebAssembly/memory64)
|
||||
- [wasm-bindgen-rayon](https://github.com/RReverser/wasm-bindgen-rayon)
|
||||
- [Cross-Origin Isolation Guide (MDN)](https://developer.mozilla.org/en-US/docs/Web/API/crossOriginIsolated)
|
||||
|
|
@ -0,0 +1,564 @@
|
|||
# ADR-QE-004: Performance Optimization & Benchmarks
|
||||
|
||||
**Status**: Proposed
|
||||
**Date**: 2026-02-06
|
||||
**Authors**: ruv.io, RuVector Team
|
||||
**Deciders**: Architecture Review Board
|
||||
|
||||
## Context
|
||||
|
||||
### Problem Statement
|
||||
|
||||
Quantum state-vector simulation is computationally expensive. Every gate
|
||||
application touches the full amplitude vector of 2^n complex numbers, making
|
||||
gate application O(2^n) per gate for n qubits. For the quantum engine to be
|
||||
practical on edge devices and in browser environments, it must achieve
|
||||
competitive performance: millions of gates per second for small circuits,
|
||||
interactive latency for 10-20 qubit workloads, and the ability to handle
|
||||
moderately deep circuits (thousands of gates) without unacceptable delays.
|
||||
|
||||
### Computational Cost Model
|
||||
|
||||
For a circuit with n qubits, g gates, and s measurement shots:
|
||||
|
||||
```
|
||||
Total operations (approximate):
|
||||
|
||||
Single-qubit gate: 2^n complex multiplications + 2^n complex additions
|
||||
Two-qubit gate: 2^(n+1) complex multiplications + 2^(n+1) complex additions
|
||||
Measurement (1 shot): 2^n probability calculations + sampling
|
||||
Full circuit: sum_i(cost(gate_i)) + s * 2^n
|
||||
|
||||
Example: 20-qubit circuit, 500 gates, 1024 shots
|
||||
Gate cost: 500 * 2^20 * ~4 FLOP = ~2.1 billion FLOP
|
||||
Measure: 1024 * 2^20 * ~2 FLOP = ~2.1 billion FLOP
|
||||
Total: ~4.2 billion FLOP
|
||||
```
|
||||
|
||||
At 10 GFLOP/s (realistic single-core throughput), this is ~420 ms. With SIMD
|
||||
and multi-threading, we target 10-50x improvement.
|
||||
|
||||
### Performance Baseline from Comparable Systems
|
||||
|
||||
| Simulator | Language | 20-qubit H gate | Notes |
|
||||
|-----------|----------|-----------------|-------|
|
||||
| Qiskit Aer | C++/Python | ~50 ns | Heavily optimized, OpenMP |
|
||||
| Cirq | Python/C++ | ~200 ns | Google, less optimized |
|
||||
| QuantRS2 | Rust | ~57 ns | Rust-native, AVX2 |
|
||||
| Quest | C | ~40 ns | GPU-capable, highly tuned |
|
||||
| Target (ruQu) | Rust | < 60 ns | Competitive with QuantRS2 |
|
||||
|
||||
These benchmarks measure per-gate time on a single-qubit Hadamard applied to
|
||||
a 20-qubit state vector. Our target is to match or beat QuantRS2, the closest
|
||||
comparable pure-Rust implementation.
|
||||
|
||||
## Decision
|
||||
|
||||
Implement a **multi-layered optimization strategy** with six complementary
|
||||
techniques, each addressing a different performance bottleneck.
|
||||
|
||||
### Layer 1: SIMD Operations
|
||||
|
||||
Use `ruvector-math` SIMD utilities to vectorize amplitude manipulation.
|
||||
Gate application fundamentally involves applying a 2x2 or 4x4 unitary matrix
|
||||
to pairs/quadruples of complex amplitudes. SIMD processes multiple amplitude
|
||||
components simultaneously.
|
||||
|
||||
**Native SIMD dispatch**:
|
||||
|
||||
```
|
||||
Architecture Instruction Set Complex f64 per Cycle
|
||||
----------- --------------- ---------------------
|
||||
x86_64 AVX-512 4 (512-bit / 128-bit per complex)
|
||||
x86_64 AVX2 2 (256-bit / 128-bit per complex)
|
||||
ARM64 NEON 1 (128-bit / 128-bit per complex)
|
||||
WASM SIMD128 1 (128-bit / 128-bit per complex)
|
||||
Fallback Scalar 1 (sequential)
|
||||
```
|
||||
|
||||
**Single-qubit gate application with AVX2**:
|
||||
|
||||
```
|
||||
For each pair of amplitudes (a[i], a[i + 2^target]):
|
||||
|
||||
Load: a_re, a_im = load_f64x4([a[i].re, a[i].im, a[i+step].re, a[i+step].im])
|
||||
|
||||
Compute c0 = u00 * a + u01 * b:
|
||||
mul_re = u00_re * a_re - u00_im * a_im + u01_re * b_re - u01_im * b_im
|
||||
mul_im = u00_re * a_im + u00_im * a_re + u01_re * b_im + u01_im * b_re
|
||||
|
||||
Compute c1 = u10 * a + u11 * b:
|
||||
(analogous)
|
||||
|
||||
Store: [c0.re, c0.im, c1.re, c1.im]
|
||||
```
|
||||
|
||||
With AVX2 (256-bit), we process 2 complex f64 values per instruction,
|
||||
yielding a theoretical 2x speedup over scalar. With AVX-512, this doubles to
|
||||
4x. Practical speedup is 1.5-3.5x due to instruction latency and memory
|
||||
bandwidth.
|
||||
|
||||
**Target per-gate throughput**:
|
||||
|
||||
| Qubits | Amplitudes | AVX2 (est.) | AVX-512 (est.) | WASM SIMD (est.) |
|
||||
|--------|------------|-------------|----------------|-------------------|
|
||||
| 10 | 1,024 | ~15 ns | ~10 ns | ~30 ns |
|
||||
| 15 | 32,768 | ~1 us | ~0.5 us | ~2 us |
|
||||
| 20 | 1,048,576 | ~50 us | ~25 us | ~100 us |
|
||||
| 25 | 33,554,432 | ~1.5 ms | ~0.8 ms | ~3 ms |
|
||||
|
||||
### Layer 2: Multithreading
|
||||
|
||||
Rayon-based data parallelism splits the state vector across CPU cores for
|
||||
gate application. Each thread processes an independent contiguous block of
|
||||
amplitudes.
|
||||
|
||||
**Parallelization strategy**:
|
||||
|
||||
```
|
||||
State vector: [amp_0, amp_1, ..., amp_{2^n - 1}]
|
||||
|
||||
Thread 0: [amp_0 ... amp_{2^n/T - 1}]
|
||||
Thread 1: [amp_{2^n/T} ... amp_{2*2^n/T - 1}]
|
||||
...
|
||||
Thread T-1:[amp_{(T-1)*2^n/T} ... amp_{2^n - 1}]
|
||||
|
||||
Where T = number of threads (Rayon work-stealing pool)
|
||||
```
|
||||
|
||||
**Gate application requires care with target qubit position**:
|
||||
|
||||
- If `target < log2(chunk_size)`: each chunk contains complete amplitude pairs.
|
||||
Threads are fully independent. No synchronization needed.
|
||||
- If `target >= log2(chunk_size)`: amplitude pairs span chunk boundaries.
|
||||
Must adjust chunk boundaries to align with gate structure.
|
||||
|
||||
**Expected scaling**:
|
||||
|
||||
```
|
||||
Qubits Amps 1 thread 8 threads Speedup
|
||||
------ ---- -------- --------- -------
|
||||
15 32K 1 us ~200 ns ~5x
|
||||
20 1M 50 us ~8 us ~6x
|
||||
22 4M 200 us ~30 us ~6.5x
|
||||
24 16M 800 us ~120 us ~6.7x
|
||||
25 32M 1.5 ms ~220 us ~6.8x
|
||||
```
|
||||
|
||||
Speedup plateaus below linear (8x for 8 threads) due to memory bandwidth
|
||||
saturation. At 24+ qubits, the state vector exceeds L3 cache and performance
|
||||
becomes memory-bound.
|
||||
|
||||
**Parallelism threshold**: Do not parallelize below 14 qubits (16K amplitudes).
|
||||
The overhead of Rayon's work-stealing exceeds the benefit for small states.
|
||||
|
||||
### Layer 3: Gate Fusion
|
||||
|
||||
Preprocess circuits to combine consecutive gates into single matrix
|
||||
operations, reducing the number of state vector passes.
|
||||
|
||||
**Fusion rules**:
|
||||
|
||||
```
|
||||
Rule 1: Consecutive single-qubit gates on the same qubit
|
||||
Rz(a) -> Rx(b) -> Rz(c) ==> U3(a, b, c) [single matrix multiply]
|
||||
|
||||
Rule 2: Consecutive two-qubit gates on the same pair
|
||||
CNOT(0,1) -> CZ(0,1) ==> Fused_2Q(0,1) [4x4 matrix]
|
||||
|
||||
Rule 3: Single-qubit gate followed by controlled gate
|
||||
H(0) -> CNOT(0,1) ==> Fused operation (absorb H into CNOT matrix)
|
||||
|
||||
Rule 4: Identity cancellation
|
||||
H -> H ==> Identity (remove both)
|
||||
X -> X ==> Identity
|
||||
S -> S_dag ==> Identity
|
||||
CNOT -> CNOT (same control/target) ==> Identity
|
||||
```
|
||||
|
||||
**Fusion effectiveness by algorithm**:
|
||||
|
||||
| Algorithm | Typical Fusion Ratio | Gate Reduction |
|
||||
|-----------|----------------------|----------------|
|
||||
| VQE (UCCSD ansatz) | 1.8-2.5x | 30-50% fewer state passes |
|
||||
| Grover's | 1.2-1.5x | 15-25% |
|
||||
| QAOA | 1.5-2.0x | 25-40% |
|
||||
| QFT | 2.0-3.0x | 40-60% |
|
||||
| Random circuit | 1.1-1.3x | 5-15% |
|
||||
|
||||
**Implementation**:
|
||||
|
||||
```rust
|
||||
pub struct FusionPass;
|
||||
|
||||
impl CircuitOptimizer for FusionPass {
|
||||
fn optimize(&self, circuit: &mut QuantumCircuit) {
|
||||
let mut i = 0;
|
||||
while i < circuit.gates.len() - 1 {
|
||||
let current = &circuit.gates[i];
|
||||
let next = &circuit.gates[i + 1];
|
||||
|
||||
if can_fuse(current, next) {
|
||||
let fused = compute_fused_matrix(current, next);
|
||||
circuit.gates[i] = fused;
|
||||
circuit.gates.remove(i + 1);
|
||||
// Don't advance i; check if we can fuse again
|
||||
} else {
|
||||
i += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Layer 4: Entanglement-Aware Splitting
|
||||
|
||||
Track which qubits have interacted via entangling gates. Simulate independent
|
||||
qubit subsets as separate, smaller state vectors. Merge subsets when an
|
||||
entangling gate connects them.
|
||||
|
||||
**Concept**:
|
||||
|
||||
```
|
||||
Circuit: q0 --[H]--[CNOT(0,1)]--[Rz]--
|
||||
q1 --[H]--[CNOT(0,1)]--[Ry]--
|
||||
q2 --[H]--[X]---------[Rz]---[CNOT(2,0)]--
|
||||
q3 --[H]--[Y]---------[Rx]--
|
||||
|
||||
Initially: {q0}, {q1}, {q2}, {q3} -- four 2^1 vectors (2 amps each)
|
||||
After CNOT(0,1): {q0,q1}, {q2}, {q3} -- one 2^2 + two 2^1 vectors
|
||||
After CNOT(2,0): {q0,q1,q2}, {q3} -- one 2^3 + one 2^1 vector
|
||||
|
||||
Memory: 8 + 2 = 10 amplitudes vs 2^4 = 16 amplitudes (full)
|
||||
```
|
||||
|
||||
**Savings scale dramatically for circuits with late entanglement**:
|
||||
|
||||
```
|
||||
Scenario: 20-qubit circuit, first 100 gates are local, then entangling
|
||||
|
||||
Without splitting: 2^20 = 1M amplitudes from gate 1
|
||||
With splitting: 20 * 2^1 = 40 amplitudes until first entangling gate
|
||||
Progressively merge as entanglement grows
|
||||
```
|
||||
|
||||
**Data structure**:
|
||||
|
||||
```rust
|
||||
pub struct SplitState {
|
||||
/// Each subset: (qubit indices, state vector)
|
||||
subsets: Vec<(Vec<usize>, QuantumState)>,
|
||||
/// Union-Find structure for tracking connectivity
|
||||
connectivity: UnionFind,
|
||||
}
|
||||
|
||||
impl SplitState {
|
||||
pub fn apply_gate(&mut self, gate: &Gate, targets: &[usize]) {
|
||||
if gate.is_entangling() {
|
||||
// Merge subsets containing target qubits
|
||||
let merged = self.merge_subsets(targets);
|
||||
// Apply gate to merged state
|
||||
merged.apply_gate(gate, targets);
|
||||
} else {
|
||||
// Apply to the subset containing the target qubit
|
||||
let subset = self.find_subset(targets[0]);
|
||||
subset.apply_gate(gate, targets);
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**When splitting helps vs. hurts**:
|
||||
|
||||
| Circuit Type | Splitting Benefit |
|
||||
|-------------|-------------------|
|
||||
| Shallow QAOA (p=1-3) | High (qubits entangle gradually) |
|
||||
| VQE with local ansatz | High (many local rotations) |
|
||||
| Grover's (full oracle) | Low (oracle entangles all qubits early) |
|
||||
| QFT | Low (all-to-all entanglement) |
|
||||
| Random circuits | Low (entangles quickly) |
|
||||
|
||||
The engine automatically disables splitting when all qubits are connected,
|
||||
falling back to full state-vector simulation with zero overhead.
|
||||
|
||||
### Layer 5: Cache-Local Processing
|
||||
|
||||
For large state vectors (>20 qubits), cache utilization becomes critical.
|
||||
The state vector exceeds L2 cache (typically 256 KB - 1 MB) and potentially
|
||||
L3 cache (8-32 MB).
|
||||
|
||||
**Cache analysis**:
|
||||
|
||||
```
|
||||
Qubits State Size L2 (512KB) L3 (16MB)
|
||||
------ ---------- ---------- ---------
|
||||
18 4 MB 8x oversize in cache
|
||||
20 16 MB 32x in cache
|
||||
22 64 MB 128x 4x oversize
|
||||
24 256 MB 512x 16x oversize
|
||||
25 512 MB 1024x 32x oversize
|
||||
```
|
||||
|
||||
**Techniques**:
|
||||
|
||||
1. **Aligned allocation**: State vector aligned to cache line boundaries (64
|
||||
bytes) for optimal prefetch behavior. Uses `ruvector-math` aligned allocator.
|
||||
|
||||
2. **Blocking/tiling**: For gates on high-index qubits, the stride between
|
||||
amplitude pairs is large (2^target). Tiling the access pattern to process
|
||||
cache-line-sized blocks sequentially improves spatial locality.
|
||||
|
||||
```
|
||||
Without tiling (target qubit = 20):
|
||||
Access pattern: amp[0], amp[1M], amp[1], amp[1M+1], ...
|
||||
Cache misses: ~every access (stride = 16 MB)
|
||||
|
||||
With tiling (block size = L2/4):
|
||||
Process block [0..64K], then [64K..128K], ...
|
||||
Cache misses: ~1 per block (sequential within block)
|
||||
```
|
||||
|
||||
3. **Prefetch hints**: Insert software prefetch instructions for the next block
|
||||
of amplitudes while processing the current block.
|
||||
|
||||
```rust
|
||||
// Prefetch next cache line while processing current
|
||||
#[cfg(target_arch = "x86_64")]
|
||||
unsafe {
|
||||
core::arch::x86_64::_mm_prefetch(
|
||||
state.as_ptr().add(i + CACHE_LINE_AMPS) as *const i8,
|
||||
core::arch::x86_64::_MM_HINT_T0,
|
||||
);
|
||||
}
|
||||
```
|
||||
|
||||
### Layer 6: Lazy Evaluation
|
||||
|
||||
Accumulate commuting rotations and defer their application until a
|
||||
non-commuting gate appears. This reduces the number of full state-vector
|
||||
passes for rotation-heavy circuits common in variational algorithms.
|
||||
|
||||
**Commutation rules**:
|
||||
|
||||
```
|
||||
Rz(a) commutes with Rz(b) => Rz(a+b)
|
||||
Rx(a) commutes with Rx(b) => Rx(a+b)
|
||||
Rz commutes with CZ => Defer Rz
|
||||
Diagonal gates commute => Combine phases
|
||||
|
||||
But:
|
||||
Rz does NOT commute with H
|
||||
Rx does NOT commute with CNOT (on target)
|
||||
```
|
||||
|
||||
**Implementation sketch**:
|
||||
|
||||
```rust
|
||||
pub struct LazyAccumulator {
|
||||
/// Pending rotations per qubit: (axis, total_angle)
|
||||
pending: HashMap<usize, Vec<(RotationAxis, f64)>>,
|
||||
}
|
||||
|
||||
impl LazyAccumulator {
|
||||
pub fn push_gate(&mut self, gate: &Gate, target: usize) -> Option<FlushedGate> {
|
||||
if let Some(rotation) = gate.as_rotation() {
|
||||
if let Some(existing) = self.pending.get_mut(&target) {
|
||||
if existing.last().map_or(false, |(axis, _)| *axis == rotation.axis) {
|
||||
// Same axis: accumulate angle
|
||||
existing.last_mut().unwrap().1 += rotation.angle;
|
||||
return None; // No gate emitted
|
||||
}
|
||||
}
|
||||
self.pending.entry(target).or_default().push((rotation.axis, rotation.angle));
|
||||
None
|
||||
} else {
|
||||
// Non-commuting gate: flush pending rotations for affected qubits
|
||||
let flushed = self.flush(target);
|
||||
Some(flushed)
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Effectiveness**: VQE circuits with alternating Rz-Rx-Rz layers see 20-40%
|
||||
reduction in state-vector passes. QAOA circuits with repeated ZZ-rotation
|
||||
layers see 15-30% reduction.
|
||||
|
||||
## Benchmark Targets
|
||||
|
||||
### Primary Benchmark Suite
|
||||
|
||||
| ID | Workload | Qubits | Gates | Target Time | Notes |
|
||||
|----|----------|--------|-------|-------------|-------|
|
||||
| B1 | Grover (8 qubits) | 8 | ~200 | < 1 ms | 3 Grover iterations |
|
||||
| B2 | Grover (16 qubits) | 16 | ~3,000 | < 10 ms | ~64 iterations |
|
||||
| B3 | VQE iteration (12 qubits) | 12 | ~120 | < 5 ms | Single parameter update |
|
||||
| B4 | VQE iteration (20 qubits) | 20 | ~300 | < 50 ms | UCCSD ansatz |
|
||||
| B5 | QAOA p=3 (10 nodes) | 10 | ~75 | < 1 ms | MaxCut on random graph |
|
||||
| B6 | QAOA p=5 (20 nodes) | 20 | ~200 | < 200 ms | MaxCut on random graph |
|
||||
| B7 | Surface code cycle (d=3) | 17 | ~20 | < 10 ms | Single syndrome round |
|
||||
| B8 | 1000 surface code cycles | 17 | ~20,000 | < 2 s | Repeated error correction |
|
||||
| B9 | QFT (20 qubits) | 20 | ~210 | < 30 ms | Full quantum Fourier transform |
|
||||
| B10 | Random circuit (25 qubits) | 25 | 100 | < 10 s | Worst-case memory test |
|
||||
|
||||
### Micro-Benchmarks
|
||||
|
||||
Per-gate timing for individual operations:
|
||||
|
||||
| Gate | 10 qubits | 15 qubits | 20 qubits | 25 qubits |
|
||||
|------|-----------|-----------|-----------|-----------|
|
||||
| H | < 20 ns | < 0.5 us | < 50 us | < 1.5 ms |
|
||||
| CNOT | < 30 ns | < 1 us | < 80 us | < 2.5 ms |
|
||||
| Rz(theta) | < 15 ns | < 0.4 us | < 40 us | < 1.2 ms |
|
||||
| Toffoli | < 50 ns | < 1.5 us | < 120 us | < 4 ms |
|
||||
| Measure | < 10 ns | < 0.3 us | < 30 us | < 1 ms |
|
||||
|
||||
### WASM-Specific Benchmarks
|
||||
|
||||
| ID | Workload | Qubits | Target (WASM) | Target (Native) | Expected Ratio |
|
||||
|----|----------|--------|---------------|-----------------|----------------|
|
||||
| W1 | Grover (8) | 8 | < 3 ms | < 1 ms | ~3x |
|
||||
| W2 | VQE iter (12) | 12 | < 12 ms | < 5 ms | ~2.5x |
|
||||
| W3 | QAOA p=3 (10) | 10 | < 2.5 ms | < 1 ms | ~2.5x |
|
||||
| W4 | Random (20) | 20 | < 500 ms | < 200 ms | ~2.5x |
|
||||
| W5 | Random (25) | 25 | < 25 s | < 10 s | ~2.5x |
|
||||
|
||||
### Benchmark Infrastructure
|
||||
|
||||
Benchmarks use Criterion.rs for native and a custom timing harness for WASM:
|
||||
|
||||
```rust
|
||||
// Native benchmarks (Criterion)
|
||||
use criterion::{criterion_group, criterion_main, Criterion};
|
||||
|
||||
fn bench_grover_8(c: &mut Criterion) {
|
||||
c.bench_function("grover_8_qubits", |b| {
|
||||
b.iter(|| {
|
||||
let mut state = QuantumState::new(8).unwrap();
|
||||
let circuit = grover_circuit(8, &target_state);
|
||||
state.execute(&circuit)
|
||||
})
|
||||
});
|
||||
}
|
||||
|
||||
fn bench_single_gate_scaling(c: &mut Criterion) {
|
||||
let mut group = c.benchmark_group("hadamard_scaling");
|
||||
for n in [10, 12, 14, 16, 18, 20, 22, 24] {
|
||||
group.bench_with_input(
|
||||
BenchmarkId::from_parameter(n),
|
||||
&n,
|
||||
|b, &n| {
|
||||
let mut state = QuantumState::new(n).unwrap();
|
||||
let mut circuit = QuantumCircuit::new(n).unwrap();
|
||||
circuit.gate(Gate::H, &[0]);
|
||||
b.iter(|| state.execute(&circuit))
|
||||
},
|
||||
);
|
||||
}
|
||||
group.finish();
|
||||
}
|
||||
|
||||
criterion_group!(benches, bench_grover_8, bench_single_gate_scaling);
|
||||
criterion_main!(benches);
|
||||
```
|
||||
|
||||
**WASM benchmark harness**:
|
||||
|
||||
```javascript
|
||||
// Browser-based benchmark using performance.now()
|
||||
async function benchmarkGrover8() {
|
||||
const { QuantumCircuit, QuantumState } = await import('./ruqu_wasm.js');
|
||||
|
||||
const iterations = 100;
|
||||
const start = performance.now();
|
||||
|
||||
for (let i = 0; i < iterations; i++) {
|
||||
const circuit = QuantumCircuit.grover(8, 42);
|
||||
const state = new QuantumState(8);
|
||||
state.execute(circuit);
|
||||
state.free();
|
||||
circuit.free();
|
||||
}
|
||||
|
||||
const elapsed = performance.now() - start;
|
||||
console.log(`Grover 8-qubit: ${(elapsed / iterations).toFixed(3)} ms/iteration`);
|
||||
}
|
||||
```
|
||||
|
||||
### Performance Regression Detection
|
||||
|
||||
CI runs benchmark suite on every PR. Regressions exceeding 10% trigger a
|
||||
warning; regressions exceeding 25% block the merge.
|
||||
|
||||
```yaml
|
||||
# In CI pipeline
|
||||
- name: Run benchmarks
|
||||
run: |
|
||||
cargo bench --package ruqu-core -- --save-baseline pr
|
||||
cargo bench --package ruqu-core -- --baseline main --load-baseline pr
|
||||
# critcmp compares and flags regressions
|
||||
critcmp main pr --threshold 10
|
||||
```
|
||||
|
||||
### Optimization Priority Matrix
|
||||
|
||||
Not all optimizations apply equally to all workloads. The priority matrix
|
||||
guides implementation order:
|
||||
|
||||
| Optimization | Impact (small circuits) | Impact (large circuits) | Impl Effort | Priority |
|
||||
|-------------|------------------------|------------------------|-------------|----------|
|
||||
| SIMD | Medium (1.5-2x) | High (2-3.5x) | Medium | P0 |
|
||||
| Multithreading | Low (overhead > benefit) | High (5-7x) | Medium | P1 |
|
||||
| Gate fusion | High (30-50% fewer passes) | Medium (15-30%) | Low | P0 |
|
||||
| Entanglement splitting | Variable (0-100x) | Low (quickly entangled) | High | P2 |
|
||||
| Cache tiling | Low (fits in cache) | High (2-4x) | Medium | P1 |
|
||||
| Lazy evaluation | Medium (20-40%) | Low (10-20%) | Low | P2 |
|
||||
|
||||
**Implementation order**: SIMD -> Gate Fusion -> Multithreading -> Cache Tiling
|
||||
-> Lazy Evaluation -> Entanglement Splitting
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
- **Competitive performance**: Multi-layered approach targets performance
|
||||
parity with state-of-the-art Rust simulators (QuantRS2)
|
||||
- **Interactive latency**: Most practical workloads (8-20 qubits) complete
|
||||
in single-digit milliseconds, enabling real-time experimentation
|
||||
- **Scalable**: Each optimization layer addresses a different bottleneck,
|
||||
providing compounding benefits
|
||||
- **Measurable**: Concrete benchmark targets enable objective progress tracking
|
||||
and regression detection
|
||||
|
||||
### Negative
|
||||
|
||||
- **Optimization complexity**: Six optimization layers create significant
|
||||
implementation and maintenance complexity
|
||||
- **Ongoing tuning**: Performance characteristics vary across hardware;
|
||||
benchmarks must cover representative platforms
|
||||
- **Diminishing returns**: For >20 qubits, memory bandwidth dominates and
|
||||
compute optimizations yield marginal gains
|
||||
- **Testing burden**: Each optimization must be validated for numerical
|
||||
correctness across all gate types
|
||||
|
||||
### Risks and Mitigations
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|------|-----------|--------|------------|
|
||||
| Memory bandwidth bottleneck at >20 qubits | High | Medium | Document expected scaling; recommend native for large circuits |
|
||||
| Gate fusion introducing numerical error | Low | High | Comprehensive numerical tests comparing fused vs. unfused results |
|
||||
| Entanglement tracking overhead exceeding savings | Medium | Low | Automatic disable when all qubits connected within first 10 gates |
|
||||
| WASM SIMD not available in target runtime | Low | Medium | Graceful fallback to scalar; runtime feature detection |
|
||||
| Benchmark targets too aggressive for edge hardware | Medium | Low | Separate targets for edge (Cognitum) vs. desktop; scale expectations |
|
||||
|
||||
## References
|
||||
|
||||
- [ADR-QE-001: Quantum Engine Core Architecture](./ADR-QE-001-quantum-engine-core-architecture.md)
|
||||
- [ADR-QE-002: Crate Structure & Integration](./ADR-QE-002-crate-structure-integration.md)
|
||||
- [ADR-QE-003: WASM Compilation Strategy](./ADR-QE-003-wasm-compilation-strategy.md)
|
||||
- [ADR-003: SIMD Optimization Strategy](/docs/adr/ADR-003-simd-optimization-strategy.md)
|
||||
- [ruvector-math crate](/crates/ruvector-math/)
|
||||
- Guerreschi & Hogaboam, "Intel Quantum Simulator: A cloud-ready high-performance
|
||||
simulator of quantum circuits" (2020)
|
||||
- Jones et al., "QuEST and High Performance Simulation of Quantum Computers" (2019)
|
||||
- QuantRS2 benchmark data (internal comparison)
|
||||
650
docs/adr/quantum-engine/ADR-QE-005-vqe-algorithm-support.md
Normal file
650
docs/adr/quantum-engine/ADR-QE-005-vqe-algorithm-support.md
Normal file
|
|
@ -0,0 +1,650 @@
|
|||
# ADR-QE-005: Variational Quantum Eigensolver (VQE) Support
|
||||
|
||||
**Status**: Proposed
|
||||
**Date**: 2026-02-06
|
||||
**Authors**: ruv.io, RuVector Team
|
||||
**Deciders**: Architecture Review Board
|
||||
|
||||
## Version History
|
||||
|
||||
| Version | Date | Author | Changes |
|
||||
|---------|------|--------|---------|
|
||||
| 0.1 | 2026-02-06 | ruv.io | Initial VQE architecture proposal |
|
||||
|
||||
---
|
||||
|
||||
## Context
|
||||
|
||||
### The Variational Quantum Eigensolver Problem
|
||||
|
||||
The Variational Quantum Eigensolver (VQE) is one of the most important near-term quantum
|
||||
algorithms, with direct applications in computational chemistry, materials science, and
|
||||
combinatorial optimization. VQE computes ground-state energies of molecular Hamiltonians
|
||||
by variationally minimizing the expectation value of a Hamiltonian operator with respect
|
||||
to a parameterized quantum state (ansatz).
|
||||
|
||||
### Why VQE Matters for ruQu
|
||||
|
||||
VQE sits at the intersection of quantum simulation and classical optimization, making it
|
||||
a natural fit for ruQu's hybrid classical-quantum architecture:
|
||||
|
||||
1. **Chemistry applications**: Drug discovery, catalyst design, battery materials
|
||||
2. **Optimization**: QUBO problems, portfolio optimization, logistics
|
||||
3. **Benchmarking**: VQE circuits exercise the full gate set and serve as a representative
|
||||
workload for evaluating simulator performance
|
||||
4. **Agent integration**: ruVector agents can autonomously explore chemical configuration
|
||||
spaces using VQE as the inner evaluation kernel
|
||||
|
||||
### Core Requirements
|
||||
|
||||
| Requirement | Description | Priority |
|
||||
|-------------|-------------|----------|
|
||||
| Parameterized circuits | Symbolic gate angles resolved at evaluation time | P0 |
|
||||
| Hamiltonian decomposition | Represent H as sum of weighted Pauli strings | P0 |
|
||||
| Exact expectation values | Direct state vector computation (no shot noise) | P0 |
|
||||
| Gradient evaluation | Parameter-shift rule for classical optimizer | P0 |
|
||||
| Shot-based sampling | Optional mode for hardware noise emulation | P1 |
|
||||
| Classical optimizer interface | Trait-based abstraction for multiple optimizers | P1 |
|
||||
| Hardware-efficient ansatz | Pre-built ansatz library for common topologies | P2 |
|
||||
|
||||
### Current Limitations
|
||||
|
||||
Without dedicated VQE support, users must manually:
|
||||
- Construct parameterized circuits with explicit angle substitution per iteration
|
||||
- Decompose Hamiltonians into individual Pauli measurements
|
||||
- Implement gradient computation by duplicating circuit evaluations
|
||||
- Wire up classical optimizers with no standard interface
|
||||
|
||||
This is error-prone and leaves significant performance on the table, since a state vector
|
||||
simulator can compute exact expectation values in a single pass without sampling overhead.
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
### 1. Parameterized Gate Architecture
|
||||
|
||||
Circuits accept symbolic parameters that are resolved to numeric values per evaluation.
|
||||
This avoids circuit reconstruction on each VQE iteration.
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────┐
|
||||
│ Parameterized Circuit │
|
||||
│ │
|
||||
│ ┌─────┐ ┌──────────┐ ┌─────┐ ┌──────────┐ │
|
||||
|0> ─────────┤ │ H ├──┤ Ry(θ[0]) ├──┤ CX ├──┤ Rz(θ[2]) ├──┤───
|
||||
│ └─────┘ └──────────┘ └──┬──┘ └──────────┘ │
|
||||
│ │ │
|
||||
|0> ─────────┤──────────────────────────────●───── Ry(θ[1]) ────┤───
|
||||
│ │
|
||||
└──────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
parameters: [θ[0], θ[1], θ[2]]
|
||||
values: [0.54, 1.23, -0.87]
|
||||
```
|
||||
|
||||
**Data model**:
|
||||
|
||||
```rust
|
||||
/// A symbolic parameter in a quantum circuit.
|
||||
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
|
||||
pub struct Parameter {
|
||||
pub name: String,
|
||||
pub index: usize,
|
||||
}
|
||||
|
||||
/// A gate that may reference symbolic parameters.
|
||||
pub enum ParameterizedGate {
|
||||
/// Fixed gate (no parameters)
|
||||
Fixed(Gate),
|
||||
/// Rotation gate with a symbolic angle
|
||||
Rx(ParameterExpr),
|
||||
Ry(ParameterExpr),
|
||||
Rz(ParameterExpr),
|
||||
/// Parameterized two-qubit gate
|
||||
Rzz(ParameterExpr, Qubit, Qubit),
|
||||
}
|
||||
|
||||
/// Expression for a gate parameter (supports linear combinations).
|
||||
pub enum ParameterExpr {
|
||||
/// Direct parameter reference: θ[i]
|
||||
Param(usize),
|
||||
/// Scaled parameter: c * θ[i]
|
||||
Scaled(f64, usize),
|
||||
/// Sum of expressions
|
||||
Sum(Box<ParameterExpr>, Box<ParameterExpr>),
|
||||
/// Constant value
|
||||
Constant(f64),
|
||||
}
|
||||
```
|
||||
|
||||
**Resolution**: When `evaluate(params: &[f64])` is called, each `ParameterExpr` is resolved
|
||||
to a concrete `f64`, and the corresponding unitary matrix is computed. This happens once per
|
||||
VQE iteration and is negligible compared to state vector manipulation.
|
||||
|
||||
### 2. Hamiltonian Representation
|
||||
|
||||
The Hamiltonian is represented as a sum of weighted Pauli strings:
|
||||
|
||||
```
|
||||
H = c_0 * I + c_1 * Z_0 + c_2 * Z_1 + c_3 * Z_0 Z_1 + c_4 * X_0 X_1 + ...
|
||||
```
|
||||
|
||||
where each term is a tensor product of single-qubit Pauli operators {I, X, Y, Z}.
|
||||
|
||||
```rust
|
||||
/// A single Pauli operator on one qubit.
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
|
||||
pub enum Pauli {
|
||||
I,
|
||||
X,
|
||||
Y,
|
||||
Z,
|
||||
}
|
||||
|
||||
/// A Pauli string: tensor product of single-qubit Paulis.
|
||||
/// Stored as a compact bitfield for n-qubit systems.
|
||||
///
|
||||
/// Encoding: 2 bits per qubit (00=I, 01=X, 10=Y, 11=Z)
|
||||
/// For n <= 32 qubits, fits in a single u64.
|
||||
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
|
||||
pub struct PauliString {
|
||||
/// Packed Pauli operators (2 bits each)
|
||||
pub ops: Vec<u64>,
|
||||
/// Number of qubits
|
||||
pub n_qubits: usize,
|
||||
}
|
||||
|
||||
/// A Hamiltonian as a sum of weighted Pauli strings.
|
||||
///
|
||||
/// H = sum_j c_j P_j
|
||||
pub struct PauliSum {
|
||||
/// Terms: (coefficient, Pauli string)
|
||||
pub terms: Vec<(Complex64, PauliString)>,
|
||||
/// Number of qubits
|
||||
pub n_qubits: usize,
|
||||
}
|
||||
```
|
||||
|
||||
**Optimization**: Identity terms (all-I Pauli strings) contribute a constant energy offset
|
||||
and require no state vector computation. The implementation detects and separates these
|
||||
before the expectation loop.
|
||||
|
||||
### 3. Direct Expectation Value Computation
|
||||
|
||||
This is the critical performance advantage of state vector simulation over real hardware.
|
||||
On physical quantum computers, expectation values must be estimated via repeated
|
||||
measurement (shot-based sampling), requiring O(1/epsilon^2) shots for epsilon precision.
|
||||
|
||||
In a state vector simulator, we compute the **exact** expectation value:
|
||||
|
||||
```
|
||||
<psi| H |psi> = sum_j c_j * <psi| P_j |psi>
|
||||
```
|
||||
|
||||
For each Pauli string P_j, the expectation value is:
|
||||
|
||||
```
|
||||
<psi| P_j |psi> = sum_k psi_k* (P_j |psi>)_k
|
||||
```
|
||||
|
||||
Since P_j is a tensor product of single-qubit Paulis, its action on a basis state |k> is:
|
||||
- I: |k> -> |k>
|
||||
- X: flips qubit, no phase
|
||||
- Y: flips qubit, phase factor +/- i
|
||||
- Z: no flip, phase factor +/- 1
|
||||
|
||||
This means each Pauli string maps each basis state to exactly one other basis state with
|
||||
a phase factor. The expectation value reduces to a sum over 2^n amplitudes.
|
||||
|
||||
```rust
|
||||
impl QuantumState {
|
||||
/// Compute the exact expectation value of a PauliSum.
|
||||
///
|
||||
/// Complexity: O(T * 2^n) where T = number of Pauli terms, n = qubits.
|
||||
/// For a 12-qubit system with 100 Pauli terms:
|
||||
/// 100 * 4096 = 409,600 operations ~ 0.5ms
|
||||
pub fn expectation(&self, hamiltonian: &PauliSum) -> f64 {
|
||||
let mut total = 0.0_f64;
|
||||
|
||||
for (coeff, pauli) in &hamiltonian.terms {
|
||||
let mut term_val = Complex64::zero();
|
||||
|
||||
for k in 0..self.amplitudes.len() {
|
||||
// Compute P_j |k>: determine target index and phase
|
||||
let (target_idx, phase) = pauli.apply_to_basis(k);
|
||||
// <k| P_j |psi> = phase * psi[target_idx]
|
||||
// Accumulate psi[k]* * phase * psi[target_idx]
|
||||
term_val += self.amplitudes[k].conj()
|
||||
* phase
|
||||
* self.amplitudes[target_idx];
|
||||
}
|
||||
|
||||
total += (coeff * term_val).re;
|
||||
}
|
||||
|
||||
total
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Function signature**: `QuantumState::expectation(PauliSum) -> f64`
|
||||
|
||||
#### Accuracy Advantage Over Sampling
|
||||
|
||||
| Method | Precision | Evaluations | 12-qubit Cost |
|
||||
|--------|-----------|-------------|---------------|
|
||||
| Shot-based (1000 shots) | ~3% | 1000 circuit runs per term | ~500ms |
|
||||
| Shot-based (10000 shots) | ~1% | 10000 circuit runs per term | ~5s |
|
||||
| Shot-based (1M shots) | ~0.1% | 1M circuit runs per term | ~500s |
|
||||
| **Exact (state vector)** | **Machine epsilon** | **1 pass over state** | **~0.5ms** |
|
||||
|
||||
For VQE convergence, exact expectation values eliminate the statistical noise floor that
|
||||
plagues hardware-based VQE. Classical optimizers receive clean gradients, leading to:
|
||||
- Faster convergence (fewer iterations)
|
||||
- No barren plateau artifacts from shot noise
|
||||
- Deterministic reproducibility
|
||||
|
||||
### 4. Gradient Support via Parameter-Shift Rule
|
||||
|
||||
The parameter-shift rule provides exact analytic gradients for parameterized quantum gates.
|
||||
For a gate with parameter theta:
|
||||
|
||||
```
|
||||
d/d(theta) <H> = [<H>(theta + pi/2) - <H>(theta - pi/2)] / 2
|
||||
```
|
||||
|
||||
This requires two circuit evaluations per parameter per gradient component.
|
||||
|
||||
```rust
|
||||
/// Compute the gradient of the expectation value with respect to all parameters.
|
||||
///
|
||||
/// Uses the parameter-shift rule:
|
||||
/// grad_i = [E(theta_i + pi/2) - E(theta_i - pi/2)] / 2
|
||||
///
|
||||
/// Complexity: O(2 * n_params * circuit_eval_cost)
|
||||
/// For 12 qubits, 20 parameters, 100 Pauli terms:
|
||||
/// 2 * 20 * (circuit_sim + expectation) ~ 40 * 1ms = 40ms
|
||||
pub fn gradient(
|
||||
circuit: &ParameterizedCircuit,
|
||||
hamiltonian: &PauliSum,
|
||||
params: &[f64],
|
||||
) -> Vec<f64> {
|
||||
let n_params = params.len();
|
||||
let mut grad = vec![0.0; n_params];
|
||||
let shift = std::f64::consts::FRAC_PI_2; // pi/2
|
||||
|
||||
for i in 0..n_params {
|
||||
// Forward shift
|
||||
let mut params_plus = params.to_vec();
|
||||
params_plus[i] += shift;
|
||||
let e_plus = evaluate_energy(circuit, hamiltonian, ¶ms_plus);
|
||||
|
||||
// Backward shift
|
||||
let mut params_minus = params.to_vec();
|
||||
params_minus[i] -= shift;
|
||||
let e_minus = evaluate_energy(circuit, hamiltonian, ¶ms_minus);
|
||||
|
||||
grad[i] = (e_plus - e_minus) / 2.0;
|
||||
}
|
||||
|
||||
grad
|
||||
}
|
||||
```
|
||||
|
||||
### 5. Classical Optimizer Interface
|
||||
|
||||
A trait-based abstraction supports plugging in different classical optimizers without
|
||||
changing the VQE loop:
|
||||
|
||||
```rust
|
||||
/// Trait for classical optimizers used in the VQE outer loop.
|
||||
pub trait ClassicalOptimizer: Send {
|
||||
/// Initialize the optimizer with the parameter count.
|
||||
fn initialize(&mut self, n_params: usize);
|
||||
|
||||
/// Propose next parameter values given current energy and optional gradient.
|
||||
fn step(
|
||||
&mut self,
|
||||
params: &[f64],
|
||||
energy: f64,
|
||||
gradient: Option<&[f64]>,
|
||||
) -> OptimizerResult;
|
||||
|
||||
/// Check if the optimizer has converged.
|
||||
fn has_converged(&self) -> bool;
|
||||
|
||||
/// Get optimizer name for logging.
|
||||
fn name(&self) -> &str;
|
||||
}
|
||||
|
||||
/// Result of an optimizer step.
|
||||
pub struct OptimizerResult {
|
||||
pub new_params: Vec<f64>,
|
||||
pub converged: bool,
|
||||
pub iteration: usize,
|
||||
}
|
||||
```
|
||||
|
||||
**Provided implementations**:
|
||||
|
||||
| Optimizer | Type | Gradient Required | Best For |
|
||||
|-----------|------|-------------------|----------|
|
||||
| `GradientDescent` | Gradient-based | Yes | Simple landscapes |
|
||||
| `Adam` | Adaptive gradient | Yes | Noisy gradients, deep circuits |
|
||||
| `LBFGS` | Quasi-Newton | Yes | Smooth landscapes, fast convergence |
|
||||
| `COBYLA` | Derivative-free | No | Non-differentiable cost functions |
|
||||
| `NelderMead` | Simplex | No | Low-dimensional problems |
|
||||
| `SPSA` | Stochastic | No | Shot-based mode, noisy evaluations |
|
||||
|
||||
### 6. VQE Iteration Loop
|
||||
|
||||
The complete VQE algorithm proceeds as follows:
|
||||
|
||||
```
|
||||
VQE Iteration Loop
|
||||
==================
|
||||
|
||||
Input: Hamiltonian H (PauliSum), Ansatz A (ParameterizedCircuit),
|
||||
Optimizer O (ClassicalOptimizer), initial params theta_0
|
||||
|
||||
Output: Minimum energy E_min, optimal params theta_opt
|
||||
|
||||
theta = theta_0
|
||||
O.initialize(len(theta))
|
||||
|
||||
repeat:
|
||||
┌─────────────────────────────────────────────┐
|
||||
│ 1. PREPARE STATE │
|
||||
│ |psi(theta)> = A(theta) |0...0> │
|
||||
│ [Simulate parameterized circuit] │
|
||||
│ Cost: O(G * 2^n) where G = gate count │
|
||||
└─────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────┐
|
||||
│ 2. EVALUATE ENERGY │
|
||||
│ E = <psi(theta)| H |psi(theta)> │
|
||||
│ [Direct state vector expectation] │
|
||||
│ Cost: O(T * 2^n) where T = Pauli terms │
|
||||
└─────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────┐
|
||||
│ 3. COMPUTE GRADIENT (if optimizer needs it) │
|
||||
│ grad = parameter_shift(A, H, theta) │
|
||||
│ [2 * n_params circuit evaluations] │
|
||||
│ Cost: O(2P * (G + T) * 2^n) │
|
||||
└─────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────┐
|
||||
│ 4. CLASSICAL UPDATE │
|
||||
│ theta_new = O.step(theta, E, grad) │
|
||||
│ [Pure classical computation] │
|
||||
│ Cost: O(P^2) for quasi-Newton │
|
||||
└─────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────┐
|
||||
│ 5. CONVERGENCE CHECK │
|
||||
│ if |E_new - E_old| < tol: STOP │
|
||||
│ else: theta = theta_new, continue │
|
||||
└─────────────────────────────────────────────┘
|
||||
|
||||
return (E_min, theta_opt)
|
||||
```
|
||||
|
||||
**Pseudocode**:
|
||||
|
||||
```rust
|
||||
pub fn vqe(
|
||||
ansatz: &ParameterizedCircuit,
|
||||
hamiltonian: &PauliSum,
|
||||
optimizer: &mut dyn ClassicalOptimizer,
|
||||
config: &VqeConfig,
|
||||
) -> VqeResult {
|
||||
let n_params = ansatz.parameter_count();
|
||||
let mut params = config.initial_params.clone()
|
||||
.unwrap_or_else(|| vec![0.0; n_params]);
|
||||
|
||||
optimizer.initialize(n_params);
|
||||
|
||||
let mut best_energy = f64::INFINITY;
|
||||
let mut best_params = params.clone();
|
||||
let mut history = Vec::new();
|
||||
|
||||
for iteration in 0..config.max_iterations {
|
||||
// Step 1+2: Simulate circuit and compute energy
|
||||
let state = ansatz.simulate(¶ms);
|
||||
let energy = state.expectation(hamiltonian);
|
||||
|
||||
// Track best
|
||||
if energy < best_energy {
|
||||
best_energy = energy;
|
||||
best_params = params.clone();
|
||||
}
|
||||
|
||||
// Step 3: Compute gradient if needed
|
||||
let grad = if optimizer.needs_gradient() {
|
||||
Some(gradient(ansatz, hamiltonian, ¶ms))
|
||||
} else {
|
||||
None
|
||||
};
|
||||
|
||||
history.push(VqeIteration { iteration, energy, params: params.clone() });
|
||||
|
||||
// Step 4: Classical update
|
||||
let result = optimizer.step(¶ms, energy, grad.as_deref());
|
||||
params = result.new_params;
|
||||
|
||||
// Step 5: Convergence check
|
||||
if result.converged || (iteration > 0 &&
|
||||
(history[iteration].energy - history[iteration - 1].energy).abs()
|
||||
< config.convergence_threshold) {
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
VqeResult {
|
||||
energy: best_energy,
|
||||
optimal_params: best_params,
|
||||
iterations: history.len(),
|
||||
history,
|
||||
converged: optimizer.has_converged(),
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 7. Optional Shot-Based Sampling Mode
|
||||
|
||||
For mimicking real hardware behavior and testing noise resilience:
|
||||
|
||||
```rust
|
||||
/// Configuration for shot-based VQE mode.
|
||||
pub struct ShotConfig {
|
||||
/// Number of measurement shots per expectation estimation
|
||||
pub shots: usize,
|
||||
/// Random seed for reproducibility
|
||||
pub seed: Option<u64>,
|
||||
/// Readout error rate (probability of bit flip on measurement)
|
||||
pub readout_error: f64,
|
||||
}
|
||||
|
||||
impl QuantumState {
|
||||
/// Estimate expectation value via shot-based sampling.
|
||||
///
|
||||
/// Samples the state `shots` times in the computational basis,
|
||||
/// then computes the empirical expectation of each Pauli term.
|
||||
pub fn expectation_sampled(
|
||||
&self,
|
||||
hamiltonian: &PauliSum,
|
||||
config: &ShotConfig,
|
||||
) -> (f64, f64) {
|
||||
// Returns (mean, standard_error)
|
||||
// Standard error = std_dev / sqrt(shots)
|
||||
todo!()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 8. Hardware-Efficient Ansatz Patterns
|
||||
|
||||
Pre-built ansatz constructors for common use cases:
|
||||
|
||||
```
|
||||
Hardware-Efficient Ansatz (depth d, n qubits):
|
||||
|
||||
Layer 1..d:
|
||||
┌─────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
|
||||
┤ Ry ├──┤ Rz ├──┤ CNOT ├──┤ Ry ├──
|
||||
└─────┘ └──────────┘ │ ladder │ └──────────┘
|
||||
┌─────┐ ┌──────────┐ │ │ ┌──────────┐
|
||||
┤ Ry ├──┤ Rz ├──┤ ├──┤ Ry ├──
|
||||
└─────┘ └──────────┘ └──────────┘ └──────────┘
|
||||
|
||||
Parameters per layer: 3n (Ry + Rz + Ry per qubit)
|
||||
Total parameters: 3nd
|
||||
```
|
||||
|
||||
```rust
|
||||
/// Pre-built ansatz constructors.
|
||||
pub mod ansatz {
|
||||
/// Hardware-efficient ansatz with Ry-Rz layers and linear CNOT entanglement.
|
||||
pub fn hardware_efficient(n_qubits: usize, depth: usize) -> ParameterizedCircuit;
|
||||
|
||||
/// UCCSD (Unitary Coupled Cluster Singles and Doubles) for chemistry.
|
||||
/// Generates excitation operators based on active space.
|
||||
pub fn uccsd(n_electrons: usize, n_orbitals: usize) -> ParameterizedCircuit;
|
||||
|
||||
/// Hamiltonian variational ansatz: layers of exp(-i * theta_j * P_j)
|
||||
/// for each term P_j in the Hamiltonian.
|
||||
pub fn hamiltonian_variational(
|
||||
hamiltonian: &PauliSum,
|
||||
depth: usize,
|
||||
) -> ParameterizedCircuit;
|
||||
|
||||
/// Symmetry-preserving ansatz that respects particle number conservation.
|
||||
pub fn symmetry_preserving(
|
||||
n_qubits: usize,
|
||||
n_particles: usize,
|
||||
depth: usize,
|
||||
) -> ParameterizedCircuit;
|
||||
}
|
||||
```
|
||||
|
||||
### 9. Performance Analysis
|
||||
|
||||
#### 12-Qubit VQE Performance Estimate
|
||||
|
||||
| Component | Operations | Time |
|
||||
|-----------|-----------|------|
|
||||
| State vector size | 2^12 = 4,096 complex amplitudes | 64 KB |
|
||||
| Circuit simulation (50 gates) | 50 * 4096 = 204,800 ops | ~0.3ms |
|
||||
| Expectation (100 Pauli terms) | 100 * 4096 = 409,600 ops | ~0.5ms |
|
||||
| Gradient (20 params) | 40 * (0.3 + 0.5) ms | ~32ms |
|
||||
| Classical optimizer step | O(20^2) | ~0.001ms |
|
||||
| **Total per iteration (with gradient)** | | **~33ms** |
|
||||
| **Total per iteration (no gradient)** | | **~0.8ms** |
|
||||
|
||||
For gradient-free optimizers (COBYLA, Nelder-Mead), a 12-qubit VQE iteration completes
|
||||
in under 1ms. With parameter-shift gradients, the cost scales linearly with parameter
|
||||
count but remains under 50ms for typical chemistry ansatze.
|
||||
|
||||
**Scaling with qubit count**:
|
||||
|
||||
| Qubits | State Size | Memory | Energy Eval (100 terms) | Gradient (20 params) |
|
||||
|--------|-----------|--------|------------------------|---------------------|
|
||||
| 8 | 256 | 4 KB | ~0.03ms | ~2ms |
|
||||
| 12 | 4,096 | 64 KB | ~0.5ms | ~33ms |
|
||||
| 16 | 65,536 | 1 MB | ~8ms | ~500ms |
|
||||
| 20 | 1,048,576 | 16 MB | ~130ms | ~8s |
|
||||
| 24 | 16,777,216 | 256 MB | ~2s | ~130s |
|
||||
| 28 | 268,435,456 | 4 GB | ~33s | ~35min |
|
||||
|
||||
### 10. Integration with ruVector Agent System
|
||||
|
||||
ruVector agents can drive autonomous chemistry optimization using VQE as the evaluation
|
||||
kernel:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ ruVector Agent Orchestration │
|
||||
│ │
|
||||
│ ┌──────────┐ ┌──────────────┐ ┌────────────────────┐ │
|
||||
│ │ Research │───>│ Architecture │───>│ Chemistry Agent │ │
|
||||
│ │ Agent │ │ Agent │ │ │ │
|
||||
│ │ │ │ │ │ - Molecule spec │ │
|
||||
│ │ Literature│ │ Hamiltonian │ │ - Basis set sel. │ │
|
||||
│ │ search │ │ generation │ │ - Active space │ │
|
||||
│ └──────────┘ └──────────────┘ │ - VQE execution │ │
|
||||
│ │ - Result analysis │ │
|
||||
│ └────────┬───────────┘ │
|
||||
│ │ │
|
||||
│ ┌────────▼───────────┐ │
|
||||
│ │ ruQu VQE Engine │ │
|
||||
│ │ │ │
|
||||
│ │ Parameterized │ │
|
||||
│ │ Circuit + PauliSum│ │
|
||||
│ │ + Optimizer │ │
|
||||
│ └────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
The agent workflow:
|
||||
1. **Research agent** retrieves molecular structure and prior computational results
|
||||
2. **Architecture agent** generates the qubit Hamiltonian (Jordan-Wigner or Bravyi-Kitaev
|
||||
transformation from fermionic operators)
|
||||
3. **Chemistry agent** selects ansatz, optimizer, and runs VQE iterations
|
||||
4. **Results** are stored in ruVector memory for pattern learning across molecules
|
||||
|
||||
---
|
||||
|
||||
## Consequences
|
||||
|
||||
### Benefits
|
||||
|
||||
1. **Exact expectation values** eliminate sampling noise, enabling faster convergence and
|
||||
deterministic reproducibility -- a major advantage over hardware VQE
|
||||
2. **Symbolic parameterization** avoids circuit reconstruction overhead, reducing per-iteration
|
||||
cost to pure state manipulation
|
||||
3. **Trait-based optimizer interface** allows users to swap optimizers without touching VQE
|
||||
logic, and supports custom optimizer implementations
|
||||
4. **Hardware-efficient ansatz library** provides tested, production-quality circuit templates
|
||||
for common use cases
|
||||
5. **Gradient support** via parameter-shift rule enables modern gradient-based optimization
|
||||
(Adam, L-BFGS) that converges significantly faster than derivative-free methods
|
||||
6. **Agent integration** enables autonomous, memory-enhanced chemistry exploration that
|
||||
learns from prior VQE runs across molecular configurations
|
||||
|
||||
### Risks
|
||||
|
||||
| Risk | Probability | Impact | Mitigation |
|
||||
|------|------------|--------|------------|
|
||||
| Exponential memory scaling limits qubit count | High | Medium | Tensor network backend for >30 qubits (future ADR) |
|
||||
| Parameter-shift gradient cost scales with parameter count | Medium | Medium | Batched gradient evaluation, simultaneous perturbation (SPSA) fallback |
|
||||
| Hamiltonian term count explosion for large molecules | Medium | High | Pauli grouping (qubit-wise commuting), measurement reduction techniques |
|
||||
| Optimizer convergence to local minima | Medium | Medium | Multi-start strategies, QAOA-inspired initialization |
|
||||
|
||||
### Trade-offs
|
||||
|
||||
| Decision | Advantage | Disadvantage |
|
||||
|----------|-----------|--------------|
|
||||
| Exact expectation over sampling | Machine-precision accuracy | Not representative of real hardware noise |
|
||||
| Parameter-shift over finite-difference | Exact gradients | 2x evaluations per parameter |
|
||||
| Trait-based optimizer | Extensible | Slight abstraction overhead |
|
||||
| Compact PauliString bitfield | Cache-friendly | Complex bit manipulation logic |
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- Peruzzo, A. et al. "A variational eigenvalue solver on a photonic quantum processor." Nature Communications 5, 4213 (2014)
|
||||
- McClean, J.R. et al. "The theory of variational hybrid quantum-classical algorithms." New Journal of Physics 18, 023023 (2016)
|
||||
- Kandala, A. et al. "Hardware-efficient variational quantum eigensolver for small molecules." Nature 549, 242-246 (2017)
|
||||
- Schuld, M. et al. "Evaluating analytic gradients on quantum hardware." Physical Review A 99, 032331 (2019)
|
||||
- ADR-001: ruQu Architecture - Classical Nervous System for Quantum Machines
|
||||
- ADR-QE-001 through ADR-QE-004: Prior quantum engine architecture decisions
|
||||
- ruQu crate: `crates/ruQu/src/` - existing syndrome processing and coherence gate infrastructure
|
||||
- ruVector memory system: pattern storage for cross-molecule VQE learning
|
||||
|
|
@ -0,0 +1,562 @@
|
|||
# ADR-QE-006: Grover's Search Algorithm Implementation
|
||||
|
||||
**Status**: Proposed
|
||||
**Date**: 2026-02-06
|
||||
**Authors**: ruv.io, RuVector Team
|
||||
**Deciders**: Architecture Review Board
|
||||
|
||||
## Version History
|
||||
|
||||
| Version | Date | Author | Changes |
|
||||
|---------|------|--------|---------|
|
||||
| 0.1 | 2026-02-06 | ruv.io | Initial Grover's search architecture proposal |
|
||||
|
||||
---
|
||||
|
||||
## Context
|
||||
|
||||
### Unstructured Search and Quadratic Speedup
|
||||
|
||||
Grover's algorithm is one of the foundational quantum algorithms, providing a provable
|
||||
quadratic speedup for unstructured search. Given a search space of N = 2^n items and an
|
||||
oracle that marks one or more target items, Grover's algorithm finds a target in
|
||||
O(sqrt(N)) oracle queries, compared to the classical O(N) lower bound.
|
||||
|
||||
### Building Blocks
|
||||
|
||||
The algorithm consists of two principal components applied repeatedly:
|
||||
|
||||
1. **Oracle (O)**: Flips the phase of marked (target) states
|
||||
- On hardware: requires multi-controlled-Z decomposition into elementary gates
|
||||
- In simulation: can be a single O(1) amplitude flip (key insight)
|
||||
|
||||
2. **Diffuser (D)**: Inversion about the mean amplitude (also called the Grover diffusion
|
||||
operator)
|
||||
- D = 2|s><s| - I, where |s> is the uniform superposition
|
||||
- Implemented as: H^{otimes n} * (2|0><0| - I) * H^{otimes n}
|
||||
|
||||
### Why Simulation Unlocks a Unique Optimization
|
||||
|
||||
On real quantum hardware, the oracle must be decomposed into a circuit of elementary
|
||||
gates. For a single marked state in n qubits, the oracle requires O(n) multi-controlled
|
||||
gates, each of which may need further decomposition. The full gate count is O(n^2) or
|
||||
worse depending on connectivity.
|
||||
|
||||
In a state vector simulator, we have **direct access to the amplitude array**. The oracle
|
||||
for a known marked state at index t is simply:
|
||||
|
||||
```
|
||||
amplitudes[t] *= -1
|
||||
```
|
||||
|
||||
This is an O(1) operation, regardless of qubit count. This fundamentally changes the
|
||||
performance profile of Grover simulation.
|
||||
|
||||
### Applications in ruVector
|
||||
|
||||
| Application | Description |
|
||||
|-------------|-------------|
|
||||
| Vector DB search | Encode HNSW candidate filtering as a Grover oracle |
|
||||
| SAT solving | Map boolean satisfiability to oracle function |
|
||||
| Cryptographic analysis | Brute-force key search with quadratic speedup |
|
||||
| Database queries | Unstructured search over ruVector memory entries |
|
||||
| Algorithm benchmarking | Reference implementation for quantum advantage studies |
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
### 1. Oracle Implementation Strategy
|
||||
|
||||
We provide two oracle modes: optimized index-based for known targets, and general
|
||||
unitary oracle for black-box functions.
|
||||
|
||||
#### Mode A: Index-Based Oracle (O(1) per application)
|
||||
|
||||
When the target index is known (or the oracle can be expressed as a predicate on
|
||||
basis state indices), we bypass gate decomposition entirely:
|
||||
|
||||
```rust
|
||||
impl QuantumState {
|
||||
/// Apply Grover oracle by direct amplitude negation.
|
||||
///
|
||||
/// Flips the sign of amplitude at the given index.
|
||||
/// This is an O(1) operation -- the key simulation advantage.
|
||||
///
|
||||
/// On hardware, this would require O(n) multi-controlled gates
|
||||
/// decomposed into O(n^2) elementary gates.
|
||||
#[inline]
|
||||
pub fn oracle_flip(&mut self, target_index: usize) {
|
||||
debug_assert!(target_index < self.amplitudes.len());
|
||||
self.amplitudes[target_index] = -self.amplitudes[target_index];
|
||||
}
|
||||
|
||||
/// Apply Grover oracle for multiple marked states.
|
||||
///
|
||||
/// Complexity: O(k) where k = number of marked states.
|
||||
/// Hardware equivalent: O(k * n^2) gates.
|
||||
pub fn oracle_flip_multi(&mut self, target_indices: &[usize]) {
|
||||
for &idx in target_indices {
|
||||
debug_assert!(idx < self.amplitudes.len());
|
||||
self.amplitudes[idx] = -self.amplitudes[idx];
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Why this is valid**: The oracle operator O is defined as the diagonal unitary
|
||||
O = I - 2|t><t|, which maps |t> to -|t> and leaves all other basis states unchanged.
|
||||
In the amplitude array, this is exactly `amplitudes[t] *= -1`. No physical gate
|
||||
decomposition is needed because we are simulating the mathematical operator directly.
|
||||
|
||||
#### Mode B: General Unitary Oracle
|
||||
|
||||
For black-box oracle functions where the marked states are not known in advance:
|
||||
|
||||
```rust
|
||||
/// A general oracle as a unitary operation on the state vector.
|
||||
///
|
||||
/// The oracle function receives a basis state index and returns
|
||||
/// true if it should be marked (phase-flipped).
|
||||
pub trait GroverOracle: Send {
|
||||
/// Evaluate whether basis state |index> is a target.
|
||||
fn is_marked(&self, index: usize, n_qubits: usize) -> bool;
|
||||
}
|
||||
|
||||
impl QuantumState {
|
||||
/// Apply a general Grover oracle.
|
||||
///
|
||||
/// Iterates over all 2^n amplitudes, evaluating the oracle predicate.
|
||||
/// Complexity: O(2^n) per application (equivalent to hardware cost).
|
||||
pub fn oracle_apply(&mut self, oracle: &dyn GroverOracle) {
|
||||
let n_qubits = self.n_qubits;
|
||||
for i in 0..self.amplitudes.len() {
|
||||
if oracle.is_marked(i, n_qubits) {
|
||||
self.amplitudes[i] = -self.amplitudes[i];
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 2. Diffuser Implementation
|
||||
|
||||
The Grover diffuser (inversion about the mean) is decomposed as:
|
||||
|
||||
```
|
||||
D = H^{otimes n} * phase_flip(|0>) * H^{otimes n}
|
||||
```
|
||||
|
||||
where `phase_flip(|0>)` flips the sign of the all-zeros state: (2|0><0| - I).
|
||||
|
||||
```
|
||||
Diffuser Circuit Decomposition:
|
||||
|
||||
|psi> ──[H]──[phase_flip(0)]──[H]──
|
||||
|
||||
Expanded:
|
||||
|
||||
┌───┐ ┌──────────────┐ ┌───┐
|
||||
q[0] ──┤ H ├───┤ ├───┤ H ├──
|
||||
└───┘ │ │ └───┘
|
||||
┌───┐ │ 2|0><0| - I │ ┌───┐
|
||||
q[1] ──┤ H ├───┤ ├───┤ H ├──
|
||||
└───┘ │ │ └───┘
|
||||
┌───┐ │ │ ┌───┐
|
||||
q[2] ──┤ H ├───┤ ├───┤ H ├──
|
||||
└───┘ └──────────────┘ └───┘
|
||||
```
|
||||
|
||||
Both the H^{otimes n} layers and the phase_flip(0) benefit from simulation optimizations:
|
||||
|
||||
```rust
|
||||
impl QuantumState {
|
||||
/// Apply Hadamard to all qubits.
|
||||
///
|
||||
/// Optimized implementation using butterfly structure.
|
||||
/// Complexity: O(n * 2^n)
|
||||
pub fn hadamard_all(&mut self) {
|
||||
for qubit in 0..self.n_qubits {
|
||||
self.apply_hadamard(qubit);
|
||||
}
|
||||
}
|
||||
|
||||
/// Flip the phase of the |0...0> state.
|
||||
///
|
||||
/// O(1) operation via direct indexing -- another simulation advantage.
|
||||
/// On hardware, this requires an n-controlled-Z gate.
|
||||
#[inline]
|
||||
pub fn phase_flip_zero(&mut self) {
|
||||
// |0...0> is at index 0
|
||||
self.amplitudes[0] = -self.amplitudes[0];
|
||||
}
|
||||
|
||||
/// Apply the full Grover diffuser.
|
||||
///
|
||||
/// D = H^n * (2|0><0| - I) * H^n
|
||||
///
|
||||
/// Implementation note: (2|0><0| - I) negates all states except |0>,
|
||||
/// which is equivalent to a global phase of -1 followed by
|
||||
/// flipping amplitude[0]. We use the phase_flip_zero + global negate
|
||||
/// approach for efficiency.
|
||||
pub fn grover_diffuser(&mut self) {
|
||||
self.hadamard_all();
|
||||
|
||||
// Apply 2|0><0| - I:
|
||||
// Negate all amplitudes, then flip sign of |0> again
|
||||
// This gives: amp[0] -> amp[0], amp[k] -> -amp[k] for k != 0
|
||||
for amp in self.amplitudes.iter_mut() {
|
||||
*amp = -*amp;
|
||||
}
|
||||
self.amplitudes[0] = -self.amplitudes[0];
|
||||
|
||||
self.hadamard_all();
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 3. Optimal Iteration Count
|
||||
|
||||
The optimal number of Grover iterations for k marked states out of N = 2^n total:
|
||||
|
||||
```
|
||||
iterations = floor(pi/4 * sqrt(N/k))
|
||||
```
|
||||
|
||||
For a single marked state (k=1):
|
||||
|
||||
| Qubits (n) | N = 2^n | Optimal Iterations | Classical Steps |
|
||||
|------------|---------|-------------------|----------------|
|
||||
| 4 | 16 | 3 | 16 |
|
||||
| 8 | 256 | 12 | 256 |
|
||||
| 12 | 4,096 | 50 | 4,096 |
|
||||
| 16 | 65,536 | 201 | 65,536 |
|
||||
| 20 | 1,048,576 | 804 | 1,048,576 |
|
||||
|
||||
```rust
|
||||
/// Compute the optimal number of Grover iterations.
|
||||
///
|
||||
/// For k marked states in a search space of 2^n:
|
||||
/// iterations = floor(pi/4 * sqrt(2^n / k))
|
||||
pub fn optimal_iterations(n_qubits: usize, n_marked: usize) -> usize {
|
||||
let n = (1_usize << n_qubits) as f64;
|
||||
let k = n_marked as f64;
|
||||
(std::f64::consts::FRAC_PI_4 * (n / k).sqrt()).floor() as usize
|
||||
}
|
||||
```
|
||||
|
||||
### 4. Complete Grover Algorithm
|
||||
|
||||
```rust
|
||||
/// Configuration for Grover's search.
|
||||
pub struct GroverConfig {
|
||||
/// Number of qubits
|
||||
pub n_qubits: usize,
|
||||
/// Target indices (for index-based oracle)
|
||||
pub targets: Vec<usize>,
|
||||
/// Custom oracle (overrides targets if set)
|
||||
pub oracle: Option<Box<dyn GroverOracle>>,
|
||||
/// Override iteration count (auto-computed if None)
|
||||
pub iterations: Option<usize>,
|
||||
/// Number of measurement shots (for probabilistic result)
|
||||
pub shots: usize,
|
||||
}
|
||||
|
||||
/// Result of Grover's search.
|
||||
pub struct GroverResult {
|
||||
/// Most likely measurement outcome (basis state index)
|
||||
pub found_index: usize,
|
||||
/// Probability of measuring the found state
|
||||
pub success_probability: f64,
|
||||
/// Number of Grover iterations performed
|
||||
pub iterations: usize,
|
||||
/// Total wall-clock time
|
||||
pub elapsed: Duration,
|
||||
/// Full probability distribution (optional, for analysis)
|
||||
pub probabilities: Option<Vec<f64>>,
|
||||
}
|
||||
```
|
||||
|
||||
**Pseudocode for the complete algorithm**:
|
||||
|
||||
```rust
|
||||
pub fn grover_search(config: &GroverConfig) -> GroverResult {
|
||||
let n = config.n_qubits;
|
||||
let num_states = 1 << n;
|
||||
|
||||
// Step 1: Initialize uniform superposition
|
||||
// |s> = H^n |0...0> = (1/sqrt(N)) * sum_k |k>
|
||||
let mut state = QuantumState::new(n);
|
||||
state.hadamard_all(); // O(n * 2^n)
|
||||
|
||||
// Step 2: Determine iteration count
|
||||
let k = config.targets.len();
|
||||
let iterations = config.iterations
|
||||
.unwrap_or_else(|| optimal_iterations(n, k));
|
||||
|
||||
// Step 3: Apply Grover iterations
|
||||
for _iter in 0..iterations {
|
||||
// Oracle: flip phase of marked states
|
||||
match &config.oracle {
|
||||
Some(oracle) => state.oracle_apply(oracle.as_ref()),
|
||||
None => state.oracle_flip_multi(&config.targets),
|
||||
}
|
||||
|
||||
// Diffuser: inversion about the mean
|
||||
state.grover_diffuser();
|
||||
}
|
||||
|
||||
// Step 4: Measure (find highest-probability state)
|
||||
let probabilities: Vec<f64> = state.amplitudes.iter()
|
||||
.map(|a| a.norm_sqr())
|
||||
.collect();
|
||||
|
||||
let found_index = probabilities.iter()
|
||||
.enumerate()
|
||||
.max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
|
||||
.map(|(i, _)| i)
|
||||
.unwrap();
|
||||
|
||||
GroverResult {
|
||||
found_index,
|
||||
success_probability: probabilities[found_index],
|
||||
iterations,
|
||||
elapsed: start.elapsed(),
|
||||
probabilities: Some(probabilities),
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 5. The O(1) Oracle Trick: Simulation-Unique Advantage
|
||||
|
||||
This section formalizes the performance advantage unique to state vector simulation.
|
||||
|
||||
**Hardware cost model** (per Grover iteration):
|
||||
|
||||
```
|
||||
Oracle (hardware):
|
||||
- Multi-controlled-Z gate: O(n) Toffoli gates
|
||||
- Each Toffoli: ~6 CNOT + single-qubit gates
|
||||
- Total: O(n) gates, each touching O(2^n) amplitudes in simulation
|
||||
- Simulation cost: O(n * 2^n) per oracle application
|
||||
|
||||
Diffuser (hardware):
|
||||
- H^n: n Hadamard gates = O(n * 2^n) simulation ops
|
||||
- Multi-controlled-Z: same as oracle = O(n * 2^n) simulation ops
|
||||
- H^n: O(n * 2^n) again
|
||||
- Total: O(n * 2^n) per diffuser
|
||||
|
||||
Per iteration (hardware path): O(n * 2^n)
|
||||
Total (hardware path): O(n * 2^n * sqrt(2^n)) = O(n * 2^(3n/2))
|
||||
```
|
||||
|
||||
**Simulation cost model** (with O(1) oracle optimization):
|
||||
|
||||
```
|
||||
Oracle (optimized):
|
||||
- Direct amplitude flip: O(1) for single target, O(k) for k targets
|
||||
- Simulation cost: O(k)
|
||||
|
||||
Diffuser (optimized):
|
||||
- H^n: O(n * 2^n) -- unavoidable
|
||||
- phase_flip(0): O(1) via direct index
|
||||
- H^n: O(n * 2^n)
|
||||
- Total: O(n * 2^n) per diffuser
|
||||
|
||||
Per iteration (optimized): O(n * 2^n) [dominated by diffuser]
|
||||
Total (optimized): O(n * 2^n * sqrt(2^n)) = O(n * 2^(3n/2))
|
||||
```
|
||||
|
||||
The asymptotic complexity is the same (diffuser dominates), but the constant factor
|
||||
improvement is significant: the oracle step drops from O(n * 2^n) to O(k), saving
|
||||
roughly 50% of per-iteration time for single-target search.
|
||||
|
||||
### 6. Multi-Target Grover Support
|
||||
|
||||
When multiple states are marked (k > 1), the algorithm converges faster:
|
||||
|
||||
```
|
||||
iterations(k) = floor(pi/4 * sqrt(N/k))
|
||||
```
|
||||
|
||||
The success probability oscillates sinusoidally. For k targets:
|
||||
|
||||
```
|
||||
P(success after t iterations) = sin^2((2t+1) * arcsin(sqrt(k/N)))
|
||||
```
|
||||
|
||||
```rust
|
||||
/// Compute success probability after t Grover iterations.
|
||||
pub fn success_probability(n_qubits: usize, n_marked: usize, iterations: usize) -> f64 {
|
||||
let n = (1_usize << n_qubits) as f64;
|
||||
let k = n_marked as f64;
|
||||
let theta = (k / n).sqrt().asin();
|
||||
let angle = (2.0 * iterations as f64 + 1.0) * theta;
|
||||
angle.sin().powi(2)
|
||||
}
|
||||
```
|
||||
|
||||
**Over-iteration risk**: If too many iterations are applied, the algorithm starts
|
||||
"uncomputing" the answer. The success probability oscillates with period
|
||||
~pi * sqrt(N/k) / 2. Our implementation auto-computes the optimal count and warns
|
||||
if the user-specified count deviates significantly.
|
||||
|
||||
### 7. Performance Benchmarks
|
||||
|
||||
#### Measured Performance Estimates
|
||||
|
||||
| Qubits | States | Iterations | Oracle Cost | Diffuser Cost | Total |
|
||||
|--------|--------|-----------|-------------|--------------|-------|
|
||||
| 4 | 16 | 3 | 3 * O(1) | 3 * O(64) | <0.01ms |
|
||||
| 8 | 256 | 12 | 12 * O(1) | 12 * O(2048) | <0.1ms |
|
||||
| 12 | 4,096 | 50 | 50 * O(1) | 50 * O(49K) | ~1ms |
|
||||
| 16 | 65,536 | 201 | 201 * O(1) | 201 * O(1M) | ~10ms |
|
||||
| 20 | 1,048,576 | 804 | 804 * O(1) | 804 * O(20M) | ~500ms |
|
||||
| 24 | 16,777,216 | 3,217 | 3217 * O(1) | 3217 * O(402M) | ~60s |
|
||||
|
||||
**Gate-count equivalent** (for comparison with hardware gate-based simulation):
|
||||
|
||||
| Qubits | Grover Iterations | Equivalent Gate Count | Index-Optimized Ops |
|
||||
|--------|------------------|----------------------|---------------------|
|
||||
| 8 | 12 | ~200 gates | ~25K ops |
|
||||
| 12 | 50 | ~1,500 gates | ~2.5M ops |
|
||||
| 16 | 201 | ~10,000 gates | ~200M ops |
|
||||
| 20 | 804 | ~60,000 gates | ~16B ops |
|
||||
|
||||
The "gates" column counts oracle gates (decomposed) + diffuser gates. The "ops" column
|
||||
counts actual floating-point operations in the optimized simulation path. The ratio
|
||||
confirms that the O(1) oracle trick yields a roughly 2x constant-factor improvement
|
||||
for the overall search.
|
||||
|
||||
### 8. Integration with HNSW Index for Hybrid Quantum-Classical Search
|
||||
|
||||
A speculative but architecturally sound integration path connects Grover's search with
|
||||
ruVector's HNSW (Hierarchical Navigable Small World) index:
|
||||
|
||||
```
|
||||
Hybrid Quantum-Classical Nearest-Neighbor Search
|
||||
=================================================
|
||||
|
||||
Phase 1: Classical HNSW (coarse filtering)
|
||||
- Navigate the HNSW graph to find candidate neighborhood
|
||||
- Reduce search space from N to ~sqrt(N) candidates
|
||||
- Time: O(log N)
|
||||
|
||||
Phase 2: Grover's Search (fine filtering)
|
||||
- Encode candidate set as Grover oracle
|
||||
- Search for exact nearest neighbor among candidates
|
||||
- Quadratic speedup over brute-force comparison
|
||||
- Time: O(N^{1/4}) for sqrt(N) candidates
|
||||
|
||||
Combined: O(log N + N^{1/4}) vs classical O(log N + sqrt(N))
|
||||
|
||||
┌──────────────────────────────────────────────┐
|
||||
│ HNSW Layer Navigation │
|
||||
│ │
|
||||
│ Layer 3: o ─────────── o ────── o │
|
||||
│ │ │ │
|
||||
│ Layer 2: o ── o ────── o ── o ──o │
|
||||
│ │ │ │ │ │ │
|
||||
│ Layer 1: o─o──o──o──o──o─o──o──o─o │
|
||||
│ │ │ │ │ │ │ │ │ │ │ │
|
||||
│ Layer 0: o-o-oo-oo-oo-oo-o-oo-oo-o │
|
||||
│ │ │
|
||||
│ ┌───────▼────────┐ │
|
||||
│ │ Candidate Pool │ │
|
||||
│ │ ~sqrt(N) items│ │
|
||||
│ └───────┬────────┘ │
|
||||
│ │ │
|
||||
└────────────────────┼───────────────────────────┘
|
||||
│
|
||||
┌──────────▼───────────┐
|
||||
│ Grover's Search │
|
||||
│ │
|
||||
│ Oracle: distance │
|
||||
│ threshold on │
|
||||
│ candidate indices │
|
||||
│ │
|
||||
│ O(N^{1/4}) queries │
|
||||
└──────────────────────┘
|
||||
```
|
||||
|
||||
This integration is facilitated by ruVector's existing HNSW implementation
|
||||
(150x-12,500x faster than baseline, per ruVector performance targets). The Grover
|
||||
oracle would encode a distance-threshold predicate: "is vector[i] within distance d
|
||||
of the query vector?"
|
||||
|
||||
```rust
|
||||
/// Oracle that marks basis states corresponding to vectors
|
||||
/// within distance threshold of a query.
|
||||
pub struct HnswGroverOracle {
|
||||
/// Candidate indices from HNSW coarse search
|
||||
pub candidates: Vec<usize>,
|
||||
/// Query vector
|
||||
pub query: Vec<f32>,
|
||||
/// Distance threshold
|
||||
pub threshold: f32,
|
||||
/// Pre-computed distances (for O(1) oracle evaluation)
|
||||
pub distances: Vec<f32>,
|
||||
}
|
||||
|
||||
impl GroverOracle for HnswGroverOracle {
|
||||
fn is_marked(&self, index: usize, _n_qubits: usize) -> bool {
|
||||
if index < self.distances.len() {
|
||||
self.distances[index] <= self.threshold
|
||||
} else {
|
||||
false
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Note**: This hybrid approach is currently theoretical for classical simulation.
|
||||
Its value lies in (a) algorithm prototyping for future quantum hardware, and
|
||||
(b) demonstrating integration patterns between quantum algorithms and classical
|
||||
data structures.
|
||||
|
||||
---
|
||||
|
||||
## Consequences
|
||||
|
||||
### Benefits
|
||||
|
||||
1. **O(1) oracle optimization** provides a 2x constant-factor speedup unique to state
|
||||
vector simulation, making Grover's algorithm practical for up to 20+ qubits
|
||||
2. **Dual oracle modes** support both fast known-target search (index-based) and general
|
||||
black-box function search (predicate-based)
|
||||
3. **Auto-computed iteration count** prevents over-iteration and ensures near-optimal
|
||||
success probability
|
||||
4. **Multi-target support** handles the general case of k marked states with appropriate
|
||||
iteration adjustment
|
||||
5. **HNSW integration path** provides a concrete vision for hybrid quantum-classical
|
||||
search that leverages ruVector's existing vector database infrastructure
|
||||
|
||||
### Risks
|
||||
|
||||
| Risk | Probability | Impact | Mitigation |
|
||||
|------|------------|--------|------------|
|
||||
| Diffuser dominates runtime, limiting oracle optimization benefit | High | Low | Accept 2x improvement; focus on SIMD-optimized Hadamard |
|
||||
| Multi-target count unknown in practice | Medium | Medium | Quantum counting subroutine (future work) |
|
||||
| HNSW integration adds complexity with unclear practical advantage | Low | Low | Keep as optional module, prototype-only initially |
|
||||
| Over-iteration produces incorrect results | Low | High | Auto-compute + warning system + probability tracking |
|
||||
|
||||
### Trade-offs
|
||||
|
||||
| Decision | Advantage | Disadvantage |
|
||||
|----------|-----------|--------------|
|
||||
| O(1) index oracle | Massive speedup for known targets | Not applicable to true black-box search |
|
||||
| Auto iteration count | Prevents user error | Less flexible for advanced use cases |
|
||||
| General oracle trait | Supports arbitrary predicates | O(2^n) per application (no speedup over gates) |
|
||||
| Eager probability tracking | Enables convergence monitoring | Memory overhead for probability vector |
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- Grover, L.K. "A fast quantum mechanical algorithm for database search." Proceedings of the 28th Annual ACM Symposium on Theory of Computing, 212-219 (1996)
|
||||
- Boyer, M., Brassard, G., Hoyer, P., Tapp, A. "Tight bounds on quantum searching." Fortschritte der Physik 46, 493-505 (1998)
|
||||
- Malviya, Y.K., Zapatero, R.A. "Quantum search algorithms for database search: A comprehensive review." arXiv:2311.01265 (2023)
|
||||
- ADR-001: ruQu Architecture - Classical Nervous System for Quantum Machines
|
||||
- ADR-QE-005: VQE Algorithm Support (parameterized circuits, expectation values)
|
||||
- ruVector HNSW implementation: 150x-12,500x faster pattern search (CLAUDE.md performance targets)
|
||||
- ruQu crate: `crates/ruQu/src/` - syndrome processing and state vector infrastructure
|
||||
631
docs/adr/quantum-engine/ADR-QE-007-qaoa-maxcut-implementation.md
Normal file
631
docs/adr/quantum-engine/ADR-QE-007-qaoa-maxcut-implementation.md
Normal file
|
|
@ -0,0 +1,631 @@
|
|||
# ADR-QE-007: QAOA MaxCut Implementation
|
||||
|
||||
**Status**: Proposed
|
||||
**Date**: 2026-02-06
|
||||
**Authors**: ruv.io, RuVector Team
|
||||
**Deciders**: Architecture Review Board
|
||||
|
||||
## Version History
|
||||
|
||||
| Version | Date | Author | Changes |
|
||||
|---------|------|--------|---------|
|
||||
| 0.1 | 2026-02-06 | ruv.io | Initial QAOA MaxCut architecture proposal |
|
||||
|
||||
---
|
||||
|
||||
## Context
|
||||
|
||||
### Combinatorial Optimization on Quantum Computers
|
||||
|
||||
The Quantum Approximate Optimization Algorithm (QAOA), introduced by Farhi, Goldstone,
|
||||
and Gutmann (2014), is a leading candidate for demonstrating quantum advantage on
|
||||
combinatorial optimization problems. QAOA constructs a parameterized quantum circuit that
|
||||
encodes the cost function of an optimization problem and uses classical outer-loop
|
||||
optimization to find parameters that maximize the expected cost.
|
||||
|
||||
### MaxCut as the Canonical QAOA Problem
|
||||
|
||||
MaxCut is the prototypical problem for QAOA: given a graph G = (V, E), partition the
|
||||
vertices into two sets S and S-complement to maximize the number of edges crossing the
|
||||
partition.
|
||||
|
||||
```
|
||||
MaxCut Example (5 vertices, 6 edges):
|
||||
|
||||
0 ─── 1
|
||||
│ \ │
|
||||
│ \ │
|
||||
3 ─── 2
|
||||
│
|
||||
4
|
||||
|
||||
Optimal cut: S = {0, 2, 4}, S' = {1, 3}
|
||||
Cut value: 5 edges crossing (0-1, 0-3, 1-2, 2-3, 2-4)
|
||||
```
|
||||
|
||||
The cost function is:
|
||||
|
||||
```
|
||||
C(z) = sum_{(i,j) in E} (1 - z_i * z_j) / 2
|
||||
```
|
||||
|
||||
where z_i in {+1, -1} encodes the partition assignment.
|
||||
|
||||
### QAOA Circuit Structure
|
||||
|
||||
A depth-p QAOA circuit alternates two types of layers:
|
||||
|
||||
1. **Phase separation** (encodes the problem): For each edge (i,j), apply
|
||||
exp(-i * gamma * Z_i Z_j / 2)
|
||||
2. **Mixing** (explores the solution space): For each qubit i, apply
|
||||
exp(-i * beta * X_i) = Rx(2*beta)
|
||||
|
||||
```
|
||||
QAOA Circuit (p layers):
|
||||
|
||||
|+> ──[Phase(gamma_1)]──[Mix(beta_1)]──[Phase(gamma_2)]──[Mix(beta_2)]── ... ──[Measure]
|
||||
│
|
||||
Parameters: gamma = [gamma_1, ..., gamma_p], beta = [beta_1, ..., beta_p] │
|
||||
▼
|
||||
Classical
|
||||
Optimizer
|
||||
```
|
||||
|
||||
### Why QAOA Matters for ruQu
|
||||
|
||||
| Motivation | Details |
|
||||
|------------|---------|
|
||||
| Optimization benchmarks | Standard workload for evaluating quantum simulator performance |
|
||||
| Graph problems | Natural integration with ruVector graph database (ruvector-graph) |
|
||||
| Variational algorithm | Shares infrastructure with VQE (ADR-QE-005): parameterized circuits, expectation values, classical optimizers |
|
||||
| Scalability study | QAOA depth and graph size provide tunable complexity for benchmarking |
|
||||
| Agent integration | ruVector agents can use QAOA to solve graph optimization tasks autonomously |
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
### 1. Phase Separation Operator: Native Rzz Gate
|
||||
|
||||
The phase separation operator for MaxCut applies exp(-i * gamma * Z_i Z_j / 2) for
|
||||
each edge (i,j). We implement this as a native two-qubit operation via direct amplitude
|
||||
manipulation, avoiding CNOT decomposition.
|
||||
|
||||
**Mathematical basis**:
|
||||
|
||||
```
|
||||
exp(-i * theta * Z_i Z_j / 2) acts on computational basis states as:
|
||||
|
||||
|00> -> e^{-i*theta/2} |00> (Z_i Z_j = +1)
|
||||
|01> -> e^{+i*theta/2} |01> (Z_i Z_j = -1)
|
||||
|10> -> e^{+i*theta/2} |10> (Z_i Z_j = -1)
|
||||
|11> -> e^{-i*theta/2} |11> (Z_i Z_j = +1)
|
||||
```
|
||||
|
||||
In the state vector, for each amplitude at index k:
|
||||
- Extract bits i and j from k
|
||||
- Compute parity = bit_i XOR bit_j
|
||||
- Apply phase: `amp[k] *= exp(-i * theta * (-1)^parity / 2)`
|
||||
- If parity = 0 (same bits): `amp[k] *= exp(-i * theta / 2)`
|
||||
- If parity = 1 (different bits): `amp[k] *= exp(+i * theta / 2)`
|
||||
|
||||
```rust
|
||||
impl QuantumState {
|
||||
/// Apply Rzz(theta) = exp(-i * theta * Z_i Z_j / 2) via direct amplitude
|
||||
/// manipulation.
|
||||
///
|
||||
/// For each basis state |k>:
|
||||
/// - Compute parity of bits i and j in k
|
||||
/// - Apply phase e^{-i * theta * (-1)^parity / 2}
|
||||
///
|
||||
/// Complexity: O(2^n) -- single pass over state vector.
|
||||
/// Vectorizable: all amplitudes are independent (no swaps).
|
||||
///
|
||||
/// Hardware equivalent: CNOT(i,j) + Rz(theta, j) + CNOT(i,j) = 3 gates.
|
||||
pub fn rzz(&mut self, theta: f64, qubit_i: usize, qubit_j: usize) {
|
||||
let phase_same = Complex64::from_polar(1.0, -theta / 2.0);
|
||||
let phase_diff = Complex64::from_polar(1.0, theta / 2.0);
|
||||
|
||||
let mask_i = 1_usize << qubit_i;
|
||||
let mask_j = 1_usize << qubit_j;
|
||||
|
||||
for k in 0..self.amplitudes.len() {
|
||||
let bit_i = (k & mask_i) >> qubit_i;
|
||||
let bit_j = (k & mask_j) >> qubit_j;
|
||||
let parity = bit_i ^ bit_j;
|
||||
|
||||
if parity == 0 {
|
||||
self.amplitudes[k] *= phase_same;
|
||||
} else {
|
||||
self.amplitudes[k] *= phase_diff;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Vectorization opportunity**: The inner loop is a streaming operation over the amplitude
|
||||
array with no data dependencies between iterations. This is ideal for SIMD vectorization
|
||||
(AVX-512 can process 8 complex64 values per instruction) and parallelization across
|
||||
cores.
|
||||
|
||||
### 2. Mixing Operator
|
||||
|
||||
The mixing operator applies Rx(2*beta) to each qubit:
|
||||
|
||||
```
|
||||
Rx(2*beta) = exp(-i * beta * X) = [[cos(beta), -i*sin(beta)],
|
||||
[-i*sin(beta), cos(beta)]]
|
||||
```
|
||||
|
||||
This uses the standard single-qubit gate application from the simulator core:
|
||||
|
||||
```rust
|
||||
impl QuantumState {
|
||||
/// Apply the QAOA mixing operator: Rx(2*beta) on each qubit.
|
||||
///
|
||||
/// Complexity: O(n * 2^n) for n qubits.
|
||||
pub fn qaoa_mixing(&mut self, beta: f64) {
|
||||
for qubit in 0..self.n_qubits {
|
||||
self.rx(2.0 * beta, qubit);
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 3. QAOA Circuit Construction
|
||||
|
||||
A convenience function builds the full QAOA circuit from a graph and parameters:
|
||||
|
||||
```rust
|
||||
/// A graph represented as an edge list with optional weights.
|
||||
pub struct Graph {
|
||||
/// Number of vertices
|
||||
pub n_vertices: usize,
|
||||
/// Edges: (vertex_i, vertex_j, weight)
|
||||
pub edges: Vec<(usize, usize, f64)>,
|
||||
}
|
||||
|
||||
impl Graph {
|
||||
/// Construct from adjacency list.
|
||||
pub fn from_adjacency_list(adj: &[Vec<usize>]) -> Self;
|
||||
|
||||
/// Construct from edge list (unweighted, weight = 1.0).
|
||||
pub fn from_edge_list(n_vertices: usize, edges: &[(usize, usize)]) -> Self;
|
||||
|
||||
/// Load from ruVector graph query result.
|
||||
pub fn from_ruvector_query(result: &GraphQueryResult) -> Self;
|
||||
}
|
||||
|
||||
/// QAOA configuration.
|
||||
pub struct QaoaConfig {
|
||||
/// Graph defining the MaxCut instance
|
||||
pub graph: Graph,
|
||||
/// QAOA depth (number of layers)
|
||||
pub p: usize,
|
||||
/// Gamma parameters (phase separation angles), length = p
|
||||
pub gammas: Vec<f64>,
|
||||
/// Beta parameters (mixing angles), length = p
|
||||
pub betas: Vec<f64>,
|
||||
}
|
||||
|
||||
/// Build and simulate a QAOA circuit for MaxCut.
|
||||
///
|
||||
/// Circuit structure for depth p:
|
||||
/// 1. Initialize |+>^n (Hadamard on all qubits)
|
||||
/// 2. For layer l = 1..p:
|
||||
/// a. Phase separation: Rzz(gamma_l, i, j) for each edge (i,j)
|
||||
/// b. Mixing: Rx(2*beta_l) on each qubit
|
||||
/// 3. Return final state
|
||||
pub fn build_qaoa_circuit(config: &QaoaConfig) -> QuantumState {
|
||||
let n = config.graph.n_vertices;
|
||||
let mut state = QuantumState::new(n);
|
||||
|
||||
// Step 1: Initialize uniform superposition
|
||||
state.hadamard_all();
|
||||
|
||||
// Step 2: Alternating phase separation and mixing layers
|
||||
for layer in 0..config.p {
|
||||
let gamma = config.gammas[layer];
|
||||
let beta = config.betas[layer];
|
||||
|
||||
// Phase separation: apply Rzz for each edge
|
||||
for &(i, j, weight) in &config.graph.edges {
|
||||
state.rzz(gamma * weight, i, j);
|
||||
}
|
||||
|
||||
// Mixing: Rx(2*beta) on each qubit
|
||||
state.qaoa_mixing(beta);
|
||||
}
|
||||
|
||||
state
|
||||
}
|
||||
```
|
||||
|
||||
**Pseudocode for the complete QAOA MaxCut solver**:
|
||||
|
||||
```rust
|
||||
pub fn qaoa_maxcut(
|
||||
graph: &Graph,
|
||||
p: usize,
|
||||
optimizer: &mut dyn ClassicalOptimizer,
|
||||
config: &QaoaOptConfig,
|
||||
) -> QaoaResult {
|
||||
let n_params = 2 * p; // p gammas + p betas
|
||||
optimizer.initialize(n_params);
|
||||
|
||||
let mut params = config.initial_params.clone()
|
||||
.unwrap_or_else(|| {
|
||||
// Standard initialization: gamma in [0, pi], beta in [0, pi/2]
|
||||
let mut p_init = vec![0.0; n_params];
|
||||
for i in 0..p {
|
||||
p_init[i] = 0.5; // gamma_i
|
||||
p_init[p + i] = 0.25; // beta_i
|
||||
}
|
||||
p_init
|
||||
});
|
||||
|
||||
let mut best_cost = f64::NEG_INFINITY;
|
||||
let mut best_params = params.clone();
|
||||
let mut history = Vec::new();
|
||||
|
||||
for iteration in 0..config.max_iterations {
|
||||
let gammas = params[..p].to_vec();
|
||||
let betas = params[p..].to_vec();
|
||||
|
||||
// Build and simulate circuit
|
||||
let qaoa_config = QaoaConfig {
|
||||
graph: graph.clone(),
|
||||
p,
|
||||
gammas,
|
||||
betas,
|
||||
};
|
||||
let state = build_qaoa_circuit(&qaoa_config);
|
||||
|
||||
// Evaluate MaxCut cost function
|
||||
let cost = maxcut_expectation(&state, graph);
|
||||
|
||||
if cost > best_cost {
|
||||
best_cost = cost;
|
||||
best_params = params.clone();
|
||||
}
|
||||
|
||||
// Gradient computation (parameter-shift rule, same as VQE)
|
||||
let grad = if optimizer.needs_gradient() {
|
||||
Some(qaoa_gradient(graph, p, ¶ms))
|
||||
} else {
|
||||
None
|
||||
};
|
||||
|
||||
history.push(QaoaIteration { iteration, cost, params: params.clone() });
|
||||
|
||||
let result = optimizer.step(¶ms, -cost, grad.as_deref());
|
||||
// Note: negate cost because optimizer minimizes
|
||||
params = result.new_params;
|
||||
|
||||
if result.converged {
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
// Sample the final state to get candidate cuts
|
||||
let final_state = build_qaoa_circuit(&QaoaConfig {
|
||||
graph: graph.clone(),
|
||||
p,
|
||||
gammas: best_params[..p].to_vec(),
|
||||
betas: best_params[p..].to_vec(),
|
||||
});
|
||||
let best_cut = sample_maxcut(&final_state, graph, config.sample_shots);
|
||||
|
||||
QaoaResult {
|
||||
best_cost,
|
||||
best_params,
|
||||
best_cut,
|
||||
iterations: history.len(),
|
||||
history,
|
||||
approximation_ratio: best_cost / graph.max_cut_upper_bound(),
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4. Cost Function Evaluation
|
||||
|
||||
The MaxCut cost function in Pauli operator form is:
|
||||
|
||||
```
|
||||
C = sum_{(i,j) in E} w_{ij} * (1 - Z_i Z_j) / 2
|
||||
```
|
||||
|
||||
This reuses the PauliSum expectation API from ADR-QE-005:
|
||||
|
||||
```rust
|
||||
/// Compute the MaxCut cost as the expectation value of the cost Hamiltonian.
|
||||
///
|
||||
/// C = sum_{(i,j) in E} w_ij * (1 - Z_i Z_j) / 2
|
||||
/// = sum_{(i,j) in E} w_ij/2 - sum_{(i,j) in E} w_ij/2 * Z_i Z_j
|
||||
/// = const - sum_{(i,j)} w_ij/2 * <Z_i Z_j>
|
||||
///
|
||||
/// Each Z_i Z_j expectation is computed via the efficient diagonal trick:
|
||||
/// <psi| Z_i Z_j |psi> = sum_k |amp_k|^2 * (-1)^{bit_i(k) XOR bit_j(k)}
|
||||
pub fn maxcut_expectation(state: &QuantumState, graph: &Graph) -> f64 {
|
||||
let mut cost = 0.0;
|
||||
|
||||
for &(i, j, weight) in &graph.edges {
|
||||
let mask_i = 1_usize << i;
|
||||
let mask_j = 1_usize << j;
|
||||
|
||||
let mut zz_expectation = 0.0;
|
||||
for k in 0..state.amplitudes.len() {
|
||||
let bit_i = (k & mask_i) >> i;
|
||||
let bit_j = (k & mask_j) >> j;
|
||||
let parity = bit_i ^ bit_j;
|
||||
let sign = 1.0 - 2.0 * parity as f64; // +1 if same, -1 if different
|
||||
zz_expectation += state.amplitudes[k].norm_sqr() * sign;
|
||||
}
|
||||
|
||||
cost += weight * (1.0 - zz_expectation) / 2.0;
|
||||
}
|
||||
|
||||
cost
|
||||
}
|
||||
```
|
||||
|
||||
**Optimization**: Since Z_i Z_j is diagonal in the computational basis, the expectation
|
||||
reduces to a weighted sum over probabilities. No amplitude swapping is needed, and the
|
||||
computation is embarrassingly parallel.
|
||||
|
||||
### 5. Sampling Mode
|
||||
|
||||
In addition to exact expectation values, we support sampling the final state to
|
||||
obtain candidate cuts:
|
||||
|
||||
```rust
|
||||
/// Sample the QAOA state to find candidate MaxCut solutions.
|
||||
///
|
||||
/// Returns the best cut found across `shots` samples.
|
||||
pub fn sample_maxcut(
|
||||
state: &QuantumState,
|
||||
graph: &Graph,
|
||||
shots: usize,
|
||||
) -> MaxCutSolution {
|
||||
let probabilities: Vec<f64> = state.amplitudes.iter()
|
||||
.map(|a| a.norm_sqr())
|
||||
.collect();
|
||||
|
||||
let mut best_cut_value = 0.0;
|
||||
let mut best_bitstring = 0_usize;
|
||||
let mut rng = thread_rng();
|
||||
|
||||
for _ in 0..shots {
|
||||
// Sample from probability distribution
|
||||
let sample = sample_from_distribution(&probabilities, &mut rng);
|
||||
|
||||
// Evaluate cut value for this bitstring
|
||||
let cut_value = evaluate_cut(sample, graph);
|
||||
|
||||
if cut_value > best_cut_value {
|
||||
best_cut_value = cut_value;
|
||||
best_bitstring = sample;
|
||||
}
|
||||
}
|
||||
|
||||
MaxCutSolution {
|
||||
partition: best_bitstring,
|
||||
cut_value: best_cut_value,
|
||||
set_s: (0..graph.n_vertices)
|
||||
.filter(|&v| (best_bitstring >> v) & 1 == 1)
|
||||
.collect(),
|
||||
set_s_complement: (0..graph.n_vertices)
|
||||
.filter(|&v| (best_bitstring >> v) & 1 == 0)
|
||||
.collect(),
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 6. Graph Interface
|
||||
|
||||
Three input modes cover common use cases:
|
||||
|
||||
```rust
|
||||
impl Graph {
|
||||
/// From adjacency list (unweighted).
|
||||
///
|
||||
/// Example: adj[0] = [1, 3] means vertex 0 connects to 1 and 3.
|
||||
pub fn from_adjacency_list(adj: &[Vec<usize>]) -> Self {
|
||||
let n = adj.len();
|
||||
let mut edges = Vec::new();
|
||||
let mut seen = std::collections::HashSet::new();
|
||||
|
||||
for (u, neighbors) in adj.iter().enumerate() {
|
||||
for &v in neighbors {
|
||||
let edge = if u < v { (u, v) } else { (v, u) };
|
||||
if seen.insert(edge) {
|
||||
edges.push((edge.0, edge.1, 1.0));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Self { n_vertices: n, edges }
|
||||
}
|
||||
|
||||
/// From edge list with uniform weight.
|
||||
pub fn from_edge_list(n_vertices: usize, edge_list: &[(usize, usize)]) -> Self {
|
||||
Self {
|
||||
n_vertices,
|
||||
edges: edge_list.iter().map(|&(u, v)| (u, v, 1.0)).collect(),
|
||||
}
|
||||
}
|
||||
|
||||
/// From ruVector graph database query result.
|
||||
///
|
||||
/// Enables QAOA MaxCut on graphs stored in ruvector-graph.
|
||||
pub fn from_ruvector_query(result: &GraphQueryResult) -> Self {
|
||||
// Convert ruvector-graph nodes and edges to QAOA format
|
||||
// Vertex IDs are remapped to contiguous 0..n range
|
||||
todo!()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 7. Tensor Network Optimization for Sparse Graphs
|
||||
|
||||
For sparse or planar graphs, the QAOA state can be represented more efficiently using
|
||||
tensor network contraction. The key insight is that QAOA circuits have a structure
|
||||
dictated by the graph topology:
|
||||
|
||||
```
|
||||
Tensor Network View of QAOA:
|
||||
|
||||
Qubit 0: ──[H]──[Rzz(0,1)]──[Rzz(0,3)]──[Rx]── ...
|
||||
Qubit 1: ──[H]──[Rzz(0,1)]──[Rzz(1,2)]──[Rx]── ...
|
||||
Qubit 2: ──[H]──[Rzz(1,2)]──[Rzz(2,3)]──[Rx]── ...
|
||||
Qubit 3: ──[H]──[Rzz(0,3)]──[Rzz(2,3)]──[Rx]── ...
|
||||
|
||||
For a planar graph with treewidth w, tensor contraction costs O(2^w * poly(n))
|
||||
instead of O(2^n). For many practical graphs, w << n.
|
||||
```
|
||||
|
||||
```rust
|
||||
/// Detect graph treewidth and decide simulation strategy.
|
||||
pub fn select_simulation_strategy(graph: &Graph) -> SimulationStrategy {
|
||||
let treewidth = estimate_treewidth(graph);
|
||||
let n = graph.n_vertices;
|
||||
|
||||
if treewidth <= 20 && n > 24 {
|
||||
// Tensor network contraction is cheaper than full state vector
|
||||
SimulationStrategy::TensorNetwork {
|
||||
contraction_order: compute_contraction_order(graph),
|
||||
estimated_cost: (1 << treewidth) * n * n,
|
||||
}
|
||||
} else {
|
||||
SimulationStrategy::StateVector {
|
||||
estimated_cost: 1 << n,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub enum SimulationStrategy {
|
||||
StateVector { estimated_cost: usize },
|
||||
TensorNetwork {
|
||||
contraction_order: Vec<ContractionStep>,
|
||||
estimated_cost: usize,
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
### 8. Performance Analysis
|
||||
|
||||
#### Gate Counts and Timing
|
||||
|
||||
For a graph with n vertices, m edges, and QAOA depth p:
|
||||
|
||||
| Operation | Gate Count per Layer | Total Gates (p layers) |
|
||||
|-----------|---------------------|----------------------|
|
||||
| Phase separation (Rzz) | m | p * m |
|
||||
| Mixing (Rx) | n | p * n |
|
||||
| **Total per layer** | **m + n** | **p * (m + n)** |
|
||||
|
||||
**Benchmark estimates**:
|
||||
|
||||
| Configuration | n | m | p | Total Gates | Estimated Time |
|
||||
|---------------|---|---|---|-------------|---------------|
|
||||
| Small triangle | 3 | 3 | 1 | 6 | <0.01ms |
|
||||
| Petersen graph | 10 | 15 | 3 | 75 | <0.1ms |
|
||||
| Random d-reg (d=3) | 10 | 15 | 5 | 125 | <0.5ms |
|
||||
| Grid 4x5 | 20 | 31 | 3 | 189 | ~50ms |
|
||||
| Grid 4x5 | 20 | 31 | 5 | 315 | ~100ms |
|
||||
| Random d-reg (d=4) | 20 | 40 | 5 | 400 | ~200ms |
|
||||
| Dense (complete) | 20 | 190 | 3 | 630 | ~300ms |
|
||||
| Sparse large | 24 | 36 | 3 | 216 | ~5s |
|
||||
| Dense large | 24 | 276 | 5 | 1500 | ~30s |
|
||||
|
||||
**Memory requirements**:
|
||||
|
||||
| Qubits | State Vector Size | Memory |
|
||||
|--------|------------------|--------|
|
||||
| 10 | 1,024 | 16 KB |
|
||||
| 16 | 65,536 | 1 MB |
|
||||
| 20 | 1,048,576 | 16 MB |
|
||||
| 24 | 16,777,216 | 256 MB |
|
||||
| 28 | 268,435,456 | 4 GB |
|
||||
|
||||
### 9. Integration with ruvector-graph
|
||||
|
||||
The connection to ruVector's graph database enables a powerful workflow:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ QAOA MaxCut Pipeline │
|
||||
│ │
|
||||
│ ┌──────────────┐ ┌────────────────┐ ┌──────────────────┐ │
|
||||
│ │ ruvector-graph│ │ QAOA Engine │ │ Result Store │ │
|
||||
│ │ │ │ │ │ │ │
|
||||
│ │ Query: │────>│ Build circuit │────>│ Optimal cut │ │
|
||||
│ │ "find all │ │ Optimize │ │ Partition │ │
|
||||
│ │ connected │ │ Sample │ │ Approximation │ │
|
||||
│ │ subgraphs │ │ │ │ ratio │ │
|
||||
│ │ of size k" │ │ │ │ │ │
|
||||
│ └──────────────┘ └────────────────┘ └──────────────────┘ │
|
||||
│ │
|
||||
│ Data Flow: │
|
||||
│ 1. Agent queries ruvector-graph for subgraph │
|
||||
│ 2. Graph converted to QAOA format via Graph::from_ruvector_query() │
|
||||
│ 3. QAOA optimizer runs with configurable depth p │
|
||||
│ 4. Results stored in ruVector memory for pattern learning │
|
||||
│ 5. Agent uses learned patterns to choose p and initial parameters │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
The ruvector-mincut integration is particularly relevant: the existing
|
||||
`SubpolynomialMinCut` algorithm (El-Hayek/Henzinger/Li, O(n^{o(1)}) amortized) provides
|
||||
exact min-cut values that serve as a lower bound for MaxCut verification. QAOA solutions
|
||||
can be validated against this classical baseline.
|
||||
|
||||
---
|
||||
|
||||
## Consequences
|
||||
|
||||
### Benefits
|
||||
|
||||
1. **Native Rzz gate** via direct amplitude manipulation avoids CNOT decomposition,
|
||||
yielding a simpler and faster phase separation implementation
|
||||
2. **PauliSum expectation API reuse** from ADR-QE-005 provides a unified interface for
|
||||
all variational algorithms (VQE, QAOA, and future extensions)
|
||||
3. **Graph interface flexibility** supports adjacency lists, edge lists, and ruVector
|
||||
graph queries, covering the most common input formats
|
||||
4. **Tensor network fallback** for low-treewidth graphs extends QAOA to larger problem
|
||||
instances than pure state vector simulation allows
|
||||
5. **ruvector-graph integration** enables a seamless pipeline from graph storage to
|
||||
quantum optimization to result analysis
|
||||
|
||||
### Risks
|
||||
|
||||
| Risk | Probability | Impact | Mitigation |
|
||||
|------|------------|--------|------------|
|
||||
| QAOA at low depth p gives poor approximation ratios | High | Medium | Support high-p QAOA, classical warm-starting |
|
||||
| Treewidth estimation is NP-hard in general | Medium | Low | Use heuristic upper bounds (min-degree, greedy) |
|
||||
| Parameter landscape has many local minima | Medium | Medium | Multi-start optimization, INTERP initialization |
|
||||
| Large dense graphs exhaust memory | Medium | High | Tensor network fallback, graph coarsening |
|
||||
|
||||
### Trade-offs
|
||||
|
||||
| Decision | Advantage | Disadvantage |
|
||||
|----------|-----------|--------------|
|
||||
| Direct Rzz over CNOT decomposition | Simpler, faster | Not a one-to-one hardware circuit mapping |
|
||||
| Exact expectation over sampling | No statistical noise | Does not model real hardware shot noise |
|
||||
| Automatic strategy selection | Transparent to user | Additional complexity in simulation backend |
|
||||
| Integrated graph interface | Seamless workflow | Coupling to ruvector-graph API |
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- Farhi, E., Goldstone, J., Gutmann, S. "A Quantum Approximate Optimization Algorithm." arXiv:1411.4028 (2014)
|
||||
- Hadfield, S. et al. "From the Quantum Approximate Optimization Algorithm to a Quantum Alternating Operator Ansatz." Algorithms 12, 34 (2019)
|
||||
- Zhou, L. et al. "Quantum Approximate Optimization Algorithm: Performance, Mechanism, and Implementation on Near-Term Devices." Physical Review X 10, 021067 (2020)
|
||||
- Guerreschi, G.G., Matsuura, A.Y. "QAOA for Max-Cut requires hundreds of qubits for quantum speed-up." Scientific Reports 9, 6903 (2019)
|
||||
- ADR-001: ruQu Architecture - Classical Nervous System for Quantum Machines
|
||||
- ADR-QE-005: VQE Algorithm Support (shared parameterized circuit and optimizer infrastructure)
|
||||
- ADR-QE-006: Grover's Search Implementation (quantum state manipulation primitives)
|
||||
- ruvector-mincut: `crates/ruvector-mincut/` - El-Hayek/Henzinger/Li subpolynomial min-cut
|
||||
- ruvector-graph: graph database integration for sourcing MaxCut instances
|
||||
|
|
@ -0,0 +1,997 @@
|
|||
# ADR-QE-008: Surface Code Error Correction Simulation
|
||||
|
||||
**Status**: Proposed
|
||||
**Date**: 2026-02-06
|
||||
**Authors**: ruv.io, RuVector Team
|
||||
**Deciders**: Architecture Review Board
|
||||
|
||||
## Version History
|
||||
|
||||
| Version | Date | Author | Changes |
|
||||
|---------|------|--------|---------|
|
||||
| 0.1 | 2026-02-06 | ruv.io | Initial surface code QEC simulation proposal |
|
||||
|
||||
---
|
||||
|
||||
## Context
|
||||
|
||||
### The Importance of QEC Simulation
|
||||
|
||||
Quantum Error Correction (QEC) is the bridge between noisy intermediate-scale quantum
|
||||
(NISQ) devices and fault-tolerant quantum computing. Before deploying error correction
|
||||
on real hardware, every aspect of the QEC stack must be validated through simulation:
|
||||
|
||||
1. **Decoder validation**: Verify that decoding algorithms (MWPM, Union-Find, neural
|
||||
decoders) produce correct corrections under various noise models
|
||||
2. **Threshold estimation**: Determine the physical error rate below which logical error
|
||||
rate decreases with increasing code distance
|
||||
3. **Architecture exploration**: Compare surface code layouts, flag qubit placements, and
|
||||
scheduling strategies
|
||||
4. **Noise model development**: Test decoder robustness against realistic noise (correlated
|
||||
errors, leakage, crosstalk)
|
||||
|
||||
### Surface Codes as the Leading Architecture
|
||||
|
||||
The surface code is the most promising QEC architecture for superconducting qubit
|
||||
platforms due to:
|
||||
|
||||
| Property | Value |
|
||||
|----------|-------|
|
||||
| Error threshold | ~1% (highest among practical codes) |
|
||||
| Connectivity | Nearest-neighbor only (matches hardware) |
|
||||
| Syndrome extraction | Local stabilizer measurements |
|
||||
| Decoding | Efficient MWPM, Union-Find in O(n * alpha(n)) |
|
||||
|
||||
### Surface Code Layout (Distance-3)
|
||||
|
||||
```
|
||||
Distance-3 Rotated Surface Code:
|
||||
|
||||
Data qubits: D0..D8 (9 total)
|
||||
X-stabilizers: X0..X3 (4 ancilla qubits)
|
||||
Z-stabilizers: Z0..Z3 (4 ancilla qubits)
|
||||
|
||||
Z0 Z1
|
||||
/ \ / \
|
||||
D0 ──── D1 ──── D2
|
||||
| X0 | X1 |
|
||||
D3 ──── D4 ──── D5
|
||||
| X2 | X3 |
|
||||
D6 ──── D7 ──── D8
|
||||
\ / \ /
|
||||
Z2 Z3
|
||||
|
||||
Qubit count: 9 data + 8 ancilla = 17 total qubits
|
||||
State vector: 2^17 = 131,072 complex amplitudes
|
||||
Memory: 2 MB per state vector
|
||||
```
|
||||
|
||||
### What ruQu Provides Today
|
||||
|
||||
The existing ruQu crate already implements key components for error correction:
|
||||
|
||||
| Component | Module | Status |
|
||||
|-----------|--------|--------|
|
||||
| Syndrome processing | `syndrome.rs` | Production-ready (1M rounds/sec) |
|
||||
| MWPM decoder | `decoder.rs` | Integrated via fusion-blossom |
|
||||
| Min-cut coherence | `mincut.rs` | El-Hayek/Henzinger/Li algorithm |
|
||||
| Three-filter pipeline | `filters.rs` | Structural + Shift + Evidence |
|
||||
| Tile architecture | `tile.rs`, `fabric.rs` | 256-tile WASM fabric |
|
||||
| Stim integration | `stim.rs` | Syndrome generation |
|
||||
|
||||
What is **missing** is the ability to simulate the full quantum state evolution of a
|
||||
surface code cycle: ancilla initialization, stabilizer circuits, projective measurement,
|
||||
state collapse, decoder feedback, and correction application. This ADR fills that gap.
|
||||
|
||||
### Requirements
|
||||
|
||||
| Requirement | Description | Priority |
|
||||
|-------------|-------------|----------|
|
||||
| Mid-circuit measurement | Projective measurement of individual qubits | P0 |
|
||||
| Qubit reset | Reinitialize ancilla qubits to |0> each cycle | P0 |
|
||||
| Conditional operations | Apply gates conditioned on measurement outcomes | P0 |
|
||||
| Noise injection | Depolarizing, bit-flip, phase-flip channels | P0 |
|
||||
| Syndrome extraction | Extract syndrome bits from ancilla measurements | P0 |
|
||||
| Decoder integration | Feed syndromes to MWPM/min-cut decoder | P0 |
|
||||
| Logical error tracking | Determine if logical error occurred | P1 |
|
||||
| Multi-cycle simulation | Run thousands of QEC cycles efficiently | P1 |
|
||||
| Leakage modeling | Simulate qubit leakage to non-computational states | P2 |
|
||||
|
||||
---
|
||||
|
||||
## Decision
|
||||
|
||||
### 1. Mid-Circuit Measurement
|
||||
|
||||
Mid-circuit measurement is the most critical new capability. Unlike final-state
|
||||
measurement (which collapses the entire state), mid-circuit measurement collapses a
|
||||
single qubit while preserving the rest of the system for continued evolution.
|
||||
|
||||
**Mathematical formulation**:
|
||||
|
||||
For measuring qubit q in the computational basis:
|
||||
|
||||
1. Split the state into two subspaces:
|
||||
- |psi_0>: amplitudes where qubit q = 0
|
||||
- |psi_1>: amplitudes where qubit q = 1
|
||||
2. Compute probabilities:
|
||||
- P(0) = ||psi_0||^2 = sum_{k: bit_q(k)=0} |amp_k|^2
|
||||
- P(1) = ||psi_1||^2 = sum_{k: bit_q(k)=1} |amp_k|^2
|
||||
3. Sample outcome m in {0, 1} according to P(0), P(1)
|
||||
4. Collapse: zero out amplitudes in the non-selected subspace
|
||||
5. Renormalize: divide remaining amplitudes by sqrt(P(m))
|
||||
|
||||
```rust
|
||||
/// Result of a mid-circuit measurement.
|
||||
pub struct MeasurementResult {
|
||||
/// The measured qubit index
|
||||
pub qubit: usize,
|
||||
/// The measurement outcome (0 or 1)
|
||||
pub outcome: u8,
|
||||
/// The probability of this outcome
|
||||
pub probability: f64,
|
||||
}
|
||||
|
||||
impl QuantumState {
|
||||
/// Perform a projective measurement on a single qubit.
|
||||
///
|
||||
/// This collapses the qubit to |0> or |1> based on Born probabilities,
|
||||
/// zeroes out amplitudes in the rejected subspace, and renormalizes.
|
||||
///
|
||||
/// The remaining qubits are left in a valid quantum state for continued
|
||||
/// simulation (essential for mid-circuit measurement in QEC).
|
||||
///
|
||||
/// Complexity: O(2^n) -- two passes over the state vector.
|
||||
/// Pass 1: Compute probabilities P(0), P(1)
|
||||
/// Pass 2: Collapse and renormalize
|
||||
pub fn measure_qubit(
|
||||
&mut self,
|
||||
qubit: usize,
|
||||
rng: &mut impl Rng,
|
||||
) -> MeasurementResult {
|
||||
let mask = 1_usize << qubit;
|
||||
let n = self.amplitudes.len();
|
||||
|
||||
// Pass 1: Compute P(0) and P(1)
|
||||
let mut prob_0 = 0.0_f64;
|
||||
let mut prob_1 = 0.0_f64;
|
||||
|
||||
for k in 0..n {
|
||||
let p = self.amplitudes[k].norm_sqr();
|
||||
if (k & mask) == 0 {
|
||||
prob_0 += p;
|
||||
} else {
|
||||
prob_1 += p;
|
||||
}
|
||||
}
|
||||
|
||||
// Sample outcome
|
||||
let outcome = if rng.gen::<f64>() < prob_0 { 0_u8 } else { 1_u8 };
|
||||
let prob_selected = if outcome == 0 { prob_0 } else { prob_1 };
|
||||
let norm_factor = 1.0 / prob_selected.sqrt();
|
||||
|
||||
// Pass 2: Collapse and renormalize
|
||||
for k in 0..n {
|
||||
let bit = ((k & mask) >> qubit) as u8;
|
||||
if bit == outcome {
|
||||
self.amplitudes[k] *= norm_factor;
|
||||
} else {
|
||||
self.amplitudes[k] = Complex64::zero();
|
||||
}
|
||||
}
|
||||
|
||||
MeasurementResult {
|
||||
qubit,
|
||||
outcome,
|
||||
probability: prob_selected,
|
||||
}
|
||||
}
|
||||
|
||||
/// Measure multiple qubits (ancilla register).
|
||||
///
|
||||
/// Measures each qubit sequentially. The order matters because each
|
||||
/// measurement collapses the state before the next measurement.
|
||||
/// For stabilizer measurements, this correctly handles correlated outcomes.
|
||||
pub fn measure_qubits(
|
||||
&mut self,
|
||||
qubits: &[usize],
|
||||
rng: &mut impl Rng,
|
||||
) -> Vec<MeasurementResult> {
|
||||
qubits.iter()
|
||||
.map(|&q| self.measure_qubit(q, rng))
|
||||
.collect()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 2. Qubit Reset
|
||||
|
||||
Ancilla qubits must be reinitialized to |0> at the start of each syndrome extraction
|
||||
cycle. The reset operation projects onto the |0> subspace and renormalizes:
|
||||
|
||||
```rust
|
||||
impl QuantumState {
|
||||
/// Reset a qubit to |0>.
|
||||
///
|
||||
/// Zeroes out all amplitudes where qubit q = 1, then renormalizes.
|
||||
/// This is equivalent to measuring the qubit and, if the outcome is |1>,
|
||||
/// applying an X gate to flip it back to |0>.
|
||||
///
|
||||
/// Complexity: O(2^n) -- single pass over state vector.
|
||||
///
|
||||
/// Used for ancilla reinitialization in each QEC cycle.
|
||||
pub fn reset_qubit(&mut self, qubit: usize) {
|
||||
let mask = 1_usize << qubit;
|
||||
let partner_mask = !mask;
|
||||
let n = self.amplitudes.len();
|
||||
|
||||
// For each pair of states (k, k XOR mask), move amplitude from
|
||||
// the |1> component to the |0> component.
|
||||
// This implements: |0><0| + |0><1| (measure-then-flip).
|
||||
//
|
||||
// Simpler approach: zero out |1> subspace, renormalize.
|
||||
let mut norm_sq = 0.0_f64;
|
||||
|
||||
for k in 0..n {
|
||||
if (k & mask) != 0 {
|
||||
// Qubit q is |1> in this basis state
|
||||
// Transfer amplitude to partner state with q = |0>
|
||||
let partner = k & partner_mask;
|
||||
// Coherent reset: add amplitudes
|
||||
// For incoherent reset (thermal): would zero out instead
|
||||
self.amplitudes[partner] += self.amplitudes[k];
|
||||
self.amplitudes[k] = Complex64::zero();
|
||||
}
|
||||
}
|
||||
|
||||
// Renormalize
|
||||
for k in 0..n {
|
||||
norm_sq += self.amplitudes[k].norm_sqr();
|
||||
}
|
||||
let norm_factor = 1.0 / norm_sq.sqrt();
|
||||
for amp in self.amplitudes.iter_mut() {
|
||||
*amp *= norm_factor;
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 3. Noise Model
|
||||
|
||||
We implement three standard noise channels plus a combined depolarizing model.
|
||||
Noise is applied by stochastically inserting Pauli gates after specified operations.
|
||||
|
||||
```
|
||||
Noise Channels:
|
||||
|
||||
Bit-flip (X): rho -> (1-p) * rho + p * X * rho * X
|
||||
Phase-flip (Z): rho -> (1-p) * rho + p * Z * rho * Z
|
||||
Depolarizing: rho -> (1-p) * rho + p/3 * (X*rho*X + Y*rho*Y + Z*rho*Z)
|
||||
```
|
||||
|
||||
For state vector simulation, noise is applied via **stochastic Pauli insertion**:
|
||||
|
||||
```rust
|
||||
/// Noise model configuration.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct NoiseModel {
|
||||
/// Single-qubit gate error rate
|
||||
pub single_qubit_error: f64,
|
||||
/// Two-qubit gate error rate
|
||||
pub two_qubit_error: f64,
|
||||
/// Measurement error rate (readout bit-flip)
|
||||
pub measurement_error: f64,
|
||||
/// Idle error rate (per qubit per cycle)
|
||||
pub idle_error: f64,
|
||||
/// Noise type
|
||||
pub noise_type: NoiseType,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy)]
|
||||
pub enum NoiseType {
|
||||
/// Random X errors with probability p
|
||||
BitFlip,
|
||||
/// Random Z errors with probability p
|
||||
PhaseFlip,
|
||||
/// Random X, Y, or Z errors each with probability p/3
|
||||
Depolarizing,
|
||||
/// Independent bit-flip (p_x) and phase-flip (p_z)
|
||||
Independent { p_x: f64, p_z: f64 },
|
||||
}
|
||||
|
||||
impl QuantumState {
|
||||
/// Apply a noise channel to a single qubit.
|
||||
///
|
||||
/// For depolarizing noise with probability p:
|
||||
/// - With probability 1-p: do nothing
|
||||
/// - With probability p/3: apply X
|
||||
/// - With probability p/3: apply Y
|
||||
/// - With probability p/3: apply Z
|
||||
///
|
||||
/// This stochastic Pauli insertion is exact for Pauli channels
|
||||
/// and a good approximation for general noise (Pauli twirl).
|
||||
pub fn apply_noise(
|
||||
&mut self,
|
||||
qubit: usize,
|
||||
error_rate: f64,
|
||||
noise_type: NoiseType,
|
||||
rng: &mut impl Rng,
|
||||
) {
|
||||
match noise_type {
|
||||
NoiseType::BitFlip => {
|
||||
if rng.gen::<f64>() < error_rate {
|
||||
self.apply_x(qubit);
|
||||
}
|
||||
}
|
||||
NoiseType::PhaseFlip => {
|
||||
if rng.gen::<f64>() < error_rate {
|
||||
self.apply_z(qubit);
|
||||
}
|
||||
}
|
||||
NoiseType::Depolarizing => {
|
||||
let r = rng.gen::<f64>();
|
||||
if r < error_rate / 3.0 {
|
||||
self.apply_x(qubit);
|
||||
} else if r < 2.0 * error_rate / 3.0 {
|
||||
self.apply_y(qubit);
|
||||
} else if r < error_rate {
|
||||
self.apply_z(qubit);
|
||||
}
|
||||
// else: no error (identity)
|
||||
}
|
||||
NoiseType::Independent { p_x, p_z } => {
|
||||
if rng.gen::<f64>() < p_x {
|
||||
self.apply_x(qubit);
|
||||
}
|
||||
if rng.gen::<f64>() < p_z {
|
||||
self.apply_z(qubit);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Apply idle noise to all data qubits.
|
||||
///
|
||||
/// Called once per QEC cycle to model decoherence during idle periods.
|
||||
pub fn apply_idle_noise(
|
||||
&mut self,
|
||||
data_qubits: &[usize],
|
||||
noise: &NoiseModel,
|
||||
rng: &mut impl Rng,
|
||||
) {
|
||||
for &q in data_qubits {
|
||||
self.apply_noise(q, noise.idle_error, noise.noise_type, rng);
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4. Syndrome Extraction Circuit
|
||||
|
||||
A complete surface code syndrome extraction cycle consists of:
|
||||
|
||||
1. Reset ancilla qubits to |0>
|
||||
2. Apply CNOT chains from data qubits to ancilla (stabilizer circuits)
|
||||
3. Measure ancilla qubits to extract syndrome bits
|
||||
4. (Optionally) apply noise after each gate
|
||||
|
||||
```
|
||||
Syndrome Extraction for X-Stabilizer X0 = X_D0 * X_D1 * X_D3 * X_D4:
|
||||
|
||||
D0: ────────●───────────────────────────
|
||||
│
|
||||
D1: ────────┼──────●────────────────────
|
||||
│ │
|
||||
D3: ────────┼──────┼──────●─────────────
|
||||
│ │ │
|
||||
D4: ────────┼──────┼──────┼──────●──────
|
||||
│ │ │ │
|
||||
X0: ──|0>──[H]──CNOT──CNOT──CNOT──CNOT──[H]──[M]── syndrome bit
|
||||
|
||||
(For X-stabilizers: Hadamard on ancilla before and after CNOTs)
|
||||
(For Z-stabilizers: CNOTs in opposite direction, no Hadamards)
|
||||
```
|
||||
|
||||
```rust
|
||||
/// Surface code layout definition.
|
||||
pub struct SurfaceCodeLayout {
|
||||
/// Code distance
|
||||
pub distance: usize,
|
||||
/// Data qubit indices
|
||||
pub data_qubits: Vec<usize>,
|
||||
/// X-stabilizer definitions: (ancilla_qubit, [data_qubits])
|
||||
pub x_stabilizers: Vec<(usize, Vec<usize>)>,
|
||||
/// Z-stabilizer definitions: (ancilla_qubit, [data_qubits])
|
||||
pub z_stabilizers: Vec<(usize, Vec<usize>)>,
|
||||
/// Total qubit count (data + ancilla)
|
||||
pub total_qubits: usize,
|
||||
}
|
||||
|
||||
impl SurfaceCodeLayout {
|
||||
/// Generate a distance-d rotated surface code layout.
|
||||
pub fn rotated(distance: usize) -> Self {
|
||||
let n_data = distance * distance;
|
||||
let n_x_stab = (distance * distance - 1) / 2;
|
||||
let n_z_stab = (distance * distance - 1) / 2;
|
||||
let total = n_data + n_x_stab + n_z_stab;
|
||||
|
||||
// Assign qubit indices:
|
||||
// 0..n_data: data qubits
|
||||
// n_data..n_data+n_x_stab: X-stabilizer ancillae
|
||||
// n_data+n_x_stab..total: Z-stabilizer ancillae
|
||||
|
||||
let data_qubits: Vec<usize> = (0..n_data).collect();
|
||||
|
||||
// Build stabilizer mappings based on rotated surface code geometry
|
||||
let (x_stabilizers, z_stabilizers) =
|
||||
build_rotated_stabilizers(distance, n_data);
|
||||
|
||||
Self {
|
||||
distance,
|
||||
data_qubits,
|
||||
x_stabilizers,
|
||||
z_stabilizers,
|
||||
total_qubits: total,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// One complete syndrome extraction cycle.
|
||||
///
|
||||
/// Returns the syndrome bitstring (one bit per stabilizer).
|
||||
pub fn extract_syndrome(
|
||||
state: &mut QuantumState,
|
||||
layout: &SurfaceCodeLayout,
|
||||
noise: &Option<NoiseModel>,
|
||||
rng: &mut impl Rng,
|
||||
) -> SyndromeBits {
|
||||
let mut syndrome = SyndromeBits::new(
|
||||
layout.x_stabilizers.len() + layout.z_stabilizers.len()
|
||||
);
|
||||
|
||||
// Step 1: Reset all ancilla qubits
|
||||
for &(ancilla, _) in layout.x_stabilizers.iter()
|
||||
.chain(layout.z_stabilizers.iter())
|
||||
{
|
||||
state.reset_qubit(ancilla);
|
||||
}
|
||||
|
||||
// Step 2: X-stabilizer circuits
|
||||
for (stab_idx, &(ancilla, ref data)) in layout.x_stabilizers.iter().enumerate() {
|
||||
// Hadamard on ancilla (transforms Z-basis CNOT to X-basis measurement)
|
||||
state.apply_hadamard(ancilla);
|
||||
if let Some(ref n) = noise {
|
||||
state.apply_noise(ancilla, n.single_qubit_error, n.noise_type, rng);
|
||||
}
|
||||
|
||||
// CNOT from each data qubit to ancilla
|
||||
for &d in data {
|
||||
state.apply_cnot(d, ancilla);
|
||||
if let Some(ref n) = noise {
|
||||
state.apply_noise(d, n.two_qubit_error, n.noise_type, rng);
|
||||
state.apply_noise(ancilla, n.two_qubit_error, n.noise_type, rng);
|
||||
}
|
||||
}
|
||||
|
||||
// Hadamard on ancilla
|
||||
state.apply_hadamard(ancilla);
|
||||
if let Some(ref n) = noise {
|
||||
state.apply_noise(ancilla, n.single_qubit_error, n.noise_type, rng);
|
||||
}
|
||||
|
||||
// Measure ancilla
|
||||
let result = state.measure_qubit(ancilla, rng);
|
||||
|
||||
// Apply measurement error
|
||||
let mut outcome = result.outcome;
|
||||
if let Some(ref n) = noise {
|
||||
if rng.gen::<f64>() < n.measurement_error {
|
||||
outcome ^= 1; // Flip the classical bit
|
||||
}
|
||||
}
|
||||
|
||||
syndrome.set(stab_idx, outcome);
|
||||
}
|
||||
|
||||
// Step 3: Z-stabilizer circuits
|
||||
let offset = layout.x_stabilizers.len();
|
||||
for (stab_idx, &(ancilla, ref data)) in layout.z_stabilizers.iter().enumerate() {
|
||||
// No Hadamard for Z-stabilizers
|
||||
|
||||
// CNOT from ancilla to each data qubit
|
||||
for &d in data {
|
||||
state.apply_cnot(ancilla, d);
|
||||
if let Some(ref n) = noise {
|
||||
state.apply_noise(d, n.two_qubit_error, n.noise_type, rng);
|
||||
state.apply_noise(ancilla, n.two_qubit_error, n.noise_type, rng);
|
||||
}
|
||||
}
|
||||
|
||||
// Measure ancilla
|
||||
let result = state.measure_qubit(ancilla, rng);
|
||||
|
||||
let mut outcome = result.outcome;
|
||||
if let Some(ref n) = noise {
|
||||
if rng.gen::<f64>() < n.measurement_error {
|
||||
outcome ^= 1;
|
||||
}
|
||||
}
|
||||
|
||||
syndrome.set(offset + stab_idx, outcome);
|
||||
}
|
||||
|
||||
// Step 4: Apply idle noise to data qubits
|
||||
if let Some(ref n) = noise {
|
||||
state.apply_idle_noise(&layout.data_qubits, n, rng);
|
||||
}
|
||||
|
||||
syndrome
|
||||
}
|
||||
```
|
||||
|
||||
### 5. Decoder Integration
|
||||
|
||||
The syndrome bits feed into ruQu's existing decoder infrastructure:
|
||||
|
||||
```
|
||||
Decoder Pipeline:
|
||||
|
||||
Syndrome Bits ──> SyndromeFilter ──> MWPM Decoder ──> Correction ──> Apply to State
|
||||
│ │
|
||||
│ ┌─────▼─────┐
|
||||
│ │ ruvector- │
|
||||
│ │ mincut │
|
||||
└──────────────────────────────│ coherence │
|
||||
│ validation │
|
||||
└────────────┘
|
||||
```
|
||||
|
||||
```rust
|
||||
/// Decode syndrome and apply corrections.
|
||||
///
|
||||
/// This function bridges the quantum simulation (state vector) with
|
||||
/// ruQu's classical decoder infrastructure.
|
||||
pub fn decode_and_correct(
|
||||
state: &mut QuantumState,
|
||||
syndrome: &SyndromeBits,
|
||||
layout: &SurfaceCodeLayout,
|
||||
decoder: &mut MWPMDecoder,
|
||||
) -> DecoderResult {
|
||||
// Convert syndrome bits to DetectorBitmap (ruQu format)
|
||||
let mut bitmap = DetectorBitmap::new(syndrome.len());
|
||||
for i in 0..syndrome.len() {
|
||||
bitmap.set(i, syndrome.get(i) == 1);
|
||||
}
|
||||
|
||||
// Decode using MWPM
|
||||
let correction = decoder.decode(&bitmap);
|
||||
|
||||
// Apply X corrections to data qubits
|
||||
for &qubit in &correction.x_corrections {
|
||||
state.apply_x(qubit);
|
||||
}
|
||||
|
||||
// Apply Z corrections to data qubits
|
||||
for &qubit in &correction.z_corrections {
|
||||
state.apply_z(qubit);
|
||||
}
|
||||
|
||||
DecoderResult {
|
||||
correction,
|
||||
syndrome: bitmap,
|
||||
applied: true,
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Integration with `ruvector-mincut` for coherence validation:
|
||||
|
||||
```rust
|
||||
/// Validate decoder correction using min-cut coherence analysis.
|
||||
///
|
||||
/// Uses ruQu's existing DynamicMinCutEngine to assess whether the
|
||||
/// post-correction state maintains structural coherence.
|
||||
pub fn validate_correction(
|
||||
syndrome: &SyndromeBits,
|
||||
correction: &Correction,
|
||||
mincut_engine: &mut DynamicMinCutEngine,
|
||||
) -> CoherenceAssessment {
|
||||
// Update min-cut graph edges based on syndrome pattern
|
||||
// High syndrome density in a region lowers edge weights (less coherent)
|
||||
// Correction success restores edge weights
|
||||
|
||||
let cut_value = mincut_engine.query_min_cut();
|
||||
|
||||
CoherenceAssessment {
|
||||
min_cut_value: cut_value.value,
|
||||
is_coherent: cut_value.value > COHERENCE_THRESHOLD,
|
||||
witness: cut_value.witness_hash,
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 6. Logical Error Tracking
|
||||
|
||||
To determine if a logical error has occurred, we compare the initial and final
|
||||
logical qubit states:
|
||||
|
||||
```rust
|
||||
/// Track logical errors across QEC cycles.
|
||||
///
|
||||
/// A logical error occurs when the cumulative effect of physical errors
|
||||
/// and decoder corrections results in a non-trivial logical operator
|
||||
/// being applied to the encoded qubit.
|
||||
pub struct LogicalErrorTracker {
|
||||
/// Accumulated X corrections on data qubits
|
||||
x_correction_parity: Vec<bool>,
|
||||
/// Accumulated Z corrections on data qubits
|
||||
z_correction_parity: Vec<bool>,
|
||||
/// Known physical X errors (for debugging/validation)
|
||||
x_error_parity: Vec<bool>,
|
||||
/// Known physical Z errors
|
||||
z_error_parity: Vec<bool>,
|
||||
/// Logical X operator support (which data qubits)
|
||||
logical_x_support: Vec<usize>,
|
||||
/// Logical Z operator support
|
||||
logical_z_support: Vec<usize>,
|
||||
}
|
||||
|
||||
impl LogicalErrorTracker {
|
||||
/// Check if a logical X error has occurred.
|
||||
///
|
||||
/// A logical X error occurs when the net X-type operator
|
||||
/// (errors + corrections) has odd overlap with the logical Z operator.
|
||||
pub fn has_logical_x_error(&self) -> bool {
|
||||
let mut parity = false;
|
||||
for &q in &self.logical_z_support {
|
||||
parity ^= self.x_error_parity[q] ^ self.x_correction_parity[q];
|
||||
}
|
||||
parity
|
||||
}
|
||||
|
||||
/// Check if a logical Z error has occurred.
|
||||
pub fn has_logical_z_error(&self) -> bool {
|
||||
let mut parity = false;
|
||||
for &q in &self.logical_x_support {
|
||||
parity ^= self.z_error_parity[q] ^ self.z_correction_parity[q];
|
||||
}
|
||||
parity
|
||||
}
|
||||
|
||||
/// Check if any logical error has occurred.
|
||||
pub fn has_logical_error(&self) -> bool {
|
||||
self.has_logical_x_error() || self.has_logical_z_error()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 7. Full Surface Code Simulation Cycle
|
||||
|
||||
Putting it all together, the complete simulation loop:
|
||||
|
||||
```
|
||||
Full Surface Code QEC Cycle
|
||||
============================
|
||||
|
||||
Input: Code distance d, noise model, number of cycles T, decoder
|
||||
|
||||
Output: Logical error rate estimate
|
||||
|
||||
layout = SurfaceCodeLayout::rotated(d)
|
||||
state = QuantumState::new(layout.total_qubits)
|
||||
tracker = LogicalErrorTracker::new(layout)
|
||||
decoder = MWPMDecoder::new(d)
|
||||
mincut = DynamicMinCutEngine::new()
|
||||
|
||||
// Prepare initial logical |0> state
|
||||
prepare_logical_zero(&mut state, &layout)
|
||||
|
||||
for cycle in 0..T:
|
||||
┌─────────────────────────────────────────────────────┐
|
||||
│ 1. INJECT NOISE │
|
||||
│ Apply depolarizing noise to all data qubits │
|
||||
│ (models decoherence during idle + gate errors) │
|
||||
│ tracker.record_errors(noise_locations) │
|
||||
└─────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────┐
|
||||
│ 2. EXTRACT SYNDROME │
|
||||
│ Reset ancillae -> stabilizer circuits -> measure │
|
||||
│ Returns syndrome bitstring for this cycle │
|
||||
└─────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────┐
|
||||
│ 3. DECODE │
|
||||
│ Feed syndrome to MWPM decoder │
|
||||
│ Decoder returns correction (X and Z Pauli ops) │
|
||||
└─────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────┐
|
||||
│ 4. APPLY CORRECTION │
|
||||
│ Apply Pauli corrections to data qubits │
|
||||
│ tracker.record_corrections(corrections) │
|
||||
└─────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────┐
|
||||
│ 5. VALIDATE COHERENCE (optional) │
|
||||
│ Run min-cut analysis on syndrome pattern │
|
||||
│ Flag if coherence drops below threshold │
|
||||
└─────────────────────────────────────────────────────┘
|
||||
|
||||
// After T cycles, check for logical error
|
||||
logical_error = tracker.has_logical_error()
|
||||
```
|
||||
|
||||
**Pseudocode for the full simulation**:
|
||||
|
||||
```rust
|
||||
/// Run a complete surface code QEC simulation.
|
||||
///
|
||||
/// Returns the logical error rate estimated from `trials` independent runs,
|
||||
/// each consisting of `cycles` QEC rounds.
|
||||
pub fn simulate_surface_code(config: &SurfaceCodeConfig) -> SimulationResult {
|
||||
let layout = SurfaceCodeLayout::rotated(config.distance);
|
||||
let mut logical_errors = 0_u64;
|
||||
let mut total_cycles = 0_u64;
|
||||
|
||||
for trial in 0..config.trials {
|
||||
let mut state = QuantumState::new(layout.total_qubits);
|
||||
let mut tracker = LogicalErrorTracker::new(&layout);
|
||||
let mut decoder = MWPMDecoder::new(DecoderConfig {
|
||||
distance: config.distance,
|
||||
physical_error_rate: config.noise.idle_error,
|
||||
..Default::default()
|
||||
});
|
||||
let mut rng = StdRng::seed_from_u64(config.seed + trial);
|
||||
|
||||
// Prepare logical |0>
|
||||
prepare_logical_zero(&mut state, &layout);
|
||||
|
||||
for cycle in 0..config.cycles {
|
||||
// 1. Inject noise
|
||||
inject_data_noise(&mut state, &layout, &config.noise, &mut rng);
|
||||
|
||||
// 2. Extract syndrome
|
||||
let syndrome = extract_syndrome(
|
||||
&mut state, &layout, &Some(config.noise.clone()), &mut rng
|
||||
);
|
||||
|
||||
// 3. Decode
|
||||
let correction = decoder.decode_syndrome(&syndrome);
|
||||
|
||||
// 4. Apply correction
|
||||
apply_correction(&mut state, &correction);
|
||||
tracker.record_correction(&correction);
|
||||
|
||||
total_cycles += 1;
|
||||
}
|
||||
|
||||
// Check for logical error
|
||||
if tracker.has_logical_error() {
|
||||
logical_errors += 1;
|
||||
}
|
||||
}
|
||||
|
||||
let logical_error_rate = logical_errors as f64 / config.trials as f64;
|
||||
let error_per_cycle = 1.0 - (1.0 - logical_error_rate)
|
||||
.powf(1.0 / config.cycles as f64);
|
||||
|
||||
SimulationResult {
|
||||
logical_error_rate,
|
||||
logical_error_per_cycle: error_per_cycle,
|
||||
total_trials: config.trials,
|
||||
total_cycles,
|
||||
logical_errors,
|
||||
distance: config.distance,
|
||||
physical_error_rate: config.noise.idle_error,
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 8. Performance Estimates
|
||||
|
||||
#### Distance-3 Surface Code
|
||||
|
||||
| Parameter | Value |
|
||||
|-----------|-------|
|
||||
| Data qubits | 9 |
|
||||
| Ancilla qubits | 8 |
|
||||
| Total qubits | 17 |
|
||||
| State vector entries | 2^17 = 131,072 |
|
||||
| State vector memory | 2 MB |
|
||||
| CNOTs per cycle | ~16 (4 per stabilizer, 4 stabilizers active) |
|
||||
| Measurements per cycle | 8 |
|
||||
| Resets per cycle | 8 |
|
||||
| **Time per cycle** | **~0.5ms** |
|
||||
| **1000 cycles** | **~0.5s** |
|
||||
|
||||
#### Distance-5 Surface Code
|
||||
|
||||
| Parameter | Value |
|
||||
|-----------|-------|
|
||||
| Data qubits | 25 |
|
||||
| Ancilla qubits | 24 |
|
||||
| Total qubits | 49 |
|
||||
| State vector entries | 2^49 ~ 5.6 * 10^14 |
|
||||
| State vector memory | **4 PB** (infeasible for full state vector) |
|
||||
|
||||
This highlights the fundamental scaling challenge: full state vector simulation of
|
||||
distance-5 surface codes requires stabilizer simulation or tensor network methods,
|
||||
not direct state vector evolution. However, for the critical distance-3 case, state
|
||||
vector simulation is fast and provides ground truth.
|
||||
|
||||
**Practical simulation envelope**:
|
||||
|
||||
| Distance | Qubits | State Vector | Feasible? | Cycles/sec |
|
||||
|----------|--------|-------------|-----------|------------|
|
||||
| 2 (toy) | 7 | 128 entries | Yes | ~50,000 |
|
||||
| 3 | 17 | 131K entries | Yes | ~2,000 |
|
||||
| 3 (with noise) | 17 | 131K entries | Yes | ~1,000 |
|
||||
| 4 | 31 | 2B entries | Marginal (16 GB) | ~0.1 |
|
||||
| 5+ | 49+ | >10^14 | No (state vector) | -- |
|
||||
|
||||
For distance 5 and above, the implementation should fall back to **stabilizer
|
||||
simulation** (Gottesman-Knill theorem: Clifford circuits on stabilizer states can be
|
||||
simulated in polynomial time). Since surface code circuits consist entirely of Clifford
|
||||
gates (H, CNOT, S) with Pauli noise, this is a natural fit.
|
||||
|
||||
### 9. Integration with Existing ruQu Pipeline
|
||||
|
||||
The surface code simulation integrates with the full ruQu stack:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ ruQu QEC Simulation Stack │
|
||||
│ │
|
||||
│ ┌─────────────┐ ┌──────────────┐ ┌───────────────────────────┐ │
|
||||
│ │ State │ │ Syndrome │ │ Decoder Pipeline │ │
|
||||
│ │ Vector │ │ Processing │ │ │ │
|
||||
│ │ Engine │──│ (syndrome.rs)│──│ SyndromeFilter │ │
|
||||
│ │ (new) │ │ │ │ ├── StructuralFilter │ │
|
||||
│ │ │ │ DetectorBitmap │ │ ├── ShiftFilter │ │
|
||||
│ │ measure() │ │ SyndromeBuffer │ │ ├── EvidenceFilter │ │
|
||||
│ │ reset() │ │ SyndromeDelta │ │ └── MWPM Decoder │ │
|
||||
│ │ noise() │ │ │ │ (decoder.rs) │ │
|
||||
│ └─────────────┘ └──────────────┘ └───────────────────────────┘ │
|
||||
│ │ │ │
|
||||
│ │ ┌─────────────────────────┘ │
|
||||
│ │ │ │
|
||||
│ ▼ ▼ │
|
||||
│ ┌──────────────────────────┐ ┌────────────────────────────────┐ │
|
||||
│ │ Correction Application │ │ Coherence Validation │ │
|
||||
│ │ │ │ │ │
|
||||
│ │ apply_x(qubit) │ │ DynamicMinCutEngine │ │
|
||||
│ │ apply_z(qubit) │ │ (mincut.rs) │ │
|
||||
│ │ │ │ │ │
|
||||
│ │ Logical Error Tracker │ │ El-Hayek/Henzinger/Li │ │
|
||||
│ └──────────────────────────┘ │ O(n^{o(1)}) min-cut │ │
|
||||
│ └────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌───────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Tile Architecture (fabric.rs, tile.rs) │ │
|
||||
│ │ │ │
|
||||
│ │ TileZero (coordinator) + 255 WorkerTiles │ │
|
||||
│ │ Can parallelize across stabilizer groups for large codes │ │
|
||||
│ └───────────────────────────────────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
Key integration points:
|
||||
|
||||
1. **Syndrome bits** from `measure_qubit()` are converted to `DetectorBitmap` format
|
||||
for compatibility with ruQu's existing syndrome processing pipeline
|
||||
2. **MWPM decoder** from `decoder.rs` (backed by fusion-blossom) receives syndromes
|
||||
and returns corrections
|
||||
3. **Min-cut coherence** from `mincut.rs` validates post-correction state quality
|
||||
4. **Tile architecture** from `fabric.rs` can distribute stabilizer measurements across
|
||||
tiles for parallel processing of large codes
|
||||
5. **Stim integration** from `stim.rs` provides reference syndrome distributions for
|
||||
decoder benchmarking
|
||||
|
||||
### 10. Error Rate Estimation
|
||||
|
||||
To estimate the error threshold, we run simulations at multiple physical error rates
|
||||
and code distances:
|
||||
|
||||
```rust
|
||||
/// Estimate the error threshold by scanning physical error rates.
|
||||
///
|
||||
/// The threshold is the physical error rate p* at which logical error rate
|
||||
/// is independent of code distance. Below p*, larger codes are better.
|
||||
/// Above p*, larger codes are worse.
|
||||
pub fn estimate_threshold(
|
||||
distances: &[usize],
|
||||
error_rates: &[f64],
|
||||
cycles_per_trial: usize,
|
||||
trials: usize,
|
||||
) -> ThresholdResult {
|
||||
let mut results = Vec::new();
|
||||
|
||||
for &d in distances {
|
||||
for &p in error_rates {
|
||||
let config = SurfaceCodeConfig {
|
||||
distance: d,
|
||||
noise: NoiseModel {
|
||||
idle_error: p,
|
||||
single_qubit_error: p / 10.0,
|
||||
two_qubit_error: p,
|
||||
measurement_error: p,
|
||||
noise_type: NoiseType::Depolarizing,
|
||||
},
|
||||
cycles: cycles_per_trial,
|
||||
trials: trials as u64,
|
||||
seed: 42,
|
||||
};
|
||||
|
||||
let sim_result = simulate_surface_code(&config);
|
||||
results.push((d, p, sim_result.logical_error_per_cycle));
|
||||
}
|
||||
}
|
||||
|
||||
// Find crossing point of d=3 and d=5 curves
|
||||
find_threshold_crossing(&results)
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Consequences
|
||||
|
||||
### Benefits
|
||||
|
||||
1. **Full quantum state simulation** provides ground truth for decoder validation that
|
||||
stabilizer simulation alone cannot (e.g., non-Clifford noise, leakage states)
|
||||
2. **Seamless integration** with ruQu's existing syndrome processing, MWPM decoder,
|
||||
and min-cut coherence infrastructure minimizes new code and leverages battle-tested
|
||||
components
|
||||
3. **Mid-circuit measurement** and qubit reset enable accurate simulation of the actual
|
||||
hardware QEC cycle, not just the error model
|
||||
4. **Noise model flexibility** (bit-flip, phase-flip, depolarizing, independent) covers
|
||||
the standard noise models used in QEC research
|
||||
5. **Logical error tracking** provides direct measurement of the quantity of interest
|
||||
(logical error rate) without post-hoc analysis
|
||||
6. **Integration with min-cut coherence** validates that decoder corrections maintain
|
||||
structural coherence, bridging ruQu's unique coherence-gating approach with standard
|
||||
QEC metrics
|
||||
|
||||
### Risks
|
||||
|
||||
| Risk | Probability | Impact | Mitigation |
|
||||
|------|------------|--------|------------|
|
||||
| State vector memory limits simulation to d <= 3 | High | High | Stabilizer simulation fallback for d >= 5 |
|
||||
| Mid-circuit measurement breaks SIMD optimization | Medium | Medium | Separate hot/cold paths, measurement is infrequent |
|
||||
| Noise model too simplistic for real hardware | Medium | Medium | Support custom noise channels, correlated errors |
|
||||
| Decoder latency dominates simulation time | Low | Medium | Use streaming decoder, pre-built matching graphs |
|
||||
| Logical error tracking complexity for higher distance | Low | Low | Automate logical operator computation from layout |
|
||||
|
||||
### Trade-offs
|
||||
|
||||
| Decision | Advantage | Disadvantage |
|
||||
|----------|-----------|--------------|
|
||||
| State vector over stabilizer simulation | Handles arbitrary noise and non-Clifford ops | Exponential memory, limited to d <= 3-4 |
|
||||
| Stochastic Pauli insertion for noise | Simple, exact for Pauli channels | Approximate for non-Pauli noise |
|
||||
| Sequential ancilla measurement | Correct correlated outcomes | Cannot parallelize measurement step |
|
||||
| Integration with existing ruQu decoder | Reuses battle-tested code | Decoder API may not perfectly match simulation needs |
|
||||
| Coherent reset (amplitude transfer) | Preserves entanglement structure | More complex than incoherent reset |
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- Fowler, A.G. et al. "Surface codes: Towards practical large-scale quantum computation." Physical Review A 86, 032324 (2012)
|
||||
- Dennis, E. et al. "Topological quantum memory." Journal of Mathematical Physics 43, 4452-4505 (2002)
|
||||
- Google Quantum AI. "Suppressing quantum errors by scaling a surface code logical qubit." Nature 614, 676-681 (2023)
|
||||
- Higgott, O. "PyMatching: A Python package for decoding quantum codes with minimum-weight perfect matching." ACM Transactions on Quantum Computing 3, 1-16 (2022)
|
||||
- Wu, Y. & Lin, H.H. "Hypergraph Decomposition and Secret Sharing." Discrete Applied Mathematics (2024)
|
||||
- ADR-001: ruQu Architecture - Classical Nervous System for Quantum Machines
|
||||
- ADR-QE-005: VQE Algorithm Support (quantum state manipulation, expectation values)
|
||||
- ADR-QE-006: Grover's Search (state vector operations, measurement)
|
||||
- ruQu syndrome module: `crates/ruQu/src/syndrome.rs` - DetectorBitmap, SyndromeBuffer
|
||||
- ruQu decoder module: `crates/ruQu/src/decoder.rs` - MWPMDecoder, fusion-blossom
|
||||
- ruQu mincut module: `crates/ruQu/src/mincut.rs` - DynamicMinCutEngine
|
||||
- ruQu filters module: `crates/ruQu/src/filters.rs` - Three-filter coherence pipeline
|
||||
- ruvector-mincut crate: `crates/ruvector-mincut/` - El-Hayek/Henzinger/Li algorithm
|
||||
480
docs/adr/quantum-engine/ADR-QE-009-tensor-network-evaluation.md
Normal file
480
docs/adr/quantum-engine/ADR-QE-009-tensor-network-evaluation.md
Normal file
|
|
@ -0,0 +1,480 @@
|
|||
# ADR-QE-009: Tensor Network Evaluation Mode
|
||||
|
||||
**Status**: Proposed
|
||||
**Date**: 2026-02-06
|
||||
**Authors**: ruv.io, RuVector Team
|
||||
**Deciders**: Architecture Review Board
|
||||
|
||||
---
|
||||
|
||||
## Context
|
||||
|
||||
Full state-vector simulation stores all 2^n complex amplitudes explicitly, yielding
|
||||
O(2^n) memory and O(G * 2^n) time for G gates. At n=30 this is 16 GiB; at n=40 it
|
||||
exceeds 16 TiB. Many practically interesting circuits, however, contain limited
|
||||
entanglement:
|
||||
|
||||
| Circuit family | Entanglement structure | Treewidth |
|
||||
|---|---|---|
|
||||
| Shallow QAOA on sparse graphs | Bounded by graph degree | Low (often < 20) |
|
||||
| Separate-register circuits | Disjoint qubit subsets | Sum of sub-widths |
|
||||
| Near-Clifford circuits | Stabilizer + few T gates | Depends on T count |
|
||||
| 1D brickwork (finite depth) | Area-law entanglement | O(depth) |
|
||||
| Random deep circuits (all-to-all) | Volume-law entanglement | O(n) -- no gain |
|
||||
|
||||
For the first four families, tensor network (TN) methods can trade increased
|
||||
computation for drastically reduced memory by representing each gate as a tensor and
|
||||
contracting the resulting network in an optimized order. The contraction cost scales
|
||||
exponentially in the *treewidth* of the circuit's line graph rather than in the total
|
||||
qubit count.
|
||||
|
||||
QuantRS2 (the Rust quantum simulation reference) demonstrated tensor network
|
||||
contraction for circuits up to 60 qubits on commodity hardware when treewidth
|
||||
remained below ~25. ruVector's existing `ruvector-mincut` crate already solves graph
|
||||
partitioning problems that are structurally identical to contraction-order
|
||||
optimization, providing a natural integration point.
|
||||
|
||||
The ruQu engine needs this capability to support:
|
||||
|
||||
1. Surface code simulations at distance d >= 7 (49+ data qubits) for decoder
|
||||
validation, where the syndrome extraction circuit is shallow and geometrically
|
||||
local.
|
||||
2. Variational algorithm prototyping (VQE, QAOA) on graphs larger than 30 nodes.
|
||||
3. Hybrid workflows where part of the circuit is simulated via state vector and part
|
||||
via tensor contraction.
|
||||
|
||||
## Decision
|
||||
|
||||
### 1. Feature-Gated Backend
|
||||
|
||||
Tensor network evaluation is implemented as an optional backend behind the
|
||||
`tensor-network` feature flag in `ruqu-core`:
|
||||
|
||||
```toml
|
||||
# ruqu-core/Cargo.toml
|
||||
[features]
|
||||
default = ["state-vector"]
|
||||
state-vector = []
|
||||
tensor-network = ["dep:ndarray", "dep:petgraph"]
|
||||
all-backends = ["state-vector", "tensor-network"]
|
||||
```
|
||||
|
||||
When both backends are compiled in, the engine selects the backend at runtime based
|
||||
on circuit analysis (see Section 4 below).
|
||||
|
||||
### 2. Tensor Representation
|
||||
|
||||
Every gate becomes a tensor connecting the qubit wire indices it acts on:
|
||||
|
||||
| Gate type | Tensor rank | Shape | Example |
|
||||
|---|---|---|---|
|
||||
| Single-qubit (H, X, Rz, ...) | 2 | [2, 2] | Input wire -> output wire |
|
||||
| Two-qubit (CNOT, CZ, ...) | 4 | [2, 2, 2, 2] | Two input wires -> two output wires |
|
||||
| Three-qubit (Toffoli) | 6 | [2, 2, 2, 2, 2, 2] | Three input -> three output |
|
||||
| Measurement projector | 2 | [2, 2] | Diagonal in computational basis |
|
||||
| Initial state |0> | 1 | [2] | Single output wire |
|
||||
|
||||
The circuit is converted into a tensor network graph where:
|
||||
- Each tensor is a node.
|
||||
- Each shared index (qubit wire between consecutive gates) is an edge.
|
||||
- Open indices represent initial states and final measurement outcomes.
|
||||
|
||||
```
|
||||
|0>---[H]---[CNOT_ctrl]---[Rz]---<meas>
|
||||
|
|
||||
|0>-----------[CNOT_tgt]---------<meas>
|
||||
```
|
||||
|
||||
Becomes:
|
||||
|
||||
```
|
||||
Node: init_0 (rank 1)
|
||||
|
|
||||
Node: H_0 (rank 2)
|
||||
|
|
||||
Node: CNOT_01 (rank 4)
|
||||
/ \
|
||||
| Node: Rz_0 (rank 2)
|
||||
| |
|
||||
| Node: meas_0 (rank 2)
|
||||
|
|
||||
Node: init_1 (rank 1)
|
||||
... (connected via CNOT shared index)
|
||||
Node: meas_1 (rank 2)
|
||||
```
|
||||
|
||||
### 3. Contraction Strategy
|
||||
|
||||
Contraction order determines whether the computation is tractable. The cost of
|
||||
contracting two tensors is the product of the dimensions of all indices involved.
|
||||
Finding the optimal contraction order is NP-hard (equivalent to finding minimum
|
||||
treewidth), so we use heuristics.
|
||||
|
||||
#### Contraction Path Optimization Pseudocode
|
||||
|
||||
```
|
||||
function find_contraction_path(tensor_network: TN) -> ContractionPath:
|
||||
// Phase 1: Simplify the network
|
||||
apply_trivial_contractions(tensor_network) // rank-1 tensors, diagonal pairs
|
||||
|
||||
// Phase 2: Detect community structure
|
||||
communities = detect_communities(tensor_network.graph)
|
||||
|
||||
// Phase 3: Contract within communities first (small subproblems)
|
||||
intra_paths = []
|
||||
for community in communities:
|
||||
subgraph = tensor_network.subgraph(community)
|
||||
if subgraph.num_tensors <= 20:
|
||||
// Exact dynamic programming for small subgraphs
|
||||
path = optimal_einsum_dp(subgraph)
|
||||
else:
|
||||
// Greedy with lookahead for larger subgraphs
|
||||
path = greedy_with_lookahead(subgraph, lookahead=2)
|
||||
intra_paths.append(path)
|
||||
|
||||
// Phase 4: Contract inter-community edges
|
||||
// Each community is now a single large tensor
|
||||
meta_graph = contract_communities(tensor_network, intra_paths)
|
||||
inter_path = greedy_with_lookahead(meta_graph, lookahead=3)
|
||||
|
||||
// Phase 5: Compose the full path
|
||||
return compose_paths(intra_paths, inter_path)
|
||||
|
||||
|
||||
function greedy_with_lookahead(tn: TN, lookahead: int) -> Path:
|
||||
path = []
|
||||
remaining = tn.clone()
|
||||
|
||||
while remaining.num_tensors > 1:
|
||||
best_cost = INFINITY
|
||||
best_pair = None
|
||||
|
||||
// Evaluate all candidate contractions
|
||||
for (i, j) in remaining.candidate_pairs():
|
||||
cost = contraction_cost(remaining, i, j)
|
||||
|
||||
// Lookahead: estimate cost of subsequent contractions
|
||||
if lookahead > 0:
|
||||
simulated = remaining.simulate_contraction(i, j)
|
||||
future_cost = estimate_future_cost(simulated, lookahead - 1)
|
||||
cost += future_cost * DISCOUNT_FACTOR
|
||||
|
||||
if cost < best_cost:
|
||||
best_cost = cost
|
||||
best_pair = (i, j)
|
||||
|
||||
path.append(best_pair)
|
||||
remaining.contract(best_pair)
|
||||
|
||||
return path
|
||||
```
|
||||
|
||||
#### Community Detection via ruvector-mincut
|
||||
|
||||
The `ruvector-mincut` crate provides graph partitioning that is directly applicable
|
||||
to contraction ordering:
|
||||
|
||||
```rust
|
||||
use ruvector_mincut::{partition, PartitionConfig};
|
||||
|
||||
fn partition_tensor_network(tn: &TensorNetwork) -> Vec<Vec<TensorId>> {
|
||||
let graph = tn.to_adjacency_graph();
|
||||
let config = PartitionConfig {
|
||||
num_partitions: estimate_optimal_partitions(tn),
|
||||
balance_factor: 1.1, // Allow 10% imbalance
|
||||
minimize: Objective::EdgeCut, // Minimize inter-partition wires
|
||||
};
|
||||
partition(&graph, &config)
|
||||
}
|
||||
```
|
||||
|
||||
The edge cut directly corresponds to the bond dimension of the inter-community
|
||||
contraction, so minimizing edge cut minimizes the most expensive contraction step.
|
||||
|
||||
### 4. MPS (Matrix Product State) Mode
|
||||
|
||||
For circuits with 1D-like connectivity (nearest-neighbor gates on a line), a Matrix
|
||||
Product State representation is more efficient than general tensor contraction.
|
||||
|
||||
```
|
||||
A[1] -- A[2] -- A[3] -- ... -- A[n]
|
||||
| | | |
|
||||
phys_1 phys_2 phys_3 phys_n
|
||||
```
|
||||
|
||||
Each site tensor A[i] has shape `[bond_left, physical, bond_right]` where:
|
||||
- `physical` = 2 (qubit dimension)
|
||||
- `bond_left`, `bond_right` = bond dimension chi
|
||||
|
||||
| Bond dimension (chi) | Memory per site | Total memory (n qubits) | Approximation |
|
||||
|---|---|---|---|
|
||||
| 1 | 16 bytes | 16n bytes | Product state only |
|
||||
| 16 | 4 KiB | 4n KiB | Low entanglement |
|
||||
| 64 | 64 KiB | 64n KiB | Moderate entanglement |
|
||||
| 256 | 1 MiB | n MiB | High entanglement |
|
||||
| 1024 | 16 MiB | 16n MiB | Near exact for many circuits |
|
||||
|
||||
**Truncation policy**: After each two-qubit gate, perform SVD on the updated bond.
|
||||
If the bond dimension exceeds `chi_max`, truncate the smallest singular values.
|
||||
Track the total discarded weight (sum of squared discarded singular values) as a
|
||||
fidelity estimate:
|
||||
|
||||
```rust
|
||||
pub struct MpsConfig {
|
||||
/// Maximum bond dimension. Truncation occurs above this.
|
||||
pub chi_max: usize,
|
||||
/// Minimum singular value to retain (relative to largest).
|
||||
pub svd_cutoff: f64,
|
||||
/// Accumulated truncation error (updated during simulation).
|
||||
pub fidelity_estimate: f64,
|
||||
}
|
||||
|
||||
impl Default for MpsConfig {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
chi_max: 256,
|
||||
svd_cutoff: 1e-12,
|
||||
fidelity_estimate: 1.0,
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 5. Automatic Mode Selection
|
||||
|
||||
The engine analyzes the circuit before execution to recommend a backend:
|
||||
|
||||
```rust
|
||||
pub enum RecommendedBackend {
|
||||
StateVector { reason: &'static str },
|
||||
TensorNetwork { estimated_treewidth: usize, reason: &'static str },
|
||||
Mps { estimated_max_bond: usize, reason: &'static str },
|
||||
}
|
||||
|
||||
pub fn recommend_backend(circuit: &QuantumCircuit) -> RecommendedBackend {
|
||||
let n = circuit.num_qubits();
|
||||
let depth = circuit.depth();
|
||||
let connectivity = circuit.connectivity_graph();
|
||||
|
||||
// Rule 1: Small circuits always use state vector
|
||||
if n <= 20 {
|
||||
return RecommendedBackend::StateVector {
|
||||
reason: "Small circuit; state vector is fastest below 20 qubits",
|
||||
};
|
||||
}
|
||||
|
||||
// Rule 2: Check for 1D connectivity (MPS candidate)
|
||||
if connectivity.max_degree() <= 2 && connectivity.is_path_graph() {
|
||||
let estimated_bond = 2_usize.pow(depth.min(20) as u32);
|
||||
return RecommendedBackend::Mps {
|
||||
estimated_max_bond: estimated_bond,
|
||||
reason: "1D nearest-neighbor connectivity detected",
|
||||
};
|
||||
}
|
||||
|
||||
// Rule 3: Estimate treewidth for general TN
|
||||
let estimated_tw = estimate_treewidth(&connectivity, depth);
|
||||
if estimated_tw < 25 && n > 25 {
|
||||
return RecommendedBackend::TensorNetwork {
|
||||
estimated_treewidth: estimated_tw,
|
||||
reason: "Low treewidth relative to qubit count",
|
||||
};
|
||||
}
|
||||
|
||||
// Rule 4: Check memory feasibility for state vector
|
||||
let sv_memory = 16 * (1_usize << n); // bytes
|
||||
let available = estimate_available_memory();
|
||||
if sv_memory > available {
|
||||
// Force TN even if treewidth is high -- at least it has a chance
|
||||
return RecommendedBackend::TensorNetwork {
|
||||
estimated_treewidth: estimated_tw,
|
||||
reason: "State vector exceeds available memory; TN is only option",
|
||||
};
|
||||
}
|
||||
|
||||
RecommendedBackend::StateVector {
|
||||
reason: "High treewidth circuit; state vector is more efficient",
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 6. When Tensor Networks Win vs Lose
|
||||
|
||||
**Tensor networks win when:**
|
||||
|
||||
| Scenario | Why TN wins | Example |
|
||||
|---|---|---|
|
||||
| Shallow circuits on many qubits | Treewidth ~ depth, not n | 50-qubit depth-4 QAOA |
|
||||
| Sparse graph connectivity | Low treewidth from graph structure | MaxCut on 3-regular graph |
|
||||
| Separate registers | Independent contractions | n/2 Bell pairs |
|
||||
| Near-Clifford | Stabilizer + few non-Clifford gates | Clifford + 5 T gates |
|
||||
| Amplitude computation | Contract to single output, not full state | Sampling one bitstring |
|
||||
|
||||
**Tensor networks lose when:**
|
||||
|
||||
| Scenario | Why TN loses | Fallback |
|
||||
|---|---|---|
|
||||
| Deep random circuits | Treewidth ~ n | State vector (if n <= 30) |
|
||||
| All-to-all connectivity | No structure to exploit | State vector |
|
||||
| Full state tomography needed | Must contract once per amplitude | State vector |
|
||||
| Very small circuits (n < 20) | Overhead exceeds state vector | State vector |
|
||||
| High-fidelity MPS needed | Bond dimension grows exponentially | State vector or exact TN |
|
||||
|
||||
### 7. Example: 50-Qubit Shallow QAOA
|
||||
|
||||
Consider QAOA depth p=1 on a 50-node 3-regular graph:
|
||||
|
||||
```
|
||||
Circuit structure:
|
||||
- 50 qubits, initialized to |+>
|
||||
- 75 ZZ gates (one per edge), parameterized by gamma
|
||||
- 50 Rx gates, parameterized by beta
|
||||
- Total: 125 + 50 = 175 gates
|
||||
- Circuit depth: 4 (H layer, ZZ layer (3-colorable), Rx layer, measure)
|
||||
|
||||
Graph treewidth of 3-regular graph: typically 8-15
|
||||
|
||||
Tensor network contraction:
|
||||
- Community detection finds ~5-8 communities of 6-10 nodes
|
||||
- Intra-community contraction: O(2^10) ~ 1024 per community
|
||||
- Inter-community bonds: ~15 edges cut
|
||||
- Effective contraction complexity: O(2^15) = 32768
|
||||
- Compare to state vector: O(2^50) = 1.1 * 10^15
|
||||
|
||||
Memory comparison:
|
||||
- State vector: 2^50 * 16 bytes = 16 PiB (impossible)
|
||||
- Tensor network: ~100 MiB working memory
|
||||
- Speedup factor: practically infinite (feasible vs infeasible)
|
||||
```
|
||||
|
||||
```
|
||||
Contraction Diagram (simplified):
|
||||
|
||||
Community A Community B Community C
|
||||
[q0-q9] [q10-q19] [q20-q29]
|
||||
| | |
|
||||
+--- bond=2^3 ----+---- bond=2^4 -----+
|
||||
|
|
||||
Community D Community E
|
||||
[q30-q39] [q40-q49]
|
||||
| |
|
||||
+--- bond=2^3 ----+
|
||||
|
||||
Peak intermediate tensor: 2^15 elements = 512 KiB
|
||||
```
|
||||
|
||||
### 8. Integration with State Vector Backend
|
||||
|
||||
Both backends implement the same trait:
|
||||
|
||||
```rust
|
||||
pub trait SimulationBackend {
|
||||
/// Execute the circuit and return measurement results.
|
||||
fn execute(
|
||||
&self,
|
||||
circuit: &QuantumCircuit,
|
||||
shots: usize,
|
||||
config: &SimulationConfig,
|
||||
) -> Result<SimulationResult, SimulationError>;
|
||||
|
||||
/// Compute expectation value of an observable.
|
||||
fn expectation_value(
|
||||
&self,
|
||||
circuit: &QuantumCircuit,
|
||||
observable: &Observable,
|
||||
config: &SimulationConfig,
|
||||
) -> Result<f64, SimulationError>;
|
||||
|
||||
/// Return the backend name for logging.
|
||||
fn name(&self) -> &'static str;
|
||||
}
|
||||
```
|
||||
|
||||
Users interact through `QuantumCircuit` and never need to know which backend is
|
||||
active:
|
||||
|
||||
```rust
|
||||
let circuit = QuantumCircuit::new(50)
|
||||
.h_all()
|
||||
.append_qaoa_layer(graph, gamma, beta)
|
||||
.measure_all();
|
||||
|
||||
// Automatic backend selection
|
||||
let result = ruqu::execute(&circuit, 1000)?;
|
||||
// -> Internally selects TensorNetwork backend due to n=50, low treewidth
|
||||
|
||||
// Or explicit backend override
|
||||
let result = ruqu::execute_with_backend(
|
||||
&circuit,
|
||||
1000,
|
||||
Backend::TensorNetwork(TnConfig::default()),
|
||||
)?;
|
||||
```
|
||||
|
||||
### 9. Future: ruvector-mincut Integration for Contraction Ordering
|
||||
|
||||
The `ruvector-mincut` crate currently solves balanced graph partitioning for vector
|
||||
index sharding. The same algorithm directly applies to tensor network contraction
|
||||
ordering via the following correspondence:
|
||||
|
||||
| Graph partitioning concept | TN contraction concept |
|
||||
|---|---|
|
||||
| Vertex | Tensor |
|
||||
| Edge weight | Bond dimension (log2) |
|
||||
| Partition | Contraction subtree |
|
||||
| Edge cut | Inter-partition bond cost |
|
||||
| Balanced partition | Balanced contraction tree |
|
||||
|
||||
Phase 1 (this ADR): Use `ruvector-mincut` for community detection in contraction
|
||||
path optimization.
|
||||
|
||||
Phase 2 (future): Extend `ruvector-mincut` with hypergraph partitioning for
|
||||
multi-index tensor contractions, enabling handling of higher-order tensor networks
|
||||
(e.g., PEPS for 2D circuits).
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
1. **Dramatically expanded qubit range**: Shallow circuits on 40-60 qubits become
|
||||
tractable on commodity hardware.
|
||||
2. **Surface code simulation**: Distance-7 surface codes (49 data + 48 ancilla = 97
|
||||
qubits) can be simulated for decoder validation using MPS (the circuit is
|
||||
geometrically local).
|
||||
3. **Unified interface**: Users write circuits once; backend selection is automatic.
|
||||
4. **Synergy with ruvector-mincut**: Leverages existing graph partitioning
|
||||
investment.
|
||||
5. **Complementary to state vector**: Each backend covers the other's weakness.
|
||||
|
||||
### Negative
|
||||
|
||||
1. **Implementation complexity**: Tensor contraction, SVD truncation, and path
|
||||
optimization are non-trivial to implement correctly and efficiently.
|
||||
2. **Approximation risk**: MPS truncation introduces controlled but nonzero error.
|
||||
Users must understand fidelity estimates.
|
||||
3. **Compilation time**: The `ndarray` and `petgraph` dependencies add to compile
|
||||
time when the feature is enabled.
|
||||
4. **Testing surface**: Two backends doubles the testing matrix for correctness
|
||||
validation.
|
||||
5. **Performance unpredictability**: Contraction cost depends on circuit structure
|
||||
in ways that are hard to predict without running the path optimizer.
|
||||
|
||||
### Risks and Mitigations
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|---|---|---|---|
|
||||
| Path optimizer finds poor ordering | Medium | High cost | Multiple heuristics + timeout fallback to greedy |
|
||||
| MPS fidelity silently degrades | Medium | Incorrect results | Track discarded weight; warn if fidelity < 0.99 |
|
||||
| Feature interaction bugs | Low | Incorrect results | Shared test suite: both backends must agree on small circuits |
|
||||
| Memory spike during contraction | Medium | OOM | Pre-estimate peak intermediate tensor size; abort if too large |
|
||||
|
||||
## References
|
||||
|
||||
- QuantRS2 tensor network implementation: internal reference
|
||||
- Markov & Shi, "Simulating Quantum Computation by Contracting Tensor Networks" (2008)
|
||||
- Gray & Kourtis, "Hyper-optimized tensor network contraction" (2021) -- cotengra
|
||||
- Schollwock, "The density-matrix renormalization group in the age of matrix product states" (2011)
|
||||
- ADR-QE-001: Core Engine Architecture (state vector backend)
|
||||
- ADR-QE-005: WASM Compilation Target
|
||||
- `ruvector-mincut` crate documentation
|
||||
- ADR-014: Coherence Engine (graph partitioning reuse)
|
||||
689
docs/adr/quantum-engine/ADR-QE-010-observability-monitoring.md
Normal file
689
docs/adr/quantum-engine/ADR-QE-010-observability-monitoring.md
Normal file
|
|
@ -0,0 +1,689 @@
|
|||
# ADR-QE-010: Observability & Monitoring Integration
|
||||
|
||||
**Status**: Proposed
|
||||
**Date**: 2026-02-06
|
||||
**Authors**: ruv.io, RuVector Team
|
||||
**Deciders**: Architecture Review Board
|
||||
|
||||
---
|
||||
|
||||
## Context
|
||||
|
||||
ruVector provides comprehensive observability through the `ruvector-metrics` crate,
|
||||
which aggregates telemetry from all subsystems into a unified monitoring dashboard.
|
||||
The quantum simulation engine is a new subsystem that must participate in this
|
||||
observability infrastructure.
|
||||
|
||||
Effective monitoring of quantum simulation is essential for:
|
||||
|
||||
1. **Performance tuning**: Identifying bottlenecks in gate application, memory
|
||||
allocation, and parallelization efficiency.
|
||||
2. **Resource management**: Tracking memory consumption to prevent OOM conditions
|
||||
and to inform auto-scaling decisions.
|
||||
3. **Debugging**: Tracing the execution of specific circuits to diagnose incorrect
|
||||
results or unexpected behavior.
|
||||
4. **Capacity planning**: Understanding workload patterns (qubit counts, circuit
|
||||
depths, simulation frequency) to plan infrastructure.
|
||||
5. **Compliance**: Auditable logs of simulation executions for regulated
|
||||
environments (cryptographic validation, safety-critical applications).
|
||||
|
||||
### WASM Constraint
|
||||
|
||||
In WebAssembly deployment, there is no direct filesystem access and no native
|
||||
networking. Observability in WASM must use browser-compatible mechanisms:
|
||||
`console.log`, `console.warn`, `console.error`, or JavaScript callback functions
|
||||
registered by the host application.
|
||||
|
||||
### Existing Infrastructure
|
||||
|
||||
| Component | Role | Integration Point |
|
||||
|---|---|---|
|
||||
| `ruvector-metrics` | Metrics aggregation and export | Trait-based sink |
|
||||
| `ruvector-monitor` | Real-time dashboard UI | WebSocket feed |
|
||||
| Rust `tracing` crate | Structured logging and spans | Subscriber-based |
|
||||
| Prometheus / OpenTelemetry | External monitoring | Exporter plugins |
|
||||
| Ed25519 audit trail | Cryptographic logging | `ruqu-audit` crate |
|
||||
|
||||
## Decision
|
||||
|
||||
### 1. Metrics Schema
|
||||
|
||||
Every simulation execution emits a structured metrics record. The schema is
|
||||
versioned to allow evolution without breaking consumers.
|
||||
|
||||
```rust
|
||||
/// Metrics emitted after each quantum simulation execution.
|
||||
/// Schema version: 1.0.0
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct SimulationMetrics {
|
||||
/// Schema version for forward compatibility.
|
||||
pub schema_version: &'static str,
|
||||
|
||||
/// Unique identifier for this simulation run.
|
||||
pub simulation_id: Uuid,
|
||||
|
||||
/// Timestamp when simulation started (UTC).
|
||||
pub started_at: DateTime<Utc>,
|
||||
|
||||
/// Timestamp when simulation completed (UTC).
|
||||
pub completed_at: DateTime<Utc>,
|
||||
|
||||
// -- Circuit characteristics --
|
||||
|
||||
/// Number of qubits in the circuit.
|
||||
pub qubit_count: u32,
|
||||
|
||||
/// Total number of gates (before optimization).
|
||||
pub gate_count_raw: u64,
|
||||
|
||||
/// Total number of gates (after optimization/fusion).
|
||||
pub gate_count_optimized: u64,
|
||||
|
||||
/// Circuit depth (longest path from input to output).
|
||||
pub circuit_depth: u32,
|
||||
|
||||
/// Number of two-qubit gates (entangling operations).
|
||||
pub two_qubit_gate_count: u64,
|
||||
|
||||
// -- Execution metrics --
|
||||
|
||||
/// Total wall-clock execution time in milliseconds.
|
||||
pub execution_time_ms: f64,
|
||||
|
||||
/// Time spent in gate application (excluding allocation, measurement).
|
||||
pub gate_application_time_ms: f64,
|
||||
|
||||
/// Time spent in measurement sampling.
|
||||
pub measurement_time_ms: f64,
|
||||
|
||||
/// Peak memory consumption in bytes during simulation.
|
||||
pub peak_memory_bytes: u64,
|
||||
|
||||
/// Memory allocated for the state vector / tensor network.
|
||||
pub state_memory_bytes: u64,
|
||||
|
||||
/// Backend used for this simulation.
|
||||
pub backend: BackendType,
|
||||
|
||||
// -- Throughput --
|
||||
|
||||
/// Gates applied per second (optimized gate count / gate application time).
|
||||
pub gates_per_second: f64,
|
||||
|
||||
/// Qubits * depth per second (a normalized throughput metric).
|
||||
pub quantum_volume_rate: f64,
|
||||
|
||||
// -- Optimization statistics --
|
||||
|
||||
/// Number of gates eliminated by fusion.
|
||||
pub gates_fused: u64,
|
||||
|
||||
/// Number of gates eliminated as identity or redundant.
|
||||
pub gates_skipped: u64,
|
||||
|
||||
/// Number of gate commutations applied.
|
||||
pub gates_commuted: u64,
|
||||
|
||||
// -- Entanglement analysis --
|
||||
|
||||
/// Number of independent qubit subsets (entanglement groups).
|
||||
pub entanglement_groups: u32,
|
||||
|
||||
/// Sizes of each entanglement group.
|
||||
pub entanglement_group_sizes: Vec<u32>,
|
||||
|
||||
// -- Measurement outcomes (if measured) --
|
||||
|
||||
/// Number of measurement shots executed.
|
||||
pub measurement_shots: Option<u64>,
|
||||
|
||||
/// Distribution entropy of measurement outcomes (bits).
|
||||
pub outcome_entropy: Option<f64>,
|
||||
|
||||
// -- MPS-specific (tensor network backend) --
|
||||
|
||||
/// Maximum bond dimension reached (MPS mode only).
|
||||
pub max_bond_dimension: Option<u32>,
|
||||
|
||||
/// Estimated fidelity after MPS truncation.
|
||||
pub mps_fidelity_estimate: Option<f64>,
|
||||
|
||||
// -- Error information --
|
||||
|
||||
/// Whether the simulation completed successfully.
|
||||
pub success: bool,
|
||||
|
||||
/// Error message if simulation failed.
|
||||
pub error: Option<String>,
|
||||
|
||||
/// Error category for programmatic handling.
|
||||
pub error_kind: Option<SimulationErrorKind>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub enum BackendType {
|
||||
StateVector,
|
||||
TensorNetwork,
|
||||
Mps,
|
||||
Hybrid,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub enum SimulationErrorKind {
|
||||
QubitLimitExceeded,
|
||||
MemoryAllocationFailed,
|
||||
InvalidGateTarget,
|
||||
InvalidParameter,
|
||||
ContractionFailed,
|
||||
MpsFidelityBelowThreshold,
|
||||
Timeout,
|
||||
InternalError,
|
||||
}
|
||||
```
|
||||
|
||||
### 2. Metrics Sink Trait
|
||||
|
||||
The engine publishes metrics through a trait abstraction, allowing different sinks
|
||||
for native and WASM environments:
|
||||
|
||||
```rust
|
||||
/// Trait for consuming simulation metrics.
|
||||
/// Implementations exist for native (ruvector-metrics), WASM (JS callback),
|
||||
/// and testing (in-memory collector).
|
||||
pub trait MetricsSink: Send + Sync {
|
||||
/// Publish a completed simulation's metrics.
|
||||
fn publish(&self, metrics: &SimulationMetrics);
|
||||
|
||||
/// Publish an incremental progress update (for long-running simulations).
|
||||
fn progress(&self, simulation_id: Uuid, percent_complete: f32, message: &str);
|
||||
|
||||
/// Publish a health status update.
|
||||
fn health(&self, status: EngineHealthStatus);
|
||||
}
|
||||
|
||||
/// Native implementation: forwards to ruvector-metrics.
|
||||
pub struct NativeMetricsSink {
|
||||
registry: Arc<ruvector_metrics::Registry>,
|
||||
}
|
||||
|
||||
impl MetricsSink for NativeMetricsSink {
|
||||
fn publish(&self, metrics: &SimulationMetrics) {
|
||||
// Emit as histogram/counter/gauge values
|
||||
self.registry.histogram("ruqu.execution_time_ms")
|
||||
.record(metrics.execution_time_ms);
|
||||
self.registry.gauge("ruqu.peak_memory_bytes")
|
||||
.set(metrics.peak_memory_bytes as f64);
|
||||
self.registry.counter("ruqu.simulations_total")
|
||||
.increment(1);
|
||||
self.registry.counter("ruqu.gates_applied_total")
|
||||
.increment(metrics.gate_count_optimized);
|
||||
self.registry.histogram("ruqu.gates_per_second")
|
||||
.record(metrics.gates_per_second);
|
||||
|
||||
if !metrics.success {
|
||||
self.registry.counter("ruqu.errors_total")
|
||||
.increment(1);
|
||||
}
|
||||
}
|
||||
|
||||
fn progress(&self, _id: Uuid, percent: f32, _msg: &str) {
|
||||
self.registry.gauge("ruqu.current_progress")
|
||||
.set(percent as f64);
|
||||
}
|
||||
|
||||
fn health(&self, status: EngineHealthStatus) {
|
||||
self.registry.gauge("ruqu.health_status")
|
||||
.set(status.as_numeric());
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 3. WASM Metrics Sink
|
||||
|
||||
In WASM, metrics are delivered via JavaScript callbacks:
|
||||
|
||||
```rust
|
||||
#[cfg(target_arch = "wasm32")]
|
||||
pub struct WasmMetricsSink {
|
||||
/// JS callback function registered by host application.
|
||||
callback: js_sys::Function,
|
||||
}
|
||||
|
||||
#[cfg(target_arch = "wasm32")]
|
||||
impl MetricsSink for WasmMetricsSink {
|
||||
fn publish(&self, metrics: &SimulationMetrics) {
|
||||
let json = serde_json::to_string(metrics)
|
||||
.unwrap_or_else(|_| "{}".to_string());
|
||||
let js_value = JsValue::from_str(&json);
|
||||
let event_type = JsValue::from_str("simulation_complete");
|
||||
let _ = self.callback.call2(&JsValue::NULL, &event_type, &js_value);
|
||||
}
|
||||
|
||||
fn progress(&self, id: Uuid, percent: f32, message: &str) {
|
||||
let payload = format!(
|
||||
r#"{{"simulation_id":"{}","percent":{},"message":"{}"}}"#,
|
||||
id, percent, message
|
||||
);
|
||||
let js_value = JsValue::from_str(&payload);
|
||||
let event_type = JsValue::from_str("simulation_progress");
|
||||
let _ = self.callback.call2(&JsValue::NULL, &event_type, &js_value);
|
||||
}
|
||||
|
||||
fn health(&self, status: EngineHealthStatus) {
|
||||
let payload = format!(r#"{{"status":"{}"}}"#, status.as_str());
|
||||
let js_value = JsValue::from_str(&payload);
|
||||
let event_type = JsValue::from_str("engine_health");
|
||||
let _ = self.callback.call2(&JsValue::NULL, &event_type, &js_value);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
JavaScript host registration:
|
||||
|
||||
```javascript
|
||||
// Host application registers the metrics callback
|
||||
import init, { set_metrics_callback } from 'ruqu-wasm';
|
||||
|
||||
await init();
|
||||
|
||||
set_metrics_callback((eventType, data) => {
|
||||
const metrics = JSON.parse(data);
|
||||
switch (eventType) {
|
||||
case 'simulation_complete':
|
||||
console.log(`Simulation ${metrics.simulation_id} completed in ${metrics.execution_time_ms}ms`);
|
||||
dashboard.updateMetrics(metrics);
|
||||
break;
|
||||
case 'simulation_progress':
|
||||
progressBar.update(metrics.percent);
|
||||
break;
|
||||
case 'engine_health':
|
||||
healthIndicator.set(metrics.status);
|
||||
break;
|
||||
}
|
||||
});
|
||||
```
|
||||
|
||||
### 4. Tracing Integration
|
||||
|
||||
The engine integrates with the Rust `tracing` crate for structured logging and
|
||||
distributed tracing.
|
||||
|
||||
#### Span Hierarchy
|
||||
|
||||
```
|
||||
ruqu::simulation (root span for entire simulation)
|
||||
|
|
||||
+-- ruqu::circuit_validation (validate circuit structure)
|
||||
|
|
||||
+-- ruqu::backend_selection (automatic backend choice)
|
||||
|
|
||||
+-- ruqu::optimization (gate fusion, commutation, etc.)
|
||||
| |
|
||||
| +-- ruqu::optimization::fusion (individual fusion passes)
|
||||
| +-- ruqu::optimization::cancel (gate cancellation)
|
||||
|
|
||||
+-- ruqu::state_init (allocate and initialize state)
|
||||
|
|
||||
+-- ruqu::gate_application (apply all gates)
|
||||
| |
|
||||
| +-- ruqu::gate (individual gate -- DEBUG level only)
|
||||
|
|
||||
+-- ruqu::measurement (perform measurement sampling)
|
||||
|
|
||||
+-- ruqu::metrics_publish (emit metrics to sink)
|
||||
|
|
||||
+-- ruqu::state_cleanup (deallocate state vector)
|
||||
```
|
||||
|
||||
#### Instrumentation Code
|
||||
|
||||
```rust
|
||||
use tracing::{info, warn, debug, trace, instrument, Span};
|
||||
|
||||
#[instrument(
|
||||
name = "ruqu::simulation",
|
||||
skip(circuit, config, metrics_sink),
|
||||
fields(
|
||||
qubit_count = circuit.num_qubits(),
|
||||
gate_count = circuit.gate_count(),
|
||||
simulation_id = %Uuid::new_v4(),
|
||||
)
|
||||
)]
|
||||
pub fn execute(
|
||||
circuit: &QuantumCircuit,
|
||||
shots: usize,
|
||||
config: &SimulationConfig,
|
||||
metrics_sink: &dyn MetricsSink,
|
||||
) -> Result<SimulationResult, SimulationError> {
|
||||
info!(
|
||||
qubits = circuit.num_qubits(),
|
||||
gates = circuit.gate_count(),
|
||||
depth = circuit.depth(),
|
||||
shots = shots,
|
||||
"Starting quantum simulation"
|
||||
);
|
||||
|
||||
// Validate
|
||||
let _validation_span = tracing::info_span!("ruqu::circuit_validation").entered();
|
||||
validate_circuit(circuit)?;
|
||||
drop(_validation_span);
|
||||
|
||||
// Select backend
|
||||
let _backend_span = tracing::info_span!("ruqu::backend_selection").entered();
|
||||
let backend = select_backend(circuit, config);
|
||||
info!(backend = backend.name(), "Backend selected");
|
||||
drop(_backend_span);
|
||||
|
||||
// Optimize
|
||||
let _opt_span = tracing::info_span!("ruqu::optimization").entered();
|
||||
let optimized = optimize_circuit(circuit, config)?;
|
||||
info!(
|
||||
original_gates = circuit.gate_count(),
|
||||
optimized_gates = optimized.gate_count(),
|
||||
gates_fused = circuit.gate_count() - optimized.gate_count(),
|
||||
"Circuit optimization complete"
|
||||
);
|
||||
drop(_opt_span);
|
||||
|
||||
// Execute
|
||||
let result = backend.execute(&optimized, shots, config)?;
|
||||
|
||||
// At DEBUG level, log per-gate details
|
||||
debug!(
|
||||
execution_time_ms = result.execution_time_ms,
|
||||
peak_memory = result.peak_memory_bytes,
|
||||
"Simulation execution complete"
|
||||
);
|
||||
|
||||
// At TRACE level only for small circuits, log amplitude information
|
||||
if circuit.num_qubits() <= 10 {
|
||||
trace!(
|
||||
amplitudes = ?result.state_vector_snapshot(),
|
||||
"Final state vector (small circuit trace)"
|
||||
);
|
||||
}
|
||||
|
||||
Ok(result)
|
||||
}
|
||||
```
|
||||
|
||||
### 5. Structured Error Reporting
|
||||
|
||||
All errors carry structured context for programmatic handling:
|
||||
|
||||
```rust
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub enum SimulationError {
|
||||
#[error("Qubit limit exceeded: requested {requested}, maximum {maximum}")]
|
||||
QubitLimitExceeded {
|
||||
requested: u32,
|
||||
maximum: u32,
|
||||
estimated_memory_bytes: u64,
|
||||
available_memory_bytes: u64,
|
||||
},
|
||||
|
||||
#[error("Memory allocation failed for {requested_bytes} bytes")]
|
||||
MemoryAllocationFailed {
|
||||
requested_bytes: u64,
|
||||
qubit_count: u32,
|
||||
suggestion: &'static str,
|
||||
},
|
||||
|
||||
#[error("Invalid gate target: qubit {qubit} in {qubit_count}-qubit circuit")]
|
||||
InvalidGateTarget {
|
||||
gate_name: String,
|
||||
qubit: u32,
|
||||
qubit_count: u32,
|
||||
gate_index: usize,
|
||||
},
|
||||
|
||||
#[error("Invalid gate parameter: {parameter_name} = {value} ({reason})")]
|
||||
InvalidParameter {
|
||||
gate_name: String,
|
||||
parameter_name: String,
|
||||
value: f64,
|
||||
reason: &'static str,
|
||||
},
|
||||
|
||||
#[error("Tensor contraction failed: {reason}")]
|
||||
ContractionFailed {
|
||||
reason: String,
|
||||
estimated_treewidth: usize,
|
||||
suggestion: &'static str,
|
||||
},
|
||||
|
||||
#[error("MPS fidelity {fidelity:.6} below threshold {threshold:.6}")]
|
||||
MpsFidelityBelowThreshold {
|
||||
fidelity: f64,
|
||||
threshold: f64,
|
||||
max_bond_dimension: usize,
|
||||
suggestion: &'static str,
|
||||
},
|
||||
|
||||
#[error("Simulation timed out after {elapsed_ms}ms (limit: {timeout_ms}ms)")]
|
||||
Timeout {
|
||||
elapsed_ms: u64,
|
||||
timeout_ms: u64,
|
||||
gates_completed: u64,
|
||||
gates_remaining: u64,
|
||||
},
|
||||
|
||||
#[error("Internal error: {message}")]
|
||||
InternalError {
|
||||
message: String,
|
||||
source: Option<Box<dyn std::error::Error + Send + Sync>>,
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
Each error variant includes a `suggestion` field where applicable, guiding users
|
||||
toward resolution:
|
||||
|
||||
| Error | Suggestion |
|
||||
|---|---|
|
||||
| QubitLimitExceeded | "Reduce qubit count or enable tensor-network feature for large circuits" |
|
||||
| MemoryAllocationFailed | "Try tensor-network backend or reduce qubit count by 1-2 (halves/quarters memory)" |
|
||||
| ContractionFailed | "Circuit treewidth too high for tensor network; use state vector for <= 30 qubits" |
|
||||
| MpsFidelityBelowThreshold | "Increase chi_max or switch to exact state vector for high-fidelity results" |
|
||||
|
||||
### 6. Health Checks
|
||||
|
||||
The engine exposes health status for monitoring systems:
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct EngineHealthStatus {
|
||||
/// Whether the engine is ready to accept simulations.
|
||||
pub ready: bool,
|
||||
|
||||
/// Maximum qubits supportable given current available memory.
|
||||
pub max_supported_qubits: u32,
|
||||
|
||||
/// Available memory in bytes.
|
||||
pub available_memory_bytes: u64,
|
||||
|
||||
/// Number of CPU cores available for parallel gate application.
|
||||
pub available_cores: usize,
|
||||
|
||||
/// Whether the tensor-network backend is compiled in.
|
||||
pub tensor_network_available: bool,
|
||||
|
||||
/// Current engine version.
|
||||
pub version: &'static str,
|
||||
|
||||
/// Uptime since engine initialization (if applicable).
|
||||
pub uptime_seconds: Option<f64>,
|
||||
|
||||
/// Number of simulations executed in current session.
|
||||
pub simulations_executed: u64,
|
||||
|
||||
/// Total gates applied across all simulations in current session.
|
||||
pub total_gates_applied: u64,
|
||||
}
|
||||
|
||||
/// Check engine health. Callable at any time.
|
||||
pub fn quantum_engine_ready() -> EngineHealthStatus {
|
||||
let available_memory = estimate_available_memory();
|
||||
let max_qubits = compute_max_qubits(available_memory);
|
||||
|
||||
EngineHealthStatus {
|
||||
ready: max_qubits >= 4, // Minimum useful simulation
|
||||
max_supported_qubits: max_qubits,
|
||||
available_memory_bytes: available_memory,
|
||||
available_cores: rayon::current_num_threads(),
|
||||
tensor_network_available: cfg!(feature = "tensor-network"),
|
||||
version: env!("CARGO_PKG_VERSION"),
|
||||
uptime_seconds: None, // Library mode; no persistent uptime
|
||||
simulations_executed: SESSION_COUNTER.load(Ordering::Relaxed),
|
||||
total_gates_applied: SESSION_GATES.load(Ordering::Relaxed),
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 7. Logging Levels
|
||||
|
||||
| Level | Content | Audience | Performance Impact |
|
||||
|---|---|---|---|
|
||||
| ERROR | Simulation failures, OOM, invalid circuits | Operators, alerting | None |
|
||||
| WARN | Approaching memory limits (>80%), MPS fidelity degradation, slow contraction | Operators | Negligible |
|
||||
| INFO | Simulation start/end summaries, backend selection, optimization results | Developers, dashboards | Negligible |
|
||||
| DEBUG | Per-optimization-pass details, memory allocation sizes, thread utilization | Developers debugging | Low |
|
||||
| TRACE | Per-gate amplitude changes (small circuits only, n <= 10), SVD singular values | Deep debugging | High (small circuits only) |
|
||||
|
||||
TRACE level is gated on circuit size to prevent catastrophic log volume:
|
||||
|
||||
```rust
|
||||
// TRACE-level amplitude logging is only emitted for circuits with <= 10 qubits.
|
||||
// For larger circuits, TRACE only emits gate-level timing without amplitude data.
|
||||
if tracing::enabled!(tracing::Level::TRACE) {
|
||||
if circuit.num_qubits() <= 10 {
|
||||
trace!(amplitudes = ?state.as_slice(), "Post-gate state");
|
||||
} else {
|
||||
trace!(gate_time_ns = elapsed.as_nanos(), "Gate applied");
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 8. Dashboard Integration
|
||||
|
||||
Metrics from the quantum engine appear in the ruVector monitoring UI as a dedicated
|
||||
panel alongside vector operations, index health, and system resources.
|
||||
|
||||
```
|
||||
+------------------------------------------------------------------+
|
||||
| ruVector Monitoring Dashboard |
|
||||
+------------------------------------------------------------------+
|
||||
| |
|
||||
| Vector Operations | Quantum Simulations |
|
||||
| ------------------- | ----------------------- |
|
||||
| Queries/sec: 12,450 | Simulations/min: 23 |
|
||||
| P99 latency: 2.3ms | Avg execution: 145ms |
|
||||
| Index size: 2.1M vectors | Avg qubits: 18.4 |
|
||||
| | Peak memory: 4.2 GiB |
|
||||
| | Backend: SV 87% / TN 13% |
|
||||
| | Gates/sec: 2.1B |
|
||||
| | Error rate: 0.02% |
|
||||
| | |
|
||||
| System Resources | Recent Simulations |
|
||||
| ------------------- | ----------------------- |
|
||||
| CPU: 34% | #a3f2.. 24q 230ms OK |
|
||||
| Memory: 61% (49/80 GiB) | #b891.. 16q 12ms OK |
|
||||
| Threads: 64/256 active | #c4d0.. 30q 1.2s OK |
|
||||
| | #d122.. 35q ERR OOM |
|
||||
+------------------------------------------------------------------+
|
||||
```
|
||||
|
||||
Metrics are published via the existing `ruvector-metrics` WebSocket feed:
|
||||
|
||||
```json
|
||||
{
|
||||
"source": "ruqu",
|
||||
"type": "simulation_complete",
|
||||
"timestamp": "2026-02-06T14:23:01.442Z",
|
||||
"data": {
|
||||
"simulation_id": "a3f2e891-...",
|
||||
"qubit_count": 24,
|
||||
"execution_time_ms": 230.4,
|
||||
"peak_memory_bytes": 268435456,
|
||||
"backend": "StateVector",
|
||||
"gates_per_second": 2147483648,
|
||||
"success": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 9. Prometheus / OpenTelemetry Export
|
||||
|
||||
For external monitoring, the native metrics sink exports standard Prometheus
|
||||
metrics:
|
||||
|
||||
```
|
||||
# HELP ruqu_simulations_total Total quantum simulations executed
|
||||
# TYPE ruqu_simulations_total counter
|
||||
ruqu_simulations_total{backend="state_vector",status="success"} 1847
|
||||
ruqu_simulations_total{backend="state_vector",status="error"} 3
|
||||
ruqu_simulations_total{backend="tensor_network",status="success"} 241
|
||||
|
||||
# HELP ruqu_execution_time_ms Simulation execution time histogram
|
||||
# TYPE ruqu_execution_time_ms histogram
|
||||
ruqu_execution_time_ms_bucket{backend="state_vector",le="10"} 423
|
||||
ruqu_execution_time_ms_bucket{backend="state_vector",le="100"} 1201
|
||||
ruqu_execution_time_ms_bucket{backend="state_vector",le="1000"} 1834
|
||||
ruqu_execution_time_ms_bucket{backend="state_vector",le="+Inf"} 1847
|
||||
|
||||
# HELP ruqu_peak_memory_bytes Peak memory during simulation
|
||||
# TYPE ruqu_peak_memory_bytes gauge
|
||||
ruqu_peak_memory_bytes 4294967296
|
||||
|
||||
# HELP ruqu_gates_per_second Gate application throughput
|
||||
# TYPE ruqu_gates_per_second gauge
|
||||
ruqu_gates_per_second 2.1e9
|
||||
|
||||
# HELP ruqu_max_supported_qubits Maximum qubits based on available memory
|
||||
# TYPE ruqu_max_supported_qubits gauge
|
||||
ruqu_max_supported_qubits 33
|
||||
```
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
1. **Unified observability**: Quantum simulation telemetry integrates seamlessly
|
||||
with ruVector's existing monitoring infrastructure.
|
||||
2. **Cross-platform**: The trait-based sink design supports native, WASM, and
|
||||
testing environments without code changes in the engine.
|
||||
3. **Actionable errors**: Structured errors with suggestions reduce debugging time
|
||||
and improve developer experience.
|
||||
4. **Performance visibility**: Gates-per-second, memory consumption, and backend
|
||||
selection metrics enable informed performance tuning.
|
||||
5. **Compliance ready**: Structured logging with simulation IDs supports audit
|
||||
trail requirements.
|
||||
|
||||
### Negative
|
||||
|
||||
1. **Metric cardinality**: High-frequency simulations could generate significant
|
||||
metric volume. Mitigated by aggregation at the sink level.
|
||||
2. **WASM callback overhead**: JSON serialization for WASM metrics adds ~0.1ms per
|
||||
simulation. Acceptable for typical workloads.
|
||||
3. **Tracing overhead at DEBUG/TRACE**: Enabled tracing at low levels adds
|
||||
measurable overhead. Production deployments should use INFO or above.
|
||||
4. **Schema evolution**: Changes to `SimulationMetrics` require versioned handling
|
||||
in consumers.
|
||||
|
||||
### Risks and Mitigations
|
||||
|
||||
| Risk | Mitigation |
|
||||
|---|---|
|
||||
| Metric volume overwhelming storage | Configurable sampling rate; aggregate in sink |
|
||||
| WASM callback exceptions | Catch JS exceptions in callback wrapper; log to console |
|
||||
| Schema breaking changes | Version field in metrics; consumer-side version dispatch |
|
||||
| TRACE logging for large circuits | Qubit-count gate prevents amplitude logging above n=10 |
|
||||
|
||||
## References
|
||||
|
||||
- `ruvector-metrics` crate: internal metrics infrastructure
|
||||
- Rust `tracing` crate: https://docs.rs/tracing
|
||||
- OpenTelemetry Rust SDK: https://docs.rs/opentelemetry
|
||||
- ADR-QE-005: WASM Compilation Target (WASM constraints)
|
||||
- ADR-QE-011: Memory Gating & Power Management (resource monitoring)
|
||||
- Prometheus exposition format: https://prometheus.io/docs/instrumenting/exposition_formats/
|
||||
|
|
@ -0,0 +1,628 @@
|
|||
# ADR-QE-011: Memory Gating & Power Management
|
||||
|
||||
**Status**: Proposed
|
||||
**Date**: 2026-02-06
|
||||
**Authors**: ruv.io, RuVector Team
|
||||
**Deciders**: Architecture Review Board
|
||||
|
||||
---
|
||||
|
||||
## Context
|
||||
|
||||
ruVector is designed to operate within the Cognitum computing paradigm: a tile-based
|
||||
architecture with 256 low-power processor cores, event-driven activation, and
|
||||
aggressive power gating. Agents (software components) remain fully dormant until an
|
||||
event triggers their activation. Once their work completes, they release all
|
||||
resources and return to dormancy.
|
||||
|
||||
The quantum simulation engine must adhere to this model:
|
||||
|
||||
1. **Zero idle footprint**: When no simulation is running, the engine consumes zero
|
||||
CPU cycles and zero heap memory beyond its compiled code and static data.
|
||||
2. **Rapid activation**: The engine must be ready to execute a simulation within
|
||||
microseconds of receiving a request.
|
||||
3. **Prompt resource release**: Upon simulation completion (or failure), all
|
||||
allocated memory is immediately freed.
|
||||
4. **Predictable memory**: Callers must be able to determine exact memory
|
||||
requirements before committing to a simulation.
|
||||
|
||||
### Memory Scale
|
||||
|
||||
The state vector for n qubits requires 2^n complex amplitudes, each consuming 16
|
||||
bytes (two f64 values):
|
||||
|
||||
| Qubits | Amplitudes | Memory | Notes |
|
||||
|--------|-----------|--------|-------|
|
||||
| 10 | 1,024 | 16 KiB | Trivial |
|
||||
| 15 | 32,768 | 512 KiB | Small |
|
||||
| 20 | 1,048,576 | 16 MiB | Moderate |
|
||||
| 25 | 33,554,432 | 512 MiB | Large |
|
||||
| 28 | 268,435,456 | 4 GiB | Needs dedicated memory |
|
||||
| 30 | 1,073,741,824 | 16 GiB | Workstation-class |
|
||||
| 32 | 4,294,967,296 | 64 GiB | Server-class |
|
||||
| 35 | 34,359,738,368 | 512 GiB | HPC |
|
||||
| 40 | 1,099,511,627,776 | 16 TiB | Infeasible (state vector) |
|
||||
|
||||
Each additional qubit doubles memory. This exponential scaling makes memory the
|
||||
primary resource constraint and the most important resource to manage.
|
||||
|
||||
### Edge and Embedded Constraints
|
||||
|
||||
On edge devices (embedded ruVector nodes, IoT gateways, mobile processors), memory
|
||||
is severely limited:
|
||||
|
||||
| Platform | Typical RAM | Max qubits (state vector) |
|
||||
|----------|------------|--------------------------|
|
||||
| Cognitum tile (single) | 256 MiB | 23 |
|
||||
| Cognitum tile cluster (4) | 1 GiB | 25 |
|
||||
| Raspberry Pi 4 | 8 GiB | 28 |
|
||||
| Mobile device | 4-6 GiB | 27-28 (with other apps) |
|
||||
| Laptop | 16-64 GiB | 29-31 |
|
||||
| Server | 256-512 GiB | 33-34 |
|
||||
|
||||
### WASM Memory Model
|
||||
|
||||
WebAssembly uses a linear memory that can grow but cannot shrink. Once a large
|
||||
simulation allocates pages, those pages remain mapped until the WASM instance is
|
||||
destroyed. This is a fundamental platform limitation that must be documented and
|
||||
accounted for.
|
||||
|
||||
## Decision
|
||||
|
||||
### 1. Zero-Idle Footprint Architecture
|
||||
|
||||
The quantum engine is implemented as a pure library with no runtime overhead:
|
||||
|
||||
```rust
|
||||
// The engine is a collection of functions and types.
|
||||
// No background threads, no event loops, no persistent state.
|
||||
// When not called, it consumes exactly zero CPU and zero heap.
|
||||
|
||||
pub struct QuantumEngine; // Zero-sized type; purely a namespace
|
||||
|
||||
impl QuantumEngine {
|
||||
/// Execute a simulation. All resources are allocated on entry
|
||||
/// and freed on exit (or on error).
|
||||
pub fn execute(
|
||||
circuit: &QuantumCircuit,
|
||||
shots: usize,
|
||||
config: &SimulationConfig,
|
||||
) -> Result<SimulationResult, SimulationError> {
|
||||
// 1. Estimate and validate memory
|
||||
let required = Self::estimate_memory(circuit.num_qubits());
|
||||
Self::validate_memory_available(required)?;
|
||||
|
||||
// 2. Allocate state vector (the big allocation)
|
||||
let mut state = Self::allocate_state(circuit.num_qubits())?;
|
||||
|
||||
// 3. Execute gates (all computation happens here)
|
||||
Self::apply_gates(circuit, &mut state, config)?;
|
||||
|
||||
// 4. Measure (if requested)
|
||||
let measurements = Self::measure(&state, shots)?;
|
||||
|
||||
// 5. Build result (copies out what we need)
|
||||
let result = SimulationResult::from_state_and_measurements(
|
||||
&state, measurements, circuit,
|
||||
);
|
||||
|
||||
// 6. state is dropped here -- Vec<Complex<f64>> deallocated
|
||||
// No cleanup needed. No finalizers. Just drop.
|
||||
|
||||
Ok(result)
|
||||
}
|
||||
// state goes out of scope and is deallocated by Rust's ownership system
|
||||
}
|
||||
```
|
||||
|
||||
Key properties:
|
||||
- No `new()` or `init()` methods that create persistent state.
|
||||
- No `Drop` impl with complex cleanup logic.
|
||||
- No `Arc`, `Mutex`, or shared state between calls.
|
||||
- Each call is fully independent and self-contained.
|
||||
|
||||
### 2. On-Demand Allocation Strategy
|
||||
|
||||
State vectors are allocated at simulation start and freed at simulation end:
|
||||
|
||||
```rust
|
||||
fn allocate_state(n_qubits: u32) -> Result<StateVector, SimulationError> {
|
||||
let num_amplitudes = 1_usize.checked_shl(n_qubits)
|
||||
.ok_or(SimulationError::QubitLimitExceeded {
|
||||
requested: n_qubits,
|
||||
maximum: (usize::BITS - 1) as u32,
|
||||
estimated_memory_bytes: u64::MAX,
|
||||
available_memory_bytes: estimate_available_memory() as u64,
|
||||
})?;
|
||||
|
||||
let required_bytes = num_amplitudes
|
||||
.checked_mul(std::mem::size_of::<Complex<f64>>())
|
||||
.ok_or(SimulationError::MemoryAllocationFailed {
|
||||
requested_bytes: u64::MAX,
|
||||
qubit_count: n_qubits,
|
||||
suggestion: "Qubit count exceeds addressable memory",
|
||||
})?;
|
||||
|
||||
// Attempt allocation. Rust's global allocator will return an error
|
||||
// (with #[global_allocator] configured) or the OS will OOM-kill us.
|
||||
// We use try_reserve to handle this gracefully.
|
||||
let mut amplitudes = Vec::new();
|
||||
amplitudes.try_reserve_exact(num_amplitudes)
|
||||
.map_err(|_| SimulationError::MemoryAllocationFailed {
|
||||
requested_bytes: required_bytes as u64,
|
||||
qubit_count: n_qubits,
|
||||
suggestion: "Reduce qubit count or use tensor-network backend",
|
||||
})?;
|
||||
|
||||
// Initialize to |00...0> state
|
||||
amplitudes.resize(num_amplitudes, Complex::new(0.0, 0.0));
|
||||
amplitudes[0] = Complex::new(1.0, 0.0);
|
||||
|
||||
Ok(StateVector { amplitudes, n_qubits })
|
||||
}
|
||||
```
|
||||
|
||||
The allocation sequence:
|
||||
|
||||
```
|
||||
IDLE (zero memory)
|
||||
|
|
||||
v
|
||||
estimate_memory(n) --> returns bytes needed
|
||||
|
|
||||
v
|
||||
validate_memory_available(bytes) --> checks against OS/platform limits
|
||||
| returns Err if insufficient
|
||||
v
|
||||
Vec::try_reserve_exact(2^n) --> attempts allocation
|
||||
| returns Err on failure (no panic)
|
||||
v
|
||||
ALLOCATED (2^n * 16 bytes on heap)
|
||||
|
|
||||
v
|
||||
[... simulation runs ...]
|
||||
|
|
||||
v
|
||||
Vec::drop() --> automatic deallocation
|
||||
|
|
||||
v
|
||||
IDLE (zero memory)
|
||||
```
|
||||
|
||||
### 3. Memory Estimation API
|
||||
|
||||
Callers can query exact memory requirements before committing:
|
||||
|
||||
```rust
|
||||
/// Returns the number of bytes required to simulate n_qubits.
|
||||
/// This accounts for the state vector plus working memory for
|
||||
/// gate application (temporary buffers, measurement arrays, etc.).
|
||||
///
|
||||
/// # Returns
|
||||
/// - `Ok(bytes)` if the qubit count is representable
|
||||
/// - `Err(...)` if 2^n_qubits overflows usize
|
||||
pub fn estimate_memory(n_qubits: u32) -> Result<MemoryEstimate, SimulationError> {
|
||||
let num_amplitudes = 1_usize.checked_shl(n_qubits)
|
||||
.ok_or(SimulationError::QubitLimitExceeded {
|
||||
requested: n_qubits,
|
||||
maximum: (usize::BITS - 1) as u32,
|
||||
estimated_memory_bytes: u64::MAX,
|
||||
available_memory_bytes: 0,
|
||||
})?;
|
||||
|
||||
let state_vector_bytes = num_amplitudes * std::mem::size_of::<Complex<f64>>();
|
||||
|
||||
// Working memory: temporary buffer for gate application (1 amplitude slice)
|
||||
// Plus measurement result storage
|
||||
let working_bytes = num_amplitudes * std::mem::size_of::<Complex<f64>>() / 4;
|
||||
|
||||
// Thread-local scratch space (per Rayon thread)
|
||||
let thread_count = rayon::current_num_threads();
|
||||
let scratch_per_thread = 64 * 1024; // 64 KiB per thread for local buffers
|
||||
let thread_scratch = thread_count * scratch_per_thread;
|
||||
|
||||
Ok(MemoryEstimate {
|
||||
state_vector_bytes: state_vector_bytes as u64,
|
||||
working_bytes: working_bytes as u64,
|
||||
thread_scratch_bytes: thread_scratch as u64,
|
||||
total_bytes: (state_vector_bytes + working_bytes + thread_scratch) as u64,
|
||||
num_amplitudes: num_amplitudes as u64,
|
||||
})
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct MemoryEstimate {
|
||||
/// Bytes for the state vector (dominant cost).
|
||||
pub state_vector_bytes: u64,
|
||||
/// Bytes for gate-application working memory.
|
||||
pub working_bytes: u64,
|
||||
/// Bytes for thread-local scratch space.
|
||||
pub thread_scratch_bytes: u64,
|
||||
/// Total estimated bytes.
|
||||
pub total_bytes: u64,
|
||||
/// Number of complex amplitudes.
|
||||
pub num_amplitudes: u64,
|
||||
}
|
||||
|
||||
impl MemoryEstimate {
|
||||
/// Returns true if the estimate fits within the given byte budget.
|
||||
pub fn fits_in(&self, available_bytes: u64) -> bool {
|
||||
self.total_bytes <= available_bytes
|
||||
}
|
||||
|
||||
/// Suggest the maximum qubits for a given memory budget.
|
||||
pub fn max_qubits_for(available_bytes: u64) -> u32 {
|
||||
// Each qubit doubles memory; find largest n where 20 * 2^n <= available
|
||||
// Factor of 20 accounts for 16-byte amplitudes + 25% working memory
|
||||
let effective = available_bytes / 20;
|
||||
if effective == 0 { return 0; }
|
||||
(effective.ilog2()) as u32
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4. Allocation Failure Handling
|
||||
|
||||
The engine never panics on allocation failure. All paths return structured errors:
|
||||
|
||||
```rust
|
||||
// Pattern: every allocation is fallible and returns a descriptive error.
|
||||
|
||||
// State vector allocation failure:
|
||||
SimulationError::MemoryAllocationFailed {
|
||||
requested_bytes: 17_179_869_184, // 16 GiB
|
||||
qubit_count: 30,
|
||||
suggestion: "Reduce qubit count by 2 (to 28, ~4 GiB) or enable tensor-network backend",
|
||||
}
|
||||
|
||||
// Integer overflow (qubit count too large):
|
||||
SimulationError::QubitLimitExceeded {
|
||||
requested: 64,
|
||||
maximum: 33, // based on available memory
|
||||
estimated_memory_bytes: u64::MAX,
|
||||
available_memory_bytes: 68_719_476_736, // 64 GiB
|
||||
}
|
||||
```
|
||||
|
||||
Decision tree on allocation failure:
|
||||
|
||||
```
|
||||
Memory allocation failed
|
||||
|
|
||||
+-- Is tensor-network feature enabled?
|
||||
| |
|
||||
| +-- YES: Suggest tensor-network backend
|
||||
| | (may work if circuit has low treewidth)
|
||||
| |
|
||||
| +-- NO: Suggest reducing qubit count
|
||||
| Calculate: max_qubits = floor(log2(available / 20))
|
||||
| Suggest: "Reduce to {max_qubits} qubits ({memory} bytes)"
|
||||
|
|
||||
+-- Is the request wildly over budget (>100x)?
|
||||
| |
|
||||
| +-- YES: "Circuit requires {X} GiB but only {Y} MiB available"
|
||||
| |
|
||||
| +-- NO: "Circuit requires {X} GiB, {Y} GiB available.
|
||||
| Reducing by {delta} qubits would fit."
|
||||
|
|
||||
+-- Return SimulationError (no panic, no abort)
|
||||
```
|
||||
|
||||
### 5. CPU Yielding for Long Simulations
|
||||
|
||||
For simulations estimated to exceed 100ms, the engine can optionally yield between
|
||||
gate batches to allow the OS scheduler to manage power states:
|
||||
|
||||
```rust
|
||||
pub struct YieldConfig {
|
||||
/// Enable cooperative yielding between gate batches.
|
||||
/// Default: false (maximum throughput).
|
||||
pub enabled: bool,
|
||||
|
||||
/// Number of gates to apply before yielding.
|
||||
/// Default: 1000.
|
||||
pub gates_per_slice: usize,
|
||||
|
||||
/// Yield mechanism.
|
||||
/// Default: ThreadYield (std::thread::yield_now).
|
||||
pub yield_strategy: YieldStrategy,
|
||||
}
|
||||
|
||||
pub enum YieldStrategy {
|
||||
/// Call std::thread::yield_now() between slices.
|
||||
ThreadYield,
|
||||
/// Sleep for specified duration between slices.
|
||||
Sleep(Duration),
|
||||
/// Call a user-provided callback between slices.
|
||||
Callback(Box<dyn Fn(SliceProgress) + Send>),
|
||||
}
|
||||
|
||||
pub struct SliceProgress {
|
||||
pub gates_completed: u64,
|
||||
pub gates_remaining: u64,
|
||||
pub elapsed: Duration,
|
||||
pub estimated_remaining: Duration,
|
||||
}
|
||||
|
||||
// Usage in gate application loop:
|
||||
fn apply_gates_with_yield(
|
||||
circuit: &QuantumCircuit,
|
||||
state: &mut StateVector,
|
||||
yield_config: &YieldConfig,
|
||||
) -> Result<(), SimulationError> {
|
||||
let gates = circuit.gates();
|
||||
|
||||
for (i, gate) in gates.iter().enumerate() {
|
||||
apply_single_gate(gate, state)?;
|
||||
|
||||
if yield_config.enabled && (i + 1) % yield_config.gates_per_slice == 0 {
|
||||
match &yield_config.yield_strategy {
|
||||
YieldStrategy::ThreadYield => std::thread::yield_now(),
|
||||
YieldStrategy::Sleep(d) => std::thread::sleep(*d),
|
||||
YieldStrategy::Callback(cb) => cb(SliceProgress {
|
||||
gates_completed: (i + 1) as u64,
|
||||
gates_remaining: (gates.len() - i - 1) as u64,
|
||||
elapsed: start.elapsed(),
|
||||
estimated_remaining: estimate_remaining(i, gates.len(), start),
|
||||
}),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
```
|
||||
|
||||
Yield is **disabled by default** to maximize throughput. It is primarily intended
|
||||
for:
|
||||
- Edge devices where power management is critical.
|
||||
- Interactive applications where UI responsiveness matters.
|
||||
- Long-running simulations (>1 second) where progress reporting is needed.
|
||||
|
||||
### 6. Thread Management
|
||||
|
||||
The quantum engine does not create or manage its own threads:
|
||||
|
||||
```
|
||||
+-----------------------------------------------+
|
||||
| Global Rayon Thread Pool |
|
||||
| (shared by all ruVector subsystems) |
|
||||
| |
|
||||
| [Thread 0] [Thread 1] ... [Thread N-1] |
|
||||
| ^ ^ ^ |
|
||||
| | | | |
|
||||
| +--+---+ +--+---+ +---+--+ |
|
||||
| | ruQu | | ruQu | | idle | |
|
||||
| | gate | | gate | | | |
|
||||
| | apply | | apply| | | |
|
||||
| +-------+ +------+ +------+ |
|
||||
| |
|
||||
| During simulation: threads work on gates |
|
||||
| After simulation: threads return to pool |
|
||||
| Pool idle: OS can power-gate cores |
|
||||
+-----------------------------------------------+
|
||||
```
|
||||
|
||||
Key properties:
|
||||
- Rayon's global thread pool is initialized once by `ruvector-core` at startup.
|
||||
- The quantum engine calls `rayon::par_iter()` and related APIs, borrowing threads
|
||||
temporarily.
|
||||
- When simulation completes, all threads are returned to the global pool.
|
||||
- If no ruVector work is pending, Rayon threads park (blocking on a condvar),
|
||||
consuming zero CPU. The OS can then power-gate the underlying cores.
|
||||
|
||||
### 7. WASM Memory Considerations
|
||||
|
||||
WebAssembly linear memory has a specific behavior that affects resource management:
|
||||
|
||||
```
|
||||
WASM Memory Layout
|
||||
+------------------+------------------+
|
||||
| Initial pages | Grown pages |
|
||||
| (compiled size) | (runtime alloc) |
|
||||
+------------------+------------------+
|
||||
0 initial_size current_size
|
||||
|
||||
Growth: memory.grow(delta_pages) -> adds pages to the end
|
||||
Shrink: NOT SUPPORTED in WASM spec
|
||||
|
||||
After 25-qubit simulation:
|
||||
+------------------+----------------------------------+
|
||||
| Initial (1 MiB) | Grown for state vec (512 MiB) | <- HIGH WATER MARK
|
||||
+------------------+----------------------------------+
|
||||
|
||||
After simulation completes:
|
||||
+------------------+----------------------------------+
|
||||
| Initial (1 MiB) | FREED internally but pages |
|
||||
| | still mapped (512 MiB virtual) |
|
||||
+------------------+----------------------------------+
|
||||
The Rust allocator returns memory to its free list,
|
||||
but WASM pages are not returned to the host.
|
||||
```
|
||||
|
||||
**Implications and mitigations**:
|
||||
|
||||
1. **Document the behavior**: Users must understand that WASM memory is a high-water
|
||||
mark. A 25-qubit simulation permanently increases the WASM instance's memory
|
||||
footprint to ~512 MiB.
|
||||
|
||||
2. **Instance recycling**: For applications that run multiple simulations, create a
|
||||
new WASM instance periodically to reset the memory high-water mark.
|
||||
|
||||
3. **Memory budget enforcement**: The WASM host can set `WebAssembly.Memory` with a
|
||||
`maximum` parameter to cap growth:
|
||||
|
||||
```javascript
|
||||
const memory = new WebAssembly.Memory({
|
||||
initial: 16, // 1 MiB
|
||||
maximum: 8192, // 512 MiB cap
|
||||
});
|
||||
```
|
||||
|
||||
4. **Pre-check in WASM**: The engine's `estimate_memory()` function works in WASM
|
||||
and should be called before simulation to verify the allocation will succeed.
|
||||
|
||||
### 8. Cognitum Tile Integration
|
||||
|
||||
On Cognitum's tile-based architecture, the quantum engine maps to tiles as follows:
|
||||
|
||||
```
|
||||
Cognitum Processor (256 tiles)
|
||||
+--------+--------+--------+--------+
|
||||
| Tile 0 | Tile 1 | Tile 2 | Tile 3 | <- Assigned to quantum sim
|
||||
| ACTIVE | ACTIVE | ACTIVE | ACTIVE |
|
||||
+--------+--------+--------+--------+
|
||||
| Tile 4 | Tile 5 | Tile 6 | Tile 7 | <- Other ruVector work (or sleeping)
|
||||
| sleep | vecDB | sleep | sleep |
|
||||
+--------+--------+--------+--------+
|
||||
| ... | ... | ... | ... |
|
||||
| sleep | sleep | sleep | sleep | <- Power gated (zero consumption)
|
||||
+--------+--------+--------+--------+
|
||||
```
|
||||
|
||||
**Power state diagram for a quantum simulation lifecycle**:
|
||||
|
||||
```
|
||||
State: ALL_TILES_IDLE
|
||||
|
|
||||
| Simulation request arrives
|
||||
v
|
||||
State: ALLOCATING
|
||||
Action: Wake tiles 0-3 (or however many are needed)
|
||||
Action: Allocate state vector across tile-local memory
|
||||
Power: Tiles 0-3 ACTIVE, rest SLEEP
|
||||
|
|
||||
v
|
||||
State: SIMULATING
|
||||
Action: Apply gates in parallel across active tiles
|
||||
Power: Tiles 0-3 at full clock rate
|
||||
Duration: microseconds to seconds depending on circuit
|
||||
|
|
||||
v
|
||||
State: MEASURING
|
||||
Action: Sample measurement outcomes
|
||||
Power: Tile 0 only (measurement is sequential)
|
||||
|
|
||||
v
|
||||
State: DEALLOCATING
|
||||
Action: Free state vector
|
||||
Action: Return tiles to idle pool
|
||||
|
|
||||
v
|
||||
State: ALL_TILES_IDLE
|
||||
Power: Tiles 0-3 back to SLEEP
|
||||
Memory: Zero heap allocation
|
||||
```
|
||||
|
||||
**Tile assignment policy**:
|
||||
- Small simulations (n <= 20): 1 tile sufficient.
|
||||
- Medium simulations (20 < n <= 25): 2-4 tiles for parallel gate application.
|
||||
- Large simulations (25 < n <= 30): All available tiles.
|
||||
- The tile scheduler (part of Cognitum runtime) handles assignment. The quantum
|
||||
engine simply uses Rayon parallelism; the runtime maps Rayon threads to tiles.
|
||||
|
||||
### 9. Memory Budget Table
|
||||
|
||||
Quick reference for capacity planning:
|
||||
|
||||
| Qubits | State Vector | Working Memory | Total | Platform Fit |
|
||||
|--------|-------------|---------------|-------|-------------|
|
||||
| 10 | 16 KiB | 4 KiB | 20 KiB | Any |
|
||||
| 12 | 64 KiB | 16 KiB | 80 KiB | Any |
|
||||
| 14 | 256 KiB | 64 KiB | 320 KiB | Any |
|
||||
| 16 | 1 MiB | 256 KiB | 1.3 MiB | Any |
|
||||
| 18 | 4 MiB | 1 MiB | 5 MiB | Any |
|
||||
| 20 | 16 MiB | 4 MiB | 20 MiB | Any |
|
||||
| 22 | 64 MiB | 16 MiB | 80 MiB | Cognitum single tile |
|
||||
| 24 | 256 MiB | 64 MiB | 320 MiB | Cognitum 2+ tiles |
|
||||
| 26 | 1 GiB | 256 MiB | 1.3 GiB | Cognitum cluster |
|
||||
| 28 | 4 GiB | 1 GiB | 5 GiB | Laptop / RPi 8GB |
|
||||
| 30 | 16 GiB | 4 GiB | 20 GiB | Workstation |
|
||||
| 32 | 64 GiB | 16 GiB | 80 GiB | Server |
|
||||
| 34 | 256 GiB | 64 GiB | 320 GiB | Large server |
|
||||
|
||||
### 10. Allocation and Deallocation Sequence Diagram
|
||||
|
||||
```
|
||||
Caller Engine OS/Allocator
|
||||
| | |
|
||||
| execute(circuit) | |
|
||||
|-------------------->| |
|
||||
| | |
|
||||
| | estimate_memory(n) |
|
||||
| | validate_available() |
|
||||
| | |
|
||||
| | try_reserve_exact(2^n) |
|
||||
| |------------------------>|
|
||||
| | |
|
||||
| | Ok(ptr) or Err |
|
||||
| |<------------------------|
|
||||
| | |
|
||||
| | [if Err: return |
|
||||
| | SimulationError] |
|
||||
| | |
|
||||
| | initialize |00...0> |
|
||||
| | apply gates |
|
||||
| | measure |
|
||||
| | |
|
||||
| | build result |
|
||||
| | (copies measurements, |
|
||||
| | expectation values) |
|
||||
| | |
|
||||
| | drop(state_vector) |
|
||||
| |------------------------>|
|
||||
| | | free(ptr, 2^n * 16)
|
||||
| | |
|
||||
| Ok(result) | |
|
||||
|<--------------------| |
|
||||
| | |
|
||||
| [Engine holds ZERO | |
|
||||
| heap memory now] | |
|
||||
```
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
1. **True zero-idle cost**: No background resource consumption. Perfectly aligned
|
||||
with Cognitum's event-driven architecture and power gating.
|
||||
2. **Predictable memory**: `estimate_memory()` gives exact requirements before
|
||||
committing, preventing OOM surprises.
|
||||
3. **Graceful degradation**: Allocation failures return structured errors with
|
||||
actionable suggestions, never panics.
|
||||
4. **Platform portable**: The same allocation strategy works on native (Linux, macOS,
|
||||
Windows), WASM, and embedded (Cognitum tiles).
|
||||
5. **No resource leaks**: Rust's ownership system guarantees deallocation on all
|
||||
exit paths (success, error, panic).
|
||||
|
||||
### Negative
|
||||
|
||||
1. **No state caching**: Each simulation allocates and deallocates independently.
|
||||
Repeated simulations on the same qubit count pay allocation cost each time.
|
||||
Mitigation: allocation is O(2^n) but fast compared to O(G * 2^n) simulation.
|
||||
2. **WASM memory high-water mark**: Cannot reclaim WASM linear memory pages.
|
||||
Documented as a platform limitation with instance-recycling workaround.
|
||||
3. **No memory pooling**: Could theoretically amortize allocation across simulations,
|
||||
but this conflicts with the zero-idle-footprint requirement.
|
||||
4. **Yield overhead**: When enabled, cooperative yielding adds per-slice overhead.
|
||||
Mitigated by making it opt-in and configurable.
|
||||
|
||||
### Risks and Mitigations
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|---|---|---|---|
|
||||
| OOM despite estimate_memory check | Low | Crash | Check returns conservative estimate including working memory |
|
||||
| WASM instance runs out of address space | Medium | Failure | Set `WebAssembly.Memory` maximum; document limitation |
|
||||
| Allocation latency spike (OS page faults) | Medium | Slow start | Consider `madvise` / `mlock` hints for large allocations |
|
||||
| Rayon thread pool contention | Medium | Degraded perf | Quantum engine yields between slices; Rayon work-stealing handles contention |
|
||||
|
||||
## References
|
||||
|
||||
- Cognitum Architecture Specification: event-driven tile-based computing
|
||||
- Rust `Vec::try_reserve_exact`: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.try_reserve_exact
|
||||
- WebAssembly Memory: https://webassembly.github.io/spec/core/syntax/modules.html#memories
|
||||
- Rayon thread pool: https://docs.rs/rayon
|
||||
- ADR-QE-001: Core Engine Architecture (zero-overhead design principle)
|
||||
- ADR-QE-005: WASM Compilation Target (WASM constraints)
|
||||
- ADR-QE-009: Tensor Network Evaluation Mode (alternative for large circuits)
|
||||
- ADR-QE-010: Observability & Monitoring (memory metrics reporting)
|
||||
|
|
@ -0,0 +1,876 @@
|
|||
# ADR-QE-012: Min-Cut Coherence Integration
|
||||
|
||||
**Status**: Proposed
|
||||
**Date**: 2026-02-06
|
||||
**Authors**: ruv.io, RuVector Team
|
||||
**Deciders**: Architecture Review Board
|
||||
|
||||
---
|
||||
|
||||
## Context
|
||||
|
||||
The ruVector ecosystem contains several components that must work together for
|
||||
quantum error correction (QEC) simulation:
|
||||
|
||||
1. **ruQu (existing)**: A real-time coherence gating system that performs
|
||||
boundary-to-boundary min-cut analysis on surface code error patterns. It includes
|
||||
a three-filter syndrome pipeline (Structural | Shift | Evidence), a Minimum Weight
|
||||
Perfect Matching (MWPM) decoder, and an early warning system that predicts
|
||||
correlated failures 100+ cycles ahead.
|
||||
|
||||
2. **ruvector-mincut (existing)**: A graph partitioning crate that computes minimum
|
||||
cuts and balanced partitions. Currently used for vector index sharding but
|
||||
directly applicable to syndrome graph decomposition.
|
||||
|
||||
3. **Coherence Engine (ADR-014)**: Computes coherence energy via sheaf Laplacian
|
||||
analysis. The "mincut-gated-transformer" concept uses coherence energy to skip
|
||||
computation on "healthy" regions, achieving up to 50% FLOPs reduction.
|
||||
|
||||
4. **Quantum Simulation Engine (new, ADR-QE-001 through ADR-QE-011)**: The
|
||||
state-vector and tensor-network simulator being designed in this ADR series.
|
||||
|
||||
The challenge is integrating these components into a coherent (pun intended)
|
||||
pipeline where simulated quantum circuits produce syndromes, those syndromes are
|
||||
decoded in real-time, and coherence analysis feeds back into simulation parameters.
|
||||
|
||||
### Surface Code Background
|
||||
|
||||
A distance-d surface code encodes 1 logical qubit in d^2 data qubits + (d^2 - 1)
|
||||
ancilla qubits:
|
||||
|
||||
| Distance | Data qubits | Ancilla qubits | Total qubits | Error threshold |
|
||||
|----------|------------|----------------|--------------|----------------|
|
||||
| 3 | 9 | 8 | 17 | ~1% |
|
||||
| 5 | 25 | 24 | 49 | ~1% |
|
||||
| 7 | 49 | 48 | 97 | ~1% |
|
||||
| 9 | 81 | 80 | 161 | ~1% |
|
||||
| 11 | 121 | 120 | 241 | ~1% |
|
||||
|
||||
Syndrome extraction involves measuring ancilla qubits each cycle. The measurement
|
||||
outcomes (syndromes) indicate where errors may have occurred. The decoder's job is
|
||||
to determine the most likely error pattern from the syndrome and apply corrections.
|
||||
|
||||
### Performance Requirements
|
||||
|
||||
ruQu's existing decoder targets P99 latency of <4 microseconds for syndrome
|
||||
decoding. The integrated simulation + decode pipeline must meet:
|
||||
|
||||
| Operation | Target latency | Notes |
|
||||
|-----------|---------------|-------|
|
||||
| Single syndrome decode | <4 us | Existing ruQu target (MWPM) |
|
||||
| Syndrome extraction sim | <5 ms | One round of ancilla measurement |
|
||||
| Full cycle (sim + decode) | <10 ms | Distance-3, single error cycle |
|
||||
| Full cycle (sim + decode) | <50 ms | Distance-5 |
|
||||
| Full cycle (sim + decode) | <200 ms | Distance-7 (tensor network) |
|
||||
| Early warning evaluation | <1 ms | Check predicted vs actual syndromes |
|
||||
|
||||
## Decision
|
||||
|
||||
### 1. Architecture Overview
|
||||
|
||||
The integration follows a pipeline architecture where data flows from quantum
|
||||
simulation through syndrome extraction, filtering, decoding, and coherence analysis:
|
||||
|
||||
```
|
||||
+------------------------------------------------------------------+
|
||||
| Quantum Error Correction Pipeline |
|
||||
+------------------------------------------------------------------+
|
||||
| |
|
||||
| +------------------+ +---------------------+ |
|
||||
| | Quantum Circuit | | Error Model | |
|
||||
| | (surface code |---->| (depolarizing, | |
|
||||
| | syndrome | | biased noise, | |
|
||||
| | extraction) | | correlated) | |
|
||||
| +------------------+ +---------------------+ |
|
||||
| | | |
|
||||
| v v |
|
||||
| +--------------------------------------------+ |
|
||||
| | Quantum Simulation Engine | |
|
||||
| | (state vector or tensor network) | |
|
||||
| | - Simulates noisy syndrome extraction | |
|
||||
| | - Outputs ancilla measurement outcomes | |
|
||||
| +--------------------------------------------+ |
|
||||
| | |
|
||||
| | syndrome bitstring |
|
||||
| v |
|
||||
| +--------------------------------------------+ |
|
||||
| | SyndromeFilter (ruQu) | |
|
||||
| | Filter 1: Structural (lattice geometry) | |
|
||||
| | Filter 2: Shift (temporal correlations) | |
|
||||
| | Filter 3: Evidence (statistical weight) | |
|
||||
| +--------------------------------------------+ |
|
||||
| | |
|
||||
| | filtered syndrome |
|
||||
| v |
|
||||
| +--------------------------------------------+ |
|
||||
| | MWPM Decoder (ruQu) | |
|
||||
| | - Minimum Weight Perfect Matching | |
|
||||
| | - Returns Pauli correction operators | |
|
||||
| | - Target: <4 us P99 latency | |
|
||||
| +--------------------------------------------+ |
|
||||
| | |
|
||||
| | correction operators (X, Z Paulis) |
|
||||
| v |
|
||||
| +--------------------------------------------+ |
|
||||
| | Correction Application | |
|
||||
| | - Apply Pauli gates to simulated state | |
|
||||
| | - Verify logical qubit integrity | |
|
||||
| +--------------------------------------------+ |
|
||||
| | |
|
||||
| | corrected state |
|
||||
| v |
|
||||
| +-----------------------+ +-------------------------+ |
|
||||
| | Coherence Engine | | Early Warning System | |
|
||||
| | (sheaf Laplacian) | | (100+ cycle prediction) | |
|
||||
| | - Compute coherence |<-->| - Correlate historical | |
|
||||
| | energy | | syndromes | |
|
||||
| | - Gate simulation | | - Predict failures | |
|
||||
| | FLOPs if healthy | | - Feed back to sim | |
|
||||
| +-----------------------+ +-------------------------+ |
|
||||
| | | |
|
||||
| v v |
|
||||
| +--------------------------------------------+ |
|
||||
| | Cryptographic Audit Trail | |
|
||||
| | - Ed25519 signed decisions | |
|
||||
| | - Blake3 hash chains | |
|
||||
| | - Every syndrome, decode, correction logged | |
|
||||
| +--------------------------------------------+ |
|
||||
| |
|
||||
+------------------------------------------------------------------+
|
||||
```
|
||||
|
||||
### 2. Syndrome-to-Decoder Bridge
|
||||
|
||||
The quantum simulation engine outputs raw measurement bitstrings. These are
|
||||
converted to the syndrome format expected by ruQu's decoder:
|
||||
|
||||
```rust
|
||||
/// Bridge between quantum simulation output and ruQu decoder input.
|
||||
pub struct SyndromeBridge;
|
||||
|
||||
impl SyndromeBridge {
|
||||
/// Convert simulation measurement outcomes to ruQu syndrome format.
|
||||
///
|
||||
/// The simulation measures ancilla qubits. A detection event occurs
|
||||
/// when an ancilla measurement differs from the previous round
|
||||
/// (or from the expected value in the first round).
|
||||
pub fn extract_syndrome(
|
||||
measurements: &MeasurementOutcome,
|
||||
code: &SurfaceCodeLayout,
|
||||
previous_round: Option<&SyndromeRound>,
|
||||
) -> SyndromeRound {
|
||||
let mut detections = Vec::new();
|
||||
|
||||
for ancilla in code.ancilla_qubits() {
|
||||
let current = measurements.get(ancilla.index());
|
||||
let previous = previous_round
|
||||
.map(|r| r.get(ancilla.id()))
|
||||
.unwrap_or(0); // Expected value in first round
|
||||
|
||||
if current != previous {
|
||||
detections.push(Detection {
|
||||
ancilla_id: ancilla.id(),
|
||||
ancilla_type: ancilla.stabilizer_type(), // X or Z
|
||||
position: ancilla.lattice_position(),
|
||||
round: measurements.round_number(),
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
SyndromeRound {
|
||||
round: measurements.round_number(),
|
||||
detections,
|
||||
raw_measurements: measurements.ancilla_bits().to_vec(),
|
||||
}
|
||||
}
|
||||
|
||||
/// Apply decoder corrections back to the simulation state.
|
||||
pub fn apply_corrections(
|
||||
state: &mut StateVector,
|
||||
corrections: &DecoderCorrection,
|
||||
code: &SurfaceCodeLayout,
|
||||
) {
|
||||
for (qubit_id, pauli) in &corrections.operations {
|
||||
let qubit_index = code.data_qubit_index(*qubit_id);
|
||||
match pauli {
|
||||
Pauli::X => state.apply_x(qubit_index),
|
||||
Pauli::Z => state.apply_z(qubit_index),
|
||||
Pauli::Y => {
|
||||
state.apply_x(qubit_index);
|
||||
state.apply_z(qubit_index);
|
||||
}
|
||||
Pauli::I => {} // No correction needed
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 3. SyndromeFilter Pipeline (ruQu Integration)
|
||||
|
||||
The three-filter pipeline processes raw syndromes before decoding:
|
||||
|
||||
```rust
|
||||
/// ruQu's three-stage syndrome filtering pipeline.
|
||||
pub struct SyndromeFilterPipeline {
|
||||
structural: StructuralFilter,
|
||||
shift: ShiftFilter,
|
||||
evidence: EvidenceFilter,
|
||||
}
|
||||
|
||||
impl SyndromeFilterPipeline {
|
||||
/// Process a syndrome round through all three filters.
|
||||
pub fn filter(&mut self, syndrome: SyndromeRound) -> FilteredSyndrome {
|
||||
// Filter 1: Structural
|
||||
// Removes detections inconsistent with lattice geometry.
|
||||
// E.g., isolated detections with no nearby partner.
|
||||
let after_structural = self.structural.apply(&syndrome);
|
||||
|
||||
// Filter 2: Shift
|
||||
// Accounts for temporal correlations between rounds.
|
||||
// Detections that appear and disappear in consecutive rounds
|
||||
// may be measurement errors (not data errors).
|
||||
let after_shift = self.shift.apply(&after_structural);
|
||||
|
||||
// Filter 3: Evidence
|
||||
// Weights remaining detections by statistical evidence.
|
||||
// Uses error model probabilities to assign confidence scores.
|
||||
let after_evidence = self.evidence.apply(&after_shift);
|
||||
|
||||
after_evidence
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4. MWPM Decoder Integration
|
||||
|
||||
The filtered syndrome feeds into ruQu's MWPM decoder:
|
||||
|
||||
```rust
|
||||
/// Interface to ruQu's Minimum Weight Perfect Matching decoder.
|
||||
pub trait SyndromeDecoder {
|
||||
/// Decode a filtered syndrome into correction operations.
|
||||
/// Target: <4 microseconds P99 latency.
|
||||
fn decode(
|
||||
&self,
|
||||
syndrome: &FilteredSyndrome,
|
||||
code: &SurfaceCodeLayout,
|
||||
) -> DecoderCorrection;
|
||||
|
||||
/// Decode with timing information for performance monitoring.
|
||||
fn decode_timed(
|
||||
&self,
|
||||
syndrome: &FilteredSyndrome,
|
||||
code: &SurfaceCodeLayout,
|
||||
) -> (DecoderCorrection, DecoderTiming);
|
||||
}
|
||||
|
||||
pub struct DecoderCorrection {
|
||||
/// Pauli corrections to apply to data qubits.
|
||||
pub operations: Vec<(QubitId, Pauli)>,
|
||||
|
||||
/// Confidence score (0.0 = no confidence, 1.0 = certain).
|
||||
pub confidence: f64,
|
||||
|
||||
/// Whether a logical error was detected (correction may be wrong).
|
||||
pub logical_error_detected: bool,
|
||||
|
||||
/// Matching weight (lower is more likely).
|
||||
pub matching_weight: f64,
|
||||
}
|
||||
|
||||
pub struct DecoderTiming {
|
||||
/// Total decode time.
|
||||
pub total_ns: u64,
|
||||
|
||||
/// Time spent building the matching graph.
|
||||
pub graph_construction_ns: u64,
|
||||
|
||||
/// Time spent in the MWPM algorithm.
|
||||
pub matching_ns: u64,
|
||||
|
||||
/// Number of detection events in the input.
|
||||
pub num_detections: usize,
|
||||
}
|
||||
```
|
||||
|
||||
### 5. Min-Cut Graph Partitioning for Parallel Decoding
|
||||
|
||||
For large surface codes (distance >= 7), the syndrome graph can be partitioned
|
||||
using `ruvector-mincut` for parallel decoding:
|
||||
|
||||
```rust
|
||||
use ruvector_mincut::{partition, PartitionConfig, WeightedGraph};
|
||||
|
||||
/// Partition the syndrome graph for parallel decoding.
|
||||
/// This exploits spatial locality in the surface code: errors in
|
||||
/// distant regions can be decoded independently.
|
||||
pub fn parallel_decode(
|
||||
syndrome: &FilteredSyndrome,
|
||||
code: &SurfaceCodeLayout,
|
||||
decoder: &dyn SyndromeDecoder,
|
||||
) -> DecoderCorrection {
|
||||
// Build the detection graph (nodes = detections, edges = possible errors)
|
||||
let detection_graph = build_detection_graph(syndrome, code);
|
||||
|
||||
// If small enough, decode directly
|
||||
if detection_graph.num_nodes() <= 20 {
|
||||
return decoder.decode(syndrome, code);
|
||||
}
|
||||
|
||||
// Partition the detection graph using ruvector-mincut
|
||||
let config = PartitionConfig {
|
||||
num_partitions: estimate_partition_count(&detection_graph),
|
||||
balance_factor: 1.2,
|
||||
minimize: Objective::EdgeCut,
|
||||
};
|
||||
let partitions = partition(&detection_graph, &config);
|
||||
|
||||
// Decode each partition independently (in parallel via Rayon)
|
||||
let partial_corrections: Vec<DecoderCorrection> = partitions
|
||||
.par_iter()
|
||||
.map(|partition| {
|
||||
let sub_syndrome = syndrome.restrict_to(partition);
|
||||
decoder.decode(&sub_syndrome, code)
|
||||
})
|
||||
.collect();
|
||||
|
||||
// Handle boundary edges (detections that span partitions)
|
||||
let boundary_correction = decode_boundary_edges(
|
||||
syndrome, code, &partitions, decoder,
|
||||
);
|
||||
|
||||
// Merge all corrections
|
||||
merge_corrections(partial_corrections, boundary_correction)
|
||||
}
|
||||
|
||||
/// Estimate optimal partition count based on detection density.
|
||||
fn estimate_partition_count(graph: &WeightedGraph) -> usize {
|
||||
let n = graph.num_nodes();
|
||||
if n <= 20 { 1 }
|
||||
else if n <= 50 { 2 }
|
||||
else if n <= 100 { 4 }
|
||||
else { (n / 25).min(rayon::current_num_threads()) }
|
||||
}
|
||||
```
|
||||
|
||||
This matches ruQu's existing boundary-to-boundary min-cut analysis: the partition
|
||||
boundaries correspond to the cuts in the syndrome graph where independent decoding
|
||||
regions meet.
|
||||
|
||||
### 6. Coherence Gating for Simulation FLOPs Reduction
|
||||
|
||||
The sheaf Laplacian coherence energy (from ADR-014) provides a measure of how
|
||||
"healthy" a quantum state region is. High coherence energy means the region is
|
||||
behaving as expected (low error rate). This enables a novel optimization:
|
||||
|
||||
```
|
||||
Coherence Gating Decision Tree
|
||||
================================
|
||||
|
||||
For each region R of the surface code:
|
||||
|
||||
1. Compute coherence energy E(R) via sheaf Laplacian
|
||||
|
||||
2. Compare to thresholds:
|
||||
|
||||
E(R) > E_high (0.95)
|
||||
|
|
||||
+-- Region is HEALTHY
|
||||
| Action: SKIP detailed simulation for this region
|
||||
| Use: simplified noise model (Pauli channel approximation)
|
||||
| Savings: ~50% FLOPs for this region
|
||||
|
|
||||
E_low (0.70) < E(R) <= E_high (0.95)
|
||||
|
|
||||
+-- Region is NOMINAL
|
||||
| Action: STANDARD simulation
|
||||
| Use: full gate-by-gate simulation with noise
|
||||
| Savings: none
|
||||
|
|
||||
E(R) <= E_low (0.70)
|
||||
|
|
||||
+-- Region is DEGRADED
|
||||
| Action: ENHANCED simulation
|
||||
| Use: full simulation + additional diagnostics
|
||||
| Extra: log detailed error patterns, trigger early warning
|
||||
| Savings: negative (more work, but necessary)
|
||||
```
|
||||
|
||||
Implementation:
|
||||
|
||||
```rust
|
||||
/// Coherence-gated simulation mode.
|
||||
/// Uses coherence energy to decide simulation fidelity per region.
|
||||
pub struct CoherenceGatedSimulator {
|
||||
/// Full-fidelity simulator for nominal/degraded regions.
|
||||
full_simulator: Box<dyn SimulationBackend>,
|
||||
|
||||
/// Simplified simulator for healthy regions.
|
||||
simplified_simulator: SimplifiedNoiseModel,
|
||||
|
||||
/// Coherence engine for computing region health.
|
||||
coherence_engine: CoherenceEngine,
|
||||
|
||||
/// Thresholds for gating decisions.
|
||||
high_threshold: f64,
|
||||
low_threshold: f64,
|
||||
}
|
||||
|
||||
impl CoherenceGatedSimulator {
|
||||
/// Simulate one QEC cycle with coherence gating.
|
||||
pub fn simulate_cycle(
|
||||
&mut self,
|
||||
state: &mut StateVector,
|
||||
code: &SurfaceCodeLayout,
|
||||
error_model: &ErrorModel,
|
||||
history: &SyndromeHistory,
|
||||
) -> CycleResult {
|
||||
// Step 1: Compute coherence energy per region
|
||||
let regions = code.spatial_regions();
|
||||
let coherence = self.coherence_engine.compute_regional(
|
||||
history, ®ions,
|
||||
);
|
||||
|
||||
// Step 2: Classify regions and simulate accordingly
|
||||
let mut cycle_syndromes = Vec::new();
|
||||
let mut flops_saved = 0_u64;
|
||||
let mut flops_total = 0_u64;
|
||||
|
||||
for (region, energy) in regions.iter().zip(coherence.energies()) {
|
||||
let region_qubits = code.qubits_in_region(region);
|
||||
|
||||
if *energy > self.high_threshold {
|
||||
// HEALTHY: Use simplified Pauli noise model
|
||||
let syndrome = self.simplified_simulator.simulate_region(
|
||||
state, ®ion_qubits, error_model,
|
||||
);
|
||||
let full_cost = estimate_full_sim_cost(®ion_qubits);
|
||||
let simplified_cost = estimate_simplified_cost(®ion_qubits);
|
||||
flops_saved += full_cost - simplified_cost;
|
||||
flops_total += simplified_cost;
|
||||
cycle_syndromes.push(syndrome);
|
||||
|
||||
} else if *energy > self.low_threshold {
|
||||
// NOMINAL: Full simulation
|
||||
let syndrome = self.full_simulator.simulate_region(
|
||||
state, ®ion_qubits, error_model,
|
||||
);
|
||||
let cost = estimate_full_sim_cost(®ion_qubits);
|
||||
flops_total += cost;
|
||||
cycle_syndromes.push(syndrome);
|
||||
|
||||
} else {
|
||||
// DEGRADED: Full simulation + diagnostics
|
||||
let syndrome = self.full_simulator.simulate_region_with_diagnostics(
|
||||
state, ®ion_qubits, error_model,
|
||||
);
|
||||
let cost = estimate_full_sim_cost(®ion_qubits) * 12 / 10;
|
||||
flops_total += cost;
|
||||
cycle_syndromes.push(syndrome);
|
||||
|
||||
// Trigger early warning system
|
||||
tracing::warn!(
|
||||
region = %region.id(),
|
||||
coherence_energy = energy,
|
||||
"Degraded coherence detected; enhanced monitoring active"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
CycleResult {
|
||||
syndromes: merge_region_syndromes(cycle_syndromes),
|
||||
flops_saved,
|
||||
flops_total,
|
||||
coherence_energies: coherence,
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 7. Cryptographic Audit Trail
|
||||
|
||||
All syndrome decisions are signed and chained for tamper-evident logging, following
|
||||
the existing ruQu pattern:
|
||||
|
||||
```rust
|
||||
use ed25519_dalek::{SigningKey, Signature, Signer};
|
||||
use blake3::Hasher;
|
||||
|
||||
/// Cryptographically auditable decision record.
|
||||
#[derive(Debug, Serialize, Deserialize)]
|
||||
pub struct AuditRecord {
|
||||
/// Sequence number in the audit chain.
|
||||
pub sequence: u64,
|
||||
|
||||
/// Blake3 hash of the previous record (chain linkage).
|
||||
pub previous_hash: [u8; 32],
|
||||
|
||||
/// Timestamp (nanosecond precision).
|
||||
pub timestamp_ns: u128,
|
||||
|
||||
/// The decision being recorded.
|
||||
pub decision: AuditableDecision,
|
||||
|
||||
/// Ed25519 signature over (sequence || previous_hash || timestamp || decision).
|
||||
pub signature: Signature,
|
||||
}
|
||||
|
||||
#[derive(Debug, Serialize, Deserialize)]
|
||||
pub enum AuditableDecision {
|
||||
/// Raw syndrome from simulation.
|
||||
SyndromeExtracted {
|
||||
round: u64,
|
||||
detections: Vec<Detection>,
|
||||
simulation_id: Uuid,
|
||||
},
|
||||
|
||||
/// Filtered syndrome after pipeline.
|
||||
SyndromeFiltered {
|
||||
round: u64,
|
||||
detections_before: usize,
|
||||
detections_after: usize,
|
||||
filters_applied: Vec<String>,
|
||||
},
|
||||
|
||||
/// Decoder correction decision.
|
||||
CorrectionApplied {
|
||||
round: u64,
|
||||
corrections: Vec<(QubitId, Pauli)>,
|
||||
confidence: f64,
|
||||
decode_time_ns: u64,
|
||||
},
|
||||
|
||||
/// Coherence gating decision.
|
||||
CoherenceGating {
|
||||
round: u64,
|
||||
region_id: String,
|
||||
coherence_energy: f64,
|
||||
decision: GatingDecision,
|
||||
flops_saved: u64,
|
||||
},
|
||||
|
||||
/// Early warning alert.
|
||||
EarlyWarning {
|
||||
round: u64,
|
||||
predicted_failure_round: u64,
|
||||
confidence: f64,
|
||||
affected_region: String,
|
||||
},
|
||||
|
||||
/// Logical error detected.
|
||||
LogicalError {
|
||||
round: u64,
|
||||
error_type: String,
|
||||
decoder_confidence: f64,
|
||||
},
|
||||
}
|
||||
|
||||
#[derive(Debug, Serialize, Deserialize)]
|
||||
pub enum GatingDecision {
|
||||
SkipDetailedSimulation,
|
||||
StandardSimulation,
|
||||
EnhancedSimulation,
|
||||
}
|
||||
|
||||
/// Audit trail manager.
|
||||
pub struct AuditTrail {
|
||||
signing_key: SigningKey,
|
||||
chain_head: [u8; 32],
|
||||
sequence: u64,
|
||||
}
|
||||
|
||||
impl AuditTrail {
|
||||
/// Record a decision in the audit trail.
|
||||
pub fn record(&mut self, decision: AuditableDecision) -> AuditRecord {
|
||||
let timestamp_ns = std::time::SystemTime::now()
|
||||
.duration_since(std::time::UNIX_EPOCH)
|
||||
.unwrap()
|
||||
.as_nanos();
|
||||
|
||||
// Compute hash of the decision content
|
||||
let mut hasher = Hasher::new();
|
||||
hasher.update(&self.sequence.to_le_bytes());
|
||||
hasher.update(&self.chain_head);
|
||||
hasher.update(×tamp_ns.to_le_bytes());
|
||||
hasher.update(&bincode::serialize(&decision).unwrap());
|
||||
let content_hash = hasher.finalize();
|
||||
|
||||
// Sign the hash
|
||||
let signature = self.signing_key.sign(content_hash.as_bytes());
|
||||
|
||||
let record = AuditRecord {
|
||||
sequence: self.sequence,
|
||||
previous_hash: self.chain_head,
|
||||
timestamp_ns,
|
||||
decision,
|
||||
signature,
|
||||
};
|
||||
|
||||
// Update chain
|
||||
self.chain_head = *content_hash.as_bytes();
|
||||
self.sequence += 1;
|
||||
|
||||
record
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 8. Early Warning Feedback Loop
|
||||
|
||||
ruQu's early warning system predicts correlated failures 100+ cycles ahead. This
|
||||
prediction feeds back into the simulation engine to validate decoder robustness:
|
||||
|
||||
```rust
|
||||
/// Early warning integration with quantum simulation.
|
||||
pub struct EarlyWarningIntegration {
|
||||
warning_system: EarlyWarningSystem,
|
||||
error_injector: ErrorInjector,
|
||||
}
|
||||
|
||||
impl EarlyWarningIntegration {
|
||||
/// Check early warning predictions and optionally inject
|
||||
/// targeted errors to validate decoder response.
|
||||
pub fn process_cycle(
|
||||
&mut self,
|
||||
history: &SyndromeHistory,
|
||||
state: &mut StateVector,
|
||||
code: &SurfaceCodeLayout,
|
||||
) -> Vec<EarlyWarningAction> {
|
||||
let predictions = self.warning_system.predict(history);
|
||||
let mut actions = Vec::new();
|
||||
|
||||
for prediction in &predictions {
|
||||
if prediction.confidence > 0.8 {
|
||||
// High-confidence prediction: inject targeted errors
|
||||
// to validate that the decoder handles this failure mode
|
||||
let targeted_errors = self.error_injector.generate_targeted(
|
||||
&prediction.affected_region,
|
||||
&prediction.predicted_error_pattern,
|
||||
code,
|
||||
);
|
||||
|
||||
actions.push(EarlyWarningAction::InjectTargetedErrors {
|
||||
region: prediction.affected_region.clone(),
|
||||
errors: targeted_errors,
|
||||
prediction_confidence: prediction.confidence,
|
||||
predicted_failure_round: prediction.failure_round,
|
||||
});
|
||||
|
||||
tracing::info!(
|
||||
confidence = prediction.confidence,
|
||||
failure_round = prediction.failure_round,
|
||||
region = %prediction.affected_region,
|
||||
"Early warning: injecting targeted errors for decoder validation"
|
||||
);
|
||||
} else if prediction.confidence > 0.5 {
|
||||
// Moderate confidence: increase monitoring, do not inject
|
||||
actions.push(EarlyWarningAction::IncreasedMonitoring {
|
||||
region: prediction.affected_region.clone(),
|
||||
enhanced_diagnostics: true,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
actions
|
||||
}
|
||||
}
|
||||
|
||||
pub enum EarlyWarningAction {
|
||||
/// Inject targeted errors to test decoder response.
|
||||
InjectTargetedErrors {
|
||||
region: String,
|
||||
errors: Vec<InjectedError>,
|
||||
prediction_confidence: f64,
|
||||
predicted_failure_round: u64,
|
||||
},
|
||||
/// Increase monitoring without error injection.
|
||||
IncreasedMonitoring {
|
||||
region: String,
|
||||
enhanced_diagnostics: bool,
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
### 9. Performance Targets
|
||||
|
||||
| Pipeline stage | Target latency | Distance-3 | Distance-5 | Distance-7 |
|
||||
|---|---|---|---|---|
|
||||
| Syndrome extraction (sim) | Varies | 2 ms | 15 ms | 80 ms |
|
||||
| Syndrome filtering | <0.5 ms | 0.1 ms | 0.2 ms | 0.4 ms |
|
||||
| MWPM decoding | <4 us | 1 us | 2 us | 3.5 us |
|
||||
| Correction application | <0.1 ms | 0.01 ms | 0.05 ms | 0.08 ms |
|
||||
| Coherence computation | <1 ms | 0.3 ms | 0.5 ms | 0.8 ms |
|
||||
| Audit record creation | <0.05 ms | 0.02 ms | 0.03 ms | 0.04 ms |
|
||||
| **Total cycle** | | **~3 ms** | **~16 ms** | **~82 ms** |
|
||||
|
||||
For distance-7 and above, the tensor network backend (ADR-QE-009) is used for
|
||||
the syndrome extraction simulation, as 97 qubits exceeds state-vector capacity.
|
||||
|
||||
### 10. Integration Data Flow Summary
|
||||
|
||||
```
|
||||
+-------------------+
|
||||
| QuantumCircuit | Surface code syndrome extraction circuit
|
||||
| (parameterized by | with noise model applied
|
||||
| error model) |
|
||||
+--------+----------+
|
||||
|
|
||||
v
|
||||
+--------+----------+
|
||||
| SimulationEngine | State vector (d<=5) or tensor network (d>=7)
|
||||
| execute() |
|
||||
+--------+----------+
|
||||
|
|
||||
| MeasurementOutcome (ancilla bitstring)
|
||||
v
|
||||
+--------+----------+
|
||||
| SyndromeBridge | Convert measurements to detection events
|
||||
| extract_syndrome()|
|
||||
+--------+----------+
|
||||
|
|
||||
| SyndromeRound
|
||||
v
|
||||
+--------+----------+
|
||||
| SyndromeFilter | Three-stage filtering (Structural|Shift|Evidence)
|
||||
| Pipeline |
|
||||
+--------+----------+
|
||||
|
|
||||
| FilteredSyndrome
|
||||
v
|
||||
+--------+----------+ +------------------+
|
||||
| MWPM Decoder |<--->| ruvector-mincut | Parallel decoding
|
||||
| (ruQu) | | graph partition | for large codes
|
||||
+--------+----------+ +------------------+
|
||||
|
|
||||
| DecoderCorrection (Pauli operators)
|
||||
v
|
||||
+--------+----------+
|
||||
| Correction Apply | Apply X/Z/Y Paulis to simulated state
|
||||
+--------+----------+
|
||||
|
|
||||
| Corrected state
|
||||
v
|
||||
+--------+--+------+-----+---+
|
||||
| | | |
|
||||
v v v v
|
||||
Coherence Early Warning Audit Trail
|
||||
Engine System (Ed25519 +
|
||||
(sheaf (100+ cycle Blake3)
|
||||
Laplacian) prediction)
|
||||
| |
|
||||
| +---> Feeds back to simulation
|
||||
| (targeted error injection)
|
||||
|
|
||||
+---> Coherence gating
|
||||
(skip/standard/enhanced sim)
|
||||
~50% FLOPs reduction when healthy
|
||||
```
|
||||
|
||||
### 11. API Surface
|
||||
|
||||
The complete integration is exposed through a high-level API:
|
||||
|
||||
```rust
|
||||
/// High-level QEC simulation with full pipeline integration.
|
||||
pub struct QecSimulator {
|
||||
engine: QuantumEngine,
|
||||
bridge: SyndromeBridge,
|
||||
filter: SyndromeFilterPipeline,
|
||||
decoder: Box<dyn SyndromeDecoder>,
|
||||
coherence: Option<CoherenceGatedSimulator>,
|
||||
early_warning: Option<EarlyWarningIntegration>,
|
||||
audit: AuditTrail,
|
||||
history: SyndromeHistory,
|
||||
}
|
||||
|
||||
impl QecSimulator {
|
||||
/// Run N cycles of QEC simulation.
|
||||
pub fn run_cycles(
|
||||
&mut self,
|
||||
code: &SurfaceCodeLayout,
|
||||
error_model: &ErrorModel,
|
||||
num_cycles: usize,
|
||||
) -> QecSimulationResult {
|
||||
let mut results = Vec::with_capacity(num_cycles);
|
||||
|
||||
for cycle in 0..num_cycles {
|
||||
let cycle_result = self.run_single_cycle(code, error_model, cycle);
|
||||
results.push(cycle_result);
|
||||
}
|
||||
|
||||
QecSimulationResult {
|
||||
cycles: results,
|
||||
logical_error_rate: self.compute_logical_error_rate(&results),
|
||||
total_flops_saved: results.iter().map(|r| r.flops_saved).sum(),
|
||||
decoder_latency_p99: self.compute_decoder_p99(&results),
|
||||
}
|
||||
}
|
||||
|
||||
fn run_single_cycle(
|
||||
&mut self,
|
||||
code: &SurfaceCodeLayout,
|
||||
error_model: &ErrorModel,
|
||||
cycle: usize,
|
||||
) -> CycleResult {
|
||||
// ... full pipeline as described above
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
1. **Unified pipeline**: Simulation, decoding, coherence analysis, and auditing
|
||||
work together seamlessly rather than as disconnected tools.
|
||||
2. **Real performance gains**: Coherence gating can reduce simulation FLOPs by
|
||||
~50% for healthy regions, directly applicable to long QEC simulations.
|
||||
3. **Decoder validation**: The simulation engine provides a controlled environment
|
||||
to test decoder correctness under various error models.
|
||||
4. **Early warning validation**: Predicted failures can be injected and the decoder's
|
||||
response verified, increasing confidence in the early warning system.
|
||||
5. **Auditable**: Every decision in the pipeline is cryptographically signed and
|
||||
hash-chained, meeting compliance requirements for safety-critical applications.
|
||||
6. **Leverages existing infrastructure**: `ruvector-mincut`, ruQu's decoder, and
|
||||
the coherence engine are reused rather than reimplemented.
|
||||
|
||||
### Negative
|
||||
|
||||
1. **Coupling**: The integration creates dependencies between previously independent
|
||||
crates. Changes to ruQu's syndrome format require updates to the bridge.
|
||||
Mitigation: trait abstractions at integration boundaries.
|
||||
2. **Complexity**: The full pipeline has many stages, each with its own configuration
|
||||
and failure modes. Mitigation: sensible defaults and the high-level `QecSimulator`
|
||||
API that hides complexity.
|
||||
3. **Performance overhead**: Coherence computation and audit trail signing add
|
||||
latency to each cycle. Mitigation: both are optional and can be disabled.
|
||||
4. **Tensor network dependency**: Distance >= 7 codes require the tensor network
|
||||
backend, which is behind a feature flag and may not always be compiled in.
|
||||
|
||||
### Risks and Mitigations
|
||||
|
||||
| Risk | Likelihood | Impact | Mitigation |
|
||||
|---|---|---|---|
|
||||
| Coherence gating skips a region that has real errors | Low | Missed errors | Conservative thresholds; periodic full-fidelity verification cycles |
|
||||
| MWPM decoder exceeds 4us on partitioned syndrome | Medium | Latency violation | Adaptive partition count; fallback to non-partitioned decode |
|
||||
| Early warning false positives cause unnecessary error injection | Medium | Wasted cycles | Confidence threshold (>0.8) gates injection; injection is rate-limited |
|
||||
| Audit trail storage grows unboundedly | Medium | Disk exhaustion | Configurable retention; periodic pruning of old records |
|
||||
| Syndrome format version mismatch between sim and decoder | Low | Decode failure | Version field in SyndromeRound; compatibility checks at pipeline init |
|
||||
|
||||
## References
|
||||
|
||||
- ruQu crate: boundary-to-boundary min-cut coherence gating
|
||||
- ruQu SyndromeFilter: three-filter pipeline (Structural | Shift | Evidence)
|
||||
- `ruvector-mincut` crate: graph partitioning for parallel decoding
|
||||
- ADR-014: Coherence Engine (sheaf Laplacian coherence computation)
|
||||
- ADR-CE-001: Sheaf Laplacian (mathematical foundation)
|
||||
- ADR-QE-001: Core Engine Architecture (simulation backends)
|
||||
- ADR-QE-009: Tensor Network Evaluation Mode (large code simulation)
|
||||
- ADR-QE-010: Observability & Monitoring (metrics for pipeline stages)
|
||||
- ADR-QE-011: Memory Gating & Power Management (resource constraints)
|
||||
- Fowler et al., "Surface codes: Towards practical large-scale quantum computation" (2012)
|
||||
- Higgott, "PyMatching: A Python package for decoding quantum codes with MWPM" (2022)
|
||||
- Dennis et al., "Topological quantum memory" (2002) -- MWPM decoding
|
||||
- Ed25519: https://ed25519.cr.yp.to/
|
||||
- Blake3: https://github.com/BLAKE3-team/BLAKE3
|
||||
|
|
@ -0,0 +1,816 @@
|
|||
# Quantum Simulation Engine: Domain-Driven Design - Integration Patterns
|
||||
|
||||
**Version**: 0.1
|
||||
**Date**: 2026-02-06
|
||||
**Status**: Draft
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This document defines the cross-domain integration patterns, anti-corruption layers, shared kernel, and context mapping that connect the quantum simulation engine (`ruqu-core`, `ruqu-algorithms`, `ruqu-wasm`) to the existing ruVector subsystems. It specifies how the simulation domain communicates with the coherence engine, agent system, graph database, and WASM platform without contaminating bounded context boundaries.
|
||||
|
||||
---
|
||||
|
||||
## Context Map
|
||||
|
||||
```
|
||||
+-------------------------------------------------------------------+
|
||||
| CONTEXT MAP |
|
||||
| |
|
||||
| +--------------------+ Shared Kernel +------------------+ |
|
||||
| | |<----(ruvector-math)--->| | |
|
||||
| | Quantum Sim | | Coherence | |
|
||||
| | Engine | | Engine | |
|
||||
| | (ruqu-core, | Anti-Corruption | (ruvector- | |
|
||||
| | ruqu-algorithms) |<----(CoherenceBridge) | coherence) | |
|
||||
| | | | | |
|
||||
| +--------+-----------+ +------------------+ |
|
||||
| | ^ |
|
||||
| | Customer-Supplier | |
|
||||
| v | |
|
||||
| +--------------------+ +---------+--------+ |
|
||||
| | | Partnership | | |
|
||||
| | Agent System |<----------------->| Graph Database | |
|
||||
| | (claude-flow) | | (ruvector-graph)| |
|
||||
| | | | | |
|
||||
| +--------------------+ +------------------+ |
|
||||
| | |
|
||||
| | Conformist |
|
||||
| v |
|
||||
| +--------------------+ Published Language |
|
||||
| | |<----(OpenQASM 3.0) |
|
||||
| | WASM Platform | |
|
||||
| | (ruqu-wasm) | |
|
||||
| | | |
|
||||
| +--------------------+ |
|
||||
+-------------------------------------------------------------------+
|
||||
```
|
||||
|
||||
### Relationship Summary
|
||||
|
||||
| Upstream | Downstream | Pattern | Shared Artifact |
|
||||
|----------|------------|---------|-----------------|
|
||||
| Quantum Engine | Coherence Engine | Anti-Corruption Layer | `CoherenceBridge` trait |
|
||||
| ruvector-math | Quantum Engine, Coherence Engine | Shared Kernel | `Complex<f64>`, SIMD traits |
|
||||
| Quantum Engine | Agent System | Customer-Supplier | `SimulationContract` |
|
||||
| ruvector-graph | Quantum Engine | Partnership | Adjacency structures |
|
||||
| External tools | Quantum Engine | Published Language | OpenQASM 3.0 |
|
||||
| WASM platform | ruqu-wasm | Conformist | WASM constraints accepted |
|
||||
|
||||
---
|
||||
|
||||
## 1. Anti-Corruption Layer: Coherence Bridge
|
||||
|
||||
The Coherence Bridge translates between the quantum simulation domain language and the ruQu coherence domain. It prevents internal types from either domain from leaking into the other.
|
||||
|
||||
### Purpose
|
||||
|
||||
- Map syndrome bitstrings produced by surface code experiments into the `SyndromeFilter` input format expected by the coherence engine
|
||||
- Map decoder correction outputs (Pauli operators) to gate operations the simulation can apply
|
||||
- Translate coherence scores into the `CoherenceScore` value object used by simulation sessions
|
||||
- Isolate the quantum simulation engine from changes in the coherence engine's internal API
|
||||
|
||||
### Interface
|
||||
|
||||
```rust
|
||||
/// Anti-corruption layer between quantum simulation and coherence engine.
|
||||
///
|
||||
/// All translation between bounded contexts passes through this trait.
|
||||
/// Neither domain's internal types appear on the wrong side of this boundary.
|
||||
pub trait CoherenceBridge: Send + Sync {
|
||||
/// Translate a quantum syndrome into a coherence engine filter input.
|
||||
///
|
||||
/// The simulation produces `SyndromeBits`; the coherence engine expects
|
||||
/// `DetectorBitmap` with specific tile routing. This method handles the
|
||||
/// mapping, including stabilizer-to-detector index translation.
|
||||
fn syndrome_to_filter_input(
|
||||
&self,
|
||||
syndrome: &SyndromeBits,
|
||||
code_distance: u32,
|
||||
) -> Result<CoherenceFilterInput, BridgeError>;
|
||||
|
||||
/// Translate a coherence decoder correction into Pauli gate operations.
|
||||
///
|
||||
/// The coherence engine's decoder outputs correction vectors in its own
|
||||
/// format. This method maps them to `PauliOp` sequences that the
|
||||
/// simulation engine can apply as gate operations.
|
||||
fn correction_to_pauli_ops(
|
||||
&self,
|
||||
correction: &CoherenceCorrectionOutput,
|
||||
) -> Result<Vec<(QubitIndex, PauliOp)>, BridgeError>;
|
||||
|
||||
/// Query the current coherence score for a simulation region.
|
||||
///
|
||||
/// Returns a domain-native `CoherenceScore` value object, hiding
|
||||
/// the coherence engine's internal energy representation.
|
||||
fn query_coherence_score(
|
||||
&self,
|
||||
region_id: &str,
|
||||
) -> Result<CoherenceScore, BridgeError>;
|
||||
|
||||
/// Submit simulation metrics to the coherence monitoring system.
|
||||
///
|
||||
/// Translates `SimulationMetrics` into the coherence engine's
|
||||
/// signal ingestion format without exposing internal types.
|
||||
fn report_simulation_metrics(
|
||||
&self,
|
||||
session_id: &str,
|
||||
metrics: &SimulationMetrics,
|
||||
) -> Result<(), BridgeError>;
|
||||
}
|
||||
|
||||
/// Opaque input type for the coherence filter (ACL boundary type).
|
||||
pub struct CoherenceFilterInput {
|
||||
pub detector_bitmap: Vec<u64>,
|
||||
pub tile_id: u8,
|
||||
pub round_id: u64,
|
||||
}
|
||||
|
||||
/// Opaque output type from the coherence decoder (ACL boundary type).
|
||||
pub struct CoherenceCorrectionOutput {
|
||||
pub corrections: Vec<(u32, u8)>, // (qubit_index, pauli_code)
|
||||
pub confidence: f64,
|
||||
}
|
||||
|
||||
/// Errors specific to the bridge translation layer.
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub enum BridgeError {
|
||||
#[error("syndrome dimension mismatch: expected {expected}, got {actual}")]
|
||||
SyndromeDimensionMismatch { expected: usize, actual: usize },
|
||||
|
||||
#[error("unknown correction code: {0}")]
|
||||
UnknownCorrectionCode(u8),
|
||||
|
||||
#[error("coherence engine unavailable: {0}")]
|
||||
CoherenceUnavailable(String),
|
||||
|
||||
#[error("tile routing failed for code distance {0}")]
|
||||
TileRoutingFailed(u32),
|
||||
}
|
||||
```
|
||||
|
||||
### Implementation Sketch
|
||||
|
||||
```rust
|
||||
/// Production implementation backed by the ruQu coherence engine.
|
||||
pub struct RuQuCoherenceBridge {
|
||||
/// Reference to the coherence engine's filter pipeline.
|
||||
filter_pipeline: Arc<dyn FilterPipelineAccess>,
|
||||
/// Stabilizer-to-detector mapping, precomputed per code distance.
|
||||
detector_maps: HashMap<u32, StabilizerDetectorMap>,
|
||||
}
|
||||
|
||||
impl CoherenceBridge for RuQuCoherenceBridge {
|
||||
fn syndrome_to_filter_input(
|
||||
&self,
|
||||
syndrome: &SyndromeBits,
|
||||
code_distance: u32,
|
||||
) -> Result<CoherenceFilterInput, BridgeError> {
|
||||
let map = self.detector_maps.get(&code_distance)
|
||||
.ok_or(BridgeError::TileRoutingFailed(code_distance))?;
|
||||
|
||||
let mut bitmap = vec![0u64; (map.detector_count + 63) / 64];
|
||||
for (stab_idx, &fired) in syndrome.0.iter().enumerate() {
|
||||
if fired {
|
||||
let det_idx = map.stabilizer_to_detector(stab_idx);
|
||||
bitmap[det_idx / 64] |= 1u64 << (det_idx % 64);
|
||||
}
|
||||
}
|
||||
|
||||
Ok(CoherenceFilterInput {
|
||||
detector_bitmap: bitmap,
|
||||
tile_id: map.tile_for_distance(code_distance),
|
||||
round_id: 0, // Filled by caller
|
||||
})
|
||||
}
|
||||
|
||||
fn correction_to_pauli_ops(
|
||||
&self,
|
||||
correction: &CoherenceCorrectionOutput,
|
||||
) -> Result<Vec<(QubitIndex, PauliOp)>, BridgeError> {
|
||||
correction.corrections.iter()
|
||||
.map(|(qubit, code)| {
|
||||
let op = match code {
|
||||
0 => PauliOp::I,
|
||||
1 => PauliOp::X,
|
||||
2 => PauliOp::Y,
|
||||
3 => PauliOp::Z,
|
||||
other => return Err(BridgeError::UnknownCorrectionCode(*other)),
|
||||
};
|
||||
Ok((QubitIndex(*qubit), op))
|
||||
})
|
||||
.collect()
|
||||
}
|
||||
|
||||
fn query_coherence_score(
|
||||
&self,
|
||||
region_id: &str,
|
||||
) -> Result<CoherenceScore, BridgeError> {
|
||||
let energy = self.filter_pipeline.current_energy(region_id)
|
||||
.map_err(|e| BridgeError::CoherenceUnavailable(e.to_string()))?;
|
||||
// Invert: high energy = low coherence
|
||||
Ok(CoherenceScore(1.0 / (1.0 + energy as f64)))
|
||||
}
|
||||
|
||||
fn report_simulation_metrics(
|
||||
&self,
|
||||
_session_id: &str,
|
||||
_metrics: &SimulationMetrics,
|
||||
) -> Result<(), BridgeError> {
|
||||
// Translate to coherence signal format and submit
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Shared Kernel: ruvector-math
|
||||
|
||||
Both the quantum simulation engine and the coherence engine depend on a shared mathematical foundation. Changes to `ruvector-math` must be validated against both domains before release.
|
||||
|
||||
### Shared Types
|
||||
|
||||
```rust
|
||||
// ruvector-math provides these types used by both domains:
|
||||
|
||||
/// Complex number with f64 components (re, im).
|
||||
/// Used by quantum state vectors AND coherence restriction maps.
|
||||
pub struct Complex<T> {
|
||||
pub re: T,
|
||||
pub im: T,
|
||||
}
|
||||
|
||||
/// Cache-line-aligned vector for SIMD operations.
|
||||
/// Used by both state vector operations and residual computation.
|
||||
#[repr(align(64))]
|
||||
pub struct AlignedVec<T> {
|
||||
data: Vec<T>,
|
||||
}
|
||||
|
||||
/// SIMD dispatch trait: implementations select AVX2, NEON, or scalar
|
||||
/// at runtime depending on platform capabilities.
|
||||
pub trait SimdOps {
|
||||
fn dot_product_f64(a: &[f64], b: &[f64]) -> f64;
|
||||
fn complex_multiply(a: &[Complex<f64>], b: &[Complex<f64>], out: &mut [Complex<f64>]);
|
||||
fn norm_squared(v: &[Complex<f64>]) -> f64;
|
||||
fn axpy(alpha: f64, x: &[f64], y: &mut [f64]);
|
||||
}
|
||||
```
|
||||
|
||||
### Change Coordination Protocol
|
||||
|
||||
1. Any proposed change to `ruvector-math` must include tests for both the quantum engine use case and the coherence engine use case.
|
||||
2. The CI pipeline runs `cargo test -p ruqu-core` and `cargo test -p ruvector-coherence` after any change to `ruvector-math`.
|
||||
3. Breaking changes require a version bump and simultaneous updates to both downstream crates.
|
||||
4. Performance regressions in SIMD operations must be caught by benchmarks in both domains.
|
||||
|
||||
### Boundary
|
||||
|
||||
Only the types and functions listed above cross the shared kernel boundary. Internal implementation details of `ruvector-math` (e.g., specific SIMD intrinsics, platform detection) are not shared.
|
||||
|
||||
---
|
||||
|
||||
## 3. Customer-Supplier: Agent System Integration
|
||||
|
||||
The ruVector agent system (powered by claude-flow) acts as the customer, invoking the quantum simulation engine as a supplier. The contract defines what the agent can request and what it receives in return.
|
||||
|
||||
### Contract
|
||||
|
||||
```rust
|
||||
/// Contract for agent system access to the quantum simulation engine.
|
||||
///
|
||||
/// The agent system (customer) invokes these operations.
|
||||
/// The quantum engine (supplier) fulfills them.
|
||||
pub trait SimulationContract: Send + Sync {
|
||||
/// Build a circuit from a high-level description.
|
||||
fn build_circuit(&self, spec: CircuitSpec) -> Result<CircuitHandle, ContractError>;
|
||||
|
||||
/// Run a simulation and return results.
|
||||
fn run_simulation(&self, circuit: CircuitHandle, config: RunConfig)
|
||||
-> Result<SimulationOutput, ContractError>;
|
||||
|
||||
/// Run a VQE optimization and return the ground state energy.
|
||||
fn run_vqe(&self, spec: VQESpec) -> Result<VQEOutput, ContractError>;
|
||||
|
||||
/// Query resource requirements before committing to a run.
|
||||
fn estimate_resources(&self, circuit: CircuitHandle) -> Result<ResourceEstimate, ContractError>;
|
||||
}
|
||||
|
||||
/// High-level circuit specification from the agent.
|
||||
pub struct CircuitSpec {
|
||||
pub qubit_count: u32,
|
||||
pub gate_sequence: Vec<GateSpec>,
|
||||
pub parameters: HashMap<String, f64>,
|
||||
}
|
||||
|
||||
/// Agent-facing gate specification (simplified from internal Gate).
|
||||
pub struct GateSpec {
|
||||
pub gate_type: String,
|
||||
pub target: u32,
|
||||
pub control: Option<u32>,
|
||||
pub angle: Option<f64>,
|
||||
}
|
||||
|
||||
/// Configuration limits the agent can set.
|
||||
pub struct RunConfig {
|
||||
pub max_shots: u32,
|
||||
pub max_memory_mb: u32,
|
||||
pub timeout_seconds: u32,
|
||||
pub backend_preference: Option<String>,
|
||||
}
|
||||
|
||||
/// Results returned to the agent.
|
||||
pub struct SimulationOutput {
|
||||
pub measurement_counts: HashMap<String, u32>,
|
||||
pub expectation_values: Vec<(String, f64)>,
|
||||
pub metrics: SimulationMetrics,
|
||||
}
|
||||
|
||||
/// VQE-specific results.
|
||||
pub struct VQEOutput {
|
||||
pub ground_state_energy: f64,
|
||||
pub optimal_parameters: Vec<f64>,
|
||||
pub iterations: u32,
|
||||
pub converged: bool,
|
||||
}
|
||||
|
||||
/// Resource estimate before execution.
|
||||
pub struct ResourceEstimate {
|
||||
pub memory_bytes: usize,
|
||||
pub estimated_time_ms: f64,
|
||||
pub qubit_count: u32,
|
||||
pub gate_count: u32,
|
||||
}
|
||||
```
|
||||
|
||||
### Agent Integration Flow
|
||||
|
||||
```
|
||||
Agent Context Quantum Engine Result
|
||||
| | |
|
||||
| 1. build_circuit() | |
|
||||
|--------------------->| |
|
||||
| CircuitHandle | |
|
||||
|<---------------------| |
|
||||
| | |
|
||||
| 2. estimate_resources| |
|
||||
|--------------------->| |
|
||||
| ResourceEstimate | |
|
||||
|<---------------------| |
|
||||
| | |
|
||||
| 3. run_simulation() | |
|
||||
|--------------------->| |
|
||||
| | [executes internally]|
|
||||
| |---+ |
|
||||
| | | circuit -> state |
|
||||
| | | gates -> measure |
|
||||
| |<--+ |
|
||||
| SimulationOutput | |
|
||||
|<---------------------| |
|
||||
| | |
|
||||
| 4. Agent acts on | |
|
||||
| results | |
|
||||
v v v
|
||||
```
|
||||
|
||||
### Resource Limits
|
||||
|
||||
The supplier enforces resource limits set by the customer:
|
||||
|
||||
- Memory: Capped at `max_memory_mb`; returns error if state vector exceeds budget
|
||||
- Time: Monitored per-step; simulation aborted if `timeout_seconds` exceeded
|
||||
- Qubits: Platform limit (30 for state vector, higher for tensor network) communicated via `estimate_resources`
|
||||
|
||||
---
|
||||
|
||||
## 4. Published Language: OpenQASM Compatibility
|
||||
|
||||
A future integration point for importing and exporting circuits in the OpenQASM 3.0 standard, enabling interoperability with IBM Qiskit, Google Cirq, and other quantum frameworks.
|
||||
|
||||
### Translation Layer
|
||||
|
||||
```rust
|
||||
/// Trait for OpenQASM import/export.
|
||||
pub trait OpenQASMTranslator {
|
||||
/// Parse an OpenQASM 3.0 string into the internal circuit representation.
|
||||
fn import(&self, qasm: &str) -> Result<QuantumCircuit, TranslationError>;
|
||||
|
||||
/// Export an internal circuit to OpenQASM 3.0 format.
|
||||
fn export(&self, circuit: &QuantumCircuit) -> Result<String, TranslationError>;
|
||||
}
|
||||
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub enum TranslationError {
|
||||
#[error("unsupported gate in OpenQASM: {0}")]
|
||||
UnsupportedGate(String),
|
||||
|
||||
#[error("parse error at line {line}: {message}")]
|
||||
ParseError { line: u32, message: String },
|
||||
|
||||
#[error("circuit uses features not supported by OpenQASM 3.0: {0}")]
|
||||
UnsupportedFeature(String),
|
||||
}
|
||||
```
|
||||
|
||||
### Scope
|
||||
|
||||
- Phase 1: Import basic gate circuits (H, CNOT, Rz, measure)
|
||||
- Phase 2: Export circuits with parameter bindings
|
||||
- Phase 3: Support custom gate definitions and classical control flow
|
||||
|
||||
---
|
||||
|
||||
## 5. Conformist: WASM Platform
|
||||
|
||||
The `ruqu-wasm` crate conforms to WASM platform constraints without attempting to work around them. Limitations are accepted as-is, with graceful degradation where capabilities are reduced.
|
||||
|
||||
### Accepted Constraints
|
||||
|
||||
| Constraint | Impact | Mitigation |
|
||||
|------------|--------|------------|
|
||||
| No native threads | Single-threaded execution | Sequential gate application; no rayon |
|
||||
| 4GB memory limit | Max ~25 qubits (state vector) | Tensor network backend for larger circuits |
|
||||
| No filesystem | Cannot persist results | Return all data via JS callbacks |
|
||||
| No system clock | Timing metrics unavailable | Use `performance.now()` via JS bridge |
|
||||
| No SIMD (some runtimes) | Slower math | Feature-gated SIMD; scalar fallback |
|
||||
|
||||
### WASM API Surface
|
||||
|
||||
```rust
|
||||
/// Public API exposed to JavaScript via wasm-bindgen.
|
||||
///
|
||||
/// This is the conformist boundary: we accept WASM constraints
|
||||
/// and expose only what the platform allows.
|
||||
#[cfg(target_arch = "wasm32")]
|
||||
pub mod wasm_api {
|
||||
use wasm_bindgen::prelude::*;
|
||||
|
||||
#[wasm_bindgen]
|
||||
pub struct WasmSimulator {
|
||||
session: SimulationSession,
|
||||
}
|
||||
|
||||
#[wasm_bindgen]
|
||||
impl WasmSimulator {
|
||||
/// Create a new simulator for the given qubit count.
|
||||
#[wasm_bindgen(constructor)]
|
||||
pub fn new(qubit_count: u32) -> Result<WasmSimulator, JsValue> {
|
||||
// Enforce WASM-specific qubit limit
|
||||
if qubit_count > 25 {
|
||||
return Err(JsValue::from_str(
|
||||
"WASM platform supports at most 25 qubits in state vector mode"
|
||||
));
|
||||
}
|
||||
// ... construction
|
||||
Ok(WasmSimulator { session: todo!() })
|
||||
}
|
||||
|
||||
/// Add a gate to the circuit.
|
||||
pub fn add_gate(&mut self, gate_type: &str, target: u32, control: Option<u32>)
|
||||
-> Result<(), JsValue> { Ok(()) }
|
||||
|
||||
/// Run the simulation and return measurement counts as JSON.
|
||||
pub fn run(&mut self, shots: u32) -> Result<String, JsValue> {
|
||||
Ok("{}".to_string())
|
||||
}
|
||||
|
||||
/// Get memory usage estimate in bytes.
|
||||
pub fn memory_estimate(&self) -> usize { 0 }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Partnership: Graph Database Integration
|
||||
|
||||
The `ruvector-graph` crate and the quantum simulation engine have a bidirectional partnership around graph-structured problems, particularly QAOA and MaxCut.
|
||||
|
||||
### Data Flow
|
||||
|
||||
```rust
|
||||
/// Graph data provided by ruvector-graph for quantum optimization.
|
||||
pub struct GraphProblem {
|
||||
pub vertex_count: u32,
|
||||
pub edges: Vec<(u32, u32, f64)>, // (source, target, weight)
|
||||
pub problem_type: GraphProblemType,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy)]
|
||||
pub enum GraphProblemType { MaxCut, GraphColoring, TSP }
|
||||
|
||||
/// Results returned to ruvector-graph for annotation.
|
||||
pub struct QuantumGraphResult {
|
||||
pub objective_value: CutValue,
|
||||
pub partition: Vec<bool>,
|
||||
pub confidence: f64,
|
||||
pub circuit_depth: CircuitDepth,
|
||||
}
|
||||
|
||||
/// Partnership interface: both sides contribute and consume.
|
||||
pub trait GraphQuantumPartnership {
|
||||
/// Graph -> Quantum: convert graph problem to QAOA circuit.
|
||||
fn graph_to_qaoa_circuit(
|
||||
&self,
|
||||
problem: &GraphProblem,
|
||||
layers: u32,
|
||||
) -> Result<QuantumCircuit, DomainError>;
|
||||
|
||||
/// Quantum -> Graph: feed optimization results back as graph annotations.
|
||||
fn annotate_graph_with_result(
|
||||
&self,
|
||||
problem: &GraphProblem,
|
||||
result: &QuantumGraphResult,
|
||||
) -> Result<GraphAnnotation, DomainError>;
|
||||
|
||||
/// Shared interest: partition graph using ruvector-mincut for subproblem decomposition.
|
||||
fn decompose_problem(
|
||||
&self,
|
||||
problem: &GraphProblem,
|
||||
max_subproblem_qubits: u32,
|
||||
) -> Result<Vec<GraphProblem>, DomainError>;
|
||||
}
|
||||
|
||||
/// Annotation written back to the graph database.
|
||||
pub struct GraphAnnotation {
|
||||
pub vertex_labels: HashMap<u32, String>,
|
||||
pub edge_labels: HashMap<(u32, u32), String>,
|
||||
pub metadata: HashMap<String, String>,
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Cross-Cutting Concerns
|
||||
|
||||
### Error Handling Across Boundaries
|
||||
|
||||
Each bounded context defines its own error type. At integration boundaries, errors are translated through the ACL rather than propagated directly.
|
||||
|
||||
```rust
|
||||
/// Integration boundary error: wraps domain errors from either side.
|
||||
#[derive(Debug, thiserror::Error)]
|
||||
pub enum IntegrationError {
|
||||
#[error("quantum engine error: {0}")]
|
||||
QuantumEngine(#[from] DomainError),
|
||||
|
||||
#[error("coherence bridge error: {0}")]
|
||||
CoherenceBridge(#[from] BridgeError),
|
||||
|
||||
#[error("contract violation: {0}")]
|
||||
ContractViolation(String),
|
||||
|
||||
#[error("resource limit exceeded: {0}")]
|
||||
ResourceLimit(String),
|
||||
}
|
||||
```
|
||||
|
||||
### Observability
|
||||
|
||||
Distributed tracing spans cross crate boundaries with a shared trace context.
|
||||
|
||||
- Each integration call propagates a `TraceId` through the ACL
|
||||
- The coherence bridge logs translation events at `DEBUG` level
|
||||
- Agent contract calls log at `INFO` with duration and resource usage
|
||||
- WASM calls use `console.log` via the JS bridge when tracing is enabled
|
||||
|
||||
### Resource Management
|
||||
|
||||
Memory and thread resources are coordinated with the ruVector runtime.
|
||||
|
||||
- State vector allocation checks the global memory budget before proceeding
|
||||
- Tensor network contractions respect thread pool limits shared with rayon
|
||||
- WASM mode has a fixed 4GB ceiling enforced at the conformist boundary
|
||||
- All resource allocation events emit `MemoryAllocated` / `MemoryReleased` domain events
|
||||
|
||||
### Configuration Propagation
|
||||
|
||||
Configuration flows from the ruVector root config into the quantum engine.
|
||||
|
||||
```rust
|
||||
/// Quantum engine configuration derived from ruVector global config.
|
||||
pub struct QuantumEngineConfig {
|
||||
pub max_qubits: u32,
|
||||
pub default_backend: BackendType,
|
||||
pub memory_budget_bytes: usize,
|
||||
pub thread_count: usize,
|
||||
pub coherence_bridge_enabled: bool,
|
||||
pub wasm_mode: bool,
|
||||
}
|
||||
|
||||
impl From<&RuVectorConfig> for QuantumEngineConfig {
|
||||
fn from(global: &RuVectorConfig) -> Self {
|
||||
Self {
|
||||
max_qubits: global.quantum.max_qubits.unwrap_or(30),
|
||||
default_backend: global.quantum.backend.parse().unwrap_or(BackendType::StateVector),
|
||||
memory_budget_bytes: global.memory.budget_bytes,
|
||||
thread_count: global.runtime.thread_count,
|
||||
coherence_bridge_enabled: global.coherence.enabled,
|
||||
wasm_mode: cfg!(target_arch = "wasm32"),
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Event Flow Diagrams
|
||||
|
||||
### 1. VQE Optimization Flow
|
||||
|
||||
```
|
||||
Agent CircuitBuilder SimSession QuantumState Optimizer
|
||||
| | | | |
|
||||
| build_circuit(spec) | | | |
|
||||
|-------------------->| | | |
|
||||
| CircuitHandle | | | |
|
||||
|<--------------------| | | |
|
||||
| | | | |
|
||||
| run_vqe(spec) | | | |
|
||||
|-------------------------------------------------------------->| |
|
||||
| | | | init(params) |
|
||||
| | | |<---------------|
|
||||
| | | | |
|
||||
| | +-----|---LOOP----------|--------+ |
|
||||
| | | | | | |
|
||||
| | | start() | | |
|
||||
| | | |----->| | | |
|
||||
| | | | apply_gates() | | |
|
||||
| | | | |---------->| | |
|
||||
| | | | | expectation_value | |
|
||||
| | | | |---------->| | |
|
||||
| | | | | energy | | |
|
||||
| | | |<----|-----------| | |
|
||||
| | | | | update(grad) |
|
||||
| | | | |------->| |
|
||||
| | | | | new_params |
|
||||
| | | | |<-------| |
|
||||
| | +-----|---END LOOP------|--------+ |
|
||||
| | | | |
|
||||
| VQEOutput(energy, params) | | |
|
||||
|<-------------------------------------------------------------| |
|
||||
| | | | |
|
||||
```
|
||||
|
||||
### 2. Surface Code QEC with Coherence Bridge
|
||||
|
||||
```
|
||||
SurfaceCodeExp NoiseService CoherenceBridge ruQu Filters Decoder
|
||||
| | | | |
|
||||
| run_cycle() | | | |
|
||||
|--+ | | | |
|
||||
| | inject_errors() | | | |
|
||||
| |---------------->| | | |
|
||||
| | error_list | | | |
|
||||
| |<----------------| | | |
|
||||
| | | | | |
|
||||
| | extract_syndrome() | | |
|
||||
| |--+ | | | |
|
||||
| | | SyndromeBits | | | |
|
||||
| |<-+ | | | |
|
||||
| | | | | |
|
||||
| | syndrome_to_filter_input() | | |
|
||||
| |--------------------------------->| | |
|
||||
| | | FilterInput | | |
|
||||
| | | | process() | |
|
||||
| | | |----------------->| |
|
||||
| | | | Verdict | |
|
||||
| | | |<-----------------| |
|
||||
| | | | | |
|
||||
| | | correction_to_pauli_ops() | |
|
||||
| |<---------------------------------| | |
|
||||
| | | | | |
|
||||
| | decode(syndrome)| | | |
|
||||
| |------------------------------------------------------------------>|
|
||||
| | correction | | | |
|
||||
| |<------------------------------------------------------------------|
|
||||
| | | | | |
|
||||
| | check_logical_error() | | |
|
||||
| |--+ | | | |
|
||||
| | | bool | | | |
|
||||
| |<-+ | | | |
|
||||
| | | | | |
|
||||
| CycleReport | | | |
|
||||
|<-+ | | | |
|
||||
```
|
||||
|
||||
### 3. WASM Deployment Flow
|
||||
|
||||
```
|
||||
Browser JS ruqu-wasm (WASM) ruqu-core Results
|
||||
| | | |
|
||||
| new WasmSimulator(n) | | |
|
||||
|--------------------->| | |
|
||||
| | QuantumState::new(n)| |
|
||||
| |-------------------->| |
|
||||
| | state | |
|
||||
| |<--------------------| |
|
||||
| WasmSimulator | | |
|
||||
|<---------------------| | |
|
||||
| | | |
|
||||
| add_gate("h", 0) | | |
|
||||
|--------------------->| | |
|
||||
| | circuit.add_gate() | |
|
||||
| |-------------------->| |
|
||||
| Ok | | |
|
||||
|<---------------------| | |
|
||||
| | | |
|
||||
| add_gate("cx", 1, 0) | | |
|
||||
|--------------------->| | |
|
||||
| | circuit.add_gate() | |
|
||||
| |-------------------->| |
|
||||
| Ok | | |
|
||||
|<---------------------| | |
|
||||
| | | |
|
||||
| run(1000) | | |
|
||||
|--------------------->| | |
|
||||
| | session.start() | |
|
||||
| |-------------------->| |
|
||||
| | run_to_completion() | |
|
||||
| |-------------------->| |
|
||||
| | | [gate loop] |
|
||||
| | |---+ |
|
||||
| | | | apply_gate() |
|
||||
| | |<--+ |
|
||||
| | | measure() |
|
||||
| | |---+ |
|
||||
| | | | outcomes |
|
||||
| | |<--+ |
|
||||
| | SimulationMetrics | |
|
||||
| |<--------------------| |
|
||||
| | | |
|
||||
| | JSON.serialize(counts) |
|
||||
| |---------------------------------------->|
|
||||
| "{\"00\": 503, \"11\": 497}" | |
|
||||
|<---------------------| | |
|
||||
| | | |
|
||||
| [JS callback with results] | |
|
||||
| | | |
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Migration Strategy
|
||||
|
||||
### Phase 1: Standalone ruqu-core
|
||||
|
||||
**Goal**: A self-contained crate with no external dependencies except `ruvector-math`.
|
||||
|
||||
- Implement `QuantumCircuit`, `QuantumState`, `SimulationSession` aggregates
|
||||
- Implement `CircuitBuilder`, `GateFusionService`, `NoiseInjectionService`
|
||||
- All value objects and domain events defined
|
||||
- Unit tests and property-based tests for normalization, gate unitarity
|
||||
- No coherence bridge, no agent integration, no WASM
|
||||
|
||||
**Dependency**: `ruvector-math` (shared kernel only)
|
||||
|
||||
### Phase 2: ruqu-algorithms + Coherence Integration
|
||||
|
||||
**Goal**: Add VQE, surface code experiments, and the coherence bridge.
|
||||
|
||||
- Implement `VQEOptimization`, `SurfaceCodeExperiment` aggregates
|
||||
- Implement `TensorNetworkState` for circuits exceeding state vector limits
|
||||
- Build `CoherenceBridge` anti-corruption layer
|
||||
- Integrate with ruQu `FilterPipeline` and `MWPMDecoder`
|
||||
- Add `PauliExpectationService`, `ContractionPathOptimizer`
|
||||
- Integration tests: VQE convergence, surface code logical error rate vs theory
|
||||
|
||||
**Dependencies**: `ruqu-core`, `ruvector-math`, `ruqu` (coherence bridge target)
|
||||
|
||||
### Phase 3: ruqu-wasm
|
||||
|
||||
**Goal**: Deploy to browser environments with graceful degradation.
|
||||
|
||||
- Implement `WasmSimulator` conformist wrapper
|
||||
- Add `wasm-bindgen` API surface
|
||||
- Enforce WASM constraints (25-qubit limit, no threads, no filesystem)
|
||||
- JavaScript test harness running circuits in headless browser
|
||||
- Performance benchmarks: gate throughput in WASM vs native
|
||||
|
||||
**Dependencies**: `ruqu-core`, `wasm-bindgen`, `wasm-pack`
|
||||
|
||||
### Phase 4: Full Agent System Integration
|
||||
|
||||
**Goal**: Complete customer-supplier integration with the claude-flow agent system.
|
||||
|
||||
- Implement `SimulationContract` trait and production adapter
|
||||
- Add resource estimation and budget enforcement
|
||||
- Implement `GraphQuantumPartnership` for QAOA/MaxCut
|
||||
- Integration with `ruvector-graph` for graph problem decomposition
|
||||
- End-to-end tests: agent builds circuit, runs simulation, acts on results
|
||||
- OpenQASM import/export (published language)
|
||||
|
||||
**Dependencies**: All previous phases, `ruvector-graph`, `claude-flow` agent SDK
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
1. Evans, E. (2003). "Domain-Driven Design: Tackling Complexity in the Heart of Software."
|
||||
2. Vernon, V. (2013). "Implementing Domain-Driven Design." Chapter 13: Integrating Bounded Contexts.
|
||||
3. Coherence Engine DDD: `docs/architecture/coherence-engine-ddd.md`
|
||||
4. ruQu crate: `crates/ruQu/`
|
||||
5. ruvector-math: shared kernel for SIMD and complex number operations
|
||||
6. OpenQASM 3.0 specification: https://openqasm.com/
|
||||
530
docs/architecture/quantum-engine/quantum-engine-ddd-strategic.md
Normal file
530
docs/architecture/quantum-engine/quantum-engine-ddd-strategic.md
Normal file
|
|
@ -0,0 +1,530 @@
|
|||
# Quantum Simulation Engine: Domain-Driven Design - Strategic Design
|
||||
|
||||
**Version**: 0.1
|
||||
**Date**: 2026-02-06
|
||||
**Status**: Draft
|
||||
|
||||
---
|
||||
|
||||
## Domain Vision
|
||||
|
||||
The Quantum Simulation Engine provides **on-device quantum algorithm experimentation** within ruVector's always-on, agentic environment. It enables hybrid classical-quantum research on edge devices, allowing agents to leverage quantum algorithms (VQE, Grover, QAOA, QEC) without cloud services.
|
||||
|
||||
> **This is not a cloud quantum API.** The engine answers: "What does this quantum circuit produce?" entirely on the local device, using classical state-vector simulation with SIMD acceleration.
|
||||
|
||||
The engine follows ruVector's event-driven model: **inert when idle, activated on demand, resources released immediately**. A 20-qubit simulation allocates 16 MiB of state vector on activation and frees it the moment the circuit completes. No background threads, no persistent memory, no warm pools.
|
||||
|
||||
### The Universal Simulation Object
|
||||
|
||||
The power lies in a **single underlying state-vector engine** inside ruqu-sim. Once the linear algebra is fixed, everything else becomes interpretation:
|
||||
|
||||
| Domain | Qubits Become | Gates Become | Measurement Becomes | Circuit Becomes |
|
||||
|--------|---------------|--------------|---------------------|-----------------|
|
||||
| **Chemistry** | Molecular orbitals | Fermionic operators | Energy estimates | VQE ansatz |
|
||||
| **Optimization** | Decision variables | Mixing/cost ops | Cut values | QAOA circuit |
|
||||
| **Search** | Database indices | Oracle + diffusion | Found element | Grover iterations |
|
||||
| **Error Correction** | Data + ancilla qubits | Stabilizer checks | Syndrome bits | QEC cycle |
|
||||
| **Cryptography** | Key register bits | Quantum Fourier transform | Period estimate | Shor subroutine |
|
||||
| **Machine Learning** | Feature dimensions | Parameterized rotations | Classification | Quantum kernel |
|
||||
|
||||
**Same linear algebra, different interpretations. Same state vector = superposition. Same measurement = probabilistic collapse with Born rule.**
|
||||
|
||||
---
|
||||
|
||||
## Strategic Design
|
||||
|
||||
### Core Domain
|
||||
|
||||
**Quantum State Simulation** - The heart of the system, managing quantum state vectors, applying unitary gate operations, and performing projective measurements. This is where the primary complexity and innovation reside. **Most circuits run in a single fast pass; only large entangled states or iterative variational loops require sustained computation.**
|
||||
|
||||
### Supporting Domains
|
||||
|
||||
1. **Circuit Construction** - Building, validating, and optimizing quantum circuits
|
||||
2. **State Management** - State vector lifecycle, entanglement tracking, memory gating
|
||||
3. **Measurement & Observation** - Projective measurement, expectation values, syndrome extraction
|
||||
4. **Algorithm Execution** - High-level quantum algorithm implementations (VQE, Grover, QAOA, QEC)
|
||||
5. **Optimization & Backend** - SIMD acceleration, gate fusion, tensor network backends
|
||||
6. **Deployment & Integration** - WASM compilation, agent bridge, coherence bridge to ruQu
|
||||
|
||||
### Generic Domains
|
||||
|
||||
1. **Linear Algebra** - Complex number math, matrix-vector products, Kronecker products (via `ruvector-math`)
|
||||
2. **Random Sampling** - Measurement outcome sampling, noise injection (via `rand` crate)
|
||||
3. **Logging/Tracing** - Event recording, performance metrics (via `tracing` crate + `ruvector-metrics`)
|
||||
|
||||
### Application Evolution
|
||||
|
||||
| Timeline | Capabilities | Key Value |
|
||||
|----------|-------------|-----------|
|
||||
| **Phase 1 (Now)** | State vector sim, basic gates, VQE/Grover/QAOA | Local quantum experimentation without cloud |
|
||||
| **Phase 2 (6mo)** | Tensor networks, noise models, surface code cycles | Error correction research on edge devices |
|
||||
| **Phase 3 (12mo)** | GPU acceleration, OpenQASM 3.0 import, 30+ qubits | Production-grade quantum algorithm research |
|
||||
| **Phase 4 (24mo)** | Quantum hardware bridge, hybrid cloud-local execution | Real quantum device integration |
|
||||
|
||||
> **Edge-First Quantum**: The system eventually enables agents to reason about quantum algorithms without any network dependency.
|
||||
|
||||
---
|
||||
|
||||
## Ecosystem Integration Map
|
||||
|
||||
```
|
||||
+---------------------------------------------------------------------------+
|
||||
| QUANTUM SIMULATION ENGINE |
|
||||
| |
|
||||
| +-------------------------------------------------------------------+ |
|
||||
| | CIRCUIT CONSTRUCTION DOMAIN | |
|
||||
| | QuantumCircuit | Gate | GateSchedule | CircuitOptimizer | |
|
||||
| | Parameterized templates (VQE ansatz, QAOA mixer, Grover oracle) | |
|
||||
| +-------------------------------------------------------------------+ |
|
||||
| | |
|
||||
| v |
|
||||
| +-----------------------------+ +-----------------------------+ |
|
||||
| | CORE: QUANTUM STATE | | STATE MANAGEMENT | |
|
||||
| | SIMULATION |<-| DOMAIN | |
|
||||
| | | | | |
|
||||
| | * State vector engine | | * Allocation / deallocation | |
|
||||
| | * Gate application (SIMD) | | * Entanglement tracking | |
|
||||
| | * Unitary evolution | | * Memory gating (zero-idle) | |
|
||||
| | * Tensor contraction | | * State checkpointing | |
|
||||
| +-----------------------------+ +-----------------------------+ |
|
||||
| | | | |
|
||||
| v v v |
|
||||
| +-----------------------------+ +-----------------------------+ |
|
||||
| | MEASUREMENT & | | ALGORITHM EXECUTION | |
|
||||
| | OBSERVATION DOMAIN | | DOMAIN | |
|
||||
| | | | | |
|
||||
| | * Projective measurement | | * VQE + classical optimizer | |
|
||||
| | * Expectation values | | * Grover auto-iteration | |
|
||||
| | * Shot-based sampling | | * QAOA graph-based circuits | |
|
||||
| | * Syndrome extraction | | * Surface code + decoder | |
|
||||
| +-----------------------------+ +-----------------------------+ |
|
||||
| | |
|
||||
| v |
|
||||
| +-----------------------------+ +-----------------------------+ |
|
||||
| | OPTIMIZATION & | | DEPLOYMENT & | |
|
||||
| | BACKEND DOMAIN | | INTEGRATION DOMAIN | |
|
||||
| | | | | |
|
||||
| | * SIMD dispatch | | * WASM bindings (ruqu-wasm) | |
|
||||
| | * Gate fusion | | * Agent bridge (activation) | |
|
||||
| | * Tensor network backend | | * Observability / metrics | |
|
||||
| | * Cache-local strategies | | * Coherence bridge (ruQu) | |
|
||||
| +-----------------------------+ +-----------------------------+ |
|
||||
| |
|
||||
+---------------------------------------------------------------------------+
|
||||
|
|
||||
+--------------------+---------------------+
|
||||
| | |
|
||||
v v v
|
||||
+--------------+ +-----------------+ +------------------+
|
||||
| ruvector- | | ruvector- | | ruQu |
|
||||
| math (SIMD) | | metrics | | (decoder bridge) |
|
||||
+--------------+ +-----------------+ +------------------+
|
||||
| |
|
||||
v v
|
||||
+--------------+ +-----------------+ +------------------+
|
||||
| ruvector- | | ruvector- | | cognitum-gate- |
|
||||
| graph | | nervous-system | | kernel (tiles) |
|
||||
+--------------+ +-----------------+ +------------------+
|
||||
| |
|
||||
v v
|
||||
+--------------+ +-----------------+
|
||||
| ruvector- | | sona (adaptive |
|
||||
| mincut | | learning) |
|
||||
+--------------+ +-----------------+
|
||||
```
|
||||
|
||||
### Crate-to-Context Mapping
|
||||
|
||||
| Bounded Context | Primary Crate | Supporting Crates |
|
||||
|-----------------|---------------|-------------------|
|
||||
| Circuit Construction | `ruqu-sim` (new) | - |
|
||||
| Quantum State Simulation (Core) | `ruqu-sim` (new) | `ruvector-math` |
|
||||
| State Management | `ruqu-sim` (new) | - |
|
||||
| Measurement & Observation | `ruqu-sim` (new) | `rand` |
|
||||
| Algorithm Execution | `ruqu-sim` (new) | `ruvector-graph` (QAOA) |
|
||||
| Optimization & Backend | `ruqu-sim` (new) | `ruvector-math` (SIMD) |
|
||||
| Deployment & Integration | `ruqu-wasm` (new) | `ruqu`, `ruvector-metrics`, `ruvector-nervous-system` |
|
||||
|
||||
---
|
||||
|
||||
## Context Map
|
||||
|
||||
```
|
||||
+-----------------------------------------------------------------------+
|
||||
| QUANTUM ENGINE CONTEXT MAP |
|
||||
| |
|
||||
| [Published Language] |
|
||||
| OpenQASM 3.0 format |
|
||||
| | |
|
||||
| v |
|
||||
| +------------------+ +------------------+ |
|
||||
| | | Shared | | |
|
||||
| | CIRCUIT | Kernel | STATE | |
|
||||
| | CONSTRUCTION |<------->| MANAGEMENT | |
|
||||
| | | (Gate, | | |
|
||||
| | Builds circuits | QubitIdx| Allocates and | |
|
||||
| | Validates gates | types) | tracks state | |
|
||||
| +--------+---------+ +--------+---------+ |
|
||||
| | | |
|
||||
| | Customer | Customer |
|
||||
| | Supplier | Supplier |
|
||||
| v v |
|
||||
| +------------------+ +------------------+ |
|
||||
| | | | | |
|
||||
| | MEASUREMENT & |-------->| ALGORITHM | |
|
||||
| | OBSERVATION |Supplier | EXECUTION | |
|
||||
| | |Customer | | |
|
||||
| | Measures states | | Runs VQE/QAOA/ | |
|
||||
| | Extracts syndr. | | Grover/QEC | |
|
||||
| +--------+---------+ +--------+---------+ |
|
||||
| | | |
|
||||
| +------------+---------------+ |
|
||||
| | |
|
||||
| v |
|
||||
| +------------------+ +------------------+ |
|
||||
| | | | | |
|
||||
| | OPTIMIZATION & | | DEPLOYMENT & | |
|
||||
| | BACKEND | | INTEGRATION | |
|
||||
| | | | | |
|
||||
| | SIMD, fusion, | | WASM, agents, | |
|
||||
| | tensor networks | | ruQu bridge | |
|
||||
| +------------------+ +--------+---------+ |
|
||||
| | |
|
||||
| Conformist | Anti-Corruption |
|
||||
| (ruVector | Layer |
|
||||
| APIs) | (ruQu decoder) |
|
||||
| | |
|
||||
+--------------------------------------------------+---------------------+
|
||||
|
|
||||
v
|
||||
[Existing ruVector Ecosystem]
|
||||
|
||||
Context Relationships:
|
||||
<-------> Shared Kernel (shared types across boundary)
|
||||
-------> Customer-Supplier (downstream depends on upstream)
|
||||
Conformist: Deployment conforms to existing ruVector APIs
|
||||
ACL: CoherenceBridge wraps ruQu decoder behind anti-corruption layer
|
||||
Published Language: OpenQASM 3.0 for circuit interchange
|
||||
Open Host Service: ruqu-wasm exposes JS API
|
||||
```
|
||||
|
||||
### Relationship Summary
|
||||
|
||||
| Upstream | Downstream | Pattern | Shared Types |
|
||||
|----------|------------|---------|-------------|
|
||||
| Circuit Construction | State Management | **Shared Kernel** | `Gate`, `QubitIndex`, `GateMatrix` |
|
||||
| Measurement & Observation | Algorithm Execution | **Customer-Supplier** | `MeasurementOutcome`, `ExpectationValue` |
|
||||
| State Management | Algorithm Execution | **Customer-Supplier** | `QuantumState`, `StateCheckpoint` |
|
||||
| State Management | Measurement & Observation | **Customer-Supplier** | `QuantumState`, `Amplitude` |
|
||||
| Optimization & Backend | Core Simulation | **Partnership** | `FusedGateMatrix`, `OptimizationHint` |
|
||||
| Existing ruVector APIs | Deployment & Integration | **Conformist** | ruVector event types, metric types |
|
||||
| ruQu decoder API | Deployment & Integration | **Anti-Corruption Layer** | Isolated behind `CoherenceBridge` |
|
||||
| Circuit Construction | External tools | **Published Language** | OpenQASM 3.0 circuit format |
|
||||
| Deployment & Integration | JS consumers | **Open Host Service** | `ruqu-wasm` JS API |
|
||||
|
||||
---
|
||||
|
||||
## Ubiquitous Language
|
||||
|
||||
### Quantum Fundamentals
|
||||
|
||||
| Term | Definition |
|
||||
|------|------------|
|
||||
| **Qubit** | Fundamental unit of quantum information existing in superposition of |0> and |1> basis states |
|
||||
| **Amplitude** | Complex number representing probability amplitude of a basis state; measurement probability is its squared modulus |
|
||||
| **State Vector** | Array of 2^n complex amplitudes representing the full quantum state of an n-qubit register |
|
||||
| **Basis State** | One of 2^n classical bit-string configurations; each has an associated amplitude |
|
||||
| **Superposition** | State where multiple basis states have nonzero amplitude |
|
||||
| **Entanglement** | Quantum correlation preventing independent per-qubit factorization of the joint state |
|
||||
| **Born Rule** | Measurement probability equals squared modulus of amplitude: P(x) = |alpha_x|^2 |
|
||||
|
||||
### Circuit Model
|
||||
|
||||
| Term | Definition |
|
||||
|------|------------|
|
||||
| **Gate** | Unitary matrix operation acting on 1 or 2 qubits; transforms state via matrix-vector multiply |
|
||||
| **Circuit** | Ordered sequence of gates applied to a qubit register; the program of a quantum computation |
|
||||
| **Gate Matrix** | Unitary matrix defining gate action; must satisfy U * U_dagger = I |
|
||||
| **Qubit Index** | Zero-based integer identifying a qubit; determines which amplitude pairs a gate addresses |
|
||||
| **Circuit Depth** | Maximum sequential gate layers; primary determinant of simulation time |
|
||||
| **Parameterized Gate** | Gate whose matrix depends on continuous real parameters (e.g., Ry(theta)) |
|
||||
| **Gate Fusion** | Combining adjacent gates on same qubits into a single matrix multiply |
|
||||
| **Gate Schedule** | Topologically sorted gate-to-timestep assignment respecting qubit-sharing constraints |
|
||||
|
||||
### Measurement & Algorithms
|
||||
|
||||
| Term | Definition |
|
||||
|------|------------|
|
||||
| **Measurement** | Projective observation collapsing superposition to a basis state per the Born rule |
|
||||
| **Mid-Circuit Measurement** | Measurement during (not only at end of) circuit execution |
|
||||
| **Shot** | Single circuit execution + measurement; repeated shots build statistics |
|
||||
| **Expectation Value** | Observable average over quantum state: <psi|H|psi> |
|
||||
| **Pauli String** | Tensor product of per-qubit Pauli operators (I/X/Y/Z) with coefficient |
|
||||
| **Hamiltonian** | Hermitian operator (weighted sum of Pauli strings) representing total energy |
|
||||
| **Syndrome** | Classical bits from ancilla measurements indicating error presence and location |
|
||||
| **Ansatz** | Parameterized circuit template encoding the variational search space |
|
||||
| **VQE** | Variational Quantum Eigensolver; iteratively minimizes Hamiltonian expectation |
|
||||
| **QAOA** | Quantum Approximate Optimization Algorithm; alternating cost/mixer unitaries |
|
||||
| **Grover Search** | Amplitude amplification finding marked items in O(sqrt(N)) queries |
|
||||
| **Oracle** | Black-box gate marking target states by phase flip |
|
||||
| **Surface Code** | 2D topological QEC code with stabilizer checks on lattice faces/vertices |
|
||||
| **Logical Error Rate** | Undetected logical error probability per QEC cycle |
|
||||
| **Decoder** | Classical algorithm mapping syndromes to corrections; bridge to ruQu |
|
||||
|
||||
### Simulation Infrastructure
|
||||
|
||||
| Term | Definition |
|
||||
|------|------------|
|
||||
| **State Allocator** | On-demand allocation/deallocation enforcing zero-idle policy |
|
||||
| **Memory Estimate** | Predicted bytes: 2^n * 16; gating threshold for allocation |
|
||||
| **Entanglement Tracker** | Tracks qubit correlations enabling subsystem splitting |
|
||||
| **State Checkpoint** | Serialized state snapshot for mid-circuit save/restore |
|
||||
| **Tensor Network** | Alternative representation via contracted tensor factors; efficient for low entanglement |
|
||||
| **Contraction Path** | Tensor contraction order minimizing total FLOPs |
|
||||
|
||||
---
|
||||
|
||||
## Bounded Context Details
|
||||
|
||||
### Context 1: Circuit Construction Domain
|
||||
|
||||
**Purpose**: Language for expressing quantum computations. Validation, scheduling, optimization, OpenQASM interchange.
|
||||
|
||||
| Entity / Value Object | Type | Responsibility |
|
||||
|----------------------|------|---------------|
|
||||
| **QuantumCircuit** | Aggregate Root | Ordered gate collection with register metadata |
|
||||
| **Gate** | Entity | Single unitary with target qubits and optional parameters |
|
||||
| **GateSchedule** | Entity | Time-step assignment for parallel execution analysis |
|
||||
| **CircuitOptimizer** | Domain Service | Fusion, cancellation, and commutation rules |
|
||||
| GateId, QubitIndex, GateMatrix, ParameterBinding, GateType | Value Objects | Immutable circuit building blocks |
|
||||
|
||||
**Events**: `CircuitCreated`, `GateAppended`, `CircuitOptimized`, `CircuitValidated`, `ParametersBound`
|
||||
|
||||
**Invariants**: (1) Gate unitarity. (2) Qubit indices within bounds. (3) No duplicate targets per gate. (4) All parameters bound before execution.
|
||||
|
||||
---
|
||||
|
||||
### Context 2: State Management Domain
|
||||
|
||||
**Purpose**: State vector lifecycle following zero-idle model. Entanglement tracking. Memory gating.
|
||||
|
||||
| Entity / Value Object | Type | Responsibility |
|
||||
|----------------------|------|---------------|
|
||||
| **QuantumState** | Aggregate Root | Owns the 2^n complex amplitude array |
|
||||
| **EntanglementTracker** | Entity | Bipartite entanglement graph for subsystem analysis |
|
||||
| **StateAllocator** | Domain Service | On-demand allocation, immediate deallocation |
|
||||
| Amplitude, QubitCount, MemoryEstimate, StateCheckpoint | Value Objects | State representation primitives |
|
||||
|
||||
**Events**: `StateAllocated`, `StateDeallocated`, `EntanglementDetected`, `SubsystemSplit`, `CheckpointCreated`, `MemoryLimitExceeded`
|
||||
|
||||
**Invariants**: (1) Normalization preserved. (2) Zero-idle: no state persists beyond execution. (3) Allocation gated by device capacity. (4) Checkpoint restore reproduces exact amplitudes.
|
||||
|
||||
---
|
||||
|
||||
### Context 3: Measurement & Observation Domain
|
||||
|
||||
**Purpose**: Projective measurement with collapse. Analytical expectation values. Syndrome extraction for QEC.
|
||||
|
||||
| Entity / Value Object | Type | Responsibility |
|
||||
|----------------------|------|---------------|
|
||||
| **MeasurementEngine** | Aggregate Root | Born-rule sampling and state collapse |
|
||||
| **ExpectationCalculator** | Entity | Analytical <psi|H|psi> from Pauli decomposition |
|
||||
| **SyndromeExtractor** | Entity | Ancilla measurement and classical bit extraction |
|
||||
| MeasurementOutcome, PauliString, Hamiltonian, SyndromeBits, ShotResult | Value Objects | Measurement data types |
|
||||
|
||||
**Events**: `MeasurementPerformed`, `ExpectationComputed`, `SyndromeExtracted`, `ShotsCompleted`
|
||||
|
||||
**Invariants**: (1) Born rule: probabilities sum to 1.0. (2) Post-measurement collapse to definite state. (3) Hamiltonian Hermiticity. (4) Syndrome bit count matches code.
|
||||
|
||||
---
|
||||
|
||||
### Context 4: Algorithm Execution Domain
|
||||
|
||||
**Purpose**: High-level quantum algorithms as orchestrated loops over circuits, states, and measurements.
|
||||
|
||||
| Entity / Value Object | Type | Responsibility |
|
||||
|----------------------|------|---------------|
|
||||
| **VQERunner** | Entity | Iterative ansatz parameter optimization to minimize energy |
|
||||
| **GroverSearch** | Entity | Oracle + diffusion with auto-computed iteration count |
|
||||
| **QAOASolver** | Entity | Graph-based cost/mixer circuit construction and angle optimization |
|
||||
| **SurfaceCodeSimulator** | Entity | Stabilizer cycles, syndrome extraction, decoder invocation |
|
||||
| AlgorithmResult, OptimizationTrace, CutValue, LogicalErrorRate, ConvergenceCriteria | Value Objects | Algorithm output types |
|
||||
|
||||
**Events**: `VQEIterationCompleted`, `VQEConverged`, `GroverSearchCompleted`, `QAOARoundCompleted`, `SurfaceCodeCycleCompleted`, `LogicalErrorDetected`
|
||||
|
||||
**Invariants**: (1) Grover iteration count = floor(pi/4 * sqrt(N/M)). (2) VQE energy is upper bound on ground state. (3) QAOA cost/mixer alternate with correct parameter count. (4) Surface code distance matches lattice.
|
||||
|
||||
---
|
||||
|
||||
### Context 5: Optimization & Backend Domain
|
||||
|
||||
**Purpose**: Performance backends that accelerate simulation without altering semantics. SIMD, fusion, tensor networks.
|
||||
|
||||
| Entity / Value Object | Type | Responsibility |
|
||||
|----------------------|------|---------------|
|
||||
| **SimulationBackend** | Aggregate Root | Selects optimal execution strategy |
|
||||
| **GateFuser** | Entity | Combines compatible gate sequences into single operations |
|
||||
| **TensorContractor** | Entity | Tensor network decomposition for low-entanglement states |
|
||||
| **SIMDDispatcher** | Entity | Platform detection and optimized kernel dispatch |
|
||||
| OptimizationHint, ContractionPath, FusedGateMatrix, PlatformCapabilities | Value Objects | Backend selection metadata |
|
||||
|
||||
**Events**: `BackendSelected`, `GatesFused`, `TensorNetworkContracted`, `SIMDKernelDispatched`
|
||||
|
||||
**Invariants**: (1) Fused gates produce identical results to sequential. (2) Tensor contraction matches state-vector. (3) SIMD falls back to scalar if unavailable. (4) Intermediates stay within memory budget.
|
||||
|
||||
---
|
||||
|
||||
### Context 6: Deployment & Integration Domain
|
||||
|
||||
**Purpose**: WASM compilation, agent activation bridge, ruQu decoder anti-corruption layer, observability.
|
||||
|
||||
| Entity / Value Object | Type | Responsibility |
|
||||
|----------------------|------|---------------|
|
||||
| **WASMBindings** | Entity | Open Host Service via wasm-bindgen JS API |
|
||||
| **AgentBridge** | Entity | ruvector-nervous-system integration for context-triggered activation |
|
||||
| **MetricsReporter** | Entity | Publishes SimulationMetrics to ruvector-metrics |
|
||||
| **CoherenceBridge** | Entity | ACL translating syndromes to ruQu's DetectorBitmap/SyndromeRound |
|
||||
| PlatformCapabilities, QubitLimit, SimulationMetrics, DecoderResult | Value Objects | Integration data types |
|
||||
|
||||
**Events**: `SimulationRequested`, `SimulationCompleted`, `ResourcesReleased`, `DecoderInvoked`, `MetricsPublished`
|
||||
|
||||
**Integration Patterns**:
|
||||
- **Anti-Corruption Layer**: CoherenceBridge isolates engine from ruQu's internal DDD model
|
||||
- **Conformist**: Deployment conforms to existing ruVector event types and metric schemas
|
||||
- **Open Host Service**: ruqu-wasm exposes clean JS/TS API for browser experimentation
|
||||
- **Published Language**: OpenQASM 3.0 for circuit interchange with external tools
|
||||
|
||||
---
|
||||
|
||||
## Cross-Cutting Concerns
|
||||
|
||||
### Zero-Idle Resource Model
|
||||
|
||||
```
|
||||
IDLE (0 bytes) --> ACTIVATE (allocate 2^n * 16 bytes) --> COMPUTE --> RELEASE (0 bytes)
|
||||
```
|
||||
|
||||
No warm pools, no pre-allocated buffers, no background threads.
|
||||
|
||||
### Memory Gating
|
||||
|
||||
| Qubits | State Vector Size | Decision |
|
||||
|--------|-------------------|----------|
|
||||
| 10 | 16 KiB | Always permit |
|
||||
| 15 | 512 KiB | Always permit |
|
||||
| 20 | 16 MiB | Permit on most devices |
|
||||
| 25 | 512 MiB | Gate: check available RAM |
|
||||
| 30 | 16 GiB | Gate: likely refuse on edge |
|
||||
| 35+ | 512 GiB+ | Always refuse (state vector); consider tensor network |
|
||||
|
||||
### Error Model
|
||||
|
||||
| Context | Error | Severity | Recovery |
|
||||
|---------|-------|----------|----------|
|
||||
| Circuit Construction | Non-unitary gate | Fatal | Reject circuit |
|
||||
| State Management | Memory limit exceeded | Recoverable | Try tensor network or refuse |
|
||||
| State Management | Normalization drift | Warning | Renormalize |
|
||||
| Measurement | Zero-probability outcome | Warning | Return uniform |
|
||||
| Algorithm Execution | VQE non-convergence | Recoverable | Return best-so-far |
|
||||
| Deployment | WASM memory limit | Fatal | Report to agent |
|
||||
| Deployment | ruQu decoder unavailable | Recoverable | Skip correction, log |
|
||||
|
||||
### Observability
|
||||
|
||||
All simulation runs produce `SimulationMetrics` (circuit name, qubit count, gate count, depth, shots, backend type, wall time, peak memory, SIMD utilization) flowing through `ruvector-metrics` for unified dashboard integration.
|
||||
|
||||
### Security
|
||||
|
||||
| Concern | Mitigation |
|
||||
|---------|------------|
|
||||
| Timing side channels in measurement | Constant-time sampling via rejection method |
|
||||
| Memory contents after deallocation | Zero-fill on deallocation (SecureAllocator mode) |
|
||||
| Denial-of-service via large qubit counts | Memory gating with hard upper bound per request |
|
||||
| Untrusted OpenQASM input | Parser validates unitarity and qubit bounds before execution |
|
||||
| WASM sandbox escape | No file I/O, no network; pure computation within WASM sandbox |
|
||||
|
||||
---
|
||||
|
||||
## Module Structure
|
||||
|
||||
```
|
||||
crates/ruqu-sim/src/
|
||||
+-- lib.rs # Public API
|
||||
+-- circuit/ # Circuit Construction context
|
||||
| +-- quantum_circuit.rs # QuantumCircuit aggregate
|
||||
| +-- gate.rs # Gate entity, GateType enum
|
||||
| +-- schedule.rs # GateSchedule
|
||||
| +-- optimizer.rs # CircuitOptimizer (fusion, cancel)
|
||||
| +-- openqasm.rs # OpenQASM 3.0 import/export
|
||||
+-- state/ # State Management context
|
||||
| +-- quantum_state.rs # QuantumState aggregate
|
||||
| +-- allocator.rs # StateAllocator (zero-idle)
|
||||
| +-- entanglement.rs # EntanglementTracker
|
||||
| +-- checkpoint.rs # StateCheckpoint
|
||||
+-- measurement/ # Measurement & Observation context
|
||||
| +-- engine.rs # MeasurementEngine
|
||||
| +-- expectation.rs # ExpectationCalculator
|
||||
| +-- syndrome.rs # SyndromeExtractor
|
||||
+-- algorithms/ # Algorithm Execution context
|
||||
| +-- vqe.rs, grover.rs # VQERunner, GroverSearch
|
||||
| +-- qaoa.rs # QAOASolver
|
||||
| +-- surface_code.rs # SurfaceCodeSimulator
|
||||
+-- backend/ # Optimization & Backend context
|
||||
| +-- simulation_backend.rs # SimulationBackend
|
||||
| +-- gate_fuser.rs # GateFuser
|
||||
| +-- tensor_network.rs # TensorContractor
|
||||
| +-- simd_dispatch.rs # SIMDDispatcher
|
||||
| +-- kernels/ # avx2.rs, avx512.rs, neon.rs, wasm_simd.rs, scalar.rs
|
||||
+-- types.rs, events.rs, error.rs
|
||||
|
||||
crates/ruqu-wasm/src/
|
||||
+-- lib.rs # wasm-bindgen entry
|
||||
+-- js_api.rs # JS-facing API
|
||||
+-- agent_bridge.rs # ruvector-nervous-system integration
|
||||
+-- coherence_bridge.rs # ACL for ruQu decoder
|
||||
+-- metrics.rs # ruvector-metrics export
|
||||
```
|
||||
|
||||
### Dependency Graph
|
||||
|
||||
```
|
||||
ruqu-sim
|
||||
+-- ruvector-math (SIMD kernels, complex math)
|
||||
+-- rand (measurement sampling)
|
||||
+-- ruvector-graph (QAOA graph input)
|
||||
|
||||
ruqu-wasm
|
||||
+-- ruqu-sim (core simulation)
|
||||
+-- ruqu (coherence bridge ACL)
|
||||
+-- ruvector-metrics (observability)
|
||||
+-- ruvector-nervous-system (agent activation)
|
||||
+-- wasm-bindgen (JS bindings)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Performance Targets
|
||||
|
||||
| Metric | Target |
|
||||
|--------|--------|
|
||||
| Single-gate (1q, 20-qubit register) | < 50 us |
|
||||
| Full circuit (100 gates, 15 qubits) | < 10 ms |
|
||||
| Hamiltonian expectation (10q, 50 terms) | < 1 ms |
|
||||
| SIMD speedup over scalar | > 3x (AVX2), > 6x (AVX-512) |
|
||||
| Grover (20 qubits, 1 target) | < 500 ms |
|
||||
| VQE convergence (H2, 4 qubits) | < 5s, < 100 iterations |
|
||||
| State allocation/deallocation | < 10 us / < 1 us |
|
||||
| WASM circuit (10 qubits, 50 gates) | < 50 ms |
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
1. Evans, E. (2003). "Domain-Driven Design: Tackling Complexity in the Heart of Software."
|
||||
2. Vernon, V. (2013). "Implementing Domain-Driven Design."
|
||||
3. Nielsen, M. A. & Chuang, I. L. (2010). "Quantum Computation and Quantum Information."
|
||||
4. Peruzzo, A. et al. (2014). "A variational eigenvalue solver on a photonic quantum processor."
|
||||
5. Farhi, E. et al. (2014). "A Quantum Approximate Optimization Algorithm."
|
||||
6. Fowler, A. G. et al. (2012). "Surface codes: Towards practical large-scale quantum computation."
|
||||
7. ruQu crate: Existing coherence assessment and syndrome processing in ruVector.
|
||||
8. Coherence Engine DDD: `/docs/architecture/coherence-engine-ddd.md`
|
||||
1426
docs/architecture/quantum-engine/quantum-engine-ddd-tactical.md
Normal file
1426
docs/architecture/quantum-engine/quantum-engine-ddd-tactical.md
Normal file
File diff suppressed because it is too large
Load diff
Loading…
Add table
Add a link
Reference in a new issue