ruvector

mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-30 12:13:34 +00:00

Author	SHA1	Message	Date
rUv	42a5c47fe7	fix: format all files, add EXO crate READMEs, convert path deps to version deps - Run cargo fmt across entire workspace - Create README.md files for all 9 EXO-AI crates - Convert path dependencies to crates.io version dependencies for publishing - Add [patch.crates-io] to exo workspace for local development Co-Authored-By: claude-flow <ruv@ruv.net>	2026-02-27 16:21:14 +00:00
Claude	d48b70a84b	feat(rvdna): add PCSK9 rs11591147 protective cardiovascular SNP Add PCSK9 R46L loss-of-function variant (NEJM 2006: OR 0.77 CHD, 0.40 MI) as a protective cardiovascular SNP with negative weights. Include PCSK9→LDL-C biomarker correlation (15-21% lower LDL in carriers). Refactor gene-biomarker correlations from match to additive if-chain so multiple gene effects can stack on the same biomarker (e.g., APOE raises LDL while PCSK9 R46L lowers it). Panel expanded to 20 SNPs. https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY	2026-02-22 06:44:14 +00:00
Claude	366eae172f	feat(rvdna): add LPA cardiovascular SNPs from SOTA meta-analysis evidence Add rs10455872 (OR 1.6-1.75/allele CHD) and rs3798220 (OR 1.49-1.54/allele) from 2024 LPA meta-analyses. Include Lp(a) biomarker reference (0-75 nmol/L) and gene-biomarker correlation in population model. Separate NUM_ONEHOT_SNPS (17) from NUM_SNPS (19) to preserve 64-dim vector layout with LPA encoded in summary dimension 63. https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY	2026-02-22 06:41:06 +00:00
Claude	8b85624352	refactor(rvdna): consolidate SNP arrays, cache metadata, optimize streaming Structural improvements from deep code review: - Consolidate 5 parallel arrays (SNP_WEIGHTS, HOM_REF, HOM_ALT, HET, ALLELE_FREQS) into single SnpDef struct array — eliminates entire class of parallel-array misalignment bugs - Cache category_meta() with LazyLock — avoids per-call Vec allocation (critical in generate_synthetic_population hot path) - Hoist Normal::new out of inner loop in generate_readings — pre-compute distributions per biomarker instead of per-step*per-biomarker - Add clinically meaningful lower bounds: LDL normal_low 0→50 mg/dL (critical_low 25), Triglycerides normal_low 0→35 mg/dL (critical_low 20) - Optimize RingBuffer::clear from O(capacity) to O(1) — head/len reset is sufficient since push overwrites before read - Use NUM_SNPS const for vector encoding bounds instead of magic number 51 All 172 tests pass, zero clippy warnings for rvdna. https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY	2026-02-22 06:31:44 +00:00
Claude	ad27974f6c	feat(rvdna): add gene-biomarker correlations, CUSUM changepoint detection, and interaction tests - Add gene→biomarker correlations in synthetic population: APOE e4→lower HDL/higher triglycerides, MTHFR→lower B12, NQO1 null→higher CRP - Add CUSUM changepoint detection algorithm to StreamProcessor for detecting sustained biomarker shifts beyond simple anomaly detection - Add 4 new integration tests: MTHFR×COMT interaction, DRD2×COMT interaction, APOE→HDL population correlation, CUSUM changepoint detection - Remove unused variant_categories import - All 172 tests pass, all ADR-014 performance targets exceeded https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY	2026-02-22 06:08:41 +00:00
Claude	3f8eb9b890	refine(rvdna): calibrate SNP weights from SOTA clinical meta-analyses Evidence-based refinements from peer-reviewed clinical research: - TP53 rs1042522 (Pro72Arg): hom_ref 0.10→0.00 — CC/Pro/Pro is not independently risk-associated; prior non-zero baseline was unjustified - BRCA2 rs11571833 (K3326X): het 0.25→0.20 — aligned with iCOGS meta-analysis OR 1.28 for breast cancer (Meeks et al., JNCI 2016, 76,637 cases / 83,796 controls) - NQO1 rs1800566 (Pro187Ser): het 0.20→0.15, hom_alt 0.45→0.30 — aligned with comprehensive meta-analysis OR 1.18 for TT vs CC (Lajin & Alachkar, Br J Cancer 2013, 92 studies, 21,178 cases); larger 2022 meta-analysis (43,736 cases) found no overall association Validated unchanged weights against SOTA evidence: - APOE rs429358: OR 3-4x het, 8-15x hom (Belloy JAMA Neurology 2023) - SLCO1B1 rs4363657: OR 4.5/allele, 16.9 hom (SEARCH/NEJM; CPIC 2022) - COMT×OPRM1 interaction: confirmed p=0.037 (orthopedic trauma study) All 48 tests pass (33 unit + 15 integration). https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY	2026-02-22 05:53:16 +00:00
Claude	22dc2686fa	refine(rvdna): calibrate biomarker weights and ranges from Genetic Lifehacks clinical data Evidence-based adjustments from geneticlifehacks.com research articles: - MTHFR C677T (rs1801133): het weight 0.30→0.35 to match documented 40% enzyme activity decrease - MTHFR A1298C (rs1801131): het 0.15→0.10, hom_alt 0.35→0.25 to match documented ~20% enzyme decrease - Homocysteine reference range: 4-12→5-15 μmol/L (clinical consensus), critical_high 50→30 (moderate hyperhomocysteinemia threshold) - Add MTHFR A1298C × COMT interaction (1.25x Neurological): A1298C homozygous + COMT slow = amplified depression risk - Add DRD2/ANKK1 × COMT interaction (1.2x Neurological): rs1800497 × Val158Met working memory interaction - Guard vector encoding with .take(4) so expanded interaction table (now 6 entries) doesn't overflow dims 56-59 Sources: - geneticlifehacks.com/mthfr/ (enzyme activity percentages) - geneticlifehacks.com/mthfr-c677t/ (MTHFR-COMT depression data) - geneticlifehacks.com/understanding-homocysteine-levels/ (ref ranges) - geneticlifehacks.com/dopamine-receptor-genes/ (DRD2×COMT interaction) All 48 tests pass (33 unit + 15 integration), benchmark compiles. https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY	2026-02-22 05:49:22 +00:00
Claude	c96d226357	perf(rvdna): optimize biomarker engine — fix bug, reduce allocations, halve ring buffer memory - Fix snp_idx silent fallback: unwrap_or(0) masked missing SNPs with incorrect index-0 lookups; now returns Option<usize> - RingBuffer: eliminate Option<T> wrapper, halving per-slot memory for f64 (8 bytes vs 16); use T::Default instead - window_mean_std: replace two-pass sum+variance with single-pass Welford's online algorithm (2x fewer cache misses) - compute_risk_scores: pre-compute category max scores via category_meta() to avoid re-scanning SNP_WEIGHTS per call; use &str keys in intermediate HashMap to reduce String allocations - HashMap capacity hints throughout (StreamProcessor, genotypes, biomarker_values, cat_scores) to eliminate rehashing - generate_synthetic_population: hoist APOE lookup out of inner loop, reserve biomarker_values capacity upfront - All 48 tests pass (33 unit + 15 integration), benchmark compiles https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY	2026-02-22 05:37:52 +00:00
Claude	65d671dcea	style(rvdna): apply linter formatting to biomarker module https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY	2026-02-22 05:20:54 +00:00
Claude	db237d3d4d	feat(rvdna): add health biomarker analysis engine with streaming simulation Implement ADR-014 Health Biomarker Analysis Architecture: - biomarker.rs: Composite risk scoring engine with 17-SNP weight matrix, gene-gene interaction modifiers (COMT×OPRM1, MTHFR compound, BRCA1×TP53), 64-dim HNSW-aligned profile vectors, clinical reference ranges for 12 biomarkers, and deterministic synthetic population generation - biomarker_stream.rs: Streaming biomarker simulator with generic RingBuffer, configurable noise/drift/anomaly injection, z-score anomaly detection, linear regression trend analysis, and exponential moving averages - 35 unit tests + 15 integration tests (168 total, 0 failures) - Criterion benchmark suite targeting ADR-014 performance budgets https://claude.ai/code/session_014FpaYVohmyLH5dcBZTgmSY	2026-02-22 05:19:23 +00:00
rUv	d102bb87ab	fix: apply cargo fmt across workspace and fix CI issues - Run cargo fmt --all to fix formatting in 362 files across the entire workspace - Add PGDG repository for PostgreSQL 17 in CI test-all-features and benchmark jobs - Add missing rvf dependency crates to standalone Dockerfile for domain-expansion - Add sona-learning and domain-expansion features to standalone Dockerfile build - Create npu.rs stub for ruvector-sparse-inference (fixes rustfmt resolution error) Co-Authored-By: claude-flow <ruv@ruv.net>	2026-02-21 20:56:38 +00:00
rUv	4ef45dbde3	feat(rvdna): native 23andMe genotyping pipeline v0.2.0 Replaces the Python rvdna-bridge with a pure Rust implementation: - 7-stage pipeline: parse, QC, classification, pharma, health, compound, report - CYP2D6/CYP2C19 diplotype calling with confidence gating (Strong/Moderate/Weak/Unsupported) - 17 health variant interpretations (APOE, BRCA1/2, TP53, MTHFR, COMT, OPRM1, etc.) - Genotype normalization (case/strand insensitive, allele-sorted) - CPIC drug recommendations gated on Moderate+ confidence - Panel QC signatures with het rate metrics - MTHFR compound analysis and pain sensitivity profiling - 91 tests passing (79 lib + 12 security) Published as rvdna v0.2.0 on crates.io. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-02-20 20:40:51 +00:00
Claude	6c7b1495bd	feat: integrate ruvector-solver into DNA and quantum components DNA crate (rvdna): - Add ruvector-solver dependency with forward-push feature - New kmer_pagerank module: KmerGraphRanker uses Forward Push PPR to rank sequences by structural centrality in k-mer overlap graphs - New solver_bench benchmark suite with 3 groups: A) Localized relevance via Forward Push PPR (20-200x speedup) B) Laplacian solve for denoising via Neumann/CG (10-80x speedup) C) Cohort-scale label propagation via CG solver - README: add DNA Solver Benchmarks section with dataset citations (GIAB, NA12878, 1000 Genomes), graph construction docs, benchmark tables, and reproducibility instructions Quantum crate (prime-radiant-category): - Add ruvector-solver dependency with neumann/cg features - SparseMatrix: replace O(nnz) COO Vec with O(1) HashMap entries, add to_csr_f64() and spmv_f64() using solver CsrMatrix - ComplexMatrix: add Jacobi eigenvalue algorithm for real-symmetric matrices (much more stable than power iteration + deflation), add to_csr_real() and is_real_valued() helper methods - DensityMatrix: add SpectralDecomposition cache, purity_fast() via Frobenius norm O(n²) vs O(n³), static eigenvalue helpers - SimplicialComplex: add graph_laplacian_csr() for spectral analysis - SolverBackedOperator: sparse quantum operator using CsrMatrix SpMV for 40-60 effective qubit scaling (vs ~33 with dense matrices) - New quantum_solver_bench: SpMV scaling, eigenvalue convergence, memory scaling benchmarks from 10 to 30 qubits All 362 tests pass (81 quantum + 102 DNA + 179 solver). https://claude.ai/code/session_01TiqLbr2DaNAntQHaVeLfiR	2026-02-20 13:37:24 +00:00
rUv	d31c029cdb	feat(rvdna): rename package to rvdna, publish to crates.io and npm Rename dna-analyzer-example to rvdna across all source files, tests, and benchmarks. Add crates.io metadata (repository, docs, keywords). Publish rvdna v0.1.0 to crates.io and @ruvector/rvdna v0.1.0 to npm with NAPI-RS platform loader, JS fallbacks, and TypeScript definitions. Also publishes workspace deps at v2.0.2: ruvector-math, ruvector-core, ruvector-filter, ruvector-collections, ruvector-graph, ruvector-gnn. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-02-12 15:47:00 +00:00
rUv	75491580a1	feat: add package.json for rvdna example with WASM bindings and build scripts	2026-02-12 15:32:55 +00:00
Claude	68f479018f	perf(dna): 1.8x kmer speedup, 10x SW memory reduction Smith-Waterman: rolling 2-row DP replaces 3 full (Q+1)*(R+1) matrices. Only prev+curr rows for H/E, single scalar for F. Memory drops from ~600KB to ~12KB for 100x500bp alignment, fitting L1 cache. Traceback matrix retained (tb==0 encodes stop condition, no full H needed). K-mer encoding: zero-allocation canonical hashing eliminates Vec alloc per k-mer in MinHash::sketch() via dual MurmurHash3 (fwd + rc strands). types.rs to_kmer_vector: rolling polynomial hash computes O(1) per k-mer instead of O(k). Removes leading nucleotide, shifts, adds trailing in constant time using precomputed 5^(k-1). Benchmarks (100bp query x 500bp ref / k=11): kmer/encode_1kb: 4.1µs → 2.3µs (1.78x) kmer/encode_100kb: 364µs → 199µs (1.83x) smith_waterman: 416µs → 386µs (1.08x, 10x less memory) full pipeline: 1.98ms → 1.52ms (1.30x end-to-end) 95 tests pass, zero failures. https://claude.ai/code/session_013B6stXbYwAkWHbE16sjUrq	2026-02-11 13:59:26 +00:00
Claude	f7003a6d61	feat(dna): implement missing capabilities + optimize hot paths - Affine gap scoring: 3-matrix Smith-Waterman (H/E/F) with flat 1D arrays for cache-friendly access, direct slice indexing - Indel detection: call_indel() for insertion/deletion from pileup data - VCF output: VCFv4.3 format with proper CHROM/POS/REF/ALT/QUAL columns - CYP2C19 pharmacogenomics: star allele calling (*1/*2/*3/*17), phenotype prediction, drug recommendations (clopidogrel, voriconazole) - Cancer signal detection: methylation entropy + extreme ratio scoring, CancerSignalDetector with configurable risk threshold - Molecular weight: monoisotopic Da for all 20 amino acids - Isoelectric point: Henderson-Hasselbalch bisection with sidechain pKa - K-mer encoding: zero-allocation canonical hashing (hash both strands, take min) eliminates O(n) Vec allocs per sliding window - CRC32: lookup table replaces bit-by-bit (~8x faster header checksums) - Benchmarks: added RVDNA, epigenomics, protein analysis groups 95 tests pass (54 lib + 12 kmer + 17 pipeline + 12 security) https://claude.ai/code/session_013B6stXbYwAkWHbE16sjUrq	2026-02-11 05:31:16 +00:00
Claude	79d73169ed	feat(dna): add RVDNA AI-native format, real gene data, 8-stage pipeline New RVDNA binary format (.rvdna) purpose-built for AI genomic analysis: - 2-bit nucleotide encoding (4x compression vs ASCII FASTA) - Pre-computed k-mer vectors with int8 quantization for instant HNSW search - Sparse attention matrices in COO format for direct tensor consumption - Variant probability tensors with f16 genotype likelihoods - Zero-copy memory-mappable with 64-byte aligned sections - CRC32 checksums, section-level integrity verification Real human gene sequences from NCBI RefSeq: - HBB (hemoglobin beta, NM_000518.5) - sickle cell gene - TP53 (tumor suppressor, NM_000546.6) - exons 5-8 hotspot - BRCA1 (DNA repair, NM_007294.4) - exon 11 fragment - CYP2D6 (drug metabolism, NM_000106.6) - pharmacogenomic - INS (insulin, NM_000207.3) - preproinsulin Pipeline upgraded to 8 stages using real data: 1. Load 5 real human genes (2,340 bp total) 2. K-mer similarity matrix across gene panel 3. Smith-Waterman alignment on HBB 4. Sickle cell variant detection at HBB codon 6 5. HBB → hemoglobin beta translation (MVHLTPEEKSAVTALWGKVN verified) 6. Horvath epigenetic clock 7. CYP2D6 *4/*10 pharmacogenomics 8. RVDNA format conversion with pre-computed vectors 87 tests, 0 failures. ADR-013 documents the format specification. https://claude.ai/code/session_013B6stXbYwAkWHbE16sjUrq	2026-02-11 04:48:28 +00:00
Claude	ceeb6fdbad	feat(dna): complete SOTA genomic analysis pipeline with full test suite Implements a comprehensive DNA analyzer demonstrating RuVector's vector computing capabilities for bioinformatics: Modules (9): - types: Core domain types (DnaSequence, Nucleotide, ProteinSequence, etc.) - kmer: HNSW k-mer indexing with FNV-1a hashing and MinHash sketching - alignment: Smith-Waterman local alignment with CIGAR generation - variant: SNP calling from pileup data with genotype classification - protein: DNA-to-protein translation with contact graph prediction - epigenomics: Horvath clock biological age prediction from CpG methylation - pharma: CYP2D6 star allele calling and metabolizer phenotype prediction - pipeline: DAG-based genomic analysis orchestration - error: Typed error handling across all modules Testing (41 tests, 0 mocks): - 12 k-mer integration tests (encoding, HNSW search, MinHash Jaccard) - 17 pipeline e2e tests (alignment, variant calling, pharmacogenomics) - 12 security tests (buffer overflow, path traversal, concurrency, bounds) Benchmarks: Criterion suite for kmer, alignment, variant, protein, pipeline Binary: 7-stage demo (sequence gen, k-mer search, alignment, variant calling, protein analysis, epigenomics, pharmacogenomics) https://claude.ai/code/session_013B6stXbYwAkWHbE16sjUrq	2026-02-11 04:29:28 +00:00
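Commit c96d226357 describes replacing a two-pass sum+variance in window_mean_std with single-pass Welford's online algorithm. The following is a minimal sketch of that technique, not the rvdna source: the function signature, the Option return, and the choice of sample (n - 1) variance are assumptions made for illustration.

```rust
/// Single-pass mean and sample standard deviation using Welford's online
/// algorithm. Hypothetical helper: the name echoes the commit message, but
/// the signature and the (n - 1) denominator are assumptions.
fn window_mean_std(values: &[f64]) -> Option<(f64, f64)> {
    if values.is_empty() {
        return None;
    }
    let mut mean = 0.0_f64;
    let mut m2 = 0.0_f64; // running sum of squared deviations from the current mean
    for (i, &x) in values.iter().enumerate() {
        let n = (i + 1) as f64;
        let delta = x - mean; // deviation from the old mean
        mean += delta / n; // update the mean in place
        m2 += delta * (x - mean); // old deviation times new deviation
    }
    let variance = if values.len() > 1 {
        m2 / (values.len() as f64 - 1.0)
    } else {
        0.0
    };
    Some((mean, variance.sqrt()))
}
```

Calling window_mean_std(&[2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]) visits each value once and never stores the squared values, which is the property the commit relies on when it drops the second pass.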

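Commit ad27974f6c adds CUSUM changepoint detection to StreamProcessor for sustained biomarker shifts. A rough sketch of a two-sided tabular CUSUM is shown here; the function name, the parameters (target, slack, threshold), and the return type are all hypothetical and are not taken from the rvdna code.

```rust
/// Two-sided tabular CUSUM (illustrative only, not the StreamProcessor API).
/// `target` is the expected baseline, `slack` is the per-sample drift that is
/// tolerated, and `threshold` is the cumulative excess that raises an alarm.
/// Returns the index of the first sample at which either sum exceeds the
/// threshold, or None if no sustained shift is seen.
fn cusum_changepoint(samples: &[f64], target: f64, slack: f64, threshold: f64) -> Option<usize> {
    let mut s_hi = 0.0_f64; // accumulates evidence of an upward shift
    let mut s_lo = 0.0_f64; // accumulates evidence of a downward shift
    for (i, &x) in samples.iter().enumerate() {
        s_hi = (s_hi + (x - target - slack)).max(0.0);
        s_lo = (s_lo + (target - x - slack)).max(0.0);
        if s_hi > threshold || s_lo > threshold {
            return Some(i);
        }
    }
    None
}
```

Because both sums are clamped at zero, isolated outliers decay instead of accumulating, while a persistent offset larger than the slack keeps adding up. That is what separates this from a per-sample z-score check.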
19 commits
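Commit 79d73169ed lists 2-bit nucleotide encoding (4x compression vs ASCII FASTA) as part of the .rvdna format. The sketch below shows the general idea of packing four bases per byte. The bit order, the A/C/G/T code assignment, and both function names are assumptions; the actual layout is defined by ADR-013, which is not reproduced on this page.

```rust
/// Pack an ASCII nucleotide sequence into 2 bits per base, four bases per byte.
/// Assumed mapping: A=0, C=1, G=2, T=3, first base in the lowest bits.
/// Returns None for any other symbol (e.g. N), which this sketch does not handle.
fn pack_2bit(seq: &[u8]) -> Option<Vec<u8>> {
    let mut out = vec![0u8; (seq.len() + 3) / 4];
    for (i, &b) in seq.iter().enumerate() {
        let code = match b {
            b'A' | b'a' => 0u8,
            b'C' | b'c' => 1,
            b'G' | b'g' => 2,
            b'T' | b't' => 3,
            _ => return None,
        };
        out[i / 4] |= code << ((i % 4) * 2);
    }
    Some(out)
}

/// Inverse of `pack_2bit`. `len` is the number of bases and must be stored
/// separately, since the last byte may be only partially filled.
/// Panics if `len` exceeds four times the packed length.
fn unpack_2bit(packed: &[u8], len: usize) -> Vec<u8> {
    (0..len)
        .map(|i| b"ACGT"[((packed[i / 4] >> ((i % 4) * 2)) & 0b11) as usize])
        .collect()
}
```

Under these assumptions a 6-base sequence occupies 2 bytes instead of 6, and the round trip through unpack_2bit returns upper-case bases regardless of the input case.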