Mirror of https://github.com/ruvnet/RuVector.git (synced 2026-05-27 00:25:10 +00:00)
New RVDNA binary format (.rvdna) purpose-built for AI genomic analysis:
- 2-bit nucleotide encoding (4x compression vs ASCII FASTA)
- Pre-computed k-mer vectors with int8 quantization for instant HNSW search
- Sparse attention matrices in COO format for direct tensor consumption
- Variant probability tensors with f16 genotype likelihoods
- Zero-copy, memory-mappable layout with 64-byte aligned sections
- CRC32 checksums with section-level integrity verification

Real human gene sequences from NCBI RefSeq:
- HBB (hemoglobin beta, NM_000518.5): sickle cell gene
- TP53 (tumor suppressor, NM_000546.6): exons 5-8 hotspot
- BRCA1 (DNA repair, NM_007294.4): exon 11 fragment
- CYP2D6 (drug metabolism, NM_000106.6): pharmacogenomic
- INS (insulin, NM_000207.3): preproinsulin

Pipeline upgraded to 8 stages using real data:
1. Load 5 real human genes (2,340 bp total)
2. K-mer similarity matrix across the gene panel
3. Smith-Waterman alignment on HBB
4. Sickle cell variant detection at HBB codon 6
5. HBB → hemoglobin beta translation (MVHLTPEEKSAVTALWGKVN verified)
6. Horvath epigenetic clock
7. CYP2D6 *4/*10 pharmacogenomics
8. RVDNA format conversion with pre-computed vectors

87 tests, 0 failures. ADR-013 documents the format specification.

https://claude.ai/code/session_013B6stXbYwAkWHbE16sjUrq
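The 2-bit nucleotide encoding listed above packs four bases into each byte, which is where the 4x saving over one-byte-per-base ASCII FASTA comes from. The sketch below is illustrative only: the A=0, C=1, G=2, T=3 code assignment, the high-bits-first packing order, and the names `pack_2bit`/`unpack_2bit` are assumptions, not the RVDNA layout, which ADR-013 specifies.

```python
# Hypothetical code assignment; the commit message does not state RVDNA's actual mapping.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string at 2 bits per base, 4 bases per byte, first base in the high bits."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        shift = 6 - 2 * (i % 4)
        out[i // 4] |= CODE[base] << shift  # raises KeyError on N or other ambiguity codes
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Inverse of pack_2bit; the base count is stored separately because the last byte may be padded."""
    return "".join(BASES[(data[i // 4] >> (6 - 2 * (i % 4))) & 0b11] for i in range(length))


seq = "ATGGTGCATCTGACTCCTGAGGAG"
packed = pack_2bit(seq)
print(len(seq), "ASCII bytes ->", len(packed), "packed bytes")  # 24 ASCII bytes -> 6 packed bytes
```

A pure 2-bit alphabet cannot represent `N` or IUPAC ambiguity codes, so a real format needs a side channel (for example, a run list of masked regions) for those; how RVDNA handles them is not stated here.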
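The pre-computed k-mer vectors with int8 quantization (and the k-mer similarity matrix in pipeline stage 2) can be illustrated with a minimal sketch. It assumes k = 3, L2-normalized k-mer counts, and symmetric per-vector scaling with `scale = max|x| / 127`; the actual k, normalization, and quantization scheme used by RVDNA are not given in the commit message, and `kmer_vector`, `quantize_int8`, `dequantize`, and `cosine` are hypothetical names.

```python
import itertools
import math

K = 3  # hypothetical k-mer size; the commit message does not give RVDNA's value
KMER_INDEX = {"".join(p): i for i, p in enumerate(itertools.product("ACGT", repeat=K))}


def kmer_vector(seq: str) -> list[float]:
    """L2-normalized k-mer count vector (4**K dimensions); k-mers containing non-ACGT are skipped."""
    counts = [0.0] * len(KMER_INDEX)
    for i in range(len(seq) - K + 1):
        j = KMER_INDEX.get(seq[i:i + K])
        if j is not None:
            counts[j] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def quantize_int8(vec: list[float]) -> tuple[list[int], float]:
    """Symmetric per-vector int8 quantization: q = round(x / scale), scale = max|x| / 127."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # fall back to 1.0 for an all-zero vector
    return [max(-127, min(127, round(x / scale))) for x in vec], scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, the usual metric for a k-mer similarity matrix."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


v = kmer_vector("ATGGTGCATCTGACTCCTGAGGAGAAG")
q, scale = quantize_int8(v)
print(round(cosine(v, dequantize(q, scale)), 4))
```

Each dequantized component is within `scale / 2` of the original, so storing one float scale plus one byte per dimension keeps cosine similarities close to their full-precision values while cutting storage fourfold relative to f32.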
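The combination of 64-byte aligned sections and section-level CRC32 verification can be sketched as follows. The section table layout `(name, offset, length, crc32)`, zero padding between sections, and the helper names are assumptions for illustration; ADR-013 defines the real header and section table. The check uses Python's standard `zlib.crc32`, which accepts any bytes-like object, so a `memoryview` slice (or an `mmap` of the file) can be verified without copying.

```python
import zlib

ALIGN = 64  # section alignment stated in the commit message


def align_up(n: int, align: int = ALIGN) -> int:
    """Round n up to the next multiple of align (align must be a power of two)."""
    return (n + align - 1) & ~(align - 1)


def build_sections(payloads: dict[str, bytes]) -> tuple[bytes, list[tuple[str, int, int, int]]]:
    """Lay payloads out back to back, zero-padding so every section starts on a 64-byte boundary.

    Returns the buffer and a hypothetical section table of (name, offset, length, crc32).
    """
    buf = bytearray()
    table = []
    for name, data in payloads.items():
        offset = align_up(len(buf))
        buf.extend(bytes(offset - len(buf)))  # zero padding up to the aligned offset
        buf.extend(data)
        table.append((name, offset, len(data), zlib.crc32(data)))
    return bytes(buf), table


def verify_section(buf: bytes, offset: int, length: int, crc: int) -> bool:
    """Recompute the CRC32 over a memoryview slice so the section bytes are not copied."""
    return zlib.crc32(memoryview(buf)[offset:offset + length]) == crc
```

Aligning every section to 64 bytes (a common cache-line size) lets a reader map the file and hand section pointers directly to SIMD or tensor code, while per-section CRCs localize corruption to one section instead of invalidating the whole file.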
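Pipeline stage 3 runs Smith-Waterman local alignment on HBB. A minimal score-only version of the algorithm, using two DP rows, is shown below; the scoring values (match +2, mismatch -1, linear gap -2) and the function name are assumptions, and the pipeline may use different parameters, affine gaps, or a traceback to recover the alignment itself.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local: a poor prefix is simply dropped.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "ACGT"))  # 8: the embedded ACGT matches with no penalties
```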
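Pipeline stages 4 and 5 can be illustrated together: translate the start of the HBB coding sequence and inspect the sickle cell codon. The legacy "codon 6" numbering of beta-globin excludes the initiator methionine, so it corresponds to CDS codon 7 (nucleotides 19-21); the sickle allele is GAG→GTG (Glu6Val, HGVS c.20A>T). The 60-nt fragment below is the sequence commonly given for the first 20 HBB codons and is included for illustration; check it against NM_000518.5 before relying on it. The function names are hypothetical.

```python
import itertools

# Standard genetic code, codons ordered with bases T, C, A, G (first base most significant).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(itertools.product("TCAG", repeat=3), AMINO)}

# First 20 codons of the HBB CDS (illustrative fragment; verify against NM_000518.5).
HBB_CDS_60 = "".join(
    "ATG GTG CAT CTG ACT CCT GAG GAG AAG TCT GCC GTT ACT GCC CTG TGG GGC AAG GTG AAC".split()
)


def translate(cds: str) -> str:
    """Translate codon by codon from the first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def codon6_call(cds: str) -> str:
    """Classify legacy beta-globin codon 6, i.e. CDS codon 7 once the initiator Met is counted."""
    codon = cds[18:21]
    aa = CODON_TABLE[codon]
    if aa == "E":
        return "reference (Glu6)"
    if aa == "V":
        return "sickle HbS (Glu6Val)"
    return f"other ({codon} -> {aa})"


sickle = HBB_CDS_60[:19] + "T" + HBB_CDS_60[20:]  # c.20A>T turns GAG into GTG
print(translate(HBB_CDS_60))  # MVHLTPEEKSAVTALWGKVN
print(codon6_call(sickle))    # sickle HbS (Glu6Val)
```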

| File |
|---|
| .gitkeep |
| ADR-001-vision-and-context.md |
| ADR-002-quantum-genomics-engine.md |
| ADR-003-genomic-vector-index.md |
| ADR-004-genomic-attention-architecture.md |
| ADR-005-graph-neural-protein-engine.md |
| ADR-006-temporal-epigenomic-engine.md |
| ADR-007-distributed-genomics-consensus.md |
| ADR-008-wasm-edge-genomics.md |
| ADR-009-variant-calling-pipeline.md |
| ADR-010-quantum-pharmacogenomics.md |
| ADR-011-performance-targets-and-benchmarks.md |
| ADR-012-genomic-security-and-privacy.md |
| ADR-013-rvdna-ai-native-format.md |