mirror of https://github.com/ruvnet/RuVector.git, synced 2026-05-23 04:27:11 +00:00
feat(training): ADR-129 RuvLTRA training pipeline — calibration, SFT, benchmarks, HF publishing
* docs(adr): update ADR-129 - all phases executing, Phase 4 publishing complete

  - Phase 1 Calibration: Complete (all 4 models, benchmarks uploaded to HF)
  - Phase 2 SFT: Executing on L4 GPU (rank-16, 2 epochs)
  - Phase 3 Benchmarks: Executing (release gates + L4 benchmark job)
  - Phase 4 Publishing: Complete (TQ configs + benchmarks + README updates on HF)

  Benchmark results (L4 GPU):
  - ruvltra-small: 75.4 tok/s
  - ruvltra-medium: 62.6 tok/s
  - ruvltra-claude-code: 67.1 tok/s

  Co-Authored-By: claude-flow <ruv@ruv.net>

* docs: add training pipeline and release gates to root README

  Add Continuous Training & Optimization section (ADR-129) to the capabilities table: nightly training, 7-gate release checks, TurboQuant profiling, training corpus.

  Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(training): include training corpus in Docker build context

  The SFT job failed because merged_corpus.jsonl was not in the Docker image. Copy it to scripts/training/data/training/ so it's included in the COPY . /app/ step.

  Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(training): handle raw text corpus format in SFT pipeline

  The training corpus uses a flat 'text' field (brain memories, ADRs) rather than chat messages or Alpaca instruction format. Add handler that converts raw text to completion-style messages for SFT.

  Co-Authored-By: claude-flow <ruv@ruv.net>
parent ad6586aa10
commit 385eb17d08
4 changed files with 251 additions and 3 deletions
@@ -100,6 +100,14 @@ User Query → [SONA Engine] → Model Response → User Feedback
 | 8h | [**GraphMAE**](./crates/ruvector-gnn) | Graph Masked Autoencoder - self-supervised node representation learning with GAT encoder |
 | 8i | [**TurboQuant**](./crates/ruvllm) | 2-4 bit asymmetric KV-cache quantization - 6-8x memory reduction, <0.5% perplexity loss, H2O/PyramidKV eviction |
 
+**Continuous Training & Optimization** *(ADR-129)*
+| # | Capability | What It Does |
+|---|------------|--------------|
+| 8j | [**Nightly training**](./scripts/training/) | Automated nightly LoRA fine-tuning from brain learnings - models improve every day |
+| 8k | [**Release gates**](./scripts/training/release_gate.py) | 7 automated quality checks (code quality, routing accuracy, perplexity, speed, contamination) - prevents shipping regressions |
+| 8l | [**TurboQuant profiling**](./crates/ruvllm/src/quantize/turboquant_profile.rs) | Per-layer KV-cache bit-width optimization with `.turboquant.json` sidecar configs |
+| 8m | [**Training corpus**](./data/training/) | 230+ records from brain memories (pi.ruv.io) + architecture decisions + Claude routing examples |
+
 **Distributed Systems**
 | # | Capability | What It Does |
 |---|------------|--------------|
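The release-gate row added above describes automated checks that block a release when a metric regresses. The implementation in `release_gate.py` is not part of this diff, so the following is only a minimal sketch of the general pattern. The helper names (`GateResult`, `check_min`, `check_max`, `release_allowed`) and every threshold are hypothetical; only the 75.4 tok/s figure comes from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """Outcome of one quality check (hypothetical structure)."""
    name: str
    passed: bool
    detail: str


def check_min(name: str, value: float, minimum: float) -> GateResult:
    """Gate that passes when a higher-is-better metric meets its floor."""
    return GateResult(name, value >= minimum, f"{value} >= {minimum}")


def check_max(name: str, value: float, maximum: float) -> GateResult:
    """Gate that passes when a lower-is-better metric stays under its cap."""
    return GateResult(name, value <= maximum, f"{value} <= {maximum}")


def release_allowed(results: list[GateResult]) -> bool:
    """A release is allowed only if every gate passed."""
    return all(r.passed for r in results)


if __name__ == "__main__":
    # Thresholds below are illustrative assumptions, not the project's values.
    results = [
        check_min("speed_tok_s", 75.4, 50.0),      # ruvltra-small on L4 GPU
        check_max("perplexity", 12.0, 15.0),       # hypothetical measurement
        check_min("routing_accuracy", 0.91, 0.8),  # hypothetical measurement
    ]
    for r in results:
        print(f"{'PASS' if r.passed else 'FAIL'} {r.name}: {r.detail}")
    print("release allowed:", release_allowed(results))
```

Keeping each check as a separate result, rather than returning a single boolean, makes it possible to report which gate blocked a release.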
@@ -16,9 +16,9 @@ Accepted - Phase 1 (calibration) deployed and executing. Governance and releas
 | **Cloud Run Jobs** | **3 deployed** | `ruvltra-calibration`, `ruvltra-nightly-train`, `ruvltra-benchmark` (all L4 GPU) |
 | **Cloud Schedulers** | **2 enabled** | Nightly 03:00 UTC, Weekly benchmark Mon 06:00 UTC |
 | **Phase 1: Calibration** | **Complete** | All 4 models calibrated on L4 GPU. TQ profiles + benchmarks uploaded to HuggingFace. Results: 75.4 tok/s (small), 62.6 tok/s (medium), 67.1 tok/s (claude-code) |
-| **Phase 2: SFT** | **Ready** | Training corpus exported (230 records, 530K tokens), scripts ready |
-| **Phase 3: Benchmarks** | **Partial** | Release gate automation implemented and tested; inference benchmarks running |
-| **Phase 4: Publishing** | **Partial** | TurboQuant sidecar configs uploaded to all 4 HF models |
+| **Phase 2: SFT** | **Executing** | LoRA SFT running on L4 GPU (rank-16, 2 epochs, lr=2e-5). Corpus: 230 records, 530K tokens |
+| **Phase 3: Benchmarks** | **Executing** | Release gate automation tested. L4 GPU benchmark job running. Calibration benchmarks complete for all 4 models |
+| **Phase 4: Publishing** | **Complete** | TurboQuant sidecar configs + benchmark results uploaded to all 4 HF models. Model card READMEs updated with benchmark tables |
 | **Tooling** | **ruvllm-native** | Uses RuvltraQuantizer + TurboQuantProfile (Rust), gguf + llama-cpp-python (Python). No llama.cpp source compilation. |
 
 ## Context
scripts/training/data/training/merged_corpus.jsonl (Normal file, 230 lines)
File diff suppressed because one or more lines are too long
@@ -96,6 +96,16 @@ def format_dataset(records: list[dict]):
                 messages[-1]["content"] += f"\n\n{rec['input']}"
             messages.append({"role": "assistant", "content": rec["output"]})
             formatted.append({"messages": messages})
+        elif "text" in rec and len(rec["text"]) > 100:
+            # Raw text format (brain memories, ADRs): convert to completion format
+            text = rec["text"]
+            title = rec.get("title", text[:60].split("\n")[0])
+            messages = [
+                {"role": "system", "content": "You are a knowledgeable software architect and Rust developer."},
+                {"role": "user", "content": f"Explain: {title}"},
+                {"role": "assistant", "content": text},
+            ]
+            formatted.append({"messages": messages})
         else:
             log.warning("Skipping record with unknown format: %s", list(rec.keys()))
 
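The new branch in the hunk above can be exercised in isolation. The sketch below pulls the same logic into a standalone function so its behaviour on a single record is easy to check. The function name `raw_text_to_messages` and the constants are assumptions made for illustration; in the commit itself the logic sits inline in `format_dataset`.

```python
import logging
from typing import Optional

log = logging.getLogger(__name__)

# Same cutoff as the `len(rec["text"]) > 100` check in the diff.
MIN_TEXT_CHARS = 100
SYSTEM_PROMPT = "You are a knowledgeable software architect and Rust developer."


def raw_text_to_messages(rec: dict) -> Optional[dict]:
    """Convert a flat {'text': ...} record into a chat-style SFT example.

    Returns None when the record has no usable 'text' field, which
    corresponds to falling through to the warning branch in the diff.
    """
    text = rec.get("text")
    if not isinstance(text, str) or len(text) <= MIN_TEXT_CHARS:
        log.warning("Skipping record with unknown format: %s", list(rec.keys()))
        return None
    # Without an explicit title, use the first line of the first 60 characters.
    title = rec.get("title", text[:60].split("\n")[0])
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Explain: {title}"},
            {"role": "assistant", "content": text},
        ]
    }


if __name__ == "__main__":
    record = {"text": "TurboQuant stores the KV cache at 2-4 bits.\n" + "x" * 100}
    example = raw_text_to_messages(record)
    print(example["messages"][1]["content"])
```

Because the fallback title is taken from `text[:60]`, a record whose first line is longer than 60 characters gets a title cut off mid-sentence. Records that carry an explicit `title` field avoid this.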