feat(training): ADR-129 RuvLTRA training pipeline — calibration, SFT, benchmarks, HF publishing

* docs(adr): update ADR-129 (Phases 1 and 4 complete, Phases 2 and 3 executing)

- Phase 1 Calibration: Complete (all 4 models, benchmarks uploaded to HF)
- Phase 2 SFT: Executing on L4 GPU (LoRA rank-16, 2 epochs, lr=2e-5)
- Phase 3 Benchmarks: Executing (release gates + L4 benchmark job)
- Phase 4 Publishing: Complete (TurboQuant configs, benchmark results, and model card README updates on HF)
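The per-layer TurboQuant profiling behind these sidecar configs can be illustrated with a small Python sketch. It is hypothetical throughout and does not reproduce `TurboQuantProfile` (Rust) or the real `.turboquant.json` schema. It assumes plain min/max (asymmetric) quantization of sample KV values per layer and picks the smallest bit-width in {2, 3, 4} whose relative RMS reconstruction error stays under a tolerance. `profile_layers`, the `layers`/`bits` keys, and the 0.1 tolerance are invented for the example.

```python
import json
import math
import random


def quantize_asymmetric(values: list[float], bits: int) -> list[float]:
    """Quantize with a min/max range (scale plus offset), then dequantize."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1  # max integer code: 3 for 2-bit, 15 for 4-bit
    scale = (hi - lo) / levels if hi > lo else 1.0
    out = []
    for v in values:
        q = min(max(round((v - lo) / scale), 0), levels)  # integer code in [0, levels]
        out.append(lo + q * scale)  # dequantized value
    return out


def relative_error(orig: list[float], approx: list[float]) -> float:
    """RMS reconstruction error divided by the RMS of the original values."""
    num = sum((a - b) ** 2 for a, b in zip(orig, approx))
    den = sum(a * a for a in orig) or 1.0
    return math.sqrt(num / den)


def profile_layers(kv_samples: dict[int, list[float]], tolerance: float = 0.1,
                   candidates: tuple[int, ...] = (2, 3, 4)) -> dict:
    """Pick the narrowest bit-width per layer that meets `tolerance` (hypothetical schema)."""
    layers = []
    for layer, values in sorted(kv_samples.items()):
        chosen = candidates[-1]  # fall back to the widest candidate
        for bits in candidates:
            if relative_error(values, quantize_asymmetric(values, bits)) <= tolerance:
                chosen = bits
                break
        layers.append({"layer": layer, "bits": chosen})
    return {"layers": layers}


if __name__ == "__main__":
    random.seed(0)
    samples = {layer: [random.gauss(0.0, 1.0) for _ in range(256)] for layer in range(4)}
    print(json.dumps(profile_layers(samples), indent=2))
```

Choosing the narrowest passing width per layer, rather than one global width, is what lets a per-layer profile save memory on layers that tolerate coarse quantization while keeping more bits where the error would be too large.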

Benchmark results (L4 GPU):
- ruvltra-small: 75.4 tok/s
- ruvltra-medium: 62.6 tok/s
- ruvltra-claude-code: 67.1 tok/s
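Throughput figures like these are usually computed as generated tokens divided by wall-clock generation time. The sketch below assumes that definition (the commit does not say whether prompt processing is included) and uses a stand-in token generator. `measure_throughput` and `fake_generate` are hypothetical names, not part of ruvllm or llama-cpp-python.

```python
import time
from typing import Callable, Iterable


def measure_throughput(generate: Callable[[], Iterable[str]]) -> tuple[int, float]:
    """Consume a token stream and return (token_count, tokens_per_second)."""
    start = time.perf_counter()
    count = sum(1 for _ in generate())
    elapsed = time.perf_counter() - start
    return count, (count / elapsed if elapsed > 0 else float("inf"))


def fake_generate(n: int = 50, delay_s: float = 0.002):
    """Hypothetical stand-in for a model's streaming generator."""
    for i in range(n):
        time.sleep(delay_s)  # simulate per-token decode latency
        yield f"tok{i}"


if __name__ == "__main__":
    tokens, tps = measure_throughput(lambda: fake_generate())
    print(f"{tokens} tokens at {tps:.1f} tok/s")
```

Wiring `generate` to a real model is left out: the benchmark job's code is not part of this commit, so the streaming call it uses is not known here.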

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs: add training pipeline and release gates to root README

Add Continuous Training & Optimization section (ADR-129) to the
capabilities table: nightly training, 7-gate release checks,
TurboQuant profiling, training corpus.
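A release gate of this kind reduces to "every check must pass before a model ships". The Python sketch below illustrates that all-must-pass pattern. It is not `release_gate.py`: the gate names reuse the categories listed in the new README row, but the `Gate` type, metric keys, and thresholds are hypothetical, and only five gates are modeled because the commit does not name the other two.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Gate:
    name: str
    check: Callable[[dict], bool]  # receives the candidate model's metrics


# Hypothetical thresholds for illustration only.
GATES = [
    Gate("code_quality", lambda m: m["code_quality"] >= 0.80),
    Gate("routing_accuracy", lambda m: m["routing_accuracy"] >= 0.90),
    Gate("perplexity", lambda m: m["perplexity"] <= m["baseline_perplexity"] * 1.02),
    Gate("speed", lambda m: m["tok_per_s"] >= m["baseline_tok_per_s"] * 0.95),
    Gate("contamination", lambda m: m["contamination_rate"] == 0.0),
]


def evaluate(metrics: dict) -> tuple[bool, list[str]]:
    """Return (ship, failed_gate_names); any single failure blocks the release."""
    failed = [g.name for g in GATES if not g.check(metrics)]
    return not failed, failed


# Illustrative candidate, reusing 75.4 tok/s (ruvltra-small) as the speed baseline.
candidate = {
    "code_quality": 0.84, "routing_accuracy": 0.93,
    "perplexity": 7.9, "baseline_perplexity": 8.0,
    "tok_per_s": 70.1, "baseline_tok_per_s": 75.4,
    "contamination_rate": 0.0,
}
print(evaluate(candidate))  # (False, ['speed'])
```

Returning the names of the failed gates, not just a boolean, makes a blocked nightly run easy to diagnose from the job log.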

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(training): include training corpus in Docker build context

The SFT job failed because merged_corpus.jsonl was not in the Docker
image. Copy it to scripts/training/data/training/ so it's included
in the COPY . /app/ step.
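Because the job only discovered the missing file at training time, a fail-fast check at startup makes this kind of packaging error obvious in the job logs. A minimal sketch, assuming the corpus is resolved relative to the script directory as `data/training/merged_corpus.jsonl` (consistent with the new location under scripts/training/, though the loader code is not shown). `resolve_corpus`, `count_records`, and `CORPUS_RELPATH` are hypothetical names.

```python
import json
from pathlib import Path

# Assumed layout inside the image: /app/<script>.py and /app/data/training/merged_corpus.jsonl
CORPUS_RELPATH = Path("data/training/merged_corpus.jsonl")


def resolve_corpus(base_dir: Path, relpath: Path = CORPUS_RELPATH) -> Path:
    """Return the corpus path, or fail with a hint about the Docker build context."""
    path = base_dir / relpath
    if not path.is_file():
        raise FileNotFoundError(
            f"training corpus not found at {path}; "
            "is it inside the build context copied by `COPY . /app/`?"
        )
    return path


def count_records(path: Path) -> int:
    """Count non-empty JSONL lines, checking that each one parses as JSON."""
    n = 0
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                json.loads(line)  # raises json.JSONDecodeError on a malformed line
                n += 1
    return n
```

In a job entry point this would be called as `resolve_corpus(Path(__file__).resolve().parent)` before any GPU work starts, so a missing file costs seconds rather than a provisioned L4 run.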

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(training): handle raw text corpus format in SFT pipeline

The training corpus uses a flat 'text' field (brain memories, ADRs)
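rather than chat messages or Alpaca instruction format. Add a handler
that wraps each raw-text record longer than 100 characters in a
system/user/assistant message triple (user prompt: "Explain: {title}")
so it can be used for SFT; shorter records are skipped.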
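Before a run, it can help to check how many corpus records each branch of the formatter will accept. The sketch below tallies records by format using the same rule as the new handler (a `text` field longer than 100 characters). The `messages` and `instruction`/`output` keys used for the other two branches are assumptions, since those branches are not fully shown in this diff. Short `text` records fall through to the "unknown format" warning in the real code; the tally keeps them in a separate bucket so they are easy to spot. `classify` and `tally` are hypothetical names.

```python
import json
from collections import Counter
from typing import Iterable

MIN_TEXT_CHARS = 100  # mirrors the `len(rec["text"]) > 100` check in format_dataset


def classify(rec: dict) -> str:
    """Name the formatter branch a record would take (chat/Alpaca keys are assumed)."""
    if "messages" in rec:
        return "chat"
    if "instruction" in rec and "output" in rec:
        return "alpaca"
    if "text" in rec:
        return "raw_text" if len(rec["text"]) > MIN_TEXT_CHARS else "raw_text_too_short"
    return "unknown"


def tally(lines: Iterable[str]) -> Counter:
    """Count records per format from JSONL lines, ignoring blank lines."""
    return Counter(classify(json.loads(line)) for line in lines if line.strip())


if __name__ == "__main__":
    with open("data/training/merged_corpus.jsonl", encoding="utf-8") as fh:
        print(tally(fh))
```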

Co-Authored-By: claude-flow <ruv@ruv.net>
rUv 2026-03-30 07:58:07 -04:00 committed by GitHub
parent ad6586aa10
commit 385eb17d08
4 changed files with 251 additions and 3 deletions


@ -100,6 +100,14 @@ User Query → [SONA Engine] → Model Response → User Feedback
 | 8h | [**GraphMAE**](./crates/ruvector-gnn) | Graph Masked Autoencoder — self-supervised node representation learning with GAT encoder |
 | 8i | [**TurboQuant**](./crates/ruvllm) | 2-4 bit asymmetric KV-cache quantization — 6-8x memory reduction, <0.5% perplexity loss, H2O/PyramidKV eviction |
 
+**Continuous Training & Optimization** *(ADR-129)*
+| # | Capability | What It Does |
+|---|------------|--------------|
+| 8j | [**Nightly training**](./scripts/training/) | Automated nightly LoRA fine-tuning from brain learnings — models improve every day |
+| 8k | [**Release gates**](./scripts/training/release_gate.py) | 7 automated quality checks (code quality, routing accuracy, perplexity, speed, contamination) — prevents shipping regressions |
+| 8l | [**TurboQuant profiling**](./crates/ruvllm/src/quantize/turboquant_profile.rs) | Per-layer KV-cache bit-width optimization with `.turboquant.json` sidecar configs |
+| 8m | [**Training corpus**](./data/training/) | 230+ records from brain memories (pi.ruv.io) + architecture decisions + Claude routing examples |
+
 **Distributed Systems**
 | # | Capability | What It Does |
 |---|------------|--------------|


@ -16,9 +16,9 @@ Accepted — Phase 1 (calibration) deployed and executing. Governance and releas
 | **Cloud Run Jobs** | **3 deployed** | `ruvltra-calibration`, `ruvltra-nightly-train`, `ruvltra-benchmark` (all L4 GPU) |
 | **Cloud Schedulers** | **2 enabled** | Nightly 03:00 UTC, Weekly benchmark Mon 06:00 UTC |
 | **Phase 1: Calibration** | **Complete** | All 4 models calibrated on L4 GPU. TQ profiles + benchmarks uploaded to HuggingFace. Results: 75.4 tok/s (small), 62.6 tok/s (medium), 67.1 tok/s (claude-code) |
-| **Phase 2: SFT** | **Ready** | Training corpus exported (230 records, 530K tokens), scripts ready |
-| **Phase 3: Benchmarks** | **Partial** | Release gate automation implemented and tested; inference benchmarks running |
-| **Phase 4: Publishing** | **Partial** | TurboQuant sidecar configs uploaded to all 4 HF models |
+| **Phase 2: SFT** | **Executing** | LoRA SFT running on L4 GPU (rank-16, 2 epochs, lr=2e-5). Corpus: 230 records, 530K tokens |
+| **Phase 3: Benchmarks** | **Executing** | Release gate automation tested. L4 GPU benchmark job running. Calibration benchmarks complete for all 4 models |
+| **Phase 4: Publishing** | **Complete** | TurboQuant sidecar configs + benchmark results uploaded to all 4 HF models. Model card READMEs updated with benchmark tables |
 | **Tooling** | **ruvllm-native** | Uses RuvltraQuantizer + TurboQuantProfile (Rust), gguf + llama-cpp-python (Python). No llama.cpp source compilation. |
 
 ## Context

File diff suppressed because one or more lines are too long


@ -96,6 +96,16 @@ def format_dataset(records: list[dict]):
                 messages[-1]["content"] += f"\n\n{rec['input']}"
             messages.append({"role": "assistant", "content": rec["output"]})
             formatted.append({"messages": messages})
+        elif "text" in rec and len(rec["text"]) > 100:
+            # Raw text format (brain memories, ADRs) — convert to completion format
+            text = rec["text"]
+            title = rec.get("title", text[:60].split("\n")[0])
+            messages = [
+                {"role": "system", "content": "You are a knowledgeable software architect and Rust developer."},
+                {"role": "user", "content": f"Explain: {title}"},
+                {"role": "assistant", "content": text},
+            ]
+            formatted.append({"messages": messages})
         else:
             log.warning("Skipping record with unknown format: %s", list(rec.keys()))