feat(training): ADR-129 RuvLTRA training pipeline — calibration, SFT, benchmarks, HF publishing

* docs(adr): update ADR-129 (all phases executing, Phase 4 publishing complete)

  - Phase 1 Calibration: Complete (all 4 models, benchmarks uploaded to HF)
  - Phase 2 SFT: Executing on L4 GPU (rank-16, 2 epochs)
  - Phase 3 Benchmarks: Executing (release gates + L4 benchmark job)
  - Phase 4 Publishing: Complete (TQ configs + benchmarks + README updates on HF)

  Benchmark results (L4 GPU):
  - ruvltra-small: 75.4 tok/s
  - ruvltra-medium: 62.6 tok/s
  - ruvltra-claude-code: 67.1 tok/s

  Co-Authored-By: claude-flow <ruv@ruv.net>

* docs: add training pipeline and release gates to root README

  Add Continuous Training & Optimization section (ADR-129) to the capabilities table: nightly training, 7-gate release checks, TurboQuant profiling, training corpus.

  Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(training): include training corpus in Docker build context

  The SFT job failed because merged_corpus.jsonl was not in the Docker image. Copy it to scripts/training/data/training/ so it's included in the COPY . /app/ step.

  Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(training): handle raw text corpus format in SFT pipeline

  The training corpus uses a flat 'text' field (brain memories, ADRs) rather than chat messages or Alpaca instruction format. Add a handler that converts raw text to completion-style messages for SFT.

  Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-23 04:27:11 +00:00 · 2026-03-30 07:58:07 -04:00 · 385eb17d08
commit 385eb17d08
parent ad6586aa10
4 changed files with 251 additions and 3 deletions
--- a/README.md
+++ b/README.md
@@ -100,6 +100,14 @@ User Query → [SONA Engine] → Model Response → User Feedback
 | 8h | [**GraphMAE**](./crates/ruvector-gnn) | Graph Masked Autoencoder — self-supervised node representation learning with GAT encoder |
 | 8i | [**TurboQuant**](./crates/ruvllm) | 2-4 bit asymmetric KV-cache quantization — 6-8x memory reduction, <0.5% perplexity loss, H2O/PyramidKV eviction |

+**Continuous Training & Optimization** *(ADR-129)*
+| # | Capability | What It Does |
+|---|------------|--------------|
+| 8j | [**Nightly training**](./scripts/training/) | Automated nightly LoRA fine-tuning from brain learnings — models improve every day |
+| 8k | [**Release gates**](./scripts/training/release_gate.py) | 7 automated quality checks (code quality, routing accuracy, perplexity, speed, contamination) — prevents shipping regressions |
+| 8l | [**TurboQuant profiling**](./crates/ruvllm/src/quantize/turboquant_profile.rs) | Per-layer KV-cache bit-width optimization with `.turboquant.json` sidecar configs |
+| 8m | [**Training corpus**](./data/training/) | 230+ records from brain memories (pi.ruv.io) + architecture decisions + Claude routing examples |
+
 **Distributed Systems**
 | # | Capability | What It Does |
 |---|------------|--------------|
--- a/docs/adr/ADR-129-ruvltra-gcloud-training-turboquant.md
+++ b/docs/adr/ADR-129-ruvltra-gcloud-training-turboquant.md
@@ -16,9 +16,9 @@ Accepted — Phase 1 (calibration) deployed and executing. Governance and releas
 | **Cloud Run Jobs** | **3 deployed** | `ruvltra-calibration`, `ruvltra-nightly-train`, `ruvltra-benchmark` (all L4 GPU) |
 | **Cloud Schedulers** | **2 enabled** | Nightly 03:00 UTC, Weekly benchmark Mon 06:00 UTC |
 | **Phase 1: Calibration** | **Complete** | All 4 models calibrated on L4 GPU. TQ profiles + benchmarks uploaded to HuggingFace. Results: 75.4 tok/s (small), 62.6 tok/s (medium), 67.1 tok/s (claude-code) |
-| **Phase 2: SFT** | **Ready** | Training corpus exported (230 records, 530K tokens), scripts ready |
-| **Phase 3: Benchmarks** | **Partial** | Release gate automation implemented and tested; inference benchmarks running |
-| **Phase 4: Publishing** | **Partial** | TurboQuant sidecar configs uploaded to all 4 HF models |
+| **Phase 2: SFT** | **Executing** | LoRA SFT running on L4 GPU (rank-16, 2 epochs, lr=2e-5). Corpus: 230 records, 530K tokens |
+| **Phase 3: Benchmarks** | **Executing** | Release gate automation tested. L4 GPU benchmark job running. Calibration benchmarks complete for all 4 models |
+| **Phase 4: Publishing** | **Complete** | TurboQuant sidecar configs + benchmark results uploaded to all 4 HF models. Model card READMEs updated with benchmark tables |
 | **Tooling** | **ruvllm-native** | Uses RuvltraQuantizer + TurboQuantProfile (Rust), gguf + llama-cpp-python (Python). No llama.cpp source compilation. |

 ## Context
--- a/scripts/training/data/training/merged_corpus.jsonl
+++ b/scripts/training/data/training/merged_corpus.jsonl
--- a/scripts/training/run_sft.py
+++ b/scripts/training/run_sft.py
@@ -96,6 +96,16 @@ def format_dataset(records: list[dict]):
                messages[-1]["content"] += f"\n\n{rec['input']}"
            messages.append({"role": "assistant", "content": rec["output"]})
            formatted.append({"messages": messages})
+        elif "text" in rec and len(rec["text"]) > 100:
+            # Raw text format (brain memories, ADRs) — convert to completion format
+            text = rec["text"]
+            title = rec.get("title", text[:60].split("\n")[0])
+            messages = [
+                {"role": "system", "content": "You are a knowledgeable software architect and Rust developer."},
+                {"role": "user", "content": f"Explain: {title}"},
+                {"role": "assistant", "content": text},
+            ]
+            formatted.append({"messages": messages})
        else:
            log.warning("Skipping record with unknown format: %s", list(rec.keys()))
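
The raw-text branch added to `format_dataset` above can be exercised in isolation. The following is a minimal sketch, not the code in `run_sft.py`: the helper name `raw_text_to_messages` and the constants `SYSTEM_PROMPT` and `MIN_TEXT_CHARS` are hypothetical and introduced only for illustration, while the system prompt, the `> 100` length check, and the title fallback are copied from the hunk.

```python
from typing import Optional

# Hypothetical names; the values mirror the literals in the diff above.
SYSTEM_PROMPT = "You are a knowledgeable software architect and Rust developer."
MIN_TEXT_CHARS = 100  # the diff only accepts records with len(text) > 100


def raw_text_to_messages(rec: dict) -> Optional[dict]:
    """Convert a flat {'text': ...} record into a chat-style SFT example.

    Returns None when the record has no usable 'text' field, which
    corresponds to the "unknown format" fallthrough in format_dataset.
    """
    text = rec.get("text")
    if not isinstance(text, str) or len(text) <= MIN_TEXT_CHARS:
        return None

    # Use an explicit title if present; otherwise fall back to the first
    # line of the text, capped at 60 characters (same rule as the diff).
    title = rec.get("title", text[:60].split("\n")[0])

    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Explain: {title}"},
            {"role": "assistant", "content": text},
        ]
    }


# Usage: a record without a title gets its prompt from the first line.
record = {"text": "ADR-129: RuvLTRA training pipeline\n" + "x" * 120}
example = raw_text_to_messages(record)
print(example["messages"][1]["content"])
```

Note that with this length check, records whose `text` has 100 characters or fewer are skipped and, in the diff, logged as an unknown format rather than as too short.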