mirror of https://github.com/ruvnet/RuVector.git synced 2026-05-23 12:55:26 +00:00


rUv 385eb17d08 feat(training): ADR-129 RuvLTRA training pipeline — calibration, SFT, benchmarks, HF publishing	2026-03-30 07:58:07 -04:00

* docs(adr): update ADR-129 — all phases executing, Phase 4 publishing complete
  - Phase 1 Calibration: Complete (all 4 models, benchmarks uploaded to HF)
  - Phase 2 SFT: Executing on L4 GPU (rank-16, 2 epochs)
  - Phase 3 Benchmarks: Executing (release gates + L4 benchmark job)
  - Phase 4 Publishing: Complete (TQ configs + benchmarks + README updates on HF)
  Benchmark results (L4 GPU):
  - ruvltra-small: 75.4 tok/s
  - ruvltra-medium: 62.6 tok/s
  - ruvltra-claude-code: 67.1 tok/s
  Co-Authored-By: claude-flow <ruv@ruv.net>
* docs: add training pipeline and release gates to root README
  Add Continuous Training & Optimization section (ADR-129) to the capabilities table: nightly training, 7-gate release checks, TurboQuant profiling, training corpus.
  Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(training): include training corpus in Docker build context
  The SFT job failed because merged_corpus.jsonl was not in the Docker image. Copy it to scripts/training/data/training/ so it's included in the COPY . /app/ step.
  Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(training): handle raw text corpus format in SFT pipeline
  The training corpus uses a flat 'text' field (brain memories, ADRs) rather than chat messages or Alpaca instruction format. Add handler that converts raw text to completion-style messages for SFT.
  Co-Authored-By: claude-flow <ruv@ruv.net>
data/training	feat(training): ADR-129 RuvLTRA training pipeline — calibration, SFT, benchmarks, HF publishing	2026-03-30 07:58:07 -04:00
contamination_check.py	feat: implement ADR-129 training pipeline and TurboQuant sidecar infra	2026-03-28 02:27:32 +00:00
deploy_training.sh	fix(training): use 3600s timeout for GPU Cloud Run jobs	2026-03-28 12:21:58 +00:00
Dockerfile	fix(training): use torch 2.5.1+cu124 (2.3.1 unavailable on cu124 index)	2026-03-28 14:26:28 +00:00
export_training_data.py	feat: implement ADR-129 training pipeline and TurboQuant sidecar infra	2026-03-28 02:27:32 +00:00
nightly_train.sh	feat: add nightly continuous learning pipeline (ADR-129)	2026-03-28 02:30:25 +00:00
README.md	feat: implement ADR-129 training pipeline and TurboQuant sidecar infra	2026-03-28 02:27:32 +00:00
release_gate.py	feat: implement ADR-129 training pipeline and TurboQuant sidecar infra	2026-03-28 02:27:32 +00:00
run_calibration.py	refactor(training): use ruvllm-native tooling instead of llama.cpp	2026-03-28 13:40:14 +00:00
run_sft.py	feat(training): ADR-129 RuvLTRA training pipeline — calibration, SFT, benchmarks, HF publishing	2026-03-30 07:58:07 -04:00

README.md

Training Scripts

Scripts for RuvLTRA model training, evaluation, and release gating.

release_gate.py

Automated ship/no-ship checker that implements the 7 release gates from ADR-129 Section 3.2. It has no external dependencies and uses only the Python standard library.

Prerequisites

Generate a `gate_results.json` file by running the evaluation scripts (`eval_humaneval.py`, `eval_routing.py`, `eval_perplexity.py`, `turbo_quant_bench`, `eval_long_context.py`, `e2e_bench`) and place it in the results directory that you pass via `--results-dir`. The file must have the following structure:

```json
{
  "model_size": "0.5B",
  "baseline": {
    "humaneval_pass1": 0.40,
    "routing_accuracy": 0.80,
    "wikitext2_ppl": 25.0
  },
  "candidate": {
    "humaneval_pass1": 0.48,
    "routing_accuracy": 0.83,
    "wikitext2_ppl": 24.5,
    "tq_compression": 10.7,
    "tq_ppl_delta": 0.008,
    "long_context_ppl": 18.0,
    "contamination_count": 0,
    "tok_per_sec": 95
  }
}
```
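As a minimal sketch of this input contract, the loader below reads `gate_results.json` from a results directory and checks that every key from the example above is present before any gate is evaluated. The file location (`<results-dir>/gate_results.json`), the function name `load_gate_results`, and the required-key sets are assumptions inferred from this README, not code taken from `release_gate.py`. Like the script, the sketch uses only the standard library.

```python
import json
from pathlib import Path

# Required keys, inferred from the example gate_results.json (assumption).
BASELINE_KEYS = {"humaneval_pass1", "routing_accuracy", "wikitext2_ppl"}
CANDIDATE_KEYS = BASELINE_KEYS | {
    "tq_compression",
    "tq_ppl_delta",
    "long_context_ppl",
    "contamination_count",
    "tok_per_sec",
}


def load_gate_results(results_dir: str) -> dict:
    """Hypothetical helper: load <results_dir>/gate_results.json and validate keys."""
    path = Path(results_dir) / "gate_results.json"
    with path.open(encoding="utf-8") as f:
        data = json.load(f)

    if "model_size" not in data:
        raise ValueError("gate_results.json is missing 'model_size'")
    # Fail early with a clear message instead of a KeyError during gate checks.
    for section, required in (("baseline", BASELINE_KEYS), ("candidate", CANDIDATE_KEYS)):
        missing = required - set(data.get(section, {}))
        if missing:
            raise ValueError(f"'{section}' is missing keys: {sorted(missing)}")
    return data
```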

Usage

```bash
# Basic usage
python scripts/training/release_gate.py --results-dir ./results

# With model path (informational)
python scripts/training/release_gate.py \
  --model-path /models/ruvltra-v2.0-tq \
  --results-dir ./results

# Save JSON report
python scripts/training/release_gate.py \
  --results-dir ./results \
  --output-json ./reports/gate_report.json
```
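The flags in these examples map naturally onto a standard-library `argparse` parser. The sketch below is hypothetical: `parse_args` and `write_report` are illustrative names, and the report is whatever dict you pass in, because this README does not document the report schema. Creating the parent directory lets `--output-json ./reports/gate_report.json` succeed even when `./reports` does not exist yet; whether the real script does this is not stated here.

```python
import argparse
import json
from pathlib import Path


def parse_args(argv=None) -> argparse.Namespace:
    """Hypothetical CLI mirroring the flags used in the usage examples."""
    parser = argparse.ArgumentParser(description="ADR-129 release gate check (sketch)")
    parser.add_argument("--results-dir", required=True,
                        help="directory containing gate_results.json")
    parser.add_argument("--model-path", default=None,
                        help="model location; informational only, not evaluated")
    parser.add_argument("--output-json", default=None,
                        help="optional path for a JSON report")
    return parser.parse_args(argv)


def write_report(report: dict, output_json: str) -> None:
    """Write the report as JSON, creating missing parent directories first."""
    path = Path(output_json)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
```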

Exit codes

| Code | Meaning |
|------|---------|
| `0` | All 7 gates PASS: the model is approved to ship |
| `1` | One or more gates FAIL: do not ship |

Gates

| Gate | Criterion | 0.5B threshold | 3B threshold |
|------|-----------|----------------|--------------|
| G1 | HumanEval pass@1 | >=45% or >=+5 pp over baseline | >=55% or >=+5 pp over baseline |
| G2 | Routing accuracy | >=80% | >=80% |
| G3 | Wikitext-2 PPL regression vs. baseline | <5% increase | <5% increase |
| G4 | TurboQuant compression | >=8x, PPL delta <1% | >=8x, PPL delta <1% |
| G5 | Long-context PPL at 16K | <20 PPL | <20 PPL |
| G6 | Eval contamination | 0 instances | 0 instances |
| G7 | Inference speed | >=80 tok/s | >=40 tok/s |
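The gate table translates into a set of boolean checks. The sketch below is one reading of the thresholds under stated assumptions: `model_size` is exactly `"0.5B"` or `"3B"`, accuracies are fractions (0.45 means 45%), G3 is the relative increase over the baseline perplexity, and `tq_ppl_delta` is a fraction (0.008 means 0.8%). `evaluate_gates` and `exit_code` are illustrative names, not functions confirmed to exist in `release_gate.py`.

```python
# Size-dependent thresholds from the gate table (only G1 and G7 differ).
THRESHOLDS = {
    "0.5B": {"humaneval_min": 0.45, "tok_per_sec_min": 80},
    "3B": {"humaneval_min": 0.55, "tok_per_sec_min": 40},
}


def evaluate_gates(results: dict) -> dict:
    """Return {"G1": bool, ..., "G7": bool} for a gate_results.json-style dict."""
    t = THRESHOLDS[results["model_size"]]
    base, cand = results["baseline"], results["candidate"]

    # Round so that e.g. 0.45 - 0.40 counts as a full 5 pp improvement.
    humaneval_delta = round(cand["humaneval_pass1"] - base["humaneval_pass1"], 6)
    ppl_increase = (cand["wikitext2_ppl"] - base["wikitext2_ppl"]) / base["wikitext2_ppl"]

    return {
        # G1: absolute pass@1 threshold OR at least +5 pp over baseline
        "G1": cand["humaneval_pass1"] >= t["humaneval_min"] or humaneval_delta >= 0.05,
        # G2: routing accuracy of at least 80%
        "G2": cand["routing_accuracy"] >= 0.80,
        # G3: relative Wikitext-2 perplexity increase below 5% (a decrease passes)
        "G3": ppl_increase < 0.05,
        # G4: at least 8x compression AND perplexity delta below 1% (as a fraction)
        "G4": cand["tq_compression"] >= 8.0 and cand["tq_ppl_delta"] < 0.01,
        # G5: long-context (16K) perplexity below 20
        "G5": cand["long_context_ppl"] < 20.0,
        # G6: no contaminated eval instances
        "G6": cand["contamination_count"] == 0,
        # G7: throughput floor depends on model size
        "G7": cand["tok_per_sec"] >= t["tok_per_sec_min"],
    }


def exit_code(gates: dict) -> int:
    """0 if every gate passes (ship), 1 otherwise (do not ship)."""
    return 0 if all(gates.values()) else 1
```

Applied to the example `gate_results.json` under Prerequisites, every gate passes, so `exit_code` returns 0. The rounding step matters because in floating point `0.45 - 0.40` evaluates to slightly less than `0.05`, which would otherwise fail a candidate that improved by exactly 5 pp.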

CI integration

```yaml
# Example GitHub Actions step (Cloud Build uses a different step syntax):
- name: Release gate check
  run: python scripts/training/release_gate.py --results-dir ./results --output-json ./reports/gate_report.json
```

If any gate fails, the script exits with code 1, which fails the CI step and blocks publishing.
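Because pass/fail is carried entirely by the exit code, any orchestrator can gate publishing the same way the CI step does. The sketch below shows that pattern with `subprocess.run`; `gate_then_publish` and the `publish` callback are hypothetical placeholders (for example, whichever step in your pipeline pushes artifacts to Hugging Face), not part of this repository.

```python
import subprocess
from typing import Callable, Sequence


def gate_then_publish(gate_cmd: Sequence[str], publish: Callable[[], None]) -> int:
    """Run the release gate command and call publish() only on exit code 0.

    Any non-zero code (1 = a gate failed, or some other error) blocks publishing.
    """
    result = subprocess.run(list(gate_cmd), check=False)
    if result.returncode == 0:
        publish()
    return result.returncode
```

For example, `gate_then_publish([sys.executable, "scripts/training/release_gate.py", "--results-dir", "./results"], publish)` invokes `publish` only when all seven gates pass. Treating every non-zero code as a block, not only `1`, also stops publishing when the gate script itself crashes or is misconfigured.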