mirror of
https://github.com/ruvnet/RuVector.git
synced 2026-05-23 12:55:26 +00:00
Training Scripts
Scripts for RuvLTRA model training, evaluation, and release gating.
release_gate.py
Automated ship/no-ship checker implementing the 7 release gates from ADR-129 Section 3.2. No external dependencies -- uses Python stdlib only.
Prerequisites
Generate a gate_results.json file by running the evaluation scripts (eval_humaneval.py, eval_routing.py, eval_perplexity.py, turbo_quant_bench, eval_long_context.py, e2e_bench). Place the file in a results directory; its contents must have the following structure:
{
  "model_size": "0.5B",
  "baseline": {
    "humaneval_pass1": 0.40,
    "routing_accuracy": 0.80,
    "wikitext2_ppl": 25.0
  },
  "candidate": {
    "humaneval_pass1": 0.48,
    "routing_accuracy": 0.83,
    "wikitext2_ppl": 24.5,
    "tq_compression": 10.7,
    "tq_ppl_delta": 0.008,
    "long_context_ppl": 18.0,
    "contamination_count": 0,
    "tok_per_sec": 95
  }
}
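Before running the gate, it can help to confirm that a results file carries every field shown in the example above. The following is a minimal sketch of such a check. The helper name `missing_fields` is hypothetical, and the list of expected keys is inferred from the example rather than taken from release_gate.py, so the script itself may require more or fewer fields.

```python
import json

# Field names copied from the example structure above. Whether release_gate.py
# actually requires every one of them is an assumption.
BASELINE_KEYS = {"humaneval_pass1", "routing_accuracy", "wikitext2_ppl"}
CANDIDATE_KEYS = BASELINE_KEYS | {
    "tq_compression",
    "tq_ppl_delta",
    "long_context_ppl",
    "contamination_count",
    "tok_per_sec",
}


def missing_fields(results: dict) -> list[str]:
    """Return dotted names of expected fields that are absent from `results`."""
    missing = []
    if "model_size" not in results:
        missing.append("model_size")
    for section, keys in (("baseline", BASELINE_KEYS), ("candidate", CANDIDATE_KEYS)):
        present = results.get(section, {})
        missing += [f"{section}.{key}" for key in sorted(keys) if key not in present]
    return missing


# Example: a file that has the top-level sections but no metrics yet.
incomplete = json.loads('{"model_size": "0.5B", "baseline": {}, "candidate": {}}')
for name in missing_fields(incomplete):
    print("missing:", name)
```

An empty list means the file matches the example layout; it says nothing about whether the values themselves are plausible.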
Usage
# Basic usage
python scripts/training/release_gate.py --results-dir ./results

# With model path (informational)
python scripts/training/release_gate.py \
  --model-path /models/ruvltra-v2.0-tq \
  --results-dir ./results

# Save JSON report
python scripts/training/release_gate.py \
  --results-dir ./results \
  --output-json ./reports/gate_report.json
Exit codes
| Code | Meaning |
|---|---|
| 0 | All 7 gates PASS -- model is approved to ship |
| 1 | One or more gates FAIL -- do not ship |
Gates
| Gate | Criterion | 0.5B threshold | 3B threshold |
|---|---|---|---|
| G1 | HumanEval pass@1 | >=45% or >=5pp delta | >=55% or >=5pp delta |
| G2 | Routing accuracy | >=80% | >=80% |
| G3 | Wikitext-2 PPL regression | <5% increase | <5% increase |
| G4 | TurboQuant compression | >=8x, PPL delta <1% | >=8x, PPL delta <1% |
| G5 | Long context PPL at 16K | <20 PPL | <20 PPL |
| G6 | Eval contamination | 0 instances | 0 instances |
| G7 | Inference speed | >=80 tok/s | >=40 tok/s |
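The table can be read as seven boolean checks over the contents of gate_results.json. The following sketch illustrates that reading; it is not the code in release_gate.py. The function name `evaluate_gates` is hypothetical, and three interpretations are assumptions: that the ">=5pp delta" is the candidate score minus the baseline score, that the Wikitext-2 regression is relative to the baseline PPL, and that `tq_ppl_delta` is stored as a fraction (so 0.008 means 0.8%).

```python
# Size-specific thresholds copied from the table above.
THRESHOLDS = {
    "0.5B": {"humaneval": 0.45, "tok_per_sec": 80},
    "3B": {"humaneval": 0.55, "tok_per_sec": 40},
}


def evaluate_gates(results: dict) -> dict[str, bool]:
    """Return {gate_id: passed} for a gate_results.json-style dict."""
    base, cand = results["baseline"], results["candidate"]
    limits = THRESHOLDS[results["model_size"]]

    # Assumption: "5pp delta" is candidate minus baseline on a 0-1 scale.
    # Rounding avoids float artifacts such as 0.45 - 0.40 < 0.05.
    humaneval_delta = round(cand["humaneval_pass1"] - base["humaneval_pass1"], 6)
    # Assumption: the PPL regression is measured relative to the baseline PPL.
    ppl_increase = (cand["wikitext2_ppl"] - base["wikitext2_ppl"]) / base["wikitext2_ppl"]

    return {
        "G1": cand["humaneval_pass1"] >= limits["humaneval"] or humaneval_delta >= 0.05,
        "G2": cand["routing_accuracy"] >= 0.80,
        "G3": ppl_increase < 0.05,
        # Assumption: tq_ppl_delta is a fraction, so 0.008 means 0.8%.
        "G4": cand["tq_compression"] >= 8 and cand["tq_ppl_delta"] < 0.01,
        "G5": cand["long_context_ppl"] < 20,
        "G6": cand["contamination_count"] == 0,
        "G7": cand["tok_per_sec"] >= limits["tok_per_sec"],
    }


# The example values from the Prerequisites section.
sample = {
    "model_size": "0.5B",
    "baseline": {"humaneval_pass1": 0.40, "routing_accuracy": 0.80, "wikitext2_ppl": 25.0},
    "candidate": {
        "humaneval_pass1": 0.48, "routing_accuracy": 0.83, "wikitext2_ppl": 24.5,
        "tq_compression": 10.7, "tq_ppl_delta": 0.008, "long_context_ppl": 18.0,
        "contamination_count": 0, "tok_per_sec": 95,
    },
}
failed = [gate for gate, passed in evaluate_gates(sample).items() if not passed]
print(failed or "all gates pass")
```

Under these assumptions the example candidate clears every gate. The same candidate at 95 tok/s would also pass G7 for a 3B model, but a 0.5B candidate at 70 tok/s would fail it.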
CI integration
# In a GitHub Actions workflow or Cloud Build step:
- name: Release gate check
  run: python scripts/training/release_gate.py --results-dir ./results --output-json ./reports/gate_report.json
If any gate fails, the script exits with code 1, which fails the CI step and blocks publishing.