* docs(adr): update ADR-129 — all phases executing, Phase 4 publishing complete
- Phase 1 Calibration: Complete (all 4 models, benchmarks uploaded to HF)
- Phase 2 SFT: Executing on L4 GPU (rank-16, 2 epochs)
- Phase 3 Benchmarks: Executing (release gates + L4 benchmark job)
- Phase 4 Publishing: Complete (TQ configs + benchmarks + README updates on HF)
Benchmark results (L4 GPU):
- ruvltra-small: 75.4 tok/s
- ruvltra-medium: 62.6 tok/s
- ruvltra-claude-code: 67.1 tok/s
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs: add training pipeline and release gates to root README
Add Continuous Training & Optimization section (ADR-129) to the
capabilities table: nightly training, 7-gate release checks,
TurboQuant profiling, training corpus.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(training): include training corpus in Docker build context
The SFT job failed because merged_corpus.jsonl was not in the Docker
image. Copy it to scripts/training/data/training/ so it's included
in the COPY . /app/ step.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(training): handle raw text corpus format in SFT pipeline
The training corpus uses a flat 'text' field (brain memories, ADRs)
rather than chat messages or Alpaca instruction format. Add handler
that converts raw text to completion-style messages for SFT.
Co-Authored-By: claude-flow <ruv@ruv.net>
- Add libgomp1 (required by llama-cpp-python OpenMP)
- Use PyTorch cu124 index for proper CUDA wheel
- Set default CMD with --model-id for Cloud Run execution
- Consolidate pip installs for Docker layer cache efficiency
Co-Authored-By: claude-flow <ruv@ruv.net>
The pip install of llama-cpp-python from source requires ninja + cmake
for CUDA compilation. Use the prebuilt wheel from the cu124 index instead.
Falls back to source install, then transformers-only mode.
Co-Authored-By: claude-flow <ruv@ruv.net>
GPU-enabled Cloud Run jobs have a maximum timeout of 1 hour.
The previous 7200s (2hr) setting was rejected by the API.
Co-Authored-By: claude-flow <ruv@ruv.net>