Research bitnet.cpp Rust port strategy: R3-Engine shows that a 100% safe
Rust, dual-target (native AVX-512 + WASM SIMD128) engine can reach 80-117 tok/s.
Recommend Approach C (reference R3-Engine patterns) over Python codegen.
WASM SIMD128 maps the TL1 LUT to i8x16.swizzle for ~20-40 tok/s in browser.
Resolves open question #5 (WASM viability). Adds 6 new references,
5 new DDD terms, 3 new open questions. DDD updated to v2.4.
https://claude.ai/code/session_011nTcGcn49b8YKJRVoh4TaK
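To illustrate the TL1-to-swizzle mapping mentioned in the commit above, here is a minimal scalar sketch. It assumes that TL1 encodes each pair of ternary weights as a 4-bit index into a 9-entry table of precomputed activation sums, and that activations are pre-scaled so each sum fits in an i8. The function names are hypothetical and are not taken from R3-Engine or bitnet.cpp. On a wasm32 target the emulated lookup would correspond to the `i8x16.swizzle` instruction, which returns 0 for any index of 16 or more.

```rust
/// Scalar model of WASM `i8x16.swizzle`: lane i of the result is
/// table[idx[i]] when idx[i] < 16, and 0 otherwise.
pub fn swizzle_emulated(table: [i8; 16], idx: [u8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for (o, &i) in out.iter_mut().zip(idx.iter()) {
        if (i as usize) < 16 {
            *o = table[i as usize];
        }
    }
    out
}

/// Hypothetical index encoding for a weight pair (w0, w1), each in {-1, 0, 1}.
pub fn pair_index(w0: i8, w1: i8) -> u8 {
    (3 * (w0 + 1) + (w1 + 1)) as u8
}

/// Build the 9-entry table (padded to 16 lanes) for one activation pair.
/// Assumption: |a0| and |a1| are at most 63, so every sum fits in an i8.
pub fn build_pair_lut(a0: i8, a1: i8) -> [i8; 16] {
    let mut lut = [0i8; 16];
    for w0 in -1i16..=1 {
        for w1 in -1i16..=1 {
            let idx = (3 * (w0 + 1) + (w1 + 1)) as usize;
            let sum = w0 * a0 as i16 + w1 * a1 as i16;
            lut[idx] = sum as i8;
        }
    }
    lut
}
```

One table is built per activation pair and shared across output rows, so a single swizzle yields 16 partial dot products, one per row.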
Generated test coverage analysis for the PT-BitNet quantizer module,
documenting coverage across quantization, packing, dequantization,
error metrics, and layer filtering.
https://claude.ai/code/session_011nTcGcn49b8YKJRVoh4TaK
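The stages listed in the coverage commit above (quantization, packing, dequantization, error metrics) can be sketched as follows. This is not the PT-BitNet module's code: it assumes the absmean ternary scheme described for BitNet b1.58 and a hypothetical layout of four 2-bit codes per byte, and all function names are illustrative.

```rust
/// Absmean ternary quantization: scale = mean(|w|), q = clamp(round(w / scale), -1, 1).
pub fn quantize_absmean(w: &[f32]) -> (Vec<i8>, f32) {
    if w.is_empty() {
        return (Vec::new(), 0.0);
    }
    let scale = w.iter().map(|x| x.abs()).sum::<f32>() / w.len() as f32;
    if scale == 0.0 {
        return (vec![0; w.len()], 0.0);
    }
    let q = w
        .iter()
        .map(|x| (x / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, scale)
}

/// Pack four ternary values per byte, storing t + 1 (0, 1, or 2) in 2 bits.
/// Unused slots in the last byte stay 0 and are ignored by `unpack_ternary`.
pub fn pack_ternary(q: &[i8]) -> Vec<u8> {
    q.chunks(4)
        .map(|chunk| {
            chunk
                .iter()
                .enumerate()
                .fold(0u8, |byte, (k, &t)| byte | (((t + 1) as u8) << (2 * k)))
        })
        .collect()
}

/// Recover the first `len` ternary values from the packed bytes.
pub fn unpack_ternary(packed: &[u8], len: usize) -> Vec<i8> {
    (0..len)
        .map(|i| ((packed[i / 4] >> (2 * (i % 4))) & 0b11) as i8 - 1)
        .collect()
}

/// Map ternary values back to floats using the per-tensor scale.
pub fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&t| t as f32 * scale).collect()
}

/// Mean squared error between the original and dequantized weights.
pub fn mse(a: &[f32], b: &[f32]) -> f32 {
    let n = a.len().min(b.len()).max(1) as f32;
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>() / n
}
```

A round trip through `pack_ternary` and `unpack_ternary` is lossless, so any quantization error comes from `quantize_absmean` alone.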
Analyze RLM training stack GPU dependencies and document that Phase 0.5
runs entirely on pure CPU SIMD (NEON on aarch64) without Metal GPU.
MicroLoRA, TrainingPipeline, EwcRegularizer, GrpoOptimizer are all pure
ndarray; ContrastiveTrainer has explicit CPU fallback. Only ~2-3x slower
than Metal. Extends platform support to Linux ARM64 and x86 (scalar).
https://claude.ai/code/session_011nTcGcn49b8YKJRVoh4TaK
Add Phase 0.5: RLM Post-Quantization Refinement — a $0 Mac Studio
approach that uses the existing RLM stack (MicroLoRA, GRPO, EWC++,
ContrastiveTrainer, MemoryDistiller, PolicyStore) to refine the
Phase 0 PTQ model by training only FP16 components (~1-2% of params).
ADR-017 changes:
- Added Phase 0.5 to phased decision: A(0C) → RLM Refinement → D → C → B
- Added AD-19: RLM Post-Quantization Refinement architecture
- Frozen ternary weights + trainable FP16 (LoRA, router, scales)
- ~200-400M trainable params (1-2% of 30B), 100-500M training tokens
- 100% RLM code reuse, 0% new training code
- 2-12 days on Mac Studio Metal, $0 cost
- Expected quality: ~70-80% of FP16 (up from 55-65% Phase 0 PTQ)
- Full pipeline diagram: Router repair → MicroLoRA injection → Scale opt
- Memory budget analysis: ~12-20 GB active RAM (fits any Mac Studio)
- Training schedule: 3-14 days total wall time
- Added Phase 0.5 exit criteria (11 items)
- Updated infrastructure table with Phase 0.5 row
- Updated consequences with RLM refinement benefits
DDD v2.2 changes:
- Added Section 3.8.1: Phase 0.5 RLM Refinement Mode
- Added 5 ubiquitous language terms (including RLM Refinement, Frozen Ternary,
  LoRA Correction, Router Repair)
- Added 3 open questions (LoRA rank, GGUF persistence, Phase continuity)
Key insight: RLM trains ~1% of parameters → needs ~0.05-0.25% of the data
(100-500M vs 200B tokens) → Mac Studio Metal is sufficient → $0 cost.
https://claude.ai/code/session_011nTcGcn49b8YKJRVoh4TaK
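The "frozen ternary weights + trainable FP16" split described in the Phase 0.5 commit above can be pictured with a single linear layer. The sketch below is a hypothetical illustration, not RLM or MicroLoRA code: it assumes a per-tensor scale and a standard low-rank correction of the form y = scale * (T x) + B (A x), and it uses f32 in place of FP16 for simplicity.

```rust
/// Hypothetical layer: frozen ternary base weights plus a trainable low-rank correction.
pub struct TernaryLinearWithLora {
    pub rows: usize,
    pub cols: usize,
    pub rank: usize,
    pub ternary: Vec<i8>, // frozen, rows * cols, values in {-1, 0, 1}
    pub scale: f32,       // trainable per-tensor scale
    pub lora_a: Vec<f32>, // trainable, rank * cols
    pub lora_b: Vec<f32>, // trainable, rows * rank
}

impl TernaryLinearWithLora {
    /// Computes y = scale * (T x) + B (A x).
    pub fn forward(&self, x: &[f32]) -> Vec<f32> {
        assert_eq!(x.len(), self.cols);
        // Low-rank hidden vector h = A x.
        let h: Vec<f32> = (0..self.rank)
            .map(|r| {
                (0..self.cols)
                    .map(|c| self.lora_a[r * self.cols + c] * x[c])
                    .sum::<f32>()
            })
            .collect();
        (0..self.rows)
            .map(|i| {
                // Ternary weights only add, subtract, or skip an activation.
                let base: f32 = (0..self.cols)
                    .map(|c| match self.ternary[i * self.cols + c] {
                        1 => x[c],
                        -1 => -x[c],
                        _ => 0.0,
                    })
                    .sum();
                let delta: f32 = (0..self.rank)
                    .map(|r| self.lora_b[i * self.rank + r] * h[r])
                    .sum();
                self.scale * base + delta
            })
            .collect()
    }

    /// Share of parameters that would receive gradients (A, B, and the scale).
    pub fn trainable_fraction(&self) -> f64 {
        let trainable = self.lora_a.len() + self.lora_b.len() + 1;
        trainable as f64 / (trainable + self.ternary.len()) as f64
    }
}
```

For realistic layer sizes the rank is far smaller than either dimension, so the trainable fraction is small, which is the property the commit relies on.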
Update AD-17 and AD-18 to reflect that Phase 0 post-training quantization
runs entirely on Mac Studio (Apple Silicon) at zero cost, eliminating the
need for cloud GPU for the prototype phase.
Key changes:
- Phase 0 cost updated from ~$100 (cloud) to $0 (local Mac Studio)
- AD-18 now includes Mac Studio config compatibility matrix (M4 Max 36-128GB,
  M3 Ultra 96-512GB) with wall time estimates per config
- Added mmap strategy: FP16 weights demand-paged from disk, per-tensor
  quantization uses ~2-4MB working memory regardless of model size
- Metal GPU calibration via existing Candle integration (use_metal: true)
- ARM NEON for TL1 kernel validation (same ISA as production target)
- Updated throughput table with Mac Studio entries and Phase 0 column
- PtBitnetConfig gains use_mmap, use_metal_calibration, max_memory_gb fields
- Phase 0 exit criteria updated for Mac Studio local execution
- Updated infrastructure table: Phase 0 + router validation both $0 local
Mac Studio is ideal for Phase 0 (PTQ in hours, $0) but still infeasible
for Phase 1+ training (200B tokens at 500-1000 tok/s ≈ 6.3-12.7 years).
This separation validates the phased cloud-for-training approach.
https://claude.ai/code/session_011nTcGcn49b8YKJRVoh4TaK
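The training-time estimate in the commit above is plain arithmetic: tokens divided by throughput, converted to years. The helper below is a hypothetical sketch that assumes a 365.25-day year and constant throughput.

```rust
/// Seconds in a 365.25-day year (assumption used for the conversion).
pub const SECONDS_PER_YEAR: f64 = 365.25 * 86_400.0;

/// Wall-clock years needed to process `tokens` at a constant `tokens_per_second`.
pub fn years_to_train(tokens: f64, tokens_per_second: f64) -> f64 {
    tokens / tokens_per_second / SECONDS_PER_YEAR
}
```

With 200B tokens this gives about 6.3 years at 1000 tok/s and about 12.7 years at 500 tok/s. The same formula at roughly 100 tok/s yields about 63 years, which is in line with the ~65-year AVX2 figure quoted elsewhere in this log, although that throughput is an inference and is not stated in the source.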
ADR-017: Add AD-17 with detailed memory budget analysis showing per-expert
distillation fits in A100 40GB (~15.5GB), full model requires 4×A100 80GB
(~430GB). CPU SIMD training infeasible at 200B+ tokens (~65 years on AVX2).
Recommend GCP 4×A100 spot instances (~$1,300 for Phase 1) or DataCrunch
H100 ($1.99/hr). Includes cost comparison across 6 platforms, per-phase
infrastructure mapping, and required CUDA device dispatch code change for
RealContrastiveTrainer.
DDD: Add section 8.5 Training Infrastructure Model with expert-parallel
GPU topology diagram, what-runs-where matrix, and required code change
summary.
https://claude.ai/code/session_011nTcGcn49b8YKJRVoh4TaK
Research and architecture documentation for integrating BitNet b1.58
ternary quantization with GLM-4.7-Flash 30B-A3B MoE architecture into
the RuvLLM serving runtime. Includes phased approach (expert replacement
→ full distillation → native training), CPU inference kernel strategy
(TL1/TL2/I2_S), domain model with 7 bounded contexts, and memory budget
analysis targeting <10GB for 30B-class CPU-only inference.
https://claude.ai/code/session_011nTcGcn49b8YKJRVoh4TaK
Resolves the "already exists and is not an empty directory" error by:
- Adding a cleanup step to remove the directory before git clone
- Setting up Node.js for ruvector dependencies
- Installing and verifying ruvector MCP installation
Run rustfmt on all Rust files to fix CI formatting checks.
This addresses pre-existing formatting inconsistencies across:
- cognitum-gate-kernel
- cognitum-gate-tilezero
- prime-radiant
- ruvector-* crates
- examples/benchmarks
- and other crates
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Move datum and false arguments to same line in from_polymorphic_datum
- Join split let text_len = ... assignment to single line
These changes fix CI rustfmt check failures.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit fixes a critical P0 bug where HNSW indexes on ruvector
columns would crash PostgreSQL with a segmentation fault when using
parameterized queries (prepared statements, ORMs, application drivers).
Root Cause:
- Query vector extraction failed for parameterized queries
- Code fell back to zero vector without validation
- Zero vector caused segfault during HNSW search
Changes:
- Add multi-method query vector extraction pipeline
  1. Direct RuVector::from_polymorphic_datum()
  2. Text parameter conversion for parameterized queries
  3. Validated varlena fallback with dimension checking
- Add query_valid flag to track extraction success
- Add validation before search execution:
  - Reject empty/invalid query vectors with clear errors
  - Reject all-zero vectors (invalid for similarity search)
  - Validate dimension match between query and index
- Apply same fixes to IVFFlat for consistency
Testing:
- Added regression tests for parameterized queries
- Added tests for zero vector error handling
- Added tests for dimension mismatch errors
- Added 384-dimension production-scale tests
Fixes: #141
See: docs/adr/ADR-0027-hnsw-parameterized-query-fix.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
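The pre-search validation described in the HNSW fix above can be sketched as a small standalone check. This is an illustration only: the `QueryError` type and `validate_query` function are hypothetical names, and the actual extension reports errors through PostgreSQL rather than through a Rust `Result`.

```rust
/// Hypothetical error type for rejected query vectors.
#[derive(Debug, PartialEq)]
pub enum QueryError {
    Empty,
    AllZero,
    DimensionMismatch { expected: usize, got: usize },
}

/// Validate a query vector before running an index search, instead of
/// silently falling back to a zero vector.
pub fn validate_query(query: &[f32], index_dim: usize) -> Result<(), QueryError> {
    if query.is_empty() {
        return Err(QueryError::Empty);
    }
    if query.len() != index_dim {
        return Err(QueryError::DimensionMismatch {
            expected: index_dim,
            got: query.len(),
        });
    }
    // An all-zero vector has no direction, so similarity against it is undefined.
    if query.iter().all(|&x| x == 0.0) {
        return Err(QueryError::AllZero);
    }
    Ok(())
}
```

Returning a typed error at this point lets the caller surface a clear message rather than proceeding into the search with invalid input.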
- Add ChatEnhancer for enhanced chat processing with skills, memory,
and proactive assistance integration
- Add SkillExecutor for skill lifecycle management and execution
- Add builtin skills: CodeSkill, MemorySkill, SummarizeSkill, WebSearchSkill
- Improve server.ts with better error handling and session management
- Update AIDefenceGuard with enhanced security checks
- Update chat UI with improved styling and interactions
- Bump version to 0.1.1 with delta crates integration
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the full delta-behavior framework - systems where change is
permitted but collapse is not.
## Core Implementation
- Coherence type with [0,1] bounds and safe constructors
- Three-layer enforcement: energy cost, scheduling, memory gating
- DeltaSystem trait for coherence-preserving systems
- DeltaConfig with strict/relaxed/default presets
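A bounded coherence type with safe constructors might look like the sketch below. The method names (`new`, `clamped`, `value`) are assumptions for illustration and may differ from the crate's actual API.

```rust
/// A coherence value guaranteed to lie in [0, 1] and never be NaN.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct Coherence(f64);

impl Coherence {
    /// Strict constructor: rejects NaN and anything outside [0, 1].
    pub fn new(value: f64) -> Option<Self> {
        if value.is_nan() || !(0.0..=1.0).contains(&value) {
            None
        } else {
            Some(Coherence(value))
        }
    }

    /// Lenient constructor: clamps into [0, 1] but still rejects NaN.
    pub fn clamped(value: f64) -> Option<Self> {
        if value.is_nan() {
            None
        } else {
            Some(Coherence(value.clamp(0.0, 1.0)))
        }
    }

    /// Read the inner value.
    pub fn value(self) -> f64 {
        self.0
    }
}
```

Keeping the field private means every `Coherence` in the system has passed one of the two checks, so downstream code does not need to re-validate.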
## 11 Exotic Applications
1. Self-Limiting Reasoning - AI that does less when uncertain
2. Computational Event Horizon - bounded computation without hard limits
3. Artificial Homeostasis - synthetic life with coherence-based survival
4. Self-Stabilizing World Model - models that refuse to hallucinate
5. Coherence-Bounded Creativity - novelty without chaos
6. Anti-Cascade Financial System - markets that cannot collapse
7. Graceful Aging - systems that simplify over time
8. Swarm Intelligence - collective behavior without pathology
9. Graceful Shutdown - systems that seek safe termination
10. Pre-AGI Containment - bounded intelligence growth
11. Extropic Substrate - goal mutation, agent lifecycles, spike semantics
## Performance Optimizations
- O(n²) → O(n·k) swarm neighbor detection via SpatialGrid
- O(n) → O(1) coherence calculation with incremental cache
- VecDeque for O(1) history removal
- SIMD utilities with 8x loop unrolling
- Bounded history to prevent memory leaks
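Two of the optimizations above, the bounded `VecDeque` history and the O(1) incremental calculation, combine naturally. The sketch below assumes the cached quantity is a running mean over a fixed-size window; the framework's actual coherence formula is not given in the source, and `BoundedHistory` is a hypothetical name.

```rust
use std::collections::VecDeque;

/// Fixed-capacity history that maintains a running sum, so the mean is O(1).
pub struct BoundedHistory {
    capacity: usize,
    values: VecDeque<f64>,
    sum: f64,
}

impl BoundedHistory {
    pub fn new(capacity: usize) -> Self {
        BoundedHistory {
            capacity,
            values: VecDeque::with_capacity(capacity),
            sum: 0.0,
        }
    }

    /// Push a value, evicting the oldest one in O(1) once the window is full.
    pub fn push(&mut self, value: f64) {
        if self.capacity == 0 {
            return;
        }
        if self.values.len() == self.capacity {
            if let Some(oldest) = self.values.pop_front() {
                self.sum -= oldest;
            }
        }
        self.values.push_back(value);
        self.sum += value;
    }

    /// Mean of the current window, or None when it is empty.
    pub fn mean(&self) -> Option<f64> {
        if self.values.is_empty() {
            None
        } else {
            Some(self.sum / self.values.len() as f64)
        }
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }
}
```

A running sum can accumulate floating-point drift over very long runs, so a production version might periodically recompute it from the stored window.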
## Security Fixes
- Replaced unsafe static mut with AtomicU64 for thread-safe RNG
- NaN validation on all coherence inputs
- Overflow protection in calculations
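Replacing `static mut` RNG state with an `AtomicU64` can be done without any `unsafe` code. The sketch below uses SplitMix64 as the generator, which is an assumption: the source does not say which algorithm the crate uses. `RNG_STATE` and `next_u64` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Shared RNG state; the atomic replaces an `unsafe static mut` counter.
static RNG_STATE: AtomicU64 = AtomicU64::new(0x9E37_79B9_7F4A_7C15);

/// SplitMix64 step. `fetch_add` wraps on overflow and hands each caller a
/// distinct state, so concurrent threads never observe a torn update.
pub fn next_u64() -> u64 {
    const GAMMA: u64 = 0x9E37_79B9_7F4A_7C15;
    let mut z = RNG_STATE
        .fetch_add(GAMMA, Ordering::Relaxed)
        .wrapping_add(GAMMA);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}
```

`Ordering::Relaxed` is enough here because only the atomicity of the increment matters, not ordering relative to other memory. This generator is not cryptographically secure.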
## WASM + TypeScript SDK
- Full wasm-bindgen exports for all 11 applications
- High-level TypeScript SDK with ergonomic APIs
- Browser and Node.js examples
## Test Coverage
- 32 lib tests, 14 WASM tests, 13 doc tests (59 total)
Resolves #140
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add GoogleAIProvider for direct Gemini API access
- Support Gemini 2.5 Flash, Pro, and Lite models
- Add Gemini 3.x preview models
- Auto-detect and use Google AI when GOOGLE_AI_API_KEY is set
- Update chat UI with debugging and LLM status checks
- Fix model routing for Google Cloud deployments
- Bump version to 0.1.8
Tested models:
- gemini-2.5-flash (stable, recommended)
- gemini-2.5-pro (stable)
- gemini-2.5-flash-lite (stable)
Sources:
- https://ai.google.dev/gemini-api/docs/models
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add embedded chat UI with dark mode default (ADR-015)
- Add channel setup commands for Slack, Discord, Telegram
- Add webhook configuration CLI commands
- Add cloud deployment CLI with Cloud Run, Docker, Kubernetes support
- Add gcloud CLI integration and curl-based installer
- Add template library system for quick starts
- Update Dockerfile to include static files
- Bump version to 0.1.6
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix pino error serialization (use 'err' key instead of 'error')
- Add detailed logging for JSON parse errors with rawBody preview
- Add nested try-catch for AIDefence output validation
- Block critical threats with SECURITY_BLOCKED error code
- Add /api/models endpoint with 12+ supported LLM models
- Add Gemini 2.5 Pro support via OpenRouter
- Update README with stronger security messaging vs Clawdbot
- Update FEATURE_COMPARISON.md with security gap analysis
- Bump version to 0.1.1
Tested and verified on Cloud Run deployment:
- All endpoints working: health, ready, status, models, agents, sessions, chat
- Chat returns proper fallback when LLM not configured
- Error logs now properly serialize error objects
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add server.ts with REST API endpoints for RuvBot
- Implement health/ready checks for Cloud Run
- Add agent and session management API
- Integrate AIDefence security layer in production
- Fix Dockerfile CMD path to dist/server.js
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Create missing learning/memory/MemoryManager.ts with Embedder and VectorIndex interfaces
- Fix core/index.ts to re-export memory types from learning module instead of non-existent core/memory
- Fix HybridSearch to await async vectorIndex.add() call and handle empty queries
- Fix MockSlackWebClient name collisions (users, files, reactions private Maps shadowed by API objects)
- Fix MockRouter path matching to properly split method:path keys with param colons
- Fix SkillRegistry updateMetrics calculation for success rate
- Fix test mocks to match async interface signatures (VectorIndex, Embedder)
- Update skill test latency calculation to use performance.now() for sub-ms precision
Test results: 561 passing (previously 287), 10 remaining edge case failures
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>