ruvector/examples/benchmarks
Claude 05bfff45da feat(compiler): bounded trial, confidence gating, 2-failure quarantine
Three-fix iteration based on ablation diagnostics:

1. Bounded trial: Strategy Zero now caps its trial budget at
   min(avg_steps*2, external_limit/4), with a floor of 10 steps. This
   makes false hits cheap (at most 100 steps of overhead instead of the
   full compiled budget).

2. Confidence gating: Strategy Zero only attempts a compiled config when
   its confidence is >= 0.7 (Laplace-smoothed success rate). Observations
   compiled during training seed the initial confidence, so configs
   start out trusted.

3. 2-failure quarantine: any compiled signature with 2+ false hits is
   disabled (expected_correct=false). Prevents persistent bad patterns.
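
The three fixes above can be sketched roughly as follows. This is a
minimal illustration, not the code in this commit: only avg_steps,
observations, expected_correct, confidence() and trial_budget() are
named in the message. The successes and false_hits fields, the
should_attempt() and record_false_hit() helpers, the constants, the
field types, and the order in which the floor is applied are all
assumptions.

```rust
/// Hypothetical stand-in for the compiled config described above.
#[derive(Debug, Clone)]
struct CompiledSolveConfig {
    avg_steps: f64,         // average steps seen for this signature (type assumed)
    observations: u32,      // number of recorded outcomes
    successes: u32,         // assumed field: outcomes that were correct
    false_hits: u32,        // assumed field: compiled answers that were wrong
    expected_correct: bool, // false once the signature is quarantined
}

const MIN_TRIAL_STEPS: usize = 10;       // floor from fix 1
const CONFIDENCE_THRESHOLD: f64 = 0.7;   // gate from fix 2
const QUARANTINE_FALSE_HITS: u32 = 2;    // limit from fix 3

impl CompiledSolveConfig {
    /// Laplace-smoothed success rate: (successes + 1) / (observations + 2).
    fn confidence(&self) -> f64 {
        (self.successes as f64 + 1.0) / (self.observations as f64 + 2.0)
    }

    /// min(avg_steps * 2, external_limit / 4), never below the floor.
    /// Applying the floor last is an assumption.
    fn trial_budget(&self, external_limit: usize) -> usize {
        let from_avg = (self.avg_steps * 2.0).ceil() as usize;
        from_avg.min(external_limit / 4).max(MIN_TRIAL_STEPS)
    }

    /// Gate: skip quarantined or low-confidence signatures.
    fn should_attempt(&self) -> bool {
        self.expected_correct && self.confidence() >= CONFIDENCE_THRESHOLD
    }

    /// Record a false hit and quarantine after the second one.
    fn record_false_hit(&mut self) {
        self.false_hits += 1;
        self.observations += 1;
        if self.false_hits >= QUARANTINE_FALSE_HITS {
            self.expected_correct = false;
        }
    }
}
```

With avg_steps = 12 and external_limit = 400, this sketch yields a trial
budget of 24 steps. A config seeded with 8 successes out of 8
observations starts at confidence 0.9 and is disabled after its second
false hit.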

Additional changes:
- Versioned signature prefix (v1:difficulty:constraints) for cache
  safety across refactors
- CompiledSolveConfig gains avg_steps, observations, confidence(),
  trial_budget() methods
- KnowledgeCompiler gains steps_saved tracking, confidence_threshold,
  print_diagnostics() for per-signature analysis
- record_success now tracks actual steps for delta-cost calculation
- Verbose mode prints full compiler diagnostics after each ablation
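
As an illustration of the versioned signature prefix listed above, a
key in the v1:difficulty:constraints shape could be built and checked
like this. The function names, the u8 difficulty, the comma-joined and
sorted constraint encoding, and the version check are assumptions made
for the sketch, not the repository's actual API.

```rust
/// Version tag placed at the front of every signature (value from the message).
const SIGNATURE_VERSION: &str = "v1";

/// Hypothetical helper: build a "v1:difficulty:constraints" cache key.
/// Constraints are sorted so the key does not depend on input order
/// (an assumption, not something the commit states).
fn signature(difficulty: u8, constraints: &[&str]) -> String {
    let mut sorted: Vec<&str> = constraints.to_vec();
    sorted.sort_unstable();
    format!("{}:{}:{}", SIGNATURE_VERSION, difficulty, sorted.join(","))
}

/// Hypothetical helper: true only for keys written with the current
/// version, so entries cached before a refactor can be ignored.
fn is_current(sig: &str) -> bool {
    sig.split(':').next() == Some(SIGNATURE_VERSION)
}
```

Bumping SIGNATURE_VERSION after a refactor would then make every older
cached key fail is_current(), which is one way to get the cache safety
the bullet describes.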

Results: the false hit rate dropped from 8.2% to 4.4% (PASS). Cost is
still net-positive because constraint-determined search ranges span only
1-10 dates, so there is structurally no room for compiler optimization.
Next: PolicyKernel constraint ordering for a real cost surface.

81 tests passing.

https://claude.ai/code/session_01RnwD4x5cbpB7FPvoyYQz8G
2026-02-15 22:01:46 +00:00
src feat(compiler): bounded trial, confidence gating, 2-failure quarantine 2026-02-15 22:01:46 +00:00
tests style: apply rustfmt across entire codebase 2026-01-28 17:00:26 +00:00
Cargo.toml feat(agi-contract): multi-dimensional IQ with cost, robustness, and AGI contract 2026-02-15 20:43:31 +00:00