ruvector/examples/benchmarks
Claude 8bc077aeff feat(ablation): PolicyKernel, DifficultyVector, fair mode comparison
All modes now share the same solver capabilities. What differs is
the policy mechanism that decides *when* to use them:

- Mode A: fixed heuristic (posterior_range + distractor_count)
- Mode B: compiler-suggested skip_mode from constraint signatures
- Mode C: learned PolicyKernel (contextual bandit over skip modes)

Key changes:

PolicyKernel (temporal.rs):
- SkipMode enum: None | Weekday | Hybrid
- fixed_policy(): if DayOfWeek AND range>30 AND no distractors → Weekday
- compiled_policy(): uses CompiledSolveConfig.compiled_skip_mode
- learned_policy(): epsilon-greedy over per-context SkipModeStats
- EarlyCommitPenalty: tracks solved-but-wrong from aggressive skipping
- Hybrid mode: weekday skip + ±7 day refinement pass for safety
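
The fixed and learned policies above can be sketched roughly as follows. This is a minimal illustration, not the code in temporal.rs: `PolicyContext`, its `has_day_of_week` field, the `bucket` key, the `ALL_MODES` table, the reward scale, and the `draw` closure (a caller-supplied uniform sampler in [0, 1)) are all assumptions made for the example. Only the `SkipMode` variants, the `fixed_policy` rule, and the epsilon-greedy idea come from the commit message.

```rust
use std::collections::HashMap;

/// Skip modes as named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SkipMode {
    None,
    Weekday,
    Hybrid,
}

const ALL_MODES: [SkipMode; 3] = [SkipMode::None, SkipMode::Weekday, SkipMode::Hybrid];

/// Hypothetical per-puzzle features the policy looks at.
pub struct PolicyContext {
    pub has_day_of_week: bool,
    pub posterior_range: u32,
    pub distractor_count: u32,
}

/// Mode A: DayOfWeek AND range > 30 AND no distractors -> Weekday, else no skipping.
pub fn fixed_policy(ctx: &PolicyContext) -> SkipMode {
    if ctx.has_day_of_week && ctx.posterior_range > 30 && ctx.distractor_count == 0 {
        SkipMode::Weekday
    } else {
        SkipMode::None
    }
}

/// Running reward statistics for one (context bucket, skip mode) pair.
#[derive(Debug, Default, Clone, Copy)]
pub struct SkipModeStats {
    pub attempts: u32,
    pub total_reward: f64,
}

impl SkipModeStats {
    fn mean_reward(&self) -> f64 {
        if self.attempts == 0 {
            0.0
        } else {
            self.total_reward / f64::from(self.attempts)
        }
    }
}

/// Mode C sketch: epsilon-greedy bandit keyed by an assumed context bucket.
#[derive(Default)]
pub struct LearnedPolicy {
    stats: HashMap<(u8, SkipMode), SkipModeStats>,
}

impl LearnedPolicy {
    /// Record the observed reward for a mode in a given context bucket.
    pub fn record(&mut self, bucket: u8, mode: SkipMode, reward: f64) {
        let s = self.stats.entry((bucket, mode)).or_default();
        s.attempts += 1;
        s.total_reward += reward;
    }

    /// With probability `epsilon` pick a uniformly random mode (explore);
    /// otherwise pick the mode with the highest mean reward (exploit).
    /// Untried modes count as mean reward 0.0.
    pub fn choose(&self, bucket: u8, epsilon: f64, mut draw: impl FnMut() -> f64) -> SkipMode {
        if draw() < epsilon {
            let idx = ((draw() * ALL_MODES.len() as f64) as usize).min(ALL_MODES.len() - 1);
            return ALL_MODES[idx];
        }
        let mut best = ALL_MODES[0];
        let mut best_mean = f64::NEG_INFINITY;
        for &mode in ALL_MODES.iter() {
            let mean = self
                .stats
                .get(&(bucket, mode))
                .map_or(0.0, |s| s.mean_reward());
            if mean > best_mean {
                best = mode;
                best_mean = mean;
            }
        }
        best
    }
}
```

Passing the random source in as a closure keeps the sketch free of external crates and makes the explore/exploit branch deterministic in tests. A real implementation would likely fold an early-commit penalty into the reward before calling `record`.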

DifficultyVector (timepuzzles.rs):
- Replaces single-axis difficulty with (range_size, posterior_target,
  distractor_rate, noise_rate, ambiguity_count)
- Inverted relationship: higher difficulty now means a wider range and more
  ambiguity (not a tighter posterior)
- Distractor DayOfWeek (difficulty 6+): a DayOfWeek constraint is present but
  paired with a wider Between constraint, which makes unconditional skipping risky
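
A rough sketch of what a multi-axis difficulty could look like. The five field names come from the list above. The field types, the 1 to 10 level scale, the `from_level` constructor, and every numeric constant are hypothetical and chosen only to show the intended direction: range and ambiguity grow with difficulty, and distractors start at level 6.

```rust
/// Multi-axis difficulty; field names follow the commit message, types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct DifficultyVector {
    pub range_size: u32,
    pub posterior_target: u32,
    pub distractor_rate: f64,
    pub noise_rate: f64,
    pub ambiguity_count: u32,
}

impl DifficultyVector {
    /// Hypothetical mapping from a scalar level (clamped to 1..=10) to a vector.
    /// All constants are illustrative only.
    pub fn from_level(level: u8) -> Self {
        let l = u32::from(level.clamp(1, 10));
        Self {
            // Wider search range as difficulty rises.
            range_size: 14 * l,
            // The posterior target does not tighten with difficulty.
            posterior_target: 2 + l,
            // Distractor DayOfWeek constraints only appear from level 6 upward.
            distractor_rate: if l >= 6 { 0.1 * f64::from(l - 5) } else { 0.0 },
            noise_rate: 0.02 * f64::from(l),
            // More ambiguous candidates as difficulty rises.
            ambiguity_count: l / 2,
        }
    }
}
```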

Ablation fairness (acceptance_test.rs):
- Removed feature gating: skip_weekday no longer forbidden for Mode A
- All modes access same solver knobs, differ only by policy
- AblationResult tracks PolicyKernel metrics (early_commit_rate, etc.)
- Comparison print shows policy differences explicitly
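
One plausible reading of the early_commit_rate metric, sketched under assumptions: the rate is taken here as the fraction of committed answers that turned out wrong. The field names, the `record` method, and this exact definition are hypothetical and are not confirmed by the commit message.

```rust
/// Hypothetical counter for answers committed after skipping.
#[derive(Debug, Default)]
pub struct EarlyCommitPenalty {
    pub committed: u32,
    pub committed_wrong: u32,
}

impl EarlyCommitPenalty {
    /// Record one committed answer and whether it was correct.
    pub fn record(&mut self, correct: bool) {
        self.committed += 1;
        if !correct {
            self.committed_wrong += 1;
        }
    }

    /// Assumed definition: share of committed answers that were wrong.
    /// Returns 0.0 when nothing has been committed yet.
    pub fn early_commit_rate(&self) -> f64 {
        if self.committed == 0 {
            0.0
        } else {
            f64::from(self.committed_wrong) / f64::from(self.committed)
        }
    }
}
```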

81 tests passing (61 lib + 20 integration).

https://claude.ai/code/session_01RnwD4x5cbpB7FPvoyYQz8G
2026-02-15 22:54:28 +00:00
src feat(ablation): PolicyKernel, DifficultyVector, fair mode comparison 2026-02-15 22:54:28 +00:00
tests feat(generator): posterior-targeting puzzle generation, weekday skipping PolicyKernel 2026-02-15 22:31:12 +00:00
Cargo.toml feat(agi-contract): multi-dimensional IQ with cost, robustness, and AGI contract 2026-02-15 20:43:31 +00:00