ruvector/examples/benchmarks
Claude f9742e6b0e refine(ablation): risk_score policy, normalized penalty, witness log
PolicyKernel refinements:
- Fixed policy (Mode A): risk_score = R + k*D, k=30, T=140
  Fixed constants (not learned) — Mode A is the control arm.
  One distractor raises perceived risk by ~30 range-days.
  Weekday only when range is large AND distractor-free.
- Normalized EarlyCommitPenalty: (remaining/initial) * scale
  Committing at 5% scan = cheap (0.05), at 90% = expensive (0.90).
  Only charged on wrong commits.
- Hybrid minimum evidence: stop_after_first disabled in Hybrid mode
  so solver checks all matching weekdays before committing.

Witness log:
- SolutionAttempt now carries skip_mode and context_bucket strings
- record_attempt_witnessed() for full policy audit trail
- Every trajectory records which skip mode was chosen and why

Observability:
- Puzzle tags now include distractor_count and has_dow (deterministic)
- count_distractors() made public for generator to tag puzzles

Ablation assertions (two new):
- a_skip_nonzero: Mode A uses skip at least sometimes (proves not hobbled)
- c_multi_mode: Mode C uses different skip modes across contexts (proves learning)
- Skip-mode distribution table printed per context bucket for Mode C

posterior_target monotonicity verified: 2→4→8→12→18→25→35→50→70→100
(never shrinks with difficulty)

81 tests passing (61 lib + 20 integration).

https://claude.ai/code/session_01RnwD4x5cbpB7FPvoyYQz8G
2026-02-15 23:08:02 +00:00
..
src refine(ablation): risk_score policy, normalized penalty, witness log 2026-02-15 23:08:02 +00:00
tests feat(generator): posterior-targeting puzzle generation, weekday skipping PolicyKernel 2026-02-15 22:31:12 +00:00
Cargo.toml feat(agi-contract): multi-dimensional IQ with cost, robustness, and AGI contract 2026-02-15 20:43:31 +00:00