Dhravya
|
cba994be3f
|
feat: add eval corpus data generator
7-phase pipeline for generating synthetic multi-file corpora:
1. Scenario Brief (SCENARIO.md) - world-building
2. Fact Registry (facts.json) - consistency source of truth
3. File Manifest (manifest.json) - per-file briefs
4. Clustering - topological sort + fact registry sharding
5. Parallel File Generation - concurrent workers
6. Validation - cross-reference & consistency audit
7. Question Generation - 10 eval questions per corpus
Supports all 13 data points (dp_001 through dp_013), with corpus sizes from 5 to 10,000 files.
Uses Gemini 2.5 Pro by default. Includes resume support, validation,
and 219 unit tests.
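
Phase 4 (clustering) could look roughly like the sketch below. It is an illustration only, not the code in this commit: the manifest fields `depends_on` and `fact_ids`, the function names `cluster_by_level` and `shard_facts`, and the sample file names are all assumptions. It groups files into dependency levels with Kahn's algorithm so that each level can go to concurrent workers, then cuts the fact registry down to the facts a given cluster needs.

```python
from collections import defaultdict


def cluster_by_level(manifest):
    """Group files into levels via Kahn's algorithm.

    Files within one level do not depend on each other, so a level
    can be generated in parallel once all earlier levels are done.
    """
    indegree = {name: len(entry["depends_on"]) for name, entry in manifest.items()}
    dependents = defaultdict(list)
    for name, entry in manifest.items():
        for dep in entry["depends_on"]:
            dependents[dep].append(name)

    level = sorted(name for name, deg in indegree.items() if deg == 0)
    levels = []
    while level:
        levels.append(level)
        next_level = []
        for name in level:
            for child in dependents[name]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_level.append(child)
        level = sorted(next_level)

    # Anything left unplaced sits on a cycle or names a missing file.
    if sum(len(group) for group in levels) != len(manifest):
        raise ValueError("dependency cycle or unknown dependency in manifest")
    return levels


def shard_facts(facts, manifest, cluster):
    """Return only the facts referenced by the files in one cluster."""
    needed = {fid for name in cluster for fid in manifest[name]["fact_ids"]}
    return {fid: facts[fid] for fid in sorted(needed)}


# Hypothetical manifest and fact registry, for illustration only.
manifest = {
    "README.md": {"depends_on": [], "fact_ids": ["f1"]},
    "api.md": {"depends_on": ["README.md"], "fact_ids": ["f1", "f2"]},
    "changelog.md": {"depends_on": ["README.md"], "fact_ids": ["f3"]},
    "faq.md": {"depends_on": ["api.md", "changelog.md"], "fact_ids": ["f2"]},
}
facts = {"f1": "Product name", "f2": "API version", "f3": "Release date"}

levels = cluster_by_level(manifest)
shard = shard_facts(facts, manifest, levels[1])
```

Sharding keeps each worker's prompt small: a worker generating `api.md` and `changelog.md` sees only the facts those files reference, not the whole registry.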
|
2026-04-28 23:24:42 +00:00 |
|