blt/fixtures
Pedro Rodriguez 0ffe2ab685 Update iterator inheritance, pass file format args, limit iterator
- Create a common base class that all iterator state classes inherit from
- Add a limit iterator that we can use in evals
- Modify ArrowFileIterator to skip arrow path inference when file_format='json'
- Make EvalArgs valid
- Move testing iterators to a common directory to allow usage in multiple test files
- Allow SequenceIterator to take rng_state=None, which disables all rng ops (mainly for eval)
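
The limit iterator and the optional rng state described above can be sketched roughly as follows. This is a minimal illustration, not the bytelatent implementation: `limit_iterator` and `maybe_shuffle` are hypothetical names, and treating a negative limit as "no cap" is an assumption.

```python
import itertools
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def limit_iterator(source: Iterable[T], limit: int) -> Iterator[T]:
    """Yield at most `limit` items from `source`.

    Assumption: a negative limit means "no cap", so the source is
    passed through unchanged.
    """
    if limit < 0:
        yield from source
        return
    # islice stops after `limit` items without pulling an extra one.
    yield from itertools.islice(source, limit)


def maybe_shuffle(items: List[T], rng: Optional[random.Random]) -> List[T]:
    """Return a shuffled copy when `rng` is given.

    With rng=None no random operation runs at all, so the order is
    deterministic (the behavior one would want during eval).
    """
    out = list(items)
    if rng is not None:
        rng.shuffle(out)
    return out


if __name__ == "__main__":
    print(list(limit_iterator(range(10), 3)))
    print(maybe_shuffle([1, 2, 3], None))
```

Wrapping an existing iterator rather than changing it keeps the cap a concern of the eval code only, and passing `None` instead of a seeded generator makes the "no randomness" case explicit at the call site.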

Test Plan:

- `pytest bytelatent`
- `python -m bytelatent.train config=../internal-blt/configs/entropy_model.yaml logging.wandb=null eval=null`
2025-02-20 00:57:17 +00:00
test-cfgs Make it possible to specify multiple config files (#54) 2025-02-18 10:42:44 -08:00
test_docs.jsonl Update iterator inheritance, pass file format args, limit iterator 2025-02-20 00:57:17 +00:00
tokenizer_data.json Initial commit 2024-12-12 15:32:30 -08:00
tokenizer_data_bpe_delim.json Initial commit 2024-12-12 15:32:30 -08:00
tokens_with_entropies.json Initial commit 2024-12-12 15:32:30 -08:00