blt/bytelatent/data
Pedro Rodriguez 2655e4cf82 Remove byte tokenizer and add config args to switch between byte/patch packing
Summary:

Test Plan:

```
python -m bytelatent.train config=../internal-blt/configs/entropy_model.yaml logging.wandb=null checkpoint.dump.every=1000 checkpoint.eval.every=100000 eval=null

pytest bytelatent/
```
2025-02-22 01:13:13 +00:00
..
iterators Remove byte tokenizer and add config args to switch between byte/patch packing 2025-02-22 01:13:13 +00:00
__init__.py Initial commit 2024-12-12 15:32:30 -08:00
data_types.py This includes fixes that make checkpointing and reloading work correctly. (#35) 2025-01-27 16:56:42 -08:00
file_util.py Update file check script to check sizes (#32) 2025-01-22 13:06:46 -08:00
ngram_processor.py Initial commit 2024-12-12 15:32:30 -08:00
patcher.py Initial codes and scripts for training entropy model (#34) 2025-01-27 09:46:44 -08:00
test_data.py Test first batch matches (#53) 2025-02-13 10:05:08 -08:00