blt/bytelatent/tokenizers
Pedro Rodriguez aeb95f12a1
Remove byte tokenizer and add config args to switch between byte/patch packing ()
Summary:

Test Plan:

```
python -m bytelatent.train config=../internal-blt/configs/entropy_model.yaml logging.wandb=null checkpoint.dump.every=1000 checkpoint.eval.every=100000 eval=null

pytest bytelatent/
```
2025-02-25 11:10:59 -08:00
..
__init__.py Initial commit 2024-12-12 15:32:30 -08:00
abstract_tokenizer.py Add vocab and seq len abstract fields () 2025-02-24 14:41:58 -08:00
blt_tokenizer.py Add vocab and seq len abstract fields () 2025-02-24 14:41:58 -08:00
build_tokenizer.py Remove byte tokenizer and add config args to switch between byte/patch packing () 2025-02-25 11:10:59 -08:00
constants.py Initial commit 2024-12-12 15:32:30 -08:00
sentence_piece_tokenizer.py Add vocab and seq len abstract fields () 2025-02-24 14:41:58 -08:00
test_blt_tokenizer.py Initial commit 2024-12-12 15:32:30 -08:00
tiktoken_tokenizer.py Add vocab and seq len abstract fields () 2025-02-24 14:41:58 -08:00