blt/bytelatent/tokenizers
Pedro Rodriguez 2655e4cf82 Remove byte tokenizer and add config args to switch between byte/patch packing
Summary:

Test Plan:

```
python -m bytelatent.train config=../internal-blt/configs/entropy_model.yaml logging.wandb=null checkpoint.dump.every=1000 checkpoint.eval.every=100000 eval=null

pytest bytelatent/
```
2025-02-22 01:13:13 +00:00
| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | Initial commit | 2024-12-12 15:32:30 -08:00 |
| `abstract_tokenizer.py` | Initial commit | 2024-12-12 15:32:30 -08:00 |
| `blt_tokenizer.py` | Initial commit | 2024-12-12 15:32:30 -08:00 |
| `build_tokenizer.py` | Remove byte tokenizer and add config args to switch between byte/patch packing | 2025-02-22 01:13:13 +00:00 |
| `constants.py` | Initial commit | 2024-12-12 15:32:30 -08:00 |
| `sentence_piece_tokenizer.py` | Initial commit | 2024-12-12 15:32:30 -08:00 |
| `test_blt_tokenizer.py` | Initial commit | 2024-12-12 15:32:30 -08:00 |
| `tiktoken_tokenizer.py` | Initial commit | 2024-12-12 15:32:30 -08:00 |