blt/bytelatent
Pedro Rodriguez 9c3c997cae
Allow ArrowIterator to read from json
Summary:

Currently, ArrowIterator can only read Arrow files. However, the pyarrow library can also read
other formats, including JSON Lines. This change lets the same ArrowIterator read JSON Lines files,
so we can read the original source data directly and simply omit the entropy column when doing so.

Test Plan:

Run train script until dataloader starts
2025-02-06 17:44:36 +00:00
configs This includes fixes that make checkpointing and reloading work correctly. (#35) 2025-01-27 16:56:42 -08:00
data Allow ArrowIterator to read from json 2025-02-06 17:44:36 +00:00
model Add rope fp32 (#43) 2025-02-05 17:19:37 -08:00
plotting Add plotting code from paper (#17) 2025-01-09 12:11:50 -08:00
preprocess Allow ArrowIterator to read from json 2025-02-06 17:44:36 +00:00
tokenizers Initial commit 2024-12-12 15:32:30 -08:00
.DS_Store Initial commit 2024-12-12 15:32:30 -08:00
__init__.py Initial commit 2024-12-12 15:32:30 -08:00
args.py Allow ArrowIterator to read from json 2025-02-06 17:44:36 +00:00
base_transformer.py Add rope fp32 (#43) 2025-02-05 17:19:37 -08:00
checkpoint.py Update checkpointing to use fsspec (#39) 2025-02-06 09:41:58 -08:00
constants.py Initial commit 2024-12-12 15:32:30 -08:00
distributed.py Changes for training entropy model and correcting attention in local models (#25) 2025-01-17 14:23:01 -08:00
entropy_model.py Changes for training entropy model and correcting attention in local models (#25) 2025-01-17 14:23:01 -08:00
eval.py This includes fixes that make checkpointing and reloading work correctly. (#35) 2025-01-27 16:56:42 -08:00
float8.py Initial commit 2024-12-12 15:32:30 -08:00
generate.py This includes fixes that make checkpointing and reloading work correctly. (#35) 2025-01-27 16:56:42 -08:00
logger.py Update checkpointing to use fsspec (#39) 2025-02-06 09:41:58 -08:00
metrics.py Update checkpointing to use fsspec (#39) 2025-02-06 09:41:58 -08:00
norms.py Fix distributed all reduce grad norm (#40) 2025-02-04 16:53:50 -08:00
optim.py Initial commit 2024-12-12 15:32:30 -08:00
probe.py Initial commit 2024-12-12 15:32:30 -08:00
profiling.py Initial commit 2024-12-12 15:32:30 -08:00
stool.py Allow ArrowIterator to read from json 2025-02-06 17:44:36 +00:00
test_blt.py Initial codes and scripts for training entropy model (#34) 2025-01-27 09:46:44 -08:00
test_entropy_model.py Changes for training entropy model and correcting attention in local models (#25) 2025-01-17 14:23:01 -08:00
train.py Update checkpointing to use fsspec (#39) 2025-02-06 09:41:58 -08:00
transformer.py Changes for training entropy model and correcting attention in local models (#25) 2025-01-17 14:23:01 -08:00