blt/bytelatent/data/iterators
Pedro Rodriguez 936d9437be
Allow ArrowIterator to read from json (#45)
Summary:

Currently, ArrowIterator can only read Arrow files. However, the pyarrow library can read
other formats, including JSON Lines. This change allows the same ArrowIterator to read JSON Lines,
so we can read from the original source data and simply omit the entropy column when doing so.

Test Plan:

Run the train script until the dataloader starts.
2025-02-06 09:57:22 -08:00
__init__.py Initial commit 2024-12-12 15:32:30 -08:00
abstract_iterator.py Initial commit 2024-12-12 15:32:30 -08:00
arrow_iterator.py Allow ArrowIterator to read from json (#45) 2025-02-06 09:57:22 -08:00
looping_iterator.py Initial commit 2024-12-12 15:32:30 -08:00
multiprocess_iterator.py This includes fixes that make checkpointing and reloading work correctly. (#35) 2025-01-27 16:56:42 -08:00
packing_iterator.py Initial codes and scripts for training entropy model (#34) 2025-01-27 09:46:44 -08:00
preprocess_iterator.py Initial commit 2024-12-12 15:32:30 -08:00
sampling_iterator.py Initial commit 2024-12-12 15:32:30 -08:00
sequence_iterator.py Initial codes and scripts for training entropy model (#34) 2025-01-27 09:46:44 -08:00
test_arrow_iterator.py Changes for training entropy model and correcting attention in local models (#25) 2025-01-17 14:23:01 -08:00
test_iters.py Initial commit 2024-12-12 15:32:30 -08:00