Mirror of https://github.com/facebookresearch/blt.git
Synced 2025-02-23 21:42:14 +00:00
Summary: Currently, the arrow iterator can only read Arrow files. However, the pyarrow library can read other formats, including JSON Lines. This change allows the same ArrowIterator to read from JSON Lines, so we can read directly from the original source data and simply omit the entropy column when doing so.
Test Plan: Run the train script until the dataloader starts.
__init__.py
abstract_iterator.py
arrow_iterator.py
looping_iterator.py
multiprocess_iterator.py
packing_iterator.py
preprocess_iterator.py
sampling_iterator.py
sequence_iterator.py
test_arrow_iterator.py
test_iters.py