Mirror of https://github.com/facebookresearch/blt.git, synced 2025-02-23 13:32:14 +00:00
This commit does/fixes the following:

1. Adds unit tests for byte and patch packing to ensure they work correctly.
2. Fixes a bug where, for batches that end up with fewer than `max_length` bytes (e.g., short patches), the mask included elements whose value was `pad_id`. The mask now defaults to `!= pad_id` if it is not specified.
3. Correctly handles the last batch, which previously crashed. This didn't affect training, since we had enough data and/or looped iterators, but it comes up when evaluating perplexity if we validate on an entire file.
4. Correctly forwards the mask, if it exists, for byte packing.

Test Plan:

```
pytest bytelatent/
```

Testing these changes more thoroughly in a stacked PR that fixes evals.

File | ||
---|---|---|
__init__.py | ||
abstract_iterator.py | ||
arrow_iterator.py | ||
dev_iterators.py | ||
limit_iterator.py | ||
looping_iterator.py | ||
multiprocess_iterator.py | ||
packing_iterator.py | ||
preprocess_iterator.py | ||
sampling_iterator.py | ||
sequence_iterator.py | ||
test_arrow_iterator.py | ||
test_iters.py | ||
test_limit_iterator.py | ||
test_packing_iterator.py |
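
The mask and last-batch fixes described in the commit message above can be illustrated with a minimal sketch. This is not the code in `packing_iterator.py`: the names `default_mask`, `pack_batches`, and `PAD_ID` are hypothetical, plain Python lists stand in for the arrays or tensors the repository may use, and the pad value of 0 is an assumption.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

PAD_ID = 0  # assumed pad value for illustration; the real pad_id comes from the repository

Batch = List[List[int]]
Mask = List[List[bool]]


def default_mask(tokens: Batch, mask: Optional[Mask] = None, pad_id: int = PAD_ID) -> Mask:
    """Forward an explicit mask unchanged, else derive one as tokens != pad_id."""
    if mask is not None:
        return mask
    return [[tok != pad_id for tok in row] for row in tokens]


def pack_batches(
    sequences: Iterable[List[int]],
    batch_size: int,
    max_length: int,
    pad_id: int = PAD_ID,
) -> Iterator[Tuple[Batch, Mask]]:
    """Yield (tokens, mask) batches, including a final batch that is not full."""
    batch: Batch = []
    for seq in sequences:
        row = list(seq[:max_length])                 # truncate long sequences
        row += [pad_id] * (max_length - len(row))    # pad short ones
        batch.append(row)
        if len(batch) == batch_size:
            yield batch, default_mask(batch, pad_id=pad_id)
            batch = []
    if batch:
        # Emit the leftover rows instead of assuming every batch is full.
        yield batch, default_mask(batch, pad_id=pad_id)


if __name__ == "__main__":
    for tokens, mask in pack_batches([[5, 6, 7], [8], [9, 10]], batch_size=2, max_length=4):
        print(tokens, mask)
```

Deriving the mask from `!= pad_id` assumes that no real byte or patch value equals the pad value; where that does not hold, an explicit mask should be passed and is forwarded as-is.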