vikarti.anatra/blt

mirror of https://github.com/facebookresearch/blt.git synced 2025-02-23 13:32:14 +00:00

Author	SHA1	Message	Date
Pedro Rodriguez	2655e4cf82	Remove byte tokenizer and add config args to switch between byte/patch packing. Test Plan: `python -m bytelatent.train config=../internal-blt/configs/entropy_model.yaml logging.wandb=null checkpoint.dump.every=1000 checkpoint.eval.every=100000 eval=null`; `pytest bytelatent/`	2025-02-22 01:13:13 +00:00
Pedro Rodriguez	fc3399ef40	Update iterator inheritance, pass file format args, limit iterator (#63). Changes: create a common class to use in all inheritance for states; add a limit iterator that we can use in evals; modify ArrowFileIterator behavior to not do arrow path inference if file_format='json'; make EvalArgs valid; move testing iterators to a common directory to allow usage in multiple test files; make it so that SequenceIterator can take a None rng_state, to disable all rng ops (mainly for eval). Test Plan: `pytest bytelatent`; `python -m bytelatent.train config=../internal-blt/configs/entropy_model.yaml logging.wandb=null eval=null`	2025-02-21 16:21:07 -08:00
Pedro Rodriguez	8c61ab5e67	Fix multiprocessing dataloader checkpointing and use it in the train script (#50)	2025-02-13 11:58:23 -08:00
Pedro Rodriguez	85c2f28f26	Test first batch matches (#53)	2025-02-13 10:05:08 -08:00
Pedro Rodriguez	936d9437be	Allow ArrowIterator to read from json (#45). Summary: Currently, the arrow iterator can only read arrow files. However, the pyarrow library can read other formats, including jsonlines. This allows the same ArrowIterator to read from jsonlines, so we can read from the original source data and simply omit the entropy column when doing so. Test Plan: Run train script until dataloader starts.	2025-02-06 09:57:22 -08:00
Pedro Rodriguez	7044771a12	This includes fixes that make checkpointing and reloading work correctly. (#35) It also batches in a first set of changes for fixing eval code.	2025-01-27 16:56:42 -08:00
Pedro Rodriguez	7622d28b74	Initial codes and scripts for training entropy model (#34)	2025-01-27 09:46:44 -08:00
Pedro Rodriguez	bc42cebd7d	Update file check script to check sizes (#32)	2025-01-22 13:06:46 -08:00
Ink	392117bff2	Fix realtime entropy patching (#26). Changes: allow loading of the entropy model directly; remove unused argument; remove spammy warning; allow patch_batch_size to be adjusted in the forward() method; revert to original patcher style, fix warning; allow grads when calculating entropies; fix grad flow; return preds from calculate_entropies(); remove legacy arg; fix an error with monotonicity and small sequence lengths; ensure patcher is serializable; revert patcher to original; remove unused import	2025-01-21 16:34:23 -08:00
Pedro Rodriguez	6ffeb66b53	Changes for training entropy model and correcting attention in local models (#25). Summary: refactor local model configs to be separate and clearer; add attention arguments and correct which attention is used in local models; preparation for being able to have an entropy train script; fix failing unit tests	2025-01-17 14:23:01 -08:00
Pedro Rodriguez	1da3dd9315	Update preprocess_entropies script to blt inference + add fsspec support (#23)	2025-01-13 15:28:14 -08:00
Pedro Rodriguez	b0120da72f	Replace regular filesystem calls with fsspec + add s3 support (#18). Summary: For compatibility with either local/nfs or S3 datasets, swap to fsspec. Add a tool to compare local and remote filesystems. Test Plan: ran regular train script; ran with config with data in S3	2025-01-10 11:04:41 -08:00
Pedro Rodriguez	bcc039bb75	Initial commit	2024-12-12 15:32:30 -08:00

13 commits
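
Commit fc3399ef40 mentions a limit iterator for use in evals. The repository's own class is not shown in this log, so the following is only a minimal sketch of the general idea, built on the standard library. The name `limit_iterator` and its signature are hypothetical and are not taken from the blt codebase.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def limit_iterator(source: Iterable[T], limit: int) -> Iterator[T]:
    """Return an iterator over at most `limit` items of `source`.

    Hypothetical helper for illustration; not the blt implementation.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # islice is lazy: it stops after `limit` items and never pulls
    # further items from `source`, so it is safe on endless streams.
    return islice(source, limit)


if __name__ == "__main__":
    # Stand-in for a long-running data stream.
    batches = (f"batch-{i}" for i in range(1_000_000))
    for batch in limit_iterator(batches, 3):
        print(batch)
```

Capping the stream this way lets an eval loop reuse a training-style iterator without consuming the whole dataset.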