vikarti.anatra/blt

mirror of https://github.com/facebookresearch/blt.git synced 2025-09-01 10:09:06 +00:00

Author	SHA1	Message	Date
Pedro Rodriguez	eea7f02949	merge commit for archive created by Sapling	2025-03-05 17:01:54 +00:00
Pedro Rodriguez	428bfe2f76	Merge `1288483307` into sapling-pr-archive-EntilZha	2025-03-05 09:01:35 -08:00
Pedro Rodriguez	1288483307	Fix rsync to not preserve original permissions; use the destination's permissions instead	2025-03-05 17:01:23 +00:00
Pedro Rodriguez	f0636bf31c	Let process start before yielding preloaded prefetch buffer, avoid needlessly losing buffer in edge cases	2025-03-05 16:59:36 +00:00
Pedro Rodriguez	4756a88cdd	Add approximate state persistence *** More verbose multiprocess logging, fix get_state_and_recycle	2025-03-05 16:59:35 +00:00
Pedro Rodriguez	4c82ed8732	Merge `abb4f7e6a4` into sapling-pr-archive-EntilZha	2025-03-03 17:43:54 -08:00
Pedro Rodriguez	abb4f7e6a4	Let process start before yielding preloaded prefetch buffer, avoid needlessly losing buffer in edge cases	2025-03-04 01:43:41 +00:00
Pedro Rodriguez	e0ddc2dc82	merge commit for archive created by Sapling	2025-03-04 01:19:29 +00:00
Pedro Rodriguez	3e1df4ea4d	Add approximate state persistence	2025-03-04 01:19:22 +00:00
Pedro Rodriguez	c727844e9d	Correctly reset batch iterator at each arrow create_iter call. (#74)	2025-03-03 16:59:02 -08:00
Pedro Rodriguez	dd8557400c	Merge `f74aa7bd1a` into sapling-pr-archive-EntilZha	2025-03-03 15:32:39 -08:00
Pedro Rodriguez	f74aa7bd1a	Correctly reset batch iterator at each arrow create_iter call.	2025-03-03 23:32:30 +00:00
Pedro Rodriguez	9331363fc2	Merge `967b23fd05` into sapling-pr-archive-EntilZha	2025-02-28 16:15:42 -08:00
Pedro Rodriguez	967b23fd05	Add approximate state persistence	2025-03-01 00:15:33 +00:00
Pedro Rodriguez	a81de49649	merge commit for archive created by Sapling	2025-02-28 00:41:06 +00:00
Pedro Rodriguez	2cae41fe1f	Get evals working again. - PPL/validation: Works now and uses multi-GPU. For some reason, results on 1 GPU differ from multi-GPU; this can be debugged in a follow-up PR. - Generation evals likely work, but are very slow, so they are disabled for now. Test Plan: ``` torchrun --nproc-per-node 8 -m bytelatent.eval config=../internal-blt/configs/eval.yaml ```	2025-02-28 00:41:01 +00:00
Pedro Rodriguez	0b12e91b3b	merge commit for archive created by Sapling	2025-02-27 19:51:16 +00:00
Pedro Rodriguez	57d04fa37d	Minimal working eval	2025-02-27 19:42:39 +00:00
Pedro Rodriguez	08b8c7cd05	Pass mask in packing_iterator, correctly handle last batch, fix masking (#65) This commit does/fixes the following: 1. Adds unit tests for byte and patch packing to ensure they work correctly 2. Fixes a bug where, for batches that end up with < max_length bytes (e.g., short patches), the mask included elements with value pad_id. The mask is now set to != pad_id if it is not specified. 3. Correctly handles the last batch, which previously would crash. This didn't affect training, since we had enough data and/or looped iterators, but it comes up for evaluation perplexity if we run validation on an entire file. 4. Correctly forwards the mask, if it exists, for byte packing Test Plan: ``` pytest bytelatent/ ``` Testing these changes more thoroughly in a stacked PR that fixes evals	2025-02-27 11:41:47 -08:00
Srinivasan Iyer	0da051f4f9	Initialize rope embeddings properly for the entropy model (#72) Co-authored-by: Srini Iyer <sviyer@meta.com>	2025-02-25 15:35:25 -08:00
Pedro Rodriguez	2d4f277596	Merge `9446c1ee5c` into sapling-pr-archive-EntilZha	2025-02-25 11:15:02 -08:00
Pedro Rodriguez	a77878ae65	Merge `c7b40706f0` into sapling-pr-archive-EntilZha	2025-02-25 11:11:30 -08:00
Pedro Rodriguez	9446c1ee5c	Minimal working eval	2025-02-25 19:11:23 +00:00
Pedro Rodriguez	c7b40706f0	Pass mask in packing_iterator, correctly handle last batch, fix masking This commit does/fixes the following: 1. Adds unit tests for byte and patch packing to ensure they work correctly 2. Fixes a bug where, for batches that end up with < max_length bytes (e.g., short patches), the mask included elements with value pad_id. The mask is now set to != pad_id if it is not specified. 3. Correctly handles the last batch, which previously would crash. This didn't affect training, since we had enough data and/or looped iterators, but it comes up for evaluation perplexity if we run validation on an entire file. 4. Correctly forwards the mask, if it exists, for byte packing Test Plan: ``` pytest bytelatent/ ``` Testing these changes more thoroughly in a stacked PR that fixes evals	2025-02-25 19:11:23 +00:00
Pedro Rodriguez	aeb95f12a1	Remove byte tokenizer and add config args to switch between byte/patch packing (#68) Test Plan: ``` python -m bytelatent.train config=../internal-blt/configs/entropy_model.yaml logging.wandb=null checkpoint.dump.every=1000 checkpoint.eval.every=100000 eval=null pytest bytelatent/ ```	2025-02-25 11:10:59 -08:00
Pedro Rodriguez	62cb8936ee	merge commit for archive created by Sapling	2025-02-25 01:38:44 +00:00
Pedro Rodriguez	52d5603b4f	Pass mask in packing_iterator, correctly handle last batch, fix masking This commit does/fixes the following: 1. Adds unit tests for byte and patch packing to ensure they work correctly 2. Fixes a bug where, for batches that end up with < max_length bytes (e.g., short patches), the mask included elements with value pad_id. The mask is now set to != pad_id if it is not specified. 3. Correctly handles the last batch, which previously would crash. This didn't affect training, since we had enough data and/or looped iterators, but it comes up for evaluation perplexity if we run validation on an entire file. 4. Correctly forwards the mask, if it exists, for byte packing Test Plan: ``` pytest bytelatent/ ``` Testing these changes more thoroughly in a stacked PR that fixes evals	2025-02-25 01:38:38 +00:00
Pedro Rodriguez	f48ad82d96	Merge `6147207155` into sapling-pr-archive-EntilZha	2025-02-24 17:35:42 -08:00
Pedro Rodriguez	6147207155	Pass mask in packing_iterator, correctly handle last batch, fix masking This commit does/fixes the following: 1. Adds unit tests for byte and patch packing to ensure they work correctly 2. Fixes a bug where, for batches that end up with < max_length bytes (e.g., short patches), the mask included elements with value pad_id. The mask is now set to != pad_id if it is not specified. 3. Correctly handles the last batch, which previously would crash. This didn't affect training, since we had enough data and/or looped iterators, but it comes up for evaluation perplexity if we run validation on an entire file. 4. Correctly forwards the mask, if it exists, for byte packing Test Plan: ``` pytest bytelatent/ ``` Testing these changes more thoroughly in a stacked PR that fixes evals	2025-02-25 01:35:38 +00:00
Pedro Rodriguez	2a04df1130	Merge `3aaeb8ac14` into sapling-pr-archive-EntilZha	2025-02-24 17:34:18 -08:00
Pedro Rodriguez	3aaeb8ac14	Pass mask in packing_iterator, correctly handle last batch, fix masking This commit does/fixes the following: 1. Adds unit tests for byte and patch packing to ensure they work correctly 2. Fixes a bug where, for batches that end up with < max_length bytes (e.g., short patches), the mask included elements with value pad_id. The mask is now set to != pad_id if it is not specified. 3. Correctly handles the last batch, which previously would crash. This didn't affect training, since we had enough data and/or looped iterators, but it comes up for evaluation perplexity if we run validation on an entire file. 4. Correctly forwards the mask, if it exists, for byte packing Test Plan: ``` pytest bytelatent/ ``` Testing these changes more thoroughly in a stacked PR that fixes evals	2025-02-25 01:34:13 +00:00
Pedro Rodriguez	f3781cc0ca	merge commit for archive created by Sapling	2025-02-25 00:04:43 +00:00
Pedro Rodriguez	edccc0873d	Remove byte tokenizer and add config args to switch between byte/patch packing Test Plan: ``` python -m bytelatent.train config=../internal-blt/configs/entropy_model.yaml logging.wandb=null checkpoint.dump.every=1000 checkpoint.eval.every=100000 eval=null pytest bytelatent/ ```	2025-02-24 23:56:43 +00:00
Pedro Rodriguez	ff36aa8642	Add vocab and seq len abstract fields (#66)	2025-02-24 14:41:58 -08:00
Pedro Rodriguez	bbd1edd90d	Merge `4c6ee1aef0` into sapling-pr-archive-EntilZha	2025-02-24 14:41:33 -08:00
Pedro Rodriguez	4c6ee1aef0	Add vocab and seq len abstract fields	2025-02-24 22:41:01 +00:00
Bocheng Li	a6ed14f689	Fix: Correct model_args usage in parallelize_model call (#69)	2025-02-24 14:40:38 -08:00
Pedro Rodriguez	de774bd98b	Merge `203bff3696` into sapling-pr-archive-EntilZha	2025-02-21 17:27:19 -08:00
Pedro Rodriguez	203bff3696	Pass mask in packing_iterator, correctly handle last batch, fix masking This commit does/fixes the following: 1. Adds unit tests for byte and patch packing to ensure they work correctly 2. Fixes a bug where, for batches that end up with < max_length bytes (e.g., short patches), the mask included elements with value pad_id. The mask is now set to != pad_id if it is not specified. 3. Correctly handles the last batch, which previously would crash. This didn't affect training, since we had enough data and/or looped iterators, but it comes up for evaluation perplexity if we run validation on an entire file. 4. Correctly forwards the mask, if it exists, for byte packing Test Plan: ``` pytest bytelatent/ ``` Testing these changes more thoroughly in a stacked PR that fixes evals	2025-02-22 01:27:13 +00:00
Pedro Rodriguez	a0fa496aa2	merge commit for archive created by Sapling	2025-02-22 01:22:31 +00:00
Pedro Rodriguez	1ede87e1ae	Pass mask in packing_iterator, correctly handle last batch	2025-02-22 01:22:25 +00:00
Pedro Rodriguez	c233487b95	Merge `2655e4cf82` into sapling-pr-archive-EntilZha	2025-02-21 17:13:18 -08:00
Pedro Rodriguez	2655e4cf82	Remove byte tokenizer and add config args to switch between byte/patch packing Test Plan: ``` python -m bytelatent.train config=../internal-blt/configs/entropy_model.yaml logging.wandb=null checkpoint.dump.every=1000 checkpoint.eval.every=100000 eval=null pytest bytelatent/ ```	2025-02-22 01:13:13 +00:00
Pedro Rodriguez	44b1e5eaa1	Merge `edf86f6689` into sapling-pr-archive-EntilZha	2025-02-21 17:12:00 -08:00
Pedro Rodriguez	edf86f6689	Remove byte tokenizer and add config args to switch between byte/patch packing	2025-02-22 01:05:59 +00:00
Pedro Rodriguez	62a3ff55bf	merge commit for archive created by Sapling	2025-02-22 00:46:36 +00:00
Pedro Rodriguez	eac7a3fdbe	Pass mask in packing_iterator, correctly handle last batch	2025-02-22 00:46:29 +00:00
Pedro Rodriguez	fc3399ef40	Update iterator inheritance, pass file format args, limit iterator (#63) - Create a common base class for all iterator states to inherit from - Add a limit iterator that we can use in evals - Modify ArrowFileIterator to skip arrow path inference if file_format='json' - Make EvalArgs valid - Move testing iterators to a common directory so they can be used in multiple test files - Allow SequenceIterator to take a None rng_state, which disables all RNG ops (mainly for eval) Test Plan: - `pytest bytelatent` - `python -m bytelatent.train config=../internal-blt/configs/entropy_model.yaml logging.wandb=null eval=null`	2025-02-21 16:21:07 -08:00
Pedro Rodriguez	92b9a75391	merge commit for archive created by Sapling	2025-02-21 19:26:36 +00:00
Pedro Rodriguez	3e9de62763	Pass mask in packing_iterator, correctly handle last batch	2025-02-21 19:26:29 +00:00

203 commits