Summary:
Make it possible to specify multiple config files.
Parsing the CLI is no longer a special case; it uses the same config inheritance mechanism.
Test Plan:
Test that configs interpolate in the right order via unit tests.
Sample usage below loads the internal config, which references bytelatent/configs/entropy_model.yaml; a sketch of the merge order follows the command. The precedence order is:
- Default pydantic args
- Included configs, e.g. `config`
- CLI args
```
python -m bytelatent.print_config config=internal/configs/entropy_model.yaml eval=null
```
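To illustrate that precedence, here is a minimal sketch assuming a plain dict merge over a pydantic model; the `TrainArgs` fields below are placeholders, not the repo's actual schema.
```
# Sketch only: pydantic defaults < included configs < CLI args.
from typing import Optional
from pydantic import BaseModel

class TrainArgs(BaseModel):
    eval: Optional[str] = "default-eval"   # pydantic default
    dump_dir: str = "/tmp/dump"

def merge(*sources: dict) -> dict:
    merged: dict = {}
    for src in sources:   # later sources take precedence
        merged.update(src)
    return merged

included = {"eval": "from-included-config", "dump_dir": "/checkpoints"}
cli = {"eval": None}                        # e.g. eval=null on the command line
args = TrainArgs(**merge(included, cli))
print(args.eval, args.dump_dir)             # None /checkpoints
```
Later sources win, so `eval=null` on the CLI overrides anything an included config sets.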
Summary:
Currently, the arrow iterator can only read arrow files. However, the pyarrow library can read
other formats, including jsonlines. This change allows the same ArrowIterator to read from jsonlines,
so we can read from the original source data and simply omit the entropy column when doing so.
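As a rough illustration of the pattern (not the PR's actual code), pyarrow exposes readers for both Arrow IPC files and jsonlines, so one iterator can dispatch on the file format; the `entropies` column name and the `file_format` argument are assumptions.
```
import pyarrow as pa
import pyarrow.ipc as pa_ipc
import pyarrow.json as pa_json

def read_table(path: str, file_format: str) -> pa.Table:
    if file_format == "arrow":
        with pa.memory_map(path, "r") as source:
            return pa_ipc.open_file(source).read_all()
    elif file_format == "json":
        # pyarrow.json.read_json reads newline-delimited JSON by default
        return pa_json.read_json(path)
    raise ValueError(f"unknown format: {file_format}")

def iter_rows(path: str, file_format: str):
    table = read_table(path, file_format)
    has_entropy = "entropies" in table.column_names
    for batch in table.to_batches():
        for row in batch.to_pylist():
            if not has_entropy:
                row["entropies"] = None  # raw source data has no entropy column
            yield row
```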
Test Plan:
Run the train script until the dataloader starts.
Summary:
- Make the data/checkpoint code fsspec compatible (a rough sketch of the pattern follows below)
- Still will not work with S3 saves, since `torch.distributed.checkpoint.save` does not work with `fsspec` out of the box. Will implement in a followup PR.
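A minimal sketch of the fsspec-style access this aims for, assuming the standard fsspec API; the paths and bucket are placeholders.
```
# Sketch only: the same calls work for local paths and s3:// URLs via fsspec.
import fsspec

def save_bytes(path: str, payload: bytes) -> None:
    # fsspec.open resolves the right filesystem from the URL scheme.
    with fsspec.open(path, "wb") as f:
        f.write(payload)

def exists(path: str) -> bool:
    fs, _, (resolved,) = fsspec.get_fs_token_paths(path)
    return fs.exists(resolved)

save_bytes("s3://my-bucket/checkpoints/step_100/meta.json", b"{}")  # placeholder bucket
```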
Test Plan:
Run unit tests and the commands below
```
python -m bytelatent.train config=internal/configs/s3_debug.yaml eval=null checkpoint.dump.every=100
```
```
torchrun --nproc-per-node 8 -m bytelatent.train config=internal/configs/s3_debug.yaml eval=null checkpoint.dump.every=100
```
These currently won't work due to the torch distributed save, but these should be tested at a later date.
```
python -m bytelatent.train config=internal/configs/s3_debug.yaml eval=null checkpoint.dump.every=100 dump_dir=s3://blt/scratch/checkpoint-test/
```
```
torchrun --nproc-per-node 8 -m bytelatent.train config=internal/configs/s3_debug.yaml eval=null checkpoint.dump.every=100 dump_dir=s3://blt/scratch/checkpoint-test/
```
Summary:
With >1 GPU but only 1 node, all-reduces fail when inputs are not bf16. This uses a modified copy of torch's grad norm computation to avoid those failures.
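A hedged sketch of the idea (not torch's or the repo's exact code): compute the local squared grad norm, cast to bf16 before the collective, then reduce across ranks.
```
# Sketch of the workaround described above (illustrative only).
import torch
import torch.distributed as dist

def grad_norm(parameters) -> torch.Tensor:
    grads = [p.grad for p in parameters if p.grad is not None]
    # Local sum of squared gradients.
    local_sq = torch.stack([g.detach().float().pow(2).sum() for g in grads]).sum()
    if dist.is_available() and dist.is_initialized():
        # Cast before the all-reduce; non-bf16 inputs are what triggered the failures.
        local_sq = local_sq.to(torch.bfloat16)
        dist.all_reduce(local_sq, op=dist.ReduceOp.SUM)
        local_sq = local_sq.float()
    return local_sq.sqrt()
```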
Test Plan:
- Run unit tests
- Run single-gpu training: `python -m bytelatent.train config=internal/configs/s3_debug.yaml eval=null checkpoint.dump.every=100`
- Run single-node, multi-gpu training: `torchrun --nproc-per-node 8 -m bytelatent.train config=internal/configs/s3_debug.yaml eval=null checkpoint.dump.every=100`
* allow loading of the entropy model directly
* remove unused argument
* remove spammy warning
* allow patch_batch_size to be adjusted in the forward() method
* revert to original patcher style, fix warning
* allow grads when calculating entropies
* fix grad flow
* return preds from calculate_entropies()
* remove legacy arg
* fix an error with monotonicity and small sequence lengths
* ensure patcher is serializable
* revert patcher to original
* remove unused import
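To make the grad-flow and `calculate_entropies()` commits above concrete, here is a heavily hedged sketch; the signature and argument names are illustrative, not the repo's actual API.
```
# Illustrative only: entropies computed with optional grad flow, preds returned too.
import torch
import torch.nn.functional as F

def calculate_entropies(model: torch.nn.Module, tokens: torch.Tensor,
                        enable_grad: bool = False):
    ctx = torch.enable_grad() if enable_grad else torch.no_grad()
    with ctx:
        preds = model(tokens)                      # [batch, seq, vocab] logits
        log_probs = F.log_softmax(preds, dim=-1)
        entropies = -(log_probs.exp() * log_probs).sum(dim=-1)
    return entropies, preds                        # return preds alongside entropies
```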
Summary:
- Refactor local model configs to be separate and clearer (illustrative sketch below)
- Add attention arguments and correct which attention is used in local models
- Preparation for adding an entropy train script
- Fix failing unit tests
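A hypothetical sketch of the kind of separate, clearer local-model config this refactor points toward; the class and field names (including the attention arguments) are placeholders, not the repo's actual schema.
```
# Illustrative only: separate config objects with explicit attention arguments.
from pydantic import BaseModel

class LocalModelArgs(BaseModel):
    dim: int = 512
    n_layers: int = 8
    n_heads: int = 8
    attn_impl: str = "sdpa"         # which attention implementation the local model uses
    attn_bias_type: str = "causal"  # attention mask/bias choice

class GlobalModelArgs(BaseModel):
    dim: int = 1024
    n_layers: int = 24
    n_heads: int = 16
```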
Test Plan:
Summary:
For compatibility with either local/NFS or S3 datasets, swap to fsspec.
Add a tool to compare local and remote filesystems.
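As an illustration of what such a comparison could look like (assumed fsspec usage; the paths and bucket are placeholders, not the actual tool):
```
# Sketch: list relative paths on each filesystem and diff the two sets.
import fsspec

def list_relative(url: str) -> set[str]:
    fs, _, (root,) = fsspec.get_fs_token_paths(url)
    return {p[len(root):].lstrip("/") for p in fs.find(root)}

local_only = list_relative("/data/blt") - list_relative("s3://my-bucket/blt")
remote_only = list_relative("s3://my-bucket/blt") - list_relative("/data/blt")
print(sorted(local_only), sorted(remote_only))
```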
Test Plan:
- Ran the regular train script
- Ran with a config pointing at data in S3