Commit graph

18 commits

Author SHA1 Message Date
Srini Iyer b28ceb624d Add flag for rope outer in fp32 2025-02-06 00:40:51 +00:00
Srini Iyer 162b99b4a3 Log model 2025-02-06 00:26:37 +00:00
Srinivasan Iyer 7cf8fab49b
Fix wandb logging (#42)
Co-authored-by: Srini Iyer <sviyer@meta.com>
2025-02-05 16:24:39 -08:00
Pedro Rodriguez c79b1fdbd0
Fix distributed all reduce grad norm (#40)
Summary:

With more than one GPU on a single node, all-reduce calls fail when the inputs are not bf16. This change uses a modified copy of torch's grad-norm computation to avoid those failures.

Test Plan:

- Run unit tests
- Run single-GPU training: `python -m bytelatent.train config=internal/configs/s3_debug.yaml eval=null checkpoint.dump.every=100`
- Run single-node, multi-GPU training: `torchrun --nproc-per-node 8 -m bytelatent.train config=internal/configs/s3_debug.yaml eval=null checkpoint.dump.every=100`
2025-02-04 16:53:50 -08:00
Pedro Rodriguez 7044771a12
This includes fixes that make checkpointing and reloading work correctly. (#35)
It also bundles a first set of changes toward fixing the eval code.

Summary:

Test Plan:
2025-01-27 16:56:42 -08:00
Pedro Rodriguez 7622d28b74
Initial codes and scripts for training entropy model (#34)
Summary:

Test Plan:
2025-01-27 09:46:44 -08:00
Pedro Rodriguez a809259e71
Use load_async flag to not start MP iterator (#33)
Summary:

Test Plan:
2025-01-24 10:57:20 -08:00
Pedro Rodriguez bc42cebd7d
Update file check script to check sizes (#32)
Summary:

Test Plan:
2025-01-22 13:06:46 -08:00
Ink 392117bff2
Fix realtime entropy patching (#26)
* allow loading of the entropy model directly

* remove unused argument

* remove spammy warning

* allow patch_batch_size to be adjusted in the forward() method

* revert to original patcher style, fix warning

* allow grads when calculating entropies

* fix grad flow

* return preds from calculate_entropies()

* remove legacy arg

* fix an error with monotonicity and small sequence lengths

* ensure patcher is serializable

* revert patcher to original

* remove unused import
2025-01-21 16:34:23 -08:00
Pedro Rodriguez 6ffeb66b53
Changes for training entropy model and correcting attention in local models (#25)
Summary:

- Refactor local model configs to be separate and clearer
- Add attention arguments and correct which attention is used in local models
- Preparation for being able to have an entropy train script
- Fix failing unit tests

Test Plan:
2025-01-17 14:23:01 -08:00
Ink caec8d2621
allow flex-attention to be disabled (#19)
* allow flex-attention to silently fail

* allow flex-attn to be disabled via an env var
2025-01-14 09:32:07 -08:00
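The two bullets in caec8d2621 (#19) describe an opt-out switch plus a silent fallback. The sketch below shows one way such a switch could look. It is not taken from the repository: the variable name `EXAMPLE_DISABLE_FLEX_ATTENTION` and both helper functions are hypothetical, and only the import path `torch.nn.attention.flex_attention` is a real PyTorch location (available in recent releases).

```python
import os

# Hypothetical variable name, chosen for illustration only; the name used
# by the actual commit is not stated in the log.
ENV_VAR = "EXAMPLE_DISABLE_FLEX_ATTENTION"


def flex_attention_enabled(environ=None):
    """Return False when the env var is set to a truthy value."""
    env = os.environ if environ is None else environ
    value = env.get(ENV_VAR, "0").strip().lower()
    return value not in {"1", "true", "yes"}


def load_flex_attention(environ=None):
    """Return the flex_attention callable, or None if disabled or missing."""
    if not flex_attention_enabled(environ):
        return None
    try:
        # Present only in recent PyTorch releases; older installs (or no
        # torch at all) raise ImportError, which is swallowed on purpose.
        from torch.nn.attention.flex_attention import flex_attention
    except ImportError:
        return None
    return flex_attention
```

A caller would check the return value for `None` and fall back to another attention implementation in that case.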
Pedro Rodriguez 1da3dd9315
Update preprocess_entropies script to blt inference + add fsspec support (#23)
Summary:

Test Plan:
2025-01-13 15:28:14 -08:00
Pedro Rodriguez b0120da72f
Replace regular filesystem calls with fsspec + add s3 support (#18)
Summary:

For compatibility with both local/NFS and S3 datasets, switch filesystem access to fsspec.

Add a tool to compare local and remote filesystems

Test Plan:

- Ran regular train script
- Ran with config with data in S3
2025-01-10 11:04:41 -08:00
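To make the idea in b0120da72f (#18) concrete, here is a minimal sketch of comparing two file trees through fsspec's filesystem interface. It is an illustration under stated assumptions, not the tool added by the commit: `list_sizes` and `compare_sizes` are hypothetical helper names, and the demo uses fsspec's in-memory filesystem in place of real local and S3 storage. The calls `fsspec.filesystem`, `fs.pipe`, and `fs.find(..., detail=True)` are part of fsspec's public API.

```python
import fsspec


def list_sizes(fs, root):
    """Map each file path (relative to `root`) to its size in bytes."""
    prefix = root.strip("/")
    sizes = {}
    for name, info in fs.find(root, detail=True).items():
        # Drop the root prefix so both trees share the same relative keys.
        rel = name.lstrip("/")[len(prefix):].lstrip("/")
        sizes[rel] = info["size"]
    return sizes


def compare_sizes(fs_a, root_a, fs_b, root_b):
    """Return {relative_path: (size_a, size_b)} for files that differ.

    A size of None means the file is missing on that side.
    """
    a = list_sizes(fs_a, root_a)
    b = list_sizes(fs_b, root_b)
    return {
        rel: (a.get(rel), b.get(rel))
        for rel in sorted(set(a) | set(b))
        if a.get(rel) != b.get(rel)
    }


# Demo with the in-memory filesystem standing in for both sides.
fs = fsspec.filesystem("memory")
fs.pipe("/demo_local/a.bin", b"1234")
fs.pipe("/demo_local/b.bin", b"12")
fs.pipe("/demo_local/c.bin", b"1")
fs.pipe("/demo_remote/a.bin", b"1234")
fs.pipe("/demo_remote/b.bin", b"123")

mismatches = compare_sizes(fs, "/demo_local", fs, "/demo_remote")
```

Because both sides go through the same interface, swapping one of them for `fsspec.filesystem("s3")` (which needs the `s3fs` package) should not require changes to the comparison logic.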
Pedro Rodriguez d4ddb95322
Add plotting code from paper (#17)
Summary:

Test Plan:
2025-01-09 12:11:50 -08:00
Ink 2fdc6f3cc9
Package bytelatent as a module (#7)
* make installable via pip

* fix missing xformers deps

* remove non-core dependencies

* fix linting

* fix isort
2025-01-06 16:44:50 -08:00
Ikko Eltociear Ashimine 9065bb1cce
docs: update README.md (#1)
folowing -> following
2025-01-03 12:08:00 -08:00
Daniele Sartiano 898671b66b
Update README.md (#13)
Fixed typo on Meta Lingua
2025-01-03 12:06:47 -08:00
Pedro Rodriguez bcc039bb75 Initial commit 2024-12-12 15:32:30 -08:00