Pedro Rodriguez
48df9ce785
Merge c6ef4285e2 into sapling-pr-archive-EntilZha
2025-02-04 10:03:26 -08:00
Pedro Rodriguez
c6ef4285e2
Several changes to enable entropy model training/eval
Summary:
- Make the Arrow iterator able to read from JSONL files; entropies are omitted in this case
- Make the data/checkpoint code fsspec compatible
- Fix issues with all-reduce on non-bf16 tensors in dist_sum and norm computation
- Minimal fixes to get eval to run; it is currently slow
- Add bpb numbers during training
Test Plan:
2025-02-04 18:03:19 +00:00
Pedro Rodriguez
4ff8341738
Merge 11cad6c84d into sapling-pr-archive-EntilZha
2025-02-03 18:29:37 -08:00
Pedro Rodriguez
11cad6c84d
WIP parallel copy script
Summary:
Test Plan:
2025-01-28 00:57:06 +00:00
Pedro Rodriguez
7044771a12
This includes fixes that make checkpointing and reloading work correctly. (#35)
It also batches in a first set of changes for fixing eval code
Summary:
Test Plan:
2025-01-27 16:56:42 -08:00
Pedro Rodriguez
4db801a532
Merge caf82b924e into sapling-pr-archive-EntilZha
2025-01-27 16:54:52 -08:00
Pedro Rodriguez
caf82b924e
This includes fixes that make checkpointing and reloading work correctly.
It also batches in a first set of changes for fixing eval code
Summary:
Test Plan:
2025-01-28 00:54:47 +00:00
Pedro Rodriguez
c2f1e4845e
Merge e02ba763b0 into sapling-pr-archive-EntilZha
2025-01-27 16:38:54 -08:00
Pedro Rodriguez
e02ba763b0
This includes fixes that make checkpointing and reloading work correctly.
It also batches in a first set of changes for fixing eval code
Summary:
Test Plan:
2025-01-28 00:38:46 +00:00
Pedro Rodriguez
7622d28b74
Initial code and scripts for training entropy model (#34)
Summary:
Test Plan:
2025-01-27 09:46:44 -08:00
Pedro Rodriguez
b1c12dd275
Merge 34ca1f7d4b into sapling-pr-archive-EntilZha
2025-01-24 13:59:47 -08:00
Pedro Rodriguez
34ca1f7d4b
Initial code and scripts for training entropy model
Summary:
Test Plan:
2025-01-24 21:59:42 +00:00
Pedro Rodriguez
fb09022e5e
Initial code and scripts for training entropy model
Summary:
Test Plan:
2025-01-24 21:55:41 +00:00
Pedro Rodriguez
a809259e71
Use load_async flag to not start MP iterator (#33)
Summary:
Test Plan:
2025-01-24 10:57:20 -08:00
Pedro Rodriguez
bc42cebd7d
Update file check script to check sizes (#32)
Summary:
Test Plan:
2025-01-22 13:06:46 -08:00
Ink
392117bff2
Fix realtime entropy patching (#26)
* allow loading of the entropy model directly
* remove unused argument
* remove spammy warning
* allow patch_batch_size to be adjusted in the forward() method
* revert to original patcher style, fix warning
* allow grads when calculating entropies
* fix grad flow
* return preds from calculate_entropies()
* remove legacy arg
* fix an error with monotonicity and small sequence lengths
* ensure patcher is serializable
* revert patcher to original
* remove unused import
2025-01-21 16:34:23 -08:00
Pedro Rodriguez
6ffeb66b53
Changes for training entropy model and correcting attention in local models (#25)
Summary:
- Refactor local model configs to be separate and clearer
- Add attention arguments and correct which attention is used in local models
- Preparation for being able to have an entropy train script
- Fix failing unit tests
Test Plan:
2025-01-17 14:23:01 -08:00
Ink
caec8d2621
allow flex-attention to be disabled (#19)
* allow flex-attention to silently fail
* allow flex-attn to be disabled via an env var
2025-01-14 09:32:07 -08:00
Pedro Rodriguez
1da3dd9315
Update preprocess_entropies script to blt inference + add fsspec support (#23)
Summary:
Test Plan:
2025-01-13 15:28:14 -08:00
Pedro Rodriguez
b0120da72f
Replace regular filesystem calls with fsspec + add S3 support (#18)
Summary:
For compatibility with both local/NFS and S3 datasets, switch to fsspec.
Add a tool to compare local and remote filesystems.
Test Plan:
- Ran regular train script
- Ran with config with data in S3
2025-01-10 11:04:41 -08:00
Pedro Rodriguez
d4ddb95322
Add plotting code from paper (#17)
Summary:
Test Plan:
2025-01-09 12:11:50 -08:00
Pedro Rodriguez
bcc039bb75
Initial commit
2024-12-12 15:32:30 -08:00