Pedro Rodriguez
d87ba751d3
merge commit for archive created by Sapling
2025-03-13 00:24:10 +00:00
Pedro Rodriguez
790e224b11
Update ppl evals to work with blt model, in addition to entropy model
Summary:
Test Plan:
2025-03-13 00:23:55 +00:00
Pedro Rodriguez
fe70785822
Update iterate_data
Summary:
Test Plan:
2025-03-13 00:23:55 +00:00
Srinivasan Iyer
c110f6be2a
Add way to call consolidate (#80)
* Add way to call consolidate
* black
* isort
---------
Co-authored-by: Srini Iyer <sviyer@meta.com>
2025-03-11 16:53:33 -07:00
Srinivasan Iyer
a5ceaaa226
When merging configs, do not merge data sources (#79)
* When merging configs, do not merge data sources
* Add todo
---------
Co-authored-by: Srini Iyer <sviyer@meta.com>
2025-03-11 11:03:24 -07:00
Pedro Rodriguez
7517ac2a9f
Get evals working again. (#46)
- PPL/validation: Works now and uses multi-GPU. Single-GPU results differ from multi-GPU results for an unknown reason; this can be debugged in a follow-up PR
- Generation evals likely work but are very slow, so they are disabled for now
Test Plan:
```
torchrun --nproc-per-node 8 -m bytelatent.eval config=../internal-blt/configs/eval.yaml
```
2025-03-11 09:57:19 -07:00
Pedro Rodriguez
63913e4dba
Reduce per file resources arrow uses (#77)
Summary:
Test Plan:
2025-03-05 15:03:42 -08:00
Pedro Rodriguez
aec12c79e6
Merge 880493e742 into sapling-pr-archive-EntilZha
2025-03-05 15:03:22 -08:00
Pedro Rodriguez
880493e742
Reduce per file resources arrow uses
Summary:
Test Plan:
2025-03-05 23:03:14 +00:00
Pedro Rodriguez
8f2cf8899d
Let process start before yielding preloaded prefetch buffer, avoid needlessly losing buffer in edge cases (#75)
Summary:
Test Plan:
2025-03-05 15:02:57 -08:00
Pedro Rodriguez
bde475f8a0
Merge a828594625 into sapling-pr-archive-EntilZha
2025-03-05 15:02:32 -08:00
Pedro Rodriguez
a828594625
Let process start before yielding preloaded prefetch buffer, avoid needlessly losing buffer in edge cases
Summary:
Test Plan:
2025-03-05 23:02:23 +00:00
Pedro Rodriguez
ea1fc75862
Add approximate state persistence (#73)
Summary:
Test Plan:
***
More verbose multiprocess logging, fix get_state_and_recycle
Summary:
Test Plan:
2025-03-05 15:01:45 -08:00
Pedro Rodriguez
e78a24dc80
Merge 34664fa7f1 into sapling-pr-archive-EntilZha
2025-03-05 14:49:24 -08:00
Pedro Rodriguez
34664fa7f1
Reduce per file resources arrow uses
Summary:
Test Plan:
2025-03-05 22:49:15 +00:00
Pedro Rodriguez
3c08d7e5d7
Merge 3d44bd1b7a into sapling-pr-archive-EntilZha
2025-03-05 13:23:32 -08:00
Pedro Rodriguez
3d44bd1b7a
Let process start before yielding preloaded prefetch buffer, avoid needlessly losing buffer in edge cases
Summary:
Test Plan:
2025-03-05 21:23:21 +00:00
Pedro Rodriguez
c3ad8b60f4
Add approximate state persistence
Summary:
Test Plan:
***
More verbose multiprocess logging, fix get_state_and_recycle
Summary:
Test Plan:
2025-03-05 21:23:21 +00:00
Pedro Rodriguez
3114f52e82
merge commit for archive created by Sapling
2025-03-05 21:16:34 +00:00
Pedro Rodriguez
8c6103e6e7
Let process start before yielding preloaded prefetch buffer, avoid needlessly losing buffer in edge cases
Summary:
Test Plan:
2025-03-05 21:16:25 +00:00
Pedro Rodriguez
daea91e4a9
Add approximate state persistence
Summary:
Test Plan:
***
More verbose multiprocess logging, fix get_state_and_recycle
Summary:
Test Plan:
2025-03-05 21:16:25 +00:00
Pedro Rodriguez
9bd51df961
Fix rsync to not preserve original permissions, instead use destination (#76)
Summary:
Test Plan:
2025-03-05 11:49:41 -08:00
Pedro Rodriguez
6bcefb0412
merge commit for archive created by Sapling
2025-03-05 19:48:54 +00:00
Pedro Rodriguez
f05acb95fb
Let process start before yielding preloaded prefetch buffer, avoid needlessly losing buffer in edge cases
Summary:
Test Plan:
2025-03-05 19:48:43 +00:00
Pedro Rodriguez
e60344da9e
Add approximate state persistence
Summary:
Test Plan:
***
More verbose multiprocess logging, fix get_state_and_recycle
Summary:
Test Plan:
2025-03-05 19:48:43 +00:00
Pedro Rodriguez
7e510088bc
Merge 44668ef966 into sapling-pr-archive-EntilZha
2025-03-05 09:06:39 -08:00
Pedro Rodriguez
44668ef966
Fix rsync to not preserve original permissions, instead use destination
Summary:
Test Plan:
2025-03-05 17:06:33 +00:00
Pedro Rodriguez
eea7f02949
merge commit for archive created by Sapling
2025-03-05 17:01:54 +00:00
Pedro Rodriguez
428bfe2f76
Merge 1288483307 into sapling-pr-archive-EntilZha
2025-03-05 09:01:35 -08:00
Pedro Rodriguez
1288483307
Fix rsync to not preserve original permissions, instead use destination
Summary:
Test Plan:
2025-03-05 17:01:23 +00:00
Pedro Rodriguez
f0636bf31c
Let process start before yielding preloaded prefetch buffer, avoid needlessly losing buffer in edge cases
Summary:
Test Plan:
2025-03-05 16:59:36 +00:00
Pedro Rodriguez
4756a88cdd
Add approximate state persistence
Summary:
Test Plan:
***
More verbose multiprocess logging, fix get_state_and_recycle
Summary:
Test Plan:
2025-03-05 16:59:35 +00:00
Pedro Rodriguez
4c82ed8732
Merge abb4f7e6a4 into sapling-pr-archive-EntilZha
2025-03-03 17:43:54 -08:00
Pedro Rodriguez
abb4f7e6a4
Let process start before yielding preloaded prefetch buffer, avoid needlessly losing buffer in edge cases
Summary:
Test Plan:
2025-03-04 01:43:41 +00:00
Pedro Rodriguez
e0ddc2dc82
merge commit for archive created by Sapling
2025-03-04 01:19:29 +00:00
Pedro Rodriguez
3e1df4ea4d
Add approximate state persistence
Summary:
Test Plan:
2025-03-04 01:19:22 +00:00
Pedro Rodriguez
c727844e9d
Correctly reset batch iterator at each arrow create_iter call. (#74)
Summary:
Test Plan:
2025-03-03 16:59:02 -08:00
Pedro Rodriguez
dd8557400c
Merge f74aa7bd1a into sapling-pr-archive-EntilZha
2025-03-03 15:32:39 -08:00
Pedro Rodriguez
f74aa7bd1a
Correctly reset batch iterator at each arrow create_iter call.
Summary:
Test Plan:
2025-03-03 23:32:30 +00:00
Pedro Rodriguez
9331363fc2
Merge 967b23fd05 into sapling-pr-archive-EntilZha
2025-02-28 16:15:42 -08:00
Pedro Rodriguez
967b23fd05
Add approximate state persistence
Summary:
Test Plan:
2025-03-01 00:15:33 +00:00
Pedro Rodriguez
a81de49649
merge commit for archive created by Sapling
2025-02-28 00:41:06 +00:00
Pedro Rodriguez
2cae41fe1f
Get evals working again.
- PPL/validation: Works now and uses multi-GPU. Single-GPU results differ from multi-GPU results for an unknown reason; this can be debugged in a follow-up PR
- Generation evals likely work but are very slow, so they are disabled for now
Test Plan:
```
torchrun --nproc-per-node 8 -m bytelatent.eval config=../internal-blt/configs/eval.yaml
```
2025-02-28 00:41:01 +00:00
Pedro Rodriguez
0b12e91b3b
merge commit for archive created by Sapling
2025-02-27 19:51:16 +00:00
Pedro Rodriguez
57d04fa37d
Minimal working eval
2025-02-27 19:42:39 +00:00
Pedro Rodriguez
08b8c7cd05
Pass mask in packing_iterator, correctly handle last batch, fix masking (#65)
This commit does/fixes the following:
1. Adds unit tests for byte and patch packing to ensure it works correctly.
2. Fixes a bug where, for batches that end up with fewer than max_length bytes (e.g., short patches), the mask included elements whose value was pad_id. The mask now defaults to tokens != pad_id when it is not specified.
3. Correctly handles the last batch, which previously crashed. This didn't affect training, since we had enough data and/or looped iterators, but it comes up when evaluating perplexity if we validate on an entire file.
4. Correctly forwards the mask, if it exists, for byte packing.
Test Plan:
```
pytest bytelatent/
```
Testing these changes more thoroughly in a stacked PR that fixes evals
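The mask default described in item 2 can be illustrated with a minimal, framework-free sketch. This is not the code from bytelatent: `PAD_ID`, `pad_batch`, and `resolve_mask` are hypothetical names chosen for illustration, and plain Python lists stand in for tensors.

```python
from typing import Optional

PAD_ID = 0  # hypothetical padding id; the real value comes from the tokenizer


def pad_batch(
    sequences: list[list[int]], max_length: int, pad_id: int = PAD_ID
) -> list[list[int]]:
    """Truncate or right-pad every sequence to exactly max_length tokens."""
    batch = []
    for seq in sequences:
        clipped = seq[:max_length]
        batch.append(clipped + [pad_id] * (max_length - len(clipped)))
    return batch


def resolve_mask(
    tokens: list[list[int]],
    mask: Optional[list[list[bool]]] = None,
    pad_id: int = PAD_ID,
) -> list[list[bool]]:
    """Forward an explicit mask unchanged; otherwise mask out padding."""
    if mask is not None:
        return mask
    # Default: a position counts only if it does not hold the padding id.
    return [[tok != pad_id for tok in row] for row in tokens]


batch = pad_batch([[5, 6, 7], [8]], max_length=4)
print(resolve_mask(batch))
```

Note that a default of this kind is only safe when pad_id can never appear as a real token; otherwise an explicit mask has to be passed through, as in item 4.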
2025-02-27 11:41:47 -08:00
Srinivasan Iyer
0da051f4f9
Initialize rope embeddings properly for the entropy model (#72)
Co-authored-by: Srini Iyer <sviyer@meta.com>
2025-02-25 15:35:25 -08:00
Pedro Rodriguez
2d4f277596
Merge 9446c1ee5c into sapling-pr-archive-EntilZha
2025-02-25 11:15:02 -08:00
Pedro Rodriguez
a77878ae65
Merge c7b40706f0 into sapling-pr-archive-EntilZha
2025-02-25 11:11:30 -08:00
Pedro Rodriguez
9446c1ee5c
Minimal working eval
2025-02-25 19:11:23 +00:00