Commit graph

151 commits

Author SHA1 Message Date
Pedro Rodriguez 86abff94d0
Merge 55ddb0f84b into sapling-pr-archive-EntilZha 2025-02-20 12:16:14 -08:00
Pedro Rodriguez 55ddb0f84b Pass mask in packing_iterator, correctly handle last batch 2025-02-20 20:15:46 +00:00
Pedro Rodriguez 8baeef13a1 merge commit for archive created by Sapling
2025-02-20 00:57:24 +00:00
Pedro Rodriguez 0ffe2ab685 Update iterator inheritance, pass file format args, limit iterator
- Create a common class to use in all inheritance for states
- Add a limit iterator that we can use in evals
- Modify ArrowFileIterator behavior to not do arrow path inference if file_format='json'
- Make EvalArgs valid
- Move testing iterators to a common directory to allow usage in multiple test files
- Allow SequenceIterator to take rng_state=None, which disables all RNG ops (mainly for eval)

Test Plan:

- `pytest bytelatent`
- `python -m bytelatent.train config=../internal-blt/configs/entropy_model.yaml logging.wandb=null eval=null`
2025-02-20 00:57:17 +00:00
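The "limit iterator" mentioned in commit 0ffe2ab685 caps how many items an evaluation run consumes. The repository's own class is not shown in this log, so the following is only a minimal sketch of the general idea; the name `limit_iterator` and the convention that `None` means "no cap" are assumptions made for illustration.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import Optional, TypeVar

T = TypeVar("T")


def limit_iterator(source: Iterable[T], limit: Optional[int] = None) -> Iterator[T]:
    """Yield at most `limit` items from `source`.

    Hypothetical helper: `limit=None` is assumed to mean "no cap".
    """
    # islice stops after `limit` items, or runs to exhaustion when limit is None.
    yield from islice(source, limit)


if __name__ == "__main__":
    # Cap an (illustrative) stream of batches at three items for a quick eval.
    for batch in limit_iterator(range(10), limit=3):
        print(batch)
```

A wrapper like this leaves the underlying iterator untouched, which is one reason a cap is often layered on top rather than built into each data source.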
Pedro Rodriguez 3c1c247809
Merge 2a717d6b40 into sapling-pr-archive-EntilZha 2025-02-19 16:38:06 -08:00
Pedro Rodriguez 2a717d6b40 Update iterators 2025-02-20 00:35:04 +00:00
Pedro Rodriguez b0956bde99
Make apex logs less noisy (#60)
Summary:

Test Plan:
2025-02-18 10:45:56 -08:00
Pedro Rodriguez 4b57d05c3b
Merge 2f247263b9 into sapling-pr-archive-EntilZha
2025-02-18 10:43:12 -08:00
Pedro Rodriguez 2f247263b9 Make apex logs less noisy
Summary:

Test Plan:
2025-02-18 18:43:06 +00:00
Pedro Rodriguez 82ab5930ec
Make it possible to specify multiple config files (#54)
Summary:

Make it possible to specify multiple config files.
Parsing CLI is not a special case anymore, just uses the same config inheritance method.

Test Plan:

Test that this interpolates in the right order via unit tests

Sample usage: loads the internal config, which references bytelatent/configs/entropy_model.yaml. The precedence order is:

- Default pydantic args
- Included configs, e.g. `config`
- CLI args

```
python -m bytelatent.print_config config=internal/configs/entropy_model.yaml eval=null

```


Summary:

Test Plan:
2025-02-18 10:42:44 -08:00
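The precedence order listed in commit 82ab5930ec (defaults, then included configs, then CLI args) can be illustrated with plain dictionaries. This is a sketch under stated assumptions, not the repository's implementation: the helper names `deep_merge` and `resolve_config` are hypothetical, and it assumes that each later source overrides the earlier ones key by key.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which `override` wins over `base`, recursing into nested dicts."""
    merged = dict(base)  # shallow copy so `base` itself is not mutated
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(
    defaults: dict[str, Any],
    included: list[dict[str, Any]],
    cli_args: dict[str, Any],
) -> dict[str, Any]:
    """Apply layers in the assumed order: defaults < included configs < CLI args."""
    resolved = dict(defaults)
    for layer in [*included, cli_args]:
        resolved = deep_merge(resolved, layer)
    return resolved


if __name__ == "__main__":
    # Illustrative values only; these keys are not taken from the real configs.
    defaults = {"seed": 0, "logging": {"wandb": "default", "freq": 10}}
    included = [{"seed": 1, "logging": {"freq": 5}}]
    cli_args = {"logging": {"wandb": None}}
    print(resolve_config(defaults, included, cli_args))
```

Treating CLI arguments as one more layer, rather than a special case, matches the commit's description that CLI parsing "just uses the same config inheritance method".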
Pedro Rodriguez 75fd18716e merge commit for archive created by Sapling 2025-02-18 18:41:21 +00:00
Pedro Rodriguez 3117ac1f1f Make it possible to specify multiple config files
Summary:

Make it possible to specify multiple config files.
Parsing CLI is not a special case anymore, just uses the same config inheritance method.

Test Plan:

Test that this interpolates in the right order via unit tests

Sample usage: loads the internal config, which references bytelatent/configs/entropy_model.yaml. The precedence order is:

- Default pydantic args
- Included configs, e.g. `config`
- CLI args

```
python -m bytelatent.print_config config=internal/configs/entropy_model.yaml eval=null

```


Summary:

Test Plan:
2025-02-18 18:41:02 +00:00
CharlesCNorton 9f29e0de18
fix(README): correct typo in quickstart instructions (#62)
Changed "your can activate the environment" to "you can activate the environment" for clarity.
2025-02-18 09:47:58 -08:00
Pedro Rodriguez f912535cb7
Merge 655eca670d into sapling-pr-archive-EntilZha
2025-02-14 15:46:06 -08:00
Pedro Rodriguez 88dedaa2ec
Merge a3e0647d03 into sapling-pr-archive-EntilZha 2025-02-14 15:45:43 -08:00
Pedro Rodriguez 655eca670d Minimal working eval
Summary:

Test Plan:
2025-02-14 23:45:29 +00:00
Pedro Rodriguez a3e0647d03 Make apex logs less noisy
Summary:

Test Plan:
2025-02-14 23:45:28 +00:00
Pedro Rodriguez 52590842e0 merge commit for archive created by Sapling 2025-02-14 22:51:24 +00:00
Pedro Rodriguez f94babc94e Make it possible to specify multiple config files
Summary:

Make it possible to specify multiple config files.
Parsing CLI is not a special case anymore, just uses the same config inheritance method.

Test Plan:

Test that this interpolates in the right order via unit tests

Sample usage: loads the internal config, which references bytelatent/configs/entropy_model.yaml. The precedence order is:

- Default pydantic args
- Included configs, e.g. `config`
- CLI args

```
python -m bytelatent.print_config config=internal/configs/entropy_model.yaml eval=null

```


Summary:

Test Plan:
2025-02-14 22:50:57 +00:00
Pedro Rodriguez 018bf98798
Merge aa78c96ea4 into sapling-pr-archive-EntilZha 2025-02-14 13:06:55 -08:00
Pedro Rodriguez aa78c96ea4 Make it possible to specify multiple config files
Summary:

Make it possible to specify multiple config files.
Parsing CLI is not a special case anymore, just uses the same config inheritance method.

Test Plan:

Test that this interpolates in the right order via unit tests

Sample usage: loads the internal config, which references bytelatent/configs/entropy_model.yaml. The precedence order is:

- Default pydantic args
- Included configs, e.g. `config`
- CLI args

```
python -m bytelatent.print_config config=internal/configs/entropy_model.yaml eval=null

```
2025-02-14 21:06:50 +00:00
Pedro Rodriguez ed6300375f
Merge bec0164820 into sapling-pr-archive-EntilZha 2025-02-14 13:04:04 -08:00
Pedro Rodriguez bec0164820 Make it possible to specify multiple config files
Summary:

Test Plan:

Test that this interpolates in the right order, config -> configs -> cli args

```
# All three sources
python -m bytelatent.print_config config=bytelatent/configs/debug.yaml configs=[internal/configs/s3_debug.yaml] eval=null

# What worked before
python -m bytelatent.print_config config=internal/configs/s3_debug.yaml eval=null
```
2025-02-14 21:03:57 +00:00
Pedro Rodriguez 1c7031b4c4
Merge be3ff12cfe into sapling-pr-archive-EntilZha 2025-02-14 13:03:38 -08:00
Pedro Rodriguez be3ff12cfe Make it possible to specify multiple config files
Summary:

Test Plan:

Test that this interpolates in the right order, config -> configs -> cli args

```
# All three sources
python -m bytelatent.print_config config=bytelatent/configs/debug.yaml configs=[internal/configs/s3_debug.yaml] eval=null

# What worked before
python -m bytelatent.print_config config=internal/configs/s3_debug.yaml eval=null
```
2025-02-14 21:03:26 +00:00
Srinivasan Iyer f3e8125f74
using apex rmsnorm (#57)
* using apex rmsnorm

* added message for missing apex

* black

* missed a print

---------

Co-authored-by: Srini Iyer <sviyer@meta.com>
2025-02-14 11:22:03 -08:00
Srinivasan Iyer c49e25171e
Update README.md (#58) 2025-02-14 11:16:49 -08:00
Pedro Rodriguez 8c61ab5e67
Fix multiprocessing dataloader checkpointing and use it in the train script (#50)
2025-02-13 11:58:23 -08:00
Pedro Rodriguez 84afa0f121 merge commit for archive created by Sapling
2025-02-13 19:01:55 +00:00
Pedro Rodriguez 53529dcc78 Fix multiprocessing dataloader checkpointing and use it in the train script
Summary:

Test Plan:
2025-02-13 19:01:49 +00:00
Pedro Rodriguez 76e7b001bb
Merge 0c6cb995a0 into sapling-pr-archive-EntilZha 2025-02-13 10:39:03 -08:00
Pedro Rodriguez 0c6cb995a0 Fix multiprocessing dataloader checkpointing and use it in the train script
Summary:

Test Plan:
2025-02-13 18:38:58 +00:00
Pedro Rodriguez 85c2f28f26
Test first batch matches (#53)
Summary:

Test Plan:
2025-02-13 10:05:08 -08:00
Pedro Rodriguez 45d52b7ae3
Merge ab8f8a4412 into sapling-pr-archive-EntilZha 2025-02-13 10:04:43 -08:00
Pedro Rodriguez ab8f8a4412 Test first batch matches
Summary:

Test Plan:
2025-02-13 18:04:30 +00:00
Srinivasan Iyer 9d907fed1c
disable reshard after forward (#56)
Co-authored-by: Srini Iyer <sviyer@meta.com>
2025-02-12 18:33:53 -08:00
Srinivasan Iyer 48e4ad0bd2
make sure max_encoder_seq_length matches (#55)
* make sure max_encoder_seq_length matches

* black and assert comment

---------

Co-authored-by: Srini Iyer <sviyer@meta.com>
2025-02-12 18:27:22 -08:00
Pedro Rodriguez 078791996f
Merge ece82cb960 into sapling-pr-archive-EntilZha
2025-02-12 11:25:18 -08:00
Pedro Rodriguez ece82cb960 Make it possible to specify multiple config files
Summary:

Test Plan:

Test that this interpolates in the right order, config -> configs -> cli args

```
# All three sources
python -m bytelatent.print_config config=bytelatent/configs/debug.yaml configs=[internal/configs/s3_debug.yaml] eval=null

# What worked before
python -m bytelatent.print_config config=internal/configs/s3_debug.yaml eval=null
```
2025-02-12 19:25:06 +00:00
Pedro Rodriguez 15d9c40abe
Merge 3e3193c1d4 into sapling-pr-archive-EntilZha 2025-02-12 10:24:54 -08:00
Pedro Rodriguez c0c5bdba91
Merge c54c9f0517 into sapling-pr-archive-EntilZha 2025-02-12 10:24:45 -08:00
Pedro Rodriguez 3e3193c1d4 Fix multiprocessing dataloader checkpointing and use it in the train script
Summary:

Test Plan:
2025-02-12 18:24:40 +00:00
Pedro Rodriguez c54c9f0517 Test first batch matches
Summary:

Test Plan:
2025-02-12 18:24:39 +00:00
Pedro Rodriguez ec59c13d81
Merge bd3cf61bb9 into sapling-pr-archive-EntilZha 2025-02-12 10:11:44 -08:00
Pedro Rodriguez 9613e0ea5f merge commit for archive created by Sapling 2025-02-12 18:09:31 +00:00
Pedro Rodriguez bd3cf61bb9 Fix multiprocessing dataloader checkpointing and use it in the train script
Summary:

Test Plan:
2025-02-12 18:09:26 +00:00
Pedro Rodriguez 4cee32ea8c Test first batch matches
Summary:

Test Plan:
2025-02-12 18:09:26 +00:00
Pedro Rodriguez b61a612bbb
Merge 92af9b3f56 into sapling-pr-archive-EntilZha 2025-02-12 10:07:50 -08:00
Pedro Rodriguez 92af9b3f56 Test first batch matches
Summary:

Test Plan:
2025-02-12 18:07:22 +00:00
Pedro Rodriguez c6cbacc8c1 merge commit for archive created by Sapling
2025-02-11 22:56:32 +00:00