Commit graph

10 commits

Author SHA1 Message Date
Pedro Rodriguez 4cee32ea8c Test first batch matches
Summary:

Test Plan:
2025-02-12 18:09:26 +00:00
Pedro Rodriguez 936d9437be
Allow ArrowIterator to read from json (#45)
Some checks are pending
Lint with Black / lint (push) Waiting to run
Lint with isort / lint (push) Waiting to run
Summary:

Currently, arrow iterator can only read arrow files. However, the pyarrow library can read
other formats, including jsonlines. This allows the same ArrowIterator to read from jsonlines,
so we can read from the original source data, and simply omit the entropy column when doing so

Test Plan:

Run train script until dataloader starts
2025-02-06 09:57:22 -08:00
Pedro Rodriguez 7044771a12
This includes fixes that make checkpointing and reloading work correctly. (#35)
Some checks failed
Lint with Black / lint (push) Has been cancelled
Lint with isort / lint (push) Has been cancelled
It also batches in a first set of changes for fixing eval code

Summary:

Test Plan:
2025-01-27 16:56:42 -08:00
Pedro Rodriguez 7622d28b74
Initial codes and scripts for training entropy model (#34)
Some checks are pending
Lint with Black / lint (push) Waiting to run
Lint with isort / lint (push) Waiting to run
Summary:

Test Plan:
2025-01-27 09:46:44 -08:00
Pedro Rodriguez bc42cebd7d
Update file check script to check sizes (#32)
Some checks failed
Lint with isort / lint (push) Has been cancelled
Lint with Black / lint (push) Has been cancelled
Summary:

Test Plan:
2025-01-22 13:06:46 -08:00
Ink 392117bff2
Fix realtime entropy patching (#26)
Some checks are pending
Lint with Black / lint (push) Waiting to run
Lint with isort / lint (push) Waiting to run
* allow loading of the entropy model directly

* remove unused argument

* remove spammy warning

* allow patch_batch_size to be adjusted in the forward() method

* revert to original patcher style, fix warning

* allow grads when calculating entropies

* fix grad flow

* return preds from calculate_entropies()

* remove legacy arg

* fix an error with monotonicity and small sequence lengths

* ensure patcher is serializable

* revert patcher to original

* remove unused import
2025-01-21 16:34:23 -08:00
Pedro Rodriguez 6ffeb66b53
Changes for training entropy model and correcting attention in local models (#25)
Some checks failed
Lint with Black / lint (push) Has been cancelled
Lint with isort / lint (push) Has been cancelled
Summary:

- Refactor local model configs to be separate and clearer
- Add attention arguments and correct which attention is used in local models
- Preparation for being able to have an entropy train script
- Fix failing unit tests

Test Plan:
2025-01-17 14:23:01 -08:00
Pedro Rodriguez 1da3dd9315
Update preprocess_entropies script to blt inference + add fsspec support (#23)
Some checks are pending
Lint with Black / lint (push) Waiting to run
Lint with isort / lint (push) Waiting to run
Summary:

Test Plan:
2025-01-13 15:28:14 -08:00
Pedro Rodriguez b0120da72f
Replace regular filesystem calls with fsspec + add s3 support (#18)
Some checks failed
Lint with Black / lint (push) Has been cancelled
Lint with isort / lint (push) Has been cancelled
Summary:

For compatibility with either local/nfs or S3 datasets, swap to fsspec.

Add a tool to compare local and remote filesystems

Test Plan:

- Ran regular train script
- Ran with config with data in S3
2025-01-10 11:04:41 -08:00
Pedro Rodriguez bcc039bb75 Initial commit 2024-12-12 15:32:30 -08:00