blt/bytelatent
Pedro Rodriguez 3117ac1f1f
Make it possible to specify multiple config files
Summary:

Make it possible to specify multiple config files.
Parsing CLI arguments is no longer a special case; it uses the same config inheritance method.

Test Plan:

Test that this interpolates in the right order via unit tests.

Sample usage: this loads the internal config, which references bytelatent/configs/entropy_model.yaml. The precedence order, from lowest to highest, is:

- Default pydantic args
- Included configs, e.g. `config`
- CLI args

```
python -m bytelatent.print_config config=internal/configs/entropy_model.yaml eval=null
```
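The precedence order above can be illustrated with a minimal, standard-library-only sketch. It is not the code in `config_parser.py`: the helper names `deep_merge`, `parse_cli`, and `resolve` are hypothetical, plain dicts stand in for the pydantic defaults and the loaded config files, and the rule that a later included config overrides an earlier one is an assumption made for the example.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, None, and type mismatches: the override replaces the base.
            merged[key] = deepcopy(value)
    return merged


def parse_cli(args: list[str]) -> dict:
    """Turn ["a.b=1", "c=null"] into a nested dict (hypothetical helper).

    Values are kept as strings, except the literal "null", which becomes None.
    """
    out: dict = {}
    for arg in args:
        dotted_key, _, raw = arg.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = out
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = None if raw == "null" else raw
    return out


def resolve(defaults: dict, included: list[dict], cli_args: list[str]) -> dict:
    """Apply layers from lowest to highest precedence."""
    merged = deepcopy(defaults)          # 1. defaults (stand-in for pydantic args)
    for cfg in included:                 # 2. included configs, in the order given
        merged = deep_merge(merged, cfg)
    return deep_merge(merged, parse_cli(cli_args))  # 3. CLI args win


# Illustrative values only; none of these keys are taken from the repository.
defaults = {"seed": 42, "eval": {"enabled": True}, "model": {"dim": 512}}
included = [{"model": {"dim": 768}}, {"seed": 7}]
result = resolve(defaults, included, ["eval=null", "model.dim=1024"])
print(result)
```

Because the CLI layer is merged last, `eval=null` clears the whole `eval` section even though the defaults populate it, which mirrors the `eval=null` override in the sample command.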

2025-02-18 18:41:02 +00:00
configs Make it possible to specify multiple config files 2025-02-18 18:41:02 +00:00
data Fix multiprocessing dataloader checkpointing and use it in the train script (#50) 2025-02-13 11:58:23 -08:00
model using apex rmsnorm (#57) 2025-02-14 11:22:03 -08:00
plotting Add plotting code from paper (#17) 2025-01-09 12:11:50 -08:00
preprocess Allow ArrowIterator to read from json (#45) 2025-02-06 09:57:22 -08:00
tokenizers Initial commit 2024-12-12 15:32:30 -08:00
.DS_Store Initial commit 2024-12-12 15:32:30 -08:00
__init__.py Initial commit 2024-12-12 15:32:30 -08:00
args.py Make it possible to specify multiple config files 2025-02-18 18:41:02 +00:00
base_transformer.py using apex rmsnorm (#57) 2025-02-14 11:22:03 -08:00
checkpoint.py Update checkpointing to use fsspec (#39) 2025-02-06 09:41:58 -08:00
config_parser.py Make it possible to specify multiple config files 2025-02-18 18:41:02 +00:00
constants.py Initial commit 2024-12-12 15:32:30 -08:00
distributed.py Add bpb and n_bytes to metric logging (#41) 2025-02-07 13:14:30 -08:00
entropy_model.py Changes for training entropy model and correcting attention in local models (#25) 2025-01-17 14:23:01 -08:00
eval.py Make it possible to specify multiple config files 2025-02-18 18:41:02 +00:00
float8.py Initial commit 2024-12-12 15:32:30 -08:00
generate.py This includes fixes that make checkpointing and reloading work correctly. (#35) 2025-01-27 16:56:42 -08:00
logger.py Update checkpointing to use fsspec (#39) 2025-02-06 09:41:58 -08:00
metrics.py Add bpb and n_bytes to metric logging (#41) 2025-02-07 13:14:30 -08:00
norms.py Fix distributed all reduce grad norm (#40) 2025-02-04 16:53:50 -08:00
optim.py Initial commit 2024-12-12 15:32:30 -08:00
print_config.py Make it possible to specify multiple config files 2025-02-18 18:41:02 +00:00
probe.py Initial commit 2024-12-12 15:32:30 -08:00
profiling.py Initial commit 2024-12-12 15:32:30 -08:00
stool.py Allow ArrowIterator to read from json (#45) 2025-02-06 09:57:22 -08:00
test_blt.py Initial codes and scripts for training entropy model (#34) 2025-01-27 09:46:44 -08:00
test_config_parser.py Make it possible to specify multiple config files 2025-02-18 18:41:02 +00:00
test_entropy_model.py Test first batch matches (#53) 2025-02-13 10:05:08 -08:00
train.py Make it possible to specify multiple config files 2025-02-18 18:41:02 +00:00
transformer.py using apex rmsnorm (#57) 2025-02-14 11:22:03 -08:00