Summary:
Make it possible to specify multiple config files.
Parsing CLI is not a special case anymore, just uses the same config inheritance method.
Test Plan:
Test that this iterpolates in the right order via unit tests
Sample usage, loads the internal config, which references bytelatent/configs/entropy_model.yaml. The precendence order is:
- Default pydantic args
- Included configs, eg `config`
- CLI args
```
python -m bytelatent.print_config config=internal/configs/entropy_model.yaml eval=null
```
Summary:
Test Plan:
Summary:
Make it possible to specify multiple config files.
Parsing CLI is not a special case anymore, just uses the same config inheritance method.
Test Plan:
Test that this iterpolates in the right order via unit tests
Sample usage, loads the internal config, which references bytelatent/configs/entropy_model.yaml. The precendence order is:
- Default pydantic args
- Included configs, eg `config`
- CLI args
```
python -m bytelatent.print_config config=internal/configs/entropy_model.yaml eval=null
```
Summary:
Test Plan:
Test that this iterpolates in the right order, config -> configs -> cli args
```
# All three sources
python -m bytelatent.print_config config=bytelatent/configs/debug.yaml configs=[internal/configs/s3_debug.yaml] eval=null
# What worked before
python -m bytelatent.print_config config=internal/configs/s3_debug.yaml eval=null
```
Summary:
Test Plan:
Test that this iterpolates in the right order, config -> configs -> cli args
```
# All three sources
python -m bytelatent.print_config config=bytelatent/configs/debug.yaml configs=[internal/configs/s3_debug.yaml] eval=null
# What worked before
python -m bytelatent.print_config config=internal/configs/s3_debug.yaml eval=null
```
Summary:
Test Plan:
Test that this iterpolates in the right order, config -> configs -> cli args
```
# All three sources
python -m bytelatent.print_config config=bytelatent/configs/debug.yaml configs=[internal/configs/s3_debug.yaml] eval=null
# What worked before
python -m bytelatent.print_config config=internal/configs/s3_debug.yaml eval=null
```
Summary:
Currently, arrow iterator can only read arrow files. However, the pyarrow library can read
other formats, including jsonlines. This allows the same ArrowIterator to read from jsonlines,
so we can read from the original source data, and simply omit the entropy column when doing so
Test Plan:
Run train script until dataloader starts
Summary:
Currently, arrow iterator can only read arrow files. However, the pyarrow library can read
other formats, including jsonlines. This allows the same ArrowIterator to read from jsonlines,
so we can read from the original source data, and simply omit the entropy column when doing so
Test Plan:
Run train script until dataloader starts