mirror of
https://github.com/facebookresearch/blt.git
synced 2025-02-22 21:12:15 +00:00
Summary:
- Make the data/checkpoint code fsspec compatible
- Still will not work with s3 saves, due to `torch.distributed.checkpoint.save` not being out-of-the-box workable with `fsspec`. Will implement in a follow-up PR.

Test Plan:

Run unit tests and the commands below:

```
python -m bytelatent.train config=internal/configs/s3_debug.yaml eval=null checkpoint.dump.every=100
```

```
torchrun --nproc-per-node 8 -m bytelatent.train config=internal/configs/s3_debug.yaml eval=null checkpoint.dump.every=100
```

These currently won't work due to the torch distributed save, but these should be tested at a later date:

```
python -m bytelatent.train config=internal/configs/s3_debug.yaml eval=null checkpoint.dump.every=100 dump_dir=s3://blt/scratch/checkpoint-test/
```

```
torchrun --nproc-per-node 8 -m bytelatent.train config=internal/configs/s3_debug.yaml eval=null checkpoint.dump.every=100 dump_dir=s3://blt/scratch/checkpoint-test/
```
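The core idea of fsspec compatibility is to resolve any path (local, `s3://`, etc.) to a filesystem object plus a root path, then do all I/O through that object. The sketch below is illustrative only, not the repository's actual code; it assumes the `fsspec` package is installed (`s3fs` would additionally be needed for `s3://` URLs) and uses the `memory://` filesystem so it runs anywhere:

```python
from fsspec.core import url_to_fs

# Resolve a URL to a (filesystem, path) pair. Swapping the URL for
# "s3://bucket/checkpoints" would return an S3 filesystem instead,
# with no other code changes.
fs, root = url_to_fs("memory://checkpoints")

# Create a per-step checkpoint directory and write a small metadata file
# through the filesystem object rather than the local os/open builtins.
fs.makedirs(root + "/step_100", exist_ok=True)
with fs.open(root + "/step_100/meta.txt", "w") as f:
    f.write("step=100")

print(fs.exists(root + "/step_100/meta.txt"))  # True
```

This is also why `torch.distributed.checkpoint.save` is the sticking point mentioned above: it writes through its own storage layer rather than a caller-supplied fsspec filesystem, so routing it to s3 needs separate work.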
Repository contents:

configs/
data/
model/
plotting/
preprocess/
tokenizers/
.DS_Store
__init__.py
args.py
base_transformer.py
checkpoint.py
constants.py
distributed.py
entropy_model.py
eval.py
float8.py
generate.py
logger.py
metrics.py
norms.py
optim.py
probe.py
profiling.py
stool.py
test_blt.py
test_entropy_model.py
train.py
transformer.py