Summary:
- Refactor local model configs to be separate and clearer
- Add attention arguments and correct which attention is used in local models
- Prepare for adding an entropy train script
- Fix failing unit tests

Test Plan:
* allow flex-attention to silently fail
* allow flex-attn to be disabled via an env var
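
A minimal sketch of how the two flex-attention items could fit together, not the code in this change. The env var name `DISABLE_FLEX_ATTN` and the helper `load_flex_attention` are hypothetical; the only real API assumed is `torch.nn.attention.flex_attention.flex_attention`, which is available in recent PyTorch releases.

```python
import os


def load_flex_attention(env_var="DISABLE_FLEX_ATTN"):
    """Return the flex_attention callable, or None if disabled or unavailable.

    `env_var` is a hypothetical name chosen for this sketch.
    """
    # Explicit opt-out via the environment.
    if os.environ.get(env_var, "0").lower() in ("1", "true", "yes"):
        return None
    # Fail silently: a missing or broken install just disables the feature.
    try:
        from torch.nn.attention.flex_attention import flex_attention
    except Exception:
        return None
    return flex_attention


flex_attention = load_flex_attention()
# Callers would check `flex_attention is None` and fall back to another
# attention implementation in that case.
```

Returning `None` rather than raising keeps the import-time failure quiet, at the cost of callers needing an explicit fallback path.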