Fix realtime entropy patching (#26)
Some checks are pending
Lint with Black / lint (push) Waiting to run
Lint with isort / lint (push) Waiting to run

* allow loading of the entropy model directly

* remove unused argument

* remove spammy warning

* allow patch_batch_size to be adjusted in the forward() method

* revert to original patcher style, fix warning

* allow grads when calculating entropies

* fix grad flow

* return preds from calculate_entropies()

* remove legacy arg

* fix an error with monotonicity and small sequence lengths

* ensure patcher is serializable

* revert patcher to original

* remove unused import
This commit is contained in:
Ink 2025-01-21 18:34:23 -06:00 committed by GitHub
parent 6ffeb66b53
commit 392117bff2
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
4 changed files with 26 additions and 12 deletions

View file

@ -162,9 +162,6 @@ def create_causal_mask(
return "causal"
if BLT_SUPPRESS_ATTN_ERROR == 1:
logging.warning(
"SDPA attention being used, which doesn't have specialized attention implementations for block_causal and local_block_causal attention. Allowing model to run since BLT_SUPPRESS_ATTN_ERROR=1"
)
return "causal"
else:
raise ValueError(