GradientAI Auto ROPE Base calculation (#910)

* GradientAI Auto ROPE Base calculation

The post at https://gradient.ai/blog/scaling-rotational-embeddings-for-long-context-language-models
gives a formula that better fits the ideal rope scaling curve.

Tested with Llama3, and verified the calculation is also correct for Llama2. Retains the existing logic that skips rope scaling when the requested context is at or below the trained context.
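
A minimal sketch of that calculation, assuming the ctx/(2*pi) form from the blog post (the function name and exact constants here are illustrative, not verbatim from this commit):

    #include <math.h>

    // GradientAI-style scaling: raise the trained rope base to the ratio of
    // log(desired_ctx / 2*pi) over log(trained_ctx / 2*pi).
    static float gradient_rope_freq_base(float trained_base,
                                         float n_ctx_train,
                                         float n_ctx_desired)
    {
        // Retain the existing behaviour: no scaling at or below trained ctx.
        if (n_ctx_desired <= n_ctx_train) {
            return trained_base;
        }
        const float two_pi = 6.28318530718f;
        float chi_train   = n_ctx_train   / two_pi;
        float chi_desired = n_ctx_desired / two_pi;
        return powf(trained_base, log10f(chi_desired) / log10f(chi_train));
    }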

* add in solar scaling logic

Solar-based models require the context values to be multiplied by 8. This is (I'm guessing) because the positions are based on a 32k context, but with a sliding window of 4k.
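
In terms of the sketch above, the Solar case just multiplies both context values by 8 before the formula is applied (the wiring and variable names here are illustrative):

    // Solar: positions appear to assume the full 32k context rather than
    // the 4k sliding window, so scale both contexts by 8 first.
    float ctx_mult  = is_solar ? 8.0f : 1.0f;
    float rope_base = gradient_rope_freq_base(trained_base,
                                              n_ctx_train   * ctx_mult,
                                              n_ctx_desired * ctx_mult);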

* Update model_adapter.h

Adds a tensor count field so that Solar models can be identified by their tensor count of 435.

* Update model_adapter.cpp

Add the n_tensor count for Solar identification.
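
A rough sketch of that check, using the GGUFArch value added in the diff below (the meta field names are assumptions, not verbatim from this commit):

    // SOLAR GGUF files carry 435 tensors, which separates them from plain
    // Llama-architecture files, so use the tensor count as a heuristic.
    if (filemeta.n_tensors == 435) {
        filemeta.model_architecture = GGUFArch::ARCH_SOLAR;
    }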

* refactor and cleanup GradientAI rope scaling

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
Authored by askmyteapot on 2024-06-13 20:12:00 +10:00, committed by GitHub
commit 1e72b65c38 (parent 49e4c3fd7b)
3 changed files with 39 additions and 22 deletions


@@ -56,6 +56,7 @@ enum GGUFArch
     ARCH_FALCON = 1,
     ARCH_PHI = 2,
     ARCH_MAMBA = 3,
+    ARCH_SOLAR = 4,
 };
 struct FileFormatExtraMeta