Add memoized cache to llama_grammar_reject_candidates_for_stack (#1615)

* Add memoized cache to llama_grammar_reject_candidates_for_stack

* make size cutoff more aggressive and move to outer branch

* update comment

* add cache reset whenever grammar is reloaded

* remove explicit reference types for portability across compilers
Author: Reithan · 2025-06-25 04:22:19 -07:00 (committed by GitHub)
Parent: b884a7f058
Commit: 54dde5e565
2 changed files with 60 additions and 0 deletions

@@ -1773,6 +1773,7 @@ static void load_grammar(const std::string & gammarstr)
 {
     if(grammar!=nullptr) //on demand free when next grammar is loaded
     {
+        llama_grammar_reset_memos();
         llama_grammar_free_impl(grammar);
         grammar = nullptr;
     }
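
For context, the hunk above only shows the reset call site added to load_grammar; the memo table itself lives in the grammar sampling code, which is not shown here. Below is a minimal, self-contained C++ sketch of the general idea under stated assumptions: the key type, the cutoff value, and the helper names (reject_candidates_for_stack_impl, reset_reject_memos, k_memo_size_cutoff) are placeholders for illustration, not the actual koboldcpp/llama.cpp implementation.

// Sketch only: memoize the rejected-candidate list per grammar stack,
// skip the cache for inputs past a size cutoff, and expose a reset hook
// that must be called whenever a new grammar is loaded.
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Placeholder stand-ins for the real llama grammar types (assumption).
using grammar_element = uint32_t;
using grammar_stack   = std::vector<grammar_element>;
using candidates      = std::vector<uint32_t>;

// Memo table keyed by the (stack, candidates) pair; the key choice is an
// assumption made for this sketch.
static std::map<std::pair<grammar_stack, candidates>, candidates> g_reject_memo;

// Hypothetical size cutoff: large inputs bypass the cache entirely so the
// memo table stays small (the commit moves this check to the outer branch).
static const size_t k_memo_size_cutoff = 32;

// The underlying expensive computation; the real grammar matching is elided.
static candidates reject_candidates_for_stack_impl(const grammar_stack & stack,
                                                   const candidates    & cands) {
    candidates rejected;
    (void) stack;
    (void) cands;
    return rejected;
}

// Memoized wrapper, analogous in spirit to the patched
// llama_grammar_reject_candidates_for_stack.
static candidates reject_candidates_for_stack(const grammar_stack & stack,
                                              const candidates    & cands) {
    if (stack.size() > k_memo_size_cutoff || cands.size() > k_memo_size_cutoff) {
        // Too large to be worth caching; compute directly.
        return reject_candidates_for_stack_impl(stack, cands);
    }
    auto key = std::make_pair(stack, cands);
    auto it  = g_reject_memo.find(key);
    if (it != g_reject_memo.end()) {
        return it->second; // cache hit
    }
    candidates rejected = reject_candidates_for_stack_impl(stack, cands);
    g_reject_memo.emplace(std::move(key), rejected);
    return rejected;
}

// Analogue of llama_grammar_reset_memos(): clear the cache when the grammar
// is reloaded, since stale entries would describe the previous grammar.
static void reset_reject_memos() {
    g_reject_memo.clear();
}

The key point the diff enforces is the last function: cached results are only valid for the grammar they were computed against, so the reset must run before the old grammar object is freed and replaced.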