Add memoized cache to llama_grammar_reject_candidates_for_stack (#1615)

* Add memoized cache to llama_grammar_reject_candidates_for_stack

* make size cutoff more aggressive and move to outer branch

* update comment

* add cache reset whenever grammar is reloaded

* remove explicit reference types for portability across compilers
Author: Reithan · 2025-06-25 04:22:19 -07:00 (committed by GitHub)
Parent: b884a7f058
Commit: 54dde5e565
2 changed files with 60 additions and 0 deletions

@@ -1773,6 +1773,7 @@ static void load_grammar(const std::string & gammarstr)
 {
     if(grammar!=nullptr) //on demand free when next grammar is loaded
     {
+        llama_grammar_reset_memos();
         llama_grammar_free_impl(grammar);
         grammar = nullptr;
     }
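
For context, the hunk above only shows the reset call site added to load_grammar; the memo table itself lives in the grammar sampling code, which is not shown here. Below is a minimal, self-contained C++ sketch of the general idea under stated assumptions: the key type, the cutoff value, and the helper names (reject_candidates_for_stack_impl, reset_reject_memos, k_memo_size_cutoff) are placeholders for illustration, not the actual koboldcpp/llama.cpp implementation.

// Sketch only: memoize the rejected-candidate list per grammar stack,
// skip the cache for inputs past a size cutoff, and expose a reset hook
// that must be called whenever a new grammar is loaded.
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Placeholder stand-ins for the real llama grammar types (assumption).
using grammar_element = uint32_t;
using grammar_stack   = std::vector<grammar_element>;
using candidates      = std::vector<uint32_t>;

// Memo table keyed by the (stack, candidates) pair; the key choice is an
// assumption made for this sketch.
static std::map<std::pair<grammar_stack, candidates>, candidates> g_reject_memo;

// Hypothetical size cutoff: large inputs bypass the cache entirely so the
// memo table stays small (the commit moves this check to the outer branch).
static const size_t k_memo_size_cutoff = 32;

// The underlying expensive computation; the real grammar matching is elided.
static candidates reject_candidates_for_stack_impl(const grammar_stack & stack,
                                                   const candidates    & cands) {
    candidates rejected;
    (void) stack;
    (void) cands;
    return rejected;
}

// Memoized wrapper, analogous in spirit to the patched
// llama_grammar_reject_candidates_for_stack.
static candidates reject_candidates_for_stack(const grammar_stack & stack,
                                              const candidates    & cands) {
    if (stack.size() > k_memo_size_cutoff || cands.size() > k_memo_size_cutoff) {
        // Too large to be worth caching; compute directly.
        return reject_candidates_for_stack_impl(stack, cands);
    }
    auto key = std::make_pair(stack, cands);
    auto it  = g_reject_memo.find(key);
    if (it != g_reject_memo.end()) {
        return it->second; // cache hit
    }
    candidates rejected = reject_candidates_for_stack_impl(stack, cands);
    g_reject_memo.emplace(std::move(key), rejected);
    return rejected;
}

// Analogue of llama_grammar_reset_memos(): clear the cache when the grammar
// is reloaded, since stale entries would describe the previous grammar.
static void reset_reject_memos() {
    g_reject_memo.clear();
}

The key point the diff enforces is the last function: cached results are only valid for the grammar they were computed against, so the reset must run before the old grammar object is freed and replaced.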