API: add /props route (#1222)

* API: add an /extra/chat_template route

A lot of manual tweaking is needed when swapping between models. We can automate some of it, or at least make better assumptions, if more information is available, such as the model's chat template. This PR adds an endpoint /extra/chat_template which returns the model's chat template string as-is under a 'chat_template' key. The front end can then use it to derive the proper prompt template, use it directly, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model (see the sketch after this change list).

* switch to pre-established /props endpoint for chat template

* bug-fix (upstream): off-by-one in string juggling. The metadata size query reports the string length excluding the terminating NUL, so the read buffer must be one byte larger and the trailing NUL trimmed when constructing the returned std::string.
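
As a rough illustration of the mismatch warning mentioned above, a front end could match the returned template string against well-known marker tokens. This is a hypothetical sketch, not code from this PR; guess_preset and its marker list are assumptions:

    #include <string>

    // Hypothetical client-side helper (not part of this PR): guess which
    // prompt preset a chat template corresponds to by searching for
    // well-known marker tokens in the template string.
    static std::string guess_preset(const std::string & chat_template)
    {
        if (chat_template.find("<|start_header_id|>") != std::string::npos) {
            return "Llama 3";  // Llama 3.x role header markers
        }
        if (chat_template.find("<|im_start|>") != std::string::npos) {
            return "ChatML";   // ChatML-style templates
        }
        if (chat_template.find("[INST]") != std::string::npos) {
            return "Mistral";  // Mistral (and Llama 2) instruct markers
        }
        return "unknown";      // no match: fall back to warning the user
    }

Comparing this guess against the user's selected preset would be enough to surface the Mistral-preset-on-a-Llama-3.1-model case from the description.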
kallewoof 2024-11-21 11:58:32 +09:00 committed by GitHub
parent 8ab3eb89a8
commit 547ab2aebb
4 changed files with 35 additions and 0 deletions

@@ -2491,6 +2491,21 @@ bool gpttype_generate_abort()
    return true;
}

std::string gpttype_get_chat_template()
{
    // copied from examples/server/utils.hpp::llama_get_chat_template
    std::string template_key = "tokenizer.chat_template";
    // call with NULL buffer to get the total size of the string
    int32_t res = llama_model_meta_val_str(&llama_ctx_v4->model, template_key.c_str(), NULL, 0);
    if (res < 0) {
        return "";
    }
    // res excludes the terminating NUL, so reserve one extra byte for it
    std::vector<char> model_template(res + 1, 0);
    llama_model_meta_val_str(&llama_ctx_v4->model, template_key.c_str(), model_template.data(), model_template.size());
    // trim the trailing NUL when constructing the result
    return std::string(model_template.data(), model_template.size() - 1);
}

std::vector<int> gpttype_get_token_arr(const std::string & input, bool addbos)
{
    std::vector<int> toks;
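
For context, here is a minimal sketch of how the remaining changed files might surface this string across koboldcpp's C ABI to the Python layer, which loads the compiled library via ctypes. The wrapper name and its exact placement are assumptions, not taken from this diff:

    // Hypothetical C-ABI wrapper (name assumed, not from the diff): the
    // template must cross the FFI boundary as a plain C string.
    extern "C" const char * get_chat_template()
    {
        // static storage keeps the pointer valid after the call returns,
        // since the Python side only receives a raw char pointer
        static std::string chat_template;
        chat_template = gpttype_get_chat_template();
        return chat_template.c_str();
    }

The /props handler on the Python side could then decode this value and include it under the 'chat_template' key in its JSON response.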