model : Qwen3 Next (#16095)

* Qwen3 Next - cleaned up version
* Whitespaces and stuff
* Correct minor errors
* Update src/llama-model.cpp
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Misc. fixes.
* Clean up code, add missing hybrid qualifier
* Did someone transpose the SOLVE_TRI result matrix? Perhaps...
* Whitespace
* Proper tensors for cb calls
* Use llama-graph.h vertical alignment
* BROKEN: chunking
* Set new tensors as inputs.
* Proper chunk logic
* It's the circle of life...
* More shenanigans for n_seq > 1
* Nail in the coffin?
* Fix Windows build
* Eh, one fails on Windows, the other fails on Mac... just use general capture.
* quant : cleanup
* model : cleanup
* qwen3 : cleanup
* cont : cleanup
* cont : cleanup
* ggml : revert change
* qwen3 : cleanup
* cont : cleanup
* Readd cmath
* qwen3 : fix typo
* Update convert_hf_to_gguf.py
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Usual suspects
* fix my bad suggestion

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-05-09 19:46:11 +00:00 · 2025-11-28 12:02:56 +01:00 · 2025-11-28 12:02:56 +01:00 · ff55414c42
commit ff55414c42
parent 73955f7d2a
16 changed files with 1345 additions and 19 deletions
--- a/src/llama-context.cpp
+++ b/src/llama-context.cpp
@@ -1,5 +1,6 @@
 #include "llama-context.h"

+#include "llama-arch.h"
 #include "llama-impl.h"
 #include "llama-batch.h"
 #include "llama-io.h"
@@ -1386,6 +1387,9 @@ void llama_context::output_reorder() {
 //

 uint32_t llama_context::graph_max_nodes() const {
+    if (model.arch == LLM_ARCH_QWEN3NEXT) {
+        return std::max<uint32_t>(8192u, 32u*model.n_tensors());
+    }
     return std::max<uint32_t>(1024u, 8u*model.n_tensors());
 }