koboldcpp/src/models
Daniel Bevenius baf3cc6e1d
model : clarify MTP layer comment in qwen35.cpp [no ci] (#23338)
This commit attempts to clarify a code comment in graph_mtp regarding
where the MTP layer is stored.

The motivation for this is that it was not obvious to me what the
original comment meant and hopefully this makes it clearer.
2026-05-19 18:41:44 +02:00
afmoe.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
apertus.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
arcee.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
arctic.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
arwkv7.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
baichuan.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
bailingmoe.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
bailingmoe2.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
bert.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
bitnet.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
bloom.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
chameleon.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
chatglm.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
codeshell.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
cogvlm.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
cohere2.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
command-r.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
dbrx.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
deci.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
deepseek.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
deepseek2.cpp model : fix model type check for granite/llama3 and deepseek2/glm4.7 lite (#22870) 2026-05-10 08:44:29 +02:00
deepseek2ocr.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
delta-net-base.cpp llama : MTP clean-up (#23269) 2026-05-19 15:32:58 +03:00
dots1.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
dream.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
ernie4-5-moe.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
ernie4-5.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
eurobert.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
exaone-moe.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
exaone.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
exaone4.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
falcon-h1.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
falcon.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
gemma-embedding.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
gemma.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
gemma2.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
gemma3.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
gemma3n.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
gemma4.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
glm-dsa.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
glm4-moe.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
glm4.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
gpt2.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
gptneox.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
granite-hybrid.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
granite-moe.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
granite.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
grok.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
grovemoe.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
hunyuan-dense.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
hunyuan-moe.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
hunyuan-vl.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
internlm2.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
jais.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
jais2.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
jamba.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
jina-bert-v2.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
jina-bert-v3.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
kimi-linear.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
lfm2.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
lfm2moe.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
llada-moe.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
llada.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
llama-embed.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
llama.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
llama4.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
maincoder.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
mamba-base.cpp model : wire up Nemotron-H tensors for NVFP4 support (#20561) 2026-03-16 09:19:16 +01:00
mamba.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
mamba2.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
mimo2.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
minicpm.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
minicpm3.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
minimax-m2.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
mistral3.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
mistral4.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
models.h llama : MTP clean-up (#23269) 2026-05-19 15:32:58 +03:00
modern-bert.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
mpt.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
nemotron-h-moe.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
nemotron-h.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
nemotron.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
neo-bert.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
nomic-bert-moe.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
nomic-bert.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
olmo.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
olmo2.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
olmoe.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
openai-moe.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
openelm.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
orion.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
paddleocr.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
pangu-embed.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
phi2.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
phi3.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
phimoe.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
plamo.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
plamo2.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
plamo3.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
plm.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
qwen.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
qwen2.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
qwen2moe.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
qwen2vl.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
qwen3.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
qwen3moe.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
qwen3next.cpp llama + spec: MTP Support (#22673) 2026-05-16 20:06:23 +08:00
qwen3vl.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
qwen3vlmoe.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
qwen35.cpp model : clarify MTP layer comment in qwen35.cpp [no ci] (#23338) 2026-05-19 18:41:44 +02:00
qwen35moe.cpp llama: avoid copying logits during prompt decode in MTP (#23198) 2026-05-17 23:30:25 +08:00
refact.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
rnd1.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
rwkv6-base.cpp models : deduplicate delta-net graphs for Qwen family (#19597) 2026-02-16 14:35:04 +02:00
rwkv6.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
rwkv6qwen2.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
rwkv7-base.cpp models : deduplicate delta-net graphs for Qwen family (#19597) 2026-02-16 14:35:04 +02:00
rwkv7.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
seed-oss.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
smallthinker.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
smollm3.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
stablelm.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
starcoder.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
starcoder2.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
step35.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
t5.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
t5encoder.cpp model: move load_hparams and load_tensors to per-model definition (#22004) 2026-05-04 12:36:59 +02:00
wavtokenizer-dec.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00
xverse.cpp model : NvFP4 quantized LM head support (#23046) 2026-05-16 11:09:27 +02:00