koboldcpp/tools
Aman Gupta 255582687b
llama + spec: MTP Support (#22673)
* spec: support MTP

* fix batch size

* rename files

* cont : simplify (#7)

* MTP: clean-up (#9)

* MTP: clean-up

* review: use llama_context_type instead of llama_graph_type

* review: remove llama_model_has_mtp

* review: fix convert issues

* convert: fix pycheck

* review: formatting

* use `mtp-` for identifying mtp models

* convert: fix mtp conversion

* mtp -> draft-mtp

* remove unused llama_arch

* add need_embd in speculative

* llama: allow partial seq_rm for GDN models for speculative decoding

Currently speculative checkpoint needs to restart from a checkpoint
after some draft tokens are not accepted, this leads to some wastage in
running the target again. This PR adds the ability to rollback upto
`draft_max` by storing the GDN intermediates.

* fix pending state

* vulkan: add GDN partial rollback

* meta: extend check to axis 1

* metal: add GDN partial rollback

Extend the gated delta net kernel to store intermediate states for
partial rollback support on the Metal backend.

- Add K (snapshot slot count) as a function constant
- Read input state from slot 0 of the 3D state tensor
- Write intermediate states to different slots during token loop
- For K=1, maintain backward-compatible single-slot behavior

Ref: 8c05923630

Assisted-by: llama.cpp:local pi

* delta_net_base: use ggml_pad instead of new_tensor

* review: add need_rs_seq

* review: rename part_bounded to n_rs

* review: deslop comments

* review: rename, add asserts

* server : adjust checkpoint logic (#11)

* server : adjust checkpoint logic

* cont : rm asserts

* server-context: fix early exit

* spec : fix compatibility with n-gram and add TODOs (#13)

* metal : cleanup

* llama : fix faulty bitwise check in recurrent memory

* server : disable RS-based MTP in combination with other spec types

* spec : add TODOs

* cont : fix comment

* cont : update comment

* common : fix logic for ngram + mtp compat

* llama-memory: enable checkpointing with partial rollback

* cont: add test-case for loading into a dirty ctx

* llama-memory-recurrent: clear rs_idx in clear

* download: fix mtp path

* llama-arch: fix enorm op

* docs: update docs

* conversion: fix type annotations

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-05-16 20:06:23 +08:00
..
batched-bench libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
cli llama + spec: MTP Support (#22673) 2026-05-16 20:06:23 +08:00
completion llama + spec: MTP Support (#22673) 2026-05-16 20:06:23 +08:00
cvector-generator libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
export-lora libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
fit-params fit-params : refactor + add option to output estimated memory per device (#22171) 2026-04-21 09:54:36 +03:00
gguf-split libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
imatrix libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
llama-bench spec : refactor params (#22397) 2026-04-28 09:07:33 +03:00
mtmd mtmd: add chunks and fix preproc for qwen3a (#23073) 2026-05-15 19:32:47 +02:00
parser libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
perplexity fit-params : refactor + add option to output estimated memory per device (#22171) 2026-04-21 09:54:36 +03:00
quantize libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
results libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
rpc fix: rpc-server cache may not work in Windows environments (#22394) 2026-04-27 17:25:09 +03:00
server llama + spec: MTP Support (#22673) 2026-05-16 20:06:23 +08:00
tokenize libs : rename libcommon -> libllama-common (#21936) 2026-04-17 11:11:46 +03:00
tts logs : reduce (#23021) 2026-05-14 13:05:52 +03:00
ui ui: Fix handling of MCP resource template parameters (#23117) 2026-05-16 13:25:41 +02:00
CMakeLists.txt ui: Restructure repo to use tools/ui folder and ui / UI / llama-ui / LLAMA_UI naming (#23064) 2026-05-16 02:02:40 +02:00