koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-20 09:25:53 +00:00

Aman Gupta 255582687b	llama + spec: MTP Support (#22673)	2026-05-16 20:06:23 +08:00

* spec: support MTP
* fix batch size
* rename files
* cont : simplify (#7)
* MTP: clean-up (#9)
* MTP: clean-up
* review: use llama_context_type instead of llama_graph_type
* review: remove llama_model_has_mtp
* review: fix convert issues
* convert: fix pycheck
* review: formatting
* use `mtp-` for identifying mtp models
* convert: fix mtp conversion
* mtp -> draft-mtp
* remove unused llama_arch
* add need_embd in speculative
* llama: allow partial seq_rm for GDN models for speculative decoding

  Currently speculative checkpoint needs to restart from a checkpoint after some draft tokens are not accepted; this leads to some wastage in running the target again. This PR adds the ability to roll back up to `draft_max` by storing the GDN intermediates.

* fix pending state
* vulkan: add GDN partial rollback
* meta: extend check to axis 1
* metal: add GDN partial rollback

  Extend the gated delta net kernel to store intermediate states for partial rollback support on the Metal backend.

  - Add K (snapshot slot count) as a function constant
  - Read input state from slot 0 of the 3D state tensor
  - Write intermediate states to different slots during token loop
  - For K=1, maintain backward-compatible single-slot behavior

  Ref: `8c05923630`
  Assisted-by: llama.cpp:local pi

* delta_net_base: use ggml_pad instead of new_tensor
* review: add need_rs_seq
* review: rename part_bounded to n_rs
* review: deslop comments
* review: rename, add asserts
* server : adjust checkpoint logic (#11)
* server : adjust checkpoint logic
* cont : rm asserts
* server-context: fix early exit
* spec : fix compatibility with n-gram and add TODOs (#13)
* metal : cleanup
* llama : fix faulty bitwise check in recurrent memory
* server : disable RS-based MTP in combination with other spec types
* spec : add TODOs
* cont : fix comment
* cont : update comment
* common : fix logic for ngram + mtp compat
* llama-memory: enable checkpointing with partial rollback
* cont: add test-case for loading into a dirty ctx
* llama-memory-recurrent: clear rs_idx in clear
* download: fix mtp path
* llama-arch: fix enorm op
* docs: update docs
* conversion: fix type annotations

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

ggml-alloc.h	llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)	2025-12-15 09:24:59 +01:00
ggml-backend.h	CUDA: lower-case PCI bus id, standardize for ggml (#22820)	2026-05-08 10:09:38 +02:00
ggml-blas.h	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
ggml-cann.h	docs : Minor cleanups (#19252)	2026-02-02 08:38:55 +02:00
ggml-cpp.h	ggml : fix ggml_gallocr_ptr type (ggml/1205)	2025-05-01 09:58:44 +03:00
ggml-cpu.h	ggml-cpu: FA split across kv for faster TG (#19209)	2026-02-03 01:19:55 +08:00
ggml-cuda.h	ggml: backend-agnostic tensor parallelism (experimental) (#19378)	2026-04-09 16:42:19 +02:00
ggml-hexagon.h	Add experimental ggml-hexagon backend for the Hexagon NPU (#16547)	2025-10-22 13:47:09 -07:00
ggml-metal.h	metal : refactor + optimize v2 (#15995)	2025-09-17 20:38:12 +03:00
ggml-opencl.h	Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (#10693)	2024-12-13 12:23:52 -08:00
ggml-openvino.h	ggml : add OpenVINO backend (#15307)	2026-03-14 07:56:55 +02:00
ggml-opt.h	chore : correct typos [no ci] (#20041)	2026-03-05 08:50:21 +01:00
ggml-rpc.h	rpc : add native RDMA transport for RPC backend (RoCEv2) (#20590)	2026-04-15 16:44:02 +03:00
ggml-sycl.h	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
ggml-virtgpu.h	ggml-virtgpu: make the code thread safe (#19204)	2026-02-04 10:46:18 +08:00
ggml-vulkan.h	vulkan: Make Vulkan optional at runtime (#11493). (#11494)	2025-02-10 07:17:21 +01:00
ggml-webgpu.h	ggml: Add initial WebGPU backend (#14521)	2025-07-16 18:18:51 +03:00
ggml-zdnn.h	zdnn: refactor codebase + add docs (#16178)	2025-09-23 14:53:05 +08:00
ggml-zendnn.h	ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690)	2025-12-07 00:13:33 +08:00
ggml.h	llama + spec: MTP Support (#22673)	2026-05-16 20:06:23 +08:00
gguf.h	llama: fix llama-model-saver (#20503)	2026-03-25 12:53:16 +02:00