koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-23 04:19:08 +00:00

Georgi Gerganov d14ce3dab4 llama : MTP clean-up (#23269)	2026-05-19 15:32:58 +03:00

* llama : disable equal splits for recurrent memory with partial rollback
* spec : re-enable p-min with MTP drafts
* spec : re-enable ngram spec in combination with RS rollback
* spec : fix ngram-map-* params
* spec : fix acceptance logic in combined ngram + draft configs
* graph : fix reuse for combined `token` + `embd` batches
* spec : log parameters for each speculative implementation
  - add LOG_INF in each constructor with implementation type and parameters
  - extract device string logic into common_speculative_get_devices_str()
  - move 'adding speculative implementation' log from init into constructors

  Assisted-by: llama.cpp:local pi
* spec : extend --spec-default with ngram-map-k4v

  Assisted-by: llama.cpp:local pi
* minor : fix n_embd log
* args : update draft.n_max == 3 + regen docs
* spec : relax ngram-mod rejection thold to 0.25 @ 5 low
* logs : improve
* docs : update speculative decoding CLI argument documentation
  - Add missing draft model CPU scheduling and tensor override parameters
  - Update --spec-type to include all available types (excluding draft-eagle3 WIP)
  - Fix default values to match implementation (n_max=3, n_min=0, p_min=0.0)
  - Remove deprecated options (spec-draft-ctx-size, spec-draft-replace)
  - Add environment variables for new parameters

  Assisted-by: llama.cpp:local pi
* arg : step-back on adding k4v to the default spec config
* cont : fix name
android	android: fix missing screenshots for Android.md (#18156)	2025-12-19 09:32:04 +02:00
backend	[SCYL] add chapter for performance reference in SYCL.md (#23315)	2026-05-19 09:44:51 +03:00
development	docs: more extensive RoPE documentation [no ci] (#21953)	2026-04-15 14:45:16 +02:00
multimodal	mtmd : support MiniCPM-V 4.6 (#22529)	2026-05-06 21:54:09 +02:00
ops	ggml-webgpu: Enables running gpt-oss-20b (#22906)	2026-05-12 07:27:40 -07:00
android.md	android: fix missing screenshots for Android.md (#18156)	2025-12-19 09:32:04 +02:00
autoparser.md	common/parser: add proper reasoning tag prefill reading (#20424)	2026-03-19 16:58:21 +01:00
build-riscv64-spacemit.md	ggml-cpu: Add IME2 Instruction Support for the SpacemiT Backend (#22863)	2026-05-14 17:39:30 +08:00
build-s390x.md	docs: update s390x build docs (#19643)	2026-02-16 00:33:34 +08:00
build.md	CUDA: require explicit opt-in for P2P access (#21910)	2026-04-15 16:01:46 +02:00
docker.md	CI : Enable CUDA and Vulkan ARM64 runners and fix CI/CD (#21122)	2026-03-30 20:24:37 +02:00
function-calling.md	common : implement new jinja template engine (#18462)	2026-01-16 11:22:06 +01:00
install.md	docs : add "Quick start" section for new users (#13862)	2025-06-03 13:09:36 +02:00
llguidance.md
multi-gpu.md	Write a readme on Multi-GPU usage in llama.cpp (#22729)	2026-05-07 17:48:40 +02:00
multimodal.md	docs: listing qwen3-asr and qwen3-omni as supported (#21857)	2026-04-13 22:28:17 +02:00
ops.md	ggml-webgpu: Enables running gpt-oss-20b (#22906)	2026-05-12 07:27:40 -07:00
preset.md	preset: allow named remote preset (#18728)	2026-01-10 15:12:29 +01:00
speculative.md	llama : MTP clean-up (#23269)	2026-05-19 15:32:58 +03:00