koboldcpp

server: correct accepted tokens when need draft token replay (#26320)

python-type-check.yml #426 -Commit 000547513f pushed by vrr

upstream

2026-07-31 11:43:42 +00:00

0s

fit : count nextn (MTP) blocks in n_gpu_layers so front layers stay on GPU (#26177)

python-type-check.yml #425 -Commit 0324696b8e pushed by vrr

upstream

2026-07-28 20:03:29 +00:00

0s

fit : count nextn (MTP) blocks in n_gpu_layers so front layers stay on GPU (#26177)

pre-tokenizer-hashes.yml #424 -Commit 0324696b8e pushed by vrr

upstream

2026-07-28 20:03:29 +00:00

0s

opencl: cache compiled cl_program binaries on disk (#26050)

update-ops-docs.yml #423 -Commit ed7adbfefd pushed by vrr

upstream

2026-07-25 20:03:25 +00:00

0s

opencl: cache compiled cl_program binaries on disk (#26050)

python-type-check.yml #422 -Commit ed7adbfefd pushed by vrr

upstream

2026-07-25 20:03:25 +00:00

0s

opencl: cache compiled cl_program binaries on disk (#26050)

python-check-requirements.yml #421 -Commit ed7adbfefd pushed by vrr

upstream

2026-07-25 20:03:24 +00:00

0s

opencl: cache compiled cl_program binaries on disk (#26050)

pre-tokenizer-hashes.yml #420 -Commit ed7adbfefd pushed by vrr

upstream

2026-07-25 20:03:24 +00:00

0s

opencl: load and use `kernel_gemm_moe_q6_k_f32_ns` from bin kernel lib (#25797)

python-type-check.yml #419 -Commit 86a9c79f86 pushed by vrr

upstream

2026-07-19 08:03:25 +00:00

0s

DeepseekV4: reduce graph splits (#25702)

update-ops-docs.yml #418 -Commit 33a75f41c3 pushed by vrr

upstream

2026-07-16 15:18:24 +00:00

0s

DeepseekV4: reduce graph splits (#25702)

python-type-check.yml #417 -Commit 33a75f41c3 pushed by vrr

upstream

2026-07-16 15:18:24 +00:00

0s

DeepseekV4: reduce graph splits (#25702)

python-check-requirements.yml #416 -Commit 33a75f41c3 pushed by vrr

upstream

2026-07-16 15:18:24 +00:00

0s

DeepseekV4: reduce graph splits (#25702)

pre-tokenizer-hashes.yml #415 -Commit 33a75f41c3 pushed by vrr

upstream

2026-07-16 15:18:24 +00:00

0s

tests: Harmonize header use (#25616)

python-type-check.yml #414 -Commit f4253ef965 pushed by vrr

upstream

2026-07-14 21:18:22 +00:00

0s

server: accept null sampling params (#25538)

update-ops-docs.yml #413 -Commit 4f37f51972 pushed by vrr

upstream

2026-07-12 03:18:22 +00:00

0s

server: accept null sampling params (#25538)

python-type-check.yml #412 -Commit 4f37f51972 pushed by vrr

upstream

2026-07-12 03:18:22 +00:00

0s

server-stream: follow-up on SSE Replay Buffer (#23226) (#25047)

update-ops-docs.yml #411 -Commit bbebeec4a8 pushed by vrr

upstream

2026-07-09 21:18:26 +00:00

0s

server-stream: follow-up on SSE Replay Buffer (#23226) (#25047)

python-type-check.yml #410 -Commit bbebeec4a8 pushed by vrr

upstream

2026-07-09 21:18:25 +00:00

0s

CUDA: extend K-type validation to V-types for flash attention (#24403)

python-type-check.yml #409 -Commit cb295bf596 pushed by vrr

upstream

2026-07-08 00:36:56 +00:00

0s

llama : add guard for K/V rotation input when buffer is unallocated (#25215)

python-type-check.yml #408 -Commit a4107133a6 pushed by vrr

upstream

2026-07-06 12:36:56 +00:00

0s

opencl: allow loading precompiled binary kernels from library (#23042)

python-type-check.yml #407 -Commit 4fc4ec5541 pushed by vrr

upstream

2026-07-03 18:36:56 +00:00

0s

common : dedup preset and cached model entries in /v1/models (#25131)

python-type-check.yml #406 -Commit 6f4f53f2b7 pushed by vrr

upstream

2026-06-30 23:40:29 +00:00

0s

common : dedup preset and cached model entries in /v1/models (#25131)

pre-tokenizer-hashes.yml #405 -Commit 6f4f53f2b7 pushed by vrr

upstream

2026-06-30 23:40:29 +00:00

0s

common : remove unused regex-partial (#25118)

python-type-check.yml #404 -Commit 277a105dc8 pushed by vrr

upstream

2026-06-29 20:30:44 +00:00

0s

app : allow --version, --licenses & --help (#25054)

python-type-check.yml #403 -Commit 050ee92d04 pushed by vrr

upstream

2026-06-28 05:40:31 +00:00

0s

vulkan: allow reducing the graph submission batches to avoid timeouts (#24872)

python-type-check.yml #402 -Commit 51eae8cfca pushed by vrr

upstream

2026-06-27 03:19:34 +00:00

0s

server: fix edit_file crash on append at end of file (line_start -1) (#24893)

python-type-check.yml #401 -Commit d0f9d2e5ac pushed by vrr

upstream

2026-06-23 11:28:40 +00:00

0s

server: fix edit_file crash on append at end of file (line_start -1) (#24893)

pre-tokenizer-hashes.yml #400 -Commit d0f9d2e5ac pushed by vrr

upstream

2026-06-23 11:28:40 +00:00

0s

ggml-webgpu: add adapter toggles for F16 on Vulkan + NVIDIA

python-type-check.yml #399 -Commit f449e05537 pushed by vrr

upstream

2026-06-21 11:28:40 +00:00

0s

[SYCL] rename GGML_SYCL_SUPPORT_LEVEL_ZERO (#24719)

update-ops-docs.yml #398 -Commit 9724f664e8 pushed by vrr

upstream

2026-06-19 17:28:41 +00:00

0s

[SYCL] rename GGML_SYCL_SUPPORT_LEVEL_ZERO (#24719)

python-type-check.yml #397 -Commit 9724f664e8 pushed by vrr

upstream

2026-06-19 17:28:41 +00:00

0s