Commit graph

11022 commits

Author SHA1 Message Date
Concedo
82d562ad7b unstable merge 2025-12-28 23:03:03 +08:00
Concedo
9082403a43 disable vk events until directio pr or jeff's fix is added. (+1 squashed commits)
Squashed commits:

[4796db21a] disable vk events until directio pr or jeff's fix is added.
2025-12-28 21:54:25 +08:00
Concedo
a94d5ffbec Revert "Triage: revert https://github.com/ggml-org/llama.cpp/pull/18047 and https://github.com/ggml-org/llama.cpp/pull/18302"
This reverts commit dfa1b72d2f.
2025-12-28 21:48:55 +08:00
Concedo
4c1daf886a updated lite 2025-12-28 21:43:18 +08:00
Concedo
07fb18a04b handle case differences 2025-12-28 21:41:56 +08:00
Concedo
46891b3c0a updated lite 2025-12-28 18:07:13 +08:00
Concedo
21d801f6d5 init total weight for adaptive p 2025-12-28 15:33:06 +08:00
Concedo
ec95655f3c fixed default handling for special keys 2025-12-28 13:56:05 +08:00
Concedo
27261bfc26 adaptive decay as an overridable param (+1 squashed commits)
Squashed commits:

[d94df7843] adaptive decay as an overridable param
2025-12-28 13:34:20 +08:00
Concedo
1051313cb2 added deprecated item sdgendefaults (+1 squashed commits)
Squashed commits:

[efc14a5d9] fixed sd error
2025-12-27 22:47:43 +08:00
Concedo
f5282e114d allow ANY api field to have specified defaults, and to be overwritten by value specified at load time 2025-12-27 18:57:04 +08:00
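
One plausible precedence for the feature above, as a minimal C++ sketch (field names and the ordering are assumptions, not confirmed by the commit): a value in the request wins, otherwise a default specified at load time, otherwise the built-in default.

```cpp
#include <cstdio>
#include <map>
#include <string>

int main() {
    std::map<std::string, std::string> builtin   = {{"temperature", "0.7"}};
    std::map<std::string, std::string> load_time = {{"temperature", "1.0"}};  // set at load
    std::map<std::string, std::string> request;                               // field omitted

    auto resolve = [&](const std::string & field) -> std::string {
        if (request.count(field))   return request.at(field);    // explicit value wins
        if (load_time.count(field)) return load_time.at(field);  // load-time default
        return builtin.at(field);                                 // built-in fallback
    };
    printf("temperature = %s\n", resolve("temperature").c_str());  // prints: 1.0
    return 0;
}
```
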
Concedo
6548645aaa rename power law sampler to adaptive p 2025-12-27 17:50:58 +08:00
Johannes Gäßler
9045c9afe5
llama-fit-params: fix Gemma 3 calculation (#18372) 2025-12-27 09:56:04 +01:00
Concedo
445aad5e00 remove sdcpp qwen image lora hack 2025-12-27 16:31:29 +08:00
Wagner Bruna
84765f5967
sd: sync to master-447-ccb6b0a (#1898)
* sd: sync to master-438-298b110

* sd: sync to master-440-3e81246

* sd: sync to master-444-a0adcfb

* sd: sync to master-447-ccb6b0a
2025-12-27 16:30:52 +08:00
Concedo
9bb362cce9 revised power law sampling 2025-12-27 10:59:46 +08:00
Concedo
91d8863f18 power law sampler added 2025-12-27 09:46:06 +08:00
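
The message names the technique without details; purely as a generic illustration of power-law reshaping (whether this matches the sampler added here is an assumption), token probabilities are raised to an exponent and renormalized against a running total weight:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    std::vector<float> p = {0.5f, 0.3f, 0.2f};  // token probabilities
    const float alpha = 2.0f;                   // hypothetical shaping exponent

    float total = 0.0f;
    for (float & v : p) { v = std::pow(v, alpha); total += v; }  // reshape, track total
    for (float & v : p) v /= total;                              // renormalize

    for (float v : p) printf("%.3f ", v);  // prints: 0.658 0.237 0.105
    printf("\n");
    return 0;
}
```
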
Jeff Bolz
c9ced4910b
vulkan: preprocess mul_mat_id experts and discard workgroups more quickly (#18352)
Run a preprocess to count how many times each expert is used, and use this to
quickly discard workgroups that aren't needed.
2025-12-26 16:12:58 -06:00
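
As context for the commit above, a minimal CPU sketch of the counting idea, assuming a routing array of expert ids per token (names and layout are illustrative, not the actual Vulkan shader):

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int n_expert = 8;
    // expert ids chosen by the router, one per (token, top-k slot) -- assumed input
    const std::vector<int> expert_ids = {2, 5, 2, 7, 5, 2};

    // preprocess pass: count how many rows each expert must process
    std::vector<int> counts(n_expert, 0);
    for (int id : expert_ids) counts[id]++;

    // main pass: experts with zero rows are discarded up front
    for (int e = 0; e < n_expert; e++) {
        if (counts[e] == 0) continue;  // the quickly-discarded "workgroup"
        printf("expert %d: %d rows\n", e, counts[e]);
    }
    return 0;
}
```
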
Jeff Bolz
7ac8902133
vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader (#18349)
* vulkan: Use BK=32 for coopmat2 mul_mat_id

* vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader

Disable robustness, remove the OOB check in decodeFuncB, and initialize the
row_ids to zero to avoid OOB access.

Don't slice/offset the B matrix to ic * BN, only to adjust the coord back down
to the range [0, BN) in decodeFuncB. Instead just slice with a row offset of
zero and remove the '& (BN - 1)'. This allows the compiler to common some of
the shared memory loads.
2025-12-26 18:15:50 +01:00
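
The index identity behind the last paragraph can be checked on the CPU; a small sketch, assuming BN is a power of two as the '& (BN - 1)' mask requires:

```cpp
#include <cassert>

int main() {
    const int BN = 64;  // tile height; must be a power of two for the mask to work
    for (int ic = 0; ic < 4; ic++) {
        for (int j = 0; j < BN; j++) {
            // old scheme: slice B at row ic * BN, then mask the coord back into [0, BN)
            const int masked = (ic * BN + j) & (BN - 1);
            // new scheme: slice at row offset zero, no mask needed -- same coordinate
            assert(masked == j);
        }
    }
    return 0;
}
```
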
Jeff Bolz
9bf20d8ac3
vulkan: Use BK=32 for coopmat2 mul_mat_id (#18332) 2025-12-26 18:15:02 +01:00
Eve
cb999704fb
vulkan: small dequantization improvements (#18380)
* iq4_xs

* quants
2025-12-26 18:12:11 +01:00
Jeff Bolz
b96b82fc85
vulkan: Support UPSCALE w/antialias (#18327) 2025-12-26 17:00:57 +01:00
Jeff Bolz
10dc500bdb
vulkan: handle rope with large number of rows (#18306) 2025-12-26 16:53:46 +01:00
o7si
4893cc07bb
server : fix crash when seq_rm fails for hybrid/recurrent models (#18391)
* server : fix crash when seq_rm fails for hybrid/recurrent models

* server : add allow_processing param to clear_slot
2025-12-26 16:35:29 +01:00
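
A hedged sketch of the fallback pattern the fix implies, against llama.cpp's memory API from around this period (treat the exact signatures as assumptions): partial removal can fail for hybrid/recurrent models, and the slot must then be cleared rather than crashing.

```cpp
#include "llama.h"

// Fragment, not a full program. Returns true if the prompt prefix survived;
// false if the caller must reprocess from position 0 because the whole
// sequence was dropped.
static bool trim_sequence(llama_memory_t mem, llama_seq_id seq, llama_pos keep_until) {
    if (llama_memory_seq_rm(mem, seq, keep_until, -1)) {
        return true;  // partial removal worked (attention-style cache)
    }
    // hybrid/recurrent state cannot be rewound mid-sequence: clear it all
    llama_memory_seq_rm(mem, seq, -1, -1);
    return false;
}
```
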
Francisco Herrera
af3be131c0
docs: added note for pre SYCL Intel hardware (#18016)
Specify that it's for pre-SYCL hardware
2025-12-26 10:34:30 +08:00
0Marble
b07cda687c
CANN: implement the SSM_CONV operator (#17737)
* CANN: implement SSM_CONV operator

Co-authored-by: Aleksei Lobanov, <zeromarblectm@gmail.com>
Co-authored-by: Sujin Kang, <waterjin326@gmail.com>

* CANN: remove custom error limit for SSM_CONV

* CANN: merge SSM_CONV tensor shape/strides into one line

---------

Co-authored-by: Sujin Kang, <waterjin326@gmail.com>
2025-12-26 09:12:04 +08:00
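
For reference, SSM_CONV in ggml denotes a short causal, per-channel convolution (the form used by Mamba-style models); a CPU sketch for one channel, with the tap ordering an assumption and the CANN layout details omitted:

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int d_conv = 3;
    const std::vector<float> x = {1, 2, 3, 4};        // one channel's sequence
    const std::vector<float> w = {0.5f, 0.3f, 0.2f};  // that channel's taps

    for (size_t t = 0; t < x.size(); t++) {
        float y = 0.0f;
        for (int j = 0; j < d_conv; j++) {  // only current and past inputs: causal
            const int idx = (int) t - j;
            if (idx >= 0) y += w[j] * x[idx];
        }
        printf("%g ", y);  // prints: 0.5 1.3 2.3 3.3
    }
    printf("\n");
    return 0;
}
```
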
Aman Gupta
85c40c9b02
ggml-cuda: fix regex for arch list (#18371)
* ggml-cuda: fix regex for arch list

* make regex exact
2025-12-26 01:35:14 +08:00
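
The real fix lives in ggml-cuda's CMake, but the failure mode of a non-exact pattern is easy to demonstrate; an illustrative C++ sketch with made-up patterns:

```cpp
#include <cassert>
#include <regex>
#include <string>

int main() {
    const std::string arch = "120f";
    // loose: a substring search on "12." also fires on "120f"
    assert(std::regex_search(arch, std::regex("12.")));
    // exact: requiring the whole token to match "12\d" rejects "120f"
    assert(!std::regex_match(arch, std::regex("12\\d")));
    return 0;
}
```
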
Concedo
dfa1b72d2f Triage: revert https://github.com/ggml-org/llama.cpp/pull/18047 and https://github.com/ggml-org/llama.cpp/pull/18302
Revert "vulkan: Implement set_tensor_async and the event interfaces (#18047)"

This reverts commit e1f15b454f. (+1 squashed commits)

Squashed commits:

[3cfbc7b1a] Revert "vulkan: fix command buffer corruption in ggml_backend_vk_event_wait (#18302)"

This reverts commit 2a9ea2020c.
2025-12-26 01:20:31 +08:00
Concedo
399fc9c57e rename tokens tab to context, move fa to hardware 2025-12-26 00:06:07 +08:00
Aman Gupta
83b3b1c271
cuda: optimize cumsum cub path (#18362)
* cuda: optimize cumsum cub path

* remove heavy perf test
2025-12-25 23:55:38 +08:00
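
The commit routes cumsum through CUB's device-wide scan; as a plain-C++ reference for what that path computes (not the CUDA implementation itself):

```cpp
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    const std::vector<float> x = {1, 2, 3, 4};
    std::vector<float> y(x.size());
    // inclusive scan: y[i] = x[0] + ... + x[i]
    std::inclusive_scan(x.begin(), x.end(), y.begin());
    for (float v : y) printf("%g ", v);  // prints: 1 3 6 10
    printf("\n");
    return 0;
}
```
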
Concedo
062f8b28eb fixed sdui gen queue 2025-12-25 23:21:33 +08:00
Aman Gupta
b0fb0f0aee
ggml-cuda: fix blackwell native builds (#18361)
* ggml-cuda: fix blackwell native builds

Replace 12x in native architectures by 12xa

* replace for GGML_NATIVE=OFF too

* only replace for native

* remove 120f-virtual for default compilation

---------

Co-authored-by: Aman Gupta <aman>
2025-12-25 22:12:11 +08:00
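
A toy version of the substitution described above (the real edit is in CMake; the pattern here is an illustrative assumption): append the architecture-specific 'a' suffix to 12x compute capabilities.

```cpp
#include <cstdio>
#include <regex>
#include <string>

int main() {
    std::string archs = "86;89;120";
    // rewrite any 12x entry to 12xa so Blackwell gets the arch-specific target
    archs = std::regex_replace(archs, std::regex("(12[0-9])"), "$1a");
    printf("%s\n", archs.c_str());  // prints: 86;89;120a
    return 0;
}
```
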
Concedo
cf4201e213 wip power law sampling 2025-12-25 22:01:16 +08:00
Penglin Cai
e68c19b0fd
CANN: Add support for CONV_TRANSPOSE_1D when kernel size > 255 (#17934)
* CONV_TRANSPOSE_1D kernel_size>255

* remove condition check

* fix the bug of type conversion

* removing trailing whitespaces

* fix: return true in the switch case
2025-12-25 16:46:09 +08:00
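
To pin down the operation involved: a minimal single-channel, stride-1 reference for 1-D transposed convolution (the CANN kernel handles the general case, including kernels longer than 255 taps).

```cpp
#include <cstdio>
#include <vector>

int main() {
    const std::vector<float> x = {1, 2, 3};  // input
    const std::vector<float> k = {1, 1, 1};  // kernel (could be > 255 taps)
    std::vector<float> y(x.size() + k.size() - 1, 0.0f);

    for (size_t i = 0; i < x.size(); i++)
        for (size_t j = 0; j < k.size(); j++)
            y[i + j] += x[i] * k[j];  // scatter-add: the transpose of convolution

    for (float v : y) printf("%g ", v);  // prints: 1 3 6 5 3
    printf("\n");
    return 0;
}
```
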
Aadeshveer Singh
c54bba869d
ggml : optimize cuda cumsum fallback kernel (#18343) 2025-12-25 12:11:13 +08:00
Xuan-Son Nguyen
f5acfb2ffa
server: (router) add stop-timeout option (#18350)
* server: (router) add stop-timeout option

* also allow stop while loading

* add docs

* unload_lru: also wait for unload to complete
2025-12-24 23:47:49 +01:00
Xuan-Son Nguyen
4cbafad4f0
model: support MiMo-V2-Flash (#18328)
* mimov2: convert ok

* rename mimov2 --> mimo2

* fix conversion

* runnable but incorrect

* use sink

* add_sliding_window_pattern

* add swa and per-layer n_head_kv

* correct params

* somewhat working

* correct gating func

* nits

* mimo2: wire RMS eps + MoE bias + converter guards

* add co-author

Co-authored-by: Aaryan-Kapoor <Aaryan-Kapoor@users.noreply.github.com>

* use add_rope_freq_base_swa

---------

Co-authored-by: Aaryan Kapoor <aaryankapoor2006@gmail.com>
Co-authored-by: Aaryan-Kapoor <Aaryan-Kapoor@users.noreply.github.com>
2025-12-24 23:07:08 +01:00
Concedo
6cc71db85a Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	docs/backend/SYCL.md
#	examples/model-conversion/Makefile
#	examples/model-conversion/scripts/causal/run-org-model.py
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/common.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cuda/CMakeLists.txt
2025-12-25 00:06:27 +08:00
Concedo
3589a5e136 Merge commit '12ee1763a6' into concedo_experimental
# Conflicts:
#	docs/backend/hexagon/README.md
#	docs/backend/hexagon/developer.md
#	examples/gen-docs/gen-docs.cpp
#	examples/model-conversion/scripts/embedding/run-original-model.py
#	examples/model-conversion/scripts/utils/semantic_check.py
#	examples/sycl/run-llama2.sh
#	examples/sycl/run-llama3.sh
#	examples/sycl/win-run-llama2.bat
#	examples/sycl/win-run-llama3.bat
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp-utils.h
#	ggml/src/ggml-hexagon/htp/act-ops.c
#	ggml/src/ggml-hexagon/htp/htp-dma.c
#	ggml/src/ggml-hexagon/htp/htp-dma.h
#	ggml/src/ggml-hexagon/htp/hvx-utils.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cvt.cl
#	ggml/src/ggml-opencl/kernels/transpose.cl
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	scripts/snapdragon/adb/run-cli.sh
#	src/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tools/cli/README.md
#	tools/completion/README.md
#	tools/server/README.md
2025-12-24 23:57:41 +08:00
Concedo
afe41b6eea Merge branch 'concedo_experimental' of https://github.com/LostRuins/koboldcpp into concedo_experimental 2025-12-24 23:42:52 +08:00
Concedo
d1983959d2 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/release.yml
#	AGENTS.md
#	common/CMakeLists.txt
#	docs/development/parsing.md
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-vulkan/ggml-vulkan.cpp
#	tests/test-arg-parser.cpp
#	tests/test-backend-ops.cpp
#	tests/test-grammar-llguidance.cpp
#	tests/test-tokenizer-0.cpp
#	tests/test-tokenizer-1-bpe.cpp
#	tests/test-tokenizer-1-spm.cpp
#	tools/batched-bench/batched-bench.cpp
#	tools/cli/cli.cpp
#	tools/llama-bench/llama-bench.cpp
#	tools/server/README.md
2025-12-24 23:42:28 +08:00
Aadeshveer Singh
c184284230
fit-params : fix race condition in fit-params output (#18276)
2025-12-24 15:57:38 +01:00
Aman Gupta
c8a2417d7b
CUDA: experimental native mxfp4 support for blackwell (#17906)
* CUDA: experimental native mxfp4 support for blackwell

* optimize load_tiles

* optimize quantize_mxfp4

* cleanup

* first pass review: formatting

* use interleaved layout for mma

* mmq: add assert for size

* use __nv_fp4x4_e2m1

* use iter_k as 512, cleanup

* Use 1200 as blackwell instead of 1000

* address review comments

* mmq: fix stride

* quantize.cu: use reference impl of e8m0 scale

* address review comments

* add 120f-virtual + minor fixes

---------

Co-authored-by: Aman Gupta <aman>
2025-12-24 22:28:26 +08:00
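
One of the bullets mentions a reference implementation of the e8m0 scale; a hedged sketch of that format (exponent-only, bias 127, per the OCP MX spec; the commit's rounding choices may differ from the one assumed here):

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// encode a positive block maximum as an e8m0 scale: value = 2^(e - 127)
static uint8_t e8m0_from_float(float x) {
    int e = (int) std::ceil(std::log2(x)) + 127;  // round up so the scale covers x (assumed)
    if (e < 0)   e = 0;
    if (e > 254) e = 254;  // 0xFF encodes NaN in e8m0
    return (uint8_t) e;
}

int main() {
    printf("%u\n", e8m0_from_float(6.0f));  // 2^3 = 8 covers 6.0 -> prints 130
    return 0;
}
```
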
Wagner Bruna
f30da43b7f
sd: get the available schedulers directly from sd.cpp (#1900)
Avoids a hardcoded list on the Python side.
2025-12-24 21:55:24 +08:00
Saba Fallah
54132f1b1f
model : support for LlamaBidirectionalModel architecture (#18220)
* model: llama-embed-nemotron

* minor: python lint

* changed arch-name

* templated llm_build_llama to be used for both llama and llama-embed arch
2025-12-24 14:02:36 +01:00
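
A miniature of the sharing trick in the last bullet (all names hypothetical): one templated builder serves the causal and bidirectional variants.

```cpp
#include <cstdio>

// hypothetical stand-in for the templated llm_build_llama mentioned above
template <bool bidirectional>
static void build_llama_graph() {
    // the embedding variant drops the causal mask; everything else is shared
    printf("building %s attention graph\n", bidirectional ? "bidirectional" : "causal");
}

int main() {
    build_llama_graph<false>();  // llama
    build_llama_graph<true>();   // llama-embed
    return 0;
}
```
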
Jeff Bolz
2a9ea2020c
vulkan: fix command buffer corruption in ggml_backend_vk_event_wait (#18302) 2025-12-24 12:36:34 +01:00
Concedo
26d89bf589 support for downloading AVI from sdui 2025-12-24 18:40:10 +08:00
Wang Weixuan
ce7a6dc0fc
CANN : refactor ACL graph cache (#17752)
Move the graph property checking code into methods of LRU cache.

Signed-off-by: Wang Weixuan <wangweixvan@gmail.com>
2025-12-24 17:50:24 +08:00
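
The refactor's shape, in miniature (a generic sketch; the CANN cache's real types and property set differ): the property check becomes a method on the cache entry, and hits are bumped to the front of the LRU list.

```cpp
#include <cstdio>
#include <list>

struct CachedGraph {
    int key;  // stand-in for the graph's identifying properties
    bool matches(int k) const { return key == k; }  // the "property check" as a method
};

struct GraphLru {
    std::list<CachedGraph> entries;
    CachedGraph * find(int k) {
        for (auto it = entries.begin(); it != entries.end(); ++it) {
            if (it->matches(k)) {
                entries.splice(entries.begin(), entries, it);  // bump hit to front
                return &entries.front();
            }
        }
        return nullptr;  // caller builds and inserts a new graph
    }
};

int main() {
    GraphLru lru;
    lru.entries.push_back({42});
    printf("%s\n", lru.find(42) ? "hit" : "miss");  // prints: hit
    return 0;
}
```
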
Jesse Ikonen
1ce0126b18
docs: Fix typos in SYCL documentation (#18269) 2025-12-24 17:19:47 +08:00
Ruben Ortlam
7f459c98e7
vulkan: use fewer FA rows for small cache runs (#18280) 2025-12-24 08:59:14 +01:00