Commit graph

11039 commits

Author / SHA1 / Message / Date
Concedo
76ef726ec8 adaptive p sharpness to 10.0f 2025-12-31 17:28:30 +08:00
Concedo
20ea081594 updated lite (+3 squashed commits)
Squashed commit:

[605fef9ca] updated lite

[dad606fad] updated sdui

[22246d7eb] updated lite
2025-12-30 22:38:56 +08:00
Concedo
329c0e7e32 mini qol to prevent fake tool calls 2025-12-29 17:54:27 +08:00
Concedo
0e26e4d354 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/ISSUE_TEMPLATE/010-bug-compilation.yml
#	.github/ISSUE_TEMPLATE/011-bug-results.yml
#	.github/ISSUE_TEMPLATE/019-bug-misc.yml
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cuda/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-rpc/ggml-rpc.cpp
2025-12-28 23:47:55 +08:00
Concedo
58d8635827 fixed autofit 2025-12-28 23:15:06 +08:00
Concedo
82d562ad7b unstable merge 2025-12-28 23:03:03 +08:00
Concedo
9082403a43 disable vk events until directio PR or Jeff's fix is added. (+1 squashed commits)
Squashed commits:

[4796db21a] disable vk events until directio PR or Jeff's fix is added.
2025-12-28 21:54:25 +08:00
Concedo
a94d5ffbec Revert "Triage: revert https://github.com/ggml-org/llama.cpp/pull/18047 and https://github.com/ggml-org/llama.cpp/pull/18302"
This reverts commit dfa1b72d2f.
2025-12-28 21:48:55 +08:00
Concedo
4c1daf886a updated lite 2025-12-28 21:43:18 +08:00
Concedo
07fb18a04b handle case differences 2025-12-28 21:41:56 +08:00
Aman Gupta
07a0c4ba92 Revert "ggml-cuda: use CMAKE_CUDA_ARCHITECTURES if set when GGML_NATIVE=ON (#18413)" (#18426) 2025-12-28 20:53:36 +08:00
o7si
60f17f56da rpc: fix segfault on invalid endpoint format (#18387)
* rpc: fix segfault on invalid endpoint format

* rpc: add error log for failed endpoint connection
2025-12-28 12:34:41 +02:00
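The fix above validates the endpoint string before any connection attempt. As a rough illustration only (not the actual ggml-rpc code), a defensive host:port parser with the error log from the second bullet might look like:

    // hypothetical sketch; the real fix lives in ggml/src/ggml-rpc/ggml-rpc.cpp
    #include <cstdio>
    #include <cstdlib>
    #include <string>

    // split "host:port", rejecting malformed input instead of indexing past
    // the end of the string (the kind of bug that produces a segfault)
    static bool parse_endpoint(const std::string & endpoint, std::string & host, int & port) {
        const size_t pos = endpoint.rfind(':');
        if (pos == std::string::npos || pos == 0 || pos + 1 >= endpoint.size()) {
            fprintf(stderr, "invalid endpoint format: '%s' (expected host:port)\n", endpoint.c_str());
            return false;
        }
        host = endpoint.substr(0, pos);
        char * end = nullptr;
        const long p = strtol(endpoint.c_str() + pos + 1, &end, 10);
        if (*end != '\0' || p < 1 || p > 65535) {
            fprintf(stderr, "invalid port in endpoint: '%s'\n", endpoint.c_str());
            return false;
        }
        port = (int) p;
        return true;
    }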
Concedo
46891b3c0a updated lite 2025-12-28 18:07:13 +08:00
Johannes Gäßler
f8d561eb87 llama-fit-params: fix step size for last device (#18415) 2025-12-28 10:52:09 +01:00
Johannes Gäßler
e59efe6a78 github: update issue templates [no ci] (#18410)
* github: update issue templates [no ci]

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-28 10:50:56 +01:00
Xuan-Son Nguyen
cffa5c46ea mtmd: clarify that we no longer accept AI-generated PRs (#18406) 2025-12-28 09:57:04 +01:00
Boian Berberov
94de74e7b1 cmake: Added more x86_64 CPU backends when building with GGML_CPU_ALL_VARIANTS=On (#18186)
* minor: Consolidated `#include <immintrin.h>` under `ggml-cpu-impl.h`

* cmake: Added more x86-64 CPU backends when building with `GGML_CPU_ALL_VARIANTS=On`

- `ivybridge`
- `piledriver`
- `cannonlake`
- `cascadelake`
- `cooperlake`
- `zen4`

Resolves: #17966
2025-12-28 09:33:29 +02:00
Concedo
21d801f6d5 init total weight for adaptive p 2025-12-28 15:33:06 +08:00
Concedo
ec95655f3c fixed default handling for special keys 2025-12-28 13:56:05 +08:00
Concedo
27261bfc26 adaptive decay as an overridable param (+1 squashed commits)
Squashed commits:

[d94df7843] adaptive decay as an overridable param
2025-12-28 13:34:20 +08:00
QDelta
4fd59e8427 ggml-cuda: use CMAKE_CUDA_ARCHITECTURES if set when GGML_NATIVE=ON (#18413) 2025-12-28 09:33:14 +08:00
lhez
08566977a7 opencl: allow resizing transpose buffers (#18384)
* opencl: allow resizing transpose buffers instead of using fixed sizes

* opencl: remove commented code
2025-12-27 15:51:14 -08:00
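The change swaps fixed-size transpose scratch buffers for ones that grow on demand. A minimal sketch of the grow-on-demand pattern in plain C++ (the real code manages OpenCL buffer objects inside ggml-opencl.cpp):

    #include <cstddef>
    #include <cstdlib>

    struct scratch_buffer {
        void * data = nullptr;
        size_t cap  = 0;
    };

    // reallocate only when the request exceeds current capacity, so repeated
    // transposes of varying shapes reuse one allocation
    static void * scratch_reserve(scratch_buffer & buf, size_t size) {
        if (size > buf.cap) {
            free(buf.data);
            buf.data = malloc(size);
            buf.cap  = buf.data ? size : 0;
        }
        return buf.data;
    }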
Johannes Gäßler
a4bf35889e llama-fit-params: fix overflow check (#18354) 2025-12-27 20:20:45 +01:00
Johannes Gäßler
026d2ad472 llama: fix magic number of 999 for GPU layers (#18266)
* llama: fix magic number of 999 for GPU layers

* use strings for -ngl, -ngld

* encapsulate n_gpu_layers, split_mode
2025-12-27 20:18:35 +01:00
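Before this change, -ngl relied on 999 as an informal "offload everything" value. A sketch of the string-based parsing idea; the keyword "all" and the INT_MAX sentinel are assumptions for illustration, not necessarily the exact spelling llama.cpp accepts:

    #include <climits>
    #include <cstdlib>
    #include <string>

    static int parse_n_gpu_layers(const std::string & arg) {
        if (arg == "all") {
            return INT_MAX; // explicit "offload everything" marker, no magic 999
        }
        return (int) strtol(arg.c_str(), nullptr, 10);
    }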
Concedo
1051313cb2 added deprecated item sdgendefaults (+1 squashed commits)
Squashed commits:

[efc14a5d9] fixed sd error
2025-12-27 22:47:43 +08:00
Aman Gupta
06705fdcb3 ggml-cuda: Use same regex for GGML_NATIVE=OFF (#18407) 2025-12-27 19:56:27 +08:00
Concedo
f5282e114d allow ANY api field to have specified defaults, and to be overwritten by value specified at load time 2025-12-27 18:57:04 +08:00
Concedo
6548645aaa rename power law sampler to adaptive p 2025-12-27 17:50:58 +08:00
Johannes Gäßler
a52dc60ba3 llama_fit_params: return enum for fail vs. error (#18374) 2025-12-27 09:59:19 +01:00
Johannes Gäßler
9045c9afe5 llama-fit-params: fix Gemma 3 calculation (#18372) 2025-12-27 09:56:04 +01:00
Concedo
445aad5e00 remove sdcpp qwen image lora hack 2025-12-27 16:31:29 +08:00
Wagner Bruna
84765f5967 sd: sync to master-447-ccb6b0a (#1898)
* sd: sync to master-438-298b110

* sd: sync to master-440-3e81246

* sd: sync to master-444-a0adcfb

* sd: sync to master-447-ccb6b0a
2025-12-27 16:30:52 +08:00
Concedo
9bb362cce9 revised power law sampling 2025-12-27 10:59:46 +08:00
Concedo
91d8863f18 power law sampler added 2025-12-27 09:46:06 +08:00
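The log never spells out what the "power law" (later "adaptive p") sampler computes; this and the surrounding commits only mention a sharpness, a decay, and a total weight. Purely to illustrate the general idea of power-law reweighting, under the assumption that token probabilities are raised to a power and renormalized:

    #include <cmath>
    #include <vector>

    // generic power-law reweighting; NOT Concedo's actual implementation
    static void power_law_reweight(std::vector<float> & probs, float sharpness) {
        if (probs.empty()) return;
        float total = 0.0f; // running total weight, initialized before accumulation
        for (float & p : probs) {
            p = powf(p, sharpness);
            total += p;
        }
        for (float & p : probs) {
            p /= total; // renormalize into a distribution
        }
    }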
Jeff Bolz
c9ced4910b vulkan: preprocess mul_mat_id experts and discard workgroups more quickly (#18352)
Run a preprocess to count how many times each expert is used, and use this to
quickly discard workgroups that aren't needed.
2025-12-26 16:12:58 -06:00
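A CPU-side analogy of that preprocess; the real implementation is a Vulkan compute pass, so the types and names here are illustrative only:

    #include <cstdint>
    #include <vector>

    // count how often each expert appears in the routing ids
    static std::vector<uint32_t> count_expert_uses(const std::vector<int32_t> & ids, int n_expert) {
        std::vector<uint32_t> counts(n_expert, 0);
        for (int32_t id : ids) {
            if (id >= 0 && id < n_expert) {
                counts[id]++;
            }
        }
        return counts;
    }

    // a workgroup assigned to expert e can then exit immediately when counts[e] == 0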
Jeff Bolz
7ac8902133 vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader (#18349)
* vulkan: Use BK=32 for coopmat2 mul_mat_id

* vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader

Disable robustness, remove the OOB check in decodeFuncB, and initialize the
row_ids to zero to avoid OOB access.

Don't slice/offset the B matrix to ic * BN, only to adjust the coord back down
to the range [0, BN) in decodeFuncB. Instead just slice with a row offset of
zero and remove the '& (BN - 1)'. This allows the compiler to common some of
the shared memory loads.
2025-12-26 18:15:50 +01:00
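A scalar analogy for the indexing change (the shader specifics, BN, and the buffers are stand-ins): the old path sliced B at an offset and masked every coordinate back into [0, BN), while the new path slices at row offset zero and indexes directly, letting the compiler merge identical shared-memory loads.

    #include <cstdint>

    static const uint32_t BN = 32;

    // old shape: offset slice plus '& (BN - 1)' mask on every access
    static float load_b_masked(const float * B, uint32_t ic, uint32_t r) {
        return B[ic * BN + (r & (BN - 1))];
    }

    // new shape: pre-sliced block, direct index
    static float load_b_direct(const float * B_block, uint32_t r) {
        return B_block[r];
    }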
Jeff Bolz
9bf20d8ac3 vulkan: Use BK=32 for coopmat2 mul_mat_id (#18332) 2025-12-26 18:15:02 +01:00
Eve
cb999704fb vulkan: small dequantization improvements (#18380)
* iq4_xs

* quants
2025-12-26 18:12:11 +01:00
Jeff Bolz
b96b82fc85 vulkan: Support UPSCALE w/antialias (#18327) 2025-12-26 17:00:57 +01:00
Jeff Bolz
10dc500bdb vulkan: handle rope with large number of rows (#18306) 2025-12-26 16:53:46 +01:00
o7si
4893cc07bb server : fix crash when seq_rm fails for hybrid/recurrent models (#18391)
* server : fix crash when seq_rm fails for hybrid/recurrent models

* server : add allow_processing param to clear_slot
2025-12-26 16:35:29 +01:00
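The crash came from assuming partial-range removal always succeeds; recurrent and hybrid caches cannot remove part of a sequence. A sketch of the recovery path with stand-in names (only seq_rm corresponds to a real llama.cpp concept):

    #include <cstdio>

    struct toy_cache {
        bool supports_partial_rm = false; // hybrid/recurrent models report false
        bool seq_rm(int /*seq*/, int /*p0*/, int /*p1*/) { return supports_partial_rm; }
        void clear_slot(int seq, bool /*allow_processing*/) {
            printf("cleared slot %d entirely\n", seq);
        }
    };

    // on failure, clear the whole slot instead of continuing with stale state
    static void trim_slot(toy_cache & cache, int seq, int keep_until) {
        if (!cache.seq_rm(seq, keep_until, -1)) {
            cache.clear_slot(seq, /*allow_processing=*/true);
        }
    }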
Francisco Herrera
af3be131c0 docs: added note for pre-SYCL Intel hardware (#18016)
Specify that it's for pre-SYCL hardware
2025-12-26 10:34:30 +08:00
0Marble
b07cda687c CANN: implement the SSM_CONV operator (#17737)
* CANN: implement SSM_CONV operator

Co-authored-by: Aleksei Lobanov <zeromarblectm@gmail.com>
Co-authored-by: Sujin Kang <waterjin326@gmail.com>

* CANN: remove custom error limit for SSM_CONV

* CANN: merge SSM_CONV tensor shape/strides into one line

---------

Co-authored-by: Sujin Kang <waterjin326@gmail.com>
2025-12-26 09:12:04 +08:00
Aman Gupta
85c40c9b02 ggml-cuda: fix regex for arch list (#18371)
* ggml-cuda: fix regex for arch list

* make regex exact
2025-12-26 01:35:14 +08:00
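The pitfall behind "make regex exact" generalizes: a loose pattern such as "86" also matches "860" inside an arch list. A C++ sketch of token-anchored matching over a semicolon-separated list (the actual fix is CMake string handling, and the -real/-virtual suffixes are an assumption):

    #include <regex>
    #include <string>

    // match arch as a whole token, not a substring; assumes arch is plain digits
    static bool has_arch(const std::string & arch_list, const std::string & arch) {
        const std::regex re("(^|;)" + arch + "(-real|-virtual)?(;|$)");
        return std::regex_search(arch_list, re);
    }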
Concedo
dfa1b72d2f Triage: revert https://github.com/ggml-org/llama.cpp/pull/18047 and https://github.com/ggml-org/llama.cpp/pull/18302
Revert "vulkan: Implement set_tensor_async and the event interfaces (#18047)"

This reverts commit e1f15b454f. (+1 squashed commits)

Squashed commits:

[3cfbc7b1a] Revert "vulkan: fix command buffer corruption in ggml_backend_vk_event_wait (#18302)"

This reverts commit 2a9ea2020c.
2025-12-26 01:20:31 +08:00
Concedo
399fc9c57e rename tokens tab to context, move fa to hardware 2025-12-26 00:06:07 +08:00
Aman Gupta
83b3b1c271 cuda: optimize cumsum cub path (#18362)
* cuda: optimize cumsum cub path

* remove heavy perf test
2025-12-25 23:55:38 +08:00
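For reference, cumsum is an inclusive prefix sum; the CUDA path delegates to CUB's device-wide scan. A host-side mirror of the operation being optimized, using the standard library:

    #include <numeric>
    #include <vector>

    // out[i] = x[0] + x[1] + ... + x[i]
    static std::vector<float> cumsum(const std::vector<float> & x) {
        std::vector<float> out(x.size());
        std::inclusive_scan(x.begin(), x.end(), out.begin());
        return out;
    }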
Concedo
062f8b28eb fixed sdui gen queue 2025-12-25 23:21:33 +08:00
Aman Gupta
b0fb0f0aee ggml-cuda: fix blackwell native builds (#18361)
* ggml-cuda: fix blackwell native builds

Replace 12x in native architectures by 12xa

* replace for GGML_NATIVE=OFF too

* only replace for native

* remove 120f-virtual for default compilation

---------

Co-authored-by: Aman Gupta <aman>
2025-12-25 22:12:11 +08:00
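The substitution described in the body rewrites a native Blackwell arch such as "120" to its family-specific form "120a" within a semicolon-separated list. A C++ mirror of that string rewrite (the real logic is CMake; entries already carrying a suffix letter are left alone):

    #include <regex>
    #include <string>

    static std::string use_family_specific_blackwell(const std::string & archs) {
        // "120;90" -> "120a;90"; "120a;90" is unchanged
        return std::regex_replace(archs, std::regex("(^|;)(12[0-9])(?=;|$)"), "$1$2a");
    }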
Concedo
cf4201e213 wip power law sampling 2025-12-25 22:01:16 +08:00