koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-12 05:52:26 +00:00

Author	SHA1	Message	Date
Diego Devesa	a2e0088d92	Revert "ggml : Leverage the existing GGML_F32_VEC helpers to vectorize ggml_v…" (#16723 ) This reverts commit `19a5a3edfd`.	2025-10-22 20:20:55 +02:00
sirus20x6	19a5a3edfd	ggml : Leverage the existing GGML_F32_VEC helpers to vectorize ggml_vec_set_f32 for faster fills (#16522 ) * Leverage the existing GGML_F32_VEC helpers to broadcast the fill value across SIMD registers and store in vector-sized chunks, while retaining the scalar tail for leftover elements and non-SIMD builds. * Vectorize additional f32 helper loops * Normalize f32 helper tails for ggml vec ops --------- Co-authored-by: Aaron <shelhamer.aaron@gmail.com>	2025-10-22 12:14:14 +02:00
Aman Gupta	9285325ce0	CUDA: fix bug in topk-moe softmax (#16711 )	2025-10-22 12:33:08 +08:00
Aman Gupta	03792ad936	CUDA: topk-moe: add optional parameter for gpt-oss (#16649 )	2025-10-21 22:40:38 +08:00
Johannes Gäßler	51d1a8c997	CUDA: better error for FA kernel with 0 occupancy (#16643 )	2025-10-21 15:27:53 +02:00
Aman Gupta	4926419c4d	ggml: add ggml_can_fuse_subgraph (#16662 ) * ggml: add ggml_can_fuse_subgraph * ggml-cuda: use ggml_can_fuse_subgraph for topk-moe * format * 1. remove inputs from signature as they are transient nodes 2. add check for views: view_src should be part of the subgraph * - combine check into one loop - check all view_src parents - other minor review comments * remove redundant if test * - rename and other minor review comments * add assert about count < 32	2025-10-21 16:43:14 +08:00
lhez	6ea37f5739	opencl: fix warnings and clean up profiling (#16688 ) * opencl: remove unused headers, fix warnings * opencl: clean up profiling, only keep kernel time	2025-10-20 22:26:17 -07:00
Jeff Bolz	fb349848f3	vulkan: Handle FA with all -inf mask values (#16447 )	2025-10-20 22:16:08 -05:00
YehuditE	6de8ed7519	sycl : add PAD_REFLECT_D1 operator support (#16145 ) * sycl: add PAD_REFLECT_D1 operator support * docs(ops): regenerate docs/ops.md * remove trailing whitespaces * style: fix editorconfig issues — trim trailing spaces and normalize EOLs * fix: move PAD_REFLECT_1D case outside of fall-through block	2025-10-21 00:21:12 +02:00
Diego Devesa	b617cfd289	ggml-alloc : fix leak when reusing a tensor with a larger size (#16679 )	2025-10-20 14:53:50 +02:00
safranowith	2330de7b84	SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators (#16613 ) * SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators Clean up unrelated changes from previous commit * Chore: remove empty lines and fix indentation * Clean up: remove leftover blank lines and fix spacing * chore: fix trailing whitespace and ensure final newline * Cleanup: remove redundant declarations already defined in header * Sync docs/ops.md with updated backend operation support * docs: update ops.md after rebase * docs: update ops.md - Vulkan supports SSM_CONV and SSM_SCAN	2025-10-20 11:08:32 +03:00
Aaron Teo	4f73d0a951	ci : fix binaries release failure for s390x (binaries may not work yet) (#16664 ) * devops: initial patch Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: forgot the z15 suffix Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: attempt at impl GGML_CPU_ALL_VARIANTS for s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: rm baseline version Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-10-19 23:06:39 +02:00
Concedo	5b6ed8b057	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/release.yml # CODEOWNERS # ci/run.sh # ggml/src/ggml-hip/CMakeLists.txt	2025-10-19 19:54:19 +08:00
Johannes Gäßler	ee09828cb0	HIP: fix GPU_TARGETS (#16642 )	2025-10-18 14:47:32 +02:00
Jeff Bolz	e56abd2098	vulkan: Implement topk_moe fused shader, ported from CUDA (#16641 ) This is similar to the CUDA shader from #16130, but doesn't use shared memory and handles different subgroup sizes.	2025-10-18 12:22:57 +02:00
Aman Gupta	38355c6c8e	CUDA: use registers instead of smem in topk-moe (#16647 ) Uses the technique used in the vulkan PR #16641. Neat trick!	2025-10-18 11:52:53 +02:00
Concedo	f47a0690ac	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/ops.md # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cvt.cl # ggml/src/ggml-rpc/ggml-rpc.cpp # tests/test-backend-ops.cpp # tests/test-grammar-integration.cpp # tools/rpc/rpc-server.cpp	2025-10-18 11:10:37 +08:00
Concedo	85556118b5	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-cann/acl_tensor.cpp # ggml/src/ggml-cann/acl_tensor.h # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/element_wise.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/presets.hpp	2025-10-18 10:56:55 +08:00
Shawn Gu	81387858f1	opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602 ) * opencl: transposed gemm/gemv moe kernel with mxfp4,f32 * add restore kernel for moe transpose * fix trailing whitespaces * resolve compilation warnings	2025-10-17 17:55:32 -07:00
Radoslav Gerganov	41386cf365	rpc : report actual free memory (#16616 ) * rpc : report actual free memory Start reporting the free memory on every device instead of using fixed values. Now llama-cli users can get a nice memory breakdown when using RPC devices. * drop --mem in rpc-server	2025-10-17 18:02:52 +03:00
Giuseppe Scrivano	3d4e86bbeb	vulkan: Add State Space Model (SSM) Operations Support (#16463 ) * vulkan: implement SSM scan operation Add State Space Model scan operation to the Vulkan backend. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> * vulkan: implement SSM conv operation Add State Space Model conv operation to the Vulkan backend. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> --------- Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2025-10-17 14:23:47 +02:00
muggle-stack	342c728d03	ggml : fix SpaceMit IME array out-of-bounds in task assignment (#16629 ) Fix incorrect task-to-batch index calculation in the quantization phase. The bug caused out-of-bounds access to qnbitgemm_args array when compute_idx exceeded per_gemm_block_count_m, leading to invalid pointer dereferences and SIGBUS errors. Correctly map tasks to batches by dividing compute_idx by per_gemm_block_count_m instead of block_size_m. Example: batch_feature=1, gemm_m=30, block_size_m=4 per_gemm_block_count_m = 8, task_count = 8 Old: gemm_idx = 4/4 = 1 (out of bounds) New: gemm_idx = 4/8 = 0 (correct) Tested on SpaceMit K1 RISC-V64 with qwen2.5:0.5b model. Co-authored-by: muggle <mingjun.rong@spacemit.com>	2025-10-17 13:01:23 +03:00
Jeff Bolz	b19491599d	vulkan: fix debug build (add_rms_len/data not found) (#16624 )	2025-10-17 09:31:04 +02:00
Ilia Ilmer	9ad4f1931e	metal : add `CONV_TRANSPOSE_2D` (#16542 ) * initial: headers and metal-device.cpp updates * adding conv_transpose_2d * fix type * fix type: int32->int64 * Update ggml/src/ggml-metal/ggml-metal.metal Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-metal/ggml-metal.metal Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-metal/ggml-metal.metal Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add checks for src[0] and src[1]; add type checks * Update ggml-metal.metal Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add more tests, add optimization to threading * add dynamic memory allocation in metal --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-10-17 09:33:58 +03:00
GittyBurstein	ceff6bb253	SYCL SET operator optimized for F32 tensors (#16350 ) * SYCL/SET: implement operator + wire-up; docs/ops updates; element_wise & ggml-sycl changes * sycl(SET): re-apply post-rebase; revert manual docs/ops.md; style cleanups * move SET op to standalone file, GPU-only implementation * Update SYCL SET operator for F32 * ci: fix editorconfig issues (LF endings, trailing spaces, final newline) * fixed ggml-sycl.cpp --------- Co-authored-by: Gitty Burstein <gitty@example.com>	2025-10-17 10:36:40 +08:00
GittyBurstein	b22572e97d	sycl : add ARANGE operator (#16362 ) * SYCL: update element-wise ops and presets * clean arange * Re-trigger CI --------- Co-authored-by: Gitty Burstein <gitty@example.com>	2025-10-16 15:26:21 +02:00
Chenguang Li	7a50cf388a	CANN: format code using .clang-format (#15863 ) This commit applies .clang-format rules to all source files under the ggml-cann directory to ensure consistent coding style and readability. The .clang-format option `SortIncludes: false` has been set to disable automatic reordering of include directives. No functional changes are introduced. Co-authored-by: hipudding <huafengchun@gmail.com>	2025-10-16 16:41:11 +08:00
Concedo	e3ee55a1d6	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/ops.md # docs/ops/CPU.csv	2025-10-16 14:29:47 +08:00
Concedo	c18d7991c8	rename tts.cpp ggml_round to ggml_ttsround to avoid conflict	2025-10-16 14:24:44 +08:00
takuya kodama	adc9b60f19	ggml-cpu: replace putenv with setenv for const-correctness (#16573 ) ## Why it failed When compiling with strict compiler flags (-Wwrite-strings -Werror=discarded-qualifiers), the build fails with the following error: ``` cmake \ -S . \ -B ../llama.cpp.build \ --preset=x64-linux-gcc-debug \ -DCMAKE_INSTALL_PREFIX=/tmp/local \ -DCMAKE_C_FLAGS="-Wwrite-strings -Werror=discarded-qualifiers" && \ cmake --build ../llama.cpp.build/ ... /home/otegami/work/cpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: In function ‘ggml_cpu_init’: /home/otegami/work/cpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3572:24: error: passing argument 1 of ‘putenv’ discards ‘const’ qualifier from pointer target type [-Werror=discarded-qualifiers] 3572 | putenv("KMP_BLOCKTIME=200"); // 200ms | ^~~~~~~~~~~~~~~~~~~ In file included from /home/otegami/work/cpp/llama.cpp/ggml/src/./ggml-impl.h:10, from /home/otegami/work/cpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-impl.h:6, from /home/otegami/work/cpp/llama.cpp/ggml/src/ggml-cpu/traits.h:3, from /home/otegami/work/cpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:6: /usr/include/stdlib.h:786:26: note: expected ‘char *’ but argument is of type ‘const char *’ 786 | extern int putenv (char *__string) __THROW __nonnull ((1)); | ~~~~~~^~~~~~~~ cc1: some warnings being treated as errors ninja: build stopped: subcommand failed. ``` The issue is that putenv() expects a non-const char* but receives a string literal (const char*). ## How to fix This PR replaces putenv("KMP_BLOCKTIME=200") with setenv("KMP_BLOCKTIME", "200", 0). Benefits of setenv(): - Accepts const char* parameters (no qualifier warnings) - Makes copies of the strings (safer memory handling) - The third parameter (0) ensures we don't overwrite if already set	2025-10-16 08:10:32 +03:00
yael-works	ee50ee1ead	SYCL: Add GGML_OP_MEAN operator support (#16009 ) * SYCL: Add GGML_OP_MEAN operator support * SYCL: Fix formatting for GGML_OP_MEAN case * Update ggml/src/ggml-sycl/ggml-sycl.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-10-16 12:21:28 +08:00
Concedo	ebc1cb0641	before merging conflicting round	2025-10-16 12:15:44 +08:00
Concedo	2d22e61f3d	Merge commit '`1ee9d0b415`' into concedo_experimental # Conflicts: # tests/test-backend-ops.cpp	2025-10-16 12:09:46 +08:00
Concedo	1ff97f8a00	Merge commit '`5016b72862`' into concedo_experimental # Conflicts: # .github/workflows/build.yml # docs/ops.md # docs/ops/SYCL.csv # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-sycl/backend.hpp # ggml/src/ggml-sycl/binbcast.cpp # ggml/src/ggml-sycl/binbcast.hpp # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/element_wise.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # tests/test-chat-parser.cpp # tests/test-json-partial.cpp	2025-10-16 12:05:21 +08:00
safranowith	466c1911ab	cpu : add FLOOR, CEIL, ROUND and TRUNC unary operators (#16083 ) * CPU: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators - Added the operators to unary op enum - Implemented API functions - Implemented forward and unary-op logic in CPU backend - Updated ggml_get_n_tasks - Updated operators names array and static_assert - Updated docs and enabled automatic tests * docs: add documentation for ggml_trunc and ggml_trunc_inplace in ggml.h * chore: remove trailing whitespace from ggml.h * Remove unresolved merge markers * Apply review suggestions: cleanup formatting, enum order and leftover artifacts * Regenerate ops.md using create_ops_docs.py	2025-10-15 21:24:51 +02:00
lhez	0cb7a0683b	opencl: add q8_0 mm support (#16469 ) * opencl: add mm_q8_0_f32 * opencl: fix data loading for incomplete tile * opencl: use q8_0 mm for larger matrix * opencl: add some tests to cover the path	2025-10-15 10:51:04 -07:00
lhez	d93f8439b0	opencl: fix FA for f32 (#16584 )	2025-10-15 10:48:28 -07:00
Sam/Samuel	f4ce81c45e	metal: optimise `GGML_OP_SUM` (#16559 ) * optimise GGML_OP_SUM * add non-contiguous tests by permuting the input * change tests to require full contiguity of OP_SUM * cuda : add check GGML_OP_SUM --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-10-15 17:05:56 +03:00
Julius Tischbein	5acd455460	CUDA: Changing the CUDA scheduling strategy to spin (#16585 ) * CUDA set scheduling strategy to spinning for cc121 * Using prop.major and prop.minor, include HIP and MUSA * Exclude HIP and MUSA * Remove trailing whitespace Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Remove empty line Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-10-15 14:54:15 +03:00
Georgi Gerganov	fa882fd2b1	metal : avoid using Metal's gpuAddress property (#16576 ) * metal : avoid using Metal's gpuAddress property * metal : fix rope kernels buffer check	2025-10-14 20:33:05 +03:00
SavicStefan	ffa059034c	vulkan: Add ACC_TYPE_VEC2 implementation (#16203 ) Signed-off-by: Stefan Savic <stefan.savic@huawei.com> Co-authored-by: Stefan Savic <stefan.savic@huawei.com>	2025-10-14 19:18:05 +02:00
Aman Gupta	120bf7046d	CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion (#16577 )	2025-10-14 07:48:08 -07:00
Jeff Bolz	4258e0cfe7	vulkan: Support FA with K/V in F32 (#16543 )	2025-10-14 15:53:37 +02:00
Jeff Bolz	7ea15bb64c	vulkan: Improve build time for MSVC (#16545 ) Enable CMP0147 so custom build steps (invoking vulkan-shader-gen) are run in parallel. Enable /MP so source files are compiled in parallel.	2025-10-14 14:51:36 +02:00
Johannes Gäßler	9c7185dd28	CUDA: enable FA for FP32 KV cache (#16546 )	2025-10-14 14:22:47 +02:00
Aman Gupta	1ee9d0b415	CUDA: use fastdiv + ggml_cuda_mad for mmvf (#16557 ) * CUDA: use fastdiv + ggml_cuda_mad for mmvf * use bf16 directly + fix formatting * Add exception for HIP code	2025-10-14 13:16:21 +02:00
Aman Gupta	48e2fa9fb7	CUDA: add fp kernel for larger batch size MoE (#16512 ) * CUDA: kernel for larger batch sizes for MoE * WIP * WIP * WIP * WIP * WIP * WIP * fixup * tests * Move mmq_ids_helper to mmid * cleanup * Remove redundant checks	2025-10-14 13:15:15 +02:00
Anav Prasad	5b6913c47b	cuda : remove legacy copy-op pointer indirection code (#16485 ) * remove legacy copy-op pointer indirection code * further removal of copy-op indirection code * renamed check_node_graph_compatibility_and_refresh_copy_ops function	2025-10-14 11:53:49 +02:00
Concedo	a8c023d906	quick test rocm	2025-10-14 17:19:16 +08:00
Georgi Gerganov	e60f241eac	metal : FA support F32 K and V and head size = 32 (#16531 ) * metal : FA support F32 K and V and head size = 32 * graph : remove obsolete comment [no ci]	2025-10-13 23:07:57 +03:00
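
The SpaceMit IME fix listed above (`342c728d03`, #16629) includes a worked example of the task-to-batch index arithmetic. The sketch below reproduces only that arithmetic in Python; it is not the ggml code. The variable names come from the commit message. The helper functions, the use of ceiling division to get `per_gemm_block_count_m = 8` from `gemm_m = 30` and `block_size_m = 4`, and the formula for `task_count` are assumptions chosen to match the numbers the commit quotes.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def gemm_index_old(compute_idx: int, block_size_m: int) -> int:
    # Hypothetical stand-in for the buggy mapping: divides by the block size.
    return compute_idx // block_size_m


def gemm_index_new(compute_idx: int, per_gemm_block_count_m: int) -> int:
    # Hypothetical stand-in for the fixed mapping: divides by blocks per GEMM.
    return compute_idx // per_gemm_block_count_m


# Values quoted in the commit message.
batch_feature = 1   # number of GEMMs, so the only valid gemm_idx is 0
gemm_m = 30
block_size_m = 4

# Assumption: blocks per GEMM is the row count rounded up to whole blocks.
per_gemm_block_count_m = ceil_div(gemm_m, block_size_m)
# Assumption: one task per block per GEMM.
task_count = batch_feature * per_gemm_block_count_m

old = [gemm_index_old(i, block_size_m) for i in range(task_count)]
new = [gemm_index_new(i, per_gemm_block_count_m) for i in range(task_count)]

print("old:", old)
print("new:", new)
```

With these inputs the old mapping sends tasks 4 through 7 to index 1, which is past the end of a one-element argument array, while the new mapping keeps every task at index 0.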


2142 commits