koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-09 11:00:40 +00:00

Author	SHA1	Message	Date
Concedo	d9e898afe0	reset scheduler if default otherwise it will persist the old one	2025-10-19 19:53:40 +08:00
Concedo	7d20e6bdb3	updated layer count to be more accurate +1 instead of +3	2025-10-18 15:29:07 +08:00
Concedo	f47a0690ac	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/ops.md # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cvt.cl # ggml/src/ggml-rpc/ggml-rpc.cpp # tests/test-backend-ops.cpp # tests/test-grammar-integration.cpp # tools/rpc/rpc-server.cpp	2025-10-18 11:10:37 +08:00
Concedo	85556118b5	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-cann/acl_tensor.cpp # ggml/src/ggml-cann/acl_tensor.h # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/element_wise.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/presets.hpp	2025-10-18 10:56:55 +08:00
Shawn Gu	81387858f1	opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602 ) * opencl: transposed gemm/gemv moe kernel with mxfp4,f32 * add restore kernel for moe transpose * fix trailing whitespaces * resolve compilation warnings	2025-10-17 17:55:32 -07:00
Johannes Gäßler	66b0dbcb2d	llama-model: fix insonsistent ctxs <-> bufs order (#16581 )	2025-10-17 17:41:09 +02:00
Radoslav Gerganov	41386cf365	rpc : report actual free memory (#16616 ) * rpc : report actual free memory Start reporting the free memory on every device instead of using fixed values. Now llama-cli users can get a nice memory breakdown when using RPC devices. * drop --mem in rpc-server	2025-10-17 18:02:52 +03:00
Giuseppe Scrivano	3d4e86bbeb	vulkan: Add State Space Model (SSM) Operations Support (#16463 ) * vulkan: implement SSM scan operation Add State Space Model scan operation to the Vulkan backend. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> * vulkan: implement SSM conv operation Add State Space Model conv operation to the Vulkan backend. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> --------- Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2025-10-17 14:23:47 +02:00
muggle-stack	342c728d03	ggml : fix SpaceMit IME array out-of-bounds in task assignment (#16629 ) Fix incorrect task-to-batch index calculation in the quantization phase. The bug caused out-of-bounds access to qnbitgemm_args array when compute_idx exceeded per_gemm_block_count_m, leading to invalid pointer dereferences and SIGBUS errors. Correctly map tasks to batches by dividing compute_idx by per_gemm_block_count_m instead of block_size_m. Example: batch_feature=1, gemm_m=30, block_size_m=4 per_gemm_block_count_m = 8, task_count = 8 Old: gemm_idx = 4/4 = 1 (out of bounds) New: gemm_idx = 4/8 = 0 (correct) Tested on SpaceMit K1 RISC-V64 with qwen2.5:0.5b model. Co-authored-by: muggle <mingjun.rong@spacemit.com>	2025-10-17 13:01:23 +03:00
Pascal	ababae7e1e	webui: reorganize settings layout (#16607 ) * webui: reorganize settings layout * chore: update webui build output * fix: remove unused variable * chore: update webui build output	2025-10-17 10:35:03 +02:00
Jeff Bolz	b19491599d	vulkan: fix debug build (add_rms_len/data not found) (#16624 )	2025-10-17 09:31:04 +02:00
Ilia Ilmer	9ad4f1931e	metal : add `CONV_TRANSPOSE_2D` (#16542 ) * initial: headers and metal-device.cpp updates * adding conv_transpose_2d * fix type * fix type: int32->int64 * Update ggml/src/ggml-metal/ggml-metal.metal Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-metal/ggml-metal.metal Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-metal/ggml-metal.metal Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add checks for src[0] and src[1]; add type checks * Update ggml-metal.metal Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add more tests, add optimization to threading * add dynamic memory allocation in metal --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-10-17 09:33:58 +03:00
Olivier Chafik	79967ec596	grammar : use int64_t to avoid int overflows in int schema to grammar conversion logic (#16626 )	2025-10-17 08:59:31 +03:00
Concedo	f6916ba864	updated sdui	2025-10-17 13:56:45 +08:00
GittyBurstein	ceff6bb253	SYCL SET operator optimized for F32 tensors (#16350 ) * SYCL/SET: implement operator + wire-up; docs/ops updates; element_wise & ggml-sycl changes * sycl(SET): re-apply post-rebase; revert manual docs/ops.md; style cleanups * move SET op to standalone file, GPU-only implementation * Update SYCL SET operator for F32 * ci: fix editorconfig issues (LF endings, trailing spaces, final newline) * fixed ggml-sycl.cpp --------- Co-authored-by: Gitty Burstein <gitty@example.com>	2025-10-17 10:36:40 +08:00
Xuan-Son Nguyen	1bb4f43380	mtmd : support home-cooked Mistral Small Omni (#14928 )	2025-10-16 19:00:31 +02:00
Pascal	683fa6ba4e	fix: added a normalization step for MathJax-style \[\] and \(\) delimiters (#16599 ) * fix: added a normalization step for MathJax-style \[\] and \(\) delimiters So inline and block equations are converted before KaTeX rendering, enabling proper display of model-generated LaTeX in the WebUI * chore: update webui build output	2025-10-16 16:28:41 +02:00
GittyBurstein	b22572e97d	sycl : add ARANGE operator (#16362 ) * SYCL: update element-wise ops and presets * clean arange * Re-trigger CI --------- Co-authored-by: Gitty Burstein <gitty@example.com>	2025-10-16 15:26:21 +02:00
Chenguang Li	7a50cf388a	CANN: format code using .clang-format (#15863 ) This commit applies .clang-format rules to all source files under the ggml-cann directory to ensure consistent coding style and readability. The .clang-format option `SortIncludes: false` has been set to disable automatic reordering of include directives. No functional changes are introduced. Co-authored-by: hipudding <huafengchun@gmail.com>	2025-10-16 16:41:11 +08:00
Concedo	45a02ae534	rename blas to just batching	2025-10-16 16:27:51 +08:00
Concedo	48cd70b14b	updated lite	2025-10-16 14:30:09 +08:00
Concedo	e3ee55a1d6	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/ops.md # docs/ops/CPU.csv	2025-10-16 14:29:47 +08:00
Concedo	c18d7991c8	rename tts.cpp ggml_round to ggml_ttsround to avoid conflict	2025-10-16 14:24:44 +08:00
takasurazeem	6f5d924637	common : Update the docs on -t --threads (#16236 ) * Update the docs on -t --threads * Revert "Update the docs on -t --threads" This reverts commit eba97345e2c88d8ca510abec87d00bf6b9b0e0c2. * docs: clarify -t/--threads parameter uses CPU threads and defaults to all available cores * Update arg.cpp	2025-10-16 08:11:33 +03:00
takuya kodama	adc9b60f19	ggml-cpu: replace putenv with setenv for const-correctness (#16573 ) ## Why it failed When compiling with strict compiler flags (-Wwrite-strings -Werror=discarded-qualifiers), the build fails with the following error: ``` cmake \ -S . \ -B ../llama.cpp.build \ --preset=x64-linux-gcc-debug \ -DCMAKE_INSTALL_PREFIX=/tmp/local \ -DCMAKE_C_FLAGS="-Wwrite-strings -Werror=discarded-qualifiers" && \ cmake --build ../llama.cpp.build/ ... /home/otegami/work/cpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c: In function ‘ggml_cpu_init’: /home/otegami/work/cpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:3572:24: error: passing argument 1 of ‘putenv’ discards ‘const’ qualifier from pointer target type [-Werror=discarded-qualifiers] 3572 \| putenv("KMP_BLOCKTIME=200"); // 200ms \| ^~~~~~~~~~~~~~~~~~~ In file included from /home/otegami/work/cpp/llama.cpp/ggml/src/./ggml-impl.h:10, from /home/otegami/work/cpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu-impl.h:6, from /home/otegami/work/cpp/llama.cpp/ggml/src/ggml-cpu/traits.h:3, from /home/otegami/work/cpp/llama.cpp/ggml/src/ggml-cpu/ggml-cpu.c:6: /usr/include/stdlib.h:786:26: note: expected ‘char *’ but argument is of type ‘const char *’ 786 \| extern int putenv (char *__string) __THROW __nonnull ((1)); \| ~~~~~~^~~~~~~~ cc1: some warnings being treated as errors ninja: build stopped: subcommand failed. ``` The issue is that putenv() expects a non-const char * but receives a string literal (const char *). ## How to fix This PR replaces putenv("KMP_BLOCKTIME=200") with setenv("KMP_BLOCKTIME", "200", 0). Benefits of setenv(): - Accepts const char * parameters (no qualifier warnings) - Makes copies of the strings (safer memory handling) - The third parameter (0) ensures we don't overwrite if already set	2025-10-16 08:10:32 +03:00
yael-works	ee50ee1ead	SYCL: Add GGML_OP_MEAN operator support (#16009 ) * SYCL: Add GGML_OP_MEAN operator support * SYCL: Fix formatting for GGML_OP_MEAN case * Update ggml/src/ggml-sycl/ggml-sycl.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-10-16 12:21:28 +08:00
Concedo	ebc1cb0641	before merging conflicting round	2025-10-16 12:15:44 +08:00
Concedo	2d22e61f3d	Merge commit '`1ee9d0b415`' into concedo_experimental # Conflicts: # tests/test-backend-ops.cpp	2025-10-16 12:09:46 +08:00
Concedo	2cee3b2055	Merge commit '`e38b7c6e9e`' into concedo_experimental	2025-10-16 12:08:03 +08:00
Concedo	f3b0ed157b	Revert "graph : support cacheless embeddings with FA and iSWA" This reverts commit `d4d465bce4`.	2025-10-16 12:07:48 +08:00
Concedo	1ff97f8a00	Merge commit '`5016b72862`' into concedo_experimental # Conflicts: # .github/workflows/build.yml # docs/ops.md # docs/ops/SYCL.csv # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-sycl/backend.hpp # ggml/src/ggml-sycl/binbcast.cpp # ggml/src/ggml-sycl/binbcast.hpp # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/element_wise.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # tests/test-chat-parser.cpp # tests/test-json-partial.cpp	2025-10-16 12:05:21 +08:00
Aleksei Nikiforov	7adc79c032	gguf-py : add support for endian conversion of BF16 data (#16594 ) BF16 requires special handling in this script while it's a 2-bytes data, but view is 1-byte by default. Switch to correct view before attempting byteswapping. With this change correctly byteswapping models like Meta-Llama-3-8B-Instruct-bf16-GGUF should be possible.	2025-10-15 22:43:08 +02:00
safranowith	466c1911ab	cpu : add FLOOR, CEIL, ROUND and TRUNC unary operators (#16083 ) * CPU: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators - Added the operators to unary op enum - Implemented API functions - Implemented forward and unary-op logic in CPU backend - Updated ggml_get_n_tasks - Updated operators names array and static_assert - Updated docs and enabled automatic tests * docs: add documentation for ggml_trunc and ggml_trunc_inplace in ggml.h * chore: remove trailing whitespace from ggml.h * Remove unresolved merge markers * Apply review suggestions: cleanup formatting, enum order and leftover artifacts * Regenerate ops.md using create_ops_docs.py	2025-10-15 21:24:51 +02:00
Concedo	4eaf05dfeb	handle oai without v1 prefix	2025-10-16 02:16:49 +08:00
lhez	0cb7a0683b	opencl: add q8_0 mm support (#16469 ) * opencl: add mm_q8_0_f32 * opencl: fix data loading for incomplete tile * opencl: use q8_0 mm for larger matrix * opencl: add some tests to cover the path	2025-10-15 10:51:04 -07:00
lhez	d93f8439b0	opencl: fix FA for f32 (#16584 )	2025-10-15 10:48:28 -07:00
Concedo	dfeccea3a1	added shitty fractional scaling support for GNOME. but really just use KDE	2025-10-15 22:28:04 +08:00
Aleksander Grygier	f9fb33f263	Add server-driven parameter defaults and syncing (#16515 )	2025-10-15 16:22:20 +02:00
Sam/Samuel	f4ce81c45e	metal: optimise `GGML_OP_SUM` (#16559 ) * optimise GGML_OP_SUM * add non-contiguous tests by permuting the input * change tests to require full contiguity of OP_SUM * cuda : add check GGML_OP_SUM --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-10-15 17:05:56 +03:00
Georgi Gerganov	17304cbcc1	server : fix img token logs (#16595 )	2025-10-15 16:53:12 +03:00
Xuan-Son Nguyen	3e3cb19f64	llama-quant: add support for mmproj (#16592 ) * llama-quant: add support for mmproj * Update src/llama.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * check prefix instead * small fix --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-10-15 14:48:08 +02:00
Julius Tischbein	5acd455460	CUDA: Changing the CUDA scheduling strategy to spin (#16585 ) * CUDA set scheduling strategy to spinning for cc121 * Using prop.major and prop.minor, include HIP and MUSA * Exclude HIP and MUSA * Remove trailing whitespace Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Remove empty line Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-10-15 14:54:15 +03:00
Georgi Gerganov	554fd578a5	server : fix mtmd checkpoints (#16591 )	2025-10-15 11:51:27 +02:00
Concedo	5207b8d4be	more sd path fallbacks	2025-10-15 15:22:06 +08:00
Concedo	610ba18971	sdcpp precision fix	2025-10-15 11:08:35 +08:00
Georgi Gerganov	fa882fd2b1	metal : avoid using Metal's gpuAddress property (#16576 ) * metal : avoid using Metal's gpuAddress property * metal : fix rope kernels buffer check	2025-10-14 20:33:05 +03:00
SavicStefan	ffa059034c	vulkan: Add ACC_TYPE_VEC2 implementation (#16203 ) Signed-off-by: Stefan Savic <stefan.savic@huawei.com> Co-authored-by: Stefan Savic <stefan.savic@huawei.com>	2025-10-14 19:18:05 +02:00
Aman Gupta	120bf7046d	CUDA + openCL: fix bug in accessing rms_norm->src while doing fusion (#16577 )	2025-10-14 07:48:08 -07:00
Jeff Bolz	4258e0cfe7	vulkan: Support FA with K/V in F32 (#16543 )	2025-10-14 15:53:37 +02:00
Jeff Bolz	7ea15bb64c	vulkan: Improve build time for MSVC (#16545 ) Enable CMP0147 so custom build steps (invoking vulkan-shader-gen) are run in parallel. Enable /MP so source files are compiled in parallel.	2025-10-14 14:51:36 +02:00

10000 commits