koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-08 01:41:37 +00:00

Author	SHA1	Message	Date
Concedo	746664fde6	Merge commit '`2cd20b72ed`' into concedo_experimental # Conflicts: # CONTRIBUTING.md # docs/backend/CANN.md # docs/backend/SYCL.md # docs/backend/snapdragon/README.md # docs/backend/snapdragon/windows.md # docs/build.md # docs/multimodal/MobileVLM.md # docs/ops.md # docs/ops/WebGPU.csv # examples/debug/README.md # examples/llama.vim # examples/model-conversion/README.md # examples/sycl/README.md # ggml/src/ggml-cpu/amx/mmq.cpp # ggml/src/ggml-cpu/arch/x86/repack.cpp # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp-drv.cpp # ggml/src/ggml-hexagon/htp/flash-attn-ops.c # ggml/src/ggml-hexagon/htp/hvx-base.h # ggml/src/ggml-hexagon/htp/hvx-copy.h # ggml/src/ggml-hexagon/htp/hvx-inverse.h # ggml/src/ggml-hexagon/htp/hvx-reduce.h # ggml/src/ggml-hexagon/htp/matmul-ops.c # ggml/src/ggml-hexagon/htp/rope-ops.c # ggml/src/ggml-hexagon/htp/worker-pool.c # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cpy.cl # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/quants.hpp # ggml/src/ggml-sycl/softmax.cpp # ggml/src/ggml-vulkan/CMakeLists.txt # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # scripts/pr2wt.sh # scripts/server-bench.py # scripts/snapdragon/windows/run-cli.ps1 # tests/test-alloc.cpp # tests/test-backend-ops.cpp # tests/test-chat.cpp # tools/cli/cli.cpp # tools/completion/README.md # tools/cvector-generator/cvector-generator.cpp # tools/imatrix/README.md # tools/perplexity/README.md # tools/server/public_simplechat/readme.md # tools/server/tests/README.md	2026-03-10 22:11:08 +08:00
Marcel Petrick	92f7da00b4	chore : correct typos [no ci] (#20041 ) * fix(docs): correct typos found during code review Non-functional changes only: - Fixed minor spelling mistakes in comments - Corrected typos in user-facing strings - No variables, logic, or functional code was modified. Signed-off-by: Marcel Petrick <mail@marcelpetrick.it> * Update docs/backend/CANN.md Co-authored-by: Aaron Teo <taronaeo@gmail.com> * Revert "Auxiliary commit to revert individual files from 846d1c301281178efbc6ce6060ad34c1ebe45af8" This reverts commit 02fcf0c7db661d5ff3eff96b2b2db9fdb7213256. * Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Signed-off-by: Marcel Petrick <mail@marcelpetrick.it> Co-authored-by: Aaron Teo <taronaeo@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-05 08:50:21 +01:00
Concedo	749a606374	whisper broke	2026-02-26 16:45:04 +08:00
Georgi Gerganov	418dea39ce	ggml/gguf : prevent integer overflows (#19856 ) * gguf : prevent integer overflow for ggml_context mem size * ggml : fix int overflows in ggml_new_object() * gguf : prevent string exhaustion * gguf : prevent array elements exhaustion * ggml : fix negative tensor type oob * py : assert that alignment is non-zero power of 2 * ggml : check int overflow in ggml_new_tensor_impl and ggml_new_object * gguf-py : error on duplicate keys when reading * py : restore tensor_fields * enforce proper alignment in add_custom_alignment * gguf : better name * gguf : fix ctx size for no_alloc == true * gguf : minor print fix * ggml : print values when overflow * ggml : remove deprecated ggml_type_sizef() * ggml : relax ggml_type asserts to debug-only * gguf : add mem_size overflow test * gguf : add file size check for arrays * ggml : relax asseerts for ggml_get_type_traits() * flake8 fix --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-02-24 20:17:11 +02:00
Concedo	b6bb9c914e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/winget.yml # CMakeLists.txt # common/CMakeLists.txt # examples/model-conversion/scripts/causal/run-org-model.py # ggml/src/ggml-cpu/CMakeLists.txt # tools/perplexity/perplexity.cpp # tools/server/CMakeLists.txt	2026-02-17 19:41:28 +08:00
Judd	d23a55997d	ggml : make `ggml_is_view` as API (#19539 ) * make `ggml_is_view` as API * introduce `ggml_aux_is_view` as inline version for internal use. * change `ggml_aux_is_view` to `ggml_impl_is_view`	2026-02-16 17:43:34 +02:00
Concedo	1f803ae27b	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/server.yml # CMakeLists.txt # cmake/common.cmake # ggml/src/ggml-virtgpu/apir_cs_ggml-rpc-front.cpp # ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp # ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer-type.cpp # ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp # ggml/src/ggml-virtgpu/backend/backend-dispatched-device.cpp # ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp # ggml/src/ggml-virtgpu/backend/backend-dispatched.gen.h # ggml/src/ggml-virtgpu/backend/backend-dispatched.h # ggml/src/ggml-virtgpu/backend/backend.cpp # ggml/src/ggml-virtgpu/backend/shared/apir_cs.h # ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h # ggml/src/ggml-virtgpu/ggml-backend-buffer-type.cpp # ggml/src/ggml-virtgpu/ggml-backend-device.cpp # ggml/src/ggml-virtgpu/ggml-backend-reg.cpp # ggml/src/ggml-virtgpu/ggml-remoting.h # ggml/src/ggml-virtgpu/ggmlremoting_functions.yaml # ggml/src/ggml-virtgpu/regenerate_remoting.py # ggml/src/ggml-virtgpu/virtgpu-forward-backend.cpp # ggml/src/ggml-virtgpu/virtgpu-forward-buffer-type.cpp # ggml/src/ggml-virtgpu/virtgpu-forward-buffer.cpp # ggml/src/ggml-virtgpu/virtgpu-forward-device.cpp # ggml/src/ggml-virtgpu/virtgpu-forward-impl.h # ggml/src/ggml-virtgpu/virtgpu-forward.gen.h # ggml/src/ggml-virtgpu/virtgpu-shm.cpp # ggml/src/ggml-virtgpu/virtgpu.cpp # ggml/src/ggml-virtgpu/virtgpu.h	2026-02-04 16:21:06 +08:00
Kevin Pouget	015deb9048	ggml-virtgpu: make the code thread safe (#19204 ) * ggml-virtgpu: regenerate_remoting.py: add the ability to deprecate a function * ggml-virtgpu: deprecate buffer_type is_host remoting not necessary * ggml-virtgpu: stop using static vars as cache The static init isn't thread safe. * ggml-virtgpu: protect the use of the shared memory to transfer data * ggml-virtgpu: make the remote calls thread-safe * ggml-virtgpu: backend: don't continue if couldn't allocate the tensor memory * ggml-virtgpu: add a cleanup function for consistency * ggml-virtgpu: backend: don't crash if buft->iface.get_max_size is missing * fix style and ordering * Remove the static variable in apir_device_get_count * ggml-virtgpu: improve the logging * fix review minor formatting changes	2026-02-04 10:46:18 +08:00
Concedo	7b393fa487	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # AUTHORS # ci/run.sh # docs/backend/SYCL.md # docs/build.md # docs/multimodal/minicpmo2.6.md # docs/multimodal/minicpmo4.0.md # docs/multimodal/minicpmv2.5.md # docs/multimodal/minicpmv2.6.md # docs/multimodal/minicpmv4.0.md # docs/multimodal/minicpmv4.5.md # docs/ops.md # docs/ops/SYCL.csv # docs/speculative.md # examples/deprecation-warning/README.md # examples/deprecation-warning/deprecation-warning.cpp # examples/model-conversion/Makefile # examples/model-conversion/scripts/causal/convert-model.sh # ggml/include/ggml-cann.h # ggml/src/ggml-cann/acl_tensor.cpp # ggml/src/ggml-cann/acl_tensor.h # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-metal/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/concat.cl # ggml/src/ggml-opencl/kernels/repeat.cl # ggml/src/ggml-opencl/kernels/scale.cl # ggml/src/ggml-opencl/kernels/tanh.cl # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/dpct/helper.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/outprod.cpp # ggml/src/ggml-sycl/rope.cpp # ggml/src/ggml-sycl/wkv.cpp # src/llama-vocab.cpp # tests/test-autorelease.cpp # tests/test-backend-ops.cpp # tools/cvector-generator/pca.hpp # tools/export-lora/export-lora.cpp # tools/perplexity/README.md	2026-02-03 19:00:42 +08:00
Aman Gupta	9f682fb640	ggml-cpu: FA split across kv for faster TG (#19209 ) * ggml-cpu: split across kv for faster TG * simplify sinks application * add ref impl	2026-02-03 01:19:55 +08:00
Christian Kastner	7a4ca3cbd9	docs : Minor cleanups (#19252 ) * Update old URLs to github.com/ggml-org/ * Bump copyrights	2026-02-02 08:38:55 +02:00
Concedo	46cd17c17e	Merge commit '`88d23ad515`' into concedo_experimental # Conflicts: # CODEOWNERS # docs/build.md # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-zendnn/CMakeLists.txt # tests/test-chat-template.cpp	2026-01-29 22:25:56 +08:00
Kevin Pouget	b7feacf7f3	ggml: new backend for Virglrenderer API Remoting acceleration (v2) (#18718 )	2026-01-28 17:49:40 +08:00
Concedo	5c6cc02985	remove clblast, part 2	2026-01-23 14:09:46 +08:00
Concedo	4984c9bc16	Merge commit '`12a4a47e6a`' into concedo_experimental # Conflicts: # ci/run.sh # examples/model-conversion/scripts/causal/run-converted-model-embeddings-logits.sh # examples/model-conversion/scripts/causal/run-converted-model.sh # examples/model-conversion/scripts/embedding/run-converted-model.sh # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-zdnn/ggml-zdnn.cpp # ggml/src/ggml-zendnn/ggml-zendnn.cpp # tests/CMakeLists.txt # tests/test-chat-parser.cpp # tests/test-chat-peg-parser.cpp # tests/test-chat.cpp # tools/cli/cli.cpp	2026-01-21 21:00:44 +08:00
Georgi Gerganov	365a3e8c31	ggml : add ggml_build_forward_select (#18550 ) * ggml : add ggml_build_forward_select * cuda : adapt CUDA graph compat to new feature * vulkan : update logic to handle command buffer closing * ggml : check compute for fusion * ggml : add comment	2026-01-19 20:03:19 +02:00
Concedo	0dc18c668c	Merge commit '`a61c8bc3bf`' into concedo_experimental # Conflicts: # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # scripts/pr2wt.sh # src/llama-model.cpp # tools/CMakeLists.txt # tools/mtmd/CMakeLists.txt # tools/mtmd/clip.cpp # tools/mtmd/clip.h	2026-01-13 23:06:50 +08:00
Masashi Yoshimura	480160d472	ggml-webgpu: Fix GGML_MEM_ALIGN to 8 for emscripten. (#18628 ) * Fix GGML_MEM_ALIGN to 8 for emscripten. * Add a comment explaining the need for GGML_MEM_ALIGN == 8 in 64-bit wasm with emscripten	2026-01-08 08:36:42 -08:00
Concedo	7e1ae49e7d	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-cuda/ggml-cuda.cu # tests/test-backend-ops.cpp # tools/mtmd/CMakeLists.txt	2026-01-02 11:05:20 +08:00
Jeff Bolz	be47fb9285	vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron (#18295 ) * vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron Also handle GGML_OP_SCALE at the end (nemotron, deepseek2). Fewer pipeline variants and spec constants, just use push constants. In test_topk_moe, change exp_probs_b to be 1D, matching real networks. Update test-backend-ops and ggml-backend to allow verifying multiple outputs in a fusion test (topk_moe has two outputs). Previously only the final node was verified. * change test_topk_moe to allow results in arbitrary order * disable sigmoid fusion for moltenvk	2026-01-01 08:58:27 +01:00
Concedo	c93c4c5505	Merge commit '`4a4f7e6550`' into concedo_experimental # Conflicts: # .github/ISSUE_TEMPLATE/011-bug-results.yml # CODEOWNERS # README.md # ci/run.sh # docs/development/HOWTO-add-model.md # grammars/README.md # src/llama-context.cpp # src/llama.cpp # tools/CMakeLists.txt # tools/completion/README.md # tools/llama-bench/README.md	2025-12-17 14:30:39 +08:00
Concedo	050a5b1f52	Merge commit '`4aced7a631`' into concedo_experimental # Conflicts: # .devops/cann.Dockerfile # .devops/cpu.Dockerfile # .devops/cuda.Dockerfile # .devops/intel.Dockerfile # .devops/musa.Dockerfile # .devops/rocm.Dockerfile # .devops/tools.sh # .devops/vulkan.Dockerfile # .github/workflows/build.yml # .github/workflows/release.yml # .gitignore # docs/ops.md # docs/ops/SYCL.csv # examples/batched/batched.cpp # examples/eval-callback/eval-callback.cpp # examples/gen-docs/gen-docs.cpp # examples/lookahead/lookahead.cpp # examples/lookup/lookup-create.cpp # examples/lookup/lookup-stats.cpp # examples/lookup/lookup.cpp # examples/model-conversion/scripts/causal/compare-logits.py # examples/model-conversion/scripts/causal/run-org-model.py # examples/model-conversion/scripts/utils/check-nmse.py # examples/parallel/parallel.cpp # examples/retrieval/retrieval.cpp # examples/save-load-state/save-load-state.cpp # examples/speculative-simple/speculative-simple.cpp # examples/speculative/speculative.cpp # examples/training/finetune.cpp # ggml/CMakeLists.txt # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cpu/repack.cpp # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/dequantize.hpp # ggml/src/ggml-sycl/dpct/helper.hpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/element_wise.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/pad.cpp # ggml/src/ggml-sycl/ssm_conv.cpp # ggml/src/ggml-sycl/vecdotq.hpp # pyrightconfig.json # scripts/sync-ggml.last # tests/test-arg-parser.cpp # tests/test-backend-ops.cpp # tools/cvector-generator/cvector-generator.cpp # tools/imatrix/imatrix.cpp # tools/mtmd/CMakeLists.txt # tools/mtmd/clip.cpp # tools/perplexity/perplexity.cpp # tools/server/README.md	2025-12-16 23:14:12 +08:00
Concedo	e88bf41fdc	Merge commit '`12280ae905`' into concedo_experimental # Conflicts: # .github/workflows/build.yml # common/CMakeLists.txt # docs/docker.md # examples/model-conversion/scripts/causal/compare-logits.py # ggml/src/ggml-hexagon/htp/rope-ops.c # tests/test-backend-ops.cpp # tests/test-barrier.cpp # tools/server/CMakeLists.txt # tools/server/README.md	2025-12-16 16:29:01 +08:00
Johannes Gäßler	b1f3a6e5db	llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653 ) * llama: automatically fit args to free memory llama-fit-params tool * fix CI * hints for bug reports, ensure no reallocation * fix segfault with Vulkan * add llama-fit-params to CI * fix CI * fix CI * fix CI * minor adjustments * fix assignment of 1 dense layer * fix logger not being reset on model load failure * remove --n-gpu-layer hint on model load failure * fix llama-fit-params verbosity * fix edge case * fix typo [no ci]	2025-12-15 09:24:59 +01:00
ixgbe	51604435e8	ggml-cpu : fix RISC-V Q4_0 repack select and RVV feature reporting (#17951 ) * ggml-cpu:fix RISC-V Q4_0 repack select and RVV feature reporting Signed-off-by: Wang Yang <yangwang@iscas.ac.cn> * using the name VLEN instead of CNT * Update ggml/include/ggml-cpu.h --------- Signed-off-by: Wang Yang <yangwang@iscas.ac.cn> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-12 16:26:03 +02:00
Georgi Gerganov	4dff236a52	ggml : remove GGML_KQ_MASK_PAD constant (#17910 ) * ggml : remove GGML_KQ_MASK_PAD constant * cont : remove comment	2025-12-10 20:53:16 +02:00
Concedo	17c0c8d55d	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # docs/backend/zDNN.md # docs/build.md # docs/ops.md # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-rpc/ggml-rpc.cpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # src/llama-quant.cpp # tests/test-backend-ops.cpp # tools/llama-bench/llama-bench.cpp # tools/server/README.md	2025-12-07 16:48:38 +08:00
Concedo	7c5d271d6c	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/release.yml # .github/workflows/winget.yml # CMakeLists.txt # CODEOWNERS # CONTRIBUTING.md # cmake/build-info.cmake # docs/ops.md # docs/ops/BLAS.csv # docs/ops/Metal.csv # examples/CMakeLists.txt # examples/save-load-state/save-load-state.cpp # examples/simple-cmake-pkg/README.md # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-rpc/ggml-rpc.cpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py # src/llama-quant.cpp # tests/test-backend-ops.cpp # tools/server/CMakeLists.txt	2025-12-07 16:37:32 +08:00
Vishal Singh	017761daf5	ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690 ) * ggml-zennn: add ZenDNN backend support * ggml-zendnn : address ZenDNN backend review fixes and suggestions * docs : apply blockquote syntax to ZenDNN docs --------- Co-authored-by: Manoj Kumar <mkumar@zettabolt.com>	2025-12-07 00:13:33 +08:00
Phylliida Dev	09c7c50e64	ggml : add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures) (#16985 ) * Feat: Added vulkan circular tiling support * Feat: Added cpu circular * Feat: Added cuda kernels * Added tests * Added tests * Removed non-pad operations * Removed unneded changes * removed backend non pad tests * Update test-backend-ops.cpp * Fixed comment on pad test * removed trailing whitespace * Removed unneded test in test-backend-ops * Removed removed test from calls * Update ggml/src/ggml-vulkan/vulkan-shaders/pad.comp Co-authored-by: Ruben Ortlam <picard12@live.de> * Fixed alignment * Formatting Co-authored-by: Aman Gupta <amangupta052@gmail.com> * Format pad * Format * Clang format * format * format * don't change so much stuff * clang format and update to bool * fix duplicates * don't need to fix the padding * make circular bool * duplicate again * rename vulkan to wrap around * Don't need indent * moved to const expr * removed unneded extra line break * More readable method calls * Minor wording changes * Added final newline * Update ggml/include/ggml.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/include/ggml.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Added circular pad ext tests * Gate non circular pad devices * Cleaned gating of non-circular pad devices --------- Co-authored-by: Phylliida <phylliidadev@gmail.com> Co-authored-by: Ruben Ortlam <picard12@live.de> Co-authored-by: Aman Gupta <amangupta052@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-06 15:07:02 +01:00
Concedo	eaf61c3fd7	dont use wmma for cuda anymore, fall back to tile kernel	2025-12-06 11:40:59 +08:00
Georgi Gerganov	8160b38a5f	rpc : fix alloc size logic (#17116 ) * rpc : fix alloc size logic * rpc : bump version	2025-12-05 19:39:04 +02:00
Adrien Gallouët	ef75a89fdb	build : move _WIN32_WINNT definition to headers (#17736 ) Previously, cmake was forcing `_WIN32_WINNT=0x0A00` for MinGW builds, This caused "macro redefined" warnings with toolchains that define the version. This also removes the `GGML_WIN_VER` variable as it is no longer needed. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-12-04 07:04:02 +01:00
Johannes Gäßler	2e1c9cd814	CUDA: generalized (mma) FA, add Volta support (#17505 ) * CUDA: generalized (mma) FA, add Volta support * use struct for MMA FA kernel config --------- Co-authored-by: Aman Gupta <aman>	2025-12-03 16:57:05 +01:00
Concedo	83269df91b	Merge commit '`649495c9d9`' into concedo_experimental # Conflicts: # CONTRIBUTING.md # SECURITY.md # docs/backend/SYCL.md # examples/sycl/run-llama2.sh # examples/sycl/run-llama3.sh # examples/sycl/win-run-llama2.bat # examples/sycl/win-run-llama3.bat # ggml/src/CMakeLists.txt # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/cpy.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # tests/test-backend-ops.cpp # tests/test-json-schema-to-grammar.cpp # tools/server/CMakeLists.txt	2025-12-03 18:43:46 +08:00
Tarek Dakhran	2ba719519d	model: LFM2-VL fixes (#17577 ) * Adjust to pytorch * Add antialiasing upscale * Increase number of patches to 1024 * Handle default marker insertion for LFM2 * Switch to flag * Reformat * Cuda implementation of antialias kernel * Change placement in ops.cpp * consistent float literals * Pad only for LFM2 * Address PR feedback * Rollback default marker placement changes * Fallback to CPU implementation for antialias implementation of upscale	2025-11-30 21:57:31 +01:00
Concedo	d2d05bd365	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-rpc/ggml-rpc.cpp	2025-11-28 18:45:43 +08:00
Radoslav Gerganov	15d2b46b4d	rpc : cache and reuse compute graphs (#15405 ) Store the last computed graph and reuse it when possible. Also do not return response from GRAPH_COMPUTE and assume it always completes successfully. If this this is not the case, the server closes the connection. This saves us a network round trip to the server.	2025-11-28 08:33:51 +00:00
Concedo	4497096cb0	Merge commit '`3e18dba9fd`' into concedo_experimental # Conflicts: # CODEOWNERS # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # scripts/sync_vendor.py # tests/test-backend-ops.cpp	2025-11-27 00:07:37 +08:00
Georgi Gerganov	583cb83416	ggml : add ggml_top_k (#17365 ) * ggml : add ggml_top_k * cont : add ggml_argsort_top_k * metal : add top_k support * ggml : cleanup * tests : add virtual err() function for test_case * ggml : add comments	2025-11-25 15:31:43 +02:00
LostRuins Concedo	3fe0e39b62	Merge commit '`4dca015b7e`' into concedo_experimental # Conflicts: # .github/copilot-instructions.md # README.md # docs/ops.md # docs/ops/CPU.csv # docs/ops/CUDA.csv # docs/ops/Vulkan.csv # ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp # src/CMakeLists.txt # tests/test-backend-ops.cpp	2025-11-16 18:33:58 +08:00
LostRuins Concedo	85060da3ce	rename ggml_cumsum for tts	2025-11-16 17:54:58 +08:00
Piotr Wilkin (ilintar)	389ac78b26	ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (#17063 ) * Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM * Update ggml/include/ggml.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update tests/test-backend-ops.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Code review * Whitespace * Update tests/test-backend-ops.cpp Co-authored-by: Diego Devesa <slarengh@gmail.com> * This is actually sigmoid, duh. * Add CONST, remove TRI_KEEP, other changes from review * Update tests/test-backend-ops.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-cuda/unary.cu Co-authored-by: Aman Gupta <amangupta052@gmail.com> * Remove extra script * Update ggml/src/ggml.c Co-authored-by: Diego Devesa <slarengh@gmail.com> * Update tests/test-backend-ops.cpp Co-authored-by: Diego Devesa <slarengh@gmail.com> * moving changes from laptop [no ci] * pre-rebase * Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Refactor tests * ggml : cleanup * cont : fix ggml_fill srcs * tests : add note * ggml : add ggml_fill_inplace * ggml : add asserts * ggml : fix ggml_fill constant cast * cont : ggml_tri minor * Use TENSOR_LOCALS * Fix regression from #14596, regenerate * Don't make commits at night... --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: Aman Gupta <amangupta052@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-11-13 20:54:47 +02:00
LostRuins Concedo	e6ca0aa8d0	Merge commit '`2f0c2db43e`' into concedo_experimental # Conflicts: # .github/labeler.yml # README.md # docs/backend/OPENCL.md # docs/ops.md # docs/ops/CUDA.csv # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/set_rows.tmpl.wgsl # scripts/sync-ggml.last # src/CMakeLists.txt # tools/server/README.md	2025-11-08 23:27:59 +08:00
Acly	cc98f8d349	ggml-cpu : bicubic interpolation (#16891 )	2025-11-04 13:12:20 +01:00
Concedo	2b00e55356	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/docker.yml # ggml/src/ggml-opencl/kernels/mul_mm_f16_f32_l4_lm.cl # ggml/src/ggml-opencl/kernels/mul_mm_f32_f32_l4_lm.cl # ggml/src/ggml-sycl/rope.cpp # ggml/src/ggml-webgpu/wgsl-shaders/rope.tmpl.wgsl # requirements/requirements-convert_legacy_llama.txt # tests/test-backend-ops.cpp # tests/test-rope.cpp # tools/server/README.md	2025-10-31 10:52:57 +08:00
JJJYmmm	d261223d24	model: add support for qwen3vl series (#16780 ) * support qwen3vl series. Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com> Co-authored-by: yairpatch <yairpatch@users.noreply.github.com> Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com> * bugfix: fix the arch check for qwen3vl-moe. * use build_ffn * optimize deepstack structure * optimize deepstack feature saving * Revert "optimize deepstack feature saving" for temporal fix This reverts commit f321b9fdf13e59527408152e73b1071e19a87e71. * code clean * use fused qkv in clip * clean up / rm is_deepstack_layers for simplification * add test model * move test model to "big" section * fix imrope check * remove trailing whitespace * fix rope fail * metal : add imrope support * add imrope support for sycl * vulkan: add imrope w/o check * fix vulkan * webgpu: add imrope w/o check * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix tensor mapping --------- Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com> Co-authored-by: yairpatch <yairpatch@users.noreply.github.com> Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-10-30 16:19:14 +01:00
Concedo	12a8bfd453	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CODEOWNERS # README.md # docs/ops.md # docs/ops/SYCL.csv # docs/ops/Vulkan.csv # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-sycl/backend.hpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/element_wise.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # tests/test-backend-ops.cpp # tests/test-thread-safety.cpp	2025-10-23 17:22:17 +08:00
Max Krasnyansky	63d2fc46e1	Add experimental ggml-hexagon backend for the Hexagon NPU (#16547 ) * model: add support for extra bufs for all devices * hexagon: add experimental ggml-hexagon backend for the Hexagon NPU This commit introduces a new experimental backend `ggml-hexagon` with support for the Hexagon NPU. Highlights: - Supports Hexagon versions: v73, v75, v79, and v81 - Targets Android devices based on Snapdragon SoCs: Gen3, 8-Elite, and 8-Elite Gen5 - Supports Q4_0, Q8_0, MXFP4, and FP32 data types - Implements core LLM ops: MUL_MAT/MUL_MAT_ID, ADD/SUB/MUL/ADD_ID, RMS_NORM, ROPE, GLU/SWIGLU, SOFTMAX Note: This backend is experimental and may exhibit instability or limited performance across supported devices. It is intended for early testing and feedback from llama.cpp/ggml developer and user community. Co-Authored-By: Rajdeep Ganguly <rganguly@qti.qualcomm.com> Co-Authored-By: Todor Boinovski <todorb@qti.qualcomm.com> * hexagon: fix format checker errors * hexagon: update readme and cmake presets * ci: add android-ndk-build jobs that build plain ARM64 and Snapdragon versions * hexagon: add simple graph optimizer for stacking MUL_MAT ops with the same input * hexagon: move ADB helper scripts into scripts/snapdragon/adb * hexagon: replace all f/printfs with GGML_LOG_... * readme: add hexagon to the list supported backends * hexagon: stack malmuts with quantized inputs only * hexagon: add TODO for fixing issues in hexagon_graph_optimize * hexagon: update to hex-sdk 6.4.0 and add scripts for running on QDC * scripts: fix lint errors * scripts: update qdc pytest script to make linter happy * hexagon: add reduce sum in fp32 * hexagon: reduce number of vector stores in matmul output * hexagon: remove the need for vdelta in reduce-multiply-x8 * hexagon: consistent use of reduce_sum_fp32 for row_sums * hexagon: some more matmul optimizations and comments Optimize cases where tensor dims are not multiple of 1024 (e.g in Qwen models). 
We've handled those cases already but at a higher overhead. * hexagon: update cmake presets * hexagon: add OPMASK support for run-bench.sh wrapper * hexagon: update to use GGML_BACKEND_API * hexagon: remove unused logic for setting tensor flags for the views * hexagon: add asserts to set/get_tensor to make sure we handle complete tensors Same asserts as the CPU backend. * hexagon: use cpy_tensor slow path for non-host buffers * hexagon: error checks in the buffer allocator * cmake: move include(extProj) under ggml-hexagon * hexagon: don't forget to delete the backend on free * hexagon: set/get_tensor size assert apply only to quantized tensors * hexagon: reintroduce HEX_VERBOSE wrapper for GGML_LOG_DEBUG for now GGML_LOG_DEBUG is always enabled for test-backend-ops and the output gets in the way. Ideally we need a bit more finer log levels. * docs: typos in hexagon developer docs (libggm-...) * hexagon: overhaul error handling in the session/device allocation this should handle all failure paths in the session allocation. * hexagon: update cmake presets to enable fp16 vectors * hexagon: remove unused time_usec function * hexagon: don't forget to release buffer contexts * hexagon: fixed indents in hvx-utils (missed clang-format auto-format failure) * hexagon: remove custom can_repeat function and use ggml_can_repeat --------- Co-authored-by: Rajdeep Ganguly <rganguly@qti.qualcomm.com> Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com>	2025-10-22 13:47:09 -07:00
Concedo	f47a0690ac	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/ops.md # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cvt.cl # ggml/src/ggml-rpc/ggml-rpc.cpp # tests/test-backend-ops.cpp # tests/test-grammar-integration.cpp # tools/rpc/rpc-server.cpp	2025-10-18 11:10:37 +08:00

290 commits