koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-11 04:51:25 +00:00

Author	SHA1	Message	Date
Neo Zhang Jianyu	98bd9ab1e4	enhance argsort for UT (#17573 ) Co-authored-by: Neo Zhang <zhang.jianyu@outlook.com>	2025-12-02 08:56:46 +08:00
Georgi Gerganov	649495c9d9	metal : add FA head size 48 (#17619 )	2025-12-01 12:49:53 +02:00
Georgi Gerganov	90c72a614a	ggml : extend the GGML_SCHED_NO_REALLOC debug logic of the scheduler (#17617 )	2025-12-01 12:49:33 +02:00
Aman Gupta	6eea666912	llama-graph: avoid expand_forward for fusion (#17633 )	2025-12-01 11:12:48 +02:00
Tarek Dakhran	2ba719519d	model: LFM2-VL fixes (#17577 ) * Adjust to pytorch * Add antialiasing upscale * Increase number of patches to 1024 * Handle default marker insertion for LFM2 * Switch to flag * Reformat * Cuda implementation of antialias kernel * Change placement in ops.cpp * consistent float literals * Pad only for LFM2 * Address PR feedback * Rollback default marker placement changes * Fallback to CPU implementation for antialias implementation of upscale	2025-11-30 21:57:31 +01:00
Concedo	addea2b62a	Merge commit '`47a268ea50`' into concedo_experimental # Conflicts: # ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp # tests/test-backend-ops.cpp	2025-11-30 17:28:07 +08:00
Concedo	95be49ac19	sync write_output_files function	2025-11-30 17:21:38 +08:00
Concedo	ef992b4ab7	cleanup	2025-11-30 15:55:58 +08:00
Concedo	bf5efcf86d	Merge commit '`d82b7a7c1d`' into concedo_experimental # Conflicts: # ci/run.sh # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-cuda/common.cuh # tests/CMakeLists.txt	2025-11-30 15:43:11 +08:00
Gilad S.	fa0465954f	ggml: fix: macOS build with `-DGGML_BACKEND_DL=ON` (#17581 )	2025-11-30 10:00:59 +08:00
Aman Gupta	c7af376c29	CUDA: add stream-based concurrency (#16991 ) * CUDA: add stream-based concurrency * HIP: fix hipStreamWaitEvent define and nodiscard warnings * ggml-cuda: fix fusion inside stream * ggml-cuda: fix bug w.r.t first stream launch * ggml-cuda: format * ggml-cuda: improve assert message * ggml-cuda: use lambda instead of duplicating code * ggml-cuda: add some more comments * ggml-cuda: add more detailed comments about concurrency * ggml-cuda: rename + remove unused var * ggml-cuda: fix condition for stream launch * ggml-cuda: address review comments, add destructor * common.cuh: add is_valid for concurrent events * common.cuh: make comment better * update comment Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * update comment Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * common.cuh: fix lower_bound condition + remove join_node data from write_ranges * ggml-cuda: fix overlap condition + shadowing parameter --------- Co-authored-by: Carl Philipp Klemm <carl@uvos.xyz> Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-11-30 08:17:55 +08:00
Mahekk Shaikh	00425e2ed1	cuda : add error checking for cudaMemcpyAsync in argsort (#17599 ) * cuda : add error checking for cudaMemcpyAsync in argsort (#12836) * fix indentation	2025-11-30 08:16:28 +08:00
Acly	385c3da5e6	vulkan : fix FA mask load with bounds check (coopmat2) (#17606 )	2025-11-30 01:03:21 +01:00
Neo Zhang	7d2add51d8	sycl : support to malloc memory on device more than 4GB, update the doc and script (#17566 ) Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>	2025-11-29 14:59:44 +02:00
ixgbe	f698a79c63	ggml: replace hwcap with riscv_hwprobe for RVV detection (#17567 ) Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>	2025-11-29 14:56:31 +02:00
Ruben Ortlam	47a268ea50	Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (#16900 ) * vulkan: split mul_mmq_funcs for mul_mat_vecq use * add mxfp4 mmvq * add q2_k mmvq * add q3_k mmvq * add q4_k and q5_k mmvq * add q6_k mmvq * handle 4x4 quants per mmvq thread * enable MUL_MAT_ID mmvq support * enable subgroup optimizations for mul_mat_vec_id shaders * device tuning * request prealloc_y sync after quantization * fix indentation * fix llvmpipe test failures * fix mul_mat_id mmvq condition * fix unused variable warning	2025-11-29 09:37:22 +01:00
Jeff Bolz	59d8d4e963	vulkan: improve topk perf for large k, fix overflow in unit tests (#17582 )	2025-11-29 08:39:57 +01:00
Diego Devesa	e072b2052e	ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276 ) * ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched Enabled in ggml-ci for testing. * llama : update worst-case graph for unified cache * ci : disable op offload in some tests * fix spelling --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-28 17:33:23 +02:00
Concedo	0ccb298087	Merge commit '`ddf9f94389`' into concedo_experimental # Conflicts: # examples/model-conversion/scripts/causal/run-converted-model.sh # examples/model-conversion/scripts/causal/run-org-model.py # src/CMakeLists.txt # src/llama-quant.cpp # tools/server/README.md	2025-11-28 23:27:50 +08:00
R0CKSTAR	c6f7a423c8	[MUSA] enable fp16/fast_fp16/bf16_mma on PH1 (#17551 ) * [MUSA] enable fp16/fast_fp16/bf16_mma on PH1 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Update ggml/src/ggml-cuda/fattn-vec.cuh Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Update ggml/src/ggml-cuda/fattn-vec.cuh Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Update ggml/src/ggml-cuda/fattn-tile.cuh Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Address review comments Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-11-28 14:08:29 +01:00
Aman Gupta	2e7ef98f18	ggml-cuda: add stricter checking for fusion (#17568 ) * ggml-cuda: make conditions for fusion more explicit * ggml-cuda: remove size check as std::equal already does it	2025-11-28 20:34:51 +08:00
Piotr Wilkin (ilintar)	ff55414c42	model : Qwen3 Next (#16095 ) * Qwen3 Next - cleaned up version * Whitespaces and stuff * Correct minor errors * Update src/llama-model.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Misc. fixes. * Clean up code, add missing hybrid qualifier * Did someone transpose the SOLVE_TRI result matrix? Perhaps... * Whitespace * Proper tensors for cb calls * Use llama-graph.h vertical alignment * BROKEN: chunking * Set new tensors as inputs. * Proper chunk logic * It's the circle of life... * More shenanigans for n_seq > 1 * Nail in the coffin? * Fix Windows build * Eh, one fails on Windows, the other fails on Mac... just use general capture. * quant : cleanup * model : cleanup * qwen3 : cleanup * cont : cleanup * cont : cleanup * ggml : revert change * qwen3 : cleanup * cont : cleanup * Readd cmath * qwen3 : fix typo * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Usual suspects * fix my bad suggestion --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-28 12:02:56 +01:00
Concedo	d2d05bd365	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-rpc/ggml-rpc.cpp	2025-11-28 18:45:43 +08:00
Johannes Gäßler	73955f7d2a	CUDA: no FP16 arithmetic for vector FA kernel (#17558 )	2025-11-28 10:29:09 +01:00
Jeff Bolz	35cf8887e1	vulkan: Implement GGML_OP_TRI (#17503 ) * vulkan: Implement GGML_OP_TRI * check types match	2025-11-28 10:07:29 +01:00
Radoslav Gerganov	15d2b46b4d	rpc : cache and reuse compute graphs (#15405 ) Store the last computed graph and reuse it when possible. Also do not return response from GRAPH_COMPUTE and assume it always completes successfully. If this is not the case, the server closes the connection. This saves us a network round trip to the server.	2025-11-28 08:33:51 +00:00
yulo	6bca76ff5e	HIP: enable mul_mat_f for RDNA4 (#17437 ) * enable mmf for rdna4 * move some mmvf to mmf * revert lds128 for wmma loading * Revert "revert lds128 for wmma loading" This reverts commit db9ae8b6b4738a5def5b393caa1611d52133e9b5. * Revert "enable mmf for rdna4" This reverts commit 698c9f24187b990e35c3b73a8067e5387e6ddbd4. * Revert "move some mmvf to mmf" This reverts commit 99b92bd6653cc8593607f641e44606391691792f. * enable mul_mat for rdna4 --------- Co-authored-by: zhang hui <you@example.com>	2025-11-28 08:24:30 +01:00
Concedo	6aa79513a9	Merge branch 'cuda-fa-vec-fix-overflow-2' into concedo_experimental	2025-11-28 13:27:16 +08:00
Concedo	eda4a312cb	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/vulkan.Dockerfile # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-sycl/common.hpp # tests/test-backend-ops.cpp # tools/server/README.md	2025-11-28 13:22:02 +08:00
Concedo	e570478275	limit cuda arches + scale tweaks	2025-11-28 13:05:11 +08:00
Piotr Wilkin (ilintar)	cd0e3a7a3b	SOLVE_TRI CUDA kernel for small matrices (#17457 )	2025-11-28 12:15:32 +08:00
Neo Zhang Jianyu	efaaccdd69	refactor pad_reflect_1d to make the UT case pass (#17204 ) Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com>	2025-11-28 08:50:56 +08:00
Johannes Gäßler	b13fcf85c5	CUDA: no FP16 arithmetic for vector FA kernel	2025-11-27 21:13:46 +01:00
Jeff Bolz	4abef75f2c	vulkan: Implement SOLVE_TRI (#17486 ) * vulkan: Implement SOLVE_TRI * load B matrix through shared memory * use FLOAT_TYPE	2025-11-27 15:48:00 +01:00
matt23654	909072abcf	cuda : fix UMA detection on discrete GPUs. (#17537 )	2025-11-27 13:35:35 +02:00
Alberto Cabrera Pérez	cd8370b408	ggml-cpu: aarm64: q4_K repack gemm and gemv implementations (dotprod only) (#17494 ) * Enabled q4_K_4x8 path * Fixed generic Q4_K 8x4 implementation * wip: dotprod gemm * Working arm q4_K dotprod gemm Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * Undo acc rename Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * Q4_K arm dotprod gemm Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * Fix: q4_qs reinterpret from uint to int Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * Removed comments * Fixed macro guards * Fixed unused vars in generic implementation * Fixed unused vars in 8x4 repack * Fixed unused vars in generic implementation, unneeded comment * Missing arch fallback for x86 * minor : style --------- Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-27 13:25:14 +02:00
Acly	b78db3bd50	vulkan : move contiguous checks to device_supports_op (#17490 ) * vulkan : remove op_supports_incontiguous and add missing constraints in device_supports_op * im2col: remove constraints on src0 (kernel input)	2025-11-27 06:54:19 +01:00
Jeff Bolz	142df17c9c	vulkan: use a fixed 1KB buffer for the add_rms_fusion opt (#17514 )	2025-11-27 06:32:30 +01:00
lhez	7cba58bbea	opencl: add sqr, sqrt, mean and ssm_conv (#17476 ) * opencl: add sqr * opencl: add sqrt * opencl: add mean * opencl: add ssm_conv * opencl: add missing cl_khr_fp16 * opencl: do sqrt in f32 then convert to f16 for better precision	2025-11-26 13:29:58 -08:00
Alberto Cabrera Pérez	5449367b21	Fix chunks being too small with small matrix sizes (#17526 )	2025-11-26 13:14:54 -08:00
Concedo	d7c2f27749	try to fix some fattn inconsistencies	2025-11-27 01:55:26 +08:00
Concedo	e6ad29341b	disable FA for clip test	2025-11-27 01:02:19 +08:00
Concedo	4497096cb0	Merge commit '`3e18dba9fd`' into concedo_experimental # Conflicts: # CODEOWNERS # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # scripts/sync_vendor.py # tests/test-backend-ops.cpp	2025-11-27 00:07:37 +08:00
Jeff Bolz	eec1e33a9e	vulkan: allow graph_optimize for prompt processing workloads (#17475 )	2025-11-26 16:46:33 +01:00
Jeff Bolz	879d673759	vulkan: Implement top-k (#17418 ) * vulkan: Implement top-k Each pass launches workgroups that each sort 2^N elements (where N is usually 7-10) and discards all but the top K. Repeat until only K are left. And there's a fast path when K==1 to just find the max value rather than sorting. * fix pipeline selection * vulkan: Add N-ary search algorithm for topk * microoptimizations	2025-11-26 16:45:43 +01:00
Concedo	5fe1d51c24	fix gpt oss	2025-11-26 23:44:56 +08:00
xctan	6ab4e50d9c	ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 (#17448 ) * ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 * ggml-cpu : dedup scalar impl * Update ggml/src/ggml-cpu/vec.h --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-26 15:33:05 +02:00
Adrien Gallouët	e6923caaec	ggml : fix ARM feature verification (#17519 ) On arm64 with `cmake` version 3.31.6, the final feature verification fails: -- ARM detected flags: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs -- Performing Test GGML_MACHINE_SUPPORTS_dotprod -- Performing Test GGML_MACHINE_SUPPORTS_dotprod - Success -- Performing Test GGML_MACHINE_SUPPORTS_i8mm -- Performing Test GGML_MACHINE_SUPPORTS_i8mm - Success -- Performing Test GGML_MACHINE_SUPPORTS_sve -- Performing Test GGML_MACHINE_SUPPORTS_sve - Success -- Performing Test GGML_MACHINE_SUPPORTS_sme -- Performing Test GGML_MACHINE_SUPPORTS_sme - Failed -- Performing Test GGML_MACHINE_SUPPORTS_nosme -- Performing Test GGML_MACHINE_SUPPORTS_nosme - Success -- Checking for ARM features using flags: -- -U__ARM_FEATURE_SME -- -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme -- Performing Test HAVE_DOTPROD -- Performing Test HAVE_DOTPROD - Failed -- Performing Test HAVE_SVE -- Performing Test HAVE_SVE - Failed -- Performing Test HAVE_MATMUL_INT8 -- Performing Test HAVE_MATMUL_INT8 - Failed -- Performing Test HAVE_FMA -- Performing Test HAVE_FMA - Success -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC - Failed -- Performing Test HAVE_SME -- Performing Test HAVE_SME - Failed -- Adding CPU backend variant ggml-cpu: -U__ARM_FEATURE_SME;-mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme We need to explicitly replace `;` with spaces from the list to make `CMAKE_REQUIRED_FLAGS` work correctly... Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-11-26 15:14:41 +02:00
Jiacheng (Jason) Chen	3e18dba9fd	HIP: Patch failed testcase in WMMA-MMQ kernels for RDNA 4 (#17502 ) * patch failed test case MUL_MAT(type_a=q4_0,type_b=f32,m=576,n=512,k=576,bs=[1,1],nr=[1,1],per=[0,1,2,3],k_v=0,o=1) for enabling WMMA on RDNA4 * Quick clean up on mma.cuh to add ggml_cuda_memcpy_1 back in for half2 and bfloat162	2025-11-26 11:18:48 +01:00
hipudding	eeb5605de2	CANN: Add MROPE and IMROPE support (#17401 ) * CANN: ROPE supports both MROPE and IMROPE. 1. Optimize the caching logic of rope_cache_init. 2. Add support for mRoPE and i-mRoPE. Note that on Ascend 910B devices, it is necessary to disable FA in CLIP and disable NZ-format conversion. These two issues are still under investigation. * Resolve review comments	2025-11-26 16:44:19 +08:00

2142 commits