koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-09 19:46:11 +00:00

Author	SHA1	Message	Date
Concedo	fb13e3e51b	Merge branch 'upstream' into concedo_experimental # Conflicts: # src/llama-context.cpp # tests/test-backend-ops.cpp	2025-06-22 23:26:15 +08:00
Concedo	abc1d8ac25	better way of checking for avx2 support	2025-06-22 22:56:50 +08:00
uvos	af3373f1ad	HIP: enable vec fattn on RDNA4 (#14323)	2025-06-22 16:51:23 +02:00
yuiseki	5d5c066de8	mtmd : fix Pixtral OOM with large images by capping image_size to 1024 (#14326) Mistral Small 2506 models using Pixtral vision encoder were running out of GPU memory when processing images larger than 1024x1024 pixels due to exponential memory growth from unlimited image size. This fix applies the same 1024x1024 limit used by Qwen2VL models to prevent OOM issues while maintaining compatibility with existing models.	2025-06-22 14:44:57 +02:00
Concedo	52dcfe42d6	try auto selecting correct backend while checking intrinsics	2025-06-22 18:16:02 +08:00
Sigbjørn Skjæret	40bfa04c95	common : use std::string_view now that we target c++17 (#14319)	2025-06-22 08:37:43 +03:00
Aman Gupta	aa064b2eb7	CUDA: add mean operation (#14313) * CUDA: add mean operation * add back sum_rows_f32_cuda * Review: early exit if col!=0	2025-06-22 12:39:54 +08:00
Sigbjørn Skjæret	aa0ef5c578	gguf-py : fix Qwen3-Embedding eos token (#14314)	2025-06-21 18:12:05 +02:00
Concedo	72d467c6d5	vision is now working in ollama owui	2025-06-21 23:43:43 +08:00
Concedo	6039791adf	minor bugfixes	2025-06-21 18:41:28 +08:00
Concedo	45f589b78d	test gfx1200 again	2025-06-21 17:56:04 +08:00
Markus Tavenrath	bb16041cae	Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792) * Add support for VK_EXT_debug_utils to add labels to Vulkan objects. In step 1 compute pipelines are getting labeled. * remove #ifdef for debug utils and add queue marker.	2025-06-21 08:17:12 +02:00
Sigbjørn Skjæret	58cba76a9a	gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312)	2025-06-21 07:33:21 +02:00
Georgi Gerganov	67ae5312e2	metal : fix thread-safety (#14300) ggml-ci	2025-06-21 08:04:18 +03:00
Georgi Gerganov	692e3cdd0a	memory : rename interface to llama_memory_context_i (#14296) * memory : rename interface to llama_memory_context_i ggml-ci * cont : fix comments * cont : use "mctx" for referencing a memory context ggml-ci	2025-06-21 08:03:46 +03:00
Daniel Han	b23fa0b3f4	convert : fix Llama 4 conversion (#14311)	2025-06-21 06:32:01 +02:00
Concedo	65ff041827	added more perf stats	2025-06-21 12:12:28 +08:00
Concedo	ea21a9d749	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/build.md # scripts/sync-ggml.last	2025-06-21 10:40:07 +08:00
Wagner Bruna	08adfb53c9	Configurable VAE threshold limit (#1601 ) * add backend support for changing the VAE tiling threshold * trigger VAE tiling by image area instead of dimensions I've tested with GGML_VULKAN_MEMORY_DEBUG all resolutions with the same 768x768 area (even extremes like 64x9216), and many below that: all consistently allocate 6656 bytes per image pixel. As tiling is primarily useful to avoid excessive memory usage, it seems reasonable to enable VAE tiling based on area rather than maximum image side. However, as there is currently no user interface option to change it back to a lower value, it's best to maintain the default behavior for now. * replace the notile option with a configurable threshold This allows selecting a lower threshold value, reducing the peak memory usage. The legacy sdnotile parameter gets automatically converted to the new parameter, if it's the only one supplied. * simplify tiling checks, 768 default visible in launcher --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2025-06-21 10:14:57 +08:00
Concedo	caea52407a	fix photomaker crash	2025-06-21 10:11:39 +08:00
Concedo	684d71e058	add old convert tool	2025-06-21 08:40:04 +08:00
Georgi Gerganov	06cbedfca1	sync : ggml ggml-ci	2025-06-20 21:02:47 +03:00
Acly	b7147673f2	Add `ggml_roll` (ggml/1274) * ggml : add ggml_roll * use set/get_op_params & std::min	2025-06-20 21:02:47 +03:00
David Chiu	d860dd99a4	docs : fix the link to llama.h (#14293)	2025-06-20 19:43:35 +02:00
Concedo	ce58d1253f	fixed build and workflow	2025-06-21 00:56:27 +08:00
Concedo	4f2fcaa2ef	Merge branch 'upstream' into concedo_experimental # Conflicts: # ci/run.sh # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cpu/repack.cpp # ggml/src/ggml-sycl/binbcast.cpp # ggml/src/ggml-sycl/concat.cpp # ggml/src/ggml-sycl/conv.cpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/cpy.cpp # ggml/src/ggml-sycl/dmmv.cpp # ggml/src/ggml-sycl/dpct/helper.hpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/getrows.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/gla.cpp # ggml/src/ggml-sycl/im2col.cpp # ggml/src/ggml-sycl/mmq.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/norm.cpp # ggml/src/ggml-sycl/rope.cpp # ggml/src/ggml-sycl/softmax.cpp # ggml/src/ggml-sycl/tsembd.cpp # ggml/src/ggml-sycl/wkv.cpp # tests/test-backend-ops.cpp	2025-06-21 00:32:22 +08:00
Concedo	c16d672ce4	Merge commit '9230dbe2c7' into concedo_experimental # Conflicts: # ggml/src/ggml-cpu/CMakeLists.txt # src/llama-graph.cpp # tools/server/README.md	2025-06-21 00:01:29 +08:00
Concedo	b59b5dbbd1	Merge commit '456af35eb7' into concedo_experimental # Conflicts: # ggml/src/ggml-sycl/getrows.cpp # src/CMakeLists.txt # tools/llama-bench/llama-bench.cpp	2025-06-20 23:41:27 +08:00
Concedo	0ad95e8ea2	updated lite	2025-06-20 23:04:59 +08:00
Aman Gupta	c959f462a0	CUDA: add conv_2d_transpose (#14287) * CUDA: add conv_2d_transpose * remove direct include of cuda_fp16 * Review: add brackets for readability, remove ggml_set_param and add asserts	2025-06-20 22:48:24 +08:00
Sigbjørn Skjæret	22015b2092	lint : remove trailing whitepace (#14304)	2025-06-20 16:37:44 +02:00
Ruikai Peng	dd6e6d0b6a	vocab : prevent tokenizer overflow (#14301) * vocab : prevent stack overflow in tokenize * vocab : return error instead of aborting on oversized token count * vocab : INT32_MIN from llama_tokenize on overflow	2025-06-20 07:13:06 -07:00
Concedo	2ba7803b95	replace_instruct_placeholders is now default	2025-06-20 22:11:58 +08:00
henk717	9c27ccde50	RWKV World chat adapters (#1612)	2025-06-20 21:39:59 +08:00
Concedo	4e40f2aaf4	added photomaker face cloning	2025-06-20 21:33:36 +08:00
Nicolò Scipione	8308f98c7f	sycl: add usage of enqueue_functions extension (#14244 ) * Add header and namespace to use enqueue_functions extension * Convert submit and parallel_for to use new extension in convert.cpp * Convert submit and parallel_for to use extension in ggml-sycl.cpp * Convert submit and parallel_for to use extension in gla.cpp * Convert submit and parallel_for in mmq.cpp * Convert submit and parallel_for in mmvq.cpp * Convert submit and parallel_for in remaining files * Convert all simple parallel_for to nd_launch from enqueue_functions extension * Wrapping extension in general function Create a general function that enable the enqueue_functions extension if it is enable in the compiler, otherwise call the general SYCL function to launch kernels. --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com>	2025-06-20 15:07:21 +02:00
Christian Kastner	6369be0735	Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286 ) * Add PowerPC feature detection and scoring * ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC * ggml-cpu: Delay some initializations until function is called When using GGML_BACKEND_DL=ON, these initializations might use instructions that are not supported by the current CPU. --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-06-20 14:17:32 +02:00
Sigbjørn Skjæret	88fc854b4b	llama : improve sep token handling (#14272)	2025-06-20 14:04:09 +02:00
Diego Devesa	e28c1b93fd	cuda : synchronize graph capture and cublas handle destruction (#14288) Workarounds an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread	2025-06-20 13:57:36 +02:00
Georgi Gerganov	d27b3ca175	ggml : fix repack work size for mul_mat_id (#14292) ggml-ci	2025-06-20 11:19:15 +03:00
Charles Xu	9230dbe2c7	ggml: Update KleidiAI to v1.9.0 (#14277)	2025-06-20 10:51:01 +03:00
Georgi Gerganov	812939a9e9	model : more uniform output id handling (#14275) * model : more uniform output id handling ggml-ci * cont : revert n_outputs < n_tokens optimization ggml-ci * cont : fix out_ids initialization ggml-ci	2025-06-20 10:50:27 +03:00
Concedo	21881a861d	rename restrict square to sdclampedsoft	2025-06-20 15:39:55 +08:00
Georgi Gerganov	4c9fdfbe15	ubatch : new splitting logic (#14217) ggml-ci	2025-06-20 10:14:14 +03:00
Concedo	175c99081e	merged https://github.com/leejet/stable-diffusion.cpp/issues/588 to fix vae tiling, ref https://github.com/LostRuins/koboldcpp/issues/1603	2025-06-20 11:13:04 +08:00
Aman Gupta	9eaa51e7f0	CUDA: add conv_2d_dw (#14265) * CUDA: add conv_2d_dw * better naming * simplify using template * Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const	2025-06-20 09:50:24 +08:00
Diego Devesa	8f71d0f3e8	ggml-cpu : remove unnecesary arm feature detection (#14281) Support for Arm runtime feature detection has now been added to GGML_CPU_ALL_VARIANTS. This removes the old and not very functional code.	2025-06-19 21:24:14 +02:00
Concedo	b925bbfc6d	add simple api example	2025-06-19 23:05:28 +08:00
Concedo	771261f5be	updated sdui	2025-06-19 22:16:23 +08:00
Alex Trotta	381174bbda	gguf-py : make sentencepiece optional (#14200 ) * Make sentencepiece optional * Bump to 0.18.0 * Bump patch instead of minor Co-authored-by: compilade <git@compilade.net> --------- Co-authored-by: compilade <git@compilade.net>	2025-06-19 15:56:12 +02:00


8543 commits