koboldcpp

Mirror of https://github.com/LostRuins/koboldcpp.git, synced 2026-05-10 12:11:08 +00:00

Author	SHA1	Message	Date
Concedo	9809deed6a	updated docs	2025-06-14 16:56:28 +08:00
Concedo	238be98efa	Allow override config for gguf files when reloading in admin mode, updated lite, fixed typo (+1 squashed commits) Squashed commits: [fe14845cc] Allow override config for gguf files when reloading in admin mode, updated lite (+2 squashed commit) Squashed commit: [9ded66aa5] Allow override config for gguf files when reloading in admin mode [9597f6a34] update lite	2025-06-14 12:00:20 +08:00
Concedo	bfb47cbcd8	Revert "revert padding change for sd chroma" This reverts commit 7de88802f9.	2025-06-14 10:10:34 +08:00
Concedo	5f9e96e82d	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/intel.Dockerfile # CMakeLists.txt # README.md # common/CMakeLists.txt # docs/multimodal.md # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-metal/CMakeLists.txt # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/cpy.cpp # ggml/src/ggml-sycl/gemm.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # src/llama-context.cpp	2025-06-14 09:05:45 +08:00
Concedo	69e4a32ca2	Merge commit 'd4e0d95cf5' into concedo_experimental # Conflicts: # .github/workflows/build.yml # common/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-rpc/ggml-rpc.cpp # scripts/sync-ggml.last # tests/CMakeLists.txt	2025-06-14 01:58:53 +08:00
Concedo	33809c9e82	doing what i must because i can, after the mess that is https://github.com/ggml-org/llama.cpp/pull/13892 there is so much duplicate code in each cpu arch, i expect upstream will prune it eventually arch detection has no fallback if all the arches are not found, by right we should set GGML_CPU_GENERIC i should be relaxing its the weekend	2025-06-14 01:41:16 +08:00
Georgi Gerganov	fb85a288d7	vocab : fix build (#14175) ggml-ci	2025-06-13 20:03:05 +03:00
Svetlozar Georgiev	40643edb86	sycl: fix docker image (#14144)	2025-06-13 18:32:56 +02:00
Guy Goldenberg	3cfbbdb44e	Merge commit from fork * vocab : prevent integer overflow during load * Add static cast and GGML_ABORT --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-06-13 19:20:25 +03:00
Concedo	f50c793140	not working - refactoring	2025-06-14 00:03:21 +08:00
Georgi Gerganov	80709b70a2	batch : add LLAMA_BATCH_DEBUG environment variable (#14172) * batch : add LLAMA_BATCH_DEBUG environment variable ggml-ci * cont : improve seq_id display	2025-06-13 18:35:00 +03:00
Concedo	c494525b33	update deprecated apis	2025-06-13 22:21:15 +08:00
Concedo	4204f111f7	Merge commit '8f47e25f56' into concedo_experimental # Conflicts: # .github/labeler.yml # .github/workflows/build-linux-cross.yml # docs/backend/CANN.md # examples/batched.swift/Sources/main.swift # examples/embedding/embedding.cpp # examples/gritlm/gritlm.cpp # examples/llama.android/llama/src/main/cpp/llama-android.cpp # examples/llama.swiftui/llama.cpp.swift/LibLlama.swift # examples/lookahead/lookahead.cpp # examples/lookup/lookup.cpp # examples/parallel/parallel.cpp # examples/passkey/passkey.cpp # examples/retrieval/retrieval.cpp # examples/save-load-state/save-load-state.cpp # examples/simple-chat/simple-chat.cpp # examples/speculative-simple/speculative-simple.cpp # examples/speculative/speculative.cpp # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/cpy.cpp # ggml/src/ggml-sycl/dequantize.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/vecdotq.hpp # tools/batched-bench/batched-bench.cpp # tools/cvector-generator/cvector-generator.cpp # tools/imatrix/imatrix.cpp # tools/llama-bench/llama-bench.cpp # tools/perplexity/perplexity.cpp # tools/run/run.cpp	2025-06-13 22:05:03 +08:00
ddpasa	26ff3685bf	docs : Update multimodal.md (#14122) * Update multimodal.md * Update multimodal.md	2025-06-13 15:17:53 +02:00
Wagner Bruna	f6d2d1ce5c	configurable resolution limit (#1586) * refactor image gen configuration screen * make image size limit configurable * fix resolution limits and keep dimensions closer to the original ratio * use 0.0 for the configured default image size limit This prevents the current default value from being saved into the config files, in case we later decide to adopt a different value. * export image model version when loading * restore model-specific default image size limit * change the image area restriction to be specified by a square side * move image resolution limits down to the C++ level * Revert "export image model version when loading" This reverts commit fa65b23de3. * Linting Fixes: PY: - Inconsistent var name sd_restrict_square -> sd_restrict_square_var - GUI swap back to using absolute row numbers for now. - fstring fix - size_limit -> side_limit inconsistency C++: - roundup_64 standalone function - refactor sd_fix_resolution variable names for clarity - move "anti crashing" hard total megapixel limit always to be applied after soft total megapixel limit instead of conditionally only when sd_restrict_square is unset * allow unsafe resolutions if debugmode is on --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2025-06-13 20:05:20 +08:00
Georgi Gerganov	60c666347b	batch : rework llama_batch_allocr (#14153) * batch : rework llama_batch_allocr ggml-ci * cont : move validation inside class ggml-ci * cont : move output counting to class ggml-ci * cont : minor ggml-ci * batch : add TODOs ggml-ci	2025-06-13 13:47:55 +03:00
Reithan	f1c9db4174	fix-loss-of-destroyed-tokens-in-grammar-pre-pass (#1600)	2025-06-13 18:46:38 +08:00
Georgi Gerganov	b7cc7745e3	readme : remove survey link (#14168)	2025-06-13 11:55:44 +03:00
Christian Kastner	cc8d081879	cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) * cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT * cmake: Pass on LLAMA_BUILD_* to GGML_BUILD_*	2025-06-13 10:38:52 +02:00
Đinh Trọng Huy	d714dadb57	pooling : make cls_b and cls_out_b optional (#14165) Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>	2025-06-13 11:34:08 +03:00
Georgi Gerganov	ffad043973	server : fix SWA condition for full context reprocess (#14163) ggml-ci	2025-06-13 11:18:25 +03:00
Concedo	5bac0fb3d5	remove debug prints for now, they were kind of cluttered	2025-06-13 16:00:23 +08:00
Reithan	5af9138ebe	Improve GBNF performance by attempting culled grammar search first (#1597) * cull tokens with top_3k first before running grammar, fallback to unculled if none found * fix errors * fix improvement and test against concedo's GBNF * revert non-culling changes	2025-06-13 15:57:27 +08:00
Anton Mitkov	0889eba570	sycl: Adding additional cpy dbg print output (#14034)	2025-06-13 08:51:39 +01:00
Ewan Crawford	c61285e739	SYCL: Bump oneMath commit (#14152) Update oneMath commit to merged PR https://github.com/uxlfoundation/oneMath/pull/669 which adds SYCL-Graph support for recording CUDA BLAS commands. With this change the `MUL_MAT` tests now pass on DPC++ CUDA backends with SYCL-Graph enabled. Prior to this change, an error would be thrown. ``` $ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2 UR CUDA ERROR: Value: 700 Name: CUDA_ERROR_ILLEGAL_ADDRESS Description: an illegal memory access was encountered Function: operator() Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154 Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN) Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator() SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code! in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598 $HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf ptrace: Operation not permitted. No stack. The program is not being run. ```	2025-06-13 08:45:37 +01:00
Christian Kastner	09cf2c7c65	cmake : Improve build-info.cpp generation (#14156) * cmake: Simplify build-info.cpp generation The rebuild of build-info.cpp still gets triggered when .git/index gets changes. * cmake: generate build-info.cpp in build dir	2025-06-13 09:51:34 +03:00
Georgi Gerganov	c33fe8b8c4	vocab : prevent heap overflow when vocab is too small (#14145) ggml-ci	2025-06-13 08:03:54 +03:00
Anton Mitkov	ed52f3668e	sycl: Remove not needed copy f16->f32 for dnnl mul mat (#14125)	2025-06-12 15:15:11 +02:00
Georgi Gerganov	a681b4ba83	readme : remove project status link (#14149)	2025-06-12 14:43:09 +03:00
Concedo	1cbe716e45	allow setting maingpu	2025-06-12 17:53:43 +08:00
Concedo	7a688e07cd	remove gfx12 until amd wakes up	2025-06-12 16:52:55 +08:00
Georgi Gerganov	7d516443dd	server : re-enable SWA speculative decoding (#14131) ggml-ci	2025-06-12 11:51:38 +03:00
Georgi Gerganov	f6e1a7aa87	context : simplify output counting logic during decode (#14142) * batch : remove logits_all flag ggml-ci * context : simplify output counting logic during decode ggml-ci * cont : fix comments	2025-06-12 11:50:01 +03:00
Georgi Gerganov	c3ee46fab4	batch : remove logits_all flag (#14141) ggml-ci	2025-06-12 11:49:26 +03:00
Concedo	1970d8c9e8	uvos said it might work	2025-06-12 16:44:46 +08:00
Georgi Gerganov	e2c0b6e46a	cmake : handle whitespaces in path during metal build (#14126) * cmake : handle whitespaces in path during metal build ggml-ci * cont : proper fix ggml-ci --------- Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2025-06-12 10:14:24 +03:00
Georgi Gerganov	9596506965	kv-cache : fix split_equal handling in unified implementation (#14130) ggml-ci	2025-06-12 10:02:15 +03:00
compilade	a20b2b05bc	context : round n_tokens to next multiple of n_seqs when reserving (#14140) This fixes RWKV inference which otherwise failed when the worst case ubatch.n_seq_tokens rounded to 0.	2025-06-12 02:56:04 -04:00
bandoti	2e89f76b7a	common: fix issue with regex_escape routine on windows (#14133)	2025-06-11 17:19:44 -03:00
Christian Kastner	532802f938	Implement GGML_CPU_ALL_VARIANTS for ARM (#14080) * ggml-cpu: Factor out feature detection build from x86 * ggml-cpu: Add ARM feature detection and scoring This is analogous to cpu-feats-x86.cpp. However, to detect compile-time activation of features, we rely on GGML_USE_<FEAT> which need to be set in cmake, instead of GGML_<FEAT> that users would set for x86. This is because on ARM, users specify features with GGML_CPU_ARM_ARCH, rather than with individual flags. * ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM Like x86, however to pass around arch flags within cmake, we use GGML_INTERNAL_<FEAT> as we don't have GGML_<FEAT>. Some features are optional, so we may need to build multiple backends per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring function sort out which one can be used. * ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now The other platforms will need their own specific variants. This also fixes the bug that the variant-building branch was always being executed as the else-branch of GGML_NATIVE=OFF. The branch is moved to an elseif-branch which restores the previous behavior.	2025-06-11 21:07:44 +02:00
Sigbjørn Skjæret	d4e0d95cf5	chore : clean up relative source dir paths (#14128)	2025-06-11 19:04:23 +02:00
Sigbjørn Skjæret	cc66a7f78f	tests : add test-tokenizers-repo (#14017)	2025-06-11 17:16:32 +02:00
Jeff Bolz	bd248d4dc7	vulkan: Better thread-safety for command pools/buffers (#14116) This change moves the command pool/buffer tracking into a vk_command_pool structure. There are two instances per context (for compute+transfer) and two instances per device for operations that don't go through a context. This should prevent separate contexts from stomping on each other.	2025-06-11 09:48:52 -05:00
Aman	7781e5fe99	webui: Wrap long numbers instead of infinite horizontal scroll (#14062) * webui: Wrap long numbers instead of infinite horizontal scroll * Use tailwind class * update index.html.gz	2025-06-11 16:42:25 +02:00
Georgi Gerganov	89a184fa71	kv-cache : relax SWA masking condition (#14119) ggml-ci	2025-06-11 16:48:45 +03:00
Taylor	2baf07727f	server : pass default --keep argument (#14120)	2025-06-11 13:43:43 +03:00
Georgi Gerganov	7ae2932116	kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121)	2025-06-11 12:52:45 +03:00
Jeff Bolz	1f7d50b293	vulkan: Track descriptor pools/sets per-context (#14109) Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8) and move it to the vk_device. Move all the descriptor pool and set tracking to the context - none of it is specific to pipelines anymore. It has a single vector of pools and vector of sets, and a single counter to track requests and a single counter to track use.	2025-06-11 07:19:25 +02:00
lhez	4c763c8d1b	opencl: add `mul_mv_id_q4_0_f32_8x_flat` (#14003)	2025-06-10 16:55:58 -07:00
compilade	dad5c44398	kv-cache : avoid modifying recurrent cells when setting inputs (#13834) * kv-cache : avoid modifying recurrent cells when setting inputs * kv-cache : remove inp_s_mask It was replaced with equivalent and simpler functionality with rs_z (the first zeroed state) and the already-existing inp_s_copy. * kv-cache : fix non-consecutive token pos warning for recurrent models The problem was apparently caused by how the tail cells were swapped. * graph : simplify logic for recurrent state copies * kv-cache : use cell without src refs for rs_z in recurrent cache * llama-graph : fix recurrent state copy The `state_copy` shuffle assumes everything is moved at once, which is not true when `states_extra` is copied back to the cache before copying the range of states between `head` and `head + n_seqs`. This is only a problem if any of the cells in [`head`, `head + n_seqs`) have an `src` in [`head + n_seqs`, `head + n_kv`), which does happen when `n_ubatch > 1` in the `llama-parallel` example. Changing the order of the operations avoids the potential overwrite before use, although when copies are avoided (like with Mamba2), this will require further changes. * llama-graph : rename n_state to state_size in build_recurrent_state This naming should reduce confusion between the state size and the number of states.	2025-06-10 18:20:14 -04:00

8432 commits