koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-10 04:00:53 +00:00

Author	SHA1	Message	Date
Concedo	69e4a32ca2	Merge commit '`d4e0d95cf5`' into concedo_experimental # Conflicts: # .github/workflows/build.yml # common/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-rpc/ggml-rpc.cpp # scripts/sync-ggml.last # tests/CMakeLists.txt	2025-06-14 01:58:53 +08:00
Concedo	33809c9e82	doing what i must because i can, after the mess that is https://github.com/ggml-org/llama.cpp/pull/13892 there is so much duplicate code in each cpu arch, i expect upstream will prune it eventually arch detection has no fallback if all the arches are not found, by right we should set GGML_CPU_GENERIC i should be relaxing its the weekend	2025-06-14 01:41:16 +08:00
Concedo	f50c793140	not working - refactoring	2025-06-14 00:03:21 +08:00
Concedo	c494525b33	update deprecated apis	2025-06-13 22:21:15 +08:00
Concedo	4204f111f7	Merge commit '`8f47e25f56`' into concedo_experimental # Conflicts: # .github/labeler.yml # .github/workflows/build-linux-cross.yml # docs/backend/CANN.md # examples/batched.swift/Sources/main.swift # examples/embedding/embedding.cpp # examples/gritlm/gritlm.cpp # examples/llama.android/llama/src/main/cpp/llama-android.cpp # examples/llama.swiftui/llama.cpp.swift/LibLlama.swift # examples/lookahead/lookahead.cpp # examples/lookup/lookup.cpp # examples/parallel/parallel.cpp # examples/passkey/passkey.cpp # examples/retrieval/retrieval.cpp # examples/save-load-state/save-load-state.cpp # examples/simple-chat/simple-chat.cpp # examples/speculative-simple/speculative-simple.cpp # examples/speculative/speculative.cpp # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/cpy.cpp # ggml/src/ggml-sycl/dequantize.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/vecdotq.hpp # tools/batched-bench/batched-bench.cpp # tools/cvector-generator/cvector-generator.cpp # tools/imatrix/imatrix.cpp # tools/llama-bench/llama-bench.cpp # tools/perplexity/perplexity.cpp # tools/run/run.cpp	2025-06-13 22:05:03 +08:00
Wagner Bruna	f6d2d1ce5c	configurable resolution limit (#1586 ) * refactor image gen configuration screen * make image size limit configurable * fix resolution limits and keep dimensions closer to the original ratio * use 0.0 for the configured default image size limit This prevents the current default value from being saved into the config files, in case we later decide to adopt a different value. * export image model version when loading * restore model-specific default image size limit * change the image area restriction to be specified by a square side * move image resolution limits down to the C++ level * Revert "export image model version when loading" This reverts commit `fa65b23de3`. * Linting Fixes: PY: - Inconsistent var name sd_restrict_square -> sd_restrict_square_var - GUI swap back to using absolute row numbers for now. - fstring fix - size_limit -> side_limit inconsistency C++: - roundup_64 standalone function - refactor sd_fix_resolution variable names for clarity - move "anti crashing" hard total megapixel limit always to be applied after soft total megapixel limit instead of conditionally only when sd_restrict_square is unset * allow unsafe resolutions if debugmode is on --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2025-06-13 20:05:20 +08:00
Reithan	f1c9db4174	fix-loss-of-destroyed-tokens-in-grammar-pre-pass (#1600 )	2025-06-13 18:46:38 +08:00
Concedo	5bac0fb3d5	remove debug prints for now, they were kind of cluttered	2025-06-13 16:00:23 +08:00
Reithan	5af9138ebe	Improve GNBF performance by attempting culled grammar search first (#1597 ) * cull tokens with top_3k first before running grammar, fallback to unculled if none found * fix errors * fix improvement and test against concedo's GBNF * revert non-culling changes	2025-06-13 15:57:27 +08:00
Concedo	1cbe716e45	allow setting maingpu	2025-06-12 17:53:43 +08:00
Concedo	7a688e07cd	remove gfx12 until amd wakes up	2025-06-12 16:52:55 +08:00
Concedo	1970d8c9e8	uvos said it might work	2025-06-12 16:44:46 +08:00
Sigbjørn Skjæret	d4e0d95cf5	chore : clean up relative source dir paths (#14128 )	2025-06-11 19:04:23 +02:00
Sigbjørn Skjæret	cc66a7f78f	tests : add test-tokenizers-repo (#14017 )	2025-06-11 17:16:32 +02:00
Jeff Bolz	bd248d4dc7	vulkan: Better thread-safety for command pools/buffers (#14116 ) This change moves the command pool/buffer tracking into a vk_command_pool structure. There are two instances per context (for compute+transfer) and two instances per device for operations that don't go through a context. This should prevent separate contexts from stomping on each other.	2025-06-11 09:48:52 -05:00
Aman	7781e5fe99	webui: Wrap long numbers instead of infinite horizontal scroll (#14062 ) * webui: Wrap long numbers instead of infinite horizontal scroll * Use tailwind class * update index.html.gz	2025-06-11 16:42:25 +02:00
Georgi Gerganov	89a184fa71	kv-cache : relax SWA masking condition (#14119 ) ggml-ci	2025-06-11 16:48:45 +03:00
Taylor	2baf07727f	server : pass default --keep argument (#14120 )	2025-06-11 13:43:43 +03:00
Georgi Gerganov	7ae2932116	kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121 )	2025-06-11 12:52:45 +03:00
Jeff Bolz	1f7d50b293	vulkan: Track descriptor pools/sets per-context (#14109 ) Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8) and move it to the vk_device. Move all the descriptor pool and set tracking to the context - none of it is specific to pipelines anymore. It has a single vector of pools and vector of sets, and a single counter to track requests and a single counter to track use.	2025-06-11 07:19:25 +02:00
lhez	4c763c8d1b	opencl: add `mul_mv_id_q4_0_f32_8x_flat` (#14003 )	2025-06-10 16:55:58 -07:00
compilade	dad5c44398	kv-cache : avoid modifying recurrent cells when setting inputs (#13834 ) * kv-cache : avoid modifying recurrent cells when setting inputs * kv-cache : remove inp_s_mask It was replaced with equivalent and simpler functionality with rs_z (the first zeroed state) and the already-existing inp_s_copy. * kv-cache : fix non-consecutive token pos warning for recurrent models The problem was apparently caused by how the tail cells were swapped. * graph : simplify logic for recurrent state copies * kv-cache : use cell without src refs for rs_z in recurrent cache * llama-graph : fix recurrent state copy The `state_copy` shuffle assumes everything is moved at once, which is not true when `states_extra` is copied back to the cache before copying the range of states between `head` and `head + n_seqs`. This is only a problem if any of the cells in [`head`, `head + n_seqs`) have an `src` in [`head + n_seqs`, `head + n_kv`), which does happen when `n_ubatch > 1` in the `llama-parallel` example. Changing the order of the operations avoids the potential overwrite before use, although when copies are avoided (like with Mamba2), this will require further changes. * llama-graph : rename n_state to state_size in build_recurrent_state This naming should reduce confusion between the state size and the number of states.	2025-06-10 18:20:14 -04:00
Sigbjørn Skjæret	55f6b9fa65	convert : fix duplicate key DeepSeek-R1 conversion error (#14103 )	2025-06-10 23:29:52 +02:00
Concedo	5cdb2d3fc6	cleanup	2025-06-11 01:35:40 +08:00
Sigbjørn Skjæret	3678b838bb	llama : support GEGLU for jina-bert-v2 (#14090 )	2025-06-10 18:02:08 +02:00
Jeff Bolz	652b70e667	vulkan: force device 0 in CI (#14106 )	2025-06-10 10:53:47 -05:00
Juk Armstrong	3a12db23b6	Fixed spec timings to: accepted/tested instead of accepted/drafted (#14104 )	2025-06-10 16:48:07 +01:00
Georgi Gerganov	ae92c1855b	sync : ggml ggml-ci	2025-06-10 18:39:33 +03:00
Georgi Gerganov	b7ce1ad1e3	ggml : fix weak alias win32 (whisper/0) ggml-ci	2025-06-10 18:39:33 +03:00
henk717	f151648f03	Pyinstaller launcher and dependency updates This PR adds a new launcher executable to the unpack feature, eliminating the need to have python and its dependencies in the unpacked version. It also does a few dependency changes to help future proof.	2025-06-10 23:08:02 +08:00
Concedo	8386546e08	Switched VS2019 for revert cu12.1 build, hopefully solves dll issues try change order (+3 squashed commit) Squashed commit: [457f02507] try newer jimver [`64af28862`] windows pyinstaller shim. the final loader will be moved into the packed directory later. [`0272ecf2d`] try alternative way of getting cuda toolkit 12.4 since jimver wont work, also fix rocm try again (+3 squashed commit) Squashed commit: [133e81633] try without pwsh [4d99cefba] try without pwsh [bdfa91e7d] try alternative way of getting cuda toolkit 12.4, also fix rocm	2025-06-10 23:08:02 +08:00
0cc4m	97340b4c99	Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (#14099 )	2025-06-10 13:01:33 +01:00
Isaac McFadyen	2bb0467043	rpc : nicer error messages for RPC server crash (#14076 )	2025-06-10 09:41:01 +03:00
Georgi Gerganov	b8e2194efc	sync : ggml ggml-ci	2025-06-10 09:21:56 +03:00
Kai Pastor	1a3b5e80f7	Add in-build ggml::ggml ALIAS library (ggml/1260) Enable uniform linking with subproject and with find_package.	2025-06-10 09:21:56 +03:00
Georgi Gerganov	1f63e75f3b	metal : use less stack memory in FA kernel (#14088 ) * metal : use less stack memory in FA kernel ggml-ci * cont : fix BF16 variant	2025-06-09 23:05:02 +03:00
Georgi Gerganov	40cbf571c9	kv-cache : fix shift and defrag logic (#14081 ) * kv-cache : fix shift ggml-ci * cont : reset shift[i] ggml-ci * cont : fix defrag erasing cells that didn't move ggml-ci	2025-06-09 23:04:35 +03:00
Diego Devesa	7f4fbe5183	llama : allow building all tests on windows when not using shared libs (#13980 ) * llama : allow building all tests on windows when not using shared libraries * add static windows build to ci * tests : enable debug logs for test-chat --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-06-09 20:03:09 +02:00
Concedo	28b35ca879	allow wmma flag for rocm	2025-06-10 01:23:48 +08:00
Concedo	7d8aa31f1f	fixed embeddings, added new parameter to limit max embeddings context	2025-06-10 01:11:55 +08:00
xctan	f470bc36be	ggml-cpu : split arch-specific implementations (#13892 ) * move ggml-cpu-aarch64 to repack * split quantize_row_q8_0/1 * split helper functions * split ggml_vec_dot_q4_0_q8_0 * split ggml_vec_dot_q4_1_q8_1 * split ggml_vec_dot_q5_0_q8_0 * split ggml_vec_dot_q5_1_q8_1 * split ggml_vec_dot_q8_0_q8_0 * split ggml_vec_dot_tq1_0_q8_K * split ggml_vec_dot_tq2_0_q8_K * split ggml_vec_dot_q2_K_q8_K * split ggml_vec_dot_q3_K_q8_K * split ggml_vec_dot_q4_K_q8_K * split ggml_vec_dot_q5_K_q8_K * split ggml_vec_dot_q6_K_q8_K * split ggml_vec_dot_iq2_xxs_q8_K * split ggml_vec_dot_iq2_xs_q8_K * split ggml_vec_dot_iq2_s_q8_K * split ggml_vec_dot_iq3_xxs_q8_K * split ggml_vec_dot_iq3_s_q8_K * split ggml_vec_dot_iq1_s_q8_K * split ggml_vec_dot_iq1_m_q8_K * split ggml_vec_dot_iq4_nl_q8_0 * split ggml_vec_dot_iq4_xs_q8_K * fix typos * fix missing prototypes * rename ggml-cpu-quants.c * rename ggml-cpu-traits * rename arm folder * move cpu-feats-x86.cpp * rename ggml-cpu-hbm * update arm detection macro in quants.c * move iq quant tables * split ggml_quantize_mat_q8_0/K * split ggml_gemv_* * split ggml_gemm_* * rename namespace aarch64 to repack * use weak aliases to replace test macros * rename GGML_CPU_AARCH64 to GGML_CPU_REPACK * rename more aarch64 to repack * clean up rebase leftover * fix compilation errors * remove trailing spaces * try to fix clang compilation errors * try to fix clang compilation errors again * try to fix clang compilation errors, 3rd attempt * try to fix clang compilation errors, 4th attempt * try to fix clang compilation errors, 5th attempt * try to fix clang compilation errors, 6th attempt * try to fix clang compilation errors, 7th attempt * try to fix clang compilation errors, 8th attempt * try to fix clang compilation errors, 9th attempt * more cleanup * fix compilation errors * fix apple targets * fix a typo in arm version of ggml_vec_dot_q4_K_q8_K Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-06-09 16:47:13 +02:00
Diego Devesa	8f47e25f56	cuda : fix device sync on buffer clear (#14033 )	2025-06-09 16:36:26 +02:00
Georgi Gerganov	201b31dc2e	graph : fix geglu (#14077 ) ggml-ci	2025-06-09 17:17:31 +03:00
Xinpeng Dou	e21d2d4ae2	CANN: Simplify the environment variable setting(#13104 ) * Simplify the environment variable setting to specify the memory pool type. * Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options. * update * fix CI * update * delete whitespace * fix according to review * update CANN.md * update CANN.md	2025-06-09 19:47:39 +08:00
R0CKSTAR	dc0623fddb	webui: fix sidebar being covered by main content (#14082 ) * webui: fix sidebar being covered by main content Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * webui: update index.html.gz Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-06-09 12:01:17 +02:00
Georgi Gerganov	87d34b381d	server : fix LRU check (#14079 ) ggml-ci	2025-06-09 12:57:58 +03:00
Concedo	8780b33c64	consolidate imports	2025-06-09 17:48:54 +08:00
Nicolò Scipione	b460d16ae8	sycl: Add reorder to Q6_K mmvq implementation (#13885 ) * Add Reorder to Q6_K mmvq implementation * Address PR comments: clean up comments * Remove unused parameter after refactoring q4_k * Adding inline to function and removing unnecessary reference to int --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com>	2025-06-09 11:47:07 +02:00
Concedo	deece4be69	missed a build target	2025-06-09 17:05:56 +08:00
Concedo	68ec00909b	updated lite (+1 squashed commits) Squashed commits: [375c5768b] updated lite	2025-06-09 16:33:42 +08:00

8404 commits
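
Commit 5af9138ebe above describes culling tokens with a top-3k pass before running the grammar check, and falling back to the unculled set if nothing survives. The sketch below only illustrates that idea in Python; it is not the koboldcpp implementation. The names `grammar_filter_culled`, `accepts`, and `cull_k` are hypothetical, and the grammar is stood in for by a plain callable.

```python
from typing import Callable, List, Tuple

# (token_id, logit) pair; a stand-in for whatever candidate struct the sampler uses.
Candidate = Tuple[int, float]


def grammar_filter_culled(
    candidates: List[Candidate],
    accepts: Callable[[int], bool],  # hypothetical: True if the grammar allows this token
    cull_k: int = 3000,              # assumed from the "top_3k" wording in the commit message
) -> List[Candidate]:
    """Return grammar-valid candidates, checking only the top cull_k first."""
    # Rank by logit, highest first, so the cheap pass sees the likeliest tokens.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)

    # Fast path: run the (expensive) grammar check on the culled set only.
    culled = [c for c in ranked[:cull_k] if accepts(c[0])]
    if culled:
        return culled

    # Fallback: nothing in the culled set was valid, so check the remainder.
    # The first cull_k were already rejected, so skipping them loses nothing.
    return [c for c in ranked[cull_k:] if accepts(c[0])]
```

If the grammar check dominates sampling cost, this ordering means the common case touches at most `cull_k` tokens, and the full vocabulary is scanned only when the likely tokens are all rejected.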