koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-09 19:46:11 +00:00

Author	SHA1	Message	Date
Reithan	f1c9db4174	fix-loss-of-destroyed-tokens-in-grammar-pre-pass (#1600)	2025-06-13 18:46:38 +08:00
Concedo	5bac0fb3d5	remove debug prints for now, they were kind of cluttered	2025-06-13 16:00:23 +08:00
Reithan	5af9138ebe	Improve GNBF performance by attempting culled grammar search first (#1597) * cull tokens with top_3k first before running grammar, fallback to unculled if none found * fix errors * fix improvement and test against concedo's GBNF * revert non-culling changes	2025-06-13 15:57:27 +08:00
Concedo	1cbe716e45	allow setting maingpu	2025-06-12 17:53:43 +08:00
Concedo	7a688e07cd	remove gfx12 until amd wakes up	2025-06-12 16:52:55 +08:00
Concedo	1970d8c9e8	uvos said it might work	2025-06-12 16:44:46 +08:00
Concedo	5cdb2d3fc6	cleanup	2025-06-11 01:35:40 +08:00
henk717	f151648f03	Pyinstaller launcher and dependency updates This PR adds a new launcher executable to the unpack feature, eliminating the need to have python and its dependencies in the unpacked version. It also does a few dependency changes to help future proof.	2025-06-10 23:08:02 +08:00
Concedo	8386546e08	Switched VS2019 for revert cu12.1 build, hopefully solves dll issues try change order (+3 squashed commit) Squashed commit: [457f02507] try newer jimver [64af28862] windows pyinstaller shim. the final loader will be moved into the packed directory later. [0272ecf2d] try alternative way of getting cuda toolkit 12.4 since jimver wont work, also fix rocm try again (+3 squashed commit) Squashed commit: [133e81633] try without pwsh [4d99cefba] try without pwsh [bdfa91e7d] try alternative way of getting cuda toolkit 12.4, also fix rocm	2025-06-10 23:08:02 +08:00
Concedo	28b35ca879	allow wmma flag for rocm	2025-06-10 01:23:48 +08:00
Concedo	7d8aa31f1f	fixed embeddings, added new parameter to limit max embeddings context	2025-06-10 01:11:55 +08:00
Concedo	8780b33c64	consolidate imports	2025-06-09 17:48:54 +08:00
Concedo	deece4be69	missed a build target	2025-06-09 17:05:56 +08:00
Concedo	68ec00909b	updated lite (+1 squashed commits) Squashed commits: [375c5768b] updated lite	2025-06-09 16:33:42 +08:00
Concedo	82d7c53b85	embeddings handle base64	2025-06-09 00:26:40 +08:00
Concedo	7de88802f9	revert padding change for sd chroma	2025-06-08 23:48:46 +08:00
Concedo	1cf7648305	fixed adapter	2025-06-08 23:24:11 +08:00
Concedo	771bd7197b	updated lite (+1 squashed commits) Squashed commits: [907f10f2f] updated lite	2025-06-08 23:22:26 +08:00
Concedo	6c5c8be48d	try to make rocm work for the github ci, requires disabling rocwmma	2025-06-08 21:52:29 +08:00
Concedo	7f57846c2f	update bundled vcrts	2025-06-08 19:39:42 +08:00
Concedo	2d4c1aa5a0	chroma support is now usable	2025-06-08 18:53:59 +08:00
Concedo	30cf433ab4	merge base support for chroma, however its not working correctly	2025-06-08 18:06:23 +08:00
Concedo	dcf88d6e78	Revert "make tts use gpu by default. use --ttscpu to disable" This reverts commit `669f80265b`.	2025-06-08 17:08:04 +08:00
Concedo	669f80265b	make tts use gpu by default. use --ttscpu to disable	2025-06-08 17:06:19 +08:00
Concedo	7132d6b15c	test rocm rolling (+1 squashed commits) Squashed commits: [43c8f7fc6] test rocm rolling (+4 squashed commit) Squashed commit: [16a60aa77] test clobber 4 [a6c866450] test clobber 3 [9322f17f6] test clobber 2 [b7a420cbe] testing clobber	2025-06-08 15:33:05 +08:00
henk717	5d8f499f03	Remove 32GB of rocm dependencies with this one special trick (#1585) * One file to remove them all * That one lib wasn't versioned	2025-06-08 11:16:15 +08:00
Concedo	a80dfa5c10	various minor fixes	2025-06-08 01:11:42 +08:00
Concedo	301450b1eb	attempt to use system glslc first before using bundled glslc	2025-06-07 16:54:25 +08:00
Concedo	38ce7e06cc	updated readme	2025-06-07 10:23:41 +08:00
Concedo	cfcdfd69bd	allow embeddings models to use mmap	2025-06-07 10:14:00 +08:00
Concedo	abc272d89f	breaking change: standardize ci binary names	2025-06-07 00:40:46 +08:00
Concedo	6effb65cfe	change singleinstance order	2025-06-06 21:20:30 +08:00
Concedo	d18938fc70	fixed build	2025-06-06 18:05:44 +08:00
Concedo	d33c88b1f4	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # ci/run.sh # examples/embedding/embedding.cpp # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # src/CMakeLists.txt	2025-06-06 17:56:51 +08:00
Concedo	2b5d8e467b	updated lite	2025-06-06 17:49:56 +08:00
Concedo	740f91e3fd	lower aria interval	2025-06-06 17:43:38 +08:00
Concedo	8b141d8647	stick to cu12.1 for linux for now	2025-06-06 17:38:28 +08:00
Sigbjørn Skjæret	d17a809ef0	llama : support multiple classifier outputs and labels (#13940)	2025-06-06 09:03:25 +02:00
Concedo	9cf32e5fee	step limits over adapter for sd	2025-06-06 14:12:43 +08:00
Concedo	5f38594dc0	remove debug prints	2025-06-06 14:08:57 +08:00
Concedo	ca99f79ea9	cu11 just always stick to wmma	2025-06-06 14:02:34 +08:00
Concedo	eec5a8ad16	breaking change: due to cuda12 upgrade, release filenames will change. standardize them to windows naming for the future. (+1 squashed commits) Squashed commits: [75842919a] cuda12.4 test	2025-06-06 14:02:34 +08:00
Concedo	50a27793d3	upgrade windows runners to windows 2022, cu11 still uses vs2019 this should finally work (+21 squashed commit) Squashed commit: [5edac5b59] Revert "quick dbg" This reverts commit fd62a997cc6684bb89242d5e7b0ae2aed83fd27f. [fd62a997c] quick dbg [bcccae7e6] sanity check 2 [568e2eb08] sanity check [2f30d573a] please work 2 [cf8765221] please work [c535e60d9] try a small trick [d4ba79b80] 2022 test [3f146b000] t2 [4a3b9a9b4] revert and test [4bdc9a149] reverted test2 [5081cb4a3] reverted test [ea9a826f3] broken test [3c11ae389] compare 2019 [8ecec4fec] not for cu12 [0be964f3a] added vs2019 for the other runners [5d24641cb] debugging 4 [1dee79207] debugging 3 [ab172f133] more debugging 2 [b1a895e84] more debugging [5d21d8bd0] vs2019 setup	2025-06-06 14:02:34 +08:00
Sigbjørn Skjæret	1caae7fc6c	gguf-py : add add_classifier_output_labels method to writer (#14031) * add add_classifier_output_labels * use add_classifier_output_labels	2025-06-05 17:42:31 +02:00
Masato Nakasaka	669c13e0f6	vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001) * allowing B580 and U9-288V * experimenting code to detect Xe2 * allowing coopmat only for Xe2 GPUs * fixed comment wording * fixed comment wording * removed unnecessary driver check	2025-06-05 16:00:29 +02:00
pockers21	146b88e8b3	ci: fix CUDA build failure on autodl cloud machines (#14005) Replace CMAKE_CUDA_ARCHITECTURES=native with nvidia-smi detection as 'native' fails on autodl cloud environments. Co-authored-by: pockers21 <liyang2@uniontech.com>	2025-06-05 16:25:29 +03:00
Georgi Gerganov	7f37b6cf1e	memory : migrate from llama_kv_cache to more generic llama_memory (#14006) * memory : merge llama_kv_cache into llama_memory + new `llama_memory` API ggml-ci * context : fix casts ggml-ci	2025-06-05 15:29:22 +03:00
Diego Devesa	3a077146a4	llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013)	2025-06-05 11:57:42 +02:00
Olexandr88	d01d112abb	readme : add badge (#13938)	2025-06-05 10:50:55 +03:00
Sigbjørn Skjæret	9f47fa5792	vocab : warn about missing mask token (#14022)	2025-06-05 09:29:18 +02:00

8360 commits