koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-07 00:41:50 +00:00

Author	SHA1	Message	Date
CasualAutopsy	7703bed260	Temp: Fix Needlessly Iterating on Candidates During Greedy Sampling (#1854)	2025-11-22 16:06:50 +08:00
Concedo	8631bbcee3	linting	2025-11-18 18:56:31 +08:00
LostRuins Concedo	7aea1d7c02	clean up unused llava functions, fix qwen3vl loading	2025-11-18 10:34:55 +08:00
LostRuins Concedo	281542aa0d	add smoothing curve, not tested	2025-11-17 23:07:35 +08:00
LostRuins Concedo	3fe0e39b62	Merge commit '4dca015b7e' into concedo_experimental # Conflicts: # .github/copilot-instructions.md # README.md # docs/ops.md # docs/ops/CPU.csv # docs/ops/CUDA.csv # docs/ops/Vulkan.csv # ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp # src/CMakeLists.txt # tests/test-backend-ops.cpp	2025-11-16 18:33:58 +08:00
LostRuins Concedo	86f907272a	relocated shader compile warning	2025-11-15 23:17:47 +08:00
LostRuins Concedo	d6a2ad8455	still not really working right	2025-11-09 01:57:48 +08:00
LostRuins Concedo	cfb22b5c9d	rename a missed BLAS -> batch	2025-11-06 16:11:26 +08:00
Concedo	0891b0752d	qwen3vl fixed (+2 squashed commit) Squashed commit: [89f65ed0c] wip fixing q3vl [6fa34cff2] wip fixing q3vl	2025-10-31 17:52:33 +08:00
Concedo	57e1d9c822	rename blasbatchsize to batchsize	2025-10-24 18:16:54 +08:00
Concedo	68c9d955d2	support multiple override kv	2025-10-24 17:28:54 +08:00
Concedo	e92f9fd422	cursed hack for RNN models	2025-10-11 23:14:55 +08:00
Concedo	3b30f12ca7	future proof handling of rnn models	2025-10-07 19:12:47 +08:00
Concedo	5d89a48a50	add more rnn models supported	2025-09-24 18:14:59 +08:00
Concedo	7e35954695	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/build.md # docs/function-calling.md # examples/eval-callback/eval-callback.cpp # ggml/CMakeLists.txt # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cpu/kleidiai/kernels.cpp # ggml/src/ggml-cpu/kleidiai/kernels.h # ggml/src/ggml-cpu/kleidiai/kleidiai.cpp # scripts/compare-llama-bench.py # scripts/server-bench.py # scripts/tool_bench.py # tests/test-chat.cpp # tools/batched-bench/batched-bench.cpp # tools/llama-bench/llama-bench.cpp # tools/server/README.md	2025-08-31 23:33:36 +08:00
Concedo	3210b378e8	better tool calls	2025-08-20 22:11:31 +08:00
Concedo	5a921a40f9	add overridenativecontext flag, stop nagging me	2025-08-14 22:54:45 +08:00
Concedo	4c1faf61b2	increment version (+1 squashed commits) Squashed commits: [6e5080ad2] increment version	2025-08-09 20:53:26 +08:00
Concedo	338b1fe97e	readjusted mistral and oai template, fixed compile issue on termux, updated lite, show generated token ids in debug mode	2025-08-07 21:14:48 +08:00
Concedo	34487d3c02	gpt oss harmony template	2025-08-06 11:39:40 +08:00
Concedo	e40d26b9e7	allow offloading moe to cpu with --moecpu	2025-08-05 23:42:42 +08:00
Concedo	428a07416a	cleanup some debug	2025-08-05 00:07:22 +08:00
Concedo	3284757b56	voxstral mini is really bad	2025-07-29 21:22:17 +08:00
Concedo	abf527a207	clearer multimodal capability display	2025-07-28 22:54:49 +08:00
Concedo	12a6088a65	added voxtral support, however without the magic token it hears audio as text	2025-07-28 22:35:59 +08:00
Concedo	b87864144b	no ctx shift for all mrope	2025-07-25 13:53:20 +08:00
Concedo	9f4d0f6ccf	fixed swa pp bug by retrying smaller batches	2025-07-21 23:34:22 +08:00
Concedo	6d50def409	default kv_unified to true, handle LLAMA_SET_ROWS.	2025-07-21 16:13:20 +08:00
Concedo	b028dd4e84	minor fixes	2025-07-18 13:22:59 +08:00
Concedo	f0564f9caf	updated lite, added better separators for multimodal chunks (universal)	2025-07-17 00:11:08 +08:00
Concedo	bc2877d2fe	test without g3n fix	2025-07-13 23:42:59 +08:00
Concedo	811463a704	split audio and vision detection separately	2025-07-13 17:47:15 +08:00
Concedo	dca49de059	fixed qwen2 audio issues, works fine now (+3 squashed commit) Squashed commit: [b3053a1ba] updated lite [5071630d6] fixed mtmd issues, audio works [06efa5af4] fix mtmd compile	2025-07-12 18:54:41 +08:00
Concedo	5a3b2e3921	fix for jamba models - they have recurrent layers like rwkv, so context shifting and forwarding wont work on them.	2025-07-12 18:54:40 +08:00
Concedo	e9473305d0	wip2 (+1 squashed commits) Squashed commits: [4628777b6] wip	2025-07-12 18:54:40 +08:00
Concedo	c45b8dc56f	fix for gemma3n	2025-07-10 17:39:08 +08:00
Reithan	0097de5c57	improve performance by actually applying nsigma's masking (#1602) merging, please report any issues.	2025-07-07 15:41:46 +08:00
Concedo	2e14338455	additional padding for the swa kv cache itself	2025-06-28 15:52:48 +08:00
Concedo	815d2056d9	gentoken reservations	2025-06-28 09:16:20 +08:00
Concedo	39b0699c71	fixed savestates with drafting	2025-06-27 20:35:38 +08:00
Reithan	54dde5e565	Add memoized cache to `llama_grammar_reject_candidates_for_stack` (#1615) * Add memoized cache to llama_grammar_reject_candidates_for_stack * make size cutoff more aggressive and move to outer branch * update comment * add cache reset whenever grammar is reloaded * remove explicit reference types for compiler transportability	2025-06-25 19:22:19 +08:00
Concedo	65ff041827	added more perf stats	2025-06-21 12:12:28 +08:00
Reithan	f07434f4c1	streamline grammar sampler to speed up generation while using heavy grammar (#1606)	2025-06-17 23:04:59 +08:00
Concedo	c494525b33	update deprecated apis	2025-06-13 22:21:15 +08:00
Reithan	f1c9db4174	fix-loss-of-destroyed-tokens-in-grammar-pre-pass (#1600)	2025-06-13 18:46:38 +08:00
Concedo	5bac0fb3d5	remove debug prints for now, they were kind of cluttered	2025-06-13 16:00:23 +08:00
Reithan	5af9138ebe	Improve GNBF performance by attempting culled grammar search first (#1597) * cull tokens with top_3k first before running grammar, fallback to unculled if none found * fix errors * fix improvement and test against concedo's GBNF * revert non-culling changes	2025-06-13 15:57:27 +08:00
Concedo	1cbe716e45	allow setting maingpu	2025-06-12 17:53:43 +08:00
Concedo	f6bbc350f2	various qol fixes	2025-06-05 10:26:02 +08:00
Concedo	736030bb9f	save and load state upgraded to 3 available states	2025-06-04 22:09:40 +08:00

452 commits