koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-09 08:34:37 +00:00

Author	SHA1	Message	Date
Concedo	e9473305d0	wip2 (+1 squashed commits) Squashed commits: [4628777b6] wip	2025-07-12 18:54:40 +08:00
Concedo	c45b8dc56f	fix for gemma3n	2025-07-10 17:39:08 +08:00
Reithan	0097de5c57	improve performance by actually applying nsigma's masking (#1602) merging, please report any issues.	2025-07-07 15:41:46 +08:00
Concedo	2e14338455	additional padding for the swa kv cache itself	2025-06-28 15:52:48 +08:00
Concedo	815d2056d9	gentoken reservations	2025-06-28 09:16:20 +08:00
Concedo	39b0699c71	fixed savestates with drafting	2025-06-27 20:35:38 +08:00
Reithan	54dde5e565	Add memoized cache to `llama_grammar_reject_candidates_for_stack` (#1615) * Add memoized cache to llama_grammar_reject_candidates_for_stack * make size cutoff more aggressive and move to outer branch * update comment * add cache reset whenever grammar is reloaded * remove explicit reference types for compiler transportability	2025-06-25 19:22:19 +08:00
Concedo	65ff041827	added more perf stats	2025-06-21 12:12:28 +08:00
Reithan	f07434f4c1	streamline grammar sampler to speed up generation while using heavy grammar (#1606)	2025-06-17 23:04:59 +08:00
Concedo	c494525b33	update deprecated apis	2025-06-13 22:21:15 +08:00
Reithan	f1c9db4174	fix-loss-of-destroyed-tokens-in-grammar-pre-pass (#1600)	2025-06-13 18:46:38 +08:00
Concedo	5bac0fb3d5	remove debug prints for now, they were kind of cluttered	2025-06-13 16:00:23 +08:00
Reithan	5af9138ebe	Improve GBNF performance by attempting culled grammar search first (#1597) * cull tokens with top_3k first before running grammar, fallback to unculled if none found * fix errors * fix improvement and test against concedo's GBNF * revert non-culling changes	2025-06-13 15:57:27 +08:00
Concedo	1cbe716e45	allow setting maingpu	2025-06-12 17:53:43 +08:00
Concedo	f6bbc350f2	various qol fixes	2025-06-05 10:26:02 +08:00
Concedo	736030bb9f	save and load state upgraded to 3 available states	2025-06-04 22:09:40 +08:00
Concedo	53f1511396	use a static buffer for kv reloads instead. also, added into lite ui	2025-06-03 22:32:46 +08:00
Concedo	4b57108508	Save KV State and Load KV State to memory added. GUI not yet updated	2025-06-03 17:46:29 +08:00
Concedo	6ce85c54d6	not working correctly	2025-06-02 22:12:10 +08:00
Concedo	8e1ebc55b5	dropped support for lora base as upstream no longer uses it. If provided it will be silently ignored	2025-06-02 12:49:53 +08:00
Concedo	51dc1cf920	added scale for text lora	2025-06-02 00:13:42 +08:00
Concedo	0c108f6054	Merge commit '`34b7c0439e`' into concedo_experimental # Conflicts: # ggml/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/element_wise.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # scripts/sync-ggml.last # src/CMakeLists.txt # tools/mtmd/clip.cpp	2025-05-31 12:27:45 +08:00
Concedo	f97bbdde00	fix to allow all EOGs to trigger a stop, occam's glm4 fix,	2025-05-24 22:55:11 +08:00
Concedo	ec04115ae9	swa options now available	2025-05-24 11:50:37 +08:00
Concedo	c4df151298	experimental swa flag	2025-05-23 21:33:26 +08:00
Concedo	69b5d4d4af	cursed hack for glm4, may or may not be better	2025-05-22 22:40:37 +08:00
Concedo	f125e724eb	fix off-by-one npast during some instances of fast forwarding	2025-05-22 19:51:21 +08:00
Concedo	f10574e598	debug text	2025-05-22 14:22:01 +08:00
Concedo	9f976e9c65	swa full used unless ctx shift and fast forward disabled	2025-05-21 22:47:45 +08:00
Concedo	3fefb3bdf2	Merge commit '`f0adb80bf7`' into concedo_experimental # Conflicts: # docs/backend/CANN.md # docs/backend/SYCL.md # docs/docker.md # examples/sycl/run-llama2.sh # examples/sycl/win-run-llama2.bat # ggml/src/ggml-sycl/ggml-sycl.cpp # tools/llama-bench/README.md	2025-05-21 19:10:57 +08:00
Concedo	8b6dfbd1be	disabling the gMask prefix for glm-4 completions	2025-05-21 17:29:24 +08:00
Concedo	49305942ab	try disabling the gMask prefix for glm-4 completions	2025-05-21 16:47:08 +08:00
Concedo	5a499a5d2e	updated lite, fixed multi clip skip and seeds not incrementing (+2 squashed commit) Squashed commit: [a9328e29a] fixed multi clip skip and seeds not incrementing [cad3aa9db] streamline some debug outputs	2025-05-19 23:59:58 +08:00
Concedo	6cafc0e73e	Merge commit '`71bdbdb587`' into concedo_experimental # Conflicts: # ggml/src/ggml-cpu/CMakeLists.txt # tools/batched-bench/batched-bench.cpp # tools/mtmd/clip.h	2025-05-16 15:25:15 +08:00
Concedo	35284bcdb5	glm4 clamp 8 on vk	2025-05-13 17:03:24 +08:00
Concedo	48f86bbbc7	tweaked text	2025-05-13 15:54:59 +08:00
Concedo	2819f784d4	use a threadpool, seems to improve tg performance	2025-05-12 18:06:10 +08:00
Concedo	ea2e5ed1e9	mmq debug log	2025-05-09 18:30:11 +08:00
Concedo	2439014a03	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # examples/embedding/embedding.cpp # tools/imatrix/imatrix.cpp # tools/perplexity/perplexity.cpp	2025-05-08 23:41:02 +08:00
Concedo	fa22c1a5a4	fixed cfg scale, but turns out it sucks. embedded aria2c into pyinstaller	2025-05-07 18:30:36 +08:00
Concedo	a5b6f372a3	cfg scale wip	2025-05-07 00:36:00 +08:00
Concedo	0fa435b2a6	Merge commit '`9b61acf060`' into concedo_experimental # Conflicts: # Makefile # docs/multimodal/MobileVLM.md # docs/multimodal/glmedge.md # docs/multimodal/llava.md # docs/multimodal/minicpmo2.6.md # docs/multimodal/minicpmv2.5.md # docs/multimodal/minicpmv2.6.md # requirements/requirements-all.txt # tools/mtmd/CMakeLists.txt # tools/mtmd/README.md # tools/mtmd/android/adb_run.sh # tools/mtmd/android/build_64.sh # tools/mtmd/clip-quantize-cli.cpp	2025-05-06 23:34:21 +08:00
Concedo	38a8778f24	wip cfg scale	2025-05-06 23:06:25 +08:00
Concedo	13cee48740	embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits) Squashed commits: [b9b695217] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits) Squashed commits: [90b5d389d] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits) Squashed commits: [fbbaa989f] embed aria2c for windows	2025-05-06 18:56:02 +08:00
Concedo	9981ba8427	glm4 special BOS handling	2025-05-06 16:41:55 +08:00
Concedo	f59b5eb561	added toggle for guidance	2025-05-05 22:21:46 +08:00
Concedo	5a2808ffaf	Merge branch 'upstream' into concedo_experimental # Conflicts: # .flake8 # .github/labeler.yml # .github/workflows/bench.yml.disabled # .github/workflows/build-linux-cross.yml # .github/workflows/build.yml # .github/workflows/server.yml # .gitignore # CMakeLists.txt # CODEOWNERS # Makefile # README.md # SECURITY.md # build-xcframework.sh # ci/run.sh # docs/development/HOWTO-add-model.md # docs/multimodal/MobileVLM.md # docs/multimodal/glmedge.md # docs/multimodal/llava.md # docs/multimodal/minicpmo2.6.md # docs/multimodal/minicpmv2.5.md # docs/multimodal/minicpmv2.6.md # examples/CMakeLists.txt # examples/pydantic_models_to_grammar_examples.py # grammars/README.md # pyrightconfig.json # requirements/requirements-all.txt # scripts/fetch_server_test_models.py # scripts/tool_bench.py # scripts/xxd.cmake # tests/CMakeLists.txt # tests/run-json-schema-to-grammar.mjs # tools/batched-bench/CMakeLists.txt # tools/batched-bench/README.md # tools/batched-bench/batched-bench.cpp # tools/cvector-generator/CMakeLists.txt # tools/cvector-generator/README.md # tools/cvector-generator/completions.txt # tools/cvector-generator/cvector-generator.cpp # tools/cvector-generator/mean.hpp # tools/cvector-generator/negative.txt # tools/cvector-generator/pca.hpp # tools/cvector-generator/positive.txt # tools/export-lora/CMakeLists.txt # tools/export-lora/README.md # tools/export-lora/export-lora.cpp # tools/gguf-split/CMakeLists.txt # tools/gguf-split/README.md # tools/imatrix/CMakeLists.txt # tools/imatrix/README.md # tools/imatrix/imatrix.cpp # tools/llama-bench/CMakeLists.txt # tools/llama-bench/README.md # tools/llama-bench/llama-bench.cpp # tools/llava/CMakeLists.txt # tools/llava/README.md # tools/llava/android/adb_run.sh # tools/llava/android/build_64.sh # tools/llava/clip-quantize-cli.cpp # tools/main/CMakeLists.txt # tools/main/README.md # tools/perplexity/CMakeLists.txt # tools/perplexity/README.md # tools/perplexity/perplexity.cpp # tools/quantize/CMakeLists.txt # tools/rpc/CMakeLists.txt # tools/rpc/README.md # tools/rpc/rpc-server.cpp # tools/run/CMakeLists.txt # tools/run/README.md # tools/run/linenoise.cpp/linenoise.cpp # tools/run/linenoise.cpp/linenoise.h # tools/run/run.cpp # tools/server/CMakeLists.txt # tools/server/README.md # tools/server/bench/README.md # tools/server/public_simplechat/readme.md # tools/server/tests/README.md # tools/server/themes/README.md # tools/server/themes/buttons-top/README.md # tools/server/themes/wild/README.md # tools/tokenize/CMakeLists.txt # tools/tokenize/tokenize.cpp	2025-05-03 12:15:36 +08:00
Concedo	5d382970ec	glm4 unclamp for all except vulkan	2025-04-30 17:19:38 +08:00
Concedo	9fdec02914	unclamp glm4 in debug	2025-04-30 14:49:52 +08:00
Concedo	c2802af9e8	fix qwen3, fixed sd, fixed glm4	2025-04-29 20:50:46 +08:00

418 commits