Commit graph

356 commits

Concedo
e84596ec1a add config for default gen tokens and bos toggle 2025-03-15 19:53:06 +08:00
Concedo
4212f0b8e8 wip on multiple fixes 2025-03-15 10:50:36 +08:00
Concedo
6a1dd57435 gemma3 template, updated lite, fixed tool calling, re-enabled ctx shift for gemma3 2025-03-14 17:47:01 +08:00
Concedo
0db4ae6237 traded my ink for a pen 2025-03-14 11:58:15 +08:00
Concedo
52cf1ded0c remove unwanted print 2025-03-14 00:24:28 +08:00
Concedo
0460d92cc3 disable context shifting for gemma3 2025-03-13 20:28:26 +08:00
Concedo
e75539e8cb too many issues without BOS (+1 squashed commit)
Squashed commits:

[7138d941] only print bos alert in debug
2025-03-13 16:48:29 +08:00
Concedo
1ef41c2124 streamline output console log (+1 squashed commit)
Squashed commits:

[ca474bdd] streamline output console log
2025-03-13 15:33:49 +08:00
Concedo
77debb1b1b gemma3 vision works, but is using more tokens than expected - may need resizing 2025-03-13 00:31:16 +08:00
Concedo
eb1809c105 add more perf stats 2025-03-12 18:58:27 +08:00
Concedo
b0541f3652 added draft results 2025-03-10 22:03:20 +08:00
Concedo
72bc855e8a honor add bos token settings from metadata 2025-03-07 22:10:50 +08:00
Concedo
6b7d2349a7 Rewrite history to fix bad vulkan shader commits without increasing repo size
added dpe colab (+8 squashed commits)

Squashed commits:

[b8362da4] updated lite

[ed6c037d] move nsigma into the regular sampler stack

[ac5f61c6] relative filepath fixed

[05fe96ab] export template

[ed0a5a3e] nix_example.md: refactor (#1401)

* nix_example.md: add override example

* nix_example.md: drop graphics example, as it is already basic NixOS knowledge

* nix_example.md: format

* nix_example.md: Vulkan is disabled on macOS

Disabled in: 1ccd253acc

* nix_examples.md: nixpkgs.config.cuda{Arches -> Capabilities}

Fixes: https://github.com/LostRuins/koboldcpp/issues/1367

[675c62f7] AutoGuess: Phi 4 (mini) (#1402)

[4bf56982] phrasing

[b8c0df04] Add Rep Pen to Top N Sigma sampler chain (#1397)

- place after nsigma and before xtc; a sketch of the penalty follows this entry (+3 squashed commits)

Squashed commits:

[87c52b97] disable VMM from HIP

[ee8906f3] edit description

[e85c0e69] Remove Unnecessary Rep Counting (#1394)

* stop counting reps

* fix range-based initializer

* strike that - reverse it
2025-03-05 00:02:20 +08:00
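The rep-pen commit above is about ordering in the sampler chain: the repetition penalty transform runs after the top-n-sigma filter and before XTC. A minimal sketch of the classic repetition-penalty transform, assuming the usual divide-positive/multiply-negative formulation (illustrative Python, not KoboldCpp's C++ code):

```python
def apply_rep_pen(logits, recent_token_ids, rep_pen=1.1):
    """Penalize tokens already seen in the recent context window.

    Positive logits are divided by the penalty and negative ones multiplied,
    so the adjustment always pushes a repeated token toward 'less likely'.
    """
    out = list(logits)
    for tok in set(recent_token_ids):
        out[tok] = out[tok] / rep_pen if out[tok] > 0 else out[tok] * rep_pen
    return out
```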
Reithan
62cd9bb0b2 use range neq zero instead of lt (#1388) 2025-02-24 18:47:19 +08:00
Concedo
f2ac10c014 added nsigma to lite 2025-02-21 15:11:24 +08:00
EquinoxPsychosis
2740af3660 add top n sigma sampler from llama.cpp (#1384)
* Add N Sigma Sampler

* update nsigma sampler chain

* xtc position fix

* remove stray newline

---------

Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>
2025-02-21 14:31:42 +08:00
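Top-n-sigma, added above from llama.cpp, filters on the pre-softmax logits: only tokens whose logit lies within n standard deviations of the maximum survive, and sampling proceeds over the survivors. A self-contained sketch of that formulation (names are illustrative, not the llama.cpp API):

```python
import math
import random

def top_n_sigma_sample(logits, n_sigma=1.0, temperature=1.0):
    """Keep tokens with logit >= max(logits) - n_sigma * std(logits), then sample."""
    mean = sum(logits) / len(logits)
    std = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n_sigma * std
    kept = [(i, x) for i, x in enumerate(logits) if x >= threshold]
    # Softmax over the surviving logits, with temperature.
    scaled = [x / temperature for _, x in kept]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    return random.choices([i for i, _ in kept],
                          weights=[w / total for w in weights], k=1)[0]
```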
Concedo
6d7ef10671 Merge branch 'upstream' into concedo_experimental
Re-enable qwen2vl GPU for Vulkan https://github.com/ggml-org/llama.cpp/pull/11902

# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	.gitignore
#	CONTRIBUTING.md
#	Makefile
#	common/CMakeLists.txt
#	common/arg.cpp
#	common/common.cpp
#	examples/main/main.cpp
#	examples/run/run.cpp
#	examples/server/tests/README.md
#	ggml/src/ggml-cuda/mma.cuh
#	scripts/get_chat_template.py
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-chat.cpp
2025-02-20 23:17:20 +08:00
Concedo
b162c25a5e fixed moe experts to use detected arch for key 2025-02-10 17:46:08 +08:00
Concedo
d22eca6c47 fix potential crash in autoguess 2025-02-09 12:33:28 +08:00
Concedo
e68a3cf1dc fixed some functions when no model is loaded 2025-02-08 11:15:26 +08:00
Concedo
8fef9f3fb5 reloading is working correctly. 2025-02-06 22:24:18 +08:00
Concedo
fd84b062f9 allow reuse of clip embds 2025-01-30 19:02:45 +08:00
Concedo
f4e2f4b069 disable context shift when using mrope 2025-01-30 00:36:05 +08:00
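Several commits above toggle context shift per model: when the context window fills, the oldest non-prompt tokens are evicted and the rest slide down so generation can continue without reprocessing the whole prompt. Position schemes like mrope do not survive that sliding, hence the disable. A simplified sketch of the token-level idea, with hypothetical names (the real work happens inside the KV cache):

```python
def shift_context(tokens, max_ctx, keep_prefix, evict):
    """Once the window is full, drop `evict` tokens just after the protected
    prompt prefix, freeing room while keeping the prefix reusable."""
    if len(tokens) < max_ctx:
        return tokens  # still room, nothing to shift
    return tokens[:keep_prefix] + tokens[keep_prefix + evict:]
```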
Concedo
70f1d8d746 vision can set max res (+1 squashed commit)
Squashed commits:

[938fc655] vision can set max res
2025-01-30 00:19:49 +08:00
Concedo
0e45d3bb7a quiet flags now set at load time 2025-01-25 16:46:56 +08:00
Concedo
cca4a934dd fix for chat templates and drafting 2025-01-23 11:49:40 +08:00
Concedo
0e74db7fd4 fixed another tts bug, clblast selection and quiet mode 2025-01-22 21:36:13 +08:00
Concedo
2a00ee8fa8 broken commit 2025-01-16 21:41:18 +08:00
Concedo
b3de1598e7 Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS
tts is functional (+6 squashed commits)

Squashed commits:

[22396311] wip tts

[3a883027] tts not yet working

[0dcfab0e] fix silly bug

[a378d9ef] some long overdue cleanup

[fc5a6fb5] Wip tts

[39f50497] wip TTS integration
2025-01-13 14:23:25 +08:00
Nexes the Elder
3e6ef8e0ef Probable typo (#1287) 2024-12-26 11:51:04 +08:00
Concedo
10d4fc637d fixed a bug with drafting tokens 2024-12-23 11:36:08 +08:00
Concedo
fd5100c382 fix for query param 2024-12-21 10:41:25 +08:00
Concedo
4c56b7cada Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	examples/gbnf-validator/gbnf-validator.cpp
#	examples/llava/clip.cpp
#	examples/run/README.md
#	examples/run/run.cpp
#	examples/server/README.md
#	ggml/src/ggml-cpu/CMakeLists.txt
#	src/llama.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-llama-grammar.cpp
2024-12-21 09:41:49 +08:00
Concedo
b7d3274523 temporarily make qwen2vl use clip on cpu for vulkan and macos 2024-12-21 09:15:31 +08:00
Concedo
bc297da91e remove unused function 2024-12-16 11:39:52 +08:00
Concedo
00d154b32b wip on qwen2vl integration, updated msvc runtimes 2024-12-15 23:58:02 +08:00
Concedo
60cd68a39d draft model sets gpu split instead of id, made mmq default for cli 2024-12-14 23:58:45 +08:00
Concedo
595cc6975f added new flags --moeexperts --failsafe --draftgpulayers and --draftgpuid 2024-12-13 17:11:59 +08:00
Concedo
00a686fc72 fixed fast forwarding context corruption after abort during prompt processing 2024-12-10 22:37:40 +08:00
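Fast forwarding in the commit above is prefix reuse: when a new prompt shares leading tokens with what is already in the KV cache, only the tail past the shared prefix is reprocessed. The bug was stale cache state left behind by an abort mid-prompt. A minimal sketch of the prefix match (illustrative, not the actual cache code):

```python
def common_prefix_len(cached_tokens, new_tokens):
    """Count leading tokens the new prompt shares with the cached sequence;
    only tokens past this point need to be processed again."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n
```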
Concedo
5106816eac drafted tokens debug prints 2024-12-05 17:05:20 +08:00
Concedo
e93c2427b4 allow incompatible vocab in debugmode 2024-12-01 14:11:03 +08:00
Concedo
32ac3153e4 default speculative set to 8. added more adapter fields 2024-11-30 16:18:27 +08:00
Concedo
e0c59486ee default to 12 tokens drafted 2024-11-30 11:52:07 +08:00
Concedo
b21d0fe3ac customizable speculative size 2024-11-30 11:28:19 +08:00
Concedo
f75bbb945f speculative decoding initial impl completed; a sketch of the idea follows this entry (+6 squashed commits)
Squashed commits:

[0a6306ca0] draft wip dont use (will be squashed)

[a758a1c9c] wip dont use (will be squashed)

[e1994d3ce] wip dont use

[f59690d68] wip

[77228147d] wip on spec decoding. dont use yet

[2445bca54] wip adding speculative decoding (+1 squashed commit)

Squashed commits:

[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
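In speculative decoding, a small draft model proposes a run of tokens and the large target model verifies them; accepted tokens come almost for free, and generation resumes from the first mismatch. A greedy-acceptance sketch, assuming hypothetical `draft_next` and `target_next` callables (real implementations verify all drafted positions in a single target pass and accept probabilistically):

```python
def speculative_step(draft_next, target_next, context, n_draft=8):
    """One draft-and-verify round; returns the tokens accepted this round."""
    # Draft phase: the cheap model proposes n_draft tokens autoregressively.
    ctx = list(context)
    proposal = []
    for _ in range(n_draft):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)
    # Verify phase: keep drafted tokens for as long as the target agrees.
    ctx = list(context)
    accepted = []
    for tok in proposal:
        expected = target_next(ctx)
        if expected != tok:
            accepted.append(expected)  # take the target's token and stop
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted
```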
Concedo
b9e99c69e8 fixed build 2024-11-26 22:06:55 +08:00
Concedo
62dde8cfb2 ollama sync completions mostly working. stupid api. 2024-11-23 23:31:37 +08:00
Concedo
2c1a06a07d wip ollama emulation, added detokenize endpoint 2024-11-23 22:48:03 +08:00
Concedo
1dd37933e3 fixed grammar not resetting correctly 2024-11-23 09:55:12 +08:00
kallewoof
547ab2aebb API: add /props route (#1222)
* API: add an /extra/chat_template route

A lot of manual tweaking is done when swapping between models. We can automate some of it, or at least make better-informed assumptions, if the server exposes more information such as the chat template. This PR adds an endpoint /extra/chat_template which returns the model's chat template string as-is under a 'chat_template' key. The front end can then use it to derive the proper templates, use it directly, or at least warn the user when they try to use e.g. a Mistral preset with a Llama 3.1 model (a client sketch follows this entry).

* switch to pre-established /props endpoint for chat template

* bug-fix (upstream): off-by-one in string juggling
2024-11-21 10:58:32 +08:00
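A minimal client for the /props route described above, assuming KoboldCpp's default port and the 'chat_template' key named in the commit (a sketch, not the frontend's actual code):

```python
import json
import urllib.request

def fetch_chat_template(base_url="http://localhost:5001"):
    """Read the model's chat template from /props so the UI can pick,
    or warn about, a matching instruct preset."""
    with urllib.request.urlopen(f"{base_url}/props") as resp:
        props = json.load(resp)
    return props.get("chat_template", "")
```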