koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-10 17:14:36 +00:00

Author	SHA1	Message	Date
Concedo	3fea11675d	websearch integrated into lite, changed to POST	2024-12-30 17:30:41 +08:00
Concedo	6026501ed2	websearch functional	2024-12-30 12:01:51 +08:00
Concedo	709dab6289	improved websearch endpoint	2024-12-29 19:39:16 +08:00
Concedo	5451a8e8a9	updated lite	2024-12-29 17:04:29 +08:00
Concedo	2de1975ca2	improve websearch api	2024-12-28 23:36:40 +08:00
Concedo	baaecd1c65	added a basic websearch proxy	2024-12-28 19:07:00 +08:00
Concedo	29afdb7c90	minor linting	2024-12-28 12:21:35 +08:00
kallewoof	23ec550835	PoC: add chat template heuristics (#1283) * PoC: add chat template heuristics The fallback chat template adapter of Vicuna is not ideal in some cases (e.g. a test against a sub-portion of the BBC news classification task on Kaggle gave an 82% accuracy with Vicuna and 88% with the official ChatML format for a q4_k_m Qwen 2.5 3B-Instruct gguf). This PR adds a proof of concept simple heuristic which looks at the chat template and upgrades the adapter when it is able to. * gemma 2 heuristic * Phi 4, Llama 3.x heuristics * better qwen vs generic heuristic * cleanup * mistral (generic) heuristic * fix sys msg for mistral * phi 3.5 * mistral v3 * cohere (aya expanse 32b based) * only derive from chat template if AutoGuess * add notes about alpaca fallbacks * added AutoGuess.json dummy * add mistral v7 * switch to using a json list with search strings	2024-12-28 12:15:23 +08:00
Concedo	5f8f483fae	fixed typo (+1 squashed commits) Squashed commits: [b586d187] fixed typo	2024-12-23 21:57:34 +08:00
Concedo	13abf591d2	patch release for drafting fix	2024-12-23 11:40:02 +08:00
Concedo	4c56b7cada	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # examples/gbnf-validator/gbnf-validator.cpp # examples/llava/clip.cpp # examples/run/README.md # examples/run/run.cpp # examples/server/README.md # ggml/src/ggml-cpu/CMakeLists.txt # src/llama.cpp # tests/test-grammar-integration.cpp # tests/test-llama-grammar.cpp	2024-12-21 09:41:49 +08:00
Concedo	fc52a38a25	handle urls as config download in model param	2024-12-20 10:56:07 +08:00
Concedo	6089421423	always follow pci bus id	2024-12-18 00:46:48 +08:00
Concedo	60cd68a39d	draft model sets gpu split instead of id, made mmq default for cli	2024-12-14 23:58:45 +08:00
Concedo	595cc6975f	added new flags --moeexperts --failsafe --draftgpulayers and --draftgpuid	2024-12-13 17:11:59 +08:00
Concedo	a11bba5893	cleanup, fix native build for arm (+28 squashed commit) Squashed commit: [d1f6a4154] bundle library [947ab84b7] undo [0f9aba8d8] test [e9ac93873] test [920438202] test [1c6d98804] Revert "quick test" This reverts commit acf8ec8940. [acf8ec894] quick test [6a9937233] undo [5a263a5bd] test [ddfd82bca] test [0b30e45da] test [c3bfece55] messed up [2a4b37fe0] Revert "test" This reverts commit 80a1fcaeaf. [80a1fcaea] test [e2aa7d944] test [264d80200] test [f5b123173] undo [1ffacc484] test [63c0be926] undo [510e0377e] ofast try fix [4ac199b20] try fix sigill [1bc987ba2] try fix illegal instruction [7697252b1] edit [f87087b28] check gcc ver [e9dfe2cef] try using qemu to do the pyinstaller [b411192db] revert [25b5301e5] try using qemu to do the pyinstaller [58038cddc] try using qemu to do the pyinstaller	2024-12-10 19:42:23 +08:00
Concedo	e9d2332dd8	improved tool calls and whisper	2024-12-06 14:34:31 +08:00
Concedo	836c06d91a	minor edit	2024-12-06 00:37:38 +08:00
Concedo	d0d1d922de	handle and fix temp paths to chat completions adapter	2024-12-05 17:22:35 +08:00
Concedo	2787fca6b4	refactored library selection, fixed ollama params	2024-12-05 16:47:52 +08:00
Concedo	52cc908f7f	default trim_stop to true, which trims any tokens after a stop sequence and the stop sequence itself. This is potentially a breaking change.	2024-12-03 22:44:10 +08:00
Concedo	2ba5949054	updated sdcpp, also set euler as default sampler	2024-12-01 17:00:20 +08:00
Concedo	42228b9746	warning when selecting non gguf models	2024-12-01 13:35:51 +08:00
Concedo	b7cd210cd2	more linting with Ruff (+1 squashed commits) Squashed commits: [43802cfe2] Applied default Ruff linting	2024-12-01 01:23:13 +08:00
Concedo	409e393d10	fixed critical bug in image model loader	2024-11-30 23:28:24 +08:00
Concedo	0028e71993	special handling to resolve incomplete utf8 token sequences in qwen	2024-11-30 16:54:01 +08:00
Concedo	32ac3153e4	default speculative set to 8. added more adapter fields	2024-11-30 16:18:27 +08:00
Concedo	e0c59486ee	default to 12 tokens drafted	2024-11-30 11:52:07 +08:00
Concedo	b21d0fe3ac	customizable speculative size	2024-11-30 11:28:19 +08:00
Concedo	f75bbb945f	speculative decoding initial impl completed (+6 squashed commit) Squashed commit: [0a6306ca0] draft wip dont use (will be squashed) [a758a1c9c] wip dont use (will be squashed) [e1994d3ce] wip dont use [f59690d68] wip [77228147d] wip on spec decoding. dont use yet [2445bca54] wip adding speculative decoding (+1 squashed commits) Squashed commits: [50e341bb7] wip adding speculative decoding	2024-11-30 10:41:10 +08:00
kallewoof	fd320f6682	/props endpoint: provide context size through default_generation_settings (#1237)	2024-11-26 16:15:27 +08:00
Concedo	1e0792a3ef	comfyui emulation also done	2024-11-24 15:39:03 +08:00
Concedo	9bd27323e7	emulate comfyui txt2img	2024-11-24 11:28:12 +08:00
Concedo	bf28d956ae	ollama chat api done	2024-11-24 00:10:15 +08:00
Concedo	62dde8cfb2	ollama sync completions mostly working. stupid api.	2024-11-23 23:31:37 +08:00
Concedo	2c1a06a07d	wip ollama emulation, added detokenize endpoint	2024-11-23 22:48:03 +08:00
Concedo	c0da7e4dcf	multiplayer activity tracking	2024-11-23 19:59:55 +08:00
Concedo	1dd37933e3	fixed grammar not resetting correctly	2024-11-23 09:55:12 +08:00
Concedo	18f227625b	multiplayer fixes	2024-11-22 19:02:31 +08:00
mkarr	ac6a0cde91	Support chunked encoding. (#1226) * Support chunked encoding. The koboldcpp API does not support HTTP chunked encoding. Some HTTP libraries, notably Go's net/http, can automatically choose to use chunked encoding. This adds support for chunked encoding within the do_POST() handler. * refactor slightly to add additional safety checks and follow original format --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2024-11-21 18:24:04 +08:00
Concedo	c2ca2ec2bc	updated docs, fixed a few issues with multiplayer	2024-11-21 18:16:13 +08:00
Concedo	272828cab0	tweaks to chat template	2024-11-21 11:10:30 +08:00
kallewoof	547ab2aebb	API: add /props route (#1222) * API: add an /extra/chat_template route A lot of manual tweaking is done when swapping between models. We can automate or make better assumptions about some of them by having more information, such as chat template. This PR adds an endpoint /extra/chat_template which returns the model chat template string as is in a 'chat_template' key. The front end can then use this to derive the proper templates or use it as is, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model. * switch to pre-established /props endpoint for chat template * bug-fix (upstream): one-off in string juggling	2024-11-21 10:58:32 +08:00
Concedo	8ab3eb89a8	updated lite	2024-11-21 10:43:48 +08:00
Concedo	a439dcb38e	multiplayer error handling	2024-11-19 23:31:48 +08:00
Concedo	1b663e10c8	first functional multiplayer	2024-11-19 22:49:28 +08:00
Concedo	14cbd07eaa	more wip multiplayer	2024-11-19 18:09:26 +08:00
Concedo	39124828ab	wip multiplayer	2024-11-17 23:29:25 +08:00
Concedo	a8694698fd	accept gguf text encoders for sd	2024-11-16 17:23:02 +08:00
Concedo	70aee82552	attempts a backflip, but does he stick the landing?	2024-11-16 17:05:45 +08:00

1047 commits