Commit graph

857 commits

Author SHA1 Message Date
Concedo
0cb599546e increase max supported llava images to 8 2025-01-09 22:12:06 +08:00
Concedo
c73d99ccac updated lite 2025-01-08 13:35:59 +08:00
Concedo
568e476997 added toggle for vae tiling, use custom memory buffer 2025-01-08 13:12:03 +08:00
Concedo
d752846116 fixed ask save file 2025-01-07 22:11:15 +08:00
Concedo
58791612d2 sse3 mode for noavx2 clblast, fixed metadata, added version command 2025-01-06 21:59:05 +08:00
Concedo
9b32482089 fixed bug in aesthetic ui 2025-01-05 18:04:02 +08:00
Concedo
1559d4d2fb fixed defective websearch 2025-01-04 16:47:38 +08:00
Concedo
e07e73aeb4 updated lite 2025-01-04 10:47:48 +08:00
Concedo
8de44d1e41 refactored some outputs 2024-12-30 22:30:27 +08:00
Concedo
5eb314a04b websearch length limits and caching 2024-12-30 18:30:54 +08:00
Concedo
3fea11675d websearch integrated into lite, changed to POST 2024-12-30 17:30:41 +08:00
Concedo
6026501ed2 websearch functional 2024-12-30 12:01:51 +08:00
Concedo
709dab6289 improved websearch endpoint 2024-12-29 19:39:16 +08:00
Concedo
5451a8e8a9 updated lite 2024-12-29 17:04:29 +08:00
Concedo
2de1975ca2 improve websearch api 2024-12-28 23:36:40 +08:00
Concedo
baaecd1c65 added a basic websearch proxy 2024-12-28 19:07:00 +08:00
Concedo
29afdb7c90 minor linting 2024-12-28 12:21:35 +08:00
kallewoof
23ec550835
PoC: add chat template heuristics (#1283)
* PoC: add chat template heuristics

The fallback chat template adapter of Vicuna is not ideal in some cases: for example, a test against a sub-portion of the BBC news classification task on Kaggle gave 82% accuracy with Vicuna versus 88% with the official ChatML format for a q4_k_m Qwen 2.5 3B-Instruct GGUF.

This PR adds a proof-of-concept heuristic that inspects the chat template and upgrades the adapter when it can.

* gemma 2 heuristic

* Phi 4, Llama 3.x heuristics

* better qwen vs generic heuristic

* cleanup

* mistral (generic) heuristic

* fix sys msg for mistral

* phi 3.5

* mistral v3

* cohere (aya expanse 32b based)

* only derive from chat template if AutoGuess

* add notes about alpaca fallbacks

* added AutoGuess.json dummy

* add mistral v7

* switch to using a json list with search strings
2024-12-28 12:15:23 +08:00
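
The last bullet above describes driving the heuristics from a JSON list of search strings. A minimal Python sketch of that idea follows; the rules-file layout and the field names (search, adapter) are illustrative assumptions, not the actual AutoGuess.json schema.

```python
import json

def guess_adapter(chat_template: str, rules_path: str = "AutoGuess.json"):
    """Upgrade to the first adapter whose search strings all appear in the
    model's embedded chat template. Sketch only; field names are assumed."""
    with open(rules_path, "r", encoding="utf-8") as f:
        rules = json.load(f)  # a JSON list, one entry per known template family
    for rule in rules:
        if all(needle in chat_template for needle in rule["search"]):
            return rule["adapter"]
    return None  # no match: keep the existing fallback (e.g. Vicuna/Alpaca) adapter
```

Keeping the heuristics as data rather than code makes it straightforward to add new template families without touching the loader.
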
Concedo
5f8f483fae fixed typo (+1 squashed commit)
Squashed commits:

[b586d187] fixed typo
2024-12-23 21:57:34 +08:00
Concedo
13abf591d2 patch release for drafting fix 2024-12-23 11:40:02 +08:00
Concedo
4c56b7cada Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	examples/gbnf-validator/gbnf-validator.cpp
#	examples/llava/clip.cpp
#	examples/run/README.md
#	examples/run/run.cpp
#	examples/server/README.md
#	ggml/src/ggml-cpu/CMakeLists.txt
#	src/llama.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-llama-grammar.cpp
2024-12-21 09:41:49 +08:00
Concedo
fc52a38a25 handle urls as config download in model param 2024-12-20 10:56:07 +08:00
Concedo
6089421423 always follow pci bus id 2024-12-18 00:46:48 +08:00
Concedo
60cd68a39d draft model sets gpu split instead of id, made mmq default for cli 2024-12-14 23:58:45 +08:00
Concedo
595cc6975f added new flags --moeexperts --failsafe --draftgpulayers and --draftgpuid 2024-12-13 17:11:59 +08:00
Concedo
a11bba5893 cleanup, fix native build for arm (+28 squashed commits)
Squashed commit:

[d1f6a4154] bundle library

[947ab84b7] undo

[0f9aba8d8] test

[e9ac93873] test

[920438202] test

[1c6d98804] Revert "quick test"

This reverts commit acf8ec8940.

[acf8ec894] quick test

[6a9937233] undo

[5a263a5bd] test

[ddfd82bca] test

[0b30e45da] test

[c3bfece55] messed up

[2a4b37fe0] Revert "test"

This reverts commit 80a1fcaeaf.

[80a1fcaea] test

[e2aa7d944] test

[264d80200] test

[f5b123173] undo

[1ffacc484] test

[63c0be926] undo

[510e0377e] ofast try fix

[4ac199b20] try fix sigill

[1bc987ba2] try fix illegal instruction

[7697252b1] edit

[f87087b28] check gcc ver

[e9dfe2cef] try using qemu to do the pyinstaller

[b411192db] revert

[25b5301e5] try using qemu to do the pyinstaller

[58038cddc] try using qemu to do the pyinstaller
2024-12-10 19:42:23 +08:00
Concedo
e9d2332dd8 improved tool calls and whisper 2024-12-06 14:34:31 +08:00
Concedo
836c06d91a minor edit 2024-12-06 00:37:38 +08:00
Concedo
d0d1d922de handle and fix temp paths to chat completions adapter 2024-12-05 17:22:35 +08:00
Concedo
2787fca6b4 refactored library selection, fixed ollama params 2024-12-05 16:47:52 +08:00
Concedo
52cc908f7f default trim_stop to true, which trims any tokens after a stop sequence and the stop sequence itself. This is potentially a breaking change. 2024-12-03 22:44:10 +08:00
Concedo
2ba5949054 updated sdcpp, also set euler as default sampler 2024-12-01 17:00:20 +08:00
Concedo
42228b9746 warning when selecting non gguf models 2024-12-01 13:35:51 +08:00
Concedo
b7cd210cd2 more linting with Ruff (+1 squashed commits)
Squashed commits:

[43802cfe2] Applied default Ruff linting
2024-12-01 01:23:13 +08:00
Concedo
409e393d10 fixed critical bug in image model loader 2024-11-30 23:28:24 +08:00
Concedo
0028e71993 special handling to resolve incomplete utf8 token sequences in qwen 2024-11-30 16:54:01 +08:00
Concedo
32ac3153e4 default speculative set to 8. added more adapter fields 2024-11-30 16:18:27 +08:00
Concedo
e0c59486ee default to 12 tokens drafted 2024-11-30 11:52:07 +08:00
Concedo
b21d0fe3ac customizable speculative size 2024-11-30 11:28:19 +08:00
Concedo
f75bbb945f speculative decoding initial impl completed (+6 squashed commits)
Squashed commit:

[0a6306ca0] draft wip dont use (will be squashed)

[a758a1c9c] wip dont use (will be squashed)

[e1994d3ce] wip dont use

[f59690d68] wip

[77228147d] wip on spec decoding. dont use yet

[2445bca54] wip adding speculative decoding (+1 squashed commits)

Squashed commits:

[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
kallewoof
fd320f6682
/props endpoint: provide context size through default_generation_settings (#1237) 2024-11-26 16:15:27 +08:00
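
As a hedged usage example, a client can read the advertised context size from that endpoint roughly as below; the port and the exact fields inside default_generation_settings (e.g. n_ctx) are assumptions based on typical KoboldCpp/llama.cpp defaults and may vary by version.

```python
import json
import urllib.request

# Assumes a local instance on the default port 5001 (an assumption; adjust as needed).
with urllib.request.urlopen("http://localhost:5001/props") as resp:
    props = json.load(resp)

# default_generation_settings mirrors a slot's settings; n_ctx is the assumed field name.
print("reported context size:", props.get("default_generation_settings", {}).get("n_ctx"))
```
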
Concedo
1e0792a3ef comfyui emulation also done 2024-11-24 15:39:03 +08:00
Concedo
9bd27323e7 emulate comfyui txt2img 2024-11-24 11:28:12 +08:00
Concedo
bf28d956ae ollama chat api done 2024-11-24 00:10:15 +08:00
Concedo
62dde8cfb2 ollama sync completions mostly working. stupid api. 2024-11-23 23:31:37 +08:00
Concedo
2c1a06a07d wip ollama emulation, added detokenize endpoint 2024-11-23 22:48:03 +08:00
Concedo
c0da7e4dcf multiplayer activity tracking 2024-11-23 19:59:55 +08:00
Concedo
1dd37933e3 fixed grammar not resetting correctly 2024-11-23 09:55:12 +08:00
Concedo
18f227625b multiplayer fixes 2024-11-22 19:02:31 +08:00
mkarr
ac6a0cde91
Support chunked encoding. (#1226)
* Support chunked encoding.

The koboldcpp API does not support HTTP chunked encoding. Some HTTP
libraries, notably Go's net/http, can automatically choose to use chunked
encoding. This adds support for chunked encoding within the do_POST()
handler.

* refactor slightly to add additional safety checks and follow original format

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-11-21 18:24:04 +08:00
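
A minimal sketch of what reading a chunked request body inside do_POST() can look like, assuming a BaseHTTPRequestHandler-style handler; this is an illustrative reimplementation, not the exact code merged in #1226.

```python
def read_request_body(handler) -> bytes:
    """Read the request body, honoring Transfer-Encoding: chunked when present.
    Sketch only; the merged implementation may differ."""
    if handler.headers.get("Transfer-Encoding", "").lower() == "chunked":
        body = b""
        while True:
            # each chunk starts with its size in hex, optionally followed by extensions after ';'
            size_line = handler.rfile.readline().strip()
            chunk_size = int(size_line.split(b";")[0], 16)
            if chunk_size == 0:
                # consume optional trailer lines and the terminating blank line
                while handler.rfile.readline().strip():
                    pass
                break
            body += handler.rfile.read(chunk_size)
            handler.rfile.readline()  # discard the CRLF that follows each chunk
        return body
    # non-chunked requests fall back to Content-Length as before
    length = int(handler.headers.get("Content-Length", 0))
    return handler.rfile.read(length)
```
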