koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-10 09:04:36 +00:00

Author	SHA1	Message	Date
Concedo	abf527a207	clearer multimodal capability display	2025-07-28 22:54:49 +08:00
Concedo	811463a704	split audio and vision detection separately	2025-07-13 17:47:15 +08:00
Concedo	65ff041827	added more perf stats	2025-06-21 12:12:28 +08:00
Concedo	736030bb9f	save and load state upgraded to 3 available states	2025-06-04 22:09:40 +08:00
Concedo	53f1511396	use a static buffer for kv reloads instead. also, added into lite ui	2025-06-03 22:32:46 +08:00
Concedo	4b57108508	Save KV State and Load KV State to memory added. GUI not yet updated	2025-06-03 17:46:29 +08:00
Concedo	8e1ebc55b5	dropped support for lora base as upstream no longer uses it. If provided it will be silently ignored	2025-06-02 12:49:53 +08:00
Concedo	6a709be50a	replace deprecated	2025-03-27 10:27:20 +08:00
Concedo	3992fb79cc	wip adding embeddings support	2025-03-24 18:01:23 +08:00
Concedo	eb1809c105	add more perf stats	2025-03-12 18:58:27 +08:00
Concedo	b3de1598e7	Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS tts is functional (+6 squashed commit) Squashed commit: [22396311] wip tts [3a883027] tts not yet working [0dcfab0e] fix silly bug [a378d9ef] some long overdue cleanup [fc5a6fb5] Wip tts [39f50497] wip TTS integration	2025-01-13 14:23:25 +08:00
Concedo	f75bbb945f	speculative decoding initial impl completed (+6 squashed commit) Squashed commit: [0a6306ca0] draft wip dont use (will be squashed) [a758a1c9c] wip dont use (will be squashed) [e1994d3ce] wip dont use [f59690d68] wip [77228147d] wip on spec decoding. dont use yet [2445bca54] wip adding speculative decoding (+1 squashed commits) Squashed commits: [50e341bb7] wip adding speculative decoding	2024-11-30 10:41:10 +08:00
Concedo	2c1a06a07d	wip ollama emulation, added detokenize endpoint	2024-11-23 22:48:03 +08:00
Concedo	272828cab0	tweaks to chat template	2024-11-21 11:10:30 +08:00
kallewoof	547ab2aebb	API: add /props route (#1222) * API: add an /extra/chat_template route A lot of manual tweaking is done when swapping between models. We can automate or make better assumptions about some of them by having more information, such as chat template. This PR adds an endpoint /extra/chat_template which returns the model chat template string as is in a 'chat_template' key. The front end can then use this to derive the proper templates or use it as is, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model. * switch to pre-established /props endpoint for chat template * bug-fix (upstream): one-off in string juggling	2024-11-21 10:58:32 +08:00
Concedo	aa26a58085	added logprobs api and logprobs viewer	2024-11-01 00:22:15 +08:00
Concedo	90f5cd0f67	wip logprobs data	2024-10-30 00:59:34 +08:00
Concedo	12fd16bfd4	Merge commit 'df270ef745' into concedo_experimental # Conflicts: # Makefile # common/CMakeLists.txt # common/common.h # common/sampling.cpp # common/sampling.h # examples/infill/infill.cpp # examples/llama-bench/llama-bench.cpp # examples/quantize-stats/quantize-stats.cpp # examples/server/server.cpp # include/llama.h # src/llama-sampling.cpp # src/llama-sampling.h # src/llama.cpp # tests/test-grammar-integration.cpp # tests/test-grammar-parser.cpp # tests/test-json-schema-to-grammar.cpp # tests/test-llama-grammar.cpp # tests/test-sampling.cpp	2024-09-09 17:10:08 +08:00
Concedo	27bbdf7d2a	added link for novita AI, added legacy warning for old GGML models	2024-09-09 11:19:32 +08:00
Concedo	813cf829b5	allow selecting multigpu on vulkan	2024-06-06 18:36:56 +08:00
Concedo	10b148f4c2	added skip bos for tokenize endpoint	2024-06-05 10:49:11 +08:00
Concedo	f24aef8792	initial whisper integration	2024-05-29 23:13:11 +08:00
Concedo	7968bdebbb	added more stats in perf	2024-03-16 16:53:48 +08:00
Concedo	d943c739a8	wip submitting of llava image to backend	2024-03-10 17:14:27 +08:00
Concedo	5a44d4de2b	refactor and clean identifiers for sd, fix cmake	2024-02-29 18:28:45 +08:00
Concedo	524ba12abd	refactor - do not use a copy buffer to store generation outputs, instead return a cpp allocated ptr	2024-02-29 14:02:20 +08:00
Concedo	f75e479db0	WIP on sdcpp integration	2024-02-29 00:40:07 +08:00
Concedo	4cd571db89	vulkan multigpu, show uptime	2024-02-08 16:54:38 +08:00
Concedo	ec2dbd99a3	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # README.md # flake.lock # llama.cpp	2024-02-07 22:21:32 +08:00
Concedo	481f7a6fbc	warn about unsupported arch	2024-01-26 14:22:43 +08:00
Concedo	2a4a7241e6	Merge branch 'vulkan_test' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # llama.cpp	2024-01-25 23:01:44 +08:00
Concedo	d9a7bd577a	gpu layer offloading disabled for phi models in clblast	2024-01-25 17:40:05 +08:00
Concedo	94e68fe474	added field to show recent seed	2024-01-02 15:35:04 +08:00
Nexesenex	cf360f3e62	Update expose.cpp '#include <cstdint> (#586)	2023-12-28 15:01:22 +08:00
Concedo	6570a2005b	token count includes ids	2023-12-03 15:44:53 +08:00
Concedo	be92cfa125	added preloadstory	2023-11-10 13:05:22 +08:00
Concedo	7fb809b94b	fixed auto rope scaling (+1 squashed commits) Squashed commits: [b1767874] wip	2023-09-07 14:45:08 +08:00
Concedo	1301bd7e29	Fix to skip GPU offloading so falcon models work correctly	2023-08-30 18:26:41 +08:00
Concedo	b95a4ccb22	added a token counting endpoint, set mmq as default	2023-08-24 20:41:49 +08:00
Concedo	280abaf029	added stop reason in the perf endpoint	2023-07-24 11:55:35 +08:00
Concedo	39dc1a46c4	added token count, updated lite	2023-07-20 14:41:06 +08:00
Concedo	1d1111e10f	expose timing info in web api	2023-07-11 18:56:06 +08:00
callMeMakerRen	4e46673f80	Merge branch 'LostRuins:concedo' into concedo	2023-07-08 09:33:26 +08:00
shutup	1727e652f1	expose some useful info that can be used in statistics of performence	2023-07-07 11:52:58 +08:00
Concedo	27a0907cfa	backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas	2023-07-06 22:33:46 +08:00
Concedo	66a3f4e421	added support for lora base	2023-06-10 19:29:45 +08:00
Concedo	43f7e40470	added extra endpoints for abort gen and polled streaming	2023-06-10 18:13:26 +08:00
SammCheese	e6231c3055	back to http.server, improved implementation	2023-06-09 12:17:55 +02:00
SammCheese	9a8da35ec4	working streaming. TODO: fix lite	2023-06-08 18:34:23 +02:00
SammCheese	97971291e9	draft: token streaming	2023-06-08 18:34:08 +02:00

96 commits