* feat: add a primitive form of continuous batching
* fix: deadlock in batching fallback
* fix: windows build
* chore: suppress the contbatch arg from --help
* feat: batch-aware rep_pen_slope
* fix: automatically disable shifting when batching is enabled
* fix: mixed-path state corruption
* fix: attempt to fully separate the two pipelines
* fix: add a semaphore to prevent non-batchable requests from starting while batched requests are running
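
A minimal sketch of that gating idea (illustrative only, not the actual koboldcpp code; the function and variable names here are invented): batched requests register themselves as in-flight, and a non-batchable request waits until the in-flight count drops to zero before it runs exclusively.

```python
import threading
import time

# Hypothetical sketch of the gate described above.
_batched_in_flight = 0
_gate = threading.Condition()

def run_batched(request):
    """Batched pipeline: register as in-flight so exclusive requests wait."""
    global _batched_in_flight
    with _gate:
        _batched_in_flight += 1
    try:
        time.sleep(0.1)  # stand-in for batched generation work
    finally:
        with _gate:
            _batched_in_flight -= 1
            _gate.notify_all()

def run_non_batchable(request):
    """Legacy pipeline: start only once all batched work has drained."""
    with _gate:
        _gate.wait_for(lambda: _batched_in_flight == 0)
        time.sleep(0.1)  # stand-in for sequential generation work
```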
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
The C++ handling code currently:
- builds a comma-separated list from the info_vulkan array
- if GGML_VK_VISIBLE_DEVICES isn't set, sets GGML_VK_VISIBLE_DEVICES to that list
Once set, GGML_VK_VISIBLE_DEVICES affects the whole process, so the same
thing can be done at the Python level, before any of the loading
functions run.
Caveat: load_model had the default `inputs.vulkan_info = "0"`,
so the default GPU would be "0" only when loading a text model.
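
A minimal sketch of doing this at the Python level, assuming the selected device indices are already available as a Python list (names here are illustrative, not the actual koboldcpp variables):

```python
import os

def restrict_vulkan_devices(selected_devices):
    """Mirror the C++ behaviour: build a comma-separated device list and
    export it before the backend is loaded, unless the user already set it."""
    # e.g. selected_devices = [0, 2]  ->  "0,2"
    device_list = ",".join(str(d) for d in selected_devices)
    # setdefault keeps a user-provided GGML_VK_VISIBLE_DEVICES untouched.
    os.environ.setdefault("GGML_VK_VISIBLE_DEVICES", device_list)

# Must run before any of the loading functions, so the backend sees the
# variable when it initializes.
restrict_vulkan_devices([0])
```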
Previously, logprobs only contained the token string
and byte data, as well as the log probability itself.
For workflows that require the token id, translating
from the token bytes to the token id is potentially
costly and unreliable. It is simple and inexpensive
to expose the numeric token ids directly instead.
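
An illustrative sketch of what a logprobs entry looks like to a consumer once the id is exposed (field names and values are assumptions, not the exact API schema):

```python
# Illustrative shape only; the real field names in the response may differ.
logprob_entry = {
    "token": "Hello",                    # token string (present before this change)
    "bytes": [72, 101, 108, 108, 111],   # raw byte data (present before this change)
    "logprob": -0.0123,                  # log probability (present before this change)
    "token_id": 9906,                    # hypothetical example of the id now exposed
}

# A client that needs ids can read them directly instead of re-tokenizing the bytes:
token_id = logprob_entry["token_id"]
```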
Squashed commit:
[0a6306ca0] draft wip dont use (will be squashed)
[a758a1c9c] wip dont use (will be squashed)
[e1994d3ce] wip dont use
[f59690d68] wip
[77228147d] wip on spec decoding. dont use yet
[2445bca54] wip adding speculative decoding (+1 squashed commits)
Squashed commits:
[50e341bb7] wip adding speculative decoding
* API: add an /extra/chat_template route
A lot of manual tweaking is done when swapping between models. We can automate some of it, or at least make better assumptions, by having more information available, such as the chat template. This PR adds an /extra/chat_template endpoint that returns the model's chat template string as-is under a 'chat_template' key. The front end can then use it to derive the proper templates, use it directly, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model.
* switch to pre-established /props endpoint for chat template
* bug-fix (upstream): off-by-one in string juggling
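
A minimal usage sketch, assuming the template is exposed under a 'chat_template' key on the /props endpoint as described above (the base URL and fallback behaviour are illustrative):

```python
import json
import urllib.request

def fetch_chat_template(base_url="http://localhost:5001"):
    """Ask the server for the model's chat template so the front end can pick
    (or warn about) a matching prompt preset."""
    with urllib.request.urlopen(f"{base_url}/props") as resp:
        props = json.load(resp)
    # Empty string if the model ships no template; the UI can fall back to a preset.
    return props.get("chat_template", "")

template = fetch_chat_template()
if not template:
    print("Model has no embedded chat template; using the selected preset as-is.")
```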