Commit graph

1059 commits

Author SHA1 Message Date
Concedo
6039791adf minor bugfixes 2025-06-21 18:41:28 +08:00
Concedo
65ff041827 added more perf stats 2025-06-21 12:12:28 +08:00
Wagner Bruna
08adfb53c9
Configurable VAE threshold limit (#1601)
* add backend support for changing the VAE tiling threshold

* trigger VAE tiling by image area instead of dimensions

I've tested with GGML_VULKAN_MEMORY_DEBUG: all resolutions with
the same area as 768x768 (even extremes like 64x9216), and many
below that, consistently allocate 6656 bytes per image pixel.
Since tiling is primarily useful for avoiding excessive memory
usage, it seems reasonable to trigger VAE tiling based on area
rather than on the maximum image side.

However, as there is currently no user interface option to change
it back to a lower value, it's best to maintain the default
behavior for now.

* replace the notile option with a configurable threshold

This allows selecting a lower threshold value, reducing the
peak memory usage.

The legacy sdnotile parameter is automatically converted to the
new parameter if it is the only one supplied.

* simplify tiling checks, 768 default visible in launcher

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2025-06-21 10:14:57 +08:00
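The area-based tiling trigger described in the PR above can be sketched as follows; the function name is illustrative and the 768 default side comes from the commit notes, but this is a hypothetical reconstruction, not the actual koboldcpp code:

```python
def should_tile_vae(width: int, height: int, threshold_side: int = 768) -> bool:
    """Trigger VAE tiling when the image area exceeds that of a
    threshold_side x threshold_side square, rather than comparing
    either dimension against a maximum side length."""
    return width * height > threshold_side * threshold_side
```

Under this scheme an extreme 64x9216 image (the same area as 768x768) does not trigger tiling, while 1024x1024 does, matching the observation that memory use scales with pixel count rather than with either dimension alone.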
Concedo
2ba7803b95 replace_instruct_placeholders is now default 2025-06-20 22:11:58 +08:00
Concedo
4e40f2aaf4 added photomaker face cloning 2025-06-20 21:33:36 +08:00
Concedo
21881a861d rename restrict square to sdclampedsoft 2025-06-20 15:39:55 +08:00
Concedo
924dfa7cd3 bump version 2025-06-18 21:37:24 +08:00
Concedo
e0a7694328 try to set cuda pcie order first thing 2025-06-18 20:25:38 +08:00
Concedo
a8d33ebb0d increase genamt hardlimit from 0.1 to 0.2 ratio 2025-06-18 19:34:12 +08:00
Concedo
40443a98f5 show available RAM, fixed SD vae tiling noise 2025-06-18 18:44:50 +08:00
Concedo
7966bdd1ad allow embeddings model to use gpu 2025-06-18 00:46:30 +08:00
Concedo
ab29be54c4 comfyui compat - serve temporary upload endpoint for img2img 2025-06-16 23:18:47 +08:00
Concedo
861a2f5275 terminal title 2025-06-15 21:51:44 +08:00
Concedo
238be98efa Allow override config for gguf files when reloading in admin mode, updated lite, fixed typo (+1 squashed commits)
Squashed commits:

[fe14845cc] Allow override config for gguf files when reloading in admin mode, updated lite (+2 squashed commit)

Squashed commit:

[9ded66aa5] Allow override config for gguf files when reloading in admin mode

[9597f6a34] update lite
2025-06-14 12:00:20 +08:00
Wagner Bruna
f6d2d1ce5c
configurable resolution limit (#1586)
* refactor image gen configuration screen

* make image size limit configurable

* fix resolution limits and keep dimensions closer to the original ratio

* use 0.0 for the configured default image size limit

This prevents the current default value from being saved into the
config files, in case we later decide to adopt a different value.

* export image model version when loading

* restore model-specific default image size limit

* change the image area restriction to be specified by a square side

* move image resolution limits down to the C++ level

* Revert "export image model version when loading"

This reverts commit fa65b23de3.

* Linting Fixes:
PY:
- Inconsistent var name sd_restrict_square -> sd_restrict_square_var
- GUI swap back to using absolute row numbers for now.
- fstring fix
- size_limit -> side_limit inconsistency
C++:
- roundup_64 standalone function
- refactor sd_fix_resolution variable names for clarity
- always apply the "anti-crashing" hard total-megapixel limit after the soft total-megapixel limit, instead of conditionally only when sd_restrict_square is unset

* allow unsafe resolutions if debugmode is on

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2025-06-13 20:05:20 +08:00
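The square-side resolution limit described in this PR can be approximated with a short sketch. `roundup_64` is named in the linting notes above, but the clamping logic and default value here are a hypothetical reconstruction under the assumption that dimensions are scaled down to fit the area of a side-limit square while preserving aspect ratio:

```python
import math

def roundup_64(x: int) -> int:
    # Round up to the next multiple of 64, since diffusion backends
    # commonly require dimensions divisible by 64.
    return ((x + 63) // 64) * 64

def clamp_resolution(width: int, height: int, side_limit: int = 768) -> tuple:
    """Scale (width, height) down so the total area fits within a
    side_limit x side_limit square, keeping the aspect ratio, then
    round each dimension up to a multiple of 64. Sketch only."""
    max_area = side_limit * side_limit
    area = width * height
    if area <= max_area:
        return roundup_64(width), roundup_64(height)
    scale = math.sqrt(max_area / area)
    return roundup_64(int(width * scale)), roundup_64(int(height * scale))
```

For example, 1536x1536 would be clamped to 768x768, while anything already within the area budget only gets its dimensions rounded up to the 64-pixel grid.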
Concedo
1cbe716e45 allow setting maingpu 2025-06-12 17:53:43 +08:00
henk717
f151648f03 Pyinstaller launcher and dependency updates
This PR adds a new launcher executable to the unpack feature, eliminating the need to ship Python and its dependencies in the unpacked version. It also makes a few dependency changes to help future-proof the build.
2025-06-10 23:08:02 +08:00
Concedo
8386546e08 Switched VS2019 for revert cu12.1 build, hopefully solves dll issues
try change order (+3 squashed commit)

Squashed commit:

[457f02507] try newer jimver

[64af28862] windows pyinstaller shim. the final loader will be moved into the packed directory later.

[0272ecf2d] try alternative way of getting cuda toolkit 12.4 since jimver wont work, also fix rocm
try again (+3 squashed commit)

Squashed commit:

[133e81633] try without pwsh

[4d99cefba] try without pwsh

[bdfa91e7d] try alternative way of getting cuda toolkit 12.4, also fix rocm
2025-06-10 23:08:02 +08:00
Concedo
7d8aa31f1f fixed embeddings, added new parameter to limit max embeddings context 2025-06-10 01:11:55 +08:00
Concedo
8780b33c64 consolidate imports 2025-06-09 17:48:54 +08:00
Concedo
82d7c53b85 embeddings handle base64 2025-06-09 00:26:40 +08:00
Concedo
7f57846c2f update bundled vcrts 2025-06-08 19:39:42 +08:00
Concedo
dcf88d6e78 Revert "make tts use gpu by default. use --ttscpu to disable"
This reverts commit 669f80265b.
2025-06-08 17:08:04 +08:00
Concedo
669f80265b make tts use gpu by default. use --ttscpu to disable 2025-06-08 17:06:19 +08:00
Concedo
cfcdfd69bd allow embeddings models to use mmap 2025-06-07 10:14:00 +08:00
Concedo
6effb65cfe change singleinstance order 2025-06-06 21:20:30 +08:00
Concedo
740f91e3fd lower aria interval 2025-06-06 17:43:38 +08:00
Concedo
9cf32e5fee step limits over adapter for sd 2025-06-06 14:12:43 +08:00
Concedo
f6bbc350f2 various qol fixes 2025-06-05 10:26:02 +08:00
Concedo
736030bb9f save and load state upgraded to 3 available states 2025-06-04 22:09:40 +08:00
Concedo
06d2bc3404 ollama compat fixes 2025-06-04 19:22:29 +08:00
Concedo
53f1511396 use a static buffer for kv reloads instead. also, added into lite ui 2025-06-03 22:32:46 +08:00
Concedo
4b57108508 Save KV State and Load KV State to memory added. GUI not yet updated 2025-06-03 17:46:29 +08:00
Concedo
6ce85c54d6 not working correctly 2025-06-02 22:12:10 +08:00
Concedo
8e1ebc55b5 dropped support for lora base as upstream no longer uses it. If provided it will be silently ignored 2025-06-02 12:49:53 +08:00
Concedo
51dc1cf920 added scale for text lora 2025-06-02 00:13:42 +08:00
Concedo
74ef097c4a added ability to set koboldcpp as default handler for gguf and kcpps 2025-06-01 22:36:41 +08:00
Concedo
f3bb947a13 cuda use wmma flash attention for turing (+1 squashed commits)
Squashed commits:

[3c5112398] 117 (+10 squashed commit)

Squashed commit:

[4f01bb2d4] 117 graphs 80v

[7549034ea] 117 graphs

[dabf9cb99] checking if cuda 11.5.2 works

[ba7ccdb7a] another try cu11.7 only

[752cf2ae5] increase aria2c download log rate

[dc4f198fd] test send turing to wmma flash attention

[496a22e83] temp build test cu11.7.0

[ca759c424] temp build test cu11.7

[c46ada17c] test build: enable virtual80 for oldcpu

[3ccfd939a] test build: with cuda graphs for all
2025-06-01 11:41:45 +08:00
Concedo
b08dca65ed Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	common/CMakeLists.txt
#	common/arg.cpp
#	common/chat.cpp
#	examples/parallel/README.md
#	examples/parallel/parallel.cpp
#	ggml/cmake/common.cmake
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/rope.cpp
#	models/ggml-vocab-bert-bge.gguf.inp
#	models/ggml-vocab-bert-bge.gguf.out
#	models/ggml-vocab-command-r.gguf.inp
#	models/ggml-vocab-command-r.gguf.out
#	models/ggml-vocab-deepseek-coder.gguf.inp
#	models/ggml-vocab-deepseek-coder.gguf.out
#	models/ggml-vocab-deepseek-llm.gguf.inp
#	models/ggml-vocab-deepseek-llm.gguf.out
#	models/ggml-vocab-falcon.gguf.inp
#	models/ggml-vocab-falcon.gguf.out
#	models/ggml-vocab-gpt-2.gguf.inp
#	models/ggml-vocab-gpt-2.gguf.out
#	models/ggml-vocab-llama-bpe.gguf.inp
#	models/ggml-vocab-llama-bpe.gguf.out
#	models/ggml-vocab-llama-spm.gguf.inp
#	models/ggml-vocab-llama-spm.gguf.out
#	models/ggml-vocab-mpt.gguf.inp
#	models/ggml-vocab-mpt.gguf.out
#	models/ggml-vocab-phi-3.gguf.inp
#	models/ggml-vocab-phi-3.gguf.out
#	models/ggml-vocab-qwen2.gguf.inp
#	models/ggml-vocab-qwen2.gguf.out
#	models/ggml-vocab-refact.gguf.inp
#	models/ggml-vocab-refact.gguf.out
#	models/ggml-vocab-starcoder.gguf.inp
#	models/ggml-vocab-starcoder.gguf.out
#	requirements/requirements-gguf_editor_gui.txt
#	tests/CMakeLists.txt
#	tests/test-chat.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tools/mtmd/CMakeLists.txt
#	tools/run/run.cpp
#	tools/server/CMakeLists.txt
2025-05-31 13:04:21 +08:00
Concedo
c923e9fe46 added option to unload model from admin control 2025-05-31 11:51:09 +08:00
Concedo
08e0745e7e added singleinstance flag and local shutdown api 2025-05-31 11:37:32 +08:00
Concedo
6529326c59 allow temperatures up to 1.0 when function calling 2025-05-30 15:59:18 +08:00
Concedo
c881bb7348 match a few common oai voices 2025-05-29 23:29:17 +08:00
Concedo
26bf5b446d fixed thread count <=0 , fixed clip skip <= 0 2025-05-28 00:38:15 +08:00
Concedo
f97bbdde00 fix to allow all EOGs to trigger a stop, occam's glm4 fix, 2025-05-24 22:55:11 +08:00
Concedo
ec04115ae9 swa options now available 2025-05-24 11:50:37 +08:00
Concedo
748dfcc2e4 massively improved tool calling 2025-05-24 02:26:11 +08:00
Concedo
c4df151298 experimental swa flag 2025-05-23 21:33:26 +08:00
Concedo
499283c63a rename define to match upstream 2025-05-23 17:10:12 +08:00
Concedo
e68a5f448c add ddim sampler 2025-05-22 21:28:01 +08:00