koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-10 09:04:36 +00:00

Author	SHA1	Message	Date
Concedo	cfcdfd69bd	allow embeddings models to use mmap	2025-06-07 10:14:00 +08:00
Concedo	6effb65cfe	change singleinstance order	2025-06-06 21:20:30 +08:00
Concedo	740f91e3fd	lower aria interval	2025-06-06 17:43:38 +08:00
Concedo	9cf32e5fee	step limits over adapter for sd	2025-06-06 14:12:43 +08:00
Concedo	f6bbc350f2	various qol fixes	2025-06-05 10:26:02 +08:00
Concedo	736030bb9f	save and load state upgraded to 3 available states	2025-06-04 22:09:40 +08:00
Concedo	06d2bc3404	ollama compat fixes	2025-06-04 19:22:29 +08:00
Concedo	53f1511396	use a static buffer for kv reloads instead. also, added into lite ui	2025-06-03 22:32:46 +08:00
Concedo	4b57108508	Save KV State and Load KV State to memory added. GUI not yet updated	2025-06-03 17:46:29 +08:00
Concedo	6ce85c54d6	not working correctly	2025-06-02 22:12:10 +08:00
Concedo	8e1ebc55b5	dropped support for lora base as upstream no longer uses it. If provided it will be silently ignored	2025-06-02 12:49:53 +08:00
Concedo	51dc1cf920	added scale for text lora	2025-06-02 00:13:42 +08:00
Concedo	74ef097c4a	added ability to set koboldcpp as default handler for gguf and kcpps	2025-06-01 22:36:41 +08:00
Concedo	f3bb947a13	cuda use wmma flash attention for turing (+1 squashed commits) Squashed commits: [3c5112398] 117 (+10 squashed commit) Squashed commit: [4f01bb2d4] 117 graphs 80v [7549034ea] 117 graphs [dabf9cb99] checking if cuda 11.5.2 works [ba7ccdb7a] another try cu11.7 only [752cf2ae5] increase aria2c download log rate [dc4f198fd] test send turing to wmma flash attention [496a22e83] temp build test cu11.7.0 [ca759c424] temp build test cu11.7 [c46ada17c] test build: enable virtual80 for oldcpu [3ccfd939a] test build: with cuda graphs for all	2025-06-01 11:41:45 +08:00
Concedo	b08dca65ed	Merge branch 'upstream' into concedo_experimental # Conflicts: # common/CMakeLists.txt # common/arg.cpp # common/chat.cpp # examples/parallel/README.md # examples/parallel/parallel.cpp # ggml/cmake/common.cmake # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/rope.cpp # models/ggml-vocab-bert-bge.gguf.inp # models/ggml-vocab-bert-bge.gguf.out # models/ggml-vocab-command-r.gguf.inp # models/ggml-vocab-command-r.gguf.out # models/ggml-vocab-deepseek-coder.gguf.inp # models/ggml-vocab-deepseek-coder.gguf.out # models/ggml-vocab-deepseek-llm.gguf.inp # models/ggml-vocab-deepseek-llm.gguf.out # models/ggml-vocab-falcon.gguf.inp # models/ggml-vocab-falcon.gguf.out # models/ggml-vocab-gpt-2.gguf.inp # models/ggml-vocab-gpt-2.gguf.out # models/ggml-vocab-llama-bpe.gguf.inp # models/ggml-vocab-llama-bpe.gguf.out # models/ggml-vocab-llama-spm.gguf.inp # models/ggml-vocab-llama-spm.gguf.out # models/ggml-vocab-mpt.gguf.inp # models/ggml-vocab-mpt.gguf.out # models/ggml-vocab-phi-3.gguf.inp # models/ggml-vocab-phi-3.gguf.out # models/ggml-vocab-qwen2.gguf.inp # models/ggml-vocab-qwen2.gguf.out # models/ggml-vocab-refact.gguf.inp # models/ggml-vocab-refact.gguf.out # models/ggml-vocab-starcoder.gguf.inp # models/ggml-vocab-starcoder.gguf.out # requirements/requirements-gguf_editor_gui.txt # tests/CMakeLists.txt # tests/test-chat.cpp # tests/test-grammar-integration.cpp # tests/test-json-schema-to-grammar.cpp # tools/mtmd/CMakeLists.txt # tools/run/run.cpp # tools/server/CMakeLists.txt	2025-05-31 13:04:21 +08:00
Concedo	c923e9fe46	added option to unload model from admin control	2025-05-31 11:51:09 +08:00
Concedo	08e0745e7e	added singleinstance flag and local shutdown api	2025-05-31 11:37:32 +08:00
Concedo	6529326c59	allow temperatures up to 1.0 when function calling	2025-05-30 15:59:18 +08:00
Concedo	c881bb7348	match a few common oai voices	2025-05-29 23:29:17 +08:00
Concedo	26bf5b446d	fixed thread count <=0 , fixed clip skip <= 0	2025-05-28 00:38:15 +08:00
Concedo	f97bbdde00	fix to allow all EOGs to trigger a stop, occam's glm4 fix,	2025-05-24 22:55:11 +08:00
Concedo	ec04115ae9	swa options now available	2025-05-24 11:50:37 +08:00
Concedo	748dfcc2e4	massively improved tool calling	2025-05-24 02:26:11 +08:00
Concedo	c4df151298	experimental swa flag	2025-05-23 21:33:26 +08:00
Concedo	499283c63a	rename define to match upstream	2025-05-23 17:10:12 +08:00
Concedo	e68a5f448c	add ddim sampler	2025-05-22 21:28:01 +08:00
Concedo	f125e724eb	fix off-by-one npast during some instances of fast forwarding	2025-05-22 19:51:21 +08:00
Concedo	440350327c	set random range for seed	2025-05-21 23:47:18 +08:00
Wagner Bruna	5d0cfc9db3	store on the image the actual random seed, for reproducibility (#1549)	2025-05-21 23:40:47 +08:00
Concedo	8b6dfbd1be	disabling the gMask prefix for glm-4 completions	2025-05-21 17:29:24 +08:00
Concedo	49305942ab	try disabling the gMask prefix for glm-4 completions	2025-05-21 16:47:08 +08:00
Concedo	5f4923bf24	backend tag replacement for endtags. view results with debug mode.	2025-05-19 23:14:43 +08:00
Concedo	710c747b60	minor noscript edit	2025-05-19 17:51:44 +08:00
Concedo	c546cb638e	disable showgui if skiplauncher is used	2025-05-18 01:42:14 +08:00
Concedo	ca4274e384	added size info into HF searcher	2025-05-17 00:31:54 +08:00
Concedo	5ccd4b2bf5	horde default max ctx matches main ctx	2025-05-15 10:26:20 +08:00
Concedo	c5ea7fad93	updated lite, only show processed input in debugmode	2025-05-14 17:46:54 +08:00
Concedo	21e31e255b	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/docker.yml # README.md # build-xcframework.sh # common/CMakeLists.txt # examples/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-metal/ggml-metal.m # ggml/src/ggml-metal/ggml-metal.metal # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/backend.hpp # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/vecdotq.hpp # scripts/compare-llama-bench.py # src/CMakeLists.txt # src/llama-model.cpp # src/llama.cpp # tests/test-backend-ops.cpp # tests/test-opt.cpp # tools/llama-bench/README.md # tools/llama-bench/llama-bench.cpp # tools/mtmd/CMakeLists.txt # tools/mtmd/README.md # tools/mtmd/clip.cpp # tools/rpc/rpc-server.cpp # tools/server/CMakeLists.txt # tools/server/README.md	2025-05-13 00:28:35 +08:00
Concedo	40eb3a54c4	rename some toolip texts	2025-05-11 22:50:40 +08:00
Concedo	1eb6d25010	truncate middle instead of end for long strings	2025-05-11 20:26:17 +08:00
Concedo	48c3682c2c	improve search	2025-05-10 19:25:26 +08:00
Concedo	50e1064ffe	better passthrough handling	2025-05-10 19:11:09 +08:00
Concedo	c4a0b323f0	remove fa restrictions for vulkan	2025-05-09 17:34:14 +08:00
Concedo	b6220669f4	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/docker.yml # Makefile # examples/CMakeLists.txt # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/convert.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # scripts/sync-ggml.last	2025-05-08 23:07:33 +08:00
Concedo	7c5d47f688	multigpu warning only once	2025-05-08 00:55:09 +08:00
Concedo	fa22c1a5a4	fixed cfg scale, but turns out it sucks. embedded aria2c into pyinstaller	2025-05-07 18:30:36 +08:00
Concedo	a5b6f372a3	cfg scale wip	2025-05-07 00:36:00 +08:00
Concedo	0fa435b2a6	Merge commit '9b61acf060' into concedo_experimental # Conflicts: # Makefile # docs/multimodal/MobileVLM.md # docs/multimodal/glmedge.md # docs/multimodal/llava.md # docs/multimodal/minicpmo2.6.md # docs/multimodal/minicpmv2.5.md # docs/multimodal/minicpmv2.6.md # requirements/requirements-all.txt # tools/mtmd/CMakeLists.txt # tools/mtmd/README.md # tools/mtmd/android/adb_run.sh # tools/mtmd/android/build_64.sh # tools/mtmd/clip-quantize-cli.cpp	2025-05-06 23:34:21 +08:00
Concedo	38a8778f24	wip cfg scale	2025-05-06 23:06:25 +08:00
Concedo	13cee48740	embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits) Squashed commits: [b9b695217] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits) Squashed commits: [90b5d389d] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits) Squashed commits: [fbbaa989f] embed aria2c for windows	2025-05-06 18:56:02 +08:00

1085 commits