koboldcpp

Mirror of https://github.com/LostRuins/koboldcpp.git, last synced 2026-05-17 04:09:19 +00:00

Author	SHA1	Message	Date
Concedo	80ce8a50b3	allow token bans and eos handling in	2026-05-16 15:20:46 +08:00
askmyteapot	3174fccb83	Fix linker error for kcpp_permit_any_repack (#2210) Fixes `gpttype_adapter.lib(gpttype_adapter.obj) : error LNK2019: unresolved external symbol "int kcpp_permit_any_repack"` It's a `bool`, not an `int`	2026-05-16 08:55:56 +08:00
Concedo	66a7b5e5de	prevent repacking if mmap is used, reduces memory footprint	2026-05-15 16:40:53 +08:00
Concedo	286e62267e	adjust batching eligibility	2026-05-11 21:54:32 +08:00
Concedo	bfaddd7a3b	added support for added memory and gemma and glm prompt fixes for batching mode	2026-05-10 23:39:03 +08:00
Concedo	33ca75d56f	ci for tools upload, minor function reordering	2026-05-10 23:10:43 +08:00
AlpinDale	c03302b670	feat: add a primitive form of continuous batching (#2167) * feat: add a primitive form of continuous batching * fix: deadlock in batching fallback * fix: windows build * chore: suppress the contbatch arg from --help * feat: batch-aware rep_pen_slope * fix: automatically disable shifting when batching is enabled * fix: mixed-path state corruption * fix: attempt to fully separate the two pipelines * added a semaphore to prevent non-batchable requests from starting while batched requests are running --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2026-05-10 17:50:31 +08:00
Concedo	eb30b29d69	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/gguf-publish.yml # CODEOWNERS # examples/sycl/test.sh # pyproject.toml # tools/mtmd/CMakeLists.txt # tools/mtmd/README.md	2026-05-08 14:48:57 +08:00
Concedo	950676fdb7	split utils.cpp into 2 files to support sd.cpp	2026-05-04 15:04:12 +08:00
Concedo	8b62e7b667	allow splitmode to be set independently, enable tensor parallelism	2026-05-02 16:41:28 +08:00
Concedo	0755f27372	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/openvino.Dockerfile # .github/workflows/build-self-hosted.yml # .github/workflows/build.yml # common/chat.cpp # docs/backend/OPENVINO.md # examples/speculative-simple/speculative-simple.cpp # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp/CMakeLists.txt # ggml/src/ggml-hexagon/htp/htp-ctx.h # ggml/src/ggml-hexagon/htp/htp-ops.h # ggml/src/ggml-hexagon/htp/main.c # ggml/src/ggml-hexagon/libggml-htp.inf # ggml/src/ggml-openvino/ggml-decoder.cpp # ggml/src/ggml-openvino/ggml-openvino-extra.cpp # ggml/src/ggml-openvino/ggml-openvino.cpp # ggml/src/ggml-openvino/ggml-quants.cpp # ggml/src/ggml-openvino/openvino/op/rope.cpp # ggml/src/ggml-openvino/openvino/op_table.cpp # ggml/src/ggml-openvino/openvino/op_table.h # ggml/src/ggml-openvino/openvino/translate_session.cpp # ggml/src/ggml-openvino/openvino/utils.cpp # ggml/src/ggml-openvino/openvino/utils.h # ggml/src/ggml-openvino/utils.cpp # ggml/src/ggml-openvino/utils.h # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/convert.hpp # ggml/src/ggml-sycl/gemm.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/set_rows.cpp # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # scripts/sync_vendor.py # tests/CMakeLists.txt # tests/test-chat.cpp # tools/cli/cli.cpp # tools/mtmd/CMakeLists.txt # tools/server/CMakeLists.txt	2026-04-23 00:55:05 +08:00
Concedo	becf70d49b	fixed logspam for fit	2026-04-23 00:43:09 +08:00
Concedo	96ec87127a	updated colab, handle connection dropping during prompt processing	2026-04-21 21:46:13 +08:00
Concedo	4629b49afb	updated to handle changes for clip_is_mrope	2026-04-21 19:34:32 +08:00
Concedo	19a12bb080	Merge branch 'upstream' into concedo_experimental # Conflicts: # CODEOWNERS # common/CMakeLists.txt # ggml/CMakeLists.txt # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl # scripts/sync-ggml.last # tools/cli/cli.cpp # tools/llama-bench/llama-bench.cpp # tools/perplexity/perplexity.cpp	2026-04-21 18:53:03 +08:00
Concedo	1feba4e4ea	fixed koboldcpp.sh, fixed vision max/min when one param is missing, fixed processing count wrong, updated lite	2026-04-21 18:36:47 +08:00
Concedo	71b4107bb6	fixed terminal logs	2026-04-19 11:31:12 +08:00
Concedo	e5eab545f3	handle override jinja template	2026-04-19 00:30:28 +08:00
Concedo	17c754a5fc	improved reasoning budget	2026-04-18 17:19:09 +08:00
Concedo	0b37cb9a57	added preliminary support for reasoning budget	2026-04-18 11:56:33 +08:00
Concedo	9a38091207	support q5_1 kv	2026-04-17 17:06:15 +08:00
Concedo	cccb45a00a	summary outputs include processed amt	2026-04-17 14:22:51 +08:00
Concedo	64ce5fca15	better approach when SWA window exceeded, simply refill the window. this is not 100% correct but good enough for fastforward users. Disable FF or increase window if not good enough	2026-04-17 11:44:13 +08:00
Concedo	b5e317e015	SWA fix attempt 2	2026-04-17 00:33:45 +08:00
Concedo	ae292c496e	handle SWA conflicting with rewind, increased default SWA padding.	2026-04-16 17:00:26 +08:00
Concedo	0251c6dbde	added swa padding controls	2026-04-16 16:21:48 +08:00
Concedo	535df844dd	touchup for min/max tokens ui	2026-04-16 14:56:22 +08:00
Llama	c592bd01da	Pass img_min_params and img_max_params to ctx_clip_params (#2133) * Pass img_min_params and img_max_params to ctx_clip_params These values determine the minimum and maximum size (in tokens) of vision embeddings. The default value of -1 uses a model-dependent default size, for example for Gemma 4 the default is a 280 token embedding. For higher quality results (at the cost of using more memory and slower speed) you can increase the size of the embedding to 1120 tokens. * Change dict to mydict to match change to method	2026-04-16 12:27:06 +08:00
Concedo	9c0b9b0bb1	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/development/HOWTO-add-model.md # docs/multimodal.md # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/dequantize.hpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/gated_delta_net.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/upscale.cpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # tests/test-backend-ops.cpp # tests/test-llama-archs.cpp # tools/mtmd/CMakeLists.txt	2026-04-14 20:06:04 +08:00
Concedo	ae60ea0009	handle updated gemma templates	2026-04-14 00:29:58 +08:00
Concedo	5361b45fba	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cvt.cl # requirements/requirements-tool_bench.txt	2026-04-12 16:22:26 +08:00
Concedo	919a010ebc	support outro but don't actually use it yet	2026-04-10 23:18:30 +08:00
Concedo	a1fc912452	try don't trigger the magic if input is following the jinja template.exactly for thinking models (+1 squashed commits) Squashed commits: [5542e81dc] try don't trigger the magic if input is following the jinja template.	2026-04-07 21:47:38 +08:00
Concedo	2d3fe0c113	revert my tweak, switch back to henk's original implementation for now, we can explore this again next time.	2026-04-07 19:00:58 +08:00
Concedo	e991bc044e	updated lite, modify henk fix to allow triggering on missing close only	2026-04-06 23:41:45 +08:00
Concedo	82cc19e055	calculate some fields before autofit for more accurate estimate	2026-04-06 20:44:37 +08:00
Concedo	f6e712d919	universal gemma4 fix, add memory check	2026-04-06 19:20:44 +08:00
henk717	4e30294cb1	Henk's Gemma4 31B Magic (#2096)	2026-04-06 18:49:19 +08:00
Concedo	6c937c05d9	improve ncmoe / moecpu regex	2026-04-04 23:53:13 +08:00
Concedo	db8bc40731	add some warnings if shifting fails	2026-04-04 23:16:26 +08:00
Concedo	eb3422996a	BOS fix for gemma4	2026-04-04 22:15:01 +08:00
Concedo	97f785efce	ensure BOS on vision prefix	2026-04-03 16:20:36 +08:00
Concedo	e8cffa37c8	fixed gemma4v image crashing on encode, however images are not yet working correctly	2026-04-03 15:56:35 +08:00
Concedo	0c2b679ea3	support bf16 quantkv cache type	2026-03-28 00:01:17 +08:00
Concedo	c91f350ed5	increase max images, take images from the end instead of beginning if too many images	2026-03-26 23:03:52 +08:00
Concedo	993925ba96	gracefully handle bad grammar instead of crashing	2026-03-23 17:00:53 +08:00
Concedo	07327b6c10	double n_batch size when pipeline parallel is enabled, keep u_batch the same	2026-03-21 11:22:10 +08:00
Concedo	3113e3a643	move main device print	2026-03-21 10:47:21 +08:00
Concedo	f579939057	updated lite, change smartcache snapshot behavior to conserve slots	2026-03-15 15:15:39 +08:00
Concedo	fcdf2f40d5	no need snapshot after gen is complete.	2026-03-15 12:34:48 +08:00


564 commits