koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-10 20:31:01 +00:00

Author	SHA1	Message	Date
Concedo	bfa2ae7744	fixed smartcache bug when used with images	2026-01-02 00:35:05 +08:00
Concedo	774841ffd6	clear the images array from kcpp chat completions	2026-01-01 22:51:00 +08:00
Concedo	51edb6ae61	allow clip fa for anything besides cuda on gpu	2026-01-01 21:09:51 +08:00
Concedo	442fa7cd7c	support for circular textures in sdcpp	2026-01-01 16:34:09 +08:00
Concedo	54e419f587	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/docker.yml # docs/ops.md # docs/ops/Metal.csv # ggml/CMakeLists.txt # ggml/src/ggml-sycl/CMakeLists.txt # grammars/README.md # models/templates/llama-cpp-deepseek-r1.jinja # scripts/sync-ggml.last # tests/test-chat.cpp	2026-01-01 15:34:10 +08:00
Concedo	66ccf8f6b8	Merge commit '`f14f4e421b`' into concedo_experimental # Conflicts: # .github/workflows/docker.yml # AGENTS.md # CONTRIBUTING.md # docs/build.md # examples/llama.android/app/build.gradle.kts # examples/llama.android/app/src/main/java/com/example/llama/MainActivity.kt # examples/llama.android/app/src/main/res/layout/activity_main.xml # examples/llama.android/gradle/libs.versions.toml # examples/llama.android/lib/src/main/cpp/ai_chat.cpp # examples/llama.android/lib/src/main/java/com/arm/aichat/InferenceEngine.kt # examples/llama.android/lib/src/main/java/com/arm/aichat/internal/InferenceEngineImpl.kt # examples/model-conversion/scripts/causal/compare-embeddings-logits.sh # examples/model-conversion/scripts/embedding/run-original-model.py # examples/retrieval/retrieval.cpp # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cpu/kleidiai/kernels.cpp # ggml/src/ggml-cpu/kleidiai/kleidiai.cpp # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-cuda/mmq.cu # ggml/src/ggml-cuda/mmq.cuh # src/CMakeLists.txt # tools/llama-bench/llama-bench.cpp # tools/server/CMakeLists.txt	2026-01-01 15:20:56 +08:00
triplenom	9e10bd2eaf	llama: handle short reads in direct I/O path (#18504)	2026-01-01 10:24:43 +08:00
Anri Lombard	4cd162a123	chat: make tool description and parameters optional per OpenAI spec (#18478) * chat: make tool description and parameters optional per OpenAI spec Per the OpenAI API specification, both 'description' and 'parameters' fields in tool function definitions are optional. Previously, the parser would throw an exception if these fields were missing. Attempts to fix #17667 * refactor: use value() for cleaner optional field access	2025-12-31 17:21:37 -06:00
Concedo	03df0c40f3	if gendefaults is set, horde has debug flag	2026-01-01 00:54:57 +08:00
Georgi Gerganov	13814eb370	sync : ggml	2025-12-31 18:54:43 +02:00
Georgi Gerganov	54f67b9b66	ggml : bump version to 0.9.5 (ggml/1410)	2025-12-31 18:54:43 +02:00
Anri Lombard	33ded988ba	quantize: prevent input/output file collision (#18451) Check if input and output files are the same before quantizing to prevent file corruption when mmap reads from a file being written to. Fixes #12753	2025-12-31 23:29:03 +08:00
Concedo	4c3cf7ba56	updated lite	2025-12-31 23:07:25 +08:00
Sigbjørn Skjæret	0db8109849	convert : lint fix (#18507)	2025-12-31 14:28:21 +01:00
Henry147147	9b8329de7a	mtmd : Adding support for Nvidia Music Flamingo Model (#18470) * Initial commit, debugging q5_k_s quant * Made hf_to_gguf extend whisper to reduce code duplication * addressed convert_hf_to_gguf pull request issue --------- Co-authored-by: Henry D <henrydorsey147@gmail.com>	2025-12-31 12:13:23 +01:00
Concedo	76ef726ec8	adaptive p sharpness to 10.0f	2025-12-31 17:28:30 +08:00
gatbontonpc	9a6369bb60	metal : add count_equal op (#18314) * add count equal for metal * remove trailing whitespace * updated doc ops table * changed shmem to i32 * added multi tg and templating * removed BLAS support from Metal docs * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add memset to set dst to 0 * metal : cleanup --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-31 10:39:48 +02:00
Johannes Gäßler	ecc343de63	CUDA: fix KQ max calculation (#18487)	2025-12-31 09:37:00 +01:00
Georgi Gerganov	01ade96e71	metal : remove BF16 x F16 kernels (#18456)	2025-12-31 09:53:48 +02:00
Aman Gupta	7bcaf815c2	sycl: add newline at the end of CMakeLists.txt (#18503)	2025-12-31 14:23:44 +08:00
Rahul Sathe	c8a3798041	Work around broken IntelSYCLConfig.cmake in Intel oneAPI 2025.x (#18345) * cmake: work around broken IntelSYCLConfig.cmake in oneAPI 2025.x * [AI] sycl: auto-detect and skip incompatible IntelSYCL package Automatically detect compiler versions with incompatible IntelSYCL CMake configuration files and fall back to manual SYCL flags instead of requiring users to set options manually. Fixes build failures with oneAPI 2025.x where IntelSYCLConfig.cmake has SYCL_FEATURE_TEST_EXTRACT invocation errors. * refactor: improve SYCL provider handling and error messages in CMake configuration * refactor: enhance SYCL provider validation and error handling in CMake configuration * ggml-sycl: wrap find_package(IntelSYCL) to prevent build crashes	2025-12-31 09:08:44 +08:00
Sigbjørn Skjæret	4849661d98	docker : add CUDA 13.1 image build (#18441) * add updated cuda-new.Dockerfile for Ubuntu 24.04 compatibility * add cuda13 build	2025-12-30 22:28:53 +01:00
Bart Louwers	6e0c8cbc40	docs : document that JSON Schema is not available to model when using response_format (#18492) * Document unsupported JSON Schema annotations Add note about unsupported JSON Schema annotations. * Update README.md * Update README.md * Update README.md	2025-12-30 15:13:49 -06:00
Aldehir Rojas	0f89d2ecf1	common : default content to an empty string (#18485) * common : default content to an empty string * common : fix tests that break when content != null	2025-12-30 12:00:57 -06:00
Daniel Bevenius	ac1d0eb7bf	llama : fix typo in comment in llama-kv-cache.h [no ci] (#18489)	2025-12-30 17:20:14 +01:00
Xuan-Son Nguyen	cd78e57c3a	lora: count lora nodes in graph_max_nodes (#18469) * lora: count lora nodes in graph_max_nodes * 3 nodes per weight * 4 nodes * keep track n_lora_nodes from llama_model * fix assert * rm redundant header * common: load adapters before context creation * use 6 nodes	2025-12-30 15:53:12 +01:00
Concedo	20ea081594	updated lite (+3 squashed commit) Squashed commit: [605fef9ca] updated lite [dad606fad] updated sdui [22246d7eb] updated lite	2025-12-30 22:38:56 +08:00
Jay Zenith	c32fa21db8	sampling: reuse token data buffer in llama_sampler_sample (#18365) * sampling: reuse token data buffer in llama_sampler_sample * move cur buffer before timing section, after samplers * minor : fix build --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-30 16:27:49 +02:00
Jeff Bolz	f14f4e421b	server: fix files built redundantly (#18474)	2025-12-30 13:11:13 +01:00
Charles Xu	2d6c00a9b8	kleidiai: add and integrate SVE 256-bit vector-length kernel (#18458) * kleidiai: add and integrate SVE 256-bit vector-length kernel * updated for review comments	2025-12-30 14:04:53 +02:00
Aman Gupta	d77d7c5c06	CUDA: add log line when mxfp4 acceleration is used (#18483) * CUDA: add log line when mxfp4 acceleration is used * add in backend_get_features	2025-12-30 17:40:46 +08:00
Daniel Bevenius	a864fb1c14	model-conversion : use CONVERTED_MODEL for compare-embeddings (#18461) This commit updates the causal model verification script to use the CONVERTED_MODEL environment variable instead of using the MODEL_PATH (the original model path) as the basis for the converted model file name. The motivation for this is that currently, if the converted model file name differs from the original model directory/name, the verification script will look for the wrong .bin file that was generated when running the converted model. This is similar to the change made for the embeddings models script in Commit `db81d5ec4b` ("model-conversion : use CONVERTED_EMBEDDING_MODEL for embedding_verify_logits (#18079)"), but we also verify the embeddings for causal models as well.	2025-12-30 10:13:12 +01:00
Xuan-Son Nguyen	51a48720b8	webui: fix prompt progress ETA calculation (#18468) * webui: fix prompt progress ETA calculation * handle case done === 0	2025-12-29 21:42:11 +01:00
Pascal	c9a3b40d65	Webui/prompt processing progress (#18300) * webui: display prompt preprocessing progress * webui: add percentage/ETA and exclude cached tokens from progress Address review feedback from ngxson * webui: add minutes and first chunk (0%) case * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * webui: address review feedback from allozaur * chore: update webui build output * webui: address review feedback from allozaur * nit * chore: update webui build output * feat: Enhance chat processing state * feat: Improve chat processing statistics UI * chore: update webui build output * feat: Add live generation statistics to processing state hook * feat: Persist prompt processing stats in hook for better UX * refactor: Enhance ChatMessageStatistics for live stream display * feat: Implement enhanced live chat statistics into assistant message * chore: update webui build output * fix: Proper tab for each stage of prompt processing/generation * chore: update webui build output * fix: Improved ETA calculation & display logic * chore: update webui build output * feat: Simplify logic & remove ETA from prompt progress * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-12-29 19:32:21 +01:00
Johannes Gäßler	0bd1212a43	CUDA: fix replacement of bad archs in CMake (#18457)	2025-12-29 17:58:20 +01:00
wbtek	5b1248c9af	server : Cmdline arg -to changes http read timeout from current 600sec default (#18279) * Prevent crash if TTFT >300sec, boosted to 90 days * server : allow configurable HTTP timeouts for child models * server : pass needed timeouts from params only --------- Co-authored-by: Greg Slocum <fromgit@wbtek.slocum.net>	2025-12-29 17:12:48 +01:00
Xuan-Son Nguyen	3595ae5963	contributing: tighten AI usage policy (#18388) * contributing: tighten AI usage policy * refactor AGENTS.md * proofreading * update contributing * add claude.md * add trailing newline * add note about dishonest practices * rm point about dishonest * rm requirement watermarking * add .gemini/settings.json * allow initially AI-generated content * revise * Update CONTRIBUTING.md Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * improve * trailing space * Apply suggestions from code review Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * update --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-12-29 16:01:32 +01:00
Naco Siren	c1366056f6	android: routine maintenance - Dec 2025 (#18338) * Fix `msg` typo * Fix thread safety in destroy() to support generation abortion in lifecycle callbacks. * UI polish: stack new message change from below; fix GGUF margin not in view port * Bug fixes: rare racing condition when main thread updating view and default thread updating messages at the same time; user input not disabled during generation. * Bump dependencies' versions; Deprecated outdated dsl usage.	2025-12-29 15:51:13 +02:00
Georgi Gerganov	2a85f720b8	server : handle closed connection for tasks (#18459)	2025-12-29 15:34:41 +02:00
Daniel Bevenius	7cbec34a63	model-conversion : add device option to embd run orig model (#18386) This commit refactors the original model embedding script to include a device selection option. Users can now specify the device (cpu, cuda, mps, auto) via command-line arguments. It also refactors the code to be more structured.	2025-12-29 13:37:02 +01:00
Héctor Estrada Moreno	0c8986403b	retrieval : use at most n_seq_max chunks (#18400)	2025-12-29 13:21:13 +02:00
Concedo	329c0e7e32	mini qol to prevent fake tool calls	2025-12-29 17:54:27 +08:00
o7si	daa242dfc8	common: fix return value check for setpriority (#18412) * common: fix return value check for setpriority * tools: add logging for process priority setting	2025-12-29 11:07:49 +02:00
Johannes Gäßler	e70e640db3	CUDA: Blackwell features for non-native builds (#18436)	2025-12-29 09:35:42 +01:00
Aman Gupta	5fa66c6e67	cuda: fix race condition in cumsum (#18448) * ggml-cuda: fix race condition in cumsum * remove unnecessary sync_threads	2025-12-29 14:07:17 +08:00
Tim Neumann	382808c14b	ci : re-enable rocm build on amd64 (#18439) This was disabled in #9340 due to compiler crash, but seems to build now as confirmed by the latest comments in #11913. I've also managed to build the image with `docker build -f .devops/rocm.Dockerfile .` (for all three stages, `full`, `server` and `light`). A quick attempt at trying to build an arm64 image failed. Since none of the other images are built for arm, I only enabled the amd64 one. The `runs_on` option was added to match the other entries.	2025-12-29 00:29:23 +01:00
uvos	4ffc47cb20	HIP: Use mmq on MFMA devices for MUL_MAT_ID in cases where a lot of splits would be generated (#18202)	2025-12-28 20:12:55 +01:00
momonga	9c675c7140	model : Plamo3 support (#17304) * plamo3 * fix plamo3 * clean code * clean up the code * fix diff * clean up the code * clean up the code * clean up the code * clean up the code * clean up the code * clean up the code * add chat_template if exist * clean up the code * fix cpu-backend * chore: whitespace trim fix + typo fix * Fix: address review feedback * restore `FREQ_BASE_SWA` constant * Fix: address review feedback2 * Fix:typecheck * Fix: address review feedback3 * final cleanup --------- Co-authored-by: mmngays <146910567+mmngays@users.noreply.github.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-12-28 17:28:31 +01:00
Concedo	0e26e4d354	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/ISSUE_TEMPLATE/010-bug-compilation.yml # .github/ISSUE_TEMPLATE/011-bug-results.yml # .github/ISSUE_TEMPLATE/019-bug-misc.yml # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-rpc/ggml-rpc.cpp	2025-12-28 23:47:55 +08:00
Concedo	58d8635827	fixed autofit	2025-12-28 23:15:06 +08:00

11084 commits