koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-12 18:09:42 +00:00

Author	SHA1	Message	Date
Xuan-Son Nguyen	0364178ca2	clip : refactor clip_init, add tests (#12757 ) * refactor clip_init * fix loading file * fix style * test ok * better test with report * add missing headers * clarify * add KEY_MM_PATCH_MERGE_TYPE * remove bool has_* pattern * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/llava/clip.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * use ggml_soft_max_ext * refactor logging system * add minicpm-v-o 2.6 for testing * use nullptr everywhere * fix Yi-VL model --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-04-05 17:17:40 +02:00
Nauful Shaikh	b772394297	server : webui : Upgrade daisyui, tailwindcss. (#12735 ) * Upgrade daisyui, tailwindcss. * Switch to all themes. * Revert a change. * Update formatting. * Install packages before npm build. * Revert "Install packages before npm build." This reverts commit 336c5147e614e60993162794ba9d9d4629a916f8. * Add index.html.gz * run build --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-04-04 16:09:52 +02:00
nick huang	23106f94ea	gguf-split : --merge now respects --dry-run option (#12681 ) * gguf-split now respects dry-run option * removing trailing space	2025-04-04 16:09:12 +02:00
Concedo	103d60ed2c	Merge branch 'upstream' into concedo_experimental # Conflicts: # common/common.cpp # examples/batched-bench/batched-bench.cpp # examples/batched/batched.cpp # examples/export-lora/export-lora.cpp # examples/gritlm/gritlm.cpp # examples/parallel/parallel.cpp # examples/passkey/passkey.cpp # examples/speculative-simple/speculative-simple.cpp # examples/speculative/speculative.cpp # ggml/src/ggml-cann/CMakeLists.txt # ggml/src/ggml-cann/acl_tensor.cpp # ggml/src/ggml-cann/acl_tensor.h # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-vulkan/CMakeLists.txt # tests/test-arg-parser.cpp # tests/test-backend-ops.cpp	2025-04-03 18:57:49 +08:00
Georgi Gerganov	a10b36c91a	llama : refactor kv cache guard (#12695 ) * llama : refactor kv cache guard ggml-ci * cont : fix comment [no ci] * llama : fix kv_cache restore logic ggml-ci * context : simplify kv cache updates ggml-ci * cont : better name [no ci] * llama : fix llama_decode return code when could not find KV slot ggml-ci * context : change log err -> warn [no ci] * kv-cache : add comment + warning	2025-04-02 14:32:59 +03:00
Xuan-Son Nguyen	42eb248f46	common : remove json.hpp from common.cpp (#12697 ) * common : remove json.hpp from common.cpp * fix comment	2025-04-02 09:58:34 +02:00
Xuan-Son Nguyen	267c1399f1	common : refactor downloading system, handle mmproj with -hf option (#12694 ) * (wip) refactor downloading system [no ci] * fix all examples * fix mmproj with -hf * gemma3: update readme * only handle mmproj in llava example * fix multi-shard download * windows: fix problem with std::min and std::max * fix 2	2025-04-01 23:44:05 +02:00
Concedo	9e182b3e78	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # README.md # docs/backend/SYCL.md # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-vulkan/CMakeLists.txt # ggml/src/ggml-vulkan/ggml-vulkan.cpp # scripts/sync-ggml.last # tests/test-chat-template.cpp	2025-04-01 20:16:07 +08:00
Sigbjørn Skjæret	1a85949067	llava : proper description fix (#12668 )	2025-03-31 11:28:30 +02:00
Sigbjørn Skjæret	f52d59d771	llava : fix clip loading GGUFs with missing description (#12660 )	2025-03-31 11:07:07 +02:00
Concedo	1ebadc515e	add streaming support for oai tools (+2 squashed commit) Squashed commit: [4d080b37] qwen2.5vl surgery script [4bebe7e5] add streaming support for oai tools	2025-03-31 16:49:15 +08:00
marcoStocchi	52de2e5949	tts : remove printfs (#12640 ) * tts.cpp : llama tokens console output is done using LOG_INF instead of printf(). Therefore the options '--log-disable' and '--log-file' have now uniform impact on all output.	2025-03-31 11:20:30 +03:00
Concedo	911669087a	add tentative support for qwen2.5vl vision from HimariO fork	2025-03-29 22:52:43 +08:00
Concedo	396875e1c4	update api docs and lite	2025-03-29 15:39:25 +08:00
Benson Wong	5d01670266	server : include speculative decoding stats when timings_per_token is enabled (#12603 ) * Include speculative decoding stats when timings_per_token is true New fields added to the `timings` object: - draft_n : number of draft tokens generated - draft_accepted_n : number of draft tokens accepted - draft_accept_ratio: ratio of accepted/generated * Remove redundant draft_accept_ratio var * add draft acceptance rate to server console output	2025-03-28 10:05:44 +02:00
Radoslav Gerganov	ef03229ff4	rpc : update README for cache usage (#12620 )	2025-03-28 09:44:13 +02:00
Radoslav Gerganov	ab6ab8f809	rpc : send hash when tensor data is above some fixed threshold (#12496 ) * rpc : send hash when tensor data is above some fixed threshold ref #10095 * rpc : put cache under $HOME/.cache/llama.cpp * try to fix win32 build * another try to fix win32 build * remove llama as dependency	2025-03-28 08:18:04 +02:00
Piotr	2099a9d5db	server : Support listening on a unix socket (#12613 ) * server : Bump cpp-httplib to include AF_UNIX windows support Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com> * server : Allow running the server example on a unix socket Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com> --------- Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-03-27 23:41:04 +01:00
Ivy233	02082f1519	clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566 ) * [Fix] Compiling clip-quantize-cli and running it in a CUDA environment will cause ggml_fp16_to_fp32 to report an error when trying to access video memory. You need to switch to the CPU backend to run quantize. After the fix, it will automatically run in the CPU backend and will no longer be bound to CUDA. * [Fix]Roll back the signature and implementation of clip_model_load, and change the call in clip_model_quantize to clip_init.	2025-03-26 15:06:04 +01:00
Eric Curtin	ef19c71769	run: de-duplicate fmt and format functions and optimize (#11596 )	2025-03-25 18:46:11 +01:00
Concedo	ea358369cc	Merge branch 'upstream' into concedo_experimental # Conflicts: # ci/README.md # ci/run.sh # docs/backend/CUDA-FEDORA.md # docs/build.md # docs/install.md # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cuda/common.cuh # tests/test-backend-ops.cpp	2025-03-26 00:18:01 +08:00
Marius Gerdes	77f9c6bbe5	server : Add verbose output to OAI compatible chat endpoint. (#12246 ) Add verbose output to server_task_result_cmpl_final::to_json_oaicompat_chat_stream, making it conform with server_task_result_cmpl_final::to_json_oaicompat_chat, as well as the other to_json methods.	2025-03-23 19:30:26 +01:00
Concedo	7030ebf401	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/backend/SYCL.md # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp # ggml/src/ggml-sycl/CMakeLists.txt # tests/test-backend-ops.cpp	2025-03-22 00:32:42 +08:00
Concedo	c1e58419c7	support for voice cloning is done (+2 squashed commit) Squashed commit: [e7301628] support for voice cloning is done [1653c576] wip adding voice cloning	2025-03-21 22:28:59 +08:00
marcoStocchi	ea1518e839	llama-tts : avoid crashes related to bad model file paths (#12482 )	2025-03-21 11:12:45 +02:00
Woof Dog	e04643063b	webui : Prevent rerendering on textarea input (#12299 ) * webui: Make textarea uncontrolled to eliminate devastating lag * Update index.html.gz * use signal-style implementation * rm console log * no duplicated savedInitValue set --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-03-20 15:57:43 +01:00
Concedo	0c90d2ebcf	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # cmake/common.cmake # docs/backend/SYCL.md # examples/main/README.md # examples/speculative/speculative.cpp # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-musa/CMakeLists.txt # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt # tests/test-backend-ops.cpp	2025-03-19 19:27:11 +08:00
Georgi Gerganov	c6af2161b2	speculative : fix seg fault in certain cases (#12454 )	2025-03-18 19:35:11 +02:00
Georgi Gerganov	810e0af3f5	server : fix warmup draft cache type (#12446 ) ggml-ci	2025-03-18 12:05:42 +02:00
Sigbjørn Skjæret	60c902926c	docs : bring llama-cli conversation/template docs up-to-date (#12426 )	2025-03-17 21:14:32 +01:00
Concedo	5d7c5e9e33	Merge branch 'upstream' into concedo_experimental # Conflicts: # examples/tts/tts.cpp	2025-03-16 15:42:39 +08:00
marcoStocchi	f4c3dd5daa	llama-tts : add '-o' option (#12398 ) * added -o option to specify an output file name * llama-tts returns ENOENT in case of file write error note : PR #12042 is closed as superseded with this one.	2025-03-15 17:23:11 +01:00
Concedo	67851e5415	Merge branch 'upstream' into concedo_experimental # Conflicts: # examples/run/run.cpp # ggml/src/ggml-cann/aclnn_ops.cpp	2025-03-15 19:54:19 +08:00
Concedo	bfc30066c9	fixed a clip processing bug	2025-03-15 17:49:49 +08:00
Eric Curtin	9f2250ba72	Add CLI arg to llama-run to adjust the number of threads used (#12370 ) We default to 4, sometimes we want to manually adjust this Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-03-14 16:41:20 +00:00
Victor	add2a3aa5a	server: fix "--grammar-file" parameter (#12285 )	2025-03-14 11:21:17 +01:00
Concedo	0db4ae6237	traded my ink for a pen	2025-03-14 11:58:15 +08:00
Georgi Gerganov	e0dbec0bc6	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 ) * llama : refactor llama_context, llama_kv_cache, llm_build_context ggml-ci * graph : don't mutate the KV cache during defrag ggml-ci * context : reduce virtuals + remove test function ggml-ci * context : move interface implementation to source file + factory ggml-ci * graph : move KV cache build functions to llama_context impl ggml-ci * graph : remove model reference from build_pooling ggml-ci * graph : remove llama_model reference ggml-ci * kv_cache : provide rope factors ggml-ci * graph : rework inputs to use only unique_ptr, remove attn input abstraction ggml-ci * context : remove llama_context_i abstraction ggml-ci * context : clean-up ggml-ci * graph : clean-up ggml-ci * llama : remove redundant keywords (struct, enum) ggml-ci * model : adapt gemma3 ggml-ci * graph : restore same attention ops as on master ggml-ci * llama : remove TODO + fix indent ggml-ci	2025-03-13 12:35:44 +02:00
Ishaan Gandhi	2048b5913d	server : fix crash when using verbose output with input tokens that are not in printable range (#12178 ) (#12338 ) * Fix DOS index bug * Remove new APIs * remove extra line * Remove from API * Add extra newline * Update examples/server/server.cpp --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-03-13 11:10:05 +01:00
Concedo	1ef41c2124	streamline output console log (+1 squashed commits) Squashed commits: [ca474bdd] streamline output console log	2025-03-13 15:33:49 +08:00
Concedo	16137f4281	gemma3 now works correctly	2025-03-13 14:34:18 +08:00
Concedo	77debb1b1b	gemma3 vision works, but is using more tokens than expected - may need resizing	2025-03-13 00:31:16 +08:00
Daniel Bevenius	80a02aa858	llama.swiftui : fix xcframework dir in README [no ci] (#12353 ) This commit fixes the path to the xcframework in the README file which I had forgotten to change after renaming the build directory.	2025-03-12 13:45:32 +01:00
Xuan-Son Nguyen	7841fc723e	llama : Add Gemma 3 support (+ experimental vision capability) (#12343 ) * llama : Add Gemma 3 text-only support * fix python coding style * fix compile on ubuntu * python: fix style * fix ubuntu compile * fix build on ubuntu (again) * fix ubuntu build, finally * clip : Experimental support for Gemma 3 vision (#12344) * clip : Experimental support for Gemma 3 vision * fix build * PRId64	2025-03-12 09:30:24 +01:00
Xuan-Son Nguyen	96e1280839	clip : bring back GPU support (#12322 ) * clip : bring back GPU support * use n_gpu_layers param * fix double free * ggml_backend_init_by_type * clean up	2025-03-11 09:20:16 +01:00
marcoStocchi	6ef79a67ca	common : refactor '-o' option (#12278 ) As discussed in PR 'llama-tts : add -o option' (#12042): * common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option. * cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program.	2025-03-10 13:34:13 +02:00
Olivier Chafik	be421fc429	`tool-call`: ensure there's always a non-empty tool call id (#12292 )	2025-03-10 09:45:29 +00:00
Olivier Chafik	2b3a25c212	`sampler`: fixes trigger tokens + lazy grammars (fix typo cast from token to string) (#12291 ) * Fix typo in lazy grammar handling (fixes trigger tokens) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-03-10 09:44:42 +00:00
tc-mb	8352cdc87b	llava : fix bug in minicpm-v code (#11513 ) * fix bug in minicpm-v code * update readme of minicpm-v	2025-03-10 10:33:24 +02:00
Concedo	6b7c3ae1d3	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # AUTHORS # README.md # ci/run.sh # docs/build.md # ggml/src/CMakeLists.txt # ggml/src/ggml-metal/CMakeLists.txt # scripts/sync-ggml.last	2025-03-10 10:32:41 +08:00

1 2 3 4 5 ...

1703 commits