Commit graph

1917 commits

Author SHA1 Message Date
HimariO
b28ad7ecca fix attn weight scaling after rebase 2025-04-07 22:07:56 +08:00
HimariO
223edef897 remove commented-out code blocks 2025-04-07 21:52:37 +08:00
HimariO
dde96b4774 remove rarely used qwen2vl-cli debug functions 2025-04-07 21:52:37 +08:00
HimariO
8fcf682b28 ignore transformers Qwen2_5_xxx type check 2025-04-07 21:52:37 +08:00
HimariO
fdae70a832 cleaning up 2025-04-07 21:52:37 +08:00
HimariO
c891300c1e move position id remap out of ggml to avoid int32 cuda operations 2025-04-07 21:52:37 +08:00
HimariO
e18f6a3238 fix a few incorrect tensor memory layouts 2025-04-07 21:52:37 +08:00
HimariO
ecd673f0c5 add debug utils 2025-04-07 21:51:18 +08:00
HimariO
9c827814e6 handle window attention inputs 2025-04-07 21:51:18 +08:00
HimariO
9c7cc6de9c implement vision model architecture, gguf converter 2025-04-07 21:46:06 +08:00
Concedo
a3f7de7142 fixed outetts docs 2025-04-07 21:31:43 +08:00
Xuan-Son Nguyen
bd3f59f812
cmake : enable curl by default (#12761)
* cmake : enable curl by default

* no curl if no examples

* fix build

* fix build-linux-cross

* add windows-setup-curl

* fix

* shell

* fix path

* fix windows-latest-cmake*

* run: include_directories

* LLAMA_RUN_EXTRA_LIBS

* sycl: no llama_curl

* no test-arg-parser on windows

* clarification

* try riscv64 / arm64

* windows: include libcurl inside release binary

* add msg

* fix mac / ios / android build

* will this fix xcode?

* try clearing the cache

* add bunch of licenses

* revert clear cache

* fix xcode

* fix xcode (2)

* fix typo
2025-04-07 13:35:19 +02:00
Concedo
5edbacdd0e fix tools (+3 squashed commits)
Squashed commit:

[95a489ee] fix tools build

[1d3d3451] add accelerate

[2837705c] edit a line
2025-04-06 21:30:48 +08:00
Sergey Fedorov
f1e3eb4249
common : fix includes in arg.cpp and gemma3-cli.cpp (#12766)
* arg.cpp: add a missing include

* gemma3-cli.cpp: fix cinttypes include
2025-04-05 17:46:00 +02:00
Xuan-Son Nguyen
0364178ca2
clip : refactor clip_init, add tests (#12757)
* refactor clip_init

* fix loading file

* fix style

* test ok

* better test with report

* add missing headers

* clarify

* add KEY_MM_PATCH_MERGE_TYPE

* remove bool has_* pattern

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update examples/llava/clip.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* use ggml_soft_max_ext

* refactor logging system

* add minicpm-v-o 2.6 for testing

* use nullptr everywhere

* fix Yi-VL model

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-05 17:17:40 +02:00
Nauful Shaikh
b772394297
server : webui : Upgrade daisyui, tailwindcss. (#12735)
* Upgrade daisyui, tailwindcss.

* Switch to all themes.

* Revert a change.

* Update formatting.

* Install packages before npm build.

* Revert "Install packages before npm build."

This reverts commit 336c5147e614e60993162794ba9d9d4629a916f8.

* Add index.html.gz

* run build

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-04 16:09:52 +02:00
nick huang
23106f94ea
gguf-split : --merge now respects --dry-run option (#12681)
* gguf-split now respects dry-run option

* removing trailing space
2025-04-04 16:09:12 +02:00
Concedo
103d60ed2c Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	common/common.cpp
#	examples/batched-bench/batched-bench.cpp
#	examples/batched/batched.cpp
#	examples/export-lora/export-lora.cpp
#	examples/gritlm/gritlm.cpp
#	examples/parallel/parallel.cpp
#	examples/passkey/passkey.cpp
#	examples/speculative-simple/speculative-simple.cpp
#	examples/speculative/speculative.cpp
#	ggml/src/ggml-cann/CMakeLists.txt
#	ggml/src/ggml-cann/acl_tensor.cpp
#	ggml/src/ggml-cann/acl_tensor.h
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-vulkan/CMakeLists.txt
#	tests/test-arg-parser.cpp
#	tests/test-backend-ops.cpp
2025-04-03 18:57:49 +08:00
Georgi Gerganov
a10b36c91a
llama : refactor kv cache guard (#12695)
* llama : refactor kv cache guard

ggml-ci

* cont : fix comment [no ci]

* llama : fix kv_cache restore logic

ggml-ci

* context : simplify kv cache updates

ggml-ci

* cont : better name [no ci]

* llama : fix llama_decode return code when a KV slot could not be found

ggml-ci

* context : change log err -> warn [no ci]

* kv-cache : add comment + warning
2025-04-02 14:32:59 +03:00
Xuan-Son Nguyen
42eb248f46
common : remove json.hpp from common.cpp (#12697)
* common : remove json.hpp from common.cpp

* fix comment
2025-04-02 09:58:34 +02:00
Xuan-Son Nguyen
267c1399f1
common : refactor downloading system, handle mmproj with -hf option (#12694)
* (wip) refactor downloading system [no ci]

* fix all examples

* fix mmproj with -hf

* gemma3: update readme

* only handle mmproj in llava example

* fix multi-shard download

* windows: fix problem with std::min and std::max

* fix 2
2025-04-01 23:44:05 +02:00
Concedo
9e182b3e78 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	README.md
#	docs/backend/SYCL.md
#	ggml/src/ggml-sycl/CMakeLists.txt
#	ggml/src/ggml-vulkan/CMakeLists.txt
#	ggml/src/ggml-vulkan/ggml-vulkan.cpp
#	scripts/sync-ggml.last
#	tests/test-chat-template.cpp
2025-04-01 20:16:07 +08:00
Sigbjørn Skjæret
1a85949067
llava : proper description fix (#12668) 2025-03-31 11:28:30 +02:00
Sigbjørn Skjæret
f52d59d771
llava : fix clip loading GGUFs with missing description (#12660) 2025-03-31 11:07:07 +02:00
Concedo
1ebadc515e add streaming support for oai tools (+2 squashed commits)
Squashed commit:

[4d080b37] qwen2.5vl surgery script

[4bebe7e5] add streaming support for oai tools
2025-03-31 16:49:15 +08:00
marcoStocchi
52de2e5949
tts : remove printfs (#12640)
* tts.cpp : llama tokens console output is done using LOG_INF instead of printf(), so the options '--log-disable' and '--log-file' now have a uniform impact on all output.
2025-03-31 11:20:30 +03:00
Concedo
911669087a add tentative support for qwen2.5vl vision from HimariO fork 2025-03-29 22:52:43 +08:00
Concedo
396875e1c4 update api docs and lite 2025-03-29 15:39:25 +08:00
Benson Wong
5d01670266
server : include speculative decoding stats when timings_per_token is enabled (#12603)
* Include speculative decoding stats when timings_per_token is true

New fields added to the `timings` object:

  - draft_n           : number of draft tokens generated
  - draft_accepted_n  : number of draft tokens accepted
  - draft_accept_ratio: ratio of accepted/generated

* Remove redundant draft_accept_ratio var

* add draft acceptance rate to server console output
2025-03-28 10:05:44 +02:00
Radoslav Gerganov
ef03229ff4
rpc : update README for cache usage (#12620) 2025-03-28 09:44:13 +02:00
Radoslav Gerganov
ab6ab8f809
rpc : send hash when tensor data is above some fixed threshold (#12496)
* rpc : send hash when tensor data is above some fixed threshold

ref #10095

* rpc : put cache under $HOME/.cache/llama.cpp

* try to fix win32 build

* another try to fix win32 build

* remove llama as dependency
2025-03-28 08:18:04 +02:00
Piotr
2099a9d5db
server : Support listening on a unix socket (#12613)
* server : Bump cpp-httplib to include AF_UNIX windows support

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>

* server : Allow running the server example on a unix socket

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>

---------

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-03-27 23:41:04 +01:00
Ivy233
02082f1519
clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566)
* [Fix] Compiling clip-quantize-cli and running it in a CUDA environment caused ggml_fp16_to_fp32 to report an error when trying to access video memory, so quantization had to be run on the CPU backend.
After the fix, it automatically runs on the CPU backend and is no longer bound to CUDA.

* [Fix] Roll back the signature and implementation of clip_model_load, and change the call in clip_model_quantize to clip_init.
2025-03-26 15:06:04 +01:00
Eric Curtin
ef19c71769
run: de-duplicate fmt and format functions and optimize (#11596) 2025-03-25 18:46:11 +01:00
Concedo
ea358369cc Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ci/README.md
#	ci/run.sh
#	docs/backend/CUDA-FEDORA.md
#	docs/build.md
#	docs/install.md
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-cuda/common.cuh
#	tests/test-backend-ops.cpp
2025-03-26 00:18:01 +08:00
Marius Gerdes
77f9c6bbe5
server : Add verbose output to OAI compatible chat endpoint. (#12246)
Add verbose output to server_task_result_cmpl_final::to_json_oaicompat_chat_stream, making it conform with server_task_result_cmpl_final::to_json_oaicompat_chat, as well as the other to_json methods.
2025-03-23 19:30:26 +01:00
Concedo
7030ebf401 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	docs/backend/SYCL.md
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
#	ggml/src/ggml-sycl/CMakeLists.txt
#	tests/test-backend-ops.cpp
2025-03-22 00:32:42 +08:00
Concedo
c1e58419c7 support for voice cloning is done (+2 squashed commits)
Squashed commit:

[e7301628] support for voice cloning is done

[1653c576] wip adding voice cloning
2025-03-21 22:28:59 +08:00
marcoStocchi
ea1518e839
llama-tts : avoid crashes related to bad model file paths (#12482) 2025-03-21 11:12:45 +02:00
Woof Dog
e04643063b
webui : Prevent rerendering on textarea input (#12299)
* webui: Make textarea uncontrolled to eliminate devastating lag

* Update index.html.gz

* use signal-style implementation

* rm console log

* no duplicated savedInitValue set

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-03-20 15:57:43 +01:00
Concedo
0c90d2ebcf Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	cmake/common.cmake
#	docs/backend/SYCL.md
#	examples/main/README.md
#	examples/speculative/speculative.cpp
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-musa/CMakeLists.txt
#	ggml/src/ggml-sycl/CMakeLists.txt
#	ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
#	tests/test-backend-ops.cpp
2025-03-19 19:27:11 +08:00
Georgi Gerganov
c6af2161b2
speculative : fix seg fault in certain cases (#12454) 2025-03-18 19:35:11 +02:00
Georgi Gerganov
810e0af3f5
server : fix warmup draft cache type (#12446)
ggml-ci
2025-03-18 12:05:42 +02:00
Sigbjørn Skjæret
60c902926c
docs : bring llama-cli conversation/template docs up-to-date (#12426) 2025-03-17 21:14:32 +01:00
Concedo
5d7c5e9e33 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/tts/tts.cpp
2025-03-16 15:42:39 +08:00
marcoStocchi
f4c3dd5daa
llama-tts : add '-o' option (#12398)
* added -o option to specify an output file name

* llama-tts returns ENOENT in case of file write error

note : PR #12042 is closed as superseded by this one.
2025-03-15 17:23:11 +01:00
Concedo
67851e5415 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/run/run.cpp
#	ggml/src/ggml-cann/aclnn_ops.cpp
2025-03-15 19:54:19 +08:00
Concedo
bfc30066c9 fixed a clip processing bug 2025-03-15 17:49:49 +08:00
Eric Curtin
9f2250ba72
Add CLI arg to llama-run to adjust the number of threads used (#12370)
We default to 4; sometimes we want to adjust this manually.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-03-14 16:41:20 +00:00
Victor
add2a3aa5a
server: fix "--grammar-file" parameter (#12285) 2025-03-14 11:21:17 +01:00