koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-08 09:59:50 +00:00

Author	SHA1	Message	Date
Concedo	59300dbdf5	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/actions/windows-setup-curl/action.yml # .github/workflows/build-linux-cross.yml # README.md # common/CMakeLists.txt # examples/parallel/README.md # examples/parallel/parallel.cpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-vulkan/CMakeLists.txt # tools/server/README.md	2025-05-18 23:27:53 +08:00
Isaac McFadyen	6a2bc8bfb7	server : added --no-prefill-assistant flag (#13608) * added no-prefill-assistant flag * reworded documentation comment * updated server README.md	2025-05-17 23:59:48 +02:00
Xuan-Son Nguyen	6aa892ec2a	server : do not return error out of context (with ctx shift disabled) (#13577)	2025-05-16 21:50:00 +02:00
Xuan-Son Nguyen	aea9f8b4e7	webui : improve accessibility for visually impaired people (#13551) * webui : improve accessibility for visually impaired people * add a11y for extra contents * fix some labels being read twice * add skip to main content	2025-05-16 21:49:01 +02:00
Concedo	e5d26a2356	Merge branch 'upstream' into concedo_experimental # Conflicts: # common/CMakeLists.txt # docs/backend/SYCL.md # ggml/CMakeLists.txt # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/binbcast.cpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/dequantize.hpp # ggml/src/ggml-sycl/dmmv.cpp # ggml/src/ggml-sycl/gemm.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/vecdotq.hpp # ggml/src/ggml-vulkan/CMakeLists.txt # ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt # ggml/src/gguf.cpp # scripts/compare-llama-bench.py # tests/CMakeLists.txt # tests/test-chat.cpp # tools/llama-bench/llama-bench.cpp # tools/server/README.md	2025-05-16 15:30:31 +08:00
Concedo	6cafc0e73e	Merge commit '71bdbdb587' into concedo_experimental # Conflicts: # ggml/src/ggml-cpu/CMakeLists.txt # tools/batched-bench/batched-bench.cpp # tools/mtmd/clip.h	2025-05-16 15:25:15 +08:00
Concedo	12e6928ec2	i'm gonna regret this, aren't i?	2025-05-15 23:59:55 +08:00
Concedo	7a76e237b8	fixed clip quantize again	2025-05-15 23:22:12 +08:00
Diego Devesa	6c8b91500e	llama-bench : fix -ot with dl backends (#13563)	2025-05-15 15:46:55 +02:00
Xuan-Son Nguyen	3cc1f1f1d2	webui : handle PDF input (as text or image) + convert pasted long content to file (#13562) * webui : handle PDF input (as text or image) * handle the case where pdf image + server without mtmd * fix bug missing pages	2025-05-15 14:24:50 +02:00
Piotr Wilkin (ilintar)	c753d7bed0	server : proper error handling for missing elements in messages array (OpenAI compatible backend) (#13540)	2025-05-15 08:40:58 +02:00
Georgi Gerganov	b2838049cc	bench : handle decode errors (#13548) ggml-ci	2025-05-15 05:57:02 +03:00
Olivier Chafik	aa48e373f2	`server`: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802 ) * Inject date_string in llama 3.x + fix for functionary v2 https://github.com/ggml-org/llama.cpp/issues/12729 * move/fix detection of functionary v3.1 before llama 3.x, fix & test their non-tool mode Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * generate more tokens in test_completion_with_required_tool_tiny_fast to avoid truncation --------- Co-authored-by: ochafik <ochafik@google.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-05-15 02:39:51 +01:00
Olivier Chafik	3198405e98	`common`: add partial regex support (#12808 ) * move string_find_partial_stop & string_ends_with to common * add common_regex (supports partial matches) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/regex-partial.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/regex-partial.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/regex-partial.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * partial regex: add missing iterator end checks * string utils: use string_views * direct throw to avoid ggml.h include * regex-partial: replace missed ggml_asserts --------- Co-authored-by: ochafik <ochafik@google.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-05-14 19:50:57 +01:00
Georgi Gerganov	053174436f	server : passthrough the /models endpoint during loading (#13535) * server : passthrough the /models endpoint during loading * server : update readme + return json for "meta" field	2025-05-14 15:42:10 +03:00
Xuan-Son Nguyen	360a9c98e1	server : fix cache_tokens bug with no cache_prompt (#13533)	2025-05-14 13:35:07 +02:00
Xuan-Son Nguyen	bb1681fbd5	webui : use fflate for more deterministic gzip compress (#13525) * webui : use pako for more deterministic gzip compress * simpler code * use fflate instead of pako	2025-05-14 10:26:12 +02:00
Luca Stefani	d486dd3e8e	webui: Allow pasting file from clipboard (#13526) * server: Allow pasting file from clipboard * server: Prevent default action on file paste * update build * format then build combined --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-05-14 10:07:31 +02:00
Ed Addario	e5c834f718	quantize : improve tensor-type pattern matching (#13033)	2025-05-13 19:12:31 +02:00
Xuan-Son Nguyen	71bdbdb587	clip : clip.h become private API (⚠️ breaking change) (#13510)	2025-05-13 17:07:21 +02:00
Georgi Gerganov	b89d605a91	batched-bench : fix pp batch contents (#13492)	2025-05-13 18:01:53 +03:00
Xuan-Son Nguyen	b4726345ac	mtmd : remove libllava, remove clip-quantize-cli (⚠️ breaking change) (#13460) * mtmd : remove libllava, remove clip-quantize-cli * rm clip_model_quantize	2025-05-13 15:33:58 +02:00
Concedo	11984f1040	fixed autoguess adapters, fixed tool builds	2025-05-13 19:38:56 +08:00
Diego Devesa	cf0a43bb64	llama-bench : add defrag-thold, check for invalid ranges (#13487)	2025-05-13 00:31:37 +02:00
Concedo	21e31e255b	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/docker.yml # README.md # build-xcframework.sh # common/CMakeLists.txt # examples/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-metal/ggml-metal.m # ggml/src/ggml-metal/ggml-metal.metal # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/backend.hpp # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/vecdotq.hpp # scripts/compare-llama-bench.py # src/CMakeLists.txt # src/llama-model.cpp # src/llama.cpp # tests/test-backend-ops.cpp # tests/test-opt.cpp # tools/llama-bench/README.md # tools/llama-bench/llama-bench.cpp # tools/mtmd/CMakeLists.txt # tools/mtmd/README.md # tools/mtmd/clip.cpp # tools/rpc/rpc-server.cpp # tools/server/CMakeLists.txt # tools/server/README.md	2025-05-13 00:28:35 +08:00
Xuan-Son Nguyen	de4c07f937	clip : cap max image size 1024 for qwen vl model (#13478)	2025-05-12 15:06:51 +02:00
Anudit Nagar	91159ee9df	server : allow content to be null in oaicompat_completion_params_parse (#13477)	2025-05-12 13:56:42 +02:00
Diego Devesa	22cdab343b	llama-bench : accept ranges for integer parameters (#13410)	2025-05-12 13:08:22 +02:00
City	c104023994	mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459)	2025-05-12 00:39:06 +02:00
Anthony Umfer	9a390c4829	tools : fix uninitialized llama_batch in server (#13436 ) * add constructor to initialize server_context::batch, preventing destructor's call to llama_batch_free from causing an invalid free() * Update tools/server/server.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * use C++11 initializer syntax * switch from Copy-list-initialization to Direct-list-initialization --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-05-11 17:08:26 +02:00
David Huang	7f323a589f	Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386)	2025-05-11 14:18:39 +02:00
City	3eac209319	mtmd : support InternVL 3 38B and 78B mmproj (#13443) * Support InternVL 3 38B and 78B mmproj * Swap norms in clip.cpp * Group variables together	2025-05-11 11:35:52 +02:00
Xuan-Son Nguyen	a634d75d1b	mtmd : move helpers to dedicated file (#13442) * mtmd : move helpers to dedicated file * fix windows build * rm redundant include	2025-05-11 11:34:23 +02:00
Concedo	f841b29c41	fixed unicode paths	2025-05-11 14:05:54 +08:00
Xuan-Son Nguyen	15e6125a39	mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434) * mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl * fix typo	2025-05-10 19:57:54 +02:00
Xuan-Son Nguyen	3b24d26c22	server : update docs (#13432)	2025-05-10 18:44:49 +02:00
Xuan-Son Nguyen	053367d149	mtmd : support InternVL 2.5 and 3 (#13422) * convert : internvl support * InternVL3-1B working * fix regression * rm mobilevlm from test * fix conversion * add test for internvl * add to list of pre-quant * restore boi/eoi check * add clarify comment for norm eps	2025-05-10 16:26:42 +02:00
Xuan-Son Nguyen	33eff40240	server : vision support via libmtmd (#12898 ) * server : (experimental) vision support via libmtmd * mtmd : add more api around mtmd_image_tokens * mtmd : add more api around mtmd_image_tokens * mtmd : ability to calc image hash * shared_ptr for mtmd_image_tokens * move hash to user-define ID (fixed) * abstract out the batch management * small fix * refactor logic adding tokens to batch * implement hashing image * use FNV hash, now hash bitmap instead of file data * allow decoding image embedding to be split into batches * rm whitespace * disable some features when mtmd is on * fix --no-mmproj-offload * mtmd_context_params no timings * refactor server_inp to server_tokens * fix the failing test case * init * wip * working version * add mtmd::bitmaps * add test target * rm redundant define * test: mtmd_input_chunks_free * rm outdated comment * fix merging issue * explicitly create mtmd::input_chunks * mtmd_input_chunk_copy * add clone() * improve server_input struct * clip : fix confused naming ffn_up and ffn_down * rm ffn_i/o/g naming * rename n_embd, n_ff * small fix * no check n_ff * fix detokenize * add const to various places * add warning about breaking changes * add c api * helper: use mtmd_image_tokens_get_n_pos * fix ctx_shift * fix name shadowing * more strict condition * support remote image_url * remote image_url log * add CI test * do not log base64 * add "has_multimodal" to /props * remove dangling image * speculative: use slot.cache_tokens.insert * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * rm can_be_detokenized * on prmpt processing done, assert cache_tokens.size * handle_completions_impl returns void * adapt the new web ui * update docs and hot topics * rm assert * small fix (2) --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-05-09 19:29:37 +02:00
Concedo	6bb44391bd	Merge commit '5c86c9ed3e' into concedo_experimental # Conflicts: # tools/imatrix/imatrix.cpp # tools/mtmd/README.md # tools/run/README.md # tools/run/run.cpp	2025-05-10 00:30:18 +08:00
Diego Devesa	27ebfcacba	llama : do not crash if there is no CPU backend (#13395) * llama : do not crash if there is no CPU backend * add checks to examples	2025-05-09 13:02:07 +02:00
Bartowski	efb8b47eda	imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389) * Add --parse-special for enabling parsing of special tokens in imatrix calculation * whitespace	2025-05-09 11:53:58 +02:00
R0CKSTAR	0527771dd8	llama-run: add support for downloading models from ModelScope (#13370) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-05-09 10:25:50 +01:00
Concedo	42f6930e13	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-rpc/ggml-rpc.cpp	2025-05-09 17:18:14 +08:00
Xuan-Son Nguyen	2189fd3b63	mtmd : fix batch_view for m-rope (#13397) * mtmd : fix batch_view for m-rope * nits : fix comment	2025-05-09 11:18:02 +02:00
Xuan-Son Nguyen	3f96aeff39	llama : one-off chat template fix for Mistral-Small-2503 (#13398) * llama : one-off chat template fix for Mistral-Small-2503 * update readme * add mistral-v7-tekken	2025-05-09 11:17:51 +02:00
Xuan-Son Nguyen	d9c4accaff	server : (webui) rename has_multimodal --> modalities (#13393) * server : (webui) rename has_multimodal --> modalities * allow converting SVG to PNG * less complicated code	2025-05-09 09:06:37 +02:00
Concedo	2f5f4ee65a	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # common/CMakeLists.txt	2025-05-09 14:18:20 +08:00
Matt Clayton	f05a6d71a0	mtmd : Expose helper_decode_image_chunk (#13366) * mtmd: Expose helper_decode_image, output_embd_copy, image_tokens_copy/free * Slim down * Cleanups	2025-05-08 20:25:39 +02:00
Xuan-Son Nguyen	ee01d71e58	server : (webui) fix a very small misalignment (#13387) * server : (webui) fix a very small misalignment * restore font-bold	2025-05-08 18:51:45 +02:00
Concedo	2439014a03	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # examples/embedding/embedding.cpp # tools/imatrix/imatrix.cpp # tools/perplexity/perplexity.cpp	2025-05-08 23:41:02 +08:00

70 commits