koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-09 11:00:40 +00:00

Author	SHA1	Message	Date
Concedo	8c701d7ded	Merge commit '`72b090da2c`' into concedo_experimental # Conflicts: # docs/backend/CANN.md # docs/function-calling.md # examples/embedding/embedding.cpp # examples/retrieval/retrieval.cpp # ggml/src/ggml-cann/CMakeLists.txt # ggml/src/ggml-cann/Doxyfile # ggml/src/ggml-cann/acl_tensor.cpp # ggml/src/ggml-cann/acl_tensor.h # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-sycl/binbcast.cpp # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/concat.cpp # ggml/src/ggml-sycl/conv.cpp # ggml/src/ggml-sycl/cpy.cpp # ggml/src/ggml-sycl/dmmv.cpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/getrows.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/gla.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/norm.cpp # ggml/src/ggml-sycl/outprod.cpp # ggml/src/ggml-sycl/rope.cpp # ggml/src/ggml-sycl/softmax.cpp # ggml/src/ggml-sycl/tsembd.cpp # ggml/src/ggml-sycl/wkv.cpp # scripts/compare-commits.sh # tests/test-chat.cpp # tests/test-sampling.cpp	2025-05-28 00:28:41 +08:00
Concedo	868cb6aff7	Merge commit '`e121edc432`' into concedo_experimental # Conflicts: # .github/workflows/release.yml # common/CMakeLists.txt # docs/function-calling.md # ggml/src/ggml-sycl/binbcast.cpp # models/templates/README.md # scripts/tool_bench.py # src/llama-kv-cache.cpp # tests/CMakeLists.txt # tests/test-chat.cpp # tools/mtmd/clip.h # tools/rpc/rpc-server.cpp # tools/server/README.md	2025-05-28 00:20:45 +08:00
Olivier Chafik	03f582ae8f	server: fix streaming crashes (#13786) * add preludes to content on partial regex match * allow all parsers to parse non-tool-call content. * tweak order of <|python_tag|> vs <function= parsing for functionary v3.1 format. still not ideal but hopefully less prone to crash	2025-05-26 16:03:57 +01:00
Olivier Chafik	d74e94c1b3	`server`: fix format of streamed tool call deltas (diff name, fix id location) (#13800) * fix deltas of tool_call.function.name * fix tool_call.id (was in tool_call.function.id!) + add function type * add tool_call.type * populate empty tool_call.function.arguments on first delta	2025-05-26 14:56:49 +01:00
Olivier Chafik	f13847cfb5	server: fix regression on streamed non-chat completion w/ stops (#13785) * more forgiving message diffs: partial stop words aren't erased, full stops are * Add (slow) server test for completion + stream + stop	2025-05-26 14:16:37 +01:00
Georgi Gerganov	79c137f776	examples : allow extracting embeddings from decoder contexts (#13797) ggml-ci	2025-05-26 14:03:54 +03:00
Concedo	89a3742ded	skip unquantizable clip layers	2025-05-26 16:02:49 +08:00
Olivier Chafik	e121edc432	`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771) --------- Co-authored-by: ochafik <ochafik@google.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-05-26 00:30:51 +01:00
Xuan-Son Nguyen	2f099b510f	webui : bump max upload file size to 500MB (#13779)	2025-05-25 18:02:18 +01:00
Percy Piper	c508256db2	rpc : Fix build on OpenBSD (#13541)	2025-05-25 15:35:53 +03:00
Xuan-Son Nguyen	40aaa8a403	mtmd : add support for Qwen2-Audio and SeaLLM-Audio (#13760) * mtmd : add Qwen2-Audio support * small clean up * update discussion link * clarify mtmd_get_output_embd * clarification in multimodal.md * fix ultravox bug * ggml_cont	2025-05-25 14:06:32 +02:00
Olivier Chafik	d785f9c1fd	server: fix/test add_generation_prompt (#13770) Co-authored-by: ochafik <ochafik@google.com>	2025-05-25 10:45:49 +01:00
Olivier Chafik	f5cd27b71d	`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379) * add common_json w/ support for truncated json healing * add common_chat_msg_diff * partial common_chat_parse * refactor parser w/ optionals * server: wire chat diffs in stream mode * fix trigger of thinking models (must happen after thoughts are closed) * fix functionary v3.2 raw python! * rename: common_chat_syntax (now contains format) * rm common_regex.at_start * don't return empty <think></think> * accommodate yet another deepseek r1 distill fantasy syntax (`<｜tool▁calls｜>`) * fix QwQ 32B tool call parsing after thoughts (hermes2) * better logs for grammar triggers * consume spaces after parse_json_tool_calls * fix required tool calls w/ thinking models that have pre-opened thinking tags * fix thinking model's initial trigger + test qwq's template * run most test_tool_call tests in stream + non-stream modes * make functionary v3.2 parsing more strict (differentiate first match from others) * send final diff from server, to close off raw python arguments * support partial content streaming in Generic mode * tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5) * Update function-calling.md * Update tool_bench.py * chat-parser: remove input from exception (llm output may contain PII) --------- Co-authored-by: ochafik <ochafik@google.com> Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>	2025-05-25 01:48:08 +01:00
Concedo	55cc9acec5	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/release.yml # README.md # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/ggml-cann.cpp # tools/mtmd/CMakeLists.txt # tools/mtmd/clip.cpp # tools/mtmd/clip.h	2025-05-24 12:10:36 +08:00
Xuan-Son Nguyen	9ecf3e66a3	server : support audio input (#13714) * server : support audio input * add audio support on webui	2025-05-23 11:03:47 +02:00
Concedo	22ef97d7d3	Merge commit '`ab86335760`' into concedo_experimental # Conflicts: # .github/workflows/release.yml # examples/retrieval/retrieval.cpp # examples/simple-chat/simple-chat.cpp # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # requirements/requirements-convert_hf_to_gguf.txt # requirements/requirements-convert_hf_to_gguf_update.txt # requirements/requirements-convert_lora_to_gguf.txt # tools/run/run.cpp	2025-05-23 11:41:36 +08:00
Georgi Gerganov	8a1d206f1d	tts : fix n_ubatch + make WavTokenizer cache-less (#13713) ggml-ci	2025-05-22 22:21:07 +03:00
Xuan-Son Nguyen	797990c4bc	mtmd : add ultravox audio input (#13623) * convert ok, load ok * warmup ok * test * still does not work? * fix padding * temporary give up * fix merge conflict * build_ultravox() * rm test * fix merge conflict * add necessary mtmd APIs * first working version (only 4s of audio) * will this monster compile? * fix compile * please compile * fPIC * fix windows * various fixes * clean up audio_helpers * fix conversion * add some debug stuff * long audio input ok * adapt the api * add --audio arg * final touch UX * add miniaudio to readme * fix typo * refactor kv metadata * mtmd_default_marker()	2025-05-22 20:42:48 +02:00
Georgi Gerganov	cc74d5be99	server : pad small embedding batches (#13692) ggml-ci	2025-05-22 16:33:39 +03:00
Georgi Gerganov	5fbfe384d4	server : improve error reporting (#13680)	2025-05-21 19:46:56 +03:00
Concedo	da7fd4aa57	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/musa.Dockerfile # .github/workflows/build.yml # README.md # ci/README.md # docs/docker.md # examples/lookahead/lookahead.cpp # examples/lookup/lookup.cpp # examples/parallel/parallel.cpp # ggml/src/ggml-musa/CMakeLists.txt # ggml/src/ggml-sycl/ggml-sycl.cpp # tests/test-arg-parser.cpp	2025-05-21 23:12:22 +08:00
Concedo	9f976e9c65	swa full used unless ctx shift and fast forward disabled	2025-05-21 22:47:45 +08:00
Robin Davidsson	0d5c742161	server : Add the endpoints /api/tags and /api/chat (#13659) * Add the endpoints /api/tags and /api/chat Add the endpoints /api/tags and /api/chat, and improved the model metadata response * Remove trailing whitespaces * Removed code that is not needed for copilot to work.	2025-05-21 15:15:27 +02:00
Dorin-Andrei Geman	42158ae2e8	server : fix first message identification (#13634) * server : fix first message identification When using the OpenAI SDK (https://github.com/openai/openai-node/blob/master/src/lib/ChatCompletionStream.ts#L623-L626) we noticed that the expected assistant role is missing in the first streaming message. Fix this by correctly checking for the first message. Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com> Signed-off-by: Dorin Geman <dorin.geman@docker.com> * server : Fix checks for first role message for stream=True Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com> Signed-off-by: Dorin Geman <dorin.geman@docker.com> --------- Signed-off-by: Dorin Geman <dorin.geman@docker.com> Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-05-21 15:07:57 +02:00
Georgi Gerganov	797f2ac062	kv-cache : simplify the interface (#13660) * kv-cache : simplify the interface ggml-ci * context : revert llama_batch_allocr position change ggml-ci	2025-05-21 15:11:13 +03:00
Concedo	c0edde61c5	hey what do you know, it worked	2025-05-21 18:50:05 +08:00
Concedo	d04b4eeb04	merge not working	2025-05-21 18:06:41 +08:00
l3utterfly	b7a17463ec	mtmd-helper : bug fix to token batching in mtmd (#13650) * Update mtmd-helper.cpp * Update tools/mtmd/mtmd-helper.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-05-20 18:55:30 +02:00
Georgi Gerganov	e298d2fbd0	kv-cache : add SWA support (#13194) * kv-cache : prepare for SWA ggml-ci * kv-cache : initial iSWA implementation ggml-ci * kv-cache : rework error recovery logic ggml-ci * models : fix Phi-3 SWA parameters ggml-ci * model : adjust Granite to rope factor changes ggml-ci * server : check if context can do shifts ggml-ci * iswa : for now, always enable shifts (experiment) ggml-ci * kv-cache : simplify SWA logic ggml-ci * kv-cache : apply defrag when we fail to find slots for the batch ggml-ci * llama : update docs about llama_decode ggml-ci * kv-cache : update warning logs when no space for the batch is available ggml-ci * llama : add llama_kv_self_seq_pos_min() * kv-cache : keep track of partial SWA computes and print warnings * server : disallow use cases involving partial SWA context ggml-ci * llama : add param to control SWA cache size ggml-ci * minor : clean-up ggml-ci	2025-05-20 08:05:46 +03:00
Nicolò Scipione	f7c9429c85	sycl : Overcoming workaround for mmap() allocation on Windows (#13482) * Remove mmap workaround on windows After some testing I found that mmap is supported on windows and for many GPUs on Linux. Therefore I remove the workaround for windows since it is not necessary. * Update llama-bench README SYCL backend introduced a workaround that allows execution of llama-bench also without specifying `--mmp 0` flag	2025-05-20 08:54:43 +08:00
Xuan-Son Nguyen	92ecdcc06a	mtmd : add vision support for llama 4 (#13282) * wip llama 4 conversion * rm redundant __init__ * fix conversion * fix conversion * test impl * try this * reshape patch_embeddings_0 * fix view * rm ffn_post_norm * cgraph ok * f32 for pos embd * add image marker tokens * Llama4UnfoldConvolution * correct pixel shuffle * fix merge conflicts * correct * add debug_graph * logits matched, but it still preceives the image incorrectly * fix style * add image_grid_pinpoints * handle llama 4 preprocessing * rm load_image_size * rm unused line * fix * small fix 2 * add test & docs * fix llava-1.6 test * test: add notion of huge models * add comment * add warn about degraded quality	2025-05-19 13:04:14 +02:00
Concedo	59300dbdf5	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/actions/windows-setup-curl/action.yml # .github/workflows/build-linux-cross.yml # README.md # common/CMakeLists.txt # examples/parallel/README.md # examples/parallel/parallel.cpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-vulkan/CMakeLists.txt # tools/server/README.md	2025-05-18 23:27:53 +08:00
Isaac McFadyen	6a2bc8bfb7	server : added --no-prefill-assistant flag (#13608) * added no-prefill-assistant flag * reworded documentation comment * updated server README.md	2025-05-17 23:59:48 +02:00
Xuan-Son Nguyen	6aa892ec2a	server : do not return error out of context (with ctx shift disabled) (#13577)	2025-05-16 21:50:00 +02:00
Xuan-Son Nguyen	aea9f8b4e7	webui : improve accessibility for visually impaired people (#13551) * webui : improve accessibility for visually impaired people * add a11y for extra contents * fix some labels being read twice * add skip to main content	2025-05-16 21:49:01 +02:00
Concedo	e5d26a2356	Merge branch 'upstream' into concedo_experimental # Conflicts: # common/CMakeLists.txt # docs/backend/SYCL.md # ggml/CMakeLists.txt # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/binbcast.cpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/dequantize.hpp # ggml/src/ggml-sycl/dmmv.cpp # ggml/src/ggml-sycl/gemm.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/vecdotq.hpp # ggml/src/ggml-vulkan/CMakeLists.txt # ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt # ggml/src/gguf.cpp # scripts/compare-llama-bench.py # tests/CMakeLists.txt # tests/test-chat.cpp # tools/llama-bench/llama-bench.cpp # tools/server/README.md	2025-05-16 15:30:31 +08:00
Concedo	6cafc0e73e	Merge commit '`71bdbdb587`' into concedo_experimental # Conflicts: # ggml/src/ggml-cpu/CMakeLists.txt # tools/batched-bench/batched-bench.cpp # tools/mtmd/clip.h	2025-05-16 15:25:15 +08:00
Concedo	12e6928ec2	i'm gonna regret this, aren't i?	2025-05-15 23:59:55 +08:00
Concedo	7a76e237b8	fixed clip quantize again	2025-05-15 23:22:12 +08:00
Diego Devesa	6c8b91500e	llama-bench : fix -ot with dl backends (#13563)	2025-05-15 15:46:55 +02:00
Xuan-Son Nguyen	3cc1f1f1d2	webui : handle PDF input (as text or image) + convert pasted long content to file (#13562) * webui : handle PDF input (as text or image) * handle the case where pdf image + server without mtmd * fix bug missing pages	2025-05-15 14:24:50 +02:00
Piotr Wilkin (ilintar)	c753d7bed0	server : proper error handling for missing elements in messages array (OpenAI compatible backend) (#13540)	2025-05-15 08:40:58 +02:00
Georgi Gerganov	b2838049cc	bench : handle decode errors (#13548) ggml-ci	2025-05-15 05:57:02 +03:00
Olivier Chafik	aa48e373f2	`server`: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802) * Inject date_string in llama 3.x + fix for functionary v2 https://github.com/ggml-org/llama.cpp/issues/12729 * move/fix detection of functionary v3.1 before llama 3.x, fix & test their non-tool mode Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * generate more tokens in test_completion_with_required_tool_tiny_fast to avoid truncation --------- Co-authored-by: ochafik <ochafik@google.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-05-15 02:39:51 +01:00
Olivier Chafik	3198405e98	`common`: add partial regex support (#12808) * move string_find_partial_stop & string_ends_with to common * add common_regex (supports partial matches) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/regex-partial.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/regex-partial.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/regex-partial.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * partial regex: add missing iterator end checks * string utils: use string_views * direct throw to avoid ggml.h include * regex-partial: replace missed ggml_asserts --------- Co-authored-by: ochafik <ochafik@google.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-05-14 19:50:57 +01:00
Georgi Gerganov	053174436f	server : passthrough the /models endpoint during loading (#13535) * server : passthrough the /models endpoint during loading * server : update readme + return json for "meta" field	2025-05-14 15:42:10 +03:00
Xuan-Son Nguyen	360a9c98e1	server : fix cache_tokens bug with no cache_prompt (#13533)	2025-05-14 13:35:07 +02:00
Xuan-Son Nguyen	bb1681fbd5	webui : use fflate for more deterministic gzip compress (#13525) * webui : use pako for more deterministic gzip compress * simpler code * use fflate instead of pako	2025-05-14 10:26:12 +02:00
Luca Stefani	d486dd3e8e	webui: Allow pasting file from clipboard (#13526) * server: Allow pasting file from clipboard * server: Prevent default action on file paste * update build * format then build combined --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-05-14 10:07:31 +02:00
Ed Addario	e5c834f718	quantize : improve tensor-type pattern matching (#13033)	2025-05-13 19:12:31 +02:00

101 commits