koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-06-01 22:50:53 +00:00

Author	SHA1	Message	Date
Concedo	b120e107f9	Merge branch 'upstream' into concedo_experimental # Conflicts: # .clang-tidy # .devops/musa.Dockerfile # .github/workflows/build-linux-cross.yml # .github/workflows/build.yml # .github/workflows/docker.yml # .gitignore # CODEOWNERS # CONTRIBUTING.md # README.md # build-xcframework.sh # ci/README-MUSA.md # ci/run.sh # common/CMakeLists.txt # docs/docker.md # examples/CMakeLists.txt # examples/eval-callback/CMakeLists.txt # examples/model-conversion/Makefile # examples/model-conversion/README.md # examples/model-conversion/logits.cpp # examples/model-conversion/scripts/causal/compare-logits.py # examples/model-conversion/scripts/causal/run-org-model.py # examples/model-conversion/scripts/embedding/compare-embeddings-logits.sh # examples/model-conversion/scripts/embedding/run-converted-model.sh # examples/model-conversion/scripts/embedding/run-original-model.py # examples/model-conversion/scripts/utils/check-nmse.py # examples/model-conversion/scripts/utils/inspect-org-model.py # examples/model-conversion/scripts/utils/semantic_check.py # ggml/CMakeLists.txt # ggml/include/ggml-zdnn.h # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/set_rows.cl # ggml/src/ggml-rpc/ggml-rpc.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/set_rows.cpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-zdnn/ggml-zdnn.cpp # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-quantize-perf.cpp # tests/test-tokenizers-repo.sh # tools/perplexity/perplexity.cpp # tools/server/tests/README.md	2025-09-27 17:09:14 +08:00
Daniel Bevenius	d0991da39d	server : add support for external server for tests (#16243) This commit adds support for using an externally started llama-server instance for the server tests. This can be enabled by setting the DEBUG_EXTERNAL environment variable. The motivation for this is to allow debugging of the server itself when investigating a test failure. Instructions for how to do this are added to the README.md file in the tests directory.	2025-09-25 11:36:47 +02:00
Concedo	0dc6b9f418	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cpu/amx/amx.cpp # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cvt.cl # ggml/src/ggml-rpc/ggml-rpc.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.tmpl.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/set_rows.wgsl # ggml/src/ggml-zdnn/ggml-zdnn.cpp # scripts/sync-ggml.last # tests/test-backend-ops.cpp # tests/test-chat.cpp # tools/llama-bench/README.md # tools/llama-bench/llama-bench.cpp	2025-09-21 11:38:47 +08:00
Concedo	326f6f3fad	not sure if working on metal	2025-09-21 11:35:02 +08:00
Radoslav Gerganov	2b6b55a59f	server : include usage statistics only when user request them (#16052) * server : include usage statistics only when user request them When serving the OpenAI compatible API, we should check if {"stream_options": {"include_usage": true}} is set in the request when deciding whether we should send usage statistics closes: #16048 * add unit test	2025-09-18 10:36:57 +00:00
Aleksander Grygier	a7a98e0fff	SvelteKit-based WebUI (#14839)	2025-09-17 19:29:13 +02:00
Concedo	6463f5c26b	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/release.yml # CONTRIBUTING.md # docs/backend/CANN.md # examples/eval-callback/eval-callback.cpp # examples/model-conversion/requirements.txt # examples/model-conversion/scripts/causal/run-org-model.py # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cpu/kleidiai/kleidiai.cpp # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-rpc/ggml-rpc.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-zdnn/ggml-zdnn.cpp # models/templates/README.md # requirements/requirements-convert_hf_to_gguf.txt # requirements/requirements-convert_legacy_llama.txt # requirements/requirements-tool_bench.txt # tests/.gitignore # tests/test-backend-ops.cpp # tests/test-chat-parser.cpp # tests/test-chat.cpp # tests/test-json-schema-to-grammar.cpp # tests/test-tokenizer-random.py	2025-09-11 22:34:45 +08:00
Concedo	5de51b77c1	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/close-issue.yml # docs/build-s390x.md # examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp # ggml/CMakeLists.txt # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cpu/kleidiai/kleidiai.cpp # ggml/src/ggml-cuda/fattn-tile-f16.cu # ggml/src/ggml-cuda/fattn.cu # ggml/src/ggml-webgpu/ggml-webgpu.cpp # scripts/tool_bench.py # tests/test-backend-ops.cpp # tools/batched-bench/batched-bench.cpp # tools/server/README.md	2025-09-11 22:28:19 +08:00
Daniel Bevenius	70cd37dbbe	requirements : update transformers/torch for Embedding Gemma (#15828) * requirements : update transformers/torch for Embedding Gemma This commit updates the requirements to support converting Embedding Gemma 300m models. The motivation for this change is that during development I had a local copy of the transformers package which is what I used for converting the models. This was a mistake on my part and I should have also updated my transformers version to the official release. I had checked the requirements/requirements-convert_legacy_llama.txt file and noted that the version was >=4.45.1,<5.0.0 and came to the conclusion that no update would be needed, this assumed that Embedding Gemma would be in a transformers release at the time Commit `fb15d649ed` ("llama : add support for EmbeddingGemma 300m (#15798)") was merged. So anyone wanting to convert themselves would be able to do so. However, Embedding Gemma is a preview release and this commit updates the requirements to use this preview release. * resolve additional python dependencies * fix pyright errors in tokenizer test and remove unused import	2025-09-09 06:06:52 +02:00
Xuan-Son Nguyen	3c3635d2f2	server : speed up tests (#15836 ) * server : speed up tests * clean up * restore timeout_seconds in some places * flake8 * explicit offline	2025-09-06 14:45:24 +02:00
Xuan-Son Nguyen	61bdfd5298	server : implement prompt processing progress report in stream mode (#15827 ) * server : implement `return_progress` * add timings.cache_n * add progress.time_ms * add test * fix test for chat/completions * readme: add docs on timings * use ggml_time_us Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-09-06 13:35:04 +02:00
Concedo	f0d4128e9f	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/backend/CANN.md # examples/model-conversion/Makefile # examples/model-conversion/scripts/causal/compare-embeddings-logits.sh # examples/model-conversion/scripts/causal/convert-model.sh # examples/model-conversion/scripts/causal/run-casual-gen-embeddings-org.py # examples/model-conversion/scripts/causal/run-converted-model-embeddings-logits.sh # examples/model-conversion/scripts/causal/run-converted-model.sh # examples/model-conversion/scripts/embedding/compare-embeddings-logits.sh # examples/model-conversion/scripts/embedding/convert-model.sh # examples/model-conversion/scripts/embedding/modelcard.template # examples/model-conversion/scripts/embedding/run-converted-model.sh # examples/model-conversion/scripts/utils/create-collection-add-model.sh # examples/model-conversion/scripts/utils/inspect-converted-model.sh # examples/model-conversion/scripts/utils/inspect-org-model.py # examples/model-conversion/scripts/utils/perplexity-gen.sh # examples/model-conversion/scripts/utils/perplexity-run-simple.sh # examples/model-conversion/scripts/utils/perplexity-run.sh # examples/model-conversion/scripts/utils/quantize.sh # examples/model-conversion/scripts/utils/run-embedding-server.sh # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # src/llama-context.cpp # tests/test-backend-ops.cpp # tests/test-chat.cpp	2025-09-05 13:25:34 +08:00
Xuan-Son Nguyen	a68d914426	server: add exceed_context_size_error type (#15780) * server: add exceed_context_size_error type * change error code to 400	2025-09-04 11:50:23 +02:00
Concedo	2562129271	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # ci/run.sh # docs/backend/CANN.md # examples/speculative/speculative.cpp # ggml/CMakeLists.txt # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/flash_attn_f16.cl # ggml/src/ggml-opencl/kernels/flash_attn_f32.cl # ggml/src/ggml-opencl/kernels/flash_attn_f32_f16.cl # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/gguf.cpp # src/llama-context.cpp # tests/test-sampling.cpp # tools/server/README.md	2025-09-03 17:16:42 +08:00
Georgi Gerganov	0d161f021a	server : enable /slots by default and make it secure (#15630 ) * server : enable /slots by default and make it secure ggml-ci * server : fix tests to pass `--no-slots` when necessary * server : extend /props with info about enabled endpoints	2025-08-31 20:11:58 +03:00
Concedo	7e35954695	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/build.md # docs/function-calling.md # examples/eval-callback/eval-callback.cpp # ggml/CMakeLists.txt # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cpu/kleidiai/kernels.cpp # ggml/src/ggml-cpu/kleidiai/kernels.h # ggml/src/ggml-cpu/kleidiai/kleidiai.cpp # scripts/compare-llama-bench.py # scripts/server-bench.py # scripts/tool_bench.py # tests/test-chat.cpp # tools/batched-bench/batched-bench.cpp # tools/llama-bench/llama-bench.cpp # tools/server/README.md	2025-08-31 23:33:36 +08:00
Johannes Gäßler	e81b8e4b7f	llama: use FA + max. GPU layers by default (#15434 ) * llama: use max. GPU layers by default, auto -fa * ggml-backend: abort instead of segfault	2025-08-30 16:32:10 +02:00
Concedo	3060dfb99f	Merge branch 'upstream' into concedo_experimental # Conflicts: # examples/model-conversion/Makefile # examples/model-conversion/scripts/causal/convert-model.sh # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cuda/CMakeLists.txt # scripts/compare-commits.sh	2025-08-28 23:17:29 +08:00
Johannes Gäßler	fbef0fad7a	server: higher timeout for tests (#15621)	2025-08-27 20:58:09 +02:00
Concedo	8b8396c30c	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # docs/build-s390x.md # examples/llama.vim # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/common.h # scripts/compare-llama-bench.py # src/CMakeLists.txt # tests/test-backend-ops.cpp # tools/llama-bench/README.md # tools/llama-bench/llama-bench.cpp # tools/server/README.md	2025-08-23 11:35:28 +08:00
Concedo	257992d6b8	possibly unstable, needs testing for fa	2025-08-22 17:35:32 +08:00
65a	4afb0a746f	server : Support multimodal completion and embeddings prompts in JSON format (#15108 ) - Use server_tokens in more places in server and util.cpp - Convert most functions that used llama_tokens to server_tokens - Modify input tokenizer to handle JSON objects as subprompts - Break out MTMD prompt parsing into utility function - Support JSON objects with multimodal_data arrays for MTMD prompts along with other existing types - Add capability to model endpoint to indicate if client can send multimodal data - Add tests.	2025-08-22 10:10:14 +02:00
teo	1bc664a26a	server: fix OpenAI API compatibility for usage statistics in chat streams (#15444)	2025-08-21 00:10:08 +02:00
Concedo	1c41c38a6a	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/cuda.Dockerfile # CODEOWNERS # README.md # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/common.h # ggml/src/ggml-opencl/ggml-opencl.cpp # scripts/sync-ggml-am.sh # scripts/sync-ggml.last # scripts/sync-ggml.sh # tests/test-chat.cpp # tools/batched-bench/batched-bench.cpp # tools/mtmd/clip.h	2025-08-20 20:34:45 +08:00
Georgi Gerganov	d2fcd91cf9	server : disable context shift by default (#15416) * server : disable context shift by default ggml-ci * server : make scope of test parameters local	2025-08-19 16:46:37 +03:00
Concedo	f430916a71	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/backend/CANN.md # docs/multimodal/minicpmo2.6.md # docs/multimodal/minicpmv2.5.md # docs/multimodal/minicpmv2.6.md # examples/speculative-simple/speculative-simple.cpp # ggml/cmake/ggml-config.cmake.in # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cpu/repack.cpp # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/add.cl # ggml/src/ggml-opencl/kernels/mul.cl # scripts/compare-commits.sh # scripts/compare-llama-bench.py # scripts/sync-ggml.last # tools/server/README.md	2025-08-02 10:25:10 +08:00
Lukas Straub	a9f77a8be3	server : add openai-style logit_bias support (#14946 ) Signed-off-by: Lukas Straub <lukasstraub2@web.de>	2025-07-31 14:08:23 +02:00
Concedo	a17c79b1a9	Merge branch 'upstream' into concedo_experimental # Conflicts: # examples/eval-callback/eval-callback.cpp # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/gelu.cl # tests/test-backend-ops.cpp	2025-07-07 17:46:58 +08:00
Sigbjørn Skjæret	ddef99522d	server : fix assistant prefilling when content is an array (#14360)	2025-07-05 09:17:14 +02:00
Concedo	cdda9d16e0	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/tools.sh # build-xcframework.sh # ci/run.sh # examples/Miku.sh # examples/chat-13B.sh # examples/chat-persistent.sh # examples/chat-vicuna.sh # examples/chat.sh # examples/jeopardy/jeopardy.sh # examples/reason-act.sh # examples/server-llama2-13B.sh # examples/sycl/build.sh # examples/sycl/run-llama2.sh # examples/sycl/run-llama3.sh # examples/ts-type-to-grammar.sh # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/element_wise.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # scripts/apple/validate-apps.sh # scripts/apple/validate-ios.sh # scripts/apple/validate-macos.sh # scripts/apple/validate-tvos.sh # scripts/apple/validate-visionos.sh # scripts/check-requirements.sh # scripts/ci-run.sh # scripts/compare-commits.sh # scripts/debug-test.sh # scripts/gen-authors.sh # scripts/get-hellaswag.sh # scripts/get-pg.sh # scripts/get-wikitext-103.sh # scripts/get-wikitext-2.sh # scripts/get-winogrande.sh # scripts/hf.sh # scripts/qnt-all.sh # scripts/run-all-perf.sh # scripts/run-all-ppl.sh # scripts/sync-ggml-am.sh # scripts/sync-ggml.sh # scripts/tool_bench.sh # tests/test-backend-ops.cpp # tests/test-lora-conversion-inference.sh # tests/test-tokenizer-0.sh # tools/server/README.md	2025-06-30 20:38:44 +08:00
Vedran Miletić	e9b6350e61	scripts : make the shell scripts cross-platform (#14341)	2025-06-30 10:17:18 +02:00
Concedo	bc89b465a8	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/release.yml # .github/workflows/server.yml # README.md # docs/build.md # docs/install.md # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/vecdotq.hpp # tests/test-backend-ops.cpp # tests/test-chat.cpp	2025-06-05 11:03:34 +08:00
Olivier Chafik	c9bbc77931	`server`: update deepseek reasoning format (pass reasoning_content as diffs) (#13933 ) * server: update deepseek reasoning format (now in reasoning_content diffs), add legacy option for compat * update unit/test_tool_call.py::test_thoughts	2025-06-02 10:15:44 -07:00
Concedo	8c701d7ded	Merge commit '`72b090da2c`' into concedo_experimental # Conflicts: # docs/backend/CANN.md # docs/function-calling.md # examples/embedding/embedding.cpp # examples/retrieval/retrieval.cpp # ggml/src/ggml-cann/CMakeLists.txt # ggml/src/ggml-cann/Doxyfile # ggml/src/ggml-cann/acl_tensor.cpp # ggml/src/ggml-cann/acl_tensor.h # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-sycl/binbcast.cpp # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/concat.cpp # ggml/src/ggml-sycl/conv.cpp # ggml/src/ggml-sycl/cpy.cpp # ggml/src/ggml-sycl/dmmv.cpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/getrows.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/gla.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/norm.cpp # ggml/src/ggml-sycl/outprod.cpp # ggml/src/ggml-sycl/rope.cpp # ggml/src/ggml-sycl/softmax.cpp # ggml/src/ggml-sycl/tsembd.cpp # ggml/src/ggml-sycl/wkv.cpp # scripts/compare-commits.sh # tests/test-chat.cpp # tests/test-sampling.cpp	2025-05-28 00:28:41 +08:00
Concedo	868cb6aff7	Merge commit '`e121edc432`' into concedo_experimental # Conflicts: # .github/workflows/release.yml # common/CMakeLists.txt # docs/function-calling.md # ggml/src/ggml-sycl/binbcast.cpp # models/templates/README.md # scripts/tool_bench.py # src/llama-kv-cache.cpp # tests/CMakeLists.txt # tests/test-chat.cpp # tools/mtmd/clip.h # tools/rpc/rpc-server.cpp # tools/server/README.md	2025-05-28 00:20:45 +08:00
Olivier Chafik	d74e94c1b3	`server`: fix format of streamed tool call deltas (diff name, fix id location) (#13800 ) * fix deltas of tool_call.function.name * fix tool_call.id (was in tool_call.function.id!) + add function type * add tool_call.type * populate empty tool_call.function.arguments on first delta	2025-05-26 14:56:49 +01:00
Olivier Chafik	f13847cfb5	server: fix regression on streamed non-chat completion w/ stops (#13785 ) * more forgiving message diffs: partial stop words aren't erased, full stops are * Add (slow) server test for completion + stream + stop	2025-05-26 14:16:37 +01:00
Olivier Chafik	e121edc432	`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771 ) --------- Co-authored-by: ochafik <ochafik@google.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-05-26 00:30:51 +01:00
Olivier Chafik	d785f9c1fd	server: fix/test add_generation_prompt (#13770 ) Co-authored-by: ochafik <ochafik@google.com>	2025-05-25 10:45:49 +01:00
Olivier Chafik	f5cd27b71d	`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379 ) * add common_json w/ support for truncated json healing * add common_chat_msg_diff * partial common_chat_parse * refactor parser w/ optionals * server: wire chat diffs in stream mode * fix trigger of thinking models (must happen after thoughts are closed) * fix functionary v3.2 raw python! * rename: common_chat_syntax (now contains format) * rm common_regex.at_start * don't return empty <think></think> * accommodate yet another deepseek r1 distill fantasy syntax (`<｜tool▁calls｜>`) * fix QwQ 32B tool call parsing after thoughts (hermes2) * better logs for grammar triggers * consume spaces after parse_json_tool_calls * fix required tool calls w/ thinking models that have pre-opened thinking tags * fix thinking model's initial trigger + test qwq's template * run most test_tool_call tests in stream + non-stream modes * make functionary v3.2 parsing more strict (differentiate first match from others) * send final diff from server, to close off raw python arguments * support partial content streaming in Generic mode * tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5) * Update function-calling.md * Update tool_bench.py * chat-parser: remove input from exception (llm output may contain PII) --------- Co-authored-by: ochafik <ochafik@google.com> Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>	2025-05-25 01:48:08 +01:00
Concedo	55cc9acec5	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/release.yml # README.md # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/ggml-cann.cpp # tools/mtmd/CMakeLists.txt # tools/mtmd/clip.cpp # tools/mtmd/clip.h	2025-05-24 12:10:36 +08:00
Xuan-Son Nguyen	9ecf3e66a3	server : support audio input (#13714 ) * server : support audio input * add audio support on webui	2025-05-23 11:03:47 +02:00
Concedo	22ef97d7d3	Merge commit '`ab86335760`' into concedo_experimental # Conflicts: # .github/workflows/release.yml # examples/retrieval/retrieval.cpp # examples/simple-chat/simple-chat.cpp # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # requirements/requirements-convert_hf_to_gguf.txt # requirements/requirements-convert_hf_to_gguf_update.txt # requirements/requirements-convert_lora_to_gguf.txt # tools/run/run.cpp	2025-05-23 11:41:36 +08:00
Dorin-Andrei Geman	42158ae2e8	server : fix first message identification (#13634 ) * server : fix first message identification When using the OpenAI SDK (https://github.com/openai/openai-node/blob/master/src/lib/ChatCompletionStream.ts#L623-L626) we noticed that the expected assistant role is missing in the first streaming message. Fix this by correctly checking for the first message. Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com> Signed-off-by: Dorin Geman <dorin.geman@docker.com> * server : Fix checks for first role message for stream=True Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com> Signed-off-by: Dorin Geman <dorin.geman@docker.com> --------- Signed-off-by: Dorin Geman <dorin.geman@docker.com> Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-05-21 15:07:57 +02:00
Concedo	59300dbdf5	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/actions/windows-setup-curl/action.yml # .github/workflows/build-linux-cross.yml # README.md # common/CMakeLists.txt # examples/parallel/README.md # examples/parallel/parallel.cpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-vulkan/CMakeLists.txt # tools/server/README.md	2025-05-18 23:27:53 +08:00
Xuan-Son Nguyen	6aa892ec2a	server : do not return error out of context (with ctx shift disabled) (#13577)	2025-05-16 21:50:00 +02:00
Concedo	e5d26a2356	Merge branch 'upstream' into concedo_experimental # Conflicts: # common/CMakeLists.txt # docs/backend/SYCL.md # ggml/CMakeLists.txt # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/binbcast.cpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/dequantize.hpp # ggml/src/ggml-sycl/dmmv.cpp # ggml/src/ggml-sycl/gemm.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/vecdotq.hpp # ggml/src/ggml-vulkan/CMakeLists.txt # ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt # ggml/src/gguf.cpp # scripts/compare-llama-bench.py # tests/CMakeLists.txt # tests/test-chat.cpp # tools/llama-bench/llama-bench.cpp # tools/server/README.md	2025-05-16 15:30:31 +08:00
Olivier Chafik	aa48e373f2	`server`: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802 ) * Inject date_string in llama 3.x + fix for functionary v2 https://github.com/ggml-org/llama.cpp/issues/12729 * move/fix detection of functionary v3.1 before llama 3.x, fix & test their non-tool mode Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * generate more tokens in test_completion_with_required_tool_tiny_fast to avoid truncation --------- Co-authored-by: ochafik <ochafik@google.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-05-15 02:39:51 +01:00
Xuan-Son Nguyen	360a9c98e1	server : fix cache_tokens bug with no cache_prompt (#13533)	2025-05-14 13:35:07 +02:00
Concedo	21e31e255b	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/docker.yml # README.md # build-xcframework.sh # common/CMakeLists.txt # examples/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-metal/ggml-metal.m # ggml/src/ggml-metal/ggml-metal.metal # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/backend.hpp # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/vecdotq.hpp # scripts/compare-llama-bench.py # src/CMakeLists.txt # src/llama-model.cpp # src/llama.cpp # tests/test-backend-ops.cpp # tests/test-opt.cpp # tools/llama-bench/README.md # tools/llama-bench/llama-bench.cpp # tools/mtmd/CMakeLists.txt # tools/mtmd/README.md # tools/mtmd/clip.cpp # tools/rpc/rpc-server.cpp # tools/server/CMakeLists.txt # tools/server/README.md	2025-05-13 00:28:35 +08:00

53 commits