koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-10 17:14:36 +00:00

Author	SHA1	Message	Date
Concedo	103d60ed2c	Merge branch 'upstream' into concedo_experimental # Conflicts: # common/common.cpp # examples/batched-bench/batched-bench.cpp # examples/batched/batched.cpp # examples/export-lora/export-lora.cpp # examples/gritlm/gritlm.cpp # examples/parallel/parallel.cpp # examples/passkey/passkey.cpp # examples/speculative-simple/speculative-simple.cpp # examples/speculative/speculative.cpp # ggml/src/ggml-cann/CMakeLists.txt # ggml/src/ggml-cann/acl_tensor.cpp # ggml/src/ggml-cann/acl_tensor.h # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-vulkan/CMakeLists.txt # tests/test-arg-parser.cpp # tests/test-backend-ops.cpp	2025-04-03 18:57:49 +08:00
Xuan-Son Nguyen	42eb248f46	common : remove json.hpp from common.cpp (#12697 ) * common : remove json.hpp from common.cpp * fix comment	2025-04-02 09:58:34 +02:00
Xuan-Son Nguyen	267c1399f1	common : refactor downloading system, handle mmproj with -hf option (#12694 ) * (wip) refactor downloading system [no ci] * fix all examples * fix mmproj with -hf * gemma3: update readme * only handle mmproj in llava example * fix multi-shard download * windows: fix problem with std::min and std::max * fix 2	2025-04-01 23:44:05 +02:00
Concedo	396875e1c4	update api docs and lite	2025-03-29 15:39:25 +08:00
Benson Wong	5d01670266	server : include speculative decoding stats when timings_per_token is enabled (#12603 ) * Include speculative decoding stats when timings_per_token is true New fields added to the `timings` object: - draft_n : number of draft tokens generated - draft_accepted_n : number of draft tokens accepted - draft_accept_ratio: ratio of accepted/generated * Remove redundant draft_accept_ratio var * add draft acceptance rate to server console output	2025-03-28 10:05:44 +02:00
Piotr	2099a9d5db	server : Support listening on a unix socket (#12613 ) * server : Bump cpp-httplib to include AF_UNIX windows support Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com> * server : Allow running the server example on a unix socket Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com> --------- Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-03-27 23:41:04 +01:00
Concedo	ea358369cc	Merge branch 'upstream' into concedo_experimental # Conflicts: # ci/README.md # ci/run.sh # docs/backend/CUDA-FEDORA.md # docs/build.md # docs/install.md # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cuda/common.cuh # tests/test-backend-ops.cpp	2025-03-26 00:18:01 +08:00
Marius Gerdes	77f9c6bbe5	server : Add verbose output to OAI compatible chat endpoint. (#12246 ) Add verbose output to server_task_result_cmpl_final::to_json_oaicompat_chat_stream, making it conform with server_task_result_cmpl_final::to_json_oaicompat_chat, as well as the other to_json methods.	2025-03-23 19:30:26 +01:00
Concedo	7030ebf401	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/backend/SYCL.md # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp # ggml/src/ggml-sycl/CMakeLists.txt # tests/test-backend-ops.cpp	2025-03-22 00:32:42 +08:00
Woof Dog	e04643063b	webui : Prevent rerendering on textarea input (#12299 ) * webui: Make textarea uncontrolled to eliminate devastating lag * Update index.html.gz * use signal-style implementation * rm console log * no duplicated savedInitValue set --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-03-20 15:57:43 +01:00
Concedo	0c90d2ebcf	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # cmake/common.cmake # docs/backend/SYCL.md # examples/main/README.md # examples/speculative/speculative.cpp # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-musa/CMakeLists.txt # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt # tests/test-backend-ops.cpp	2025-03-19 19:27:11 +08:00
Georgi Gerganov	810e0af3f5	server : fix warmup draft cache type (#12446 ) ggml-ci	2025-03-18 12:05:42 +02:00
Concedo	67851e5415	Merge branch 'upstream' into concedo_experimental # Conflicts: # examples/run/run.cpp # ggml/src/ggml-cann/aclnn_ops.cpp	2025-03-15 19:54:19 +08:00
Victor	add2a3aa5a	server: fix "--grammar-file" parameter (#12285 )	2025-03-14 11:21:17 +01:00
Concedo	0db4ae6237	traded my ink for a pen	2025-03-14 11:58:15 +08:00
Georgi Gerganov	e0dbec0bc6	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 ) * llama : refactor llama_context, llama_kv_cache, llm_build_context ggml-ci * graph : don't mutate the KV cache during defrag ggml-ci * context : reduce virtuals + remove test function ggml-ci * context : move interface implementation to source file + factory ggml-ci * graph : move KV cache build functions to llama_context impl ggml-ci * graph : remove model reference from build_pooling ggml-ci * graph : remove llama_model reference ggml-ci * kv_cache : provide rope factors ggml-ci * graph : rework inputs to use only unique_ptr, remove attn input abstraction ggml-ci * context : remove llama_context_i abstraction ggml-ci * context : clean-up ggml-ci * graph : clean-up ggml-ci * llama : remove redundant keywords (struct, enum) ggml-ci * model : adapt gemma3 ggml-ci * graph : restore same attention ops as on master ggml-ci * llama : remove TODO + fix indent ggml-ci	2025-03-13 12:35:44 +02:00
Ishaan Gandhi	2048b5913d	server : fix crash when using verbose output with input tokens that are not in printable range (#12178 ) (#12338 ) * Fix DOS index bug * Remove new APIs * remove extra line * Remove from API * Add extra newline * Update examples/server/server.cpp --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-03-13 11:10:05 +01:00
Concedo	77debb1b1b	gemma3 vision works, but is using more tokens than expected - may need resizing	2025-03-13 00:31:16 +08:00
Olivier Chafik	be421fc429	`tool-call`: ensure there's always a non-empty tool call id (#12292 )	2025-03-10 09:45:29 +00:00
Olivier Chafik	2b3a25c212	`sampler`: fixes trigger tokens + lazy grammars (fix typo cast from token to string) (#12291 ) * Fix typo in lazy grammar handling (fixes trigger tokens) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-03-10 09:44:42 +00:00
Concedo	6b7c3ae1d3	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # AUTHORS # README.md # ci/run.sh # docs/build.md # ggml/src/CMakeLists.txt # ggml/src/ggml-metal/CMakeLists.txt # scripts/sync-ggml.last	2025-03-10 10:32:41 +08:00
Georgi Gerganov	7ab364390f	server : infill gen ends on new line (#12254 )	2025-03-07 20:54:30 +02:00
Sigbjørn Skjæret	8fad3c7a7c	server : Log original chat template parsing error (#12233 )	2025-03-07 11:15:33 +01:00
Concedo	ec43d2b147	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # README.md # common/common.cpp # examples/embedding/embedding.cpp # examples/json_schema_to_grammar.py # examples/llama.android/llama/src/main/cpp/llama-android.cpp # examples/llama.swiftui/README.md # examples/llama.swiftui/llama.swiftui.xcodeproj/project.pbxproj # examples/lookahead/lookahead.cpp # examples/parallel/parallel.cpp # examples/passkey/passkey.cpp # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # requirements.txt # requirements/requirements-all.txt # scripts/fetch_server_test_models.py # tests/test-chat.cpp # tests/test-json-schema-to-grammar.cpp	2025-03-06 18:54:58 +08:00
Olivier Chafik	669912d9a5	`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034 ) * sampler: turn lazy grammar trigger words to regexes * add scripts/tool_bench.sh & .py * constrain llama json output regardless of function name if matches at beginning * update relaxed newline space rule in grammar tests * support add_generation_prompt query parameter (useful for /apply_template) * Update src/llama-grammar.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-03-05 13:05:13 +00:00
Clauszy	06a92a193a	server : fix cache reuse logic (#12161 ) The first KV shift offsets the positions of all tokens after head_c. If llama_kv_cache_seq_rm is then called with head_c, it removes valid tokens, because their positions have already been offset.	2025-03-05 09:25:45 +02:00
Concedo	0cddbe1f0b	Merge branch 'upstream' into concedo_experimental	2025-03-05 00:22:06 +08:00
Concedo	6b7d2349a7	Rewrite history to fix bad vulkan shader commits without increasing repo size. Added dpe colab (+8 squashed commit) Squashed commit: [b8362da4] updated lite [ed6c037d] move nsigma into the regular sampler stack [ac5f61c6] relative filepath fixed [05fe96ab] export template [ed0a5a3e] nix_example.md: refactor (#1401) * nix_example.md: add override example * nix_example.md: drop graphics example, already basic nixos knowledge * nix_example.md: format * nix_example.md: Vulkan is disabled on macOS Disabled in: 1ccd253acc * nix_examples.md: nixpkgs.config.cuda{Arches -> Capabilities} Fixes: https://github.com/LostRuins/koboldcpp/issues/1367 [675c62f7] AutoGuess: Phi 4 (mini) (#1402) [4bf56982] phrasing [b8c0df04] Add Rep Pen to Top N Sigma sampler chain (#1397) - place after nsigma and before xtc (+3 squashed commit) Squashed commit: [87c52b97] disable VMM from HIP [ee8906f3] edit description [e85c0e69] Remove Unnecessary Rep Counting (#1394) * stop counting reps * fix range-based initializer * strike that - reverse it	2025-03-05 00:02:20 +08:00
Olivier Chafik	1a24c4621f	`server`: fix deadly typo in response_format.json_schema.schema handling (#12168 )	2025-03-04 08:24:07 +02:00
Xuan-Son Nguyen	7b69003af7	webui : add ?m=... and ?q=... params (#12148 ) * webui : add ?m=... and ?q=... params * also clear prefilledMessage variable * better approach * fix comment * test: bump timeout on GITHUB_ACTION	2025-03-03 11:42:45 +01:00
Vivian	2cc4a5e44a	webui : minor typo fixes (#12116 ) * fix typos and improve menu text clarity * rename variable trimedValue to trimmedValue * add updated index.html.gz * rebuild --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-03-01 11:15:09 +01:00
Olivier Chafik	d7cfe1ffe0	docs: add docs/function-calling.md to lighten server/README.md's plight (#12069 )	2025-02-25 18:52:56 +00:00
rhjdvsgsgks	401af80b54	server: handle echo=false on /v1/completions (#12060 )	2025-02-25 12:52:52 +01:00
Olivier Chafik	0b52745649	server: support add_generation_prompt query param (#12062 )	2025-02-25 10:40:22 +00:00
Concedo	159c47f0e6	Merge commit '335eb04a91' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CONTRIBUTING.md # Makefile # docs/build.md # examples/llama.swiftui/llama.swiftui/UI/ContentView.swift # examples/run/run.cpp # ggml/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-musa/CMakeLists.txt	2025-02-24 11:55:14 +08:00
Georgi Gerganov	cf756d6e0a	server : disable Nagle's algorithm (#12020 )	2025-02-22 11:46:31 +01:00
momonga	c392e5094d	server (webui): Fix Premature Submission During IME Conversion (#11971 ) * fix skip ime composing * fix npm rebuild * fix warn --------- Co-authored-by: momonga <115213907+mmnga@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-02-20 19:43:22 +01:00
Concedo	6d7ef10671	Merge branch 'upstream' into concedo_experimental Re-enable qwen2vl GPU for Vulkan https://github.com/ggml-org/llama.cpp/pull/11902 # Conflicts: # .github/workflows/build.yml # .github/workflows/docker.yml # .gitignore # CONTRIBUTING.md # Makefile # common/CMakeLists.txt # common/arg.cpp # common/common.cpp # examples/main/main.cpp # examples/run/run.cpp # examples/server/tests/README.md # ggml/src/ggml-cuda/mma.cuh # scripts/get_chat_template.py # tests/test-backend-ops.cpp # tests/test-chat-template.cpp # tests/test-chat.cpp	2025-02-20 23:17:20 +08:00
Georgi Gerganov	abd4d0bc4f	speculative : update default params (#11954 ) * speculative : update default params * speculative : do not discard the last drafted token	2025-02-19 13:29:42 +02:00
igardev	b58934c183	server : (webui) Enable communication with parent html (if webui is in iframe) (#11940 ) * Webui: Enable communication with parent html (if webui is in iframe): - Listens for "setText" command from parent with "text" and "context" fields. "text" is set in inputMsg, "context" is used as hidden context on the following requests to the llama.cpp server - On pressing the Escape button, sends command "escapePressed" to the parent Example handling from the parent html side: - Send command "setText" from parent html to webui in iframe: const iframe = document.getElementById('askAiIframe'); if (iframe) { iframe.contentWindow.postMessage({ command: 'setText', text: text, context: context }, '*'); } - Listen for Escape key from webui on parent html: // Listen for escape key event in the iframe window.addEventListener('keydown', (event) => { if (event.key === 'Escape') { // Process case when Escape is pressed inside webui } }); Move the extraContext from storage to app.context. * Fix formatting. * add Message.extra * format + build * MessageExtraContext * build * fix display * rm console.log --------- Co-authored-by: igardev <ivailo.gardev@akros.ch> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-02-18 23:01:44 +01:00
Olivier Chafik	63e489c025	tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900 ) * tool-call refactoring: moved common_chat_* to chat.h, common_chat_templates_init return a unique_ptr to opaque type * addressed clang-tidy lints in [test-]chat.* * rm minja deps from util & common & move it to common/minja/ * add name & tool_call_id to common_chat_msg * add common_chat_tool * added json <-> tools, msgs conversions to chat.h * fix double bos/eos jinja avoidance hack (was preventing inner bos/eos tokens) * fix deepseek r1 slow test (no longer <think> opening w/ new template) * allow empty tools w/ auto + grammar * fix & test server grammar & json_schema params w/ & w/o --jinja	2025-02-18 18:03:23 +00:00
Xuan-Son Nguyen	63ac128563	server : add TEI API format for /rerank endpoint (#11942 ) * server : add TEI API format for /rerank endpoint * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix * also gitignore examples/server/*.gz.hpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-18 14:21:41 +01:00
xiaobing318	09aaf4f1f5	docs : Fix duplicated file extension in test command (#11935 ) This commit fixes an issue in the llama.cpp project where the command for testing the llama-server object contained a duplicated file extension. The original command was: ./tests.sh unit/test_chat_completion.py.py -v -x It has been corrected to: ./tests.sh unit/test_chat_completion.py -v -x This change ensures that the test script correctly locates and executes the intended test file, preventing test failures due to an incorrect file name.	2025-02-18 10:12:49 +01:00
Antoine Viallon	c4d29baf32	server : fix divide-by-zero in metrics reporting (#11915 )	2025-02-17 11:25:12 +01:00
Xuan-Son Nguyen	0f2bbe6564	server : bump httplib to 0.19.0 (#11908 )	2025-02-16 17:11:22 +00:00
Concedo	f144b1f345	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/llama-cpp-cuda.srpm.spec # .devops/llama-cpp.srpm.spec # .devops/nix/package.nix # .devops/rocm.Dockerfile # .github/ISSUE_TEMPLATE/020-enhancement.yml # .github/ISSUE_TEMPLATE/030-research.yml # .github/ISSUE_TEMPLATE/040-refactor.yml # .github/ISSUE_TEMPLATE/config.yml # .github/pull_request_template.md # .github/workflows/bench.yml.disabled # .github/workflows/build.yml # .github/workflows/labeler.yml # CONTRIBUTING.md # Makefile # README.md # SECURITY.md # ci/README.md # common/CMakeLists.txt # docs/android.md # docs/backend/SYCL.md # docs/build.md # docs/cuda-fedora.md # docs/development/HOWTO-add-model.md # docs/docker.md # docs/install.md # docs/llguidance.md # examples/cvector-generator/README.md # examples/imatrix/README.md # examples/imatrix/imatrix.cpp # examples/llama.android/llama/src/main/cpp/CMakeLists.txt # examples/llama.swiftui/README.md # examples/llama.vim # examples/lookahead/README.md # examples/lookup/README.md # examples/main/README.md # examples/passkey/README.md # examples/pydantic_models_to_grammar_examples.py # examples/retrieval/README.md # examples/server/CMakeLists.txt # examples/server/README.md # examples/simple-cmake-pkg/README.md # examples/speculative/README.md # flake.nix # grammars/README.md # pyproject.toml # scripts/check-requirements.sh	2025-02-16 02:08:39 +08:00
Georgi Gerganov	68ff663a04	repo : update links to new url (#11886 ) * repo : update links to new url ggml-ci * cont : more urls ggml-ci	2025-02-15 16:40:57 +02:00
Concedo	754fef5204	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/cuda.Dockerfile # .devops/musa.Dockerfile # .github/workflows/build.yml # README.md # docs/docker.md # examples/imatrix/imatrix.cpp # examples/llama-bench/llama-bench.cpp # examples/main/README.md # examples/perplexity/perplexity.cpp # examples/server/README.md # ggml/src/ggml-cpu/ggml-cpu.c # ggml/src/ggml-cuda/CMakeLists.txt # models/templates/deepseek-ai-DeepSeek-R1-Distill-Llama-8B.jinja # models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja # scripts/get_chat_template.py # scripts/sync-ggml.last # tests/test-chat.cpp # tests/test-gguf.cpp # tests/test-sampling.cpp	2025-02-15 00:49:46 +08:00
Concedo	39fad991cc	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # examples/main/README.md # examples/run/run.cpp	2025-02-14 11:34:29 +08:00
Reza Rahemtola	c1f958c038	server : (docs) Update wrong tool calling example (#11809 ) Call updated to match the tool used in the output just below, following the example in https://github.com/ggerganov/llama.cpp/pull/9639	2025-02-13 17:22:44 +01:00

744 commits