koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-11 09:34:37 +00:00

Author	SHA1	Message	Date
Concedo	5329df2bdf	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/server.yml # CMakeLists.txt # cmake/build-info.cmake # examples/run/CMakeLists.txt # examples/run/run.cpp # examples/simple-chat/simple-chat.cpp # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-sampling.cpp	2025-01-21 00:25:07 +08:00
Georgi Gerganov	92bc493917	tests : increase timeout when sanitizers are enabled (#11300 ) * tests : increase timeout when sanitizers are enabled * tests : add DEFAULT_HTTP_TIMEOUT	2025-01-19 20:22:30 +02:00
Xuan Son Nguyen	f30f099228	server : implement cancellable request (#11285 ) * server : implement cancellable request * fix typo * httplib 0.18.5 * fix i underflow	2025-01-18 14:12:05 +01:00
Concedo	11cd7c7bb0	survived the storm, again	2025-01-16 22:25:18 +08:00
Concedo	2a00ee8fa8	broken commit	2025-01-16 21:41:18 +08:00
ebraminio	c5bf0d1bd7	server : Improve code snippets direction between RTL text (#11221 )	2025-01-14 11:39:33 +01:00
ebraminio	504af20ee4	server : (UI) Improve messages bubble shape in RTL (#11220 ) I simply have overlooked message bubble's tail placement for RTL text as I use the dark mode and that isn't visible there and this fixes it.	2025-01-13 20:23:31 +01:00
ebraminio	437e05f714	server : (UI) Support for RTL text as models input or output (#11208 )	2025-01-13 14:46:39 +01:00
Georgi Gerganov	afa8a9ec9b	llama : add `llama_vocab`, functions -> methods, naming (#11110 ) * llama : functions -> methods (#11110) * llama : add struct llama_vocab to the API (#11156) ggml-ci * hparams : move vocab params to llama_vocab (#11159) ggml-ci * vocab : more pimpl (#11165) ggml-ci * vocab : minor tokenization optimizations (#11160) ggml-ci Co-authored-by: Diego Devesa <slarengh@gmail.com> * lora : update API names (#11167) ggml-ci * llama : update API names to use correct prefix (#11174) * llama : update API names to use correct prefix ggml-ci * cont ggml-ci * cont ggml-ci * minor [no ci] * vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174) ggml-ci * vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174) ggml-ci --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-01-12 11:32:42 +02:00
Concedo	b154bd3671	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # docs/build.md # docs/development/HOWTO-add-model.md # tests/test-backend-ops.cpp # tests/test-chat-template.cpp	2025-01-10 17:57:38 +08:00
Daniel Bevenius	8eceb888d7	server : add tooltips to settings and themes btn (#11154 ) * server : add tooltips to settings and themes btn This commit adds tooltips to the settings and themes buttons in the webui. The tooltip will be displayed below the actual buttons when hovered over. The motivation for this change is to clarify the purpose of the themes button. * squash! server : add tooltips to settings and themes btn This commit adds a tooltip to the '...' button when a chat has been started. The tooltip is "Chat options" which think could be a good description as the dropdown contains options to delete or download the current chat. * rm tooltip for 3 dots button --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-01-09 11:28:29 +01:00
Concedo	dcfa1eca4e	Merge commit '`017cc5f446`' into concedo_experimental # Conflicts: # .github/ISSUE_TEMPLATE/010-bug-compilation.yml # .github/ISSUE_TEMPLATE/019-bug-misc.yml # CODEOWNERS # examples/batched-bench/batched-bench.cpp # examples/batched/batched.cpp # examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp # examples/gritlm/gritlm.cpp # examples/llama-bench/llama-bench.cpp # examples/passkey/passkey.cpp # examples/quantize-stats/quantize-stats.cpp # examples/run/run.cpp # examples/simple-chat/simple-chat.cpp # examples/simple/simple.cpp # examples/tokenize/tokenize.cpp # ggml/CMakeLists.txt # ggml/src/ggml-metal/CMakeLists.txt # ggml/src/ggml-vulkan/CMakeLists.txt # scripts/sync-ggml.last # src/llama.cpp # tests/test-autorelease.cpp # tests/test-model-load-cancel.cpp # tests/test-tokenizer-0.cpp # tests/test-tokenizer-1-bpe.cpp # tests/test-tokenizer-1-spm.cpp	2025-01-08 23:15:21 +08:00
Georgi Gerganov	a3c1232c3f	arg : option to exclude arguments from specific examples (#11136 ) * arg : option to exclude arguments from specific examples ggml-ci * readme : remove old args [no ci]	2025-01-08 12:55:36 +02:00
Georgi Gerganov	e6e7c75d94	server : fix extra BOS in infill endpoint (#11106 ) * server : fix extra BOS in infill endpoing ggml-ci * server : update infill tests	2025-01-06 15:36:08 +02:00
Georgi Gerganov	727368c60f	llama : use LLAMA_TOKEN_NULL (#11062 ) ggml-ci	2025-01-06 10:52:15 +02:00
Concedo	f9f1585a7f	broken merge - kcpp changes will be applied above this commit for better tracking.	2025-01-03 23:49:17 +08:00
Georgi Gerganov	f66f582927	llama : refactor `src/llama.cpp` (#10902 ) * llama : scatter llama.cpp into multiple modules (wip) * llama : control-vector -> adapter * llama : arch * llama : mmap ggml-ci * ci : remove BUILD_SHARED_LIBS=OFF ggml-ci * llama : arch (cont) ggml-ci * llama : chat ggml-ci * llama : model ggml-ci * llama : hparams ggml-ci * llama : adapter ggml-ci * examples : fix ggml-ci * rebase ggml-ci * minor * llama : kv cache ggml-ci * llama : impl ggml-ci * llama : batch ggml-ci * cont ggml-ci * llama : context ggml-ci * minor * llama : context (cont) ggml-ci * llama : model loader ggml-ci * common : update lora ggml-ci * llama : quant ggml-ci * llama : quant (cont) ggml-ci * minor [no ci]	2025-01-03 10:18:53 +02:00
Concedo	911da8765f	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # examples/llama.android/llama/src/main/cpp/llama-android.cpp # examples/run/run.cpp # examples/server/README.md # examples/server/bench/README.md # examples/server/tests/README.md # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # tests/test-backend-ops.cpp	2025-01-03 11:56:20 +08:00
Pierrick Hymbert	2f0ee84b9b	server: bench: minor fixes (#10765 ) * server/bench: - support openAI streaming standard output with [DONE]\n\n - export k6 raw results in csv - fix too many tcp idle connection in tcp_wait - add metric time to emit first token * server/bench: - fix when prometheus not started - wait for server to be ready before starting bench	2025-01-02 18:06:12 +01:00
Xuan Son Nguyen	0da5d86026	server : allow using LoRA adapters per-request (#10994 ) * slot.can_batch_with * lora per request * test: force disable cache prompt * move can_batch_with check * fix condition * add slow test with llama 8b * update docs * move lora change task to queue * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * lora_base * remove redundant check --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-02 15:05:18 +01:00
Xuan Son Nguyen	45095a61bf	server : clean up built-in template detection (#11026 ) * server : clean up built-in template detection * fix compilation * add chat template test * fix condition	2024-12-31 15:22:01 +01:00
Xuan Son Nguyen	5896c65232	server : add OAI compat for /v1/completions (#10974 ) * server : add OAI compat for /v1/completions * add test * add docs * better docs	2024-12-31 12:34:13 +01:00
Isaac McFadyen	f865ea149d	server: added more docs for response_fields field (#10995 )	2024-12-28 16:09:19 +01:00
Alexey Parfenov	16cdce7b68	server : fix token duplication when streaming with stop strings (#10997 )	2024-12-28 16:08:54 +01:00
Concedo	7c671f289e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/docker.yml # examples/cvector-generator/mean.hpp # examples/cvector-generator/pca.hpp # examples/export-lora/export-lora.cpp # examples/rpc/rpc-server.cpp # examples/run/README.md # examples/run/run.cpp # examples/server/CMakeLists.txt # examples/server/README.md # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-vulkan/ggml-vulkan.cpp # scripts/compare-llama-bench.py # scripts/hf.sh # tests/test-chat-template.cpp	2024-12-28 12:48:34 +08:00
Reza Kakhki	9ba399dfa7	server : add support for "encoding_format": "base64" to the /embeddings endpoints (#10967 ) add support for base64 * fix base64 test * improve test --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-12-24 21:33:04 +01:00
Djip007	2cd43f4900	ggml : more perfo with llamafile tinyblas on x86_64 (#10714 ) * more perfo with llamafile tinyblas on x86_64. - add bf16 suport - change dispache strategie (thanks: https://github.com/ikawrakow/ik_llama.cpp/pull/71 ) - reduce memory bandwidth simple tinyblas dispache and more cache freindly * tinyblas dynamic dispaching * sgemm: add M blocs. * - git 2.47 use short id of len 9. - show-progress is not part of GNU Wget2 * remove not stable test	2024-12-24 18:54:49 +01:00
NeverLucky	09fe2e7613	server: allow filtering llama server response fields (#10940 ) * llama_server_response_fields * llama_server_response_fields_fix_issues * params fixes * fix * clarify docs * change to "response_fields" --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-12-24 17:39:49 +01:00
Xuan Son Nguyen	14b699ecde	server : fix missing model id in /model endpoint (#10957 ) * server : fix missing model id in /model endpoint * fix ci	2024-12-23 12:52:25 +01:00
Xuan Son Nguyen	485dc01214	server : add system_fingerprint to chat/completion (#10917 ) * server : add system_fingerprint to chat/completion * update README	2024-12-23 12:02:44 +01:00
Concedo	4c56b7cada	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # examples/gbnf-validator/gbnf-validator.cpp # examples/llava/clip.cpp # examples/run/README.md # examples/run/run.cpp # examples/server/README.md # ggml/src/ggml-cpu/CMakeLists.txt # src/llama.cpp # tests/test-grammar-integration.cpp # tests/test-llama-grammar.cpp	2024-12-21 09:41:49 +08:00
Xuan Son Nguyen	0ca416c91a	server : (UI) fix copy to clipboard function (#10916 )	2024-12-20 14:12:06 +01:00
Xuan Son Nguyen	57bb2c40cd	server : fix logprobs, make it OAI-compatible (#10783 ) * server : fix logprobs, make it openai-compatible * update docs * add std::log * return pre-sampling p * sort before apply softmax * add comment * fix test * set p for sampled token * update docs * add --multi-token-probs * update docs * add `post_sampling_probs` option * update docs [no ci] * remove --multi-token-probs * "top_probs" with "post_sampling_probs" * resolve review comments * rename struct token_prob to prob_info * correct comment placement * fix setting prob for sampled token	2024-12-19 15:40:08 +01:00
Concedo	ee486bad3e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # README.md # examples/CMakeLists.txt # examples/batched/batched.cpp # examples/gritlm/gritlm.cpp # examples/llama.android/llama/build.gradle.kts # examples/main/README.md # examples/retrieval/retrieval.cpp # examples/server/CMakeLists.txt # examples/server/README.md # ggml/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml.c # scripts/compare-commits.sh # scripts/sync-ggml.last # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat-template.cpp # tests/test-sampling.cpp	2024-12-19 11:57:43 +08:00
Gaetan Bisson	7bbb5acf12	server: avoid overwriting Authorization header (#10878 ) * server: avoid overwriting Authorization header If no API key is set, leave the Authorization header as is. It may be used by another part of the Web stack, such as an authenticating proxy. Fixes https://github.com/ggerganov/llama.cpp/issues/10854 * rebuild --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-12-18 15:00:07 +01:00
Georgi Gerganov	152610eda9	server : output embeddings for all tokens when pooling = none (#10861 ) * server : add "tokens" output ggml-ci * server : output embeddings for all tokens when pooling = none ggml-ci * server : update readme [no ci] * server : fix spacing [no ci] Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> * server : be explicit about the pooling type in the tests ggml-ci * server : update /embeddings and /v1/embeddings endpoints ggml-ci * server : do not normalize embeddings when there is no pooling ggml-ci * server : update readme ggml-ci * server : fixes * tests : update server tests ggml-ci * server : update readme [no ci] * server : remove rebase artifact --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-12-18 13:01:41 +02:00
Georgi Gerganov	0e70ba686e	server : add "tokens" output (#10853 ) * server : add "tokens" output ggml-ci * server : update readme ggml-ci * server : return tokens ids only if requested ggml-ci * tests : improve "tokens" type check Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> * server : remove "tokens" from the OAI endpoint ggml-ci --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-12-18 11:05:29 +02:00
Xuan Son Nguyen	46828872c3	server : (embeddings) using same format for "input" and "content" (#10872 ) * server : (embeddings) using same format for "input" and "content" * fix test case * handle empty input case * fix test	2024-12-18 10:55:09 +02:00
krystiancha	05c3a444b8	server : fill usage info in embeddings and rerank responses (#10852 ) * server : fill usage info in embeddings response * server : fill usage info in reranking response	2024-12-17 18:00:24 +02:00
Xuan Son Nguyen	227d7c5a7f	server : (UI) fix missing async generator on safari (#10857 ) * server : (UI) fix missing async generator on safari * fix	2024-12-17 09:52:09 +01:00
Georgi Gerganov	644fd71b44	sampling : refactor + optimize penalties sampler (#10803 ) * sampling : refactor + optimize penalties sampler ggml-ci * common : apply ignore_eos as logit bias ggml-ci * batched : remove penalties sampler * params : allow penalty_last_n == -1 to be equal to context size ggml-ci * common : by default, move the penalties at the end of the sampling chain ggml-ci * common : ignore all EOG tokens Co-authored-by: Diego Devesa <slarengh@gmail.com> * common : move back the penalties at the front of the sampling chain ggml-ci * readme : restore hint about --ignore-eos flag [no ci] * llama : minor ggml-ci * webui : update --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2024-12-16 12:31:14 +02:00
Vinesh Janarthanan	5478bbcd17	server: (UI) add syntax highlighting and latex math rendering (#10808 ) * add code highlighting and math formatting * code cleanup * build public/index.html * rebuild public/index.html * fixed coding style * fixed coding style * style fixes * highlight: smaller bundle size, fix light & dark theme * remove katex * add bundle size check * add more languages * add php * reuse some langs * use gzip * Revert "remove katex" This reverts commit c0e5046accd10be3f83018cffdc29a652849fc61. * use better maintained @vscode/markdown-it-katex * fix gzip non deterministic * ability to add a demo conversation for dev * fix latex rendering * add comment * latex codeblock as code --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-12-15 12:55:54 +01:00
Concedo	f456ed7237	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/nix/package.nix # .devops/tools.sh # .github/workflows/build.yml # Makefile # README.md # common/CMakeLists.txt # common/common.h # examples/llava/CMakeLists.txt # examples/run/CMakeLists.txt # examples/run/README.md # examples/run/run.cpp # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-kompute/ggml-kompute.cpp # tests/test-backend-ops.cpp # tests/test-rope.cpp	2024-12-15 15:30:10 +08:00
Michelle Tan	89d604f2c8	server: Fix `has_next_line` in JSON response (#10818 ) * Update server JSON response. * Add unit test to check `has_new_line` JSON response * Remove `has_new_line` unit test changes. * Address code review comment: type check for `has_new_line` in unit test	2024-12-14 23:29:45 +01:00
cduk	56eea0781c	Removes spurious \r in output that causes logging in journalctl to treat lines as binary and therefore hidden by default (#10771 ) Signed-off-by: Charles Darke <s.cduk@toodevious.com> Co-authored-by: Charles Darke <s.cduk@toodevious.com>	2024-12-13 23:21:49 +01:00
Concedo	ed75f8a741	up to date merge, without vulkan-gen-shaders. They will be built before each release from now on, as they are very large	2024-12-13 17:18:01 +08:00
Concedo	de64b9198c	merge checkpoint 2 - functional merge without q4_0_4_4 (need regen shaders)	2024-12-13 17:04:19 +08:00
Concedo	4c4ce5e808	rewritten checkpoint 1 - before coopmat	2024-12-13 16:55:23 +08:00
Xuan Son Nguyen	adffa6ffd5	common : improve -ctv -ctk CLI arguments (#10806 ) * common : improve ctv ctk cli argument * regenerate docs * even better approach * use std::vector	2024-12-12 22:53:05 +01:00
CentricStorm	5555c0c1f6	docs: update server streaming mode documentation (#9519 ) Provide more documentation for streaming mode.	2024-12-11 23:40:40 +01:00

1 2 3 4 5 ...

649 commits