koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-12 18:09:42 +00:00

Author	SHA1	Message	Date
Concedo	bdfe8526b8	Merge branch 'upstream' into concedo_experimental # Conflicts: # .gitignore # CONTRIBUTING.md # Makefile # examples/llava/CMakeLists.txt # scripts/sync-ggml-am.sh # scripts/sync-ggml.last # scripts/sync-ggml.sh # src/llama-vocab.cpp	2024-08-10 11:42:32 +08:00
Mathieu Geli	daef3ab233	server : add one level list nesting for embeddings (#8936)	2024-08-09 09:32:02 +03:00
Xuan Son Nguyen	1e6f6554aa	server : add lora hotswap endpoint (WIP) (#8857) * server : add lora hotswap endpoint * handle lora_no_apply * fix build * updae docs * clean up struct def * fix build * add LoRA test * fix style	2024-08-06 17:33:39 +02:00
Concedo	e1f97f7fb5	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/llama-server.Dockerfile # README.md # flake.lock # ggml/src/ggml-vulkan.cpp # ggml/src/vulkan-shaders/concat.comp # ggml/src/vulkan-shaders/pad.comp # ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp # scripts/sync-ggml-am.sh # scripts/sync-ggml.last # src/llama.cpp # tests/test-backend-ops.cpp	2024-08-06 16:33:26 +08:00
Liu Jia	0a4ce78681	common : Changed tuple to struct (TODO fix) (#8823) * common : Changed tuple to struct (TODO fix) Use struct `llama_init_result` to replace the previous std::tuple<struct llama_model , struct llama_context > * delete llama_init_default_params() * delete the extra whitespace	2024-08-05 18:14:10 +02:00
ardfork	978ba3d83d	Server: Don't ignore llama.cpp params (#8754) * Don't ignore llama.cpp params * Add fallback for max_tokens	2024-08-04 20:16:23 +02:00
Concedo	101efb66af	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/nix/package.nix # CMakeLists.txt # Makefile	2024-08-01 10:54:28 +08:00
Igor Okulist	afbbcf3c04	server : update llama-server embedding flag documentation (#8779) Fixes #8763	2024-07-31 19:59:09 -04:00
Concedo	ba5babb876	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/nix/apps.nix # .devops/tools.sh # Makefile # README.md # docs/backend/SYCL.md # docs/build.md # examples/CMakeLists.txt # ggml/include/ggml.h # src/llama-vocab.cpp # tests/test-backend-ops.cpp # tests/test-chat-template.cpp # tests/test-sampling.cpp	2024-07-27 23:15:54 +08:00
Yaiko	01aec4a631	server : add Speech Recognition & Synthesis to UI (#8679) * server : add Speech Recognition & Synthesis to UI * server : add Speech Recognition & Synthesis to UI (fixes)	2024-07-26 00:10:16 +02:00
Ujjawal Panchal	4b0eff3df5	docs : Quantum -> Quantized (#8666) * docfix: imatrix readme, quantum models -> quantized models. * docfix: server readme: quantum models -> quantized models.	2024-07-25 11:13:27 +03:00
Concedo	01d5175654	Merge branch 'upstream' into concedo_experimental # Conflicts: # Makefile # ggml/src/CMakeLists.txt	2024-07-24 16:41:33 +08:00
Vali Malinoiu	b841d07408	server : fix URL.parse in the UI (#8646)	2024-07-23 17:37:42 +03:00
Concedo	c81d1623b4	Merge commit '751fcfc6c3' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CONTRIBUTING.md # README.md # flake.lock # tests/CMakeLists.txt # tests/test-backend-ops.cpp	2024-07-23 19:18:05 +08:00
Jan Boon	628154492a	server : update doc to clarify n_keep when there is bos token (#8619)	2024-07-22 11:02:09 +03:00
Concedo	24b9616344	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/full-cuda.Dockerfile # .devops/full-rocm.Dockerfile # .devops/full.Dockerfile # .devops/llama-cli-cuda.Dockerfile # .devops/llama-cli-intel.Dockerfile # .devops/llama-cli-rocm.Dockerfile # .devops/llama-cli-vulkan.Dockerfile # .devops/llama-cli.Dockerfile # .devops/llama-server-cuda.Dockerfile # .devops/llama-server-intel.Dockerfile # .devops/llama-server-rocm.Dockerfile # .devops/llama-server-vulkan.Dockerfile # .devops/llama-server.Dockerfile # CMakeLists.txt # CONTRIBUTING.md # Makefile # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # requirements.txt # src/llama.cpp # tests/test-backend-ops.cpp	2024-07-19 14:23:33 +08:00
Eric Zhang	0d2c7321e9	server: use relative routes for static files in new UI (#8552) * server: public: fix api_url on non-index pages * server: public: use relative routes for static files in new UI	2024-07-18 12:43:49 +02:00
RunningLeon	3807c3de04	server : respect `--special` cli arg (#8553)	2024-07-18 11:06:22 +03:00
Xuan Son Nguyen	4db8f60fe7	fix ci (#8494)	2024-07-15 19:23:10 +02:00
Concedo	e707ab9025	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/development/HOWTO-add-model.md # docs/development/token_generation_performance_tips.md # flake.lock	2024-07-16 00:49:34 +08:00
M-A	f17f39ff9c	server: update README.md with llama-server --help output [no ci] (#8472) The README.md had a stale information. In particular, the --ctx-size "defaults to 512" confused me and I had to check the code to confirm this was false. This the server is evolving rapidly, it's probably better to keep the source of truth at a single place (in the source) and generate the README.md based on that. Did: make llama-server ./llama-server --help > t.txt vimdiff t.txt examples/server/README.md I copied the content inside a backquote block. I would have preferred proper text but it would require a fair amount of surgery to make the current output compatible with markdown. A follow up could be to automate this process with a script. No functional change.	2024-07-15 15:04:56 +03:00
Concedo	602661ba49	Merge commit 'c917b67f06' into concedo_experimental # Conflicts: # .devops/tools.sh # Makefile # ggml/src/ggml-cuda/mmq.cuh # tests/test-double-float.cpp # tests/test-quantize-fns.cpp # tests/test-quantize-perf.cpp	2024-07-14 11:38:20 +08:00
Georgi Gerganov	4e24cffd8c	server : handle content array in chat API (#8449) * server : handle content array in chat API * Update examples/server/utils.hpp Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-07-12 14:48:15 +03:00
Douglas Hanley	c3ebcfa148	server : ensure batches are either all embed or all completion (#8420) * make sure batches are all embed or all non-embed * non-embedding batch for sampled tokens; fix unused params warning	2024-07-12 11:14:12 +03:00
Concedo	2cad736260	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/nix/package.nix # .github/labeler.yml # .gitignore # CMakeLists.txt # Makefile # Package.swift # README.md # ci/run.sh # docs/build.md # examples/CMakeLists.txt # flake.lock # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # grammars/README.md # requirements/requirements-convert_hf_to_gguf.txt # requirements/requirements-convert_hf_to_gguf_update.txt # scripts/check-requirements.sh # scripts/compare-llama-bench.py # scripts/gen-unicode-data.py # scripts/sync-ggml-am.sh # scripts/sync-ggml.last # scripts/sync-ggml.sh # tests/test-backend-ops.cpp # tests/test-chat-template.cpp # tests/test-tokenizer-random.py	2024-07-11 16:36:16 +08:00
Clint Herron	278d0e1846	Initialize default slot sampling parameters from the global context. (#8418)	2024-07-10 20:08:17 -04:00
Clint Herron	a59f8fdc85	Server: Enable setting default sampling parameters via command-line (#8402) * Load server sampling parameters from the server context by default. * Wordsmithing comment	2024-07-09 18:26:40 -04:00
compilade	3fd62a6b1c	py : type-check all Python scripts with Pyright (#8341 ) * py : type-check all Python scripts with Pyright * server-tests : use trailing slash in openai base_url * server-tests : add more type annotations * server-tests : strip "chat" from base_url in oai_chat_completions * server-tests : model metadata is a dict * ci : disable pip cache in type-check workflow The cache is not shared between branches, and it's 250MB in size, so it would become quite a big part of the 10GB cache limit of the repo. * py : fix new type errors from master branch * tests : fix test-tokenizer-random.py Apparently, gcc applies optimisations even when pre-processing, which confuses pycparser. * ci : only show warnings and errors in python type-check The "information" level otherwise has entries from 'examples/pydantic_models_to_grammar.py', which could be confusing for someone trying to figure out what failed, considering that these messages can safely be ignored even though they look like errors.	2024-07-07 15:04:39 -04:00
Bjarke Viksøe	cb4d86c4d7	server: Retrieve prompt template in /props (#8337 ) * server: Retrieve prompt template in /props This PR adds the following: - Expose the model's Jinja2 prompt template from the model in the /props endpoint. - Change log-level from Error to Warning for warning about template mismatch. The front-end stands a better chance of actually executing the Jinja template format correctly. Server is currently just guessing it. Ideally this should have been inside a JSON block that expose the same key/value pairs as listed during startup in "llm_load_print_meta" function. * Make string buffer dynamic * Add doc and better string handling * Using chat_template naming convention * Use intermediate vector for string assignment	2024-07-07 11:10:38 +02:00
Concedo	5b605d03ea	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/ISSUE_TEMPLATE/config.yml # .gitignore # CMakeLists.txt # CONTRIBUTING.md # Makefile # README.md # ci/run.sh # common/common.h # examples/main-cmake-pkg/CMakeLists.txt # ggml/src/CMakeLists.txt # models/ggml-vocab-bert-bge.gguf.inp # models/ggml-vocab-bert-bge.gguf.out # models/ggml-vocab-deepseek-coder.gguf.inp # models/ggml-vocab-deepseek-coder.gguf.out # models/ggml-vocab-deepseek-llm.gguf.inp # models/ggml-vocab-deepseek-llm.gguf.out # models/ggml-vocab-falcon.gguf.inp # models/ggml-vocab-falcon.gguf.out # models/ggml-vocab-gpt-2.gguf.inp # models/ggml-vocab-gpt-2.gguf.out # models/ggml-vocab-llama-bpe.gguf.inp # models/ggml-vocab-llama-bpe.gguf.out # models/ggml-vocab-llama-spm.gguf.inp # models/ggml-vocab-llama-spm.gguf.out # models/ggml-vocab-mpt.gguf.inp # models/ggml-vocab-mpt.gguf.out # models/ggml-vocab-phi-3.gguf.inp # models/ggml-vocab-phi-3.gguf.out # models/ggml-vocab-starcoder.gguf.inp # models/ggml-vocab-starcoder.gguf.out # requirements.txt # requirements/requirements-convert_legacy_llama.txt # scripts/check-requirements.sh # scripts/pod-llama.sh # src/CMakeLists.txt # src/llama.cpp # tests/test-rope.cpp	2024-07-06 00:25:10 +08:00
Pieter Ouwerkerk	5a7447c569	readme : fix minor typos [no ci] (#8314)	2024-07-05 09:58:41 +03:00
Clint Herron	07a3fc0608	Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258)	2024-07-02 12:18:10 -04:00
Concedo	02f92f6ecc	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/full-cuda.Dockerfile # .devops/full-rocm.Dockerfile # .devops/llama-cli-cuda.Dockerfile # .devops/llama-cli-rocm.Dockerfile # .devops/llama-cli-vulkan.Dockerfile # .devops/llama-cpp-cuda.srpm.spec # .devops/llama-server-cuda.Dockerfile # .devops/llama-server-rocm.Dockerfile # .devops/llama-server-vulkan.Dockerfile # .github/workflows/build.yml # .github/workflows/docker.yml # CMakeLists.txt # Makefile # README.md # examples/llama.android/llama/src/main/cpp/CMakeLists.txt # flake.lock # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # grammars/README.md # scripts/sync-ggml-am.sh # scripts/sync-ggml.last # tests/test-chat-template.cpp # tests/test-grammar-integration.cpp # tests/test-json-schema-to-grammar.cpp	2024-06-30 10:59:42 +08:00
Concedo	9c10486204	merge the file structure refactor, testing	2024-06-29 12:14:38 +08:00
Sigbjørn Skjæret	38373cfbab	Add SPM infill support (#8016) * add --spm-infill option * support --spm-infill * support --spm-infill	2024-06-28 12:53:43 +02:00
Olivier Chafik	139cc621e9	`json`: restore default additionalProperties to false, fix some pattern escapes (#8180) * json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset * json: revert default of additionalProperties to false * Update README.md	2024-06-28 09:26:45 +01:00
Georgi Gerganov	f3f65429c4	llama : reorganize source code + improve CMake (#8006 ) * scripts : update sync [no ci] * files : relocate [no ci] * ci : disable kompute build [no ci] * cmake : fixes [no ci] * server : fix mingw build ggml-ci * cmake : minor [no ci] * cmake : link math library [no ci] * cmake : build normal ggml library (not object library) [no ci] * cmake : fix kompute build ggml-ci * make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci * move public backend headers to the public include directory (#8122) * move public backend headers to the public include directory * nix test * spm : fix metal header --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * scripts : fix sync paths [no ci] * scripts : sync ggml-blas.h [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-26 18:33:02 +03:00
Concedo	f3dfa96dbc	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/llama-server-cuda.Dockerfile # .devops/llama-server-rocm.Dockerfile # .devops/llama-server-vulkan.Dockerfile # .devops/llama-server.Dockerfile # .github/workflows/docker.yml # README.md # llama.cpp # tests/test-chat-template.cpp # tests/test-grammar-integration.cpp # tests/test-json-schema-to-grammar.cpp # tests/test-llama-grammar.cpp	2024-06-26 18:59:10 +08:00
Olivier Chafik	9b2f16f805	`json`: better support for "type" unions (e.g. nullable arrays w/ typed items) (#7863) * json: better suport for "type" arrays (e.g. `{"type": ["array", "null"], "items": {"type": "string"}}`) * json: add test for type: [array, null] fix * update tests	2024-06-26 01:46:35 +01:00
Olivier Chafik	6777c544bd	`json`: fix additionalProperties, allow space after enum/const (#7840 ) * json: default additionalProperty to true * json: don't force additional props after normal properties! * json: allow space after enum/const * json: update pydantic example to set additionalProperties: false * json: prevent additional props to redefine a typed prop * port not_strings to python, add trailing space * fix not_strings & port to js+py * Update json-schema-to-grammar.cpp * fix _not_strings for substring overlaps * json: fix additionalProperties default, uncomment tests * json: add integ. test case for additionalProperties * json: nit: simplify condition * reformat grammar integ tests w/ R"""()""" strings where there's escapes * update # tokens in server test: consts can now have trailing space	2024-06-26 01:45:58 +01:00
Olivier Chafik	84631fe150	`json`: support integer minimum, maximum, exclusiveMinimum, exclusiveMaximum (#7797 ) * json: support minimum for positive integer values * json: fix min 0 * json: min + max integer constraints * json: handle negative min / max integer bounds * json: fix missing paren min/max bug * json: proper paren fix * json: integration test for schemas * json: fix bounds tests * Update json-schema-to-grammar.cpp * json: fix negative max * json: fix negative min (w/ more than 1 digit) * Update test-grammar-integration.cpp * json: nit: move string rules together * json: port min/max integer support to Python & JS * nit: move + rename _build_min_max_int * fix min in [1, 9] * Update test-grammar-integration.cpp * add C++11-compatible replacement for std::string_view * add min/max constrained int field to pydantic json schema example * fix merge * json: add integration tests for min/max bounds * reshuffle/merge min/max integ test cases * nits / cleanups * defensive code against string out of bounds (apparently different behaviour of libstdc++ vs. clang's libc++, can't read final NULL char w/ former)	2024-06-25 20:06:20 +01:00
Xuan Son Nguyen	48e6b92cc3	Add chat template support for llama-cli (#8068) * add chat template support for llama-cli * add help message * server: simplify format_chat * more consistent naming * improve * add llama_chat_format_example * fix server * code style * code style * Update examples/main/main.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-06-25 21:56:49 +10:00
HanishKVC	3791ad2193	SimpleChat v3.1: Boolean chat request options in Settings UI, cache_prompt (#7950 ) * SimpleChat: Allow for chat req bool options to be user controlled * SimpleChat: Allow user to control cache_prompt flag in request * SimpleChat: Add sample GUI images to readme file Show the chat screen and the settings screen * SimpleChat:Readme: Add quickstart block, title to image, cleanup * SimpleChat: RePosition contents of the Info and Settings UI Make it more logically structured and flow through. * SimpleChat: Rename to apiRequestOptions from chatRequestOptions So that it is not wrongly assumed that these request options are used only for chat/completions endpoint. Rather these are used for both the end points, so rename to match semantic better. * SimpleChat: Update image included with readme wrt settings ui * SimpleChat:ReadMe: Switch to webp screen image to reduce size	2024-06-25 21:27:35 +10:00
Concedo	12dfb92436	Merge commit 'd62e4aaa02' into concedo_experimental # Conflicts: # .github/workflows/docker.yml # .github/workflows/server.yml # CMakeLists.txt # Makefile # common/common.cpp # ggml.c # tests/test-backend-ops.cpp # tests/test-grammar-integration.cpp	2024-06-25 18:27:12 +08:00
Aarni Koskela	6a2f298bd7	server : fix JSON-Scheme typo (#7975)	2024-06-23 11:03:08 -04:00
Concedo	92afdfcae4	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/labeler.yml # .github/workflows/server.yml # .gitignore # CMakeLists.txt # Makefile # README-sycl.md # README.md # llama.cpp # requirements/requirements-convert-hf-to-gguf-update.txt # requirements/requirements-convert-hf-to-gguf.txt # requirements/requirements-convert-legacy-llama.txt # scripts/sync-ggml.last # tests/test-tokenizer-random.py	2024-06-22 01:33:44 +08:00
sasha0552	ba58993152	server : fix smart slot selection (#8020)	2024-06-20 09:57:10 +10:00
Sigbjørn Skjæret	91c188d6c2	Only use FIM middle token if it exists (#7648) * Only use FIM middle if it exists * Only use FIM middle if it exists	2024-06-18 22:19:45 +10:00
Concedo	b53e760557	Merge commit '`1c641e6aac`' into concedo_experimental # Conflicts: # .devops/cloud-v-pipeline # .devops/llama-cli-cuda.Dockerfile # .devops/llama-cli-rocm.Dockerfile # .devops/llama-cli-vulkan.Dockerfile # .devops/llama-cli.Dockerfile # .devops/llama-cpp-clblast.srpm.spec # .devops/llama-cpp-cuda.srpm.spec # .devops/llama-cpp.srpm.spec # .devops/llama-server-cuda.Dockerfile # .devops/llama-server-rocm.Dockerfile # .devops/llama-server-vulkan.Dockerfile # .devops/llama-server.Dockerfile # .devops/nix/apps.nix # .devops/nix/package.nix # .devops/tools.sh # .dockerignore # .github/ISSUE_TEMPLATE/01-bug-low.yml # .github/ISSUE_TEMPLATE/02-bug-medium.yml # .github/ISSUE_TEMPLATE/03-bug-high.yml # .github/ISSUE_TEMPLATE/04-bug-critical.yml # .github/workflows/bench.yml # .github/workflows/build.yml # .github/workflows/docker.yml # .github/workflows/server.yml # .gitignore # Makefile # README-sycl.md # README.md # ci/run.sh # docs/token_generation_performance_tips.md # flake.nix # grammars/README.md # pocs/vdot/CMakeLists.txt # scripts/get-hellaswag.sh # scripts/get-wikitext-103.sh # scripts/get-wikitext-2.sh # scripts/get-winogrande.sh # scripts/hf.sh # scripts/pod-llama.sh # scripts/qnt-all.sh # scripts/run-all-ppl.sh # scripts/run-with-preset.py # scripts/server-llm.sh # tests/test-backend-ops.cpp	2024-06-14 18:41:37 +08:00
Concedo	a8db72eca0	Merge commit 'ef52d1d16a' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/server.yml # CMakeLists.txt # README.md # flake.lock # grammars/README.md # grammars/json.gbnf # grammars/json_arr.gbnf # tests/test-json-schema-to-grammar.cpp	2024-06-13 18:26:45 +08:00

461 commits