Concedo
b2c1ff7a13
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .ecrc
# CMakePresets.json
# ci/run.sh
# docs/backend/SYCL.md
# ggml/src/CMakeLists.txt
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-sampling.cpp
2024-08-27 17:46:40 +08:00
Georgi Gerganov
e5edb210cd
server : update deps ( #9183 )
2024-08-26 12:16:57 +03:00
Xuan Son Nguyen
fc54ef0d1c
server : support reading arguments from environment variables ( #9105 )
...
* server : support reading arguments from environment variables
* add -fa and -dt
* readme : specify non-arg env var
2024-08-21 11:04:34 +02:00
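A rough sketch of how the environment-variable support from the commit above might be exercised: launching llama-server with its configuration taken from the environment instead of CLI flags. The variable names (LLAMA_ARG_MODEL, LLAMA_ARG_CTX_SIZE, LLAMA_ARG_N_GPU_LAYERS), the model path, and the binary location are illustrative assumptions, not taken from this log; check `llama-server --help` for the actual list.

```python
import os
import subprocess

# Hypothetical sketch: pass server settings via environment variables
# rather than CLI flags. The variable names below are assumptions made
# for illustration only.
env = os.environ.copy()
env["LLAMA_ARG_MODEL"] = "models/model.gguf"   # assumed equivalent of --model
env["LLAMA_ARG_CTX_SIZE"] = "4096"             # assumed equivalent of --ctx-size
env["LLAMA_ARG_N_GPU_LAYERS"] = "99"           # assumed equivalent of --n-gpu-layers

# Launch the server with no explicit arguments; settings come from the env.
subprocess.run(["./llama-server"], env=env, check=True)
```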
Concedo
1edf83761a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/bench.yml.disabled
# Makefile
# README.md
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-vulkan.cpp
2024-08-17 16:21:14 +08:00
Xuan Son Nguyen
8b3befc0e2
server : refactor middleware and /health endpoint ( #9056 )
...
* server : refactor middleware and /health endpoint
* move "fail_on_no_slot" to /slots
* Update examples/server/server.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix server tests
* fix CI
* update server docs
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-16 17:19:05 +02:00
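A minimal sketch of how a client might use the endpoints touched by the refactor above: /health as a plain liveness check, and /slots (where "fail_on_no_slot" was moved) for slot availability. The port, the query-parameter spelling, and the response shapes are assumptions for illustration.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen

BASE = "http://localhost:8080"  # assumed default host/port

# Liveness: after the refactor, /health is expected to answer as soon as
# the server is up, without any slot-related query parameters.
with urlopen(f"{BASE}/health") as resp:
    print("health:", json.load(resp))

# Slot availability: "fail_on_no_slot" was moved to /slots in this change;
# the exact parameter spelling and response shape are assumptions here.
try:
    with urlopen(f"{BASE}/slots?fail_on_no_slot=1") as resp:
        print("slots:", json.load(resp))
except HTTPError as err:
    print("no free slot:", err.code)
```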
Riceball LEE
37501d9c79
server : fix duplicated n_predict key in the generation_settings ( #8994 )
2024-08-15 10:28:05 +03:00
Zhenwei Jin
4af8420afb
common : remove duplicate function llama_should_add_bos_token ( #8778 )
2024-08-15 10:23:23 +03:00
Jiří Podivín
234b30676a
server : init stop and error fields of the result struct ( #9026 )
...
Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
2024-08-15 09:21:57 +03:00
Concedo
e8de0af3ec
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/bench.yml
# .github/workflows/build.yml
# .github/workflows/python-check-requirements.yml
# README.md
# docs/backend/SYCL.md
# flake.lock
# ggml/CMakeLists.txt
# ggml/src/kompute-shaders/op_rope_f16.comp
# ggml/src/kompute-shaders/op_rope_f32.comp
# ggml/src/kompute-shaders/rope_common.comp
2024-08-14 22:25:43 +08:00
compilade
98a532d474
server : fix segfault on long system prompt ( #8987 )
...
* server : fix segfault on long system prompt
* server : fix parallel generation with very small batch sizes
* server : fix typo in comment
2024-08-14 09:51:02 +03:00
Georgi Gerganov
5ef07e25ac
server : handle models with missing EOS token ( #8997 )
...
ggml-ci
2024-08-12 10:21:50 +03:00
Concedo
bdfe8526b8
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .gitignore
# CONTRIBUTING.md
# Makefile
# examples/llava/CMakeLists.txt
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# src/llama-vocab.cpp
2024-08-10 11:42:32 +08:00
Mathieu Geli
daef3ab233
server : add one level list nesting for embeddings ( #8936 )
2024-08-09 09:32:02 +03:00
Xuan Son Nguyen
1e6f6554aa
server : add lora hotswap endpoint (WIP) ( #8857 )
...
* server : add lora hotswap endpoint
* handle lora_no_apply
* fix build
* update docs
* clean up struct def
* fix build
* add LoRA test
* fix style
2024-08-06 17:33:39 +02:00
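A hedged sketch of how the LoRA hotswap endpoint from the commit above might be driven by a client; the endpoint path (/lora-adapters), the HTTP methods, and the request body shape are assumptions based on this log's description, not verified against the server code.

```python
import json
from urllib.request import Request, urlopen

BASE = "http://localhost:8080"  # assumed default host/port

# List currently loaded adapters (assumed GET endpoint).
with urlopen(f"{BASE}/lora-adapters") as resp:
    print("adapters:", json.load(resp))

# Hot-swap: set per-adapter scales without restarting the server.
# The path and payload shape are illustrative assumptions.
payload = json.dumps([{"id": 0, "scale": 1.0}]).encode()
req = Request(f"{BASE}/lora-adapters", data=payload,
              headers={"Content-Type": "application/json"}, method="POST")
with urlopen(req) as resp:
    print("apply:", resp.status)
```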
Concedo
e1f97f7fb5
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/llama-server.Dockerfile
# README.md
# flake.lock
# ggml/src/ggml-vulkan.cpp
# ggml/src/vulkan-shaders/concat.comp
# ggml/src/vulkan-shaders/pad.comp
# ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-backend-ops.cpp
2024-08-06 16:33:26 +08:00
Liu Jia
0a4ce78681
common : Changed tuple to struct (TODO fix) ( #8823 )
...
* common : Changed tuple to struct (TODO fix)
Use struct `llama_init_result` to replace the previous
std::tuple<struct llama_model *, struct llama_context *>
* delete llama_init_default_params()
* delete the extra whitespace
2024-08-05 18:14:10 +02:00
ardfork
978ba3d83d
Server: Don't ignore llama.cpp params ( #8754 )
...
* Don't ignore llama.cpp params
* Add fallback for max_tokens
2024-08-04 20:16:23 +02:00
Concedo
101efb66af
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# CMakeLists.txt
# Makefile
2024-08-01 10:54:28 +08:00
Igor Okulist
afbbcf3c04
server : update llama-server embedding flag documentation ( #8779 )
...
Fixes #8763
2024-07-31 19:59:09 -04:00
Concedo
ba5babb876
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/apps.nix
# .devops/tools.sh
# Makefile
# README.md
# docs/backend/SYCL.md
# docs/build.md
# examples/CMakeLists.txt
# ggml/include/ggml.h
# src/llama-vocab.cpp
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-sampling.cpp
2024-07-27 23:15:54 +08:00
Yaiko
01aec4a631
server : add Speech Recognition & Synthesis to UI ( #8679 )
...
* server : add Speech Recognition & Synthesis to UI
* server : add Speech Recognition & Synthesis to UI (fixes)
2024-07-26 00:10:16 +02:00
Ujjawal Panchal
4b0eff3df5
docs : Quantum -> Quantized ( #8666 )
...
* docfix: imatrix readme, quantum models -> quantized models.
* docfix: server readme: quantum models -> quantized models.
2024-07-25 11:13:27 +03:00
Concedo
01d5175654
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# Makefile
# ggml/src/CMakeLists.txt
2024-07-24 16:41:33 +08:00
Vali Malinoiu
b841d07408
server : fix URL.parse in the UI ( #8646 )
2024-07-23 17:37:42 +03:00
Concedo
c81d1623b4
Merge commit '751fcfc6c3' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CONTRIBUTING.md
# README.md
# flake.lock
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
2024-07-23 19:18:05 +08:00
Jan Boon
628154492a
server : update doc to clarify n_keep when there is bos token ( #8619 )
2024-07-22 11:02:09 +03:00
Concedo
24b9616344
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/full.Dockerfile
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-cli-intel.Dockerfile
# .devops/llama-cli-rocm.Dockerfile
# .devops/llama-cli-vulkan.Dockerfile
# .devops/llama-cli.Dockerfile
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-intel.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .devops/llama-server.Dockerfile
# CMakeLists.txt
# CONTRIBUTING.md
# Makefile
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# requirements.txt
# src/llama.cpp
# tests/test-backend-ops.cpp
2024-07-19 14:23:33 +08:00
Eric Zhang
0d2c7321e9
server: use relative routes for static files in new UI ( #8552 )
...
* server: public: fix api_url on non-index pages
* server: public: use relative routes for static files in new UI
2024-07-18 12:43:49 +02:00
RunningLeon
3807c3de04
server : respect --special cli arg ( #8553 )
2024-07-18 11:06:22 +03:00
Xuan Son Nguyen
4db8f60fe7
fix ci ( #8494 )
2024-07-15 19:23:10 +02:00
Concedo
e707ab9025
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# docs/development/HOWTO-add-model.md
# docs/development/token_generation_performance_tips.md
# flake.lock
2024-07-16 00:49:34 +08:00
M-A
f17f39ff9c
server: update README.md with llama-server --help output [no ci] ( #8472 )
...
The README.md had stale information. In particular, the --ctx-size
"defaults to 512" confused me and I had to check the code to confirm
this was false. Since the server is evolving rapidly, it's probably
better to keep the source of truth in a single place (the source) and
generate the README.md from it.
Did:
make llama-server
./llama-server --help > t.txt
vimdiff t.txt examples/server/README.md
I copied the content inside a backquote block. I would have preferred
proper text but it would require a fair amount of surgery to make the
current output compatible with markdown. A follow up could be to
automate this process with a script.
No functional change.
2024-07-15 15:04:56 +03:00
Concedo
602661ba49
Merge commit 'c917b67f06' into concedo_experimental
...
# Conflicts:
# .devops/tools.sh
# Makefile
# ggml/src/ggml-cuda/mmq.cuh
# tests/test-double-float.cpp
# tests/test-quantize-fns.cpp
# tests/test-quantize-perf.cpp
2024-07-14 11:38:20 +08:00
Georgi Gerganov
4e24cffd8c
server : handle content array in chat API ( #8449 )
...
* server : handle content array in chat API
* Update examples/server/utils.hpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-07-12 14:48:15 +03:00
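For reference, a small sketch of the kind of request the change above concerns: an OpenAI-style /v1/chat/completions call where "content" is an array of parts rather than a single string. The payload, port, and the exact set of accepted part types are illustrative assumptions.

```python
import json
from urllib.request import Request, urlopen

BASE = "http://localhost:8080"  # assumed default host/port

# "content" given as an array of text parts instead of a plain string.
body = {
    "messages": [
        {"role": "user",
         "content": [
             {"type": "text", "text": "Hello,"},
             {"type": "text", "text": "what is the capital of France?"},
         ]},
    ],
}
req = Request(f"{BASE}/v1/chat/completions",
              data=json.dumps(body).encode(),
              headers={"Content-Type": "application/json"}, method="POST")
with urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```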
Douglas Hanley
c3ebcfa148
server : ensure batches are either all embed or all completion ( #8420 )
...
* make sure batches are all embed or all non-embed
* non-embedding batch for sampled tokens; fix unused params warning
2024-07-12 11:14:12 +03:00
Concedo
2cad736260
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# .github/labeler.yml
# .gitignore
# CMakeLists.txt
# Makefile
# Package.swift
# README.md
# ci/run.sh
# docs/build.md
# examples/CMakeLists.txt
# flake.lock
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# grammars/README.md
# requirements/requirements-convert_hf_to_gguf.txt
# requirements/requirements-convert_hf_to_gguf_update.txt
# scripts/check-requirements.sh
# scripts/compare-llama-bench.py
# scripts/gen-unicode-data.py
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-tokenizer-random.py
2024-07-11 16:36:16 +08:00
Clint Herron
278d0e1846
Initialize default slot sampling parameters from the global context. ( #8418 )
2024-07-10 20:08:17 -04:00
Clint Herron
a59f8fdc85
Server: Enable setting default sampling parameters via command-line ( #8402 )
...
* Load server sampling parameters from the server context by default.
* Wordsmithing comment
2024-07-09 18:26:40 -04:00
compilade
3fd62a6b1c
py : type-check all Python scripts with Pyright ( #8341 )
...
* py : type-check all Python scripts with Pyright
* server-tests : use trailing slash in openai base_url
* server-tests : add more type annotations
* server-tests : strip "chat" from base_url in oai_chat_completions
* server-tests : model metadata is a dict
* ci : disable pip cache in type-check workflow
The cache is not shared between branches, and it's 250MB in size,
so it would become quite a big part of the 10GB cache limit of the repo.
* py : fix new type errors from master branch
* tests : fix test-tokenizer-random.py
Apparently, gcc applies optimisations even when pre-processing,
which confuses pycparser.
* ci : only show warnings and errors in python type-check
The "information" level otherwise has entries
from 'examples/pydantic_models_to_grammar.py',
which could be confusing for someone trying to figure out what failed,
considering that these messages can safely be ignored
even though they look like errors.
2024-07-07 15:04:39 -04:00
Bjarke Viksøe
cb4d86c4d7
server: Retrieve prompt template in /props ( #8337 )
...
* server: Retrieve prompt template in /props
This PR adds the following:
- Expose the model's Jinja2 prompt template in the /props endpoint.
- Change the log level from Error to Warning for the template-mismatch warning.
The front-end stands a better chance of actually executing the Jinja template format correctly; the server is currently just guessing it.
Ideally this should have been inside a JSON block that exposes the same key/value pairs as listed during startup in the "llm_load_print_meta" function.
* Make string buffer dynamic
* Add doc and better string handling
* Using chat_template naming convention
* Use intermediate vector for string assignment
2024-07-07 11:10:38 +02:00
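A minimal sketch of a front-end reading the template exposed by the change above. It assumes the /props response carries a "chat_template" field (following the "chat_template naming convention" noted in the commit) and a default port of 8080; treat both as assumptions until checked against the server docs.

```python
import json
from urllib.request import urlopen

# Fetch server properties and read the model's prompt template.
# The "chat_template" key name is an assumption based on the commit notes.
with urlopen("http://localhost:8080/props") as resp:
    props = json.load(resp)

template = props.get("chat_template", "")
print("template preview:", template[:200] or "<empty>")
```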
Concedo
5b605d03ea
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/config.yml
# .gitignore
# CMakeLists.txt
# CONTRIBUTING.md
# Makefile
# README.md
# ci/run.sh
# common/common.h
# examples/main-cmake-pkg/CMakeLists.txt
# ggml/src/CMakeLists.txt
# models/ggml-vocab-bert-bge.gguf.inp
# models/ggml-vocab-bert-bge.gguf.out
# models/ggml-vocab-deepseek-coder.gguf.inp
# models/ggml-vocab-deepseek-coder.gguf.out
# models/ggml-vocab-deepseek-llm.gguf.inp
# models/ggml-vocab-deepseek-llm.gguf.out
# models/ggml-vocab-falcon.gguf.inp
# models/ggml-vocab-falcon.gguf.out
# models/ggml-vocab-gpt-2.gguf.inp
# models/ggml-vocab-gpt-2.gguf.out
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# models/ggml-vocab-llama-spm.gguf.inp
# models/ggml-vocab-llama-spm.gguf.out
# models/ggml-vocab-mpt.gguf.inp
# models/ggml-vocab-mpt.gguf.out
# models/ggml-vocab-phi-3.gguf.inp
# models/ggml-vocab-phi-3.gguf.out
# models/ggml-vocab-starcoder.gguf.inp
# models/ggml-vocab-starcoder.gguf.out
# requirements.txt
# requirements/requirements-convert_legacy_llama.txt
# scripts/check-requirements.sh
# scripts/pod-llama.sh
# src/CMakeLists.txt
# src/llama.cpp
# tests/test-rope.cpp
2024-07-06 00:25:10 +08:00
Pieter Ouwerkerk
5a7447c569
readme : fix minor typos [no ci] ( #8314 )
2024-07-05 09:58:41 +03:00
Clint Herron
07a3fc0608
Removes multiple newlines at the end of files that are breaking the editorconfig step of CI. ( #8258 )
2024-07-02 12:18:10 -04:00
Concedo
02f92f6ecc
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-cli-rocm.Dockerfile
# .devops/llama-cli-vulkan.Dockerfile
# .devops/llama-cpp-cuda.srpm.spec
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .github/workflows/build.yml
# .github/workflows/docker.yml
# CMakeLists.txt
# Makefile
# README.md
# examples/llama.android/llama/src/main/cpp/CMakeLists.txt
# flake.lock
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# grammars/README.md
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
2024-06-30 10:59:42 +08:00
Concedo
9c10486204
merge the file structure refactor, testing
2024-06-29 12:14:38 +08:00
Sigbjørn Skjæret
38373cfbab
Add SPM infill support ( #8016 )
...
* add --spm-infill option
* support --spm-infill
* support --spm-infill
2024-06-28 12:53:43 +02:00
Olivier Chafik
139cc621e9
json : restore default additionalProperties to false, fix some pattern escapes ( #8180 )
...
* json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset
* json: revert default of additionalProperties to false
* Update README.md
2024-06-28 09:26:45 +01:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake ( #8006 )
...
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122 )
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
Concedo
f3dfa96dbc
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .devops/llama-server.Dockerfile
# .github/workflows/docker.yml
# README.md
# llama.cpp
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-llama-grammar.cpp
2024-06-26 18:59:10 +08:00
Olivier Chafik
9b2f16f805
json : better support for "type" unions (e.g. nullable arrays w/ typed items) ( #7863 )
...
* json: better support for "type" arrays (e.g. `{"type": ["array", "null"], "items": {"type": "string"}}`)
* json: add test for type: [array, null] fix
* update tests
2024-06-26 01:46:35 +01:00
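To make the inline example above concrete, here is a small self-contained sketch of the schema shape the change targets (a nullable array of typed items) and which JSON values it admits. The tiny matches() helper is a hand-rolled illustration only, not the grammar-based handling that llama.cpp actually performs.

```python
# The schema from the commit body: a value that is either null or an array
# of strings ("type" given as a union).
schema = {"type": ["array", "null"], "items": {"type": "string"}}

# Instances that should match the schema ...
valid = [None, [], ["a", "b"]]
# ... and ones that should not (wrong container type, wrong item type).
invalid = ["not-an-array", [1, 2, 3]]

def matches(value):
    """Tiny hand-rolled check for this one schema, for illustration only."""
    if value is None:
        return True
    return isinstance(value, list) and all(isinstance(x, str) for x in value)

assert all(matches(v) for v in valid)
assert not any(matches(v) for v in invalid)
print("all checks passed")
```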