Commit graph

317 commits

Author SHA1 Message Date
Concedo
1edf83761a Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/bench.yml.disabled
#	Makefile
#	README.md
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-vulkan.cpp
2024-08-17 16:21:14 +08:00
Xuan Son Nguyen
8b3befc0e2
server : refactor middleware and /health endpoint (#9056)
* server : refactor middleware and /health endpoint

* move "fail_on_no_slot" to /slots

* Update examples/server/server.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix server tests

* fix CI

* update server docs

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-16 17:19:05 +02:00
Riceball LEE
37501d9c79
server : fix duplicated n_predict key in the generation_settings (#8994) 2024-08-15 10:28:05 +03:00
Zhenwei Jin
4af8420afb
common : remove duplicate function llama_should_add_bos_token (#8778) 2024-08-15 10:23:23 +03:00
Jiří Podivín
234b30676a
server : init stop and error fields of the result struct (#9026)
Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
2024-08-15 09:21:57 +03:00
Concedo
e8de0af3ec Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/bench.yml
#	.github/workflows/build.yml
#	.github/workflows/python-check-requirements.yml
#	README.md
#	docs/backend/SYCL.md
#	flake.lock
#	ggml/CMakeLists.txt
#	ggml/src/kompute-shaders/op_rope_f16.comp
#	ggml/src/kompute-shaders/op_rope_f32.comp
#	ggml/src/kompute-shaders/rope_common.comp
2024-08-14 22:25:43 +08:00
compilade
98a532d474
server : fix segfault on long system prompt (#8987)
* server : fix segfault on long system prompt

* server : fix parallel generation with very small batch sizes

* server : fix typo in comment
2024-08-14 09:51:02 +03:00
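The fix above hinges on decoding a long prompt in n_batch-sized slices instead of one oversized batch. A minimal sketch of that chunking pattern, assuming the era's four-argument llama_batch_get_one helper; decode_tokens is an illustrative name, not the server's actual symbol:

```cpp
#include <algorithm>
#include <vector>
#include "llama.h"

// Decode a token list in slices of at most n_batch tokens, so prompts
// longer than the batch size never produce an oversized llama_batch.
static bool decode_tokens(struct llama_context * ctx,
                          const std::vector<llama_token> & tokens,
                          int n_batch) {
    for (size_t i = 0; i < tokens.size(); i += n_batch) {
        const int n_eval = std::min<int>(n_batch, (int)(tokens.size() - i));
        llama_batch batch = llama_batch_get_one(
            const_cast<llama_token *>(tokens.data() + i), n_eval,
            /*pos_0 =*/ (llama_pos) i, /*seq_id =*/ 0);
        if (llama_decode(ctx, batch) != 0) {
            return false; // decoding failed partway through the prompt
        }
    }
    return true;
}
```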
Georgi Gerganov
5ef07e25ac
server : handle models with missing EOS token (#8997)
ggml-ci
2024-08-12 10:21:50 +03:00
Concedo
bdfe8526b8 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.gitignore
#	CONTRIBUTING.md
#	Makefile
#	examples/llava/CMakeLists.txt
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	scripts/sync-ggml.sh
#	src/llama-vocab.cpp
2024-08-10 11:42:32 +08:00
Mathieu Geli
daef3ab233
server : add one level list nesting for embeddings (#8936) 2024-08-09 09:32:02 +03:00
Xuan Son Nguyen
1e6f6554aa
server : add lora hotswap endpoint (WIP) (#8857)
* server : add lora hotswap endpoint

* handle lora_no_apply

* fix build

* update docs

* clean up struct def

* fix build

* add LoRA test

* fix style
2024-08-06 17:33:39 +02:00
Concedo
e1f97f7fb5 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/llama-server.Dockerfile
#	README.md
#	flake.lock
#	ggml/src/ggml-vulkan.cpp
#	ggml/src/vulkan-shaders/concat.comp
#	ggml/src/vulkan-shaders/pad.comp
#	ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-backend-ops.cpp
2024-08-06 16:33:26 +08:00
Liu Jia
0a4ce78681
common : Changed tuple to struct (TODO fix) (#8823)
* common : Changed tuple to struct (TODO fix)

Use struct `llama_init_result` to replace the previous
std::tuple<struct llama_model *, struct llama_context *>

* delete llama_init_default_params()

* delete the extra whitespace
2024-08-05 18:14:10 +02:00
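A minimal sketch of the replacement described above; the field defaults are an assumption, but the shape mirrors the commit body:

```cpp
// Named struct replacing std::tuple<llama_model *, llama_context *>,
// so call sites read result.model / result.context instead of
// std::get<0> / std::get<1>.
struct llama_init_result {
    struct llama_model   * model   = nullptr;
    struct llama_context * context = nullptr;
};

// Before: std::tie(model, ctx) = llama_init_from_gpt_params(params);
// After:
//   llama_init_result r = llama_init_from_gpt_params(params);
//   if (r.model == nullptr) { /* handle load failure */ }
```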
ardfork
978ba3d83d
Server: Don't ignore llama.cpp params (#8754)
* Don't ignore llama.cpp params

* Add fallback for max_tokens
2024-08-04 20:16:23 +02:00
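A hedged sketch of the max_tokens fallback mentioned above, written against nlohmann::json directly rather than the server's own helpers; resolve_n_predict is a hypothetical name:

```cpp
#include <nlohmann/json.hpp>
using json = nlohmann::json;

// Prefer the client-supplied OpenAI-style "max_tokens", otherwise
// fall back to the server's configured default.
static int resolve_n_predict(const json & body, int server_default) {
    if (body.contains("max_tokens") && body["max_tokens"].is_number_integer()) {
        return body["max_tokens"].get<int>();
    }
    return server_default;
}
```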
Concedo
24b9616344 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/full-cuda.Dockerfile
#	.devops/full-rocm.Dockerfile
#	.devops/full.Dockerfile
#	.devops/llama-cli-cuda.Dockerfile
#	.devops/llama-cli-intel.Dockerfile
#	.devops/llama-cli-rocm.Dockerfile
#	.devops/llama-cli-vulkan.Dockerfile
#	.devops/llama-cli.Dockerfile
#	.devops/llama-server-cuda.Dockerfile
#	.devops/llama-server-intel.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.devops/llama-server-vulkan.Dockerfile
#	.devops/llama-server.Dockerfile
#	CMakeLists.txt
#	CONTRIBUTING.md
#	Makefile
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	requirements.txt
#	src/llama.cpp
#	tests/test-backend-ops.cpp
2024-07-19 14:23:33 +08:00
RunningLeon
3807c3de04
server : respect --special cli arg (#8553) 2024-07-18 11:06:22 +03:00
Concedo
602661ba49 Merge commit 'c917b67f06' into concedo_experimental
# Conflicts:
#	.devops/tools.sh
#	Makefile
#	ggml/src/ggml-cuda/mmq.cuh
#	tests/test-double-float.cpp
#	tests/test-quantize-fns.cpp
#	tests/test-quantize-perf.cpp
2024-07-14 11:38:20 +08:00
Douglas Hanley
c3ebcfa148
server : ensure batches are either all embed or all completion (#8420)
* make sure batches are all embed or all non-embed

* non-embedding batch for sampled tokens; fix unused params warning
2024-07-12 11:14:12 +03:00
Concedo
2cad736260 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/nix/package.nix
#	.github/labeler.yml
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	Package.swift
#	README.md
#	ci/run.sh
#	docs/build.md
#	examples/CMakeLists.txt
#	flake.lock
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	grammars/README.md
#	requirements/requirements-convert_hf_to_gguf.txt
#	requirements/requirements-convert_hf_to_gguf_update.txt
#	scripts/check-requirements.sh
#	scripts/compare-llama-bench.py
#	scripts/gen-unicode-data.py
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	scripts/sync-ggml.sh
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-tokenizer-random.py
2024-07-11 16:36:16 +08:00
Clint Herron
278d0e1846
Initialize default slot sampling parameters from the global context. (#8418) 2024-07-10 20:08:17 -04:00
Clint Herron
a59f8fdc85
Server: Enable setting default sampling parameters via command-line (#8402)
* Load server sampling parameters from the server context by default.

* Wordsmithing comment
2024-07-09 18:26:40 -04:00
Bjarke Viksøe
cb4d86c4d7
server: Retrieve prompt template in /props (#8337)
* server: Retrieve prompt template in /props

This PR adds the following:
- Expose the model's Jinja2 prompt template in the /props endpoint.
- Change the log level from Error to Warning for the template-mismatch warning.

The front-end stands a better chance of executing the Jinja template correctly than the server, which currently just guesses the format.

Ideally this should have been inside a JSON block that exposes the same key/value pairs as those listed during startup by the "llm_load_print_meta" function.

* Make string buffer dynamic

* Add doc and better string handling

* Using chat_template naming convention

* Use intermediate vector for string assignment
2024-07-07 11:10:38 +02:00
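A sketch (not the exact server code) of reading the chat template from GGUF metadata with a dynamically sized buffer, in the spirit of the "Make string buffer dynamic" step above; it assumes the public llama_model_meta_val_str API:

```cpp
#include <string>
#include <vector>
#include "llama.h"

// Query the metadata value's length first, then size the buffer to fit,
// instead of guessing a fixed buffer size.
static std::string get_chat_template(const struct llama_model * model) {
    const char * key = "tokenizer.chat_template";
    const int32_t len = llama_model_meta_val_str(model, key, nullptr, 0);
    if (len < 0) {
        return ""; // model carries no chat template
    }
    std::vector<char> buf(len + 1, 0);
    llama_model_meta_val_str(model, key, buf.data(), buf.size());
    return std::string(buf.data());
}
```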
Concedo
02f92f6ecc Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/full-cuda.Dockerfile
#	.devops/full-rocm.Dockerfile
#	.devops/llama-cli-cuda.Dockerfile
#	.devops/llama-cli-rocm.Dockerfile
#	.devops/llama-cli-vulkan.Dockerfile
#	.devops/llama-cpp-cuda.srpm.spec
#	.devops/llama-server-cuda.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.devops/llama-server-vulkan.Dockerfile
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	CMakeLists.txt
#	Makefile
#	README.md
#	examples/llama.android/llama/src/main/cpp/CMakeLists.txt
#	flake.lock
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	grammars/README.md
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	tests/test-chat-template.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-json-schema-to-grammar.cpp
2024-06-30 10:59:42 +08:00
Sigbjørn Skjæret
38373cfbab
Add SPM infill support (#8016)
* add --spm-infill option

* support --spm-infill

* support --spm-infill
2024-06-28 12:53:43 +02:00
Concedo
f3dfa96dbc Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/llama-server-cuda.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.devops/llama-server-vulkan.Dockerfile
#	.devops/llama-server.Dockerfile
#	.github/workflows/docker.yml
#	README.md
#	llama.cpp
#	tests/test-chat-template.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tests/test-llama-grammar.cpp
2024-06-26 18:59:10 +08:00
Xuan Son Nguyen
48e6b92cc3
Add chat template support for llama-cli (#8068)
* add chat template support for llama-cli

* add help message

* server: simplify format_chat

* more consistent naming

* improve

* add llama_chat_format_example

* fix server

* code style

* code style

* Update examples/main/main.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-25 21:56:49 +10:00
Concedo
92afdfcae4 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/labeler.yml
#	.github/workflows/server.yml
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	README-sycl.md
#	README.md
#	llama.cpp
#	requirements/requirements-convert-hf-to-gguf-update.txt
#	requirements/requirements-convert-hf-to-gguf.txt
#	requirements/requirements-convert-legacy-llama.txt
#	scripts/sync-ggml.last
#	tests/test-tokenizer-random.py
2024-06-22 01:33:44 +08:00
sasha0552
ba58993152
server : fix smart slot selection (#8020) 2024-06-20 09:57:10 +10:00
Sigbjørn Skjæret
91c188d6c2
Only use FIM middle token if it exists (#7648)
* Only use FIM middle if it exists

* Only use FIM middle if it exists
2024-06-18 22:19:45 +10:00
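An illustrative sketch of the guard this PR describes, assuming the era's llama_token_middle accessor; maybe_add_fim_middle is a hypothetical helper name:

```cpp
#include <vector>
#include "llama.h"

// Append the FIM middle token only when the model defines one;
// a negative id means the token is absent from the vocab.
static void maybe_add_fim_middle(const struct llama_model * model,
                                 std::vector<llama_token> & tokens) {
    const llama_token middle = llama_token_middle(model);
    if (middle >= 0) {
        tokens.push_back(middle);
    }
}
```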
Concedo
b53e760557 Merge commit '1c641e6aac' into concedo_experimental
# Conflicts:
#	.devops/cloud-v-pipeline
#	.devops/llama-cli-cuda.Dockerfile
#	.devops/llama-cli-rocm.Dockerfile
#	.devops/llama-cli-vulkan.Dockerfile
#	.devops/llama-cli.Dockerfile
#	.devops/llama-cpp-clblast.srpm.spec
#	.devops/llama-cpp-cuda.srpm.spec
#	.devops/llama-cpp.srpm.spec
#	.devops/llama-server-cuda.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.devops/llama-server-vulkan.Dockerfile
#	.devops/llama-server.Dockerfile
#	.devops/nix/apps.nix
#	.devops/nix/package.nix
#	.devops/tools.sh
#	.dockerignore
#	.github/ISSUE_TEMPLATE/01-bug-low.yml
#	.github/ISSUE_TEMPLATE/02-bug-medium.yml
#	.github/ISSUE_TEMPLATE/03-bug-high.yml
#	.github/ISSUE_TEMPLATE/04-bug-critical.yml
#	.github/workflows/bench.yml
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	.github/workflows/server.yml
#	.gitignore
#	Makefile
#	README-sycl.md
#	README.md
#	ci/run.sh
#	docs/token_generation_performance_tips.md
#	flake.nix
#	grammars/README.md
#	pocs/vdot/CMakeLists.txt
#	scripts/get-hellaswag.sh
#	scripts/get-wikitext-103.sh
#	scripts/get-wikitext-2.sh
#	scripts/get-winogrande.sh
#	scripts/hf.sh
#	scripts/pod-llama.sh
#	scripts/qnt-all.sh
#	scripts/run-all-ppl.sh
#	scripts/run-with-preset.py
#	scripts/server-llm.sh
#	tests/test-backend-ops.cpp
2024-06-14 18:41:37 +08:00
Concedo
a8db72eca0 Merge commit 'ef52d1d16a' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	README.md
#	flake.lock
#	grammars/README.md
#	grammars/json.gbnf
#	grammars/json_arr.gbnf
#	tests/test-json-schema-to-grammar.cpp
2024-06-13 18:26:45 +08:00
Georgi Gerganov
704a35b183
server : restore numeric prompts (#7883) 2024-06-12 14:42:29 +03:00
Georgi Gerganov
d9da0e4986
server : improve "prompt" handling (#7847) 2024-06-10 14:59:55 +03:00
Concedo
562d980140 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/full-cuda.Dockerfile
#	.devops/full.Dockerfile
#	.devops/main-cuda.Dockerfile
#	.devops/main-rocm.Dockerfile
#	.devops/main-vulkan.Dockerfile
#	.devops/main.Dockerfile
#	.devops/server-cuda.Dockerfile
#	.devops/server.Dockerfile
#	README.md
#	common/CMakeLists.txt
#	grammars/README.md
#	tests/test-grammar-integration.cpp
#	tests/test-grammar-parser.cpp
#	tests/test-json-schema-to-grammar.cpp
2024-06-09 17:30:05 +08:00
sasha0552
7a16ce7db2
server : smart slot selection using Longest Common Prefix (#7728)
* server : Smart selection of available slot using Longest Common Substring

* add usage

* remove trailing whitespaces

* Use Longest Common Prefix (LCP) instead of LCS

* Rename argument
2024-06-08 10:50:31 +03:00
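A self-contained sketch of Longest Common Prefix slot selection as described above; Slot and pick_slot are stand-ins for the server's real types, not its actual code:

```cpp
#include <cstddef>
#include <vector>

struct Slot {
    std::vector<int> cache_tokens; // tokens currently in this slot's KV cache
};

// Length of the shared prefix between two token sequences.
static size_t common_prefix_len(const std::vector<int> & a,
                                const std::vector<int> & b) {
    size_t i = 0;
    while (i < a.size() && i < b.size() && a[i] == b[i]) {
        i++;
    }
    return i;
}

// Prefer the slot whose cached tokens share the longest prefix with the
// incoming prompt, so more of the KV cache can be reused.
static int pick_slot(const std::vector<Slot> & slots,
                     const std::vector<int> & prompt) {
    int    best     = -1;
    size_t best_lcp = 0;
    for (int i = 0; i < (int) slots.size(); i++) {
        const size_t lcp = common_prefix_len(slots[i].cache_tokens, prompt);
        if (lcp > best_lcp) {
            best_lcp = lcp;
            best     = i;
        }
    }
    return best; // -1 -> no cached prefix; fall back to any free slot
}
```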
woodx
a5cabd7649
server : do not get prompt in infill mode (#7286)
* avoid getting the prompt in infill mode and embedding mode

* remove embedding mode

* refactor format

---------

Co-authored-by: wudexiang <wudexiang@bytedance.com>
2024-06-07 10:09:45 +03:00
Georgi Gerganov
f83351f9a6
imatrix : migrate to gpt_params (#7771)
* imatrix : migrate to gpt_params

ggml-ci

* imatrix : add --save-frequency cli arg

* common : fix --no-ppl
2024-06-06 16:30:58 +03:00
Concedo
6659742a2d do not merge the removal of opencl 2024-06-05 10:57:52 +08:00
Georgi Gerganov
1442677f92
common : refactor cli arg parsing (#7675)
* common : gpt_params_parse do not print usage

* common : rework usage print (wip)

* common : valign

* common : rework print_usage

* infill : remove cfg support

* common : reorder args

* server : deduplicate parameters

ggml-ci

* common : add missing header

ggml-ci

* common : remove --random-prompt usages

ggml-ci

* examples : migrate to gpt_params

ggml-ci

* batched-bench : migrate to gpt_params

* retrieval : migrate to gpt_params

* common : change defaults for escape and n_ctx

* common : remove chatml and instruct params

ggml-ci

* common : passkey use gpt_params
2024-06-04 21:23:39 +03:00
Concedo
a97f7d5f91 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/full-cuda.Dockerfile
#	.devops/full-rocm.Dockerfile
#	.devops/full.Dockerfile
#	.devops/main-cuda.Dockerfile
#	.devops/main-intel.Dockerfile
#	.devops/main-rocm.Dockerfile
#	.devops/main.Dockerfile
#	.devops/server-cuda.Dockerfile
#	.devops/server-intel.Dockerfile
#	.devops/server-rocm.Dockerfile
#	.devops/server.Dockerfile
#	.devops/tools.sh
#	.github/workflows/docker.yml
#	CMakeLists.txt
#	Makefile
#	README-sycl.md
#	README.md
#	ci/run.sh
#	llama.cpp
#	requirements.txt
#	requirements/requirements-convert-hf-to-gguf-update.txt
#	requirements/requirements-convert-hf-to-gguf.txt
#	requirements/requirements-convert-legacy-llama.txt
#	requirements/requirements-convert-llama-ggml-to-gguf.txt
#	scripts/check-requirements.sh
#	scripts/compare-llama-bench.py
#	scripts/convert-gg.sh
#	scripts/pod-llama.sh
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	scripts/sync-ggml.sh
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-tokenizer-0.sh
#	tests/test-tokenizer-random.py
2024-06-02 12:28:38 +08:00
Yazan Agha-Schrader
2e666832e6
server : new UI (#7633)
* ic

* migrate my early work

* add the accompanying assets: css, favicon, etc.

* de prompts

* chore: Update HTML meta tags in index.html file

* add api-key css classes

* some necessary fixes

* Add API key CSS classes and update styling in style.css

* clean the code

* move API to the top, rearrange param sliders. update css

* add tooltips to the parameters with comprehensible explanations

* fix FloatField and BoolField tooltips

* fix grammar field width

* use template literals for promptFormats.js

* update const ModelGenerationInfo

* remove ms per token, since it is not relevant for most webui users and use cases

* add phi-3 prompt template

* add phi3 to dropdown

* add css class

* update forgotten css theme

* add user message suffix

* fix chatml & add llama3 format

* fix llama3 prompt template

* more prompt format fixes

* add more common stop tokens

* add missing char

* do not separate with new line or comma

* move prompt style

* add hacky llama2 prompt solution, reduce redundancy in promptFormats.js

* fix toggle state localstorage

* add cmd-r prompt and reduce redundancy

* set default prompt to empty

* move files, clean code

* fix css path

* add a button to the new ui

* move new ui to "/public" due to otherwise problematic CORS behaviour

* include new ui in cpp

* fix wrong link to old ui

* renaming to ensure consistency

* fix typos "prompt-format" -> "prompt-formats"

* use correct indent

* add new ui files to makefile

* fix typo
2024-06-01 22:31:48 +03:00
Concedo
9282c307ed this commit does not work; it is just for debugging 2024-05-23 20:13:47 +08:00
Georgi Gerganov
6ff13987ad
common : normalize naming style (#7462)
* common : normalize naming style

ggml-ci

* common : match declaration / definition order

* zig : try to fix build
2024-05-22 20:04:20 +03:00
Concedo
52f9911240 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/nix/package.nix
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	Makefile
#	README.md
#	requirements.txt
#	scripts/LlamaConfig.cmake.in
2024-05-21 19:05:52 +08:00
Georgi Gerganov
e932094d58
server : return error on too large embedding input (#7389) 2024-05-20 08:56:05 +03:00
Johannes Gäßler
41858392e1
server: fix seed being reported back (#7382) 2024-05-19 17:06:33 +03:00
Concedo
47cbfd6150 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	README.md
#	llama.cpp
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	scripts/sync-ggml.sh
#	tests/test-backend-ops.cpp
2024-05-17 22:30:41 +08:00
Radoslav Gerganov
ee94172d33
server : add support for the RPC backend (#7305)
ref: #7292
2024-05-17 10:00:17 +03:00
Steve Grubb
4f0263633b
server: free sampling contexts on exit (#7264)
* server: free sampling contexts on exit

This cleans up last leak found by the address sanitizer.

* fix whitespace

* fix whitespace
2024-05-14 16:11:24 +02:00
Concedo
2ee808a747 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	README.md
#	ci/run.sh
#	llama.cpp
#	models/ggml-vocab-llama-bpe.gguf.inp
#	models/ggml-vocab-llama-bpe.gguf.out
#	requirements.txt
#	scripts/compare-llama-bench.py
#	scripts/sync-ggml.last
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-tokenizer-1-bpe.cpp
2024-05-14 19:28:47 +08:00