koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-11 01:24:36 +00:00

Author	SHA1	Message	Date
Concedo	c6879f3fca	Merge branch 'master' into concedo_experimental # Conflicts: # README.md	2024-01-10 16:05:14 +08:00
Behnam M	128de3585b	server : update readme about token probs (#4777) * updated server readme to reflect the gg/server-token-probs-4088 commit added explanation for the API's completion result which now includes `completion_probabilities`. Also added a JSON schema that shows the type/structure of `completion_probabilities`. * simplified the `completion_probabilities` JSON schema It's now easier to understand what the structure of `completion_probabilities` looks like. * minor : fix trailing whitespace --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-09 12:02:05 +02:00
Concedo	66533c8424	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # Package.swift # README.md # tests/test-quantize-fns.cpp	2024-01-09 17:48:18 +08:00
Zsapi	8c58330318	server : add api-key flag to documentation (#4832) Document the api-key flag added to server in https://github.com/ggerganov/llama.cpp/pull/4441	2024-01-09 11:12:43 +02:00
Concedo	f04b6e7287	Merge branch 'master' into concedo_experimental # Conflicts: # .devops/nix/package.nix # CMakeLists.txt # README.md # ggml-metal.m # ggml.c	2024-01-08 14:18:49 +08:00
Georgi Gerganov	67984921a7	server : fix n_predict check (#4798)	2024-01-07 08:45:26 +02:00
Concedo	c9fdd42da2	Merge branch 'master' into concedo_experimental # Conflicts: # Package.swift	2024-01-05 18:32:54 +08:00
Georgi Gerganov	012cf349ae	server : send token probs for "stream == false" (#4714)	2024-01-04 19:56:33 +02:00
Michael Coppola	e5804313a1	server : fix options in README.md (#4765) * fix examples/server/README.md * minor : fix whitespace --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-04 10:17:09 +02:00
Concedo	d37c94bcd9	Merge branch 'master' into concedo_experimental	2024-01-03 22:46:49 +08:00
Concedo	234f79fe9d	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # ci/run.sh # llama.cpp	2024-01-03 22:33:38 +08:00
Justin Parker	f2eb19bd8b	server : throw an error when `slot unavailable` (#4741)	2024-01-03 10:43:19 +02:00
Phil H	0ef3ca2ac6	server : add token counts to html footer (#4738) * server: add token counts to stats * server: generate hpp --------- Co-authored-by: phiharri <ph@got-root.co.uk>	2024-01-02 17:48:49 +02:00
Georgi Gerganov	32866c5edd	editorconfig : fix whitespace and indentation #4710	2024-01-02 13:28:15 +02:00
minarchist	5d7002d437	server : add --override-kv parameter (#4710) * Changes to server to allow metadata override * documentation * flake.nix: expose full scope in legacyPackages * flake.nix: rocm not yet supported on aarch64, so hide the output * flake.nix: expose checks * workflows: nix-ci: init; build flake outputs * workflows: nix-ci: add a job for eval * workflows: weekly `nix flake update` * workflows: nix-flakestry: drop tag filters ...and add a job for flakehub.com * workflows: nix-ci: add a qemu job for jetsons * flake.nix: suggest the binary caches * flake.lock: update to a commit recently cached by nixpkgs-cuda-ci --------- Co-authored-by: John <john@jLap.lan> Co-authored-by: Someone Serge <sergei.kozlukov@aalto.fi>	2024-01-02 12:38:15 +02:00
Concedo	9e0dee769b	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # flake.lock # flake.nix	2024-01-01 16:04:17 +08:00
Georgi Gerganov	9fbda719de	clip : refactor + bug fixes (#4696) * clip : refactor + bug fixes ggml-ci * server : add log message	2023-12-30 23:24:42 +02:00
Concedo	fe7c200610	Merge branch 'master' into concedo_experimental # Conflicts: # .devops/full-cuda.Dockerfile # .devops/full-rocm.Dockerfile # .devops/full.Dockerfile # .devops/main-rocm.Dockerfile # README.md # flake.lock # flake.nix # ggml-cuda.cu # requirements.txt # tests/CMakeLists.txt	2023-12-31 00:42:59 +08:00
Cuong Trinh Manh	97bbca6e85	cmake : fix ld warning duplicate libraries libllama.a (#4671) * fix "ld: warning: ignoring duplicate libraries: '../libllama.a'" * fix warning in example.	2023-12-29 16:39:15 +02:00
Justine Tunney	db49ff8ed7	server : replace sleep with condition variables (#4673) The server currently schedules tasks using a sleep(5ms) busy loop. This adds unnecessary latency since most sleep implementations do a round up to the system scheduling quantum (usually 10ms). Other libc sleep impls spin for smaller time intervals which results in the server's busy loop consuming all available cpu. Having the explicit notify() / wait() code also helps aid in the readability of the server code. See mozilla-Ocho/llamafile@711344b	2023-12-29 16:24:12 +02:00
SakuraUmi	60f55e888c	server : fix OpenAI server sampling w.r.t. penalty. (#4675)	2023-12-29 16:22:44 +02:00
Karthik Sethuraman	b93edd22f5	server : allow to generate multimodal embeddings (#4681)	2023-12-29 16:22:10 +02:00
Justine Tunney	65e5f6dadb	Fix OpenAI server sampling w.r.t. temp and seed (#4668) The default values for tfs_z and typical_p were being set to zero, which caused the token candidates array to get shrunk down to one element thus preventing any sampling. Note this only applies to OpenAI API compatible HTTP server requests. The solution is to use the default values that OpenAI documents, as well as ensuring we use the llama.cpp defaults for the rest. I've tested this change still ensures deterministic output by default. If a "temperature" greater than 0 is explicitly passed, then output is unique each time. If "seed" is specified in addition to "temperature" then the output becomes deterministic once more. See mozilla-Ocho/llamafile#117 See mozilla-Ocho/llamafile@9e4bf29	2023-12-28 15:20:00 -04:00
Concedo	293395e0f5	Merge commit '708e179e85' into concedo_experimental # Conflicts: # .github/workflows/docker.yml	2023-12-25 16:48:15 +08:00
Alexey Parfenov	6123979952	server : allow to specify custom prompt for penalty calculation (#3727)	2023-12-23 11:31:49 +02:00
Concedo	49a5dfc604	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # README.md	2023-12-19 16:07:48 +08:00
olexiyb	0ffc92d2d2	server : disable llm logs if SERVER_VERBOSE is off (#3792)	2023-12-17 17:02:16 +02:00
AdithyanI	8edd2b40fd	server : fix grammar being ignored (#4494) Fix bug in identifying the grammar.	2023-12-17 16:57:56 +02:00
Alexey Parfenov	eb16dae7e7	server : fix possible ambiguity in content type charset (#4501)	2023-12-17 16:56:09 +02:00
mzcu	62bd52b7bf	server : allow requests larger than 8K (#4500)	2023-12-17 16:54:37 +02:00
Concedo	76a3ba42eb	Merge branch 'master' into concedo_experimental # Conflicts: # ggml.c # ggml.h # requirements.txt # tests/test-quantize-perf.cpp	2023-12-16 22:58:53 +08:00
ShadovvBeast	88ae8952b6	server : add optional API Key Authentication example (#4441) * Add API key authentication for enhanced server-client security * server : to snake_case --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-12-15 13:49:01 +02:00
Concedo	c88fc19d59	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # README.md	2023-12-14 16:32:42 +08:00
shibe2	948ff137ec	server : fix handling of characters that span multiple tokens when streaming (#4446)	2023-12-13 21:57:15 +02:00
Concedo	c2c238b4f3	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # tests/test-grad0.cpp # tests/test-quantize-perf.cpp	2023-12-13 14:49:03 +08:00
kalomaze	fecac45658	server : tweak default sampling parameters (#4367) * Set a more typical Top P setting as the default * Update temp max	2023-12-12 12:12:35 +02:00
Richard Kiss	9494d7c477	english : use `typos` to fix comments and logs (#4354)	2023-12-12 11:53:36 +02:00
Vladimir Zorin	d9d4cfef64	server : fix local model name in server (#4420)	2023-12-12 11:25:29 +02:00
Yueh-Po Peng	8a7b2fa528	Update README.md (#4388) Fix small typo.	2023-12-10 23:27:38 +01:00
Concedo	ec21fa7712	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .gitignore # CMakeLists.txt # Makefile # Package.swift # README.md # ggml-cuda.cu # llama.cpp # llama.h # scripts/sync-ggml.sh # tests/CMakeLists.txt	2023-12-08 17:42:26 +08:00
Georgi Gerganov	bcc0eb4591	llama : per-layer KV cache + quantum K cache (#4309) * per-layer KV * remove unnecessary copies * less code duplication, offload k and v separately * llama : offload KV cache per-layer * llama : offload K shift tensors * llama : offload for rest of the model arches * llama : enable offload debug temporarily * llama : keep the KV related layers on the device * llama : remove mirrors, perform Device -> Host when partial offload * common : add command-line arg to disable KV cache offloading * llama : update session save/load * llama : support quantum K cache (#4312) * llama : support quantum K cache (wip) * metal : add F32 -> Q8_0 copy kernel * cuda : add F32 -> Q8_0 copy kernel ggml-ci * cuda : use mmv kernel for quantum cache ops * llama : pass KV cache type through API * llama : fix build ggml-ci * metal : add F32 -> Q4_0 copy kernel * metal : add F32 -> Q4_1 copy kernel * cuda : wip * cuda : add F32 -> Q4_0 and F32 -> Q4_1 copy kernels * llama-bench : support type_k/type_v * metal : use mm kernel only for quantum KV cache * cuda : add comment * llama : remove memory_f16 and kv_f16 flags --------- Co-authored-by: slaren <slarengh@gmail.com> * readme : add API change notice --------- Co-authored-by: slaren <slarengh@gmail.com>	2023-12-07 13:03:17 +02:00
Georgi Gerganov	05cd6e5036	server : recognize cache_prompt parameter in OAI API (#4347)	2023-12-06 20:21:59 +02:00
Concedo	ac36aee001	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile	2023-12-03 21:56:29 +08:00
Ed Lee	33e171d1e9	server : fix OpenAI API `stop` field to be optional (#4299) (cherry picked from commit Mozilla-Ocho/llamafile@e8c92bcb84)	2023-12-03 11:10:43 +02:00
Rickard Edén	6949b50df5	py : add grammar to oai like api (#4294)	2023-12-03 11:03:25 +02:00
Georgi Gerganov	d5a1cbde60	llama : support optional tensors (#4283)	2023-12-01 20:35:47 +02:00
Concedo	4f40c226a0	Merge branch 'master' into concedo_experimental # Conflicts: # .devops/tools.sh # .gitignore # CMakeLists.txt # Makefile # README.md	2023-12-01 23:46:59 +08:00
Ziad Ben Hadj-Alouane	1d144112c0	server : add --log-disable to disable logging to file (#4260) * * add --log-disable to disable logging to file in the server example * * typo fix	2023-12-01 00:25:49 +02:00
Ziad Ben Hadj-Alouane	f43f09366d	server : add single-client multi-prompt support (#4232) * * add multiprompt support * * cleanup * * more cleanup * * remove atomicity of id_gen, and change lock_guard to unique_lock on completion requests * * remove all references to mutex_multitasks * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update examples/server/server.cpp Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * * change to set --------- Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>	2023-12-01 00:25:04 +02:00
rhjdvsgsgks	e2bd725f4b	py : fix oai proxy (#3972) * fix oai proxy fix generation not stoped while bot stop talking in chat mode fix possible `slot_id` not exist response for cors (and pre flight) * oai proxy: workaround for some client (such as Chatbox) * use stop as separator to replace hardcoded `\n`	2023-11-30 22:50:40 +02:00


164 commits