Concedo
9f56ca0ceb
hide misleading mmq print
2024-04-09 15:18:50 +08:00
Concedo
d54af7fa31
updated swagger json link fix
2024-04-09 14:55:27 +08:00
Concedo
5c323a0661
fixed img2img for different sizes
2024-04-08 23:29:46 +08:00
Concedo
2bc11e00df
fixed stop sequence overwriting
2024-04-08 21:37:47 +08:00
Concedo
1aff35524d
fixed compile issues for ci
2024-04-08 20:32:31 +08:00
Concedo
1ee5f355d4
try fix some compile issues (+1 squashed commits)
...
Squashed commits:
[e920e76b] try fix some compile issues
2024-04-08 20:01:46 +08:00
Concedo
fc881b4deb
fixed ssl cert path set if empty, added some basic cleanup of old temp dirs
2024-04-08 18:06:53 +08:00
Concedo
125f84aa02
fixed compiler warnings
2024-04-08 16:40:55 +08:00
Concedo
909e4334f9
fixed indentation in makefile
2024-04-08 16:27:13 +08:00
Concedo
021277ab67
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# flake.lock
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
2024-04-08 16:19:53 +08:00
Concedo
87175db07d
try to fix cuda build makefile
2024-04-08 16:18:13 +08:00
Firat
d752327c33
Adding KodiBot to UI list (#6535)
...
KodiBot is a free and open source AI chat app released under the GNU General Public License.
2024-04-08 09:48:29 +02:00
Mark Fairbairn
855f54402e
Change Windows AMD example to release build to make inference much faster. (#6525)
2024-04-07 20:52:19 +02:00
Georgi Gerganov
b909236c0b
flake.lock: Update (#6517)
...
Flake lock file updates:
• Updated input 'flake-parts':
'github:hercules-ci/flake-parts/f7b3c975cf067e56e7cda6cb098ebe3fb4d74ca2' (2024-03-01)
→ 'github:hercules-ci/flake-parts/9126214d0a59633752a136528f5f3b9aa8565b7d' (2024-04-01)
• Updated input 'flake-parts/nixpkgs-lib':
'github:NixOS/nixpkgs/1536926ef5621b09bba54035ae2bb6d806d72ac8?dir=lib' (2024-02-29)
→ 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089?dir=lib' (2024-03-29)
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29)
→ 'github:NixOS/nixpkgs/fd281bd6b7d3e32ddfa399853946f782553163b5' (2024-04-03)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-04-07 11:25:30 -07:00
DAN™
e0717e751e
Add GritLM as supported models. (#6513)
2024-04-07 19:33:59 +02:00
Concedo
aa5124439d
horde workers pause themselves if recent local usage is detected (+1 squashed commits)
...
Squashed commits:
[7ebb80bc] horde workers pause themselves if recent local usage is detected
2024-04-07 23:22:50 +08:00
Concedo
81ac0e5656
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/full.Dockerfile
# .devops/llama-cpp-clblast.srpm.spec
# .devops/llama-cpp-cuda.srpm.spec
# .devops/llama-cpp.srpm.spec
# .devops/nix/package.nix
# .devops/server-cuda.Dockerfile
# .devops/server-intel.Dockerfile
# .devops/server-rocm.Dockerfile
# .devops/server-vulkan.Dockerfile
# .devops/server.Dockerfile
# .github/workflows/build.yml
# .github/workflows/code-coverage.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/gguf-publish.yml
# .github/workflows/nix-ci-aarch64.yml
# .github/workflows/nix-ci.yml
# .github/workflows/python-check-requirements.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .github/workflows/zig-build.yml
# CMakeLists.txt
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# examples/gguf-split/gguf-split.cpp
# flake.lock
# flake.nix
# llama.cpp
# scripts/compare-llama-bench.py
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
2024-04-07 22:07:27 +08:00
Georgi Gerganov
c37247796b
sync : ggml
2024-04-07 17:05:51 +03:00
Slava Primenko
f77261a7c5
ggml: bypass code incompatible with CUDA < 11.1 (whisper/2020)
...
The `cudaHostRegisterReadOnly` parameter was only introduced in CUDA 11.1.
See this issue for more details:
https://github.com/ggerganov/examples/whisper/whisper.cpp/issues/2007
2024-04-07 17:05:40 +03:00
Georgi Gerganov
43e8995e75
scripts : sync ggml-cuda folder
2024-04-07 16:08:12 +03:00
Concedo
22f543d09b
Merge commit '32c8486e1f' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# CMakeLists.txt
# Makefile
# Package.swift
# README.md
# build.zig
# llama.cpp
# tests/test-backend-ops.cpp
2024-04-07 20:39:17 +08:00
Concedo
a530afa1e4
Merge commit '280345968d' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/llama-cpp-cuda.srpm.spec
# .devops/main-cuda.Dockerfile
# .devops/nix/package.nix
# .devops/server-cuda.Dockerfile
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README.md
# ci/run.sh
# docs/token_generation_performance_tips.md
# flake.lock
# llama.cpp
# scripts/LlamaConfig.cmake.in
# scripts/compare-commits.sh
# scripts/server-llm.sh
# tests/test-quantize-fns.cpp
2024-04-07 20:27:17 +08:00
Concedo
d8b808454d
updated lite
2024-04-07 19:45:04 +08:00
limitedAtonement
9472bce308
Run make to build the project (#6457)
2024-04-07 13:05:40 +02:00
Concedo
bec16d182b
Merge commit '2f34b865b6' into concedo_experimental
...
# Conflicts:
# .clang-tidy
# CMakeLists.txt
# Makefile
# ggml-cuda.cu
2024-04-07 18:30:35 +08:00
Neo Zhang Jianyu
d4f220a5cc
support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S, GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S, GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M (#6521)
2024-04-07 10:55:59 +08:00
Concedo
6166fdfde4
added support for OAI chat completions adapter file, added default stop sequences to prevent chat compl leakage
2024-04-07 10:35:20 +08:00
Georgi Gerganov
54ea0698fb
sync : ggml
2024-04-06 18:27:46 +03:00
Concedo
0061299cce
fixed quant tools not compiling, updated docs
2024-04-06 23:11:05 +08:00
Daniel Bevenius
b66aec675c
backend : fix typo in scheduler documentation (ggml/781)
...
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-06 17:42:26 +03:00
Clint Herron
57dd02c44b
Tests: Added integration tests for GBNF parser (#6472)
...
* Added integration tests for GBNF parser to validate correctness of parsing, as well as correctness of string matching. Intended for use to pin behavior while working on performance improvements.
* Fixing whitespace errors and cleaning error message alert to be clearer.
* Removing hacky include to llama.cpp from grammar integration test now that needed functions are available via internal API.
* Comment cleanup.
* Reorganizing tests for readability.
* Cleaning up debug message to make a bit more sense.
2024-04-06 10:31:33 -04:00
Concedo
273d48ad96
revert cuda pool impl (+1 squashed commits)
...
Squashed commits:
[5d5b5062] revert cuda pool impl
2024-04-06 22:02:00 +08:00
Concedo
79c8e87922
remove constraint for img dimension
2024-04-06 19:58:58 +08:00
Concedo
9c0fbf9f73
Merge commit 'ad3a0505e3' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/close-issue.yml
# .github/workflows/code-coverage.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/nix-ci-aarch64.yml
# .github/workflows/nix-ci.yml
# .github/workflows/python-check-requirements.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .github/workflows/zig-build.yml
# .gitignore
# CMakeLists.txt
# Makefile
# README-sycl.md
# README.md
# build.zig
# common/CMakeLists.txt
# llama.cpp
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
2024-04-06 18:32:57 +08:00
Concedo
c348223dff
Merge commit 'ccf58aa3ec' into concedo_experimental
...
# Conflicts:
# .gitignore
# Makefile
# README-sycl.md
# ggml-cuda.cu
2024-04-06 17:52:53 +08:00
Concedo
743687020d
fixed img2img
2024-04-06 17:29:44 +08:00
Pierrick Hymbert
75cd4c7729
ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495)
...
* ci: bench: support sse and fix prompt processing time
server: add tokens usage in stream mode
* ci: bench: README.md EOL
* ci: bench: remove total pp and tg as it is not accurate
* ci: bench: fix case when there is no token generated
* ci: bench: change to the 95 percentile for pp and tg as it is closer to what the server exports in metrics
* ci: bench: fix finish reason rate
2024-04-06 05:40:47 +02:00
Brian
a8bd14d557
gguf.py : add licence and version to gguf writer (#6504)
2024-04-05 21:41:38 +03:00
Hoang Nguyen
d0f5deebf8
readme : update UI list (#6503)
...
* Add MindMac to UI list
* Update proprietary description
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-04-05 21:39:43 +03:00
Ting Sun
87e21bbacd
bench : make n_batch and n_ubatch configurable in Batched bench (#6500)
...
* bench: make n_batch and n_ubatch configurable
* bench: update doc for batched bench
2024-04-05 21:34:53 +03:00
Ouadie EL FAROUKI
1b496a745c
[SYCL] Fixed minor bug when enabling FP16 for non-Intel targets (#6464)
...
* moved INTEL_MKL guard from gemm_impl to gemm (wrapper)
* Update ggml-sycl.cpp
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
---------
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
2024-04-05 19:05:06 +05:30
alexpinel
a307375c02
readme : add Dot to UI list (#6487)
2024-04-04 13:22:50 -04:00
Jun Jie
b660a5729e
readme : fix typo (#6481)
2024-04-04 13:16:37 -04:00
Ed Lepedus
0a1d889e27
server: add cURL support to server Dockerfiles (#6474)
...
* server: add cURL support to `full.Dockerfile`
* server: add cURL support to `full-cuda.Dockerfile` and `server-cuda.Dockerfile`
* server: add cURL support to `full-rocm.Dockerfile` and `server-rocm.Dockerfile`
* server: add cURL support to `server-intel.Dockerfile`
* server: add cURL support to `server-vulkan.Dockerfile`
* fix typo in `server-vulkan.Dockerfile`
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-04 18:31:22 +02:00
Minsoo Cheong
7dda1b727e
ci: exempt master branch workflows from getting cancelled (#6486)
...
* ci: exempt master branch workflows from getting cancelled
* apply to bench.yml
2024-04-04 18:30:53 +02:00
Ewout ter Hoeven
c666ba26c3
build CI: Name artifacts (#6482)
...
Name the artifacts in the build CI, so that they get uploaded under separate names instead of all being put into the same `artifact` ZIP.
It might be possible to further simplify the packing step (in future PRs).
2024-04-04 17:08:55 +02:00
Shakhar Dasgupta
2e66913e5f
server: allow penalizing repetition of newlines on server webpage (#6431)
2024-04-04 17:03:00 +02:00
Pierrick Hymbert
8120efee1d
ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6478)
2024-04-04 16:59:04 +02:00
limitedAtonement
a74401f0e5
Correct README link (#6458)
...
README is called README.md.
2024-04-04 16:30:02 +02:00
Pierrick Hymbert
7a2c92637a
ci: bench: add more ftype, fix triggers and bot comment (#6466)
...
* ci: bench: change trigger path to not spawn on each PR
* ci: bench: add more file type for phi-2: q8_0 and f16.
- do not show the comment by default
* ci: bench: add seed parameter in k6 script
* ci: bench: artefact name perf job
* Add iteration in the commit status, reduce again the autocomment
* ci: bench: add per slot metric in the commit status
* Fix trailing spaces
2024-04-04 12:57:58 +03:00