Concedo
9f56ca0ceb
hide misleading mmq print
2024-04-09 15:18:50 +08:00
Concedo
d54af7fa31
updated swagger json link fix
2024-04-09 14:55:27 +08:00
Concedo
5c323a0661
fixed img2img for different sizes
2024-04-08 23:29:46 +08:00
Concedo
2bc11e00df
fixed stop sequence overwriting
2024-04-08 21:37:47 +08:00
Concedo
1aff35524d
fixed compile issues for ci
2024-04-08 20:32:31 +08:00
Concedo
1ee5f355d4
try fix some compile issues (+1 squashed commits)
...
Squashed commits:
[e920e76b] try fix some compile issues
2024-04-08 20:01:46 +08:00
Concedo
fc881b4deb
fixed ssl cert path set if empty, added some basic cleanup of old temp dirs
2024-04-08 18:06:53 +08:00
Concedo
125f84aa02
fixed compiler warnings
2024-04-08 16:40:55 +08:00
Concedo
909e4334f9
fixed indentation in makefile
2024-04-08 16:27:13 +08:00
Concedo
021277ab67
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# flake.lock
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
2024-04-08 16:19:53 +08:00
Concedo
87175db07d
try to fix cuda build makefile
2024-04-08 16:18:13 +08:00
Firat
d752327c33
Adding KodiBot to UI list (#6535)
...
KodiBot is a free and open source AI chat app released under the GNU General Public License.
2024-04-08 09:48:29 +02:00
Mark Fairbairn
855f54402e
Change Windows AMD example to release build to make inference much faster. (#6525)
2024-04-07 20:52:19 +02:00
Georgi Gerganov
b909236c0b
flake.lock: Update (#6517)
...
Flake lock file updates:
• Updated input 'flake-parts':
'github:hercules-ci/flake-parts/f7b3c975cf067e56e7cda6cb098ebe3fb4d74ca2' (2024-03-01)
→ 'github:hercules-ci/flake-parts/9126214d0a59633752a136528f5f3b9aa8565b7d' (2024-04-01)
• Updated input 'flake-parts/nixpkgs-lib':
'github:NixOS/nixpkgs/1536926ef5621b09bba54035ae2bb6d806d72ac8?dir=lib' (2024-02-29)
→ 'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089?dir=lib' (2024-03-29)
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089' (2024-03-29)
→ 'github:NixOS/nixpkgs/fd281bd6b7d3e32ddfa399853946f782553163b5' (2024-04-03)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-04-07 11:25:30 -07:00
DAN™
e0717e751e
Add GritLM as supported models. (#6513)
2024-04-07 19:33:59 +02:00
Concedo
aa5124439d
horde workers pause themselves if recent local usage is detected (+1 squashed commits)
...
Squashed commits:
[7ebb80bc] horde workers pause themselves if recent local usage is detected
2024-04-07 23:22:50 +08:00
Concedo
81ac0e5656
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/full.Dockerfile
# .devops/llama-cpp-clblast.srpm.spec
# .devops/llama-cpp-cuda.srpm.spec
# .devops/llama-cpp.srpm.spec
# .devops/nix/package.nix
# .devops/server-cuda.Dockerfile
# .devops/server-intel.Dockerfile
# .devops/server-rocm.Dockerfile
# .devops/server-vulkan.Dockerfile
# .devops/server.Dockerfile
# .github/workflows/build.yml
# .github/workflows/code-coverage.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/gguf-publish.yml
# .github/workflows/nix-ci-aarch64.yml
# .github/workflows/nix-ci.yml
# .github/workflows/python-check-requirements.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .github/workflows/zig-build.yml
# CMakeLists.txt
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# examples/gguf-split/gguf-split.cpp
# flake.lock
# flake.nix
# llama.cpp
# scripts/compare-llama-bench.py
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
2024-04-07 22:07:27 +08:00
Georgi Gerganov
c37247796b
sync : ggml
2024-04-07 17:05:51 +03:00
Slava Primenko
f77261a7c5
ggml: bypass code incompatible with CUDA < 11.1 (whisper/2020)
...
The `cudaHostRegisterReadOnly` parameter was only introduced in CUDA 11.1.
See this issue for more details:
https://github.com/ggerganov/examples/whisper/whisper.cpp/issues/2007
2024-04-07 17:05:40 +03:00
Georgi Gerganov
43e8995e75
scripts : sync ggml-cuda folder
2024-04-07 16:08:12 +03:00
Concedo
22f543d09b
Merge commit '32c8486e1f' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# CMakeLists.txt
# Makefile
# Package.swift
# README.md
# build.zig
# llama.cpp
# tests/test-backend-ops.cpp
2024-04-07 20:39:17 +08:00
Concedo
a530afa1e4
Merge commit '280345968d' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/llama-cpp-cuda.srpm.spec
# .devops/main-cuda.Dockerfile
# .devops/nix/package.nix
# .devops/server-cuda.Dockerfile
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README.md
# ci/run.sh
# docs/token_generation_performance_tips.md
# flake.lock
# llama.cpp
# scripts/LlamaConfig.cmake.in
# scripts/compare-commits.sh
# scripts/server-llm.sh
# tests/test-quantize-fns.cpp
2024-04-07 20:27:17 +08:00
Concedo
d8b808454d
updated lite
2024-04-07 19:45:04 +08:00
limitedAtonement
9472bce308
Run make to build the project (#6457)
2024-04-07 13:05:40 +02:00
Concedo
bec16d182b
Merge commit '2f34b865b6' into concedo_experimental
...
# Conflicts:
# .clang-tidy
# CMakeLists.txt
# Makefile
# ggml-cuda.cu
2024-04-07 18:30:35 +08:00
Neo Zhang Jianyu
d4f220a5cc
support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S, GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S, GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M (#6521)
2024-04-07 10:55:59 +08:00
Concedo
6166fdfde4
added support for OAI chat completions adapter file, added default stop sequences to prevent chat compl leakage
2024-04-07 10:35:20 +08:00
Georgi Gerganov
54ea0698fb
sync : ggml
2024-04-06 18:27:46 +03:00
Concedo
0061299cce
fixed quant tools not compiling, updated docs
2024-04-06 23:11:05 +08:00
Daniel Bevenius
b66aec675c
backend : fix typo in scheduler documentation (ggml/781)
...
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-06 17:42:26 +03:00
Clint Herron
57dd02c44b
Tests: Added integration tests for GBNF parser (#6472)
...
* Added integration tests for GBNF parser to validate correctness of parsing, as well as correctness of string matching. Intended for use to pin behavior while working on performance improvements.
* Fixing whitespace errors and cleaning error message alert to be clearer.
* Removing hacky include to llama.cpp from grammar integration test now that needed functions are available via internal API.
* Comment cleanup.
* Reorganizing tests for readability.
* Cleaning up debug message to make a bit more sense.
2024-04-06 10:31:33 -04:00
Concedo
273d48ad96
revert cuda pool impl (+1 squashed commits)
...
Squashed commits:
[5d5b5062] revert cuda pool impl
2024-04-06 22:02:00 +08:00
Concedo
79c8e87922
remove constraint for img dimension
2024-04-06 19:58:58 +08:00
Concedo
9c0fbf9f73
Merge commit 'ad3a0505e3' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/close-issue.yml
# .github/workflows/code-coverage.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/nix-ci-aarch64.yml
# .github/workflows/nix-ci.yml
# .github/workflows/python-check-requirements.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .github/workflows/zig-build.yml
# .gitignore
# CMakeLists.txt
# Makefile
# README-sycl.md
# README.md
# build.zig
# common/CMakeLists.txt
# llama.cpp
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
2024-04-06 18:32:57 +08:00
Concedo
c348223dff
Merge commit 'ccf58aa3ec' into concedo_experimental
...
# Conflicts:
# .gitignore
# Makefile
# README-sycl.md
# ggml-cuda.cu
2024-04-06 17:52:53 +08:00
Concedo
743687020d
fixed img2img
2024-04-06 17:29:44 +08:00
Pierrick Hymbert
75cd4c7729
ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495)
...
* ci: bench: support sse and fix prompt processing time
server: add tokens usage in stream mode
* ci: bench: README.md EOL
* ci: bench: remove total pp and tg as it is not accurate
* ci: bench: fix case when there is no token generated
* ci: bench: change to the 95 percentile for pp and tg as it is closer to what the server exports in metrics
* ci: bench: fix finish reason rate
2024-04-06 05:40:47 +02:00
Brian
a8bd14d557
gguf.py : add licence and version to gguf writer (#6504)
2024-04-05 21:41:38 +03:00
Hoang Nguyen
d0f5deebf8
readme : update UI list (#6503)
...
* Add MindMac to UI list
* Update proprietary description
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-04-05 21:39:43 +03:00
Ting Sun
87e21bbacd
bench : make n_batch and n_ubatch configurable in Batched bench (#6500)
...
* bench: make n_batch and n_ubatch configurable
* bench: update doc for batched bench
2024-04-05 21:34:53 +03:00
Ouadie EL FAROUKI
1b496a745c
[SYCL] Fixed minor bug when enabling FP16 for non-Intel targets (#6464)
...
* moved INTEL_MKL guard from gemm_impl to gemm (wrapper)
* Update ggml-sycl.cpp
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
---------
Co-authored-by: AidanBeltonS <87009434+AidanBeltonS@users.noreply.github.com>
2024-04-05 19:05:06 +05:30
alexpinel
a307375c02
readme : add Dot to UI list (#6487)
2024-04-04 13:22:50 -04:00
Jun Jie
b660a5729e
readme : fix typo (#6481)
2024-04-04 13:16:37 -04:00
Ed Lepedus
0a1d889e27
server: add cURL support to server Dockerfiles (#6474)
...
* server: add cURL support to `full.Dockerfile`
* server: add cURL support to `full-cuda.Dockerfile` and `server-cuda.Dockerfile`
* server: add cURL support to `full-rocm.Dockerfile` and `server-rocm.Dockerfile`
* server: add cURL support to `server-intel.Dockerfile`
* server: add cURL support to `server-vulkan.Dockerfile`
* fix typo in `server-vulkan.Dockerfile`
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-04 18:31:22 +02:00
Minsoo Cheong
7dda1b727e
ci: exempt master branch workflows from getting cancelled (#6486)
...
* ci: exempt master branch workflows from getting cancelled
* apply to bench.yml
2024-04-04 18:30:53 +02:00
Ewout ter Hoeven
c666ba26c3
build CI: Name artifacts (#6482)
...
Name the artifacts in the build CI, so that they get uploaded under separate names instead of all being put into the same `artifact` ZIP.
It might be possible to further simplify the packing step (in future PRs).
2024-04-04 17:08:55 +02:00
Shakhar Dasgupta
2e66913e5f
server: allow penalizing repetition of newlines on server webpage (#6431)
2024-04-04 17:03:00 +02:00
Pierrick Hymbert
8120efee1d
ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6478)
2024-04-04 16:59:04 +02:00
limitedAtonement
a74401f0e5
Correct README link (#6458)
...
README is called README.md.
2024-04-04 16:30:02 +02:00
Pierrick Hymbert
7a2c92637a
ci: bench: add more ftype, fix triggers and bot comment (#6466)
...
* ci: bench: change trigger path to not spawn on each PR
* ci: bench: add more file type for phi-2: q8_0 and f16.
- do not show the comment by default
* ci: bench: add seed parameter in k6 script
* ci: bench: artefact name perf job
* Add iteration in the commit status, reduce again the autocomment
* ci: bench: add per slot metric in the commit status
* Fix trailing spaces
2024-04-04 12:57:58 +03:00