Concedo
bdfe8526b8
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .gitignore
# CONTRIBUTING.md
# Makefile
# examples/llava/CMakeLists.txt
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# src/llama-vocab.cpp
2024-08-10 11:42:32 +08:00
Mathieu Geli
daef3ab233
server : add one level list nesting for embeddings ( #8936 )
2024-08-09 09:32:02 +03:00
Xuan Son Nguyen
1e6f6554aa
server : add lora hotswap endpoint (WIP) ( #8857 )
...
* server : add lora hotswap endpoint
* handle lora_no_apply
* fix build
* updae docs
* clean up struct def
* fix build
* add LoRA test
* fix style
2024-08-06 17:33:39 +02:00
Concedo
e1f97f7fb5
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/llama-server.Dockerfile
# README.md
# flake.lock
# ggml/src/ggml-vulkan.cpp
# ggml/src/vulkan-shaders/concat.comp
# ggml/src/vulkan-shaders/pad.comp
# ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-backend-ops.cpp
2024-08-06 16:33:26 +08:00
Liu Jia
0a4ce78681
common : Changed tuple to struct (TODO fix) ( #8823 )
...
* common : Changed tuple to struct (TODO fix)
Use struct `llama_init_result` to replace the previous
std::tuple<struct llama_model *, struct llama_context *>
* delete llama_init_default_params()
* delete the extra whitespace
2024-08-05 18:14:10 +02:00
ardfork
978ba3d83d
Server: Don't ignore llama.cpp params ( #8754 )
...
* Don't ignore llama.cpp params
* Add fallback for max_tokens
2024-08-04 20:16:23 +02:00
Concedo
101efb66af
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# CMakeLists.txt
# Makefile
2024-08-01 10:54:28 +08:00
Igor Okulist
afbbcf3c04
server : update llama-server embedding flag documentation ( #8779 )
...
Fixes #8763
2024-07-31 19:59:09 -04:00
Concedo
ba5babb876
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/apps.nix
# .devops/tools.sh
# Makefile
# README.md
# docs/backend/SYCL.md
# docs/build.md
# examples/CMakeLists.txt
# ggml/include/ggml.h
# src/llama-vocab.cpp
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-sampling.cpp
2024-07-27 23:15:54 +08:00
Yaiko
01aec4a631
server : add Speech Recognition & Synthesis to UI ( #8679 )
...
* server : add Speech Recognition & Synthesis to UI
* server : add Speech Recognition & Synthesis to UI (fixes)
2024-07-26 00:10:16 +02:00
Ujjawal Panchal
4b0eff3df5
docs : Quantum -> Quantized ( #8666 )
...
* docfix: imatrix readme, quantum models -> quantized models.
* docfix: server readme: quantum models -> quantized models.
2024-07-25 11:13:27 +03:00
Concedo
01d5175654
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# Makefile
# ggml/src/CMakeLists.txt
2024-07-24 16:41:33 +08:00
Vali Malinoiu
b841d07408
server : fix URL.parse in the UI ( #8646 )
2024-07-23 17:37:42 +03:00
Concedo
c81d1623b4
Merge commit ' 751fcfc6c3
' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CONTRIBUTING.md
# README.md
# flake.lock
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
2024-07-23 19:18:05 +08:00
Jan Boon
628154492a
server : update doc to clarify n_keep when there is bos token ( #8619 )
2024-07-22 11:02:09 +03:00
Concedo
24b9616344
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/full.Dockerfile
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-cli-intel.Dockerfile
# .devops/llama-cli-rocm.Dockerfile
# .devops/llama-cli-vulkan.Dockerfile
# .devops/llama-cli.Dockerfile
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-intel.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .devops/llama-server.Dockerfile
# CMakeLists.txt
# CONTRIBUTING.md
# Makefile
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# requirements.txt
# src/llama.cpp
# tests/test-backend-ops.cpp
2024-07-19 14:23:33 +08:00
Eric Zhang
0d2c7321e9
server: use relative routes for static files in new UI ( #8552 )
...
* server: public: fix api_url on non-index pages
* server: public: use relative routes for static files in new UI
2024-07-18 12:43:49 +02:00
RunningLeon
3807c3de04
server : respect --special
cli arg ( #8553 )
2024-07-18 11:06:22 +03:00
Xuan Son Nguyen
4db8f60fe7
fix ci ( #8494 )
2024-07-15 19:23:10 +02:00
Concedo
e707ab9025
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# docs/development/HOWTO-add-model.md
# docs/development/token_generation_performance_tips.md
# flake.lock
2024-07-16 00:49:34 +08:00
M-A
f17f39ff9c
server: update README.md with llama-server --help output [no ci] ( #8472 )
...
The README.md had a stale information. In particular, the --ctx-size
"defaults to 512" confused me and I had to check the code to confirm
this was false. This the server is evolving rapidly, it's probably
better to keep the source of truth at a single place (in the source) and
generate the README.md based on that.
Did:
make llama-server
./llama-server --help > t.txt
vimdiff t.txt examples/server/README.md
I copied the content inside a backquote block. I would have preferred
proper text but it would require a fair amount of surgery to make the
current output compatible with markdown. A follow up could be to
automate this process with a script.
No functional change.
2024-07-15 15:04:56 +03:00
Concedo
602661ba49
Merge commit ' c917b67f06
' into concedo_experimental
...
# Conflicts:
# .devops/tools.sh
# Makefile
# ggml/src/ggml-cuda/mmq.cuh
# tests/test-double-float.cpp
# tests/test-quantize-fns.cpp
# tests/test-quantize-perf.cpp
2024-07-14 11:38:20 +08:00
Georgi Gerganov
4e24cffd8c
server : handle content array in chat API ( #8449 )
...
* server : handle content array in chat API
* Update examples/server/utils.hpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-07-12 14:48:15 +03:00
Douglas Hanley
c3ebcfa148
server : ensure batches are either all embed or all completion ( #8420 )
...
* make sure batches are all embed or all non-embed
* non-embedding batch for sampled tokens; fix unused params warning
2024-07-12 11:14:12 +03:00
Concedo
2cad736260
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# .github/labeler.yml
# .gitignore
# CMakeLists.txt
# Makefile
# Package.swift
# README.md
# ci/run.sh
# docs/build.md
# examples/CMakeLists.txt
# flake.lock
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# grammars/README.md
# requirements/requirements-convert_hf_to_gguf.txt
# requirements/requirements-convert_hf_to_gguf_update.txt
# scripts/check-requirements.sh
# scripts/compare-llama-bench.py
# scripts/gen-unicode-data.py
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-tokenizer-random.py
2024-07-11 16:36:16 +08:00
Clint Herron
278d0e1846
Initialize default slot sampling parameters from the global context. ( #8418 )
2024-07-10 20:08:17 -04:00
Clint Herron
a59f8fdc85
Server: Enable setting default sampling parameters via command-line ( #8402 )
...
* Load server sampling parameters from the server context by default.
* Wordsmithing comment
2024-07-09 18:26:40 -04:00
compilade
3fd62a6b1c
py : type-check all Python scripts with Pyright ( #8341 )
...
* py : type-check all Python scripts with Pyright
* server-tests : use trailing slash in openai base_url
* server-tests : add more type annotations
* server-tests : strip "chat" from base_url in oai_chat_completions
* server-tests : model metadata is a dict
* ci : disable pip cache in type-check workflow
The cache is not shared between branches, and it's 250MB in size,
so it would become quite a big part of the 10GB cache limit of the repo.
* py : fix new type errors from master branch
* tests : fix test-tokenizer-random.py
Apparently, gcc applies optimisations even when pre-processing,
which confuses pycparser.
* ci : only show warnings and errors in python type-check
The "information" level otherwise has entries
from 'examples/pydantic_models_to_grammar.py',
which could be confusing for someone trying to figure out what failed,
considering that these messages can safely be ignored
even though they look like errors.
2024-07-07 15:04:39 -04:00
Bjarke Viksøe
cb4d86c4d7
server: Retrieve prompt template in /props ( #8337 )
...
* server: Retrieve prompt template in /props
This PR adds the following:
- Expose the model's Jinja2 prompt template from the model in the /props endpoint.
- Change log-level from Error to Warning for warning about template mismatch.
The front-end stands a better chance of actually executing the Jinja template format correctly. Server is currently just guessing it.
Ideally this should have been inside a JSON block that expose the same key/value pairs as listed during startup in "llm_load_print_meta" function.
* Make string buffer dynamic
* Add doc and better string handling
* Using chat_template naming convention
* Use intermediate vector for string assignment
2024-07-07 11:10:38 +02:00
Concedo
5b605d03ea
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/config.yml
# .gitignore
# CMakeLists.txt
# CONTRIBUTING.md
# Makefile
# README.md
# ci/run.sh
# common/common.h
# examples/main-cmake-pkg/CMakeLists.txt
# ggml/src/CMakeLists.txt
# models/ggml-vocab-bert-bge.gguf.inp
# models/ggml-vocab-bert-bge.gguf.out
# models/ggml-vocab-deepseek-coder.gguf.inp
# models/ggml-vocab-deepseek-coder.gguf.out
# models/ggml-vocab-deepseek-llm.gguf.inp
# models/ggml-vocab-deepseek-llm.gguf.out
# models/ggml-vocab-falcon.gguf.inp
# models/ggml-vocab-falcon.gguf.out
# models/ggml-vocab-gpt-2.gguf.inp
# models/ggml-vocab-gpt-2.gguf.out
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# models/ggml-vocab-llama-spm.gguf.inp
# models/ggml-vocab-llama-spm.gguf.out
# models/ggml-vocab-mpt.gguf.inp
# models/ggml-vocab-mpt.gguf.out
# models/ggml-vocab-phi-3.gguf.inp
# models/ggml-vocab-phi-3.gguf.out
# models/ggml-vocab-starcoder.gguf.inp
# models/ggml-vocab-starcoder.gguf.out
# requirements.txt
# requirements/requirements-convert_legacy_llama.txt
# scripts/check-requirements.sh
# scripts/pod-llama.sh
# src/CMakeLists.txt
# src/llama.cpp
# tests/test-rope.cpp
2024-07-06 00:25:10 +08:00
Pieter Ouwerkerk
5a7447c569
readme : fix minor typos [no ci] ( #8314 )
2024-07-05 09:58:41 +03:00
Clint Herron
07a3fc0608
Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. ( #8258 )
2024-07-02 12:18:10 -04:00
Concedo
02f92f6ecc
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-cli-rocm.Dockerfile
# .devops/llama-cli-vulkan.Dockerfile
# .devops/llama-cpp-cuda.srpm.spec
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .github/workflows/build.yml
# .github/workflows/docker.yml
# CMakeLists.txt
# Makefile
# README.md
# examples/llama.android/llama/src/main/cpp/CMakeLists.txt
# flake.lock
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# grammars/README.md
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
2024-06-30 10:59:42 +08:00
Concedo
9c10486204
merge the file structure refactor, testing
2024-06-29 12:14:38 +08:00
Sigbjørn Skjæret
38373cfbab
Add SPM infill support ( #8016 )
...
* add --spm-infill option
* support --spm-infill
* support --spm-infill
2024-06-28 12:53:43 +02:00
Olivier Chafik
139cc621e9
json
: restore default additionalProperties to false, fix some pattern escapes (#8180 )
...
* json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset
* json: revert default of additionalProperties to false
* Update README.md
2024-06-28 09:26:45 +01:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake ( #8006 )
...
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122 )
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
Concedo
f3dfa96dbc
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .devops/llama-server.Dockerfile
# .github/workflows/docker.yml
# README.md
# llama.cpp
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-llama-grammar.cpp
2024-06-26 18:59:10 +08:00
Olivier Chafik
9b2f16f805
json
: better support for "type" unions (e.g. nullable arrays w/ typed items) (#7863 )
...
* json: better suport for "type" arrays (e.g. `{"type": ["array", "null"], "items": {"type": "string"}}`)
* json: add test for type: [array, null] fix
* update tests
2024-06-26 01:46:35 +01:00
Olivier Chafik
6777c544bd
json
: fix additionalProperties, allow space after enum/const (#7840 )
...
* json: default additionalProperty to true
* json: don't force additional props after normal properties!
* json: allow space after enum/const
* json: update pydantic example to set additionalProperties: false
* json: prevent additional props to redefine a typed prop
* port not_strings to python, add trailing space
* fix not_strings & port to js+py
* Update json-schema-to-grammar.cpp
* fix _not_strings for substring overlaps
* json: fix additionalProperties default, uncomment tests
* json: add integ. test case for additionalProperties
* json: nit: simplify condition
* reformat grammar integ tests w/ R"""()""" strings where there's escapes
* update # tokens in server test: consts can now have trailing space
2024-06-26 01:45:58 +01:00
Olivier Chafik
84631fe150
json
: support integer minimum, maximum, exclusiveMinimum, exclusiveMaximum (#7797 )
...
* json: support minimum for positive integer values
* json: fix min 0
* json: min + max integer constraints
* json: handle negative min / max integer bounds
* json: fix missing paren min/max bug
* json: proper paren fix
* json: integration test for schemas
* json: fix bounds tests
* Update json-schema-to-grammar.cpp
* json: fix negative max
* json: fix negative min (w/ more than 1 digit)
* Update test-grammar-integration.cpp
* json: nit: move string rules together
* json: port min/max integer support to Python & JS
* nit: move + rename _build_min_max_int
* fix min in [1, 9]
* Update test-grammar-integration.cpp
* add C++11-compatible replacement for std::string_view
* add min/max constrained int field to pydantic json schema example
* fix merge
* json: add integration tests for min/max bounds
* reshuffle/merge min/max integ test cases
* nits / cleanups
* defensive code against string out of bounds (apparently different behaviour of libstdc++ vs. clang's libc++, can't read final NULL char w/ former)
2024-06-25 20:06:20 +01:00
Xuan Son Nguyen
48e6b92cc3
Add chat template support for llama-cli ( #8068 )
...
* add chat template support for llama-cli
* add help message
* server: simplify format_chat
* more consistent naming
* improve
* add llama_chat_format_example
* fix server
* code style
* code style
* Update examples/main/main.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-25 21:56:49 +10:00
HanishKVC
3791ad2193
SimpleChat v3.1: Boolean chat request options in Settings UI, cache_prompt ( #7950 )
...
* SimpleChat: Allow for chat req bool options to be user controlled
* SimpleChat: Allow user to control cache_prompt flag in request
* SimpleChat: Add sample GUI images to readme file
Show the chat screen and the settings screen
* SimpleChat:Readme: Add quickstart block, title to image, cleanup
* SimpleChat: RePosition contents of the Info and Settings UI
Make it more logically structured and flow through.
* SimpleChat: Rename to apiRequestOptions from chatRequestOptions
So that it is not wrongly assumed that these request options are
used only for chat/completions endpoint. Rather these are used
for both the end points, so rename to match semantic better.
* SimpleChat: Update image included with readme wrt settings ui
* SimpleChat:ReadMe: Switch to webp screen image to reduce size
2024-06-25 21:27:35 +10:00
Concedo
12dfb92436
Merge commit ' d62e4aaa02
' into concedo_experimental
...
# Conflicts:
# .github/workflows/docker.yml
# .github/workflows/server.yml
# CMakeLists.txt
# Makefile
# common/common.cpp
# ggml.c
# tests/test-backend-ops.cpp
# tests/test-grammar-integration.cpp
2024-06-25 18:27:12 +08:00
Aarni Koskela
6a2f298bd7
server : fix JSON-Scheme typo ( #7975 )
2024-06-23 11:03:08 -04:00
Concedo
92afdfcae4
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# .github/workflows/server.yml
# .gitignore
# CMakeLists.txt
# Makefile
# README-sycl.md
# README.md
# llama.cpp
# requirements/requirements-convert-hf-to-gguf-update.txt
# requirements/requirements-convert-hf-to-gguf.txt
# requirements/requirements-convert-legacy-llama.txt
# scripts/sync-ggml.last
# tests/test-tokenizer-random.py
2024-06-22 01:33:44 +08:00
sasha0552
ba58993152
server : fix smart slot selection ( #8020 )
2024-06-20 09:57:10 +10:00
Sigbjørn Skjæret
91c188d6c2
Only use FIM middle token if it exists ( #7648 )
...
* Only use FIM middle if it exists
* Only use FIM middle if it exists
2024-06-18 22:19:45 +10:00
Concedo
b53e760557
Merge commit ' 1c641e6aac
' into concedo_experimental
...
# Conflicts:
# .devops/cloud-v-pipeline
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-cli-rocm.Dockerfile
# .devops/llama-cli-vulkan.Dockerfile
# .devops/llama-cli.Dockerfile
# .devops/llama-cpp-clblast.srpm.spec
# .devops/llama-cpp-cuda.srpm.spec
# .devops/llama-cpp.srpm.spec
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .devops/llama-server.Dockerfile
# .devops/nix/apps.nix
# .devops/nix/package.nix
# .devops/tools.sh
# .dockerignore
# .github/ISSUE_TEMPLATE/01-bug-low.yml
# .github/ISSUE_TEMPLATE/02-bug-medium.yml
# .github/ISSUE_TEMPLATE/03-bug-high.yml
# .github/ISSUE_TEMPLATE/04-bug-critical.yml
# .github/workflows/bench.yml
# .github/workflows/build.yml
# .github/workflows/docker.yml
# .github/workflows/server.yml
# .gitignore
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# docs/token_generation_performance_tips.md
# flake.nix
# grammars/README.md
# pocs/vdot/CMakeLists.txt
# scripts/get-hellaswag.sh
# scripts/get-wikitext-103.sh
# scripts/get-wikitext-2.sh
# scripts/get-winogrande.sh
# scripts/hf.sh
# scripts/pod-llama.sh
# scripts/qnt-all.sh
# scripts/run-all-ppl.sh
# scripts/run-with-preset.py
# scripts/server-llm.sh
# tests/test-backend-ops.cpp
2024-06-14 18:41:37 +08:00
Concedo
a8db72eca0
Merge commit ' ef52d1d16a
' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/server.yml
# CMakeLists.txt
# README.md
# flake.lock
# grammars/README.md
# grammars/json.gbnf
# grammars/json_arr.gbnf
# tests/test-json-schema-to-grammar.cpp
2024-06-13 18:26:45 +08:00