Commit graph

1272 commits

Author SHA1 Message Date
Concedo
e692a79aab Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/docker.yml
#	CMakeLists.txt
#	CONTRIBUTING.md
#	docs/android.md
#	docs/docker.md
#	examples/embedding/embedding.cpp
#	examples/imatrix/imatrix.cpp
#	examples/infill/infill.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/main/README.md
#	examples/parallel/parallel.cpp
#	examples/perplexity/perplexity.cpp
#	examples/quantize-stats/quantize-stats.cpp
#	examples/save-load-state/save-load-state.cpp
#	examples/server/README.md
#	examples/simple/CMakeLists.txt
#	examples/speculative/speculative.cpp
#	flake.lock
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-blas.cpp
#	pocs/vdot/q8dot.cpp
#	pocs/vdot/vdot.cpp
#	scripts/debug-test.sh
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-quantize-fns.cpp
#	tests/test-quantize-perf.cpp
#	tests/test-tokenizer-0.cpp
#	tests/test-tokenizer-1-bpe.cpp
#	tests/test-tokenizer-1-spm.cpp
2024-10-11 11:59:59 +08:00
Diego Devesa
7eee341bee
common : use common_ prefix for common library functions (#9805)
* common : use common_ prefix for common library functions

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-10-10 22:57:42 +02:00
Diego Devesa
0e9f760eb1
rpc : add backend registry / device interfaces (#9812)
* rpc : add backend registry / device interfaces

* llama : add llama_supports_rpc API

* ggml_backend_rpc_start_rpc_server -> ggml_backend_rpc_start_server
2024-10-10 20:14:55 +02:00
Diego Devesa
c7499c557c
examples : do not use common library in simple example (#9803)
* examples : do not use common library in simple example

* add command line parser, simplify code
2024-10-10 19:50:49 +02:00
Diego Devesa
c81f3bbb05
cmake : do not build common library by default when standalone (#9804) 2024-10-09 18:49:52 +02:00
Georgi Gerganov
e7022064ab
perplexity : fix integer overflow (#9783)
* perplexity : fix integer overflow

ggml-ci

* perplexity : keep n_vocab as int and make appropriate casts

ggml-ci
2024-10-09 17:00:18 +03:00
Georgi Gerganov
3dc48fe75a
examples : remove llama.vim
An updated version will be added in #9787
2024-10-09 10:55:42 +03:00
Diego Devesa
dca1d4b58a
ggml : fix BLAS with unsupported types (#9775)
* ggml : do not use BLAS with types without to_float

* ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies

* ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits

it's not really internal if everybody uses it
2024-10-08 14:21:43 +02:00
Xuan Son Nguyen
458367a906
server : better security control for public deployments (#9776)
* server : more explicit endpoint access settings

* protect /props endpoint

* fix tests

* update server docs

* fix typo

* fix tests
2024-10-08 13:27:04 +02:00
Georgi Gerganov
f4b2dcdf49
readme : fix typo [no ci] 2024-10-06 13:49:41 +03:00
Concedo
da6cf261a8 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/close-issue.yml
#	.github/workflows/nix-ci-aarch64.yml
#	.github/workflows/nix-ci.yml
#	README.md
#	ci/run.sh
#	examples/server/README.md
#	ggml/src/ggml-cuda.cu
#	ggml/src/ggml-metal.m
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
2024-10-05 22:24:08 +08:00
Concedo
3e1cbedbae Merge commit 'c83ad6d01e' into concedo_experimental
# Conflicts:
#	.github/workflows/bench.yml.disabled
#	Makefile
#	Package.swift
#	README.md
#	docs/backend/SYCL.md
#	examples/CMakeLists.txt
#	examples/benchmark/benchmark-matmult.cpp
#	ggml/src/CMakeLists.txt
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.sh
#	src/llama.cpp
#	tests/test-backend-ops.cpp
2024-10-05 22:17:33 +08:00
Georgi Gerganov
8c475b97b8
rerank : use [SEP] token instead of [BOS] (#9737)
* rerank : use [SEP] token instead of [BOS]

ggml-ci

* common : sanity check for non-NULL tokens

ggml-ci

* ci : adjust rank score interval

ggml-ci

* ci : add shebang to run.sh

ggml-ci
2024-10-05 15:55:04 +03:00
Daniel Kleine
133c7b46b3
Fixed RNG seed docs (#9723)
* Update README.md

fixed RNG seed info

* changed print format to unsigned
2024-10-04 10:54:44 +02:00
Radoslav Gerganov
841713e1e4
rpc : enable vulkan (#9714)
closes #8536
2024-10-03 13:00:52 +03:00
Zhenwei Jin
76b37d1541
gguf-split : improve --split and --merge logic (#9619)
* make sure params --split and --merge are not specified at same time

* update gguf-split params parse logic

* Update examples/gguf-split/gguf-split.cpp

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-10-02 10:21:57 +03:00
Georgi Gerganov
148844fe97
examples : remove benchmark (#9704)
ggml-ci
2024-10-02 10:14:44 +03:00
Concedo
ce7f9c9a2c Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/full-rocm.Dockerfile
#	.devops/llama-cli-rocm.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.github/workflows/build.yml
#	.github/workflows/python-type-check.yml
#	CMakeLists.txt
#	CONTRIBUTING.md
#	README.md
#	ci/run.sh
#	examples/embedding/embedding.cpp
#	examples/server/README.md
#	flake.lock
#	ggml/include/ggml.h
#	ggml/src/ggml.c
#	requirements/requirements-convert_legacy_llama.txt
#	scripts/sync-ggml.last
#	src/llama-vocab.cpp
#	src/llama.cpp
#	tests/test-backend-ops.cpp
#	tests/test-grad0.cpp
#	tests/test-tokenizer-0.cpp
2024-10-02 01:00:57 +08:00
Georgi Gerganov
cad341d889
metal : reduce command encoding overhead (#9698)
* metal : reduce command encoding overhead

ggml-ci

* metal : add comments
2024-10-01 16:00:25 +03:00
compilade
511636df0c
ci : reduce severity of unused Pyright ignore comments (#9697) 2024-09-30 14:13:16 -04:00
vb
08a43d05b6
py : update transfomers version (#9694)
* update transfomers version.

* update hfh version.
2024-09-30 18:03:47 +03:00
Georgi Gerganov
f4d2b8846a
llama : add reranking support (#9510)
* py : add XLMRobertaForSequenceClassification [no ci]

* py : fix scalar-tensor conversion [no ci]

* py : fix position embeddings chop [no ci]

* llama : read new cls tensors [no ci]

* llama : add classigication head (wip) [no ci]

* llama : add "rank" pooling type

ggml-ci

* server : add rerank endpoint

ggml-ci

* llama : aboud ggml_repeat during classification

* rerank : cleanup + comments

* server : accept /rerank endpoint in addition to /v1/rerank [no ci]

* embedding : parse special tokens

* jina : support v1 reranker

* vocab : minor style

ggml-ci

* server : initiate tests for later

ggml-ci

* server : add docs

* llama : add comment [no ci]

* llama : fix uninitialized tensors

* ci : add rerank tests

ggml-ci

* add reranking test

* change test data

* Update examples/server/server.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* add `--reranking` argument

* update server docs

* llama : fix comment [no ci]

ggml-ci

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-09-28 17:42:03 +03:00
Zhenwei Jin
6102037bbb
vocab : refactor tokenizer to reduce init overhead (#9449)
* refactor tokenizer

* llama : make llm_tokenizer more private

ggml-ci

* refactor tokenizer

* refactor tokenizer

* llama : make llm_tokenizer more private

ggml-ci

* remove unused files

* remove unused fileds to avoid unused filed build error

* avoid symbol link error

* Update src/llama.cpp

* Update src/llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-28 15:10:58 +03:00
Concedo
ea55f69dc1 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.dockerignore
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	Makefile
#	README.md
#	examples/infill/infill.cpp
#	examples/perplexity/perplexity.cpp
#	examples/server/README.md
#	examples/speculative/speculative.cpp
#	flake.lock
#	ggml/src/CMakeLists.txt
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
#	tests/test-sampling.cpp
2024-09-27 11:21:28 +08:00
Xuan Son Nguyen
afbbfaa537
server : add more env vars, improve gen-docs (#9635)
* server : add more env vars, improve gen-docs

* update server docs

* LLAMA_ARG_NO_CONTEXT_SHIFT
2024-09-25 14:05:13 +02:00
Georgi Gerganov
cea1486ecf
log : add CONT level for continuing previous log entry (#9610) 2024-09-24 10:15:35 +03:00
StrangeBytesDev
0aa15011e3
server : add newline after chat example (#9616) 2024-09-24 09:04:39 +03:00
Georgi Gerganov
b0f27361f3
sampling : avoid expensive softmax during greedy sampling (#9605)
* sampling : avoid expensive softmax during greedy sampling

ggml-ci

* speculative : fix default RNG seed + set sparams.n_probs

* Update tests/test-sampling.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* sampling : add clarifying comment [no ci]

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-09-24 09:03:17 +03:00
Xuan Son Nguyen
0b3bf966f4
server : add --no-context-shift option (#9607)
* server : add --no-context-shift option

* small fix

* Update examples/server/tests/features/embeddings.feature

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* tests : minor fix

* revert usage of GGML_ASSERT

* update server documentation

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-23 22:23:54 +02:00
Georgi Gerganov
37f8c7b4c9
perplexity : remove extra new lines after chunks (#9596) 2024-09-23 11:28:02 +03:00
Concedo
cd1a52a29e Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
#	tests/test-grad0.cpp
2024-09-21 11:23:54 +08:00
slaren
63351143b2
quantize : improve type name parsing (#9570)
quantize : do not ignore invalid types in arg parsing

quantize : ignore case of type and ftype arguments
2024-09-20 20:55:36 +02:00
Concedo
55a249d222 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/imatrix/imatrix.cpp
#	examples/infill/infill.cpp
#	examples/perplexity/perplexity.cpp
2024-09-20 18:03:45 +08:00
Georgi Gerganov
d39e26741f
examples : flush log upon ctrl+c (#9559) 2024-09-20 11:46:56 +03:00
Sigbjørn Skjæret
722ec1eb51
perplexity : do not escape input data by default (#9548) 2024-09-20 09:38:10 +03:00
Georgi Gerganov
6026da52d6
server : clean-up completed tasks from waiting list (#9531)
ggml-ci
2024-09-19 12:44:53 +03:00
Sigbjørn Skjæret
eca0fab44e
imatrix : disable prompt escape by default (#9543) 2024-09-19 10:58:14 +03:00
Concedo
29625c3d2e Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	Makefile
#	README.md
#	ci/run.sh
#	common/CMakeLists.txt
#	common/common.cpp
#	docs/backend/SYCL.md
#	examples/embedding/embedding.cpp
#	examples/imatrix/imatrix.cpp
#	examples/infill/infill.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/main/README.md
#	examples/parallel/parallel.cpp
#	examples/perplexity/perplexity.cpp
#	examples/server/CMakeLists.txt
#	examples/server/README.md
#	examples/server/bench/README.md
#	examples/server/tests/README.md
#	examples/speculative/speculative.cpp
#	flake.lock
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	grammars/README.md
#	scripts/compare-commits.sh
#	scripts/compare-llama-bench.py
#	tests/CMakeLists.txt
2024-09-19 14:53:57 +08:00
Vinesh Janarthanan
8a308354f6
server : match OAI structured output response (#9527) 2024-09-18 09:50:34 +03:00
Eric Zhang
f799155ab8
server : fix OpenSSL build (remove obsolete LOG_INFO) (#9529) 2024-09-18 09:28:20 +03:00
Neo Zhang Jianyu
faf67b3de4
[SYCL]set context default value to avoid memory issue, update guide (#9476)
* set context default to avoid memory issue, update guide

* Update docs/backend/SYCL.md

Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>

---------

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
2024-09-18 08:30:31 +08:00
Michael Podvitskiy
7be099fa81
llama-bench: correct argument parsing error message (#9524) 2024-09-17 22:41:38 +02:00
Bert Wagner
8b836ae731
arg : add env variable for parallel (#9513)
* add env variable for parallel

* Update README.md with env:  LLAMA_ARG_N_PARALLEL
2024-09-17 16:35:38 +03:00
Vinesh Janarthanan
441b72b91f
main : option to disable context shift (#9484)
* added cli arg to disable context shift

* reverted precommit

* updated README.md for main

* white space

* allow disabling context shift in the server

* Update common/arg.cpp

no-context-shift only works for main example

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* added server example to --no-context-shift args

* removed server changes

* white space

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-16 09:20:01 +03:00
Georgi Gerganov
6262d13e0b
common : reimplement logging (#9418)
https://github.com/ggerganov/llama.cpp/pull/9418
2024-09-15 20:46:12 +03:00
slaren
e6deac31f7
gguf-split : add basic checks (#9499)
* gguf-split : do not overwrite existing files when merging

* gguf-split : error when too many arguments are passed
2024-09-15 19:02:27 +02:00
Concedo
ab41e324d6 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	Makefile
#	README.md
#	examples/server/CMakeLists.txt
#	ggml/src/CMakeLists.txt
2024-09-15 19:28:05 +08:00
VoidIsVoid
dcdcee3a74
server: add data: [DONE] to /chat/completions stream response (#9459) 2024-09-14 11:36:44 +02:00
Xuan Son Nguyen
feff4aa846
server : add loading html page while model is loading (#9468)
* Adding loading page for '/' server requests

* set content when model is loading

* removed loading html file

* updated cmakelist

* updated makefile

* cleaned up whitespace

* cleanup for PR removed error

* updated server test to handle 503 HTML

* updated server test to handle 503 HTML

* ca†ch 503 before parsing json

* revert test

* account for both api and web browser requests

* precommit corrections

* eol fix

* revert changes to pre-commit

* removed print statement

* made loading message more descriptive

* also support .html files

---------

Co-authored-by: VJHack <flymyplane21@gmail.com>
Co-authored-by: Vinesh Janarthanan <36610342+VJHack@users.noreply.github.com>
2024-09-13 14:23:11 +02:00
Concedo
e44ddf26ef Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	Makefile
#	examples/embedding/embedding.cpp
#	examples/imatrix/imatrix.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/llava/MobileVLM-README.md
#	examples/parallel/parallel.cpp
#	examples/perplexity/perplexity.cpp
#	examples/quantize/CMakeLists.txt
#	examples/server/README.md
#	examples/speculative/speculative.cpp
#	tests/test-backend-ops.cpp
2024-09-13 16:17:24 +08:00