Commit graph

216 commits

Author SHA1 Message Date
Concedo
e1f97f7fb5 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/llama-server.Dockerfile
#	README.md
#	flake.lock
#	ggml/src/ggml-vulkan.cpp
#	ggml/src/vulkan-shaders/concat.comp
#	ggml/src/vulkan-shaders/pad.comp
#	ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-backend-ops.cpp
2024-08-06 16:33:26 +08:00
Liu Jia
0a4ce78681
common : Changed tuple to struct (TODO fix) (#8823)
* common : Changed tuple to struct (TODO fix)

Use struct `llama_init_result` to replace the previous
std::tuple<struct llama_model *, struct llama_context *>

* delete llama_init_default_params()

* delete the extra whitespace
2024-08-05 18:14:10 +02:00
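A minimal sketch of the shape of the change in the commit above. `llama_init_result` is named in the commit text; the function name and caller code are assumptions for illustration only.

    // Before: callers unpacked an anonymous tuple.
    // std::tuple<struct llama_model *, struct llama_context *> llama_init_from_gpt_params(...);

    // After: a named struct makes the two return values self-documenting.
    struct llama_init_result {
        struct llama_model   * model   = nullptr;
        struct llama_context * context = nullptr;
    };

    // Hypothetical caller, for illustration:
    // llama_init_result res = llama_init_from_gpt_params(params);
    // if (res.model == nullptr || res.context == nullptr) { /* handle init failure */ }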
Concedo
cca2fa9a6c Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/llama-cli-intel.Dockerfile
#	.devops/llama-server-intel.Dockerfile
#	README.md
#	ggml/src/CMakeLists.txt
#	tests/test-chat-template.cpp
2024-07-24 21:57:50 +08:00
Xuan Son Nguyen
96952e7181
llama : fix llama_chat_format_single for mistral (#8657)
* fix `llama_chat_format_single` for mistral

* fix typo

* use printf
2024-07-24 13:48:46 +02:00
Concedo
602661ba49 Merge commit 'c917b67f06' into concedo_experimental
# Conflicts:
#	.devops/tools.sh
#	Makefile
#	ggml/src/ggml-cuda/mmq.cuh
#	tests/test-double-float.cpp
#	tests/test-quantize-fns.cpp
#	tests/test-quantize-perf.cpp
2024-07-14 11:38:20 +08:00
Georgi Gerganov
6af51c0d96
main : print error on empty input (#8456) 2024-07-12 14:48:04 +03:00
Concedo
2cad736260 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/nix/package.nix
#	.github/labeler.yml
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	Package.swift
#	README.md
#	ci/run.sh
#	docs/build.md
#	examples/CMakeLists.txt
#	flake.lock
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	grammars/README.md
#	requirements/requirements-convert_hf_to_gguf.txt
#	requirements/requirements-convert_hf_to_gguf_update.txt
#	scripts/check-requirements.sh
#	scripts/compare-llama-bench.py
#	scripts/gen-unicode-data.py
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	scripts/sync-ggml.sh
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-tokenizer-random.py
2024-07-11 16:36:16 +08:00
Denis Spasyuk
a8db2a9ce6
Update llama-cli documentation (#8315)
* Update README.md

* Update README.md

* Update README.md

fixed llama-cli/main references and templates in some commands, added chat template sections, and fixed typos in some areas


* Update README.md

* Update README.md

* Update README.md
2024-07-07 17:08:28 +02:00
Concedo
5b605d03ea Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/ISSUE_TEMPLATE/config.yml
#	.gitignore
#	CMakeLists.txt
#	CONTRIBUTING.md
#	Makefile
#	README.md
#	ci/run.sh
#	common/common.h
#	examples/main-cmake-pkg/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	models/ggml-vocab-bert-bge.gguf.inp
#	models/ggml-vocab-bert-bge.gguf.out
#	models/ggml-vocab-deepseek-coder.gguf.inp
#	models/ggml-vocab-deepseek-coder.gguf.out
#	models/ggml-vocab-deepseek-llm.gguf.inp
#	models/ggml-vocab-deepseek-llm.gguf.out
#	models/ggml-vocab-falcon.gguf.inp
#	models/ggml-vocab-falcon.gguf.out
#	models/ggml-vocab-gpt-2.gguf.inp
#	models/ggml-vocab-gpt-2.gguf.out
#	models/ggml-vocab-llama-bpe.gguf.inp
#	models/ggml-vocab-llama-bpe.gguf.out
#	models/ggml-vocab-llama-spm.gguf.inp
#	models/ggml-vocab-llama-spm.gguf.out
#	models/ggml-vocab-mpt.gguf.inp
#	models/ggml-vocab-mpt.gguf.out
#	models/ggml-vocab-phi-3.gguf.inp
#	models/ggml-vocab-phi-3.gguf.out
#	models/ggml-vocab-starcoder.gguf.inp
#	models/ggml-vocab-starcoder.gguf.out
#	requirements.txt
#	requirements/requirements-convert_legacy_llama.txt
#	scripts/check-requirements.sh
#	scripts/pod-llama.sh
#	src/CMakeLists.txt
#	src/llama.cpp
#	tests/test-rope.cpp
2024-07-06 00:25:10 +08:00
Xuan Son Nguyen
a38b884c6c
cli: add EOT when user hit Ctrl+C (#8296)
* main: add need_insert_eot

* do not format system prompt if it is empty
2024-07-04 20:55:03 +02:00
fairydreaming
807b0c49ff
Inference support for T5 and FLAN-T5 model families (#5763)
* llama : add inference support and model types for T5 and FLAN-T5 model families

* llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token()

* common, llama-cli, llama-batched : add support for encoder-decoder models

* convert-hf : handle shared token embeddings tensors in T5Model

* convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models)

* convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model

* convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-04 15:46:11 +02:00
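A rough sketch of how the new encoder-decoder entry points named above fit together. The function names come from the commit text; `model`, `ctx`, and `batch` are assumed to be in scope, and the signatures are assumptions to be checked against llama.h.

    // Encoder-decoder models (T5, FLAN-T5) run the encoder once over the prompt,
    // then start decoding from a dedicated decoder start token.
    if (llama_model_has_encoder(model)) {
        // run the encoder over the prompt batch (batch construction omitted)
        if (llama_encode(ctx, batch) != 0) {
            /* handle error */
        }

        // decoding begins from the model's decoder start token
        llama_token decoder_start = llama_model_decoder_start_token(model);
        if (decoder_start == -1) {
            decoder_start = llama_token_bos(model);   // assumed fallback to BOS when unset
        }
        // feed decoder_start to llama_decode() and generate as for decoder-only models
    }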
Concedo
0fc18d2d82 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/nix/package.nix
#	CMakePresets.json
#	README.md
#	flake.lock
#	ggml/src/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
2024-07-02 21:05:45 +08:00
Xuan Son Nguyen
9ef0780062
Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203)
* preserve new line llama_chat_format_single

* disable chat template if in-prefix/suffix is set

* remove redundant change
2024-06-30 20:27:13 +02:00
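For context, `llama_chat_format_single` formats only the incremental part of a conversation by a diff trick: apply the template without and with the new message, then keep the suffix. A simplified sketch of that idea (not the actual implementation; `apply_template` is an assumed helper), which is where a newline on the boundary can be lost — the issue this commit addresses.

    // The per-message formatting is the difference between the template applied
    // to (past) and to (past + new message).
    std::string fmt_past = apply_template(tmpl, past_msgs);            // assumed helper
    std::string fmt_all  = apply_template(tmpl, past_msgs, new_msg);   // assumed helper

    // The newly added part is everything after the previously formatted text;
    // care is needed not to drop a newline the template emits between messages.
    std::string formatted_new = fmt_all.substr(fmt_past.size());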
Concedo
02f92f6ecc Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/full-cuda.Dockerfile
#	.devops/full-rocm.Dockerfile
#	.devops/llama-cli-cuda.Dockerfile
#	.devops/llama-cli-rocm.Dockerfile
#	.devops/llama-cli-vulkan.Dockerfile
#	.devops/llama-cpp-cuda.srpm.spec
#	.devops/llama-server-cuda.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.devops/llama-server-vulkan.Dockerfile
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	CMakeLists.txt
#	Makefile
#	README.md
#	examples/llama.android/llama/src/main/cpp/CMakeLists.txt
#	flake.lock
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	grammars/README.md
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	tests/test-chat-template.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-json-schema-to-grammar.cpp
2024-06-30 10:59:42 +08:00
Concedo
9c10486204 merge the file structure refactor, testing 2024-06-29 12:14:38 +08:00
Xuan Son Nguyen
72272b83a3
fix code typo in llama-cli (#8198) 2024-06-29 00:14:20 +02:00
Concedo
f3dfa96dbc Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/llama-server-cuda.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.devops/llama-server-vulkan.Dockerfile
#	.devops/llama-server.Dockerfile
#	.github/workflows/docker.yml
#	README.md
#	llama.cpp
#	tests/test-chat-template.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tests/test-llama-grammar.cpp
2024-06-26 18:59:10 +08:00
Xuan Son Nguyen
48e6b92cc3
Add chat template support for llama-cli (#8068)
* add chat template support for llama-cli

* add help message

* server: simplify format_chat

* more consistent naming

* improve

* add llama_chat_format_example

* fix server

* code style

* code style

* Update examples/main/main.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-25 21:56:49 +10:00
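Chat-template handling in llama-cli ultimately goes through the `llama_chat_apply_template` API. A hedged usage sketch, assuming the llama.h signature of that period (buffer sizing simplified; `model` assumed in scope):

    std::vector<llama_chat_message> chat = {
        { "system", "You are a helpful assistant." },
        { "user",   "Hello!" },
    };
    std::vector<char> buf(4096);
    int32_t n = llama_chat_apply_template(model, nullptr /* use the model's built-in template */,
                                          chat.data(), chat.size(),
                                          true /* append assistant prefix */,
                                          buf.data(), buf.size());
    if (n > (int32_t) buf.size()) {
        buf.resize(n);   // returned length exceeds the buffer: grow and call again
    }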
Concedo
b53e760557 Merge commit '1c641e6aac' into concedo_experimental
# Conflicts:
#	.devops/cloud-v-pipeline
#	.devops/llama-cli-cuda.Dockerfile
#	.devops/llama-cli-rocm.Dockerfile
#	.devops/llama-cli-vulkan.Dockerfile
#	.devops/llama-cli.Dockerfile
#	.devops/llama-cpp-clblast.srpm.spec
#	.devops/llama-cpp-cuda.srpm.spec
#	.devops/llama-cpp.srpm.spec
#	.devops/llama-server-cuda.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.devops/llama-server-vulkan.Dockerfile
#	.devops/llama-server.Dockerfile
#	.devops/nix/apps.nix
#	.devops/nix/package.nix
#	.devops/tools.sh
#	.dockerignore
#	.github/ISSUE_TEMPLATE/01-bug-low.yml
#	.github/ISSUE_TEMPLATE/02-bug-medium.yml
#	.github/ISSUE_TEMPLATE/03-bug-high.yml
#	.github/ISSUE_TEMPLATE/04-bug-critical.yml
#	.github/workflows/bench.yml
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	.github/workflows/server.yml
#	.gitignore
#	Makefile
#	README-sycl.md
#	README.md
#	ci/run.sh
#	docs/token_generation_performance_tips.md
#	flake.nix
#	grammars/README.md
#	pocs/vdot/CMakeLists.txt
#	scripts/get-hellaswag.sh
#	scripts/get-wikitext-103.sh
#	scripts/get-wikitext-2.sh
#	scripts/get-winogrande.sh
#	scripts/hf.sh
#	scripts/pod-llama.sh
#	scripts/qnt-all.sh
#	scripts/run-all-ppl.sh
#	scripts/run-with-preset.py
#	scripts/server-llm.sh
#	tests/test-backend-ops.cpp
2024-06-14 18:41:37 +08:00
Olivier Chafik
1c641e6aac
build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809)
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew

* server: update refs -> llama-server

gitignore llama-server

* server: simplify nix package

* main: update refs -> llama

fix examples/main ref

* main/server: fix targets

* update more names

* Update build.yml

* rm accidentally checked in bins

* update straggling refs

* Update .gitignore

* Update server-llm.sh

* main: target name -> llama-cli

* Prefix all example bins w/ llama-

* fix main refs

* rename {main->llama}-cmake-pkg binary

* prefix more cmake targets w/ llama-

* add/fix gbnf-validator subfolder to cmake

* sort cmake example subdirs

* rm bin files

* fix llama-lookup-* Makefile rules

* gitignore /llama-*

* rename Dockerfiles

* rename llama|main -> llama-cli; consistent RPM bin prefixes

* fix some missing -cli suffixes

* rename dockerfile w/ llama-cli

* rename(make): llama-baby-llama

* update dockerfile refs

* more llama-cli(.exe)

* fix test-eval-callback

* rename: llama-cli-cmake-pkg(.exe)

* address gbnf-validator unused fread warning (switched to C++ / ifstream)

* add two missing llama- prefixes

* Updating docs for eval-callback binary to use new `llama-` prefix.

* Updating a few lingering doc references for rename of main to llama-cli

* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.

* Updating documentation references for lookup-merge and export-lora

* Updating two small `main` references missed earlier in the finetune docs.

* Update apps.nix

* update grammar/README.md w/ new llama-* names

* update llama-rpc-server bin name + doc

* Revert "update llama-rpc-server bin name + doc"

This reverts commit e474ef1df481fd8936cd7d098e3065d7de378930.

* add hot topic notice to README.md

* Update README.md

* Update README.md

* rename gguf-split & quantize bins refs in **/tests.sh

---------

Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-13 00:41:52 +01:00
Concedo
02357eadf8 Merge commit '7672adeec7' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	kompute-shaders/op_rope_f16.comp
#	kompute-shaders/op_rope_f32.comp
#	kompute-shaders/rope_common.comp
#	tests/test-backend-ops.cpp
#	tests/test-grad0.cpp
#	tests/test-rope.cpp
2024-06-09 15:35:51 +08:00
arch-btw
9973e81c5c
readme : remove -ins (#7759)
-ins and --instruct were removed in https://github.com/ggerganov/llama.cpp/pull/7675

I have adjusted the README accordingly.
There was no trace of --chatml in the README.
2024-06-05 09:40:49 +03:00
Concedo
6659742a2d do not merge the removal of opencl 2024-06-05 10:57:52 +08:00
Georgi Gerganov
1442677f92
common : refactor cli arg parsing (#7675)
* common : gpt_params_parse do not print usage

* common : rework usage print (wip)

* common : valign

* common : rework print_usage

* infill : remove cfg support

* common : reorder args

* server : deduplicate parameters

ggml-ci

* common : add missing header

ggml-ci

* common : remove --random-prompt usages

ggml-ci

* examples : migrate to gpt_params

ggml-ci

* batched-bench : migrate to gpt_params

* retrieval : migrate to gpt_params

* common : change defaults for escape and n_ctx

* common : remove chatml and instruct params

ggml-ci

* common : passkey use gpt_params
2024-06-04 21:23:39 +03:00
Concedo
4ed9ba7352 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/docker.yml
#	CMakeLists.txt
#	Makefile
#	README.md
#	flake.lock
#	tests/test-backend-ops.cpp
2024-05-28 21:57:19 +08:00
Brian
d298382ad9
main: replace --no-special with --special (#7534)
This also flips the default behavior so that the output does not include control tokens by default.
2024-05-27 00:10:17 +10:00
Justine Tunney
00c6390793
main : don't print special tokens with --grammar (#6923)
* main : don't print special tokens with --grammar

The CLI interface was recently changed to print special control tokens
like the </s> stop token. This token shouldn't be printed when the
grammar flag is passed, unless the grammar specifies it, because doing
so breaks shell-scriptability.

* main: use separate stream for control characters

* main: use dprintf and add --ctrl-token-no-out and --ctrl-token-fd-out

* main: dprintf isn't part of the IEEE POSIX standard. Just use write().

* main: remove --ctrl-token-fd-out in favor for fcntl() based detection

* common.cpp: accidentally removed --interactive-first

* main: only merge stdout and control token if not in conversation or grammar mode

* main: rejig control token descriptor handling

* main: must check pipe status at the very top of the program

* main: renamed --ctrl-token-no-out to --no-special and other refactoring

* main: refactor ctrl_token_no_out --> no_special

* llama: rename llama_token_is_control_token() to llama_token_is_control()

* main: remove special token file descriptor feature (#5)

---------

Co-authored-by: Brian <mofosyne@gmail.com>
2024-05-25 19:04:03 +10:00
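The shell-scriptability concern above comes down to keeping control tokens off the data stream when stdout is a pipe. A hedged sketch of that general technique, not the exact code from the commit:

    #include <cstdio>
    #include <string>
    #include <unistd.h>   // isatty

    // Keep control tokens (e.g. </s>) off stdout when it is a pipe, so scripts
    // consuming the output are not broken; show them when stdout is a terminal.
    static void emit_token(const std::string & piece, bool is_control) {
        static const bool stdout_is_tty = isatty(fileno(stdout));
        if (is_control && !stdout_is_tty) {
            fprintf(stderr, "%s", piece.c_str());        // divert control tokens
        } else {
            fwrite(piece.data(), 1, piece.size(), stdout);
        }
    }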
Concedo
9282c307ed this commit does not work, just for debugging 2024-05-23 20:13:47 +08:00
Georgi Gerganov
fbf777d2b9
main : minor (#7462) 2024-05-23 09:43:49 +03:00
Georgi Gerganov
6ff13987ad
common : normalize naming style (#7462)
* common : normalize naming style

ggml-ci

* common : match declaration / definition order

* zig : try to fix build
2024-05-22 20:04:20 +03:00
Concedo
4a3c8c190b Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	tests/test-backend-ops.cpp
2024-05-22 15:04:31 +08:00
Olivier Chafik
e402de364b
grammars: fix resampling logic regression (#7424) 2024-05-21 20:40:00 +01:00
Amir
11474e756d
examples: cache hf model when --model not provided (#7353)
* examples: cache hf model when --model not provided

* examples: cache hf model when --model not provided

* examples: cache hf model when --model not provided

* examples: cache hf model when --model not provided

* examples: cache hf model when --model not provided
2024-05-21 17:13:12 +03:00
Concedo
2ee808a747 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	README.md
#	ci/run.sh
#	llama.cpp
#	models/ggml-vocab-llama-bpe.gguf.inp
#	models/ggml-vocab-llama-bpe.gguf.out
#	requirements.txt
#	scripts/compare-llama-bench.py
#	scripts/sync-ggml.last
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-tokenizer-1-bpe.cpp
2024-05-14 19:28:47 +08:00
Justine Tunney
4e3880978f
Fix memory bug in grammar parser (#7194)
The llama.cpp grammar parser had a bug where forgetting to add a closing
quotation mark to strings would cause parsing to crash. Anyone running a
server on a public endpoint is advised to upgrade. To reproduce this bug

    ./llamafile -m foo.gguf -p bar --grammar 'root::="'

Credit for discovering and reporting this issue goes to Eclypsium
Security Researcher Richard Johnson <Richard.johnson@eclypsium.com>.
2024-05-10 21:01:08 +10:00
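The class of bug described above is an unterminated-string parse running off the end of the input. A generic bounds-checked sketch of the fix pattern, illustrative rather than the parser's actual code:

    #include <stdexcept>
    #include <string>

    // Parse a double-quoted string literal; src points just past the opening '"'.
    // The failure mode was advancing past the terminating '\0' when the closing
    // quote was missing; checking for end-of-input turns that into an error.
    static const char * parse_quoted(const char * src, std::string & out) {
        const char * pos = src;
        while (*pos != '"') {
            if (*pos == '\0') {
                throw std::runtime_error("unterminated string literal");
            }
            out.push_back(*pos++);
        }
        return pos + 1;   // skip the closing quote
    }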
HanishKVC
f89fe2732c
Main+: optionally allow special tokens from user in interactive mode (#7097)
@hanishkvc added a new `--interactive-specials` flag which allows inserting special tokens from the user side into the embedding stream.
2024-05-10 20:21:58 +10:00
Concedo
d084f78faa Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
#	common/common.cpp
#	requirements/requirements-convert-hf-to-gguf-update.txt
#	requirements/requirements-convert-hf-to-gguf.txt
#	requirements/requirements-convert.txt
#	tests/CMakeLists.txt
#	tests/test-json-schema-to-grammar.cpp
2024-05-09 15:13:34 +08:00
Dawid Potocki
83330d8cd6
main : add --conversation / -cnv flag (#7108) 2024-05-08 17:32:32 +03:00
Concedo
bc39b4d98a Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	README.md
#	ci/run.sh
#	docs/BLIS.md
#	flake.lock
#	grammars/README.md
2024-05-08 09:58:23 +08:00
RhinoDevel
3af34c1d1b
main : update log text (EOS to EOG) (#7104)
* Update log text (EOS to EOG)

The log text "found EOS" is no longer always correct here, because there is now an is-EOG check that also returns true for EOT.

* Improve the log message further by using "an" instead of "some".

As suggested, to avoid misunderstanding (no multiple EOG tokens found, just one).
2024-05-07 20:51:31 +03:00
omahs
04976db7a8
docs: fix typos (#7124)
* fix typo

* fix typos

* fix typo

* fix typos

* fix typo

* fix typos
2024-05-07 18:20:33 +03:00
Concedo
6c000cbe7a Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.flake8
#	.github/workflows/bench.yml
#	.github/workflows/python-lint.yml
#	.pre-commit-config.yaml
#	Makefile
#	README.md
#	models/ggml-vocab-bert-bge.gguf.inp
#	models/ggml-vocab-bert-bge.gguf.out
#	models/ggml-vocab-deepseek-coder.gguf.inp
#	models/ggml-vocab-deepseek-coder.gguf.out
#	models/ggml-vocab-deepseek-llm.gguf.inp
#	models/ggml-vocab-deepseek-llm.gguf.out
#	models/ggml-vocab-falcon.gguf.inp
#	models/ggml-vocab-falcon.gguf.out
#	models/ggml-vocab-gpt-2.gguf.inp
#	models/ggml-vocab-gpt-2.gguf.out
#	models/ggml-vocab-llama-bpe.gguf.inp
#	models/ggml-vocab-llama-bpe.gguf.out
#	models/ggml-vocab-llama-spm.gguf.inp
#	models/ggml-vocab-llama-spm.gguf.out
#	models/ggml-vocab-mpt.gguf.inp
#	models/ggml-vocab-mpt.gguf.out
#	models/ggml-vocab-phi-3.gguf
#	models/ggml-vocab-phi-3.gguf.inp
#	models/ggml-vocab-phi-3.gguf.out
#	models/ggml-vocab-refact.gguf
#	models/ggml-vocab-starcoder.gguf.inp
#	models/ggml-vocab-starcoder.gguf.out
#	requirements/requirements-convert.txt
#	scripts/compare-llama-bench.py
#	scripts/run-with-preset.py
#	scripts/verify-checksum-models.py
#	tests/CMakeLists.txt
#	tests/test-tokenizer-0.cpp
2024-05-06 18:09:45 +08:00
l3utterfly
8d608a81b7
main : fix off by one error for context shift (#6921) 2024-05-01 22:27:41 +03:00
Concedo
17a24d753c Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/main-intel.Dockerfile
#	.devops/main-vulkan.Dockerfile
#	.devops/server-intel.Dockerfile
#	.devops/server-vulkan.Dockerfile
#	.github/workflows/bench.yml
#	.github/workflows/build.yml
#	.github/workflows/python-lint.yml
#	.github/workflows/server.yml
#	.gitignore
#	Makefile
#	README-sycl.md
#	README.md
#	ci/run.sh
#	flake.lock
#	llama.cpp
#	models/ggml-vocab-falcon.gguf
#	models/ggml-vocab-llama-spm.gguf
#	models/ggml-vocab-mpt.gguf
#	models/ggml-vocab-stablelm.gguf
#	models/ggml-vocab-starcoder.gguf
#	requirements.txt
#	scripts/check-requirements.sh
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-tokenizer-0-bpe.py
#	tests/test-tokenizer-0-spm.py
#	tests/test-tokenizer-1-spm.cpp
2024-04-30 21:04:17 +08:00
Olivier Chafik
8843a98c2b
Improve usability of --model-url & related flags (#6930)
* args: default --model to models/ + filename from --model-url or --hf-file (or else legacy models/7B/ggml-model-f16.gguf)

* args: main & server now call gpt_params_handle_model_default

* args: define DEFAULT_MODEL_PATH + update cli docs

* curl: check url of previous download (.json metadata w/ url, etag & lastModified)

* args: fix update to quantize-stats.cpp

* curl: support legacy .etag / .lastModified companion files

* curl: rm legacy .etag file support

* curl: reuse regex across headers callback calls

* curl: unique_ptr to manage lifecycle of curl & outfile

* curl: nit: no need for multiline regex flag

* curl: update failed test (model file collision) + gitignore *.gguf.json
2024-04-30 00:52:50 +01:00
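A rough sketch of the default-path rule described in the first bullet. The helper name `gpt_params_handle_model_default` and the legacy fallback path come from the commit text; the field names and `basename_of` helper are assumptions for illustration.

    // If --model was not given, derive it from --hf-file or --model-url,
    // otherwise fall back to the legacy default path.
    void gpt_params_handle_model_default(gpt_params & params) {
        if (!params.model.empty()) {
            return;                                                    // explicit --model wins
        }
        if (!params.hf_file.empty()) {
            params.model = "models/" + basename_of(params.hf_file);    // basename_of: assumed helper
        } else if (!params.model_url.empty()) {
            params.model = "models/" + basename_of(params.model_url);
        } else {
            params.model = "models/7B/ggml-model-f16.gguf";            // legacy default per the commit text
        }
    }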
Daniel Bevenius
5539e6fdd1
main : fix typo in comment in main.cpp (#6985)
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-29 13:56:59 -04:00
Concedo
a681cdd9ef Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	common/sampling.h
#	llama.h
#	tests/test-chat-template.cpp
2024-04-24 21:29:07 +08:00
Johannes Gäßler
28103f4832
Server: fix seed for multiple slots (#6835)
* Server: add tests for consistent results

* sampling: separate rng per sampling context
2024-04-24 11:08:36 +02:00
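The fix above boils down to giving each sampling context its own RNG rather than sharing one generator across server slots. A minimal sketch of that design, with member and function names assumed:

    #include <cstdint>
    #include <random>
    #include <vector>

    // Each slot's sampling context carries its own generator, so seeding one
    // slot cannot perturb the sampling sequence of another.
    struct sampling_context {
        std::mt19937 rng;                                   // per-context RNG instead of a shared global
        explicit sampling_context(uint32_t seed) : rng(seed) {}
    };

    // Drawing a token index uses only the context-local generator.
    int sample_index(sampling_context & sctx, const std::vector<float> & probs) {
        std::discrete_distribution<int> dist(probs.begin(), probs.end());
        return dist(sctx.rng);
    }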
Concedo
b4d2031215 merged, added ability to render special tokens 2024-04-22 18:19:58 +08:00
Pedro Cuenca
b97bc3966e
llama : support Llama 3 HF conversion (#6745)
* Support Llama 3 conversion

The tokenizer is BPE.

* style

* Accept suggestion

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

* llama : add llama_token_is_eog()

ggml-ci

* llama : auto-detect more EOT tokens when missing in KV data

* convert : replacing EOS token is a hack

* llama : fix codegemma EOT token + add TODOs

* llama : fix model type string for 8B model

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-21 14:50:41 +03:00
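The `llama_token_is_eog()` helper added here replaces single-EOS checks in generation loops. A hedged usage sketch; the surrounding loop is simplified and the sampling/output helpers are assumed names:

    // Generation loop excerpt: stop on any end-of-generation token (EOS, EOT, ...),
    // not just the single EOS id stored in the model metadata.
    while (n_generated < n_predict) {
        llama_token tok = sample_next_token(ctx);      // assumed sampling helper
        if (llama_token_is_eog(model, tok)) {
            break;                                     // covers EOS and auto-detected EOT tokens
        }
        print_piece(ctx, tok);                         // assumed output helper
        n_generated++;
    }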