Concedo
e1f97f7fb5
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/llama-server.Dockerfile
# README.md
# flake.lock
# ggml/src/ggml-vulkan.cpp
# ggml/src/vulkan-shaders/concat.comp
# ggml/src/vulkan-shaders/pad.comp
# ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-backend-ops.cpp
2024-08-06 16:33:26 +08:00
Liu Jia
0a4ce78681
common : Changed tuple to struct (TODO fix) ( #8823 )
...
* common : Changed tuple to struct (TODO fix)
Use struct `llama_init_result` to replace the previous
std::tuple<struct llama_model *, struct llama_context *>
* delete llama_init_default_params()
* delete the extra whitespace
2024-08-05 18:14:10 +02:00
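The tuple-to-struct change in #8823 above can be illustrated with a minimal sketch. Only the name `llama_init_result` and the replaced tuple type come from the commit message; the stub types, the helper functions, and the field names `model` and `context` are assumptions made for illustration.

```cpp
#include <tuple>

// Hypothetical stand-ins for llama_model / llama_context (not the real types).
struct model_stub   { int id; };
struct context_stub { int id; };

// Before: callers unpack a positional tuple and must remember the order.
std::tuple<model_stub *, context_stub *> init_as_tuple(model_stub * m, context_stub * c) {
    return std::make_tuple(m, c);
}

// After: a small struct with named members (field names are assumed here).
struct init_result_sketch {
    model_stub   * model   = nullptr;
    context_stub * context = nullptr;
};

init_result_sketch init_as_struct(model_stub * m, context_stub * c) {
    init_result_sketch result;
    result.model   = m;
    result.context = c;
    return result;
}
```

With the struct, a call site reads `result.model` and `result.context` instead of `std::get<0>` and `std::get<1>`, and new members can be added later without changing every caller.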
Concedo
cca2fa9a6c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/llama-cli-intel.Dockerfile
# .devops/llama-server-intel.Dockerfile
# README.md
# ggml/src/CMakeLists.txt
# tests/test-chat-template.cpp
2024-07-24 21:57:50 +08:00
Xuan Son Nguyen
96952e7181
llama : fix llama_chat_format_single for mistral ( #8657 )
...
* fix `llama_chat_format_single` for mistral
* fix typo
* use printf
2024-07-24 13:48:46 +02:00
Concedo
602661ba49
Merge commit 'c917b67f06' into concedo_experimental
...
# Conflicts:
# .devops/tools.sh
# Makefile
# ggml/src/ggml-cuda/mmq.cuh
# tests/test-double-float.cpp
# tests/test-quantize-fns.cpp
# tests/test-quantize-perf.cpp
2024-07-14 11:38:20 +08:00
Georgi Gerganov
6af51c0d96
main : print error on empty input ( #8456 )
2024-07-12 14:48:04 +03:00
Concedo
2cad736260
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# .github/labeler.yml
# .gitignore
# CMakeLists.txt
# Makefile
# Package.swift
# README.md
# ci/run.sh
# docs/build.md
# examples/CMakeLists.txt
# flake.lock
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# grammars/README.md
# requirements/requirements-convert_hf_to_gguf.txt
# requirements/requirements-convert_hf_to_gguf_update.txt
# scripts/check-requirements.sh
# scripts/compare-llama-bench.py
# scripts/gen-unicode-data.py
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-tokenizer-random.py
2024-07-11 16:36:16 +08:00
Denis Spasyuk
a8db2a9ce6
Update llama-cli documentation ( #8315 )
...
* Update README.md
* Update README.md
* Update README.md
Fixed llama-cli/main references and templates on some commands, added chat template sections, and fixed typos in some areas.
* Update README.md
* Update README.md
* Update README.md
2024-07-07 17:08:28 +02:00
Concedo
5b605d03ea
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/config.yml
# .gitignore
# CMakeLists.txt
# CONTRIBUTING.md
# Makefile
# README.md
# ci/run.sh
# common/common.h
# examples/main-cmake-pkg/CMakeLists.txt
# ggml/src/CMakeLists.txt
# models/ggml-vocab-bert-bge.gguf.inp
# models/ggml-vocab-bert-bge.gguf.out
# models/ggml-vocab-deepseek-coder.gguf.inp
# models/ggml-vocab-deepseek-coder.gguf.out
# models/ggml-vocab-deepseek-llm.gguf.inp
# models/ggml-vocab-deepseek-llm.gguf.out
# models/ggml-vocab-falcon.gguf.inp
# models/ggml-vocab-falcon.gguf.out
# models/ggml-vocab-gpt-2.gguf.inp
# models/ggml-vocab-gpt-2.gguf.out
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# models/ggml-vocab-llama-spm.gguf.inp
# models/ggml-vocab-llama-spm.gguf.out
# models/ggml-vocab-mpt.gguf.inp
# models/ggml-vocab-mpt.gguf.out
# models/ggml-vocab-phi-3.gguf.inp
# models/ggml-vocab-phi-3.gguf.out
# models/ggml-vocab-starcoder.gguf.inp
# models/ggml-vocab-starcoder.gguf.out
# requirements.txt
# requirements/requirements-convert_legacy_llama.txt
# scripts/check-requirements.sh
# scripts/pod-llama.sh
# src/CMakeLists.txt
# src/llama.cpp
# tests/test-rope.cpp
2024-07-06 00:25:10 +08:00
Xuan Son Nguyen
a38b884c6c
cli: add EOT when user hits Ctrl+C ( #8296 )
...
* main: add need_insert_eot
* do not format system prompt if it is empty
2024-07-04 20:55:03 +02:00
fairydreaming
807b0c49ff
Inference support for T5 and FLAN-T5 model families ( #5763 )
...
* llama : add inference support and model types for T5 and FLAN-T5 model families
* llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token()
* common, llama-cli, llama-batched : add support for encoder-decoder models
* convert-hf : handle shared token embeddings tensors in T5Model
* convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models)
* convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model
* convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-04 15:46:11 +02:00
Concedo
0fc18d2d82
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# CMakePresets.json
# README.md
# flake.lock
# ggml/src/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
2024-07-02 21:05:45 +08:00
Xuan Son Nguyen
9ef0780062
Fix new line issue with chat template, disable template when in-prefix/suffix is set ( #8203 )
...
* preserve new line llama_chat_format_single
* disable chat template if in-prefix/suffix is set
* remove redundant change
2024-06-30 20:27:13 +02:00
Concedo
02f92f6ecc
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-cli-rocm.Dockerfile
# .devops/llama-cli-vulkan.Dockerfile
# .devops/llama-cpp-cuda.srpm.spec
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .github/workflows/build.yml
# .github/workflows/docker.yml
# CMakeLists.txt
# Makefile
# README.md
# examples/llama.android/llama/src/main/cpp/CMakeLists.txt
# flake.lock
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# grammars/README.md
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
2024-06-30 10:59:42 +08:00
Concedo
9c10486204
merge the file structure refactor, testing
2024-06-29 12:14:38 +08:00
Xuan Son Nguyen
72272b83a3
fix code typo in llama-cli ( #8198 )
2024-06-29 00:14:20 +02:00
Concedo
f3dfa96dbc
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .devops/llama-server.Dockerfile
# .github/workflows/docker.yml
# README.md
# llama.cpp
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-llama-grammar.cpp
2024-06-26 18:59:10 +08:00
Xuan Son Nguyen
48e6b92cc3
Add chat template support for llama-cli ( #8068 )
...
* add chat template support for llama-cli
* add help message
* server: simplify format_chat
* more consistent naming
* improve
* add llama_chat_format_example
* fix server
* code style
* code style
* Update examples/main/main.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-25 21:56:49 +10:00
Concedo
b53e760557
Merge commit '1c641e6aac' into concedo_experimental
...
# Conflicts:
# .devops/cloud-v-pipeline
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-cli-rocm.Dockerfile
# .devops/llama-cli-vulkan.Dockerfile
# .devops/llama-cli.Dockerfile
# .devops/llama-cpp-clblast.srpm.spec
# .devops/llama-cpp-cuda.srpm.spec
# .devops/llama-cpp.srpm.spec
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .devops/llama-server.Dockerfile
# .devops/nix/apps.nix
# .devops/nix/package.nix
# .devops/tools.sh
# .dockerignore
# .github/ISSUE_TEMPLATE/01-bug-low.yml
# .github/ISSUE_TEMPLATE/02-bug-medium.yml
# .github/ISSUE_TEMPLATE/03-bug-high.yml
# .github/ISSUE_TEMPLATE/04-bug-critical.yml
# .github/workflows/bench.yml
# .github/workflows/build.yml
# .github/workflows/docker.yml
# .github/workflows/server.yml
# .gitignore
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# docs/token_generation_performance_tips.md
# flake.nix
# grammars/README.md
# pocs/vdot/CMakeLists.txt
# scripts/get-hellaswag.sh
# scripts/get-wikitext-103.sh
# scripts/get-wikitext-2.sh
# scripts/get-winogrande.sh
# scripts/hf.sh
# scripts/pod-llama.sh
# scripts/qnt-all.sh
# scripts/run-all-ppl.sh
# scripts/run-with-preset.py
# scripts/server-llm.sh
# tests/test-backend-ops.cpp
2024-06-14 18:41:37 +08:00
Olivier Chafik
1c641e6aac
build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... ( #7809 )
...
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew
* server: update refs -> llama-server
gitignore llama-server
* server: simplify nix package
* main: update refs -> llama
fix examples/main ref
* main/server: fix targets
* update more names
* Update build.yml
* rm accidentally checked in bins
* update straggling refs
* Update .gitignore
* Update server-llm.sh
* main: target name -> llama-cli
* Prefix all example bins w/ llama-
* fix main refs
* rename {main->llama}-cmake-pkg binary
* prefix more cmake targets w/ llama-
* add/fix gbnf-validator subfolder to cmake
* sort cmake example subdirs
* rm bin files
* fix llama-lookup-* Makefile rules
* gitignore /llama-*
* rename Dockerfiles
* rename llama|main -> llama-cli; consistent RPM bin prefixes
* fix some missing -cli suffixes
* rename dockerfile w/ llama-cli
* rename(make): llama-baby-llama
* update dockerfile refs
* more llama-cli(.exe)
* fix test-eval-callback
* rename: llama-cli-cmake-pkg(.exe)
* address gbnf-validator unused fread warning (switched to C++ / ifstream)
* add two missing llama- prefixes
* Updating docs for eval-callback binary to use new `llama-` prefix.
* Updating a few lingering doc references for rename of main to llama-cli
* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.
* Updating documentation references for lookup-merge and export-lora
* Updating two small `main` references missed earlier in the finetune docs.
* Update apps.nix
* update grammar/README.md w/ new llama-* names
* update llama-rpc-server bin name + doc
* Revert "update llama-rpc-server bin name + doc"
This reverts commit e474ef1df481fd8936cd7d098e3065d7de378930.
* add hot topic notice to README.md
* Update README.md
* Update README.md
* rename gguf-split & quantize bins refs in **/tests.sh
---------
Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-13 00:41:52 +01:00
Concedo
02357eadf8
Merge commit '7672adeec7' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# kompute-shaders/op_rope_f16.comp
# kompute-shaders/op_rope_f32.comp
# kompute-shaders/rope_common.comp
# tests/test-backend-ops.cpp
# tests/test-grad0.cpp
# tests/test-rope.cpp
2024-06-09 15:35:51 +08:00
arch-btw
9973e81c5c
readme : remove -ins ( #7759 )
...
-ins and --instruct were moved in https://github.com/ggerganov/llama.cpp/pull/7675
I have adjusted the README accordingly.
There was no trace of --chatml in the README.
2024-06-05 09:40:49 +03:00
Concedo
6659742a2d
do not merge the removal of opencl
2024-06-05 10:57:52 +08:00
Georgi Gerganov
1442677f92
common : refactor cli arg parsing ( #7675 )
...
* common : gpt_params_parse do not print usage
* common : rework usage print (wip)
* common : valign
* common : rework print_usage
* infill : remove cfg support
* common : reorder args
* server : deduplicate parameters
ggml-ci
* common : add missing header
ggml-ci
* common : remove --random-prompt usages
ggml-ci
* examples : migrate to gpt_params
ggml-ci
* batched-bench : migrate to gpt_params
* retrieval : migrate to gpt_params
* common : change defaults for escape and n_ctx
* common : remove chatml and instruct params
ggml-ci
* common : passkey use gpt_params
2024-06-04 21:23:39 +03:00
Concedo
4ed9ba7352
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/docker.yml
# CMakeLists.txt
# Makefile
# README.md
# flake.lock
# tests/test-backend-ops.cpp
2024-05-28 21:57:19 +08:00
Brian
d298382ad9
main: replace --no-special with --special ( #7534 )
...
This also flips the default output behavior: control tokens are no longer included unless --special is passed.
2024-05-27 00:10:17 +10:00
Justine Tunney
00c6390793
main : don't print special tokens with --grammar ( #6923 )
...
* main : don't print special tokens with --grammar
The CLI interface was recently changed to print special control tokens
like the </s> stop message one. This token shouldn't be printed if the
grammar flag was passed, unless the grammar specifies it, because that
breaks shell-scriptability.
* main: use separate stream for control characters
* main: use dprintf and add --ctrl-token-no-out and --ctrl-token-fd-out
* main: dprintf isn't part of the IEEE POSIX standard. Just use write().
* main: remove --ctrl-token-fd-out in favor of fcntl() based detection
* common.cpp: accidentally removed --interactive-first
* main: only merge stdout and control token if not in conversation or grammar mode
* main: rejig control token descriptor handling
* main: must check pipe status on very top of program
* main: renamed --no-special from --ctrl-token-no-out and other refactoring
* main: refactor ctrl_token_no_out --> no_special
* llama: rename llama_token_is_control_token() to llama_token_is_control()
* main: remove special token file descriptor feature (#5 )
---------
Co-authored-by: Brian <mofosyne@gmail.com>
2024-05-25 19:04:03 +10:00
Concedo
9282c307ed
this commit does not work, just for debugging
2024-05-23 20:13:47 +08:00
Georgi Gerganov
fbf777d2b9
main : minor ( #7462 )
2024-05-23 09:43:49 +03:00
Georgi Gerganov
6ff13987ad
common : normalize naming style ( #7462 )
...
* common : normalize naming style
ggml-ci
* common : match declaration / definition order
* zig : try to fix build
2024-05-22 20:04:20 +03:00
Concedo
4a3c8c190b
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# tests/test-backend-ops.cpp
2024-05-22 15:04:31 +08:00
Olivier Chafik
e402de364b
grammars: fix resampling logic regression ( #7424 )
2024-05-21 20:40:00 +01:00
Amir
11474e756d
examples: cache hf model when --model not provided ( #7353 )
...
* examples: cache hf model when --model not provided
* examples: cache hf model when --model not provided
* examples: cache hf model when --model not provided
* examples: cache hf model when --model not provided
* examples: cache hf model when --model not provided
2024-05-21 17:13:12 +03:00
Concedo
2ee808a747
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# README.md
# ci/run.sh
# llama.cpp
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# requirements.txt
# scripts/compare-llama-bench.py
# scripts/sync-ggml.last
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-grammar-integration.cpp
# tests/test-tokenizer-1-bpe.cpp
2024-05-14 19:28:47 +08:00
Justine Tunney
4e3880978f
Fix memory bug in grammar parser ( #7194 )
...
The llama.cpp grammar parser had a bug where forgetting to add a closing
quotation mark to strings would cause parsing to crash. Anyone running a
server on a public endpoint is advised to upgrade. To reproduce this bug:
./llamafile -m foo.gguf -p bar --grammar 'root::="'
Credit for discovering and reporting this issue goes to Eclypsium
Security Researcher Richard Johnson <Richard.johnson@eclypsium.com>.
2024-05-10 21:01:08 +10:00
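The class of bug described in #7194 above (an unterminated string literal driving the parser past the end of its input) can be sketched as follows. This is not the actual fix from the commit: the function name and return convention are hypothetical, and escape sequences are deliberately ignored to keep the example short.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: parse a double-quoted literal that starts at src[pos].
// Returns the literal's contents, or std::nullopt if the opening quote is
// missing or the input ends before a closing quote is found.
std::optional<std::string> parse_quoted_literal(const std::string & src, std::size_t pos) {
    if (pos >= src.size() || src[pos] != '"') {
        return std::nullopt;
    }
    std::string out;
    // Every read is bounded by src.size(), so a missing closing quote
    // ends the loop instead of reading past the buffer.
    for (std::size_t i = pos + 1; i < src.size(); ++i) {
        if (src[i] == '"') {
            return out;
        }
        out.push_back(src[i]);
    }
    return std::nullopt; // unterminated literal: report an error, do not crash
}
```

For the reproduction string from the commit message, `parse_quoted_literal("root::=\"", 7)` finds an opening quote with nothing after it and returns an empty optional, which the caller can turn into a parse error.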
HanishKVC
f89fe2732c
Main+: optionally allow special tokens from user in interactive mode ( #7097 )
...
@hanishkvc added a new `--interactive-specials` flag that allows special tokens entered by the user to be inserted into the embedding stream.
2024-05-10 20:21:58 +10:00
Concedo
d084f78faa
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
# common/common.cpp
# requirements/requirements-convert-hf-to-gguf-update.txt
# requirements/requirements-convert-hf-to-gguf.txt
# requirements/requirements-convert.txt
# tests/CMakeLists.txt
# tests/test-json-schema-to-grammar.cpp
2024-05-09 15:13:34 +08:00
Dawid Potocki
83330d8cd6
main : add --conversation / -cnv flag ( #7108 )
2024-05-08 17:32:32 +03:00
Concedo
bc39b4d98a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# README.md
# ci/run.sh
# docs/BLIS.md
# flake.lock
# grammars/README.md
2024-05-08 09:58:23 +08:00
RhinoDevel
3af34c1d1b
main : update log text (EOS to EOG) ( #7104 )
...
* Update log text (EOS to EOG)
The log text "found EOS" is no longer always correct here, because there is now an is-EOG check that also returns true for EOT.
* Improve log msg. further by using "an" instead of "some".
As suggested, to avoid misunderstanding (no multiple EOG tokens found, just one).
2024-05-07 20:51:31 +03:00
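The reasoning behind #7104 above (an end-of-generation check that is true for EOT as well as EOS) can be sketched like this. The struct, the function name, and the convention that a negative id means "not set" are assumptions for illustration and do not reproduce the real `llama_token_is_eog()` implementation.

```cpp
// Hypothetical vocabulary description: ids of the two end markers.
// A negative id is assumed to mean "this model does not define the token".
struct vocab_sketch {
    int eos_id; // end-of-sequence token
    int eot_id; // end-of-turn token
};

// True when the token ends generation, whether it is EOS or EOT.
// A log message that says "found EOS" would be wrong in the EOT case,
// which is why the commit switches the wording to "EOG".
bool token_is_eog_sketch(const vocab_sketch & vocab, int token) {
    if (token < 0) {
        return false; // an unset id never matches
    }
    return token == vocab.eos_id || token == vocab.eot_id;
}
```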
omahs
04976db7a8
docs: fix typos ( #7124 )
...
* fix typo
* fix typos
* fix typo
* fix typos
* fix typo
* fix typos
2024-05-07 18:20:33 +03:00
Concedo
6c000cbe7a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .flake8
# .github/workflows/bench.yml
# .github/workflows/python-lint.yml
# .pre-commit-config.yaml
# Makefile
# README.md
# models/ggml-vocab-bert-bge.gguf.inp
# models/ggml-vocab-bert-bge.gguf.out
# models/ggml-vocab-deepseek-coder.gguf.inp
# models/ggml-vocab-deepseek-coder.gguf.out
# models/ggml-vocab-deepseek-llm.gguf.inp
# models/ggml-vocab-deepseek-llm.gguf.out
# models/ggml-vocab-falcon.gguf.inp
# models/ggml-vocab-falcon.gguf.out
# models/ggml-vocab-gpt-2.gguf.inp
# models/ggml-vocab-gpt-2.gguf.out
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# models/ggml-vocab-llama-spm.gguf.inp
# models/ggml-vocab-llama-spm.gguf.out
# models/ggml-vocab-mpt.gguf.inp
# models/ggml-vocab-mpt.gguf.out
# models/ggml-vocab-phi-3.gguf
# models/ggml-vocab-phi-3.gguf.inp
# models/ggml-vocab-phi-3.gguf.out
# models/ggml-vocab-refact.gguf
# models/ggml-vocab-starcoder.gguf.inp
# models/ggml-vocab-starcoder.gguf.out
# requirements/requirements-convert.txt
# scripts/compare-llama-bench.py
# scripts/run-with-preset.py
# scripts/verify-checksum-models.py
# tests/CMakeLists.txt
# tests/test-tokenizer-0.cpp
2024-05-06 18:09:45 +08:00
l3utterfly
8d608a81b7
main : fix off by one error for context shift ( #6921 )
2024-05-01 22:27:41 +03:00
Concedo
17a24d753c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/main-intel.Dockerfile
# .devops/main-vulkan.Dockerfile
# .devops/server-intel.Dockerfile
# .devops/server-vulkan.Dockerfile
# .github/workflows/bench.yml
# .github/workflows/build.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .gitignore
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# flake.lock
# llama.cpp
# models/ggml-vocab-falcon.gguf
# models/ggml-vocab-llama-spm.gguf
# models/ggml-vocab-mpt.gguf
# models/ggml-vocab-stablelm.gguf
# models/ggml-vocab-starcoder.gguf
# requirements.txt
# scripts/check-requirements.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-grammar-integration.cpp
# tests/test-tokenizer-0-bpe.py
# tests/test-tokenizer-0-spm.py
# tests/test-tokenizer-1-spm.cpp
2024-04-30 21:04:17 +08:00
Olivier Chafik
8843a98c2b
Improve usability of --model-url & related flags ( #6930 )
...
* args: default --model to models/ + filename from --model-url or --hf-file (or else legacy models/7B/ggml-model-f16.gguf)
* args: main & server now call gpt_params_handle_model_default
* args: define DEFAULT_MODEL_PATH + update cli docs
* curl: check url of previous download (.json metadata w/ url, etag & lastModified)
* args: fix update to quantize-stats.cpp
* curl: support legacy .etag / .lastModified companion files
* curl: rm legacy .etag file support
* curl: reuse regex across headers callback calls
* curl: unique_ptr to manage lifecycle of curl & outfile
* curl: nit: no need for multiline regex flag
* curl: update failed test (model file collision) + gitignore *.gguf.json
2024-04-30 00:52:50 +01:00
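The default-path rule from the first bullet of #6930 above can be sketched as a small function. The function name is hypothetical, the precedence between the two sources (file name first, then URL) is an assumption the commit message does not state, and URL query strings are not handled.

```cpp
#include <string>

// Hypothetical helper: choose a default model path when --model is not given.
// Assumed precedence: --hf-file, then --model-url, then the legacy default.
std::string default_model_path_sketch(const std::string & model_url, const std::string & hf_file) {
    // Keep only the part after the last '/', i.e. the file name.
    auto file_name = [](const std::string & s) {
        const std::string::size_type slash = s.find_last_of('/');
        return slash == std::string::npos ? s : s.substr(slash + 1);
    };
    if (!hf_file.empty()) {
        return "models/" + file_name(hf_file);
    }
    if (!model_url.empty()) {
        return "models/" + file_name(model_url);
    }
    return "models/7B/ggml-model-f16.gguf"; // legacy default named in the commit
}
```

Deriving the local name from the remote one means repeated runs with the same URL resolve to the same file under `models/`, which is what lets a later download check reuse it.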
Daniel Bevenius
5539e6fdd1
main : fix typo in comment in main.cpp ( #6985 )
...
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-29 13:56:59 -04:00
Concedo
a681cdd9ef
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# common/sampling.h
# llama.h
# tests/test-chat-template.cpp
2024-04-24 21:29:07 +08:00
Johannes Gäßler
28103f4832
Server: fix seed for multiple slots ( #6835 )
...
* Server: add tests for consistent results
* sampling: separate rng per sampling context
2024-04-24 11:08:36 +02:00
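The second bullet of #6835 above (a separate RNG per sampling context) can be sketched as follows. The struct and member names are hypothetical; the point is only that each context owns its generator, so draws made by one slot cannot shift the sequence seen by another.

```cpp
#include <cstdint>
#include <random>

// Hypothetical sampling context that owns its own generator.
struct sampling_ctx_sketch {
    std::mt19937 rng;

    explicit sampling_ctx_sketch(std::uint32_t seed) : rng(seed) {}

    // Draw the next raw 32-bit value from this context's private stream.
    std::uint32_t next() {
        return static_cast<std::uint32_t>(rng());
    }
};
```

Two contexts created with the same seed produce the same sequence regardless of how many draws other contexts make in between, which is the property needed for consistent results across multiple server slots.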
Concedo
b4d2031215
merged, added ability to render special tokens
2024-04-22 18:19:58 +08:00
Pedro Cuenca
b97bc3966e
llama : support Llama 3 HF conversion ( #6745 )
...
* Support Llama 3 conversion
The tokenizer is BPE.
* style
* Accept suggestion
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
* llama : add llama_token_is_eog()
ggml-ci
* llama : auto-detect more EOT tokens when missing in KV data
* convert : replacing EOS token is a hack
* llama : fix codegemma EOT token + add TODOs
* llama : fix model type string for 8B model
---------
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-21 14:50:41 +03:00