Concedo
ba5babb876
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/apps.nix
# .devops/tools.sh
# Makefile
# README.md
# docs/backend/SYCL.md
# docs/build.md
# examples/CMakeLists.txt
# ggml/include/ggml.h
# src/llama-vocab.cpp
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-sampling.cpp
2024-07-27 23:15:54 +08:00
slaren
2b1f616b20
ggml : reduce hash table reset cost ( #8698 )
...
* ggml : reduce hash table reset cost
* fix unreachable code warnings after GGML_ASSERT(false)
* GGML_ASSERT(false) -> GGML_ABORT("fatal error")
* GGML_ABORT use format string
2024-07-27 04:41:55 +02:00
Concedo
24b9616344
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/full.Dockerfile
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-cli-intel.Dockerfile
# .devops/llama-cli-rocm.Dockerfile
# .devops/llama-cli-vulkan.Dockerfile
# .devops/llama-cli.Dockerfile
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-intel.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .devops/llama-server.Dockerfile
# CMakeLists.txt
# CONTRIBUTING.md
# Makefile
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# requirements.txt
# src/llama.cpp
# tests/test-backend-ops.cpp
2024-07-19 14:23:33 +08:00
hipudding
1bdd8ae19f
[CANN] Add Ascend NPU backend ( #6035 )
...
* [CANN] Add Ascend NPU backend
Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.
CANN (Compute Architecture of Neural Networks), developped by
Huawei, is a heterogeneous computing architecture for AI.
Co-authored-by: wangshuai09 <391746016@qq.com>
* delete trailing whitespaces
* Modify the code based on review comment
* Rename LLAMA_CANN to GGML_CANN
* Make ggml-common.h private
* add ggml_cann prefix for acl funcs
* Add logging for CANN backend
* Delete Trailing whitespace
---------
Co-authored-by: wangshuai09 <391746016@qq.com>
2024-07-17 14:23:50 +03:00
Concedo
2cad736260
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# .github/labeler.yml
# .gitignore
# CMakeLists.txt
# Makefile
# Package.swift
# README.md
# ci/run.sh
# docs/build.md
# examples/CMakeLists.txt
# flake.lock
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# grammars/README.md
# requirements/requirements-convert_hf_to_gguf.txt
# requirements/requirements-convert_hf_to_gguf_update.txt
# scripts/check-requirements.sh
# scripts/compare-llama-bench.py
# scripts/gen-unicode-data.py
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-tokenizer-random.py
2024-07-11 16:36:16 +08:00
compilade
3fd62a6b1c
py : type-check all Python scripts with Pyright ( #8341 )
...
* py : type-check all Python scripts with Pyright
* server-tests : use trailing slash in openai base_url
* server-tests : add more type annotations
* server-tests : strip "chat" from base_url in oai_chat_completions
* server-tests : model metadata is a dict
* ci : disable pip cache in type-check workflow
The cache is not shared between branches, and it's 250MB in size,
so it would become quite a big part of the 10GB cache limit of the repo.
* py : fix new type errors from master branch
* tests : fix test-tokenizer-random.py
Apparently, gcc applies optimisations even when pre-processing,
which confuses pycparser.
* ci : only show warnings and errors in python type-check
The "information" level otherwise has entries
from 'examples/pydantic_models_to_grammar.py',
which could be confusing for someone trying to figure out what failed,
considering that these messages can safely be ignored
even though they look like errors.
2024-07-07 15:04:39 -04:00
compilade
d39130a398
py : use cpu-only torch in requirements.txt ( #8335 )
2024-07-07 14:23:38 +03:00
Concedo
5b605d03ea
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/config.yml
# .gitignore
# CMakeLists.txt
# CONTRIBUTING.md
# Makefile
# README.md
# ci/run.sh
# common/common.h
# examples/main-cmake-pkg/CMakeLists.txt
# ggml/src/CMakeLists.txt
# models/ggml-vocab-bert-bge.gguf.inp
# models/ggml-vocab-bert-bge.gguf.out
# models/ggml-vocab-deepseek-coder.gguf.inp
# models/ggml-vocab-deepseek-coder.gguf.out
# models/ggml-vocab-deepseek-llm.gguf.inp
# models/ggml-vocab-deepseek-llm.gguf.out
# models/ggml-vocab-falcon.gguf.inp
# models/ggml-vocab-falcon.gguf.out
# models/ggml-vocab-gpt-2.gguf.inp
# models/ggml-vocab-gpt-2.gguf.out
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# models/ggml-vocab-llama-spm.gguf.inp
# models/ggml-vocab-llama-spm.gguf.out
# models/ggml-vocab-mpt.gguf.inp
# models/ggml-vocab-mpt.gguf.out
# models/ggml-vocab-phi-3.gguf.inp
# models/ggml-vocab-phi-3.gguf.out
# models/ggml-vocab-starcoder.gguf.inp
# models/ggml-vocab-starcoder.gguf.out
# requirements.txt
# requirements/requirements-convert_legacy_llama.txt
# scripts/check-requirements.sh
# scripts/pod-llama.sh
# src/CMakeLists.txt
# src/llama.cpp
# tests/test-rope.cpp
2024-07-06 00:25:10 +08:00
Georgi Gerganov
e235b267a2
py : switch to snake_case ( #8305 )
...
* py : switch to snake_case
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* cont : fix link
* gguf-py : use snake_case in scripts entrypoint export
* py : rename requirements for convert_legacy_llama.py
Needed for scripts/check-requirements.sh
---------
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-05 07:53:33 +03:00
ditsuke
07786a61a2
chore: Fixup requirements and build
2024-07-04 15:39:13 +00:00
Concedo
02f92f6ecc
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-cli-rocm.Dockerfile
# .devops/llama-cli-vulkan.Dockerfile
# .devops/llama-cpp-cuda.srpm.spec
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .github/workflows/build.yml
# .github/workflows/docker.yml
# CMakeLists.txt
# Makefile
# README.md
# examples/llama.android/llama/src/main/cpp/CMakeLists.txt
# flake.lock
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# grammars/README.md
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
2024-06-30 10:59:42 +08:00
Concedo
9c10486204
merge the file structure refactor, testing
2024-06-29 12:14:38 +08:00
Daniel Bevenius
9b31a40c6d
clip : suppress unused variable warnings ( #8105 )
...
* clip : suppress unused variable warnings
This commit suppresses unused variable warnings for the variables e in
the catch blocks.
The motivation for this change is to suppress the warnings that are
generated on Windows when using the MSVC compiler. The warnings are
not displayed when using GCC because GCC will mark all catch parameters
as used.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* squash! clip : suppress unused variable warnings
Remove e (/*e*/) instead instead of using GGML_UNUSED.
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-06-27 01:50:09 +02:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake ( #8006 )
...
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122 )
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
Concedo
b53e760557
Merge commit ' 1c641e6aac' into concedo_experimental
...
# Conflicts:
# .devops/cloud-v-pipeline
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-cli-rocm.Dockerfile
# .devops/llama-cli-vulkan.Dockerfile
# .devops/llama-cli.Dockerfile
# .devops/llama-cpp-clblast.srpm.spec
# .devops/llama-cpp-cuda.srpm.spec
# .devops/llama-cpp.srpm.spec
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .devops/llama-server.Dockerfile
# .devops/nix/apps.nix
# .devops/nix/package.nix
# .devops/tools.sh
# .dockerignore
# .github/ISSUE_TEMPLATE/01-bug-low.yml
# .github/ISSUE_TEMPLATE/02-bug-medium.yml
# .github/ISSUE_TEMPLATE/03-bug-high.yml
# .github/ISSUE_TEMPLATE/04-bug-critical.yml
# .github/workflows/bench.yml
# .github/workflows/build.yml
# .github/workflows/docker.yml
# .github/workflows/server.yml
# .gitignore
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# docs/token_generation_performance_tips.md
# flake.nix
# grammars/README.md
# pocs/vdot/CMakeLists.txt
# scripts/get-hellaswag.sh
# scripts/get-wikitext-103.sh
# scripts/get-wikitext-2.sh
# scripts/get-winogrande.sh
# scripts/hf.sh
# scripts/pod-llama.sh
# scripts/qnt-all.sh
# scripts/run-all-ppl.sh
# scripts/run-with-preset.py
# scripts/server-llm.sh
# tests/test-backend-ops.cpp
2024-06-14 18:41:37 +08:00
Olivier Chafik
1c641e6aac
build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809 )
...
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew
* server: update refs -> llama-server
gitignore llama-server
* server: simplify nix package
* main: update refs -> llama
fix examples/main ref
* main/server: fix targets
* update more names
* Update build.yml
* rm accidentally checked in bins
* update straggling refs
* Update .gitignore
* Update server-llm.sh
* main: target name -> llama-cli
* Prefix all example bins w/ llama-
* fix main refs
* rename {main->llama}-cmake-pkg binary
* prefix more cmake targets w/ llama-
* add/fix gbnf-validator subfolder to cmake
* sort cmake example subdirs
* rm bin files
* fix llama-lookup-* Makefile rules
* gitignore /llama-*
* rename Dockerfiles
* rename llama|main -> llama-cli; consistent RPM bin prefixes
* fix some missing -cli suffixes
* rename dockerfile w/ llama-cli
* rename(make): llama-baby-llama
* update dockerfile refs
* more llama-cli(.exe)
* fix test-eval-callback
* rename: llama-cli-cmake-pkg(.exe)
* address gbnf-validator unused fread warning (switched to C++ / ifstream)
* add two missing llama- prefixes
* Updating docs for eval-callback binary to use new `llama-` prefix.
* Updating a few lingering doc references for rename of main to llama-cli
* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.
* Updating documentation references for lookup-merge and export-lora
* Updating two small `main` references missed earlier in the finetune docs.
* Update apps.nix
* update grammar/README.md w/ new llama-* names
* update llama-rpc-server bin name + doc
* Revert "update llama-rpc-server bin name + doc"
This reverts commit e474ef1df481fd8936cd7d098e3065d7de378930.
* add hot topic notice to README.md
* Update README.md
* Update README.md
* rename gguf-split & quantize bins refs in **/tests.sh
---------
Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-13 00:41:52 +01:00
Concedo
6659742a2d
do not merge the removal of opencl
2024-06-05 10:57:52 +08:00
Georgi Gerganov
1442677f92
common : refactor cli arg parsing ( #7675 )
...
* common : gpt_params_parse do not print usage
* common : rework usage print (wip)
* common : valign
* common : rework print_usage
* infill : remove cfg support
* common : reorder args
* server : deduplicate parameters
ggml-ci
* common : add missing header
ggml-ci
* common : remote --random-prompt usages
ggml-ci
* examples : migrate to gpt_params
ggml-ci
* batched-bench : migrate to gpt_params
* retrieval : migrate to gpt_params
* common : change defaults for escape and n_ctx
* common : remove chatml and instruct params
ggml-ci
* common : passkey use gpt_params
2024-06-04 21:23:39 +03:00
Concedo
a97f7d5f91
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/full.Dockerfile
# .devops/main-cuda.Dockerfile
# .devops/main-intel.Dockerfile
# .devops/main-rocm.Dockerfile
# .devops/main.Dockerfile
# .devops/server-cuda.Dockerfile
# .devops/server-intel.Dockerfile
# .devops/server-rocm.Dockerfile
# .devops/server.Dockerfile
# .devops/tools.sh
# .github/workflows/docker.yml
# CMakeLists.txt
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# llama.cpp
# requirements.txt
# requirements/requirements-convert-hf-to-gguf-update.txt
# requirements/requirements-convert-hf-to-gguf.txt
# requirements/requirements-convert-legacy-llama.txt
# requirements/requirements-convert-llama-ggml-to-gguf.txt
# scripts/check-requirements.sh
# scripts/compare-llama-bench.py
# scripts/convert-gg.sh
# scripts/pod-llama.sh
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-tokenizer-0.sh
# tests/test-tokenizer-random.py
2024-06-02 12:28:38 +08:00
Galunid
9c4c9cc83f
Move convert.py to examples/convert-legacy-llama.py ( #7430 )
...
* Move convert.py to examples/convert-no-torch.py
* Fix CI, scripts, readme files
* convert-no-torch -> convert-legacy-llama
* Move vocab thing to vocab.py
* Fix convert-no-torch -> convert-legacy-llama
* Fix lost convert.py in ci/run.sh
* Fix imports
* Fix gguf not imported correctly
* Fix flake8 complaints
* Fix check-requirements.sh
* Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE
* Review fixes
2024-05-30 21:40:00 +10:00
Concedo
4ed9ba7352
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/docker.yml
# CMakeLists.txt
# Makefile
# README.md
# flake.lock
# tests/test-backend-ops.cpp
2024-05-28 21:57:19 +08:00
Ikko Eltociear Ashimine
74b239b3d5
llava : update clip.h ( #7580 )
...
overriden -> overridden
2024-05-28 12:48:16 +10:00
Concedo
9282c307ed
this commit does not work, just for debugging
2024-05-23 20:13:47 +08:00
Georgi Gerganov
6ff13987ad
common : normalize naming style ( #7462 )
...
* common : normalize naming style
ggml-ci
* common : match declaration / definition order
* zig : try to fix build
2024-05-22 20:04:20 +03:00
Concedo
47cbfd6150
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# README.md
# llama.cpp
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/test-backend-ops.cpp
2024-05-17 22:30:41 +08:00
slaren
344f9126cc
ggml : tag ggml_tensor::backend as deprecated ( #7290 )
2024-05-15 15:08:48 +02:00
Concedo
2ee808a747
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# README.md
# ci/run.sh
# llama.cpp
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# requirements.txt
# scripts/compare-llama-bench.py
# scripts/sync-ggml.last
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-grammar-integration.cpp
# tests/test-tokenizer-1-bpe.cpp
2024-05-14 19:28:47 +08:00
k.h.lai
30e70334f7
llava-cli: fix base64 prompt ( #7248 )
2024-05-14 00:02:36 +10:00
Justine Tunney
4e3880978f
Fix memory bug in grammar parser ( #7194 )
...
The llama.cpp grammar parser had a bug where forgetting to add a closing
quotation mark to strings would cause parsing to crash. Anyone running a
server on a public endpoint is advised to upgrade. To reproduce this bug
./llamafile -m foo.gguf -p bar --grammar 'root::="'
Credit for discovering and reporting this issue goes to Eclypsium
Security Researcher Richard Johnson <Richard.johnson@eclypsium.com>.
2024-05-10 21:01:08 +10:00
Andrei
d11afd6652
llava : fix moondream support ( #7163 )
...
* Revert "Revert "llava : add support for moondream vision language model (#6899 )""
This reverts commit 9da243b36a .
* Fix num_positions and embeddings initialization
2024-05-10 09:41:10 +03:00
Georgi Gerganov
9da243b36a
Revert "llava : add support for moondream vision language model ( #6899 )"
...
This reverts commit 46e12c4692 .
2024-05-08 22:14:39 +03:00
Concedo
bc39b4d98a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# README.md
# ci/run.sh
# docs/BLIS.md
# flake.lock
# grammars/README.md
2024-05-08 09:58:23 +08:00
omahs
04976db7a8
docs: fix typos ( #7124 )
...
* fix typo
* fix typos
* fix typo
* fix typos
* fix typo
* fix typos
2024-05-07 18:20:33 +03:00
Concedo
89db8afded
revert moondream to try and fix llava
2024-05-04 10:07:54 +08:00
Concedo
17a24d753c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/main-intel.Dockerfile
# .devops/main-vulkan.Dockerfile
# .devops/server-intel.Dockerfile
# .devops/server-vulkan.Dockerfile
# .github/workflows/bench.yml
# .github/workflows/build.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .gitignore
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# flake.lock
# llama.cpp
# models/ggml-vocab-falcon.gguf
# models/ggml-vocab-llama-spm.gguf
# models/ggml-vocab-mpt.gguf
# models/ggml-vocab-stablelm.gguf
# models/ggml-vocab-starcoder.gguf
# requirements.txt
# scripts/check-requirements.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-grammar-integration.cpp
# tests/test-tokenizer-0-bpe.py
# tests/test-tokenizer-0-spm.py
# tests/test-tokenizer-1-spm.cpp
2024-04-30 21:04:17 +08:00
cpumaxx
ffe666572f
llava-cli : multiple images ( #6969 )
...
Co-authored-by: root <root@nenya.lothlorien.ca>
2024-04-29 17:34:24 +03:00
Concedo
544c36f751
merge, deprecate openblas
2024-04-26 19:24:59 +08:00
vik
46e12c4692
llava : add support for moondream vision language model ( #6899 )
...
* add support for moondream vision language model
This required making the following changes to the CLIP model:
1. Support for patch embedding bias.
2. Make class embedding and pre-layernorm optional.
3. Add support for post-layernorm.
* Update examples/llava/clip.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-25 22:38:31 +03:00
Daniel Bevenius
4ab99d8d47
clip : rename lerp function to avoid conflict ( #6894 )
...
This commit renamesthe lerp (linear interpolation) function in clip.cpp
to avoid a conflict with the lerp function in the <cmath> standard C++
library when using c++20.
The motivation for this change is to enable projects that use c++20 to
be able to compile clip.cpp without having to resort to patching it. The
lerp function was added to cmath in version C++20 (202002L) and is why
this is not causing any issue at the moment as C++11/C++17 is currently
used by llama.cpp.
I realize that llama.cpp uses either C++11 (or C++17 in the case for
SYCL) but wanted to ask if this would be an acceptable change just the
same.
Refs: https://en.cppreference.com/w/cpp/numeric/lerp
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-25 15:38:14 +03:00
Concedo
b4d2031215
merged, added ability to render special tokens
2024-04-22 18:19:58 +08:00
Justine Tunney
89b0bf0d5d
llava : use logger in llava-cli ( #6797 )
...
This change removes printf() logging so llava-cli is shell scriptable.
2024-04-21 15:19:04 +03:00
Pedro Cuenca
b97bc3966e
llama : support Llama 3 HF conversion ( #6745 )
...
* Support Llama 3 conversion
The tokenizer is BPE.
* style
* Accept suggestion
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
* llama : add llama_token_is_eog()
ggml-ci
* llama : auto-detect more EOT tokens when missing in KV data
* convert : replacing EOS token is a hack
* llama : fix codegemma EOT token + add TODOs
* llama : fix model type string for 8B model
---------
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-21 14:50:41 +03:00
Concedo
9a25d77cc1
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# ggml-cuda.cu
# ggml.c
# grammars/README.md
# scripts/get-wikitext-2.sh
# scripts/hf.sh
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
2024-04-14 21:18:39 +08:00
Rene Leonhardt
5c4d767ac0
chore: Fix markdown warnings ( #6625 )
2024-04-12 10:52:36 +02:00
Jared Van Bortel
1b67731e18
BERT tokenizer fixes ( #6498 )
...
Key changes:
* BERT conversion: fix abuse of LlamaHfVocab, do not set BOS or EOS
* Nomic Embed conversion: pad vocab instead of slicing embedding tensor
* llama_tokenize: handle added special tokens like HF does
2024-04-09 13:44:08 -04:00
Concedo
81ac0e5656
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/full.Dockerfile
# .devops/llama-cpp-clblast.srpm.spec
# .devops/llama-cpp-cuda.srpm.spec
# .devops/llama-cpp.srpm.spec
# .devops/nix/package.nix
# .devops/server-cuda.Dockerfile
# .devops/server-intel.Dockerfile
# .devops/server-rocm.Dockerfile
# .devops/server-vulkan.Dockerfile
# .devops/server.Dockerfile
# .github/workflows/build.yml
# .github/workflows/code-coverage.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/gguf-publish.yml
# .github/workflows/nix-ci-aarch64.yml
# .github/workflows/nix-ci.yml
# .github/workflows/python-check-requirements.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .github/workflows/zig-build.yml
# CMakeLists.txt
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# examples/gguf-split/gguf-split.cpp
# flake.lock
# flake.nix
# llama.cpp
# scripts/compare-llama-bench.py
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
2024-04-07 22:07:27 +08:00
Concedo
a530afa1e4
Merge commit ' 280345968d' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/llama-cpp-cuda.srpm.spec
# .devops/main-cuda.Dockerfile
# .devops/nix/package.nix
# .devops/server-cuda.Dockerfile
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README.md
# ci/run.sh
# docs/token_generation_performance_tips.md
# flake.lock
# llama.cpp
# scripts/LlamaConfig.cmake.in
# scripts/compare-commits.sh
# scripts/server-llm.sh
# tests/test-quantize-fns.cpp
2024-04-07 20:27:17 +08:00
Concedo
9c0fbf9f73
Merge commit ' ad3a0505e3' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/close-issue.yml
# .github/workflows/code-coverage.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/nix-ci-aarch64.yml
# .github/workflows/nix-ci.yml
# .github/workflows/python-check-requirements.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .github/workflows/zig-build.yml
# .gitignore
# CMakeLists.txt
# Makefile
# README-sycl.md
# README.md
# build.zig
# common/CMakeLists.txt
# llama.cpp
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
2024-04-06 18:32:57 +08:00
Ziang Wu
66ba560256
llava : fix MobileVLM ( #6364 )
...
* fix empty bug
* Update MobileVLM-README.md
added more results on devices
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update MobileVLM-README.md
remove gguf links
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-28 16:33:10 +02:00
Ziang Wu
d0e2f6416b
doc: fix typo in MobileVLM-README.md ( #6181 )
2024-03-28 13:03:30 +09:00