Concedo
6659742a2d
do not merge the removal of opencl
2024-06-05 10:57:52 +08:00
Georgi Gerganov
1442677f92
common : refactor cli arg parsing (#7675)
...
* common : gpt_params_parse do not print usage
* common : rework usage print (wip)
* common : valign
* common : rework print_usage
* infill : remove cfg support
* common : reorder args
* server : deduplicate parameters
ggml-ci
* common : add missing header
ggml-ci
* common : remove --random-prompt usages
ggml-ci
* examples : migrate to gpt_params
ggml-ci
* batched-bench : migrate to gpt_params
* retrieval : migrate to gpt_params
* common : change defaults for escape and n_ctx
* common : remove chatml and instruct params
ggml-ci
* common : passkey use gpt_params
2024-06-04 21:23:39 +03:00
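[Editor's note: a minimal sketch of the post-refactor calling pattern in an example's main(), assuming the common-library API of this era — per the bullets above, gpt_params_parse no longer prints usage itself, so the caller does via the reworked print_usage helper:]

```cpp
// Hedged sketch, not the actual example code: parse CLI flags into the
// shared gpt_params struct and print usage ourselves on parse failure.
#include "common.h"

int main(int argc, char ** argv) {
    gpt_params params;
    if (!gpt_params_parse(argc, argv, params)) {
        gpt_params_print_usage(argc, argv, params); // caller-side usage print
        return 1;
    }
    // params.prompt, params.n_ctx, ... are now populated from the CLI
    return 0;
}
```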
Concedo
a97f7d5f91
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/full.Dockerfile
# .devops/main-cuda.Dockerfile
# .devops/main-intel.Dockerfile
# .devops/main-rocm.Dockerfile
# .devops/main.Dockerfile
# .devops/server-cuda.Dockerfile
# .devops/server-intel.Dockerfile
# .devops/server-rocm.Dockerfile
# .devops/server.Dockerfile
# .devops/tools.sh
# .github/workflows/docker.yml
# CMakeLists.txt
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# llama.cpp
# requirements.txt
# requirements/requirements-convert-hf-to-gguf-update.txt
# requirements/requirements-convert-hf-to-gguf.txt
# requirements/requirements-convert-legacy-llama.txt
# requirements/requirements-convert-llama-ggml-to-gguf.txt
# scripts/check-requirements.sh
# scripts/compare-llama-bench.py
# scripts/convert-gg.sh
# scripts/pod-llama.sh
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-tokenizer-0.sh
# tests/test-tokenizer-random.py
2024-06-02 12:28:38 +08:00
Galunid
9c4c9cc83f
Move convert.py to examples/convert-legacy-llama.py (#7430)
...
* Move convert.py to examples/convert-no-torch.py
* Fix CI, scripts, readme files
* convert-no-torch -> convert-legacy-llama
* Move vocab thing to vocab.py
* Fix convert-no-torch -> convert-legacy-llama
* Fix lost convert.py in ci/run.sh
* Fix imports
* Fix gguf not imported correctly
* Fix flake8 complaints
* Fix check-requirements.sh
* Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE
* Review fixes
2024-05-30 21:40:00 +10:00
Concedo
4ed9ba7352
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/docker.yml
# CMakeLists.txt
# Makefile
# README.md
# flake.lock
# tests/test-backend-ops.cpp
2024-05-28 21:57:19 +08:00
Ikko Eltociear Ashimine
74b239b3d5
llava : update clip.h (#7580)
...
overriden -> overridden
2024-05-28 12:48:16 +10:00
Concedo
9282c307ed
this commit does not work, just for debugging
2024-05-23 20:13:47 +08:00
Georgi Gerganov
6ff13987ad
common : normalize naming style (#7462)
...
* common : normalize naming style
ggml-ci
* common : match declaration / definition order
* zig : try to fix build
2024-05-22 20:04:20 +03:00
Concedo
47cbfd6150
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# README.md
# llama.cpp
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/test-backend-ops.cpp
2024-05-17 22:30:41 +08:00
slaren
344f9126cc
ggml : tag ggml_tensor::backend as deprecated (#7290)
2024-05-15 15:08:48 +02:00
Concedo
2ee808a747
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# README.md
# ci/run.sh
# llama.cpp
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# requirements.txt
# scripts/compare-llama-bench.py
# scripts/sync-ggml.last
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-grammar-integration.cpp
# tests/test-tokenizer-1-bpe.cpp
2024-05-14 19:28:47 +08:00
k.h.lai
30e70334f7
llava-cli: fix base64 prompt (#7248)
2024-05-14 00:02:36 +10:00
Justine Tunney
4e3880978f
Fix memory bug in grammar parser (#7194)
...
The llama.cpp grammar parser had a bug where forgetting to add a closing
quotation mark to strings would cause parsing to crash. Anyone running a
server on a public endpoint is advised to upgrade. To reproduce this bug:
./llamafile -m foo.gguf -p bar --grammar 'root::="'
Credit for discovering and reporting this issue goes to Eclypsium
Security Researcher Richard Johnson <Richard.johnson@eclypsium.com>.
2024-05-10 21:01:08 +10:00
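[Editor's note: the class of fix involved is worth spelling out. Below is a hedged, self-contained sketch — a hypothetical helper, not the actual grammar-parser code — of the defensive pattern: a string literal must treat end-of-input before the closing quote as a parse error rather than reading past the buffer:]

```cpp
#include <stdexcept>
#include <string>

// Parse a double-quoted literal from [pos, end); throws instead of
// walking past the end of the buffer when the closing quote is missing.
static std::string parse_quoted(const char *& pos, const char * end) {
    std::string out;
    ++pos;                                   // skip opening '"'
    while (pos != end && *pos != '"') {
        out += *pos++;
    }
    if (pos == end) {
        throw std::runtime_error("expected closing quote");
    }
    ++pos;                                   // skip closing '"'
    return out;
}
```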
Andrei
d11afd6652
llava : fix moondream support (#7163)
...
* Revert "Revert "llava : add support for moondream vision language model (#6899)""
This reverts commit 9da243b36a.
* Fix num_positions and embeddings initialization
2024-05-10 09:41:10 +03:00
Georgi Gerganov
9da243b36a
Revert "llava : add support for moondream vision language model ( #6899 )"
...
This reverts commit 46e12c4692
.
2024-05-08 22:14:39 +03:00
Concedo
bc39b4d98a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# README.md
# ci/run.sh
# docs/BLIS.md
# flake.lock
# grammars/README.md
2024-05-08 09:58:23 +08:00
omahs
04976db7a8
docs: fix typos (#7124)
...
* fix typo
* fix typos
* fix typo
* fix typos
* fix typo
* fix typos
2024-05-07 18:20:33 +03:00
Concedo
89db8afded
revert moondream to try and fix llava
2024-05-04 10:07:54 +08:00
Concedo
17a24d753c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/main-intel.Dockerfile
# .devops/main-vulkan.Dockerfile
# .devops/server-intel.Dockerfile
# .devops/server-vulkan.Dockerfile
# .github/workflows/bench.yml
# .github/workflows/build.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .gitignore
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# flake.lock
# llama.cpp
# models/ggml-vocab-falcon.gguf
# models/ggml-vocab-llama-spm.gguf
# models/ggml-vocab-mpt.gguf
# models/ggml-vocab-stablelm.gguf
# models/ggml-vocab-starcoder.gguf
# requirements.txt
# scripts/check-requirements.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-grammar-integration.cpp
# tests/test-tokenizer-0-bpe.py
# tests/test-tokenizer-0-spm.py
# tests/test-tokenizer-1-spm.cpp
2024-04-30 21:04:17 +08:00
cpumaxx
ffe666572f
llava-cli : multiple images (#6969)
...
Co-authored-by: root <root@nenya.lothlorien.ca>
2024-04-29 17:34:24 +03:00
Concedo
544c36f751
merge, deprecate openblas
2024-04-26 19:24:59 +08:00
vik
46e12c4692
llava : add support for moondream vision language model (#6899)
...
* add support for moondream vision language model
This required making the following changes to the CLIP model:
1. Support for patch embedding bias.
2. Make class embedding and pre-layernorm optional.
3. Add support for post-layernorm.
* Update examples/llava/clip.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-25 22:38:31 +03:00
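[Editor's note: a framework-free sketch of what making the pre/post layernorms optional can look like; the struct and function names here are illustrative, not the actual clip.cpp code:]

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct LayerNorm {
    std::vector<float> w, b;              // empty => the layer is absent
    bool present() const { return !w.empty(); }
};

static void layer_norm(std::vector<float> & x, const LayerNorm & ln, float eps = 1e-5f) {
    float mean = 0.0f, var = 0.0f;
    for (float v : x) mean += v;
    mean /= (float) x.size();
    for (float v : x) var += (v - mean) * (v - mean);
    var /= (float) x.size();
    const float inv = 1.0f / std::sqrt(var + eps);
    for (size_t i = 0; i < x.size(); ++i) {
        x[i] = (x[i] - mean) * inv * ln.w[i] + ln.b[i];
    }
}

// In the forward pass, each norm runs only if the checkpoint provides it:
// moondream omits the pre-layernorm and adds a post-layernorm.
static void vision_tower_forward(std::vector<float> & h, const LayerNorm & pre, const LayerNorm & post) {
    if (pre.present())  layer_norm(h, pre);
    // ... transformer blocks ...
    if (post.present()) layer_norm(h, post);
}
```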
Daniel Bevenius
4ab99d8d47
clip : rename lerp function to avoid conflict (#6894)
...
This commit renames the lerp (linear interpolation) function in clip.cpp
to avoid a conflict with the lerp function in the <cmath> standard C++
library when using C++20.
The motivation for this change is to enable projects that use C++20 to
compile clip.cpp without having to resort to patching it. The lerp
function was added to <cmath> in C++20 (202002L), which is why it is not
causing any issue at the moment, as llama.cpp currently uses C++11/C++17.
I realize that llama.cpp uses either C++11 or C++17 (in the case of
SYCL), but wanted to ask if this would be an acceptable change just the
same.
Refs: https://en.cppreference.com/w/cpp/numeric/lerp
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-25 15:38:14 +03:00
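[Editor's note: for context, one minimal way the clash can surface in a C++20 translation unit — an assumption-laden repro, not the clip.cpp code; the real conflict depends on how the std name becomes visible:]

```cpp
#include <cmath>        // C++20 declares std::lerp here

using namespace std;    // makes std::lerp visible unqualified

static float lerp(float a, float b, float t) {  // clip.cpp-style helper
    return a + t * (b - a);
}

int main() {
    // Ambiguous under -std=c++20: ::lerp and std::lerp both match exactly.
    return (int) lerp(0.0f, 10.0f, 0.5f);
}
```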
Concedo
b4d2031215
merged, added ability to render special tokens
2024-04-22 18:19:58 +08:00
Justine Tunney
89b0bf0d5d
llava : use logger in llava-cli (#6797)
...
This change removes printf() logging so llava-cli is shell-scriptable.
2024-04-21 15:19:04 +03:00
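[Editor's note: the design point, sketched with illustrative code rather than llava-cli's actual logger — diagnostics go to stderr so stdout carries only the generated text and the tool can be piped:]

```cpp
#include <cstdio>

static void log_info(const char * msg) {
    fprintf(stderr, "%s\n", msg);   // diagnostics -> stderr
}

int main() {
    log_info("loading model...");   // visible on the terminal, not in pipes
    printf("generated text\n");     // stdout carries only the payload
    return 0;
}
```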
Pedro Cuenca
b97bc3966e
llama : support Llama 3 HF conversion (#6745)
...
* Support Llama 3 conversion
The tokenizer is BPE.
* style
* Accept suggestion
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
* llama : add llama_token_is_eog()
ggml-ci
* llama : auto-detect more EOT tokens when missing in KV data
* convert : replacing EOS token is a hack
* llama : fix codegemma EOT token + add TODOs
* llama : fix model type string for 8B model
---------
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-21 14:50:41 +03:00
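[Editor's note: a hedged sketch of how the llama_token_is_eog() added here slots into a sampling loop; the wrapper is illustrative, only the EOG call comes from this change:]

```cpp
#include "llama.h"

// Stop generation on any end-of-generation token (EOS, EOT, ...) the
// model declares, instead of hard-coding a single EOS id.
static bool should_stop(const llama_model * model, llama_token tok) {
    return llama_token_is_eog(model, tok);
}
```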
Concedo
9a25d77cc1
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# ggml-cuda.cu
# ggml.c
# grammars/README.md
# scripts/get-wikitext-2.sh
# scripts/hf.sh
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
2024-04-14 21:18:39 +08:00
Rene Leonhardt
5c4d767ac0
chore: Fix markdown warnings (#6625)
2024-04-12 10:52:36 +02:00
Jared Van Bortel
1b67731e18
BERT tokenizer fixes (#6498)
...
Key changes:
* BERT conversion: fix abuse of LlamaHfVocab, do not set BOS or EOS
* Nomic Embed conversion: pad vocab instead of slicing embedding tensor
* llama_tokenize: handle added special tokens like HF does
2024-04-09 13:44:08 -04:00
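[Editor's note: a hedged sketch of tokenizing with special-token handling enabled, per the last bullet; the exact flag names in llama.h have shifted over time, so treat the two booleans as assumptions about this era's C API:]

```cpp
#include "llama.h"
#include <string>
#include <vector>

static std::vector<llama_token> tokenize(const llama_model * model, const std::string & text) {
    std::vector<llama_token> tokens(text.size() + 8);  // worst-case headroom
    const int n = llama_tokenize(model, text.c_str(), (int) text.size(),
                                 tokens.data(), (int) tokens.size(),
                                 /*add_special=*/true,    // model's BOS/EOS policy
                                 /*parse_special=*/true); // match added tokens like HF
    tokens.resize(n < 0 ? 0 : n);
    return tokens;
}
```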
Concedo
81ac0e5656
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/full.Dockerfile
# .devops/llama-cpp-clblast.srpm.spec
# .devops/llama-cpp-cuda.srpm.spec
# .devops/llama-cpp.srpm.spec
# .devops/nix/package.nix
# .devops/server-cuda.Dockerfile
# .devops/server-intel.Dockerfile
# .devops/server-rocm.Dockerfile
# .devops/server-vulkan.Dockerfile
# .devops/server.Dockerfile
# .github/workflows/build.yml
# .github/workflows/code-coverage.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/gguf-publish.yml
# .github/workflows/nix-ci-aarch64.yml
# .github/workflows/nix-ci.yml
# .github/workflows/python-check-requirements.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .github/workflows/zig-build.yml
# CMakeLists.txt
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# examples/gguf-split/gguf-split.cpp
# flake.lock
# flake.nix
# llama.cpp
# scripts/compare-llama-bench.py
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
2024-04-07 22:07:27 +08:00
Concedo
a530afa1e4
Merge commit '280345968d' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/llama-cpp-cuda.srpm.spec
# .devops/main-cuda.Dockerfile
# .devops/nix/package.nix
# .devops/server-cuda.Dockerfile
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README.md
# ci/run.sh
# docs/token_generation_performance_tips.md
# flake.lock
# llama.cpp
# scripts/LlamaConfig.cmake.in
# scripts/compare-commits.sh
# scripts/server-llm.sh
# tests/test-quantize-fns.cpp
2024-04-07 20:27:17 +08:00
Concedo
9c0fbf9f73
Merge commit 'ad3a0505e3' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/close-issue.yml
# .github/workflows/code-coverage.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/nix-ci-aarch64.yml
# .github/workflows/nix-ci.yml
# .github/workflows/python-check-requirements.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .github/workflows/zig-build.yml
# .gitignore
# CMakeLists.txt
# Makefile
# README-sycl.md
# README.md
# build.zig
# common/CMakeLists.txt
# llama.cpp
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
2024-04-06 18:32:57 +08:00
Ziang Wu
66ba560256
llava : fix MobileVLM (#6364)
...
* fix empty bug
* Update MobileVLM-README.md
added more results on devices
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update MobileVLM-README.md
remove gguf links
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-28 16:33:10 +02:00
Ziang Wu
d0e2f6416b
doc: fix typo in MobileVLM-README.md (#6181)
2024-03-28 13:03:30 +09:00
slaren
280345968d
cuda : rename build flag to LLAMA_CUDA (#6299)
2024-03-26 01:16:01 +01:00
Ziang Wu
f9c7ba3447
llava : update MobileVLM-README.md (#6180)
2024-03-20 17:29:51 +02:00
Ziang Wu
272935b281
llava : add MobileVLM_V2 backup (#6175)
...
* Add MobileVLM_V2 backup
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/llava/convert-image-encoder-to-gguf.py
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* clip : fix whitespace
* fix definition mistake in clip.cpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-20 17:02:32 +02:00
Georgi Gerganov
d795988d9e
Revert "llava : add a MobileVLM_V2-1.7B backup ( #6152 )"
...
This reverts commit f8c4e745e1
.
2024-03-20 13:29:49 +02:00
Ziang Wu
f8c4e745e1
llava : add a MobileVLM_V2-1.7B backup (#6152)
...
* Add MobileVLM_V2 backup
* Update MobileVLM-README.md
* Update examples/llava/MobileVLM-README.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/llava/convert-image-encoder-to-gguf.py
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* clip : fix whitespace
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-20 13:20:37 +02:00
Concedo
a3fa919c67
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# Makefile
# flake.lock
# ggml-cuda.cu
# ggml-cuda.h
2024-03-19 18:57:22 +08:00
Felix
104f5e0fc1
clip : fix memory leak (#6138)
2024-03-18 17:40:22 +02:00
Concedo
8b360b661c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# Makefile
# README.md
# common/common.h
2024-03-17 23:03:12 +08:00
Ting Lou
4e9a7f7f7f
llava : change API to pure C style for Rust FFI bindgen (#6079)
...
Co-authored-by: Lou Ting <louting.t@alibaba-inc.com>
2024-03-15 16:31:05 +02:00
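[Editor's note: the pattern the commit refers to, sketched as a header fragment paraphrasing clip.h rather than quoting it — opaque handles plus extern "C" linkage, so bindgen-style tools can consume the header without C++ name mangling:]

```cpp
#ifdef __cplusplus
extern "C" {
#endif

struct clip_ctx;  // opaque: fields stay private to the C++ implementation

struct clip_ctx * clip_model_load(const char * fname, int verbosity);
void              clip_free(struct clip_ctx * ctx);

#ifdef __cplusplus
}
#endif
```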
Concedo
93d3871056
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# ggml-metal.m
2024-03-15 10:37:48 +08:00
Steve Grubb
6e0438da3c
gguf : fix resource leaks (#6061)
...
There are several places where a gguf context is allocated. A call to
gguf_free is missing in some error paths. Also, on Linux, llama-bench
was missing an fclose.
2024-03-14 20:29:32 +02:00
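[Editor's note: the leak pattern being fixed, as a hedged sketch of assumed caller code rather than the actual sites — every early return after gguf_init_from_file succeeds must pair with gguf_free:]

```cpp
#include "ggml.h"  // the gguf API lived here at the time

static bool load_metadata(const char * fname) {
    struct gguf_init_params params = { /*.no_alloc =*/ true, /*.ctx =*/ NULL };
    struct gguf_context * ctx = gguf_init_from_file(fname, params);
    if (ctx == NULL) {
        return false;                // nothing allocated yet
    }
    if (gguf_get_n_kv(ctx) == 0) {
        gguf_free(ctx);              // the error-path free this commit adds
        return false;
    }
    // ... read metadata ...
    gguf_free(ctx);
    return true;
}
```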
Jian Liao
15a333260a
readme : improve readme for Llava-1.6 example (#6044)
...
Co-authored-by: Jian Liao <jianliao@adobe.com>
2024-03-14 13:18:23 +02:00
Concedo
6a32c14e86
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README-sycl.md
# README.md
# flake.lock
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/.gitignore
# tests/test-backend-ops.cpp
2024-03-11 23:00:47 +08:00
Concedo
227f59dab6
added a simple program to do quantization for clip models
2024-03-11 21:50:30 +08:00
Georgi Gerganov
5b09797321
ggml : remove old quantization functions (#5942)
...
* ggml : remove old quantization functions
ggml-ci
* ggml : simplify ggml_quantize_chunk
ggml-ci
* ggml : restrict correctness
ggml-ci
* ggml : remove hist data from the quantization API
ggml-ci
* tests : remove hist usage in test-backend-ops
ggml-ci
* vulkan : remove hist and fix typo
2024-03-09 15:53:59 +02:00
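[Editor's note: a hedged sketch of the simplified call after the histogram removal; the buffer sizing via ggml_row_size and the exact parameter order are assumptions about this era's ggml.h:]

```cpp
#include "ggml.h"
#include <cstdint>
#include <vector>

static std::vector<uint8_t> quantize_q4_0(const float * src, int64_t nrows, int64_t n_per_row) {
    std::vector<uint8_t> dst(ggml_row_size(GGML_TYPE_Q4_0, n_per_row) * nrows);
    ggml_quantize_chunk(GGML_TYPE_Q4_0, src, dst.data(),
                        /*start=*/0, nrows, n_per_row,
                        /*imatrix=*/nullptr);  // no histogram buffer anymore
    return dst;
}
```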
Georgi Gerganov
ab336a9d5e
code : normalize enum names (#5697)
...
* code : normalize enum names
ggml-ci
* code : cont
* code : cont
2024-02-25 12:09:09 +02:00