Concedo
02f92f6ecc
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-cli-rocm.Dockerfile
# .devops/llama-cli-vulkan.Dockerfile
# .devops/llama-cpp-cuda.srpm.spec
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .github/workflows/build.yml
# .github/workflows/docker.yml
# CMakeLists.txt
# Makefile
# README.md
# examples/llama.android/llama/src/main/cpp/CMakeLists.txt
# flake.lock
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# grammars/README.md
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
2024-06-30 10:59:42 +08:00
henk717
8421243c6d
Chat Adapters ( #956 )
...
* Give the CI builds a recognizable AVX1 name
* Chat Adapters
2024-06-30 10:28:43 +08:00
Concedo
d120c55e12
try to fix build errors (+1 squashed commit)
...
Squashed commits:
[27c28292] try to fix build errors
2024-06-29 23:11:00 +08:00
Concedo
9c10486204
merge the file structure refactor, testing
2024-06-29 12:14:38 +08:00
Nexesenex
a5a32b9179
Add Cuda Graphs in CMakeList ( #947 )
...
original LCPP PR 3242 by slaren
2024-06-29 10:10:51 +08:00
Xuan Son Nguyen
72272b83a3
fix code typo in llama-cli ( #8198 )
2024-06-29 00:14:20 +02:00
Olivier Chafik
8748d8ac6f
json: attempt to skip slow tests when running under emulator ( #8189 )
2024-06-28 18:02:05 +01:00
Concedo
2857fb52ba
remove contribute template
2024-06-28 23:47:03 +08:00
Nexesenex
1203a56f6b
Add Cublas12 dlls to .gitignore ( #946 )
2024-06-28 23:41:44 +08:00
Xuan Son Nguyen
26a39bbd6b
Add MiniCPM, Deepseek V2 chat template + clean up llama_chat_apply_template_internal ( #8172 )
...
* tmp_contains
* minicpm chat template
* add DeepSeek Lite template
* change deepseek-lite to deepseek2
* correct code comment
* correct code from master branch
2024-06-28 15:11:44 +02:00
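For context on the `tmp_contains` cleanup in the entry above: a minimal sketch of substring-based template detection, assuming illustrative marker strings; the helper name and the exact needles here are stand-ins, not the actual llama.cpp code.

```cpp
#include <string>

// Illustrative stand-in for the "contains" helper: template detection by
// checking whether the Jinja template source holds a characteristic marker.
static bool tmpl_contains(const std::string & tmpl, const std::string & needle) {
    return tmpl.find(needle) != std::string::npos;
}

static const char * detect_template(const std::string & tmpl) {
    if (tmpl_contains(tmpl, "<|im_start|>")) return "chatml";
    if (tmpl_contains(tmpl, "<用户>"))        return "minicpm";  // assumed marker
    return "unknown";
}
```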
Sigbjørn Skjæret
38373cfbab
Add SPM infill support ( #8016 )
...
* add --spm-infill option
* support --spm-infill
* support --spm-infill
2024-06-28 12:53:43 +02:00
slaren
b851b3fba0
cmake : allow user to override default options ( #8178 )
2024-06-28 12:37:45 +02:00
Concedo
5e671c2162
ignore utf decode errors
2024-06-28 16:39:48 +08:00
Olivier Chafik
139cc621e9
json: restore default additionalProperties to false, fix some pattern escapes ( #8180 )
...
* json: expand ESCAPED_IN_REGEXPS_BUT_NOT_IN_LITERALS charset
* json: revert default of additionalProperties to false
* Update README.md
2024-06-28 09:26:45 +01:00
Concedo
3e6a9673ef
add runpod to readme
2024-06-28 14:47:11 +08:00
pculliton
e57dc62057
llama: Add support for Gemma2ForCausalLM ( #8156 )
...
* Inference support for Gemma 2 model family
* Update convert-hf-to-gguf.py, constants, and tensor mappings
* cleanup
* format fix
* Fix special token vocab bug
* Don't add space prefix
* fix deleted lines
* Update src/llama.cpp
Co-authored-by: slaren <slarengh@gmail.com>
* Add model type names
* Add control vector
* Fix model type identification
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-27 21:00:43 -07:00
Xuan Son Nguyen
a27aa50ab7
Add missing items in makefile ( #8177 )
2024-06-28 02:19:11 +02:00
Olivier Chafik
cb0b06a8a6
json: update grammars/README w/ examples & note about additionalProperties ( #8132 )
...
* json: update grammars/README
* mention broken prefixItems
* add mention to llama-gbnf-validator
* json: explicit type: object for nested items object in cli example
2024-06-27 22:08:42 +01:00
loonerin
558f44bf83
CI: fix release build (Ubuntu+Mac) ( #8170 )
...
* CI: fix release build (Ubuntu)
PR #8006 changed the defaults to build shared libs, but CI for releases
expects static builds.
* CI: fix release build (Mac)
---------
Co-authored-by: loonerin <loonerin@users.noreply.github.com>
2024-06-27 21:01:23 +02:00
slaren
8172ee9da9
cmake : fix deprecated option names not working ( #8171 )
...
* cmake : fix deprecated option names not working
* remove LLAMA_OPENMP
2024-06-27 20:04:39 +02:00
Xuan Son Nguyen
16791b8f0b
Add chatml fallback for cpp llama_chat_apply_template ( #8160 )
...
* add chatml fallback for cpp `llama_chat_apply_template`
* remove redundant code
2024-06-27 18:14:19 +02:00
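A minimal caller-side sketch of the fallback behavior added above, assuming the llama.h API of this period (llama_chat_apply_template returning a negative value for an unrecognized template); the wrapper name is illustrative, not the actual common.cpp code.

```cpp
#include <algorithm>
#include <string>
#include <vector>
#include "llama.h"

// Illustrative wrapper: try the model's own template first (tmpl == nullptr),
// then fall back to chatml when the template is not recognized.
static std::string apply_template_or_chatml(
        const llama_model * model,
        const std::vector<llama_chat_message> & msgs) {
    std::vector<char> buf(8192);
    int32_t n = llama_chat_apply_template(model, nullptr,
            msgs.data(), msgs.size(), /*add_ass=*/true, buf.data(), buf.size());
    if (n < 0) {
        // unrecognized template: retry with the built-in chatml format
        n = llama_chat_apply_template(model, "chatml",
                msgs.data(), msgs.size(), /*add_ass=*/true, buf.data(), buf.size());
    }
    return n >= 0 ? std::string(buf.data(), std::min<size_t>(n, buf.size()))
                  : std::string();
}
```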
Georgi Gerganov
ab3679112d
flake.lock: Update ( #8071 )
...
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/e9ee548d90ff586a6471b4ae80ae9cfcbceb3420?narHash=sha256-4Zu0RYRcAY/VWuu6awwq4opuiD//ahpc2aFHg2CWqFY%3D' (2024-06-13)
→ 'github:NixOS/nixpkgs/d603719ec6e294f034936c0d0dc06f689d91b6c3?narHash=sha256-k3JqJrkdoYwE3fHE6xGDY676AYmyh4U2Zw%2B0Bwe5DLU%3D' (2024-06-20)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Philip Taron <philip.taron@gmail.com>
2024-06-27 08:37:29 -07:00
jukofyork
97877eb10b
Control vector loading fixes ( #8137 )
...
* Fix leak in llama_control_vector_load_one() and allow llama_control_vector_load() to grow
* refactored `llama_control_vector_load_one()`
* allow multiple directions for same layer in same file
* llama_control_vector_load_one() and llama_control_vector_load() now break on error
* removed unnecessary ggml_free() call
2024-06-27 16:48:07 +02:00
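The "grow" and "break on error" points above describe an accumulate pattern; here is a rough sketch under assumed types, where none of the names are the actual common.h definitions.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Assumed, simplified stand-ins for the control-vector types and loader.
struct control_vector {
    int n_embd = 0;            // width of one per-layer direction
    std::vector<float> data;   // directions, layer-major
};

// Stub: the real loader reads a GGUF file; defined here only so the
// sketch is self-contained.
static bool load_one(const std::string & path, control_vector & out) {
    (void) path; out = {}; return false;
}

static control_vector load_all(const std::vector<std::string> & paths) {
    control_vector sum;
    for (const auto & path : paths) {
        control_vector cur;
        if (!load_one(path, cur)) {
            fprintf(stderr, "load failed: %s\n", path.c_str());
            return {};  // break on error, returning no partial result
        }
        if (cur.data.size() > sum.data.size()) {
            sum.data.resize(cur.data.size(), 0.0f);  // grow to fit more layers
        }
        for (size_t i = 0; i < cur.data.size(); i++) {
            sum.data[i] += cur.data[i];  // multiple directions combine additively
        }
        sum.n_embd = cur.n_embd;
    }
    return sum;
}
```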
Raj Hammeer Singh Hada
387952651a
Delete examples/llama.android/llama/CMakeLists.txt ( #8165 )
...
* Delete examples/llama.android/llama/CMakeLists.txt
https://github.com/ggerganov/llama.cpp/pull/8145#issuecomment-2194534244
This file is not used for the Android build; `llama.cpp/examples/llama.android/llama/src/main/cpp/CMakeLists.txt` is used instead.
* Update CMakeLists.txt
Pick local llama.cpp files instead of fetching content from git
2024-06-27 16:39:29 +02:00
Sigbjørn Skjæret
6030c61281
Add Qwen2MoE 57B-A14B model identifier ( #8158 )
...
* Add Qwen2MoE 57B-A14B
* Add Qwen2MoE 57B-A14B
2024-06-27 16:27:41 +02:00
Johannes Gäßler
85a267daaa
CUDA: fix MMQ stream-k for --split-mode row ( #8167 )
2024-06-27 16:26:05 +02:00
Concedo
176923f58b
test wip cmake
2024-06-27 21:53:44 +08:00
Concedo
1801594972
allow forced positive prompt
2024-06-27 20:21:17 +08:00
kustaaya
f675b20a3b
Added support for Viking pre-tokenizer ( #8135 )
...
Co-authored-by: kustaaya <kustaaya@protonmail.com>
2024-06-27 10:58:54 +02:00
Concedo
e433afb261
updated lite
2024-06-27 15:51:34 +08:00
Sigbjørn Skjæret
911e35bb8b
llama : fix CodeLlama FIM token checks ( #8144 )
...
* account for space prefix character
* use find instead
2024-06-27 10:46:41 +03:00
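On "use find instead" above: a sketch of the idea, assuming the FIM marker strings. With a space-prefix character the token piece may not compare equal to the bare marker, while a substring search tolerates the prefix.

```cpp
#include <string>

// Illustrative check: a CodeLlama FIM special token may carry a leading
// space-prefix character (e.g. "▁<PRE>"), so an equality test on the raw
// piece can miss it. Searching for the marker accounts for the prefix.
static bool piece_is_fim_prefix(const std::string & piece) {
    // before: piece == "<PRE>"  (misses "▁<PRE>")
    return piece.find("<PRE>") != std::string::npos;
}
```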
Concedo
4f369b0a0a
update colab
2024-06-27 15:41:06 +08:00
Concedo
11f0643fa4
fix pyinstallers
2024-06-27 15:19:44 +08:00
Raj Hammeer Singh Hada
ac146628e4
Fix llama-android.cpp for error - "common/common.h not found" ( #8145 )
...
- The path to the common.h header in llama-android.cpp was wrong. Fix it so the Android build no longer fails with the error "There is no file common/common.h".
2024-06-27 03:57:57 +02:00
Daniel Bevenius
9b31a40c6d
clip : suppress unused variable warnings ( #8105 )
...
* clip : suppress unused variable warnings
This commit suppresses unused-variable warnings for the variable e in
the catch blocks.
The motivation for this change is to suppress the warnings that are
generated on Windows when using the MSVC compiler. The warnings are
not displayed when using GCC because GCC will mark all catch parameters
as used.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* squash! clip : suppress unused variable warnings
Remove e (/*e*/) instead of using GGML_UNUSED.
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-06-27 01:50:09 +02:00
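The fix above in miniature: leaving the catch parameter unnamed (commented out) silences MSVC's unused-variable warning without GGML_UNUSED. The load_image function here is only a stand-in for the clip loading code.

```cpp
#include <stdexcept>

// Stand-in for the clip loading code touched by the commit.
static bool load_image() { throw std::runtime_error("decode failed"); }

static bool try_load() {
    try {
        return load_image();
    } catch (const std::exception & /*e*/) {
        // Unnamed parameter: no MSVC unused-variable warning, and no need
        // for GGML_UNUSED(e). GCC already treats catch parameters as used.
        return false;
    }
}

int main() { return try_load() ? 0 : 1; }
```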
Georgi Gerganov
c70d117c37
scripts : fix filename sync
2024-06-26 23:25:22 +03:00
slaren
ae5d0f4b89
ci : publish new docker images only when the files change ( #8142 )
2024-06-26 21:59:28 +02:00
slaren
31ec3993f6
ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS (cmake) ( #8140 )
2024-06-26 21:34:14 +02:00
slaren
c7ab7b612c
make : fix missing -O3 ( #8143 )
2024-06-26 21:20:22 +03:00
Georgi Gerganov
f2d48fffde
sync : ggml
2024-06-26 19:39:19 +03:00
Georgi Gerganov
4713bf3093
authors : regen
2024-06-26 19:36:44 +03:00
Georgi Gerganov
0e814dfc42
devops : remove clblast + LLAMA_CUDA -> GGML_CUDA ( #8139 )
...
ggml-ci
2024-06-26 19:32:07 +03:00
Georgi Gerganov
a95631ee97
readme : update API notes
2024-06-26 19:26:13 +03:00
Concedo
73b99a7266
add premade chat completions adapter
2024-06-27 00:13:06 +08:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake ( #8006 )
...
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122 )
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
Concedo
f3dfa96dbc
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .devops/llama-server.Dockerfile
# .github/workflows/docker.yml
# README.md
# llama.cpp
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-llama-grammar.cpp
2024-06-26 18:59:10 +08:00
Concedo
24bfa54f3c
updated lite
2024-06-26 18:53:32 +08:00
Concedo
70000b47e2
Revert "Revert "set flags to optimize for mmq""
...
This reverts commit 7959e937a1 .
2024-06-26 16:47:12 +08:00
Concedo
7959e937a1
Revert "set flags to optimize for mmq"
...
This reverts commit 8ad0d29ef8 .
2024-06-26 14:32:58 +08:00
Isaac McFadyen
8854044561
Clarify default MMQ for CUDA and LLAMA_CUDA_FORCE_MMQ flag ( #8115 )
...
* Add message about int8 support
* Add suggestions from review
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-06-26 08:29:28 +02:00