Diego Devesa
e0e912f49b
llama : add option to override model tensor buffers ( #11397 )
...
* llama : add option to override tensor buffers
* ggml : fix possible underflow in ggml_nbytes
2025-04-02 14:52:01 +02:00
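For context on the ggml_nbytes fix: byte sizes of tensors are derived from expressions like `type_size + sum((ne[i] - 1) * nb[i])`, and in unsigned arithmetic a zero-sized dimension makes `(ne[i] - 1)` wrap around. A minimal Python sketch simulating 64-bit unsigned wraparound (hypothetical helper names, not the actual ggml code):

```python
# Illustrative sketch of how an unsigned byte-size computation can
# underflow when a tensor dimension is 0. Names are hypothetical.

U64 = 1 << 64  # simulate 64-bit unsigned (size_t) arithmetic

def nbytes_unchecked(ne, nb, type_size):
    # size = type_size + sum((ne[i] - 1) * nb[i]), computed mod 2^64
    total = type_size
    for n, stride in zip(ne, nb):
        total = (total + ((n - 1) % U64) * stride) % U64
    return total

def nbytes_checked(ne, nb, type_size):
    # guarded version: an empty tensor occupies 0 bytes
    if any(n <= 0 for n in ne):
        return 0
    return nbytes_unchecked(ne, nb, type_size)

ne = [4, 0]    # second dimension is empty
nb = [16, 64]  # byte strides per dimension
print(nbytes_unchecked(ne, nb, 2))  # wraps to an enormous value
print(nbytes_checked(ne, nb, 2))    # 0
```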
Georgi Gerganov
a10b36c91a
llama : refactor kv cache guard ( #12695 )
...
* llama : refactor kv cache guard
ggml-ci
* cont : fix comment [no ci]
* llama : fix kv_cache restore logic
ggml-ci
* context : simplify kv cache updates
ggml-ci
* cont : better name [no ci]
* llama : fix llama_decode return code when could not find KV slot
ggml-ci
* context : change log err -> warn [no ci]
* kv-cache : add comment + warning
2025-04-02 14:32:59 +03:00
Concedo
7f1003be44
warning for max tokens being too high
2025-04-02 18:58:38 +08:00
Sigbjørn Skjæret
83a88bd6af
vocab : BailingMoE : change possessive quantifiers to greedy ( #12677 )
2025-04-02 11:21:48 +02:00
Xuan-Son Nguyen
42eb248f46
common : remove json.hpp from common.cpp ( #12697 )
...
* common : remove json.hpp from common.cpp
* fix comment
2025-04-02 09:58:34 +02:00
Chenguang Li
9bacd6b374
[CANN] get_rows and dup optimization ( #12671 )
...
* [CANN] get_rows and dup optimization.
Co-authored-by: hipudding <huafengchun@gmail.com>
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
* [CANN] GET_ROWS and CPY/DUP optimization
Co-authored-by: hipudding <huafengchun@gmail.com>
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
* [CANN] code style adjustment
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
* [CANN] code style adjustment
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
* [CANN] code style adjustment
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
* [CANN] code style adjustment
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
---------
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: hipudding <huafengchun@gmail.com>
2025-04-02 15:22:13 +08:00
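For context on what the optimized ops compute: GET_ROWS gathers whole rows of a source tensor by index (commonly used for embedding lookups), while CPY/DUP copy a tensor, possibly changing layout. A minimal Python sketch of the gather semantics (illustrative only, not the CANN or ggml implementation):

```python
def get_rows(src, rows):
    # src: matrix as a list of rows; rows: indices to gather.
    # Equivalent to dst[i] = src[rows[i]]; indices may repeat.
    return [list(src[i]) for i in rows]

src = [[1, 2], [3, 4], [5, 6]]
print(get_rows(src, [2, 0, 2]))  # [[5, 6], [1, 2], [5, 6]]
```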
Concedo
669311365c
fixed gemma system prompt
2025-04-02 13:58:51 +08:00
Xuan-Son Nguyen
267c1399f1
common : refactor downloading system, handle mmproj with -hf option ( #12694 )
...
* (wip) refactor downloading system [no ci]
* fix all examples
* fix mmproj with -hf
* gemma3: update readme
* only handle mmproj in llava example
* fix multi-shard download
* windows: fix problem with std::min and std::max
* fix 2
2025-04-01 23:44:05 +02:00
Junil Kim
f423981ac8
opencl : fix memory allocation size ( #12649 )
...
issue:
https://github.com/CodeLinaro/llama.cpp/pull/17#issuecomment-2760611283
This patch fixes the memory allocation size so that it does not
exceed the maximum allocation size of the OpenCL device.
2025-04-01 09:54:34 -07:00
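The constraint behind the fix above is general to OpenCL: a single buffer may not exceed the device's `CL_DEVICE_MAX_MEM_ALLOC_SIZE`, so an oversized request must be clamped or split across several buffers. A hedged sketch of the splitting logic in Python (hypothetical helper, not the actual patch):

```python
def split_allocation(total_bytes, max_alloc_bytes):
    # Split one oversized request into chunk sizes that each fit
    # within the device's maximum single-allocation limit.
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        chunk = min(remaining, max_alloc_bytes)
        chunks.append(chunk)
        remaining -= chunk
    return chunks

print(split_allocation(10_000, 4_096))  # [4096, 4096, 1808]
```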
Concedo
fbf5c04c3c
silly me
2025-04-02 00:51:05 +08:00
Concedo
30e3d24ead
embd include name
2025-04-02 00:40:38 +08:00
Concedo
e37f27632f
clear cpu flag manually for templates, added truncation for embeddings
2025-04-02 00:18:30 +08:00
jklincn
e39e727e9a
llama : use LLM_KV_GENERAL_FILE_TYPE instead of gguf_find_key ( #12672 )
2025-04-01 14:54:28 +02:00
Sigbjørn Skjæret
5936a616e4
convert : BailingMoE : fix qkv split when head_dim is 0 ( #12687 )
...
NOTE: Ling-lite-base is broken, see https://huggingface.co/inclusionAI/Ling-lite-base/discussions/2
2025-04-01 14:37:13 +02:00
Concedo
8a4a9b8c19
Merge branch 'upstream' into concedo_experimental
2025-04-01 20:16:16 +08:00
Concedo
9e182b3e78
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# docs/backend/SYCL.md
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-vulkan/ggml-vulkan.cpp
# scripts/sync-ggml.last
# tests/test-chat-template.cpp
2025-04-01 20:16:07 +08:00
Georgi Gerganov
3fd072a540
metal : use F32 prec in FA kernels ( #12688 )
...
* metal : use F32 prec in FA kernels
ggml-ci
* cont : fix FA vec kernel
ggml-ci
2025-04-01 14:57:19 +03:00
Concedo
0fd94e19f3
made tool calls more robust and allowed tool call template customization
2025-04-01 19:16:45 +08:00
R0CKSTAR
a6f32f0b34
Fix clang warning in gguf_check_reserved_keys ( #12686 )
...
* Fix clang warning in gguf_check_reserved_keys
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Fix typo
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-01 13:12:53 +02:00
Wagner Bruna
2bb3597e42
vulkan: fix build when glslc doesn't support coopmat ( #12683 )
2025-04-01 11:38:07 +02:00
Romain Biessy
8293970542
SYCL: Rename oneMKL to oneMath ( #12192 )
...
* Rename oneMKL Interface to oneMath
* Use oneMath for Intel vendor
* Rename occurrences to mkl
* clang-format
* Silence verbose warnings
* Set oneMath HIP_TARGETS
* Fix silence warnings
* Remove step to build oneMath from build instructions
* Use fixed oneMath version
* Remove INTEL_CPU
* Fold CMake oneDNN conditions
* Use Intel oneMKL for Intel devices
* Improve CMake message
* Link against MKL::MKL_SYCL::BLAS only
* Move oneMath documentation to Nvidia and AMD sections
2025-04-01 16:24:29 +08:00
Akarshan Biswas
8bbf26083d
SYCL: switch to SYCL namespace ( #12674 )
2025-04-01 10:11:39 +02:00
henk717
4291e1575b
Fix tool spec, this spec is kinda.... ( #1458 )
2025-04-01 10:39:02 +08:00
Sigbjørn Skjæret
35782aeedb
convert : BailingMoE : avoid setting rope_dim to 0 ( #12678 )
2025-03-31 23:09:48 +02:00
Daniel Bevenius
c80a7759da
vocab : add special infill tokens for CodeLlama ( #11850 )
...
* vocab : add special infill tokens for CodeLlama
The commit adds the following special tokens for CodeLlama infill:
- `▁<PRE>`
- `▁<SUF>`
- `▁<MID>`
The motivation for this is that currently the infill example uses
CodeLlama as a suggested model. But when using this model the following
error is generated:
```console
/llama.cpp-debug/examples/infill/infill.cpp:165: GGML_ASSERT(llama_vocab_fim_pre(vocab) >= 0) failed
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
305251 Aborted (core dumped)
./build/bin/llama-infill -t 10 -ngl 0 -m models/codellama-13b.Q5_K_S.gguf \
-c 4096 --temp 0.7 --repeat_penalty 1.1 -n 20 \
--in-prefix "def helloworld():\n print(\"hell" \
--in-suffix "\n print(\"goodbye world\")\n "
```
* squash! vocab : add special infill tokens for CodeLlama
Add _<EOT> as well.
2025-03-31 18:40:56 +02:00
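For reference, the three tokens above implement CodeLlama's fill-in-the-middle (FIM) format, where the model completes the gap between a given prefix and suffix. A simplified sketch of how such a prompt is assembled, with plain-string stand-ins for the special tokens (the real vocab entries carry a leading `▁` and map to dedicated token ids; exact spacing depends on the tokenizer):

```python
# Simplified CodeLlama-style fill-in-the-middle prompt assembly.
# Plain strings stand in for the special tokens; this is illustrative,
# not the exact tokenization llama.cpp performs.

def build_infill_prompt(prefix, suffix):
    # Layout: <PRE> {prefix} <SUF>{suffix} <MID>
    # The model then generates the middle section after <MID>.
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

print(build_infill_prompt('def helloworld():\n    print("hell',
                          '\n    print("goodbye world")\n'))
```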
Concedo
c0adaabfa4
Revert "try fix owui"
...
This reverts commit 12e5b8abdb.
2025-04-01 00:27:31 +08:00
Concedo
12e5b8abdb
try fix owui
2025-04-01 00:23:45 +08:00
a3sh
250d7953e8
ggml : faster ssm scan ( #10558 )
...
* faster ssm_scan
* delete unused comment
* clang format
* add space
* modify unnecessary calculations
* faster ssm conv implementation
* modify file name with dash
2025-03-31 18:05:13 +02:00
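For context, the op being optimized is the selective state-space scan at the heart of Mamba-style models: a per-timestep linear recurrence over a hidden state. A minimal scalar sketch of the recurrence (illustrative only; the real kernel is vectorized over state and channel dimensions and uses per-timestep parameters):

```python
def ssm_scan(x, A, B, C, h0=0.0):
    # Linear recurrence: h[t] = A * h[t-1] + B * x[t]; output y[t] = C * h[t].
    # Scalar toy version of the scan; real implementations batch this.
    h = h0
    ys = []
    for xt in x:
        h = A * h + B * xt
        ys.append(C * h)
    return ys

print(ssm_scan([1.0, 2.0, 3.0], A=0.5, B=1.0, C=2.0))  # [2.0, 5.0, 8.5]
```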
Concedo
0ed95fcccc
fixed l3 template, add index
2025-03-31 23:59:06 +08:00
Sigbjørn Skjæret
403fbacbbc
convert : Qwerky : use lora_rank_tokenshift and lora_rank_decay if present ( #12667 )
2025-03-31 16:36:25 +02:00
0cc4m
a8a1f33567
Vulkan: Add DP4A MMQ and Q8_1 quantization shader ( #12135 )
...
* Vulkan: Add DP4A MMQ and Q8_1 quantization shader
* Add q4_0 x q8_1 matrix matrix multiplication support
* Vulkan: Add int8 coopmat MMQ support
* Vulkan: Add q4_1, q5_0 and q5_1 quants, improve integer dot code
* Add GL_EXT_integer_dot_product check
* Remove ggml changes, fix mmq pipeline picker
* Remove ggml changes, restore Intel coopmat behaviour
* Fix glsl compile attempt when integer vec dot is not supported
* Remove redundant code, use non-saturating integer dot, enable all matmul sizes for mmq
* Remove redundant comment
* Fix integer dot check
* Fix compile issue with unsupported int dot glslc
* Update Windows build Vulkan SDK version
2025-03-31 14:37:01 +02:00
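For context on Q8_1: ggml's 8-bit block quantization stores, per block of 32 values, a scale and a precomputed sum term used to correct dot products against offset-carrying formats like q4_1; DP4A-style instructions then consume the int8 values four at a time. A simplified Python sketch of quantizing one block (illustrative, not the exact ggml layout):

```python
def quantize_q8_1_block(xs):
    # xs: 32 floats -> (scale d, s = d * sum(q), 32 int8 values).
    # d maps the largest magnitude to 127; s lets dot products against
    # q4_1-style blocks add back their per-block minimum cheaply.
    amax = max(abs(v) for v in xs)
    d = amax / 127.0 if amax > 0 else 0.0
    inv = 1.0 / d if d else 0.0
    qs = [max(-127, min(127, round(v * inv))) for v in xs]
    s = d * sum(qs)
    return d, s, qs

def dequantize(d, qs):
    return [d * q for q in qs]

block = [float(i - 16) for i in range(32)]  # values -16 .. 15
d, s, qs = quantize_q8_1_block(block)
err = max(abs(x - y) for x, y in zip(block, dequantize(d, qs)))
print(err)  # bounded by half a quantization step (~d/2)
```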
Georgi Gerganov
1790e73157
cmake : fix whitespace ( #0 )
2025-03-31 15:07:32 +03:00
Georgi Gerganov
0114a32da0
sync : ggml
...
ggml-ci
2025-03-31 15:07:32 +03:00
Sandro Hanea
a7724480fd
cmake: improve Vulkan cooperative matrix support checks (whisper/2966)
...
Co-authored-by: Sandro Hanea <me@sandro.rocks>
2025-03-31 15:07:32 +03:00
Sigbjørn Skjæret
1a85949067
llava : proper description fix ( #12668 )
2025-03-31 11:28:30 +02:00
Akarshan Biswas
6c02a032fa
SYCL: Remove misleading ggml_sycl_op_flatten function ( #12387 )
...
* SYCL: Remove misleading ggml_sycl_op_flatten function
* remove trailing whitespace
* Fix L2 norm from rebase
* remove try catch block from element_wise.cpp
* remove comment from common.hp
* ggml-sycl.cpp: Add try catch sycl::exception block in compute_forward
* norm.cpp: remove try catch exception block
2025-03-31 11:25:24 +02:00
Sigbjørn Skjæret
f52d59d771
llava : fix clip loading GGUFs with missing description ( #12660 )
2025-03-31 11:07:07 +02:00
Concedo
1ebadc515e
add streaming support for oai tools (+2 squashed commits)
...
Squashed commit:
[4d080b37] qwen2.5vl surgery script
[4bebe7e5] add streaming support for oai tools
2025-03-31 16:49:15 +08:00
marcoStocchi
52de2e5949
tts : remove printfs ( #12640 )
...
* tts.cpp : llama tokens console output is done using LOG_INF instead of printf(), so the options '--log-disable' and '--log-file' now have a uniform impact on all output.
2025-03-31 11:20:30 +03:00
henk717
091eb367fc
More robust tool calling prompt ( #1455 )
...
* More robust tool checking prompt
* Inform UI we want a tool
2025-03-31 14:43:03 +08:00
Sigbjørn Skjæret
2c3f8b850a
llama : support BailingMoE (Ling) ( #12634 )
2025-03-30 22:21:03 +02:00
Georgi Gerganov
4663bd353c
metal : use constexpr in FA kernels + fix typedef ( #12659 )
...
* metal : use constexpr in FA kernels
ggml-ci
* cont
ggml-ci
* cont : fix typedef
ggml-ci
2025-03-30 22:04:04 +03:00
Juyoung Suk
b3de7cac73
llama : add Trillion 7B model support ( #12556 )
...
* Support Trillion 7B
* Update llama.h
* Update llama.h
* Update llama-vocab.cpp for Trillion
* Update llama-vocab.cpp
2025-03-30 20:38:33 +02:00
Sergei Vorobyov
7242dd9675
llama-chat : Add Yandex instruct model template support ( #12621 )
...
* add yandex template
* update yandex chat template
* fix tests
* adjust chat template
* fix style
* fix tool macro in template
* add clarify comment
---------
Co-authored-by: Sergei Vorobev <serv01@yandex-team.ru>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-03-30 20:12:03 +02:00
Concedo
77bd24a0b1
updated lite
2025-03-30 21:35:40 +08:00
Concedo
621aa7c825
fixed clblast, but this part might not actually be helpful speed-wise
2025-03-30 21:27:52 +08:00
Concedo
e1d3c19673
clblast not working correctly
2025-03-30 21:02:30 +08:00
Concedo
e6337ff957
Merge commit 'e408d4351a' into concedo_experimental
...
# Conflicts:
# ggml/CMakeLists.txt
2025-03-30 18:26:02 +08:00
Concedo
ce05aa722d
Merge commit '0bb2919335' into concedo_experimental
...
# Conflicts:
# ggml/src/CMakeLists.txt
# src/llama-model.cpp
2025-03-30 18:18:20 +08:00
Concedo
61a73347c6
fixed mrope for multiple images in qwen2vl (+1 squashed commits)
...
Squashed commits:
[63e4d91c] fixed mrope for multiple images in qwen2vl (+1 squashed commits)
Squashed commits:
[bb78db1e] wip fixing mrope
2025-03-30 17:23:58 +08:00