Georgi Gerganov
8551c44d84
context : always use non-causal attention for encoder graphs ( #12447 )
...
* context : always use non-causal attention for encoder graphs
ggml-ci
* context : move the change to llama_context::encode()
ggml-ci
2025-03-18 13:05:49 +02:00
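A minimal sketch of the change described above, assuming the usual llama.cpp cparams layout (the struct below is a stand-in, not the real type): encode() forces non-causal attention for the duration of the encoder graph and restores the previous setting afterwards.

```cpp
#include <cstdint>

// Stand-in for the real llama_cparams (assumption, for illustration only).
struct llama_cparams { bool causal_attn; };

// Hedged sketch, not the PR's exact diff: encoder graphs always attend
// bidirectionally, so encode() overrides any causal setting temporarily.
int32_t encode_sketch(llama_cparams & cparams) {
    const bool causal_saved = cparams.causal_attn;
    cparams.causal_attn = false;        // encoder graphs are always non-causal
    // ... build and evaluate the encoder graph here ...
    cparams.causal_attn = causal_saved; // restore for later decode() calls
    return 0;
}
```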
Łukasz Ślusarczyk
35cae5ba05
SYCL: using graphs is configurable by environment variable and compile option ( #12371 )
...
* alberto changes
* enable sycl graphs by env variable
* fixed compilation warnings in ggml-sycl.cpp
* renamed graph variables
* fix markdown in docs/backend/SYCL.md
Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>
* fix markdown in docs/backend/SYCL.md again
* compiling graphs by default, renamed graph_enable to graph_disable
---------
Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>
2025-03-18 11:16:31 +01:00
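A hedged sketch of the double gate the bullets above describe: graphs are compiled in by default and can be disabled at runtime. Both the macro and the environment-variable name below are assumptions, not necessarily the PR's exact identifiers.

```cpp
#include <cstdlib>

// Sketch only: compile option (assumed macro GGML_SYCL_GRAPH) plus runtime
// opt-out (assumed env var GGML_SYCL_DISABLE_GRAPH), enabled by default.
static bool sycl_graph_enabled() {
#ifdef GGML_SYCL_GRAPH
    const char * v = std::getenv("GGML_SYCL_DISABLE_GRAPH");
    return v == nullptr || v[0] == '0'; // on unless explicitly disabled
#else
    return false; // graphs compiled out entirely
#endif
}
```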
Georgi Gerganov
810e0af3f5
server : fix warmup draft cache type ( #12446 )
...
ggml-ci
2025-03-18 12:05:42 +02:00
Prajwal B Mehendarkar
eba92d64c3
cmake : fix PowerPC build ( #12241 )
...
Closes #12240
2025-03-18 11:37:33 +02:00
fj-y-saito
d9a14523bb
ggml : add SVE support for q6_K_q8_K ( #12361 )
2025-03-18 10:14:39 +02:00
0cc4m
fd123cfead
Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver issues ( #12434 )
2025-03-18 07:21:40 +01:00
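The policy change above amounts to a smaller cap on suballocation size; a minimal sketch, with the constant and function names assumed:

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative cap only: prefer 1 GiB device buffers over 4 GiB ones to
// reduce fragmentation and avoid driver limits on huge allocations.
constexpr uint64_t VK_MAX_ALLOC_DEFAULT = 1ull << 30; // 1 GiB (was 4 GiB)

uint64_t pick_alloc_size(uint64_t requested, uint64_t device_max) {
    return std::min({requested, device_max, VK_MAX_ALLOC_DEFAULT});
}
```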
Łukasz Ślusarczyk
a53f7f7b88
fixed compilation warnings in ggml-sycl ( #12424 )
2025-03-18 08:51:25 +08:00
Molly Sophia
7dfad387e3
llama: Add support for RWKV v7 architecture ( #12412 )
...
* ggml: Add op l2_norm
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* ggml: Add op rwkv_wkv7
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: Add support for RWKV7 and ARWKV7 models
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: fix inference with RWKV6Qwen2
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: add more (a)rwkv7 size variants
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Apply code-format changes
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* fix MUSA build
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: fix shape error with rwkv using llama-parallel
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-03-18 07:27:50 +08:00
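Of the two new ggml ops in this PR, l2_norm is the simpler to state. A sketch of one row's worth of work, with the eps placement an assumption:

```cpp
#include <cmath>
#include <cstddef>

// Hedged sketch of the l2_norm op over one row: scale the row to unit
// L2 norm, y = x / max(||x||_2, eps). The eps handling is assumed.
void l2_norm_row(const float * x, float * y, size_t n, float eps) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) sum += x[i] * x[i];
    const float scale = 1.0f / std::fmax(std::sqrt(sum), eps);
    for (size_t i = 0; i < n; ++i) y[i] = x[i] * scale;
}
```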
Sigbjørn Skjæret
60c902926c
docs : bring llama-cli conversation/template docs up-to-date ( #12426 )
2025-03-17 21:14:32 +01:00
Gaurav Garg
b1b132efcb
cuda : enable CUDA Graph on CUDA Toolkit < 12.x ( #12394 )
...
* Enable CUDA Graph on CTK < 12.x
The `cudaGraphExecUpdate` API changed in CUDA 12.x, so CUDA graph support had been disabled on older CUDA toolkits. This change enables CUDA graph support on CTK < 12.x by falling back to the older API (see the sketch after this entry).
* Fix compilation errors with MUSA
* Disable CUDA Graph for MUSA
2025-03-17 20:25:13 +02:00
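The signature break can be bridged with a version guard. The two cudaGraphExecUpdate signatures below are the real pre- and post-12.0 ones, but the wrapper itself is a sketch, not the PR's code:

```cpp
#include <cuda_runtime.h>

// cudaGraphExecUpdate lost its node/result out-params in CUDA 12.0, so the
// call site needs a CUDART_VERSION guard (sketch, not the PR's exact code).
static cudaError_t graph_exec_update(cudaGraphExec_t exec, cudaGraph_t graph) {
#if CUDART_VERSION >= 12000
    cudaGraphExecUpdateResultInfo info;
    return cudaGraphExecUpdate(exec, graph, &info);
#else
    cudaGraphNode_t error_node = nullptr;
    cudaGraphExecUpdateResult result;
    return cudaGraphExecUpdate(exec, graph, &error_node, &result);
#endif
}
```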
Guus Waals
01e8f2138b
ggml-vulkan: remove unused find_program(glslc) ( #12416 )
...
It's already found by FindVulkan.cmake in the parent CMakeLists
2025-03-17 13:35:43 -03:00
Jeff Bolz
484a8ab513
vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader ( #12312 )
2025-03-17 09:26:18 -05:00
Concedo
ddaa8d5a38
fixed saving path for savedata
2025-03-17 22:19:52 +08:00
Concedo
0cfd8d23cb
handle symlinks (+1 squashed commit)
...
Squashed commits:
[fb8477b9] fixed makefile (+4 squashed commits)
Squashed commit:
[4a245bba] fixed a makefile issue
[d68eba69] alias usehipblas to usecublas
[a9ab0a7c] dynamic rocwmma selection
[fefe17c7] revert rocwmma
2025-03-17 21:03:30 +08:00
Daniele
cf2270e4d3
vulkan: subgroup size tuning ( #12087 )
...
* vulkan: subgroup size test
* Vulkan: Add device architecture enum and logic to recognize AMD generations
* vulkan: use new architecture logic to specify subgroup size
* Initial vulkan subgroup size tuning for RDNA3
* vulkan: commonize RDNA subgroup tuning
* vulkan: override subgroup size if required_subgroup_size = 0
* vulkan: disable warp 32 for RDNA3
* vulkan: fine tuned RDNA1 subgroup sizes
* vulkan: adjusted subgroup size map
* vulkan: fixed RDNA2 subgroup map
---------
Co-authored-by: 0cc4m <picard12@live.de>
2025-03-17 12:42:33 +01:00
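The bullets above suggest a per-architecture lookup; an illustrative shape for it, where the enum and the concrete sizes are assumptions rather than the PR's measured values:

```cpp
#include <cstdint>

// Sketch of a per-generation subgroup-size table; values are placeholders.
enum class vk_amd_arch { rdna1, rdna2, rdna3, other };

uint32_t required_subgroup_size(vk_amd_arch arch) {
    switch (arch) {
        case vk_amd_arch::rdna1: return 64; // placeholder tuning value
        case vk_amd_arch::rdna3: return 64; // PR disables warp 32 on RDNA3
        default:                 return 0;  // 0 = leave the driver's default
    }
}
```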
Jeff Bolz
f07690c930
vulkan: use fp32 in coopmat2 q4_k dequant function ( #12309 )
2025-03-17 10:43:35 +01:00
Jeff Bolz
891c63956d
vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking ( #12273 )
...
* vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking
2025-03-17 10:41:59 +01:00
Jeff Bolz
2f21123c1d
vulkan: Adjust coopmat2 tile sizes and selection heuristic ( #12258 )
2025-03-17 10:35:00 +01:00
Christian Kastner
374101fd74
cmake : enable building llama.cpp using system libggml ( #12321 )
...
* cmake: Factor out compiler flag function from ggml
llama.cpp's build requires it, too, and we may want to make use of it
without add_subdirectory(ggml).
* cmake: Enable building against system ggml
This facilitates package maintenance for Linux distributions, where the
libggml library most likely will be shipped as an individual package
upon which a llama.cpp package depends.
2025-03-17 11:05:23 +02:00
Akarshan Biswas
b3c9a65673
SYCL: set extras only on GGML_TYPE_Q4_0 ( #12366 )
...
* SYCL: set extras only on GGML_TYPE_Q4_0
* release tensor_extras in reset buffer interface
2025-03-17 09:45:12 +08:00
Sigbjørn Skjæret
8ba95dca20
llama : fix OLMo-2-0325-32B-Instruct K-norm size ( #12400 )
2025-03-16 19:46:36 +02:00
Georgi Gerganov
dc079cfdff
context : fix init of n_outputs ( #12397 )
...
ggml-ci
2025-03-16 19:29:36 +02:00
Daniel Bevenius
7b61bcc87c
ci : add --symlinks to xcframework zip command ( #12409 )
...
This commit adds the --symlinks option to the zip command used to create
the xcframework zip file. This is necessary to preserve symlinks in the
zip file. Without this option, the Versions symlink is stored as a
regular directory entry in the zip file rather than as a symlink, which
causes the following error in Xcode:
```console
Couldn't resolve framework symlink for '/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current': readlink(/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current): Invalid argument (22)
```
Refs: https://github.com/ggml-org/llama.cpp/pull/11996#issuecomment-2727026377
2025-03-16 18:22:05 +01:00
Concedo
131107dc91
lite: fix admin button display issue when preloading a story
2025-03-17 00:17:31 +08:00
Concedo
6888f5495d
allow quantkv with contextshift
2025-03-16 21:48:42 +08:00
Concedo
e466ce65e2
updated sd metadata
2025-03-16 20:12:43 +08:00
Concedo
8708403ee9
revert clean
2025-03-16 17:53:35 +08:00
Concedo
5ef1722d5f
fix for sd
2025-03-16 17:02:42 +08:00
Concedo
0954e9e476
improve model estimation
2025-03-16 16:14:13 +08:00
Concedo
5d7c5e9e33
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# examples/tts/tts.cpp
2025-03-16 15:42:39 +08:00
Concedo
2401502cbd
improvement to tool calling, allowing specific tools to be used
2025-03-16 15:20:08 +08:00
Concedo
9f7fd63160
revert unwanted change to tool calling
2025-03-16 01:35:48 +08:00
marcoStocchi
f4c3dd5daa
llama-tts : add '-o' option ( #12398 )
...
* added -o option to specify an output file name
* llama-tts returns ENOENT in case of file write error
note : PR #12042 was closed as superseded by this one.
2025-03-15 17:23:11 +01:00
Concedo
98eade358a
more rocm include dir
2025-03-15 23:29:00 +08:00
aubreyli
3d35d87b41
SYCL: Delete redundant plus sign and space ( #12391 )
2025-03-15 15:49:03 +01:00
fairydreaming
b19bd064c0
SYCL : support non-contiguous tensors in binary ops (add, sub, etc) ( #12399 )
...
* sycl : support non-contiguous tensors in binary ops
* sycl : silence unused variable warning
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-03-15 22:19:30 +08:00
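The core idea of the fix, sketched with a stand-in tensor type: index through byte strides rather than assuming contiguous memory, so add/sub and friends also work on permuted or sliced views. ggml tensors really do carry per-dimension byte strides in nb[4]; the struct here is simplified.

```cpp
#include <cstdint>

// Simplified stand-in for ggml's tensor view: data plus byte strides nb[4].
struct tensor_view { const char * data; int64_t nb[4]; };

// Stride-aware load: valid for any layout, contiguous or not (sketch).
static float load_f32(const tensor_view & t,
                      int64_t i0, int64_t i1, int64_t i2, int64_t i3) {
    return *(const float *)(t.data + i0*t.nb[0] + i1*t.nb[1]
                                   + i2*t.nb[2] + i3*t.nb[3]);
}
```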
Concedo
67851e5415
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# examples/run/run.cpp
# ggml/src/ggml-cann/aclnn_ops.cpp
2025-03-15 19:54:19 +08:00
Concedo
e84596ec1a
add config for default gen tokens and bos toggle
2025-03-15 19:53:06 +08:00
Concedo
bfc30066c9
fixed a clip processing bug
2025-03-15 17:49:49 +08:00
Concedo
7272165e0e
verbosity
2025-03-15 12:13:04 +08:00
Concedo
4212f0b8e8
wip on multiple fixes
2025-03-15 10:50:36 +08:00
Chenguang Li
92a391327e
[CANN] MUL_MAT optimization ( #12382 )
2025-03-15 09:31:08 +08:00
Eric Curtin
9f2250ba72
Add CLI arg to llama-run to adjust the number of threads used ( #12370 )
...
We default to 4; sometimes we want to adjust this manually.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-03-14 16:41:20 +00:00
Sigbjørn Skjæret
774973b8f3
main : add -sysf / --system-prompt-file ( #12249 ) ( #12250 )
...
* add system_prompt_file
* add -sysf / --system-prompt-file
* remove system_prompt_file
2025-03-14 16:57:05 +01:00
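What -sysf presumably does under the hood is straightforward; a sketch under that assumption (the helper name is invented):

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Assumed behavior of --system-prompt-file: read the whole file and use
// its contents verbatim as the system prompt.
std::string read_system_prompt_file(const std::string & path) {
    std::ifstream f(path);
    std::stringstream ss;
    ss << f.rdbuf();  // slurp the entire file
    return ss.str();
}
```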
Concedo
4a29e216e7
edit readme
2025-03-14 21:06:55 +08:00
fairydreaming
8fcb563613
Load all MoE experts during warmup ( #11571 )
...
* llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup
* common : use new API to enable warmup mode during model warmup
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-03-14 13:47:05 +01:00
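Based on the first bullet above, usage of the new call presumably looks like the following; the (ctx, bool) signature is an assumption inferred from the description:

```cpp
#include "llama.h"

// Sketch: enable warmup so a dummy decode touches every MoE expert's
// weights, then switch back before serving real requests.
void warm_up_all_experts(llama_context * ctx) {
    llama_set_warmup(ctx, true);   // all experts active during warmup
    // ... run a throwaway llama_decode() here ...
    llama_set_warmup(ctx, false);  // restore normal top-k expert routing
}
```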
Concedo
d7498e7e8a
added model switching to gguf in admin mode (auto guess layers)
2025-03-14 19:45:55 +08:00
Concedo
30cb77a900
rename replace_instruct_placeholders field
2025-03-14 18:37:12 +08:00
Concedo
be3bba67ff
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# src/llama-model.cpp
2025-03-14 18:25:21 +08:00
Victor
add2a3aa5a
server: fix "--grammar-file" parameter ( #12285 )
2025-03-14 11:21:17 +01:00