0cc4m
c9c64dee57
Set GLM4 blk.*.attn_output.weight, kqv_out-* matmul to GGML_PREC_F32 to fix infinity values in output ( #13639 )
2025-05-20 10:11:56 +02:00
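The precision fix above is easy to demonstrate outside ggml: a long dot product of moderate half-precision values overflows an f16 accumulator to infinity, while a wider accumulator stays finite. A minimal stdlib-only sketch (illustrative only, not ggml code):

```python
import struct

def f16(x: float) -> float:
    """Round a Python float through IEEE half precision (overflow -> inf)."""
    try:
        return struct.unpack('<e', struct.pack('<e', x))[0]
    except OverflowError:
        return float('inf') if x > 0 else float('-inf')

# 1024 products of 16*16 = 256 each: the running f16 sum passes the
# half-precision maximum (~65504) and saturates to inf -- the same kind
# of attention-output blowup that forcing GGML_PREC_F32 avoids.
acc16 = 0.0
acc32 = 0.0
for _ in range(1024):
    acc16 = f16(acc16 + f16(16.0 * 16.0))  # f16 accumulator
    acc32 += 16.0 * 16.0                   # wide accumulator stand-in

print(acc16, acc32)  # → inf 262144.0
```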
Georgi Gerganov
c00a2634be
metal : fix typo in FA kernel comments ( #13651 )
2025-05-20 10:41:40 +03:00
Georgi Gerganov
e298d2fbd0
kv-cache : add SWA support ( #13194 )
...
* kv-cache : prepare for SWA
ggml-ci
* kv-cache : initial iSWA implementation
ggml-ci
* kv-cache : rework error recovery logic
ggml-ci
* models : fix Phi-3 SWA parameters
ggml-ci
* model : adjust Granite to rope factor changes
ggml-ci
* server : check if context can do shifts
ggml-ci
* iswa : for now, always enable shifts (experiment)
ggml-ci
* kv-cache : simplify SWA logic
ggml-ci
* kv-cache : apply defrag when we fail to find slots for the batch
ggml-ci
* llama : update docs about llama_decode
ggml-ci
* kv-cache : update warning logs when no space for the batch is available
ggml-ci
* llama : add llama_kv_self_seq_pos_min()
* kv-cache : keep track of partial SWA computes and print warnings
* server : disallow use cases involving partial SWA context
ggml-ci
* llama : add param to control SWA cache size
ggml-ci
* minor : clean-up
ggml-ci
2025-05-20 08:05:46 +03:00
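The core idea of the SWA cache work above can be caricatured in a few lines: keep KV entries only for the last `window` positions, so memory scales with the window rather than the full context. A toy sketch (not llama.cpp's implementation; all names hypothetical):

```python
from collections import deque

class SWACache:
    """Toy sliding-window KV cache: retains only the newest `window` positions."""
    def __init__(self, window: int):
        self.window = window
        self.entries = deque(maxlen=window)  # (pos, kv) pairs, oldest evicted first

    def append(self, pos: int, kv) -> None:
        self.entries.append((pos, kv))

    def pos_min(self) -> int:
        # Rough analogue of llama_kv_self_seq_pos_min(): the oldest position
        # still cached; positions below it can no longer be attended to.
        return self.entries[0][0] if self.entries else -1

cache = SWACache(window=4)
for pos in range(10):
    cache.append(pos, kv=f"kv{pos}")

print(cache.pos_min(), len(cache.entries))  # → 6 4
```

This also illustrates why the commit restricts context shifts and "partial SWA" use cases: once a position falls out of the window its KV data is gone, so recomputing from an earlier position is impossible without a re-prompt.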
Xinpeng Dou
f0adb80bf7
CANN: Update CANN model support ( #13162 )
...
* Update CANN model support status
* Update of model support
* update
* update
* update
* fix format of CANN.md
* fix format of CANN.md
* fix format of CANN.md
2025-05-20 11:43:43 +08:00
Nicolò Scipione
f7c9429c85
sycl : Remove mmap() allocation workaround on Windows ( #13482 )
...
* Remove mmap workaround on Windows
After some testing I found that mmap is supported on Windows and for
many GPUs on Linux, so I removed the workaround for Windows since it
is no longer necessary.
* Update llama-bench README
The SYCL backend introduced a workaround that allows running
llama-bench without specifying the `--mmap 0` flag
2025-05-20 08:54:43 +08:00
psocolovsky
1dfbf2cf3a
common : add load_progress_callback ( #13617 )
2025-05-19 21:17:36 +02:00
Concedo
5a499a5d2e
updated lite, fixed multi clip skip and seeds not incrementing (+2 squashed commits)
...
Squashed commit:
[a9328e29a] fixed multi clip skip and seeds not incrementing
[cad3aa9db] streamline some debug outputs
2025-05-19 23:59:58 +08:00
0cc4m
8960efd0a6
Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 32B incoherence ( #13607 )
2025-05-19 17:54:08 +02:00
Concedo
5f4923bf24
backend tag replacement for endtags. view results with debug mode.
2025-05-19 23:14:43 +08:00
Alberto Cabrera Pérez
725f23f1f3
sycl : backend documentation review ( #13544 )
...
* sycl: reviewing and updating docs
* Updates Runtime error codes
* Improves OOM troubleshooting entry
* Added a llama 3 sample
* Updated supported models
* Updated releases table
2025-05-19 14:38:20 +01:00
Xuan-Son Nguyen
92ecdcc06a
mtmd : add vision support for llama 4 ( #13282 )
...
* wip llama 4 conversion
* rm redundant __init__
* fix conversion
* fix conversion
* test impl
* try this
* reshape patch_embeddings_0
* fix view
* rm ffn_post_norm
* cgraph ok
* f32 for pos embd
* add image marker tokens
* Llama4UnfoldConvolution
* correct pixel shuffle
* fix merge conflicts
* correct
* add debug_graph
* logits matched, but it still perceives the image incorrectly
* fix style
* add image_grid_pinpoints
* handle llama 4 preprocessing
* rm load_image_size
* rm unused line
* fix
* small fix 2
* add test & docs
* fix llava-1.6 test
* test: add notion of huge models
* add comment
* add warn about degraded quality
2025-05-19 13:04:14 +02:00
Alberto Cabrera Pérez
f71f40a284
ci : upgraded oneAPI version in SYCL workflows and dockerfile ( #13532 )
2025-05-19 11:46:09 +01:00
Georgi Gerganov
d30cb5a7fa
sync : ggml
...
ggml-ci
2025-05-19 13:29:56 +03:00
Johannes Gäßler
6c35981a64
mnist: fix segmentation fault (ggml/1227)
2025-05-19 13:29:56 +03:00
Diego Devesa
8b5e19aea6
ggml : fix apple OS check in ggml_print_backtrace (ggml/1229)
2025-05-19 13:29:56 +03:00
Daniel Tang
60aea028b5
ggml : Fix missing backtrace on Linux (ggml/1228)
...
* Modern Linux defaults /proc/sys/kernel/yama/ptrace_scope to 1
* Fixed lldb attach
* Simplify by having the child do ggml_print_backtrace_symbols
2025-05-19 13:29:56 +03:00
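Background for the fix above (a Linux-only diagnostic fragment, not part of the patch): Yama's default `ptrace_scope` of 1 blocks a debugger from attaching to anything that is not its own child, which is why the forked child now prints the backtrace symbols itself rather than attaching back to the parent.

```shell
# Show Yama's ptrace policy; 1 (the modern default) restricts attaching
# to processes that are not direct children of the tracer.
cat /proc/sys/kernel/yama/ptrace_scope
# To temporarily allow classic "attach to any same-uid process" debugging:
#   sudo sysctl kernel.yama.ptrace_scope=0
```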
Nick
9c55e5c5c2
fix: check model pointer validity before use ( #13631 )
2025-05-19 13:25:41 +03:00
Concedo
710c747b60
minor noscript edit
2025-05-19 17:51:44 +08:00
Chenguang Li
33d7aed4a8
CANN: Support MOE Model MUL_MAT_ID ( #13042 )
...
Signed-off-by: noemotiovon <757486878@qq.com>
2025-05-19 14:21:17 +08:00
Concedo
59300dbdf5
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/actions/windows-setup-curl/action.yml
# .github/workflows/build-linux-cross.yml
# README.md
# common/CMakeLists.txt
# examples/parallel/README.md
# examples/parallel/parallel.cpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-vulkan/CMakeLists.txt
# tools/server/README.md
2025-05-18 23:27:53 +08:00
Isaac McFadyen
6a2bc8bfb7
server : added --no-prefill-assistant flag ( #13608 )
...
* added no-prefill-assistant flag
* reworded documentation comment
* updated server README.md
2025-05-17 23:59:48 +02:00
Gilad S.
e3a7cf6c5b
cmake: use the current build config for vulkan-shaders-gen ( #13595 )
...
* fix: use the current build config for `vulkan-shaders-gen`
* fix: only pass a valid build type to `--config`
2025-05-17 15:26:43 -03:00
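For context on the fix above, a hedged sketch (not the actual llama.cpp CMake; variable names are hypothetical): with multi-config generators such as Visual Studio, the configuration is only known at build time, so it has to be forwarded to the nested `vulkan-shaders-gen` build via a generator expression rather than read from `CMAKE_BUILD_TYPE`:

```cmake
# Hypothetical sketch: build the shader generator with the same
# configuration as the enclosing build; $<CONFIG> expands per-config.
add_custom_command(
    OUTPUT ${_shaders_out}
    COMMAND ${CMAKE_COMMAND} --build ${_gen_build_dir} --config $<CONFIG>
    DEPENDS ${_gen_sources}
)
```

The commit's second bullet ("only pass a valid build type to `--config`") hints at the single-config case, where the expansion can be an empty string and the flag should be dropped.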
Concedo
be3e93c76a
bundle AGPL license and llama.cpp's MIT license into binaries. clarified some licensing terms, updated readme (+1 squashed commit)
...
Squashed commits:
[61c152daf] bundle AGPL license and llama.cpp's MIT license into binaries. clarified some licensing terms, updated readme
2025-05-18 02:21:27 +08:00
Concedo
c546cb638e
disable showgui if skiplauncher is used
2025-05-18 01:42:14 +08:00
Georgi Gerganov
518329b2d4
parallel : add option for non-shared and larger prompts ( #13598 )
...
* parallel : add option for non-shared and larger prompts
* parallel : update readme [no ci]
* cont : add note about base models [no ci]
* parallel : better var name
ggml-ci
2025-05-17 12:58:55 +03:00
Jeff Bolz
2f5a4e1e09
vulkan: move common FA code to flash_attn_base.comp ( #13556 )
...
* vulkan: move common FA code to flash_attn_base.comp
* vulkan: move common FA index/stride setup code to flash_attn_base.comp
* build fix
2025-05-17 09:14:55 +02:00
Jeff Bolz
4f41ee11d6
vulkan: use scalar FA rather than coopmat2 when N==1 ( #13554 )
2025-05-17 08:35:47 +02:00
Z
3e0be1cace
llguidance : official v0.7.20 release (no actual changes) [noci] ( #13594 )
2025-05-16 22:56:28 +02:00
Xuan-Son Nguyen
6aa892ec2a
server : do not return error out of context (with ctx shift disabled) ( #13577 )
2025-05-16 21:50:00 +02:00
Xuan-Son Nguyen
aea9f8b4e7
webui : improve accessibility for visually impaired people ( #13551 )
...
* webui : improve accessibility for visually impaired people
* add a11y for extra contents
* fix some labels being read twice
* add skip to main content
2025-05-16 21:49:01 +02:00
Xuan-Son Nguyen
06c1e4abc1
readme : add list of dependencies and their license ( #13591 )
2025-05-16 20:04:18 +02:00
Diego Devesa
415e40a357
releases : use arm version of curl for arm releases ( #13592 )
2025-05-16 19:36:51 +02:00
Georgi Gerganov
654a67794f
metal : add FA-vec kernel for head size 64 ( #13583 )
...
ggml-ci
2025-05-16 20:32:58 +03:00
Concedo
ca4274e384
added size info into HF searcher
2025-05-17 00:31:54 +08:00
Diego Devesa
5364ae4ba5
llama : print hint when loading a model when no backends are loaded ( #13589 )
2025-05-16 16:38:07 +02:00
Sigbjørn Skjæret
7c07ac244d
ci : add ppc64el to build-linux-cross ( #13575 )
2025-05-16 14:54:23 +02:00
Łukasz Ślusarczyk
0a338ed013
sycl : fixed compilation warnings ( #13582 )
2025-05-16 18:15:29 +08:00
Concedo
e5d26a2356
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# common/CMakeLists.txt
# docs/backend/SYCL.md
# ggml/CMakeLists.txt
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/binbcast.cpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/dequantize.hpp
# ggml/src/ggml-sycl/dmmv.cpp
# ggml/src/ggml-sycl/gemm.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
# ggml/src/gguf.cpp
# scripts/compare-llama-bench.py
# tests/CMakeLists.txt
# tests/test-chat.cpp
# tools/llama-bench/llama-bench.cpp
# tools/server/README.md
2025-05-16 15:30:31 +08:00
Concedo
6cafc0e73e
Merge commit '71bdbdb587' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-cpu/CMakeLists.txt
# tools/batched-bench/batched-bench.cpp
# tools/mtmd/clip.h
2025-05-16 15:25:15 +08:00
Concedo
317a7ab14a
updated lite
2025-05-16 14:49:23 +08:00
Olivier Chafik
bc098c3cf0
minja: sync (qwen3) ( #13573 )
...
* minja: sync f06140fa52
- https://github.com/google/minja/pull/67 (@grf53)
- https://github.com/google/minja/pull/66 (@taha-yassine)
- https://github.com/google/minja/pull/63 (@grf53)
- https://github.com/google/minja/pull/58
---------
Co-authored-by: ochafik <ochafik@google.com>
2025-05-15 23:29:10 +01:00
Diego Devesa
c6a2c9e741
gguf : use ggml log system ( #13571 )
...
* gguf : use ggml log system
* llama : remove unnecessary new lines in exception messages
2025-05-15 19:13:11 +02:00
Daniel Tang
07ad2b6db3
gguf-py : fix disconnect-before-connect in editor-gui ( #13569 )
...
The bug caused a crash upon load with venvs created with
--system-site-packages that use python3-pyside6.qtwidgets=6.6.2-4
from Kubuntu 24.10.
2025-05-15 18:47:10 +02:00
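The hazard fixed above is a classic Qt pattern: calling `disconnect()` on a signal that was never connected raises, so a "disconnect old, connect new" rebind crashes on first use. Illustrated with a minimal stand-in signal class (not the PySide6 API, names hypothetical):

```python
class Signal:
    """Minimal stand-in for a Qt-style signal (not the PySide6 API)."""
    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def disconnect(self, slot):
        # Like Qt, disconnecting a slot that was never connected is an error.
        self._slots.remove(slot)  # raises ValueError if absent

def rebind(signal, slot):
    # Guarded disconnect-before-connect: tolerate the "never connected"
    # first call instead of crashing on load.
    try:
        signal.disconnect(slot)
    except ValueError:
        pass
    signal.connect(slot)

sig = Signal()
handler = lambda: None
rebind(sig, handler)  # first call: nothing to disconnect, no crash
rebind(sig, handler)  # later calls: old connection removed first

print(len(sig._slots))  # → 1
```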
Concedo
12e6928ec2
i'm gonna regret this, aren't i?
2025-05-15 23:59:55 +08:00
Xuan-Son Nguyen
c531edfa34
convert : fix conversion for llama 4 ( #13567 )
2025-05-15 17:40:07 +02:00
Atharva Dubey
02cdd2d8b0
sycl: simplify bin_bcast_kernel ( #13383 )
2025-05-15 17:39:52 +02:00
Svetlozar Georgiev
64bb51cf90
sycl: reordered Q4_K MMVQ ( #13109 )
2025-05-15 17:35:44 +02:00
Concedo
7a76e237b8
fixed clip quantize again
2025-05-15 23:22:12 +08:00
Łukasz Ślusarczyk
9c404ed54c
sycl: use oneDNN for matrices multiplication ( #12972 )
2025-05-15 16:53:41 +02:00
Diego Devesa
6c8b91500e
llama-bench : fix -ot with dl backends ( #13563 )
2025-05-15 15:46:55 +02:00