Johannes Gäßler
0cf6725e9f
CUDA: FA support for Deepseek (Ampere or newer) (#13306)
* CUDA: FA support for Deepseek (Ampere or newer)
* do loop unrolling via C++ template
2025-05-09 13:34:58 +02:00
Diego Devesa
27ebfcacba
llama : do not crash if there is no CPU backend (#13395)
* llama : do not crash if there is no CPU backend
* add checks to examples
2025-05-09 13:02:07 +02:00
Concedo
ea2e5ed1e9
mmq debug log
2025-05-09 18:30:11 +08:00
Johannes Gäßler
5c86c9ed3e
CUDA: fix crash on large batch size for MoE models (#13384)
2025-05-09 12:14:04 +02:00
Concedo
46849e80fb
updated lite
2025-05-09 18:11:27 +08:00
Bartowski
efb8b47eda
imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389)
* Add --parse-special for enabling parsing of special tokens in imatrix calculation
* whitespace
2025-05-09 11:53:58 +02:00
Concedo
c4a0b323f0
remove fa restrictions for vulkan
2025-05-09 17:34:14 +08:00
R0CKSTAR
0527771dd8
llama-run: add support for downloading models from ModelScope (#13370)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-05-09 10:25:50 +01:00
Concedo
0874cd231a
Merge remote-tracking branch 'jeffbolz/scalar_fa_3' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
2025-05-09 17:19:33 +08:00
Concedo
42f6930e13
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# ggml/src/ggml-rpc/ggml-rpc.cpp
2025-05-09 17:18:14 +08:00
Xuan-Son Nguyen
2189fd3b63
mtmd : fix batch_view for m-rope (#13397)
* mtmd : fix batch_view for m-rope
* nits : fix comment
2025-05-09 11:18:02 +02:00
Xuan-Son Nguyen
3f96aeff39
llama : one-off chat template fix for Mistral-Small-2503 (#13398)
* llama : one-off chat template fix for Mistral-Small-2503
* update readme
* add mistral-v7-tekken
2025-05-09 11:17:51 +02:00
Radoslav Gerganov
b486ba05bf
rpc : add rpc_msg_set_tensor_hash_req (#13353)
* rpc : add rpc_msg_set_tensor_hash_req
Use a dedicated struct for the RPC_CMD_SET_TENSOR_HASH request, which
makes the code cleaner.
* fix
2025-05-09 10:31:07 +03:00
Jeff Bolz
02115dcd9a
vulkan: Allow up to 4096 elements for mul_mat_id row_ids (#13326)
This assert fired when running Qwen_Qwen3-30B-A3B-Q2_K.gguf:
GGML_ASSERT(nei0 * nei1 <= 3072);
The tensor is 8 x 512. Increase the array size to accommodate it.
2025-05-09 09:23:41 +02:00
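Note on the entry above (02115dcd9a): the arithmetic behind the assert can be checked with a short sketch. This is illustrative only. `OLD_LIMIT`, `NEW_LIMIT`, and `row_ids_fit` are hypothetical names chosen for this note, not identifiers from the Vulkan backend; the numbers are taken from the commit subject and message.

```python
# Illustrative arithmetic only; the names below are hypothetical stand-ins,
# not the actual variables or functions of the Vulkan backend.
OLD_LIMIT = 3072  # bound in the assert quoted in the commit message
NEW_LIMIT = 4096  # bound named in the commit subject


def row_ids_fit(nei0: int, nei1: int, limit: int) -> bool:
    """Return True when an nei0 x nei1 ids tensor holds at most `limit` elements."""
    return nei0 * nei1 <= limit


nei0, nei1 = 8, 512      # tensor shape reported in the commit message
elements = nei0 * nei1   # 8 * 512 = 4096

# The old bound rejects this shape; the new bound admits it exactly.
print(elements, row_ids_fit(nei0, nei1, OLD_LIMIT), row_ids_fit(nei0, nei1, NEW_LIMIT))
```

An 8 x 512 tensor has 4096 elements, which exceeds the old bound of 3072 and sits exactly at the new bound of 4096.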
Xuan-Son Nguyen
d9c4accaff
server : (webui) rename has_multimodal --> modalities (#13393)
* server : (webui) rename has_multimodal --> modalities
* allow converting SVG to PNG
* less complicated code
2025-05-09 09:06:37 +02:00
Concedo
2f5f4ee65a
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# common/CMakeLists.txt
2025-05-09 14:18:20 +08:00
Diego Devesa
15e03282bb
ci : limit write permission to only the release step + fixes (#13392)
* ci : limit write permission to only the release step
* fix win cuda file name
* fix license file copy on multi-config generators
2025-05-08 23:45:22 +02:00
Jeff Bolz
20a6246f29
vulkan: avoid using Float16 capability in scalar FA
2025-05-08 14:55:52 -05:00
Jeff Bolz
615958f42c
vulkan: for scalar FA, select between 1 and 8 rows
2025-05-08 14:34:59 -05:00
Matt Clayton
f05a6d71a0
mtmd : Expose helper_decode_image_chunk (#13366)
* mtmd: Expose helper_decode_image, output_embd_copy, image_tokens_copy/free
* Slim down
* Cleanups
2025-05-08 20:25:39 +02:00
Xuan-Son Nguyen
ee01d71e58
server : (webui) fix a very small misalignment (#13387)
* server : (webui) fix a very small misalignment
* restore font-bold
2025-05-08 18:51:45 +02:00
Concedo
af857b1813
updated lite
2025-05-09 00:30:21 +08:00
Concedo
2439014a03
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# examples/embedding/embedding.cpp
# tools/imatrix/imatrix.cpp
# tools/perplexity/perplexity.cpp
2025-05-08 23:41:02 +08:00
Concedo
b6220669f4
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/docker.yml
# Makefile
# examples/CMakeLists.txt
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/convert.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/sync-ggml.last
2025-05-08 23:07:33 +08:00
Jeff Bolz
00784e3d34
CI: increase timeout to accommodate newly-supported tests
2025-05-08 09:04:47 -05:00
Xuan-Son Nguyen
8c83449cb7
server : (webui) revamp the input area, plus many small UI improvements (#13365)
* rework the input area
* process selected file
* change all icons to heroicons
* fix thought process collapse
* move conversation more menu to sidebar
* sun icon --> moon icon
* rm default system message
* stricter upload file check, only allow image if server has mtmd
* build it
* add renaming
* better autoscroll
* build
* add conversation group
* fix scroll
* extra context first, then user input in the end
* fix <hr> tag
* clean up a bit
* build
* add mb-3 for <pre>
* throttle adjustTextareaHeight to make it less laggy
* (nits) missing padding in sidebar
* rm stray console log
2025-05-08 15:37:29 +02:00
Sigbjørn Skjæret
1a844be132
convert : support rope_scaling type and rope_type (#13349)
2025-05-08 15:34:29 +02:00
welix
0ccc121354
mtmd : fix the calculation of n_tokens for smolvlm (#13381)
Co-authored-by: Taichi Nishimura <Taichi.A.Nishimura@sony.com>
2025-05-08 15:03:53 +02:00
Georgi Gerganov
6562e5a4d6
context : allow cache-less context for embeddings (#13108)
* context : allow cache-less context for embeddings
ggml-ci
* context : enable reranking with encode()
ggml-ci
* context : encode() clears embd_seq
ggml-ci
* examples : use llama_encode() when appropriate
ggml-ci
* models : nomic bert moe does not require KV cache
* llama : update comments for llama_decode/llama_encode
ggml-ci
* context : update warning log [no ci]
2025-05-08 14:28:33 +03:00
Georgi Gerganov
51fb96b1ff
context : remove logits_all flag (#13284)
* context : remove logits_all flag
ggml-ci
* llama : remove logits_all flag + reorder llama_context_params
ggml-ci
2025-05-08 14:26:50 +03:00
Diego Devesa
70a6991edf
ci : move release workflow to a separate file (#13362)
2025-05-08 13:15:28 +02:00
Diego Devesa
f061021206
llama : print size and type of overridden tensors (#13364)
2025-05-08 13:15:15 +02:00
Alberto Cabrera Pérez
8733e0cf6e
sycl: addressing non-contiguous src1 mul_mats (nc and batched) (#13343)
* sycl: fixed non-contiguous src1 mul_mats (nc and batched)
* Fixed wrong static_cast inside kernel
2025-05-08 10:08:01 +01:00
Jeff Bolz
e66094276b
vulkan: support q4_0/q8_0 KV in scalar FA
2025-05-07 23:53:38 -05:00
Jeff Bolz
989bfb18fc
vulkan: load each Q value once. optimize O reduction. more tuning
2025-05-07 15:57:38 -05:00
Jeff Bolz
c747227a57
vulkan: reduce register usage in scalar FA, but perf may be slightly worse
2025-05-07 15:02:11 -05:00
Jeff Bolz
a6c940bb79
vulkan: remove PV matrix, helps with register usage
2025-05-07 13:46:35 -05:00
Jeff Bolz
876e6617a7
vulkan: use vector loads in scalar flash attention shader
2025-05-07 13:35:13 -05:00
Concedo
7c5d47f688
multigpu warning only once
2025-05-08 00:55:09 +08:00
Diego Devesa
814f795e06
docker : disable arm64 and intel images (#13356)
2025-05-07 16:36:33 +02:00
Georgi Gerganov
d879433824
sync : ggml
ggml-ci
2025-05-07 17:28:36 +03:00
Daniel Bevenius
13b0a04597
whisper: remove MSVC warnings pragmas (whisper/3090)
* ggml : remove MSVC warnings pragmas
This commit removes the MSVC-specific pragmas as these are now handled
in ggml/CMakeLists.txt.
* whisper : remove MSVC warning pragmas
This commit removes the MSVC-specific pragmas. These are now handled in
the ggml/CMakeLists.txt file.
2025-05-07 17:28:36 +03:00
Jared Tweed
bba9d945c1
cmake : removed stdc++fs (whisper/3097)
* removed stdc++fs
* kept line, but removed stdc++fs
2025-05-07 17:28:36 +03:00
Concedo
38b3bffcef
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# CMakePresets.json
# ggml/src/ggml-cuda/CMakeLists.txt
# tests/test-sampling.cpp
# tools/mtmd/clip.cpp
2025-05-07 19:47:44 +08:00
Sigbjørn Skjæret
bc4e1128f7
llama : deci : support ffn-free with attention (#13296)
2025-05-07 12:49:27 +02:00
Concedo
4e97b69657
updated lite
2025-05-07 18:43:35 +08:00
Concedo
fa22c1a5a4
fixed cfg scale, but turns out it sucks. embedded aria2c into pyinstaller
2025-05-07 18:30:36 +08:00
Ycros
39e73ae0d6
common : Add a warning when we can't match samplers from a string or char. (#13330)
2025-05-07 11:23:28 +03:00
R0CKSTAR
1f73301b63
cuda : remove nrows_x in mul_mat_q_process_tile (#13325)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-05-07 09:48:23 +02:00
Georgi Gerganov
4773d7a02f
examples : remove infill (#13283)
ggml-ci
2025-05-07 10:28:02 +03:00