Concedo
5cf5f35540
added vulkan build target for main.exe
2025-05-11 21:53:08 +08:00
Concedo
1eb6d25010
truncate middle instead of end for long strings
2025-05-11 20:26:17 +08:00
Concedo
f841b29c41
fixed unicode paths
2025-05-11 14:05:54 +08:00
Concedo
75ec0ba279
fixed lite tags
2025-05-10 21:31:18 +08:00
Concedo
48c3682c2c
improve search
2025-05-10 19:25:26 +08:00
Concedo
50e1064ffe
better passthrough handling
2025-05-10 19:11:09 +08:00
Georgi Gerganov
a62e1dfea1
metal : optimize MoE for large batches ( #13388 )
...
ggml-ci
(cherry picked from commit 611aa914ef )
2025-05-10 00:33:51 +08:00
Concedo
6bb44391bd
Merge commit ' 5c86c9ed3e' into concedo_experimental
...
# Conflicts:
# tools/imatrix/imatrix.cpp
# tools/mtmd/README.md
# tools/run/README.md
# tools/run/run.cpp
2025-05-10 00:30:18 +08:00
Concedo
702e8a63df
updated lite
2025-05-10 00:23:24 +08:00
Concedo
ea2e5ed1e9
mmq debug log
2025-05-09 18:30:11 +08:00
Johannes Gäßler
5c86c9ed3e
CUDA: fix crash on large batch size for MoE models ( #13384 )
2025-05-09 12:14:04 +02:00
Concedo
46849e80fb
updated lite
2025-05-09 18:11:27 +08:00
Bartowski
efb8b47eda
imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation ( #13389 )
...
* Add --parse-special for enabling parsing of special tokens in imatrix calculation
* whitespace
2025-05-09 11:53:58 +02:00
Concedo
c4a0b323f0
remove fa restrictions for vulkan
2025-05-09 17:34:14 +08:00
R0CKSTAR
0527771dd8
llama-run: add support for downloading models from ModelScope ( #13370 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-05-09 10:25:50 +01:00
Concedo
0874cd231a
Merge remote-tracking branch 'jeffbolz/scalar_fa_3' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
2025-05-09 17:19:33 +08:00
Concedo
42f6930e13
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-rpc/ggml-rpc.cpp
2025-05-09 17:18:14 +08:00
Xuan-Son Nguyen
2189fd3b63
mtmd : fix batch_view for m-rope ( #13397 )
...
* mtmd : fix batch_view for m-rope
* nits : fix comment
2025-05-09 11:18:02 +02:00
Xuan-Son Nguyen
3f96aeff39
llama : one-off chat template fix for Mistral-Small-2503 ( #13398 )
...
* llama : one-off chat template fix for Mistral-Small-2503
* update readme
* add mistral-v7-tekken
2025-05-09 11:17:51 +02:00
Radoslav Gerganov
b486ba05bf
rpc : add rpc_msg_set_tensor_hash_req ( #13353 )
...
* rpc : add rpc_msg_set_tensor_hash_req
Use a dedicated struct for the request of RPC_CMD_SET_TENSOR_HASH which
makes the code cleaner.
* fix
2025-05-09 10:31:07 +03:00
Jeff Bolz
02115dcd9a
vulkan: Allow up to 4096 elements for mul_mat_id row_ids ( #13326 )
...
This assert fired running Qwen_Qwen3-30B-A3B-Q2_K.gguf:
GGML_ASSERT(nei0 * nei1 <= 3072);
The tensor is 8 x 512. Increase this array size to accommodate.
2025-05-09 09:23:41 +02:00
Xuan-Son Nguyen
d9c4accaff
server : (webui) rename has_multimodal --> modalities ( #13393 )
...
* server : (webui) rename has_multimodal --> modalities
* allow converting SVG to PNG
* less complicated code
2025-05-09 09:06:37 +02:00
Concedo
2f5f4ee65a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# common/CMakeLists.txt
2025-05-09 14:18:20 +08:00
Diego Devesa
15e03282bb
ci : limit write permission to only the release step + fixes ( #13392 )
...
* ci : limit write permission to only the release step
* fix win cuda file name
* fix license file copy on multi-config generators
2025-05-08 23:45:22 +02:00
Jeff Bolz
20a6246f29
vulkan: avoid using Float16 capability in scalar FA
2025-05-08 14:55:52 -05:00
Jeff Bolz
615958f42c
vulkan: for scalar FA, select between 1 and 8 rows
2025-05-08 14:34:59 -05:00
Matt Clayton
f05a6d71a0
mtmd : Expose helper_decode_image_chunk ( #13366 )
...
* mtmd: Expose helper_decode_image, output_embd_copy, image_tokens_copy/free
* Slim down
* Cleanups
2025-05-08 20:25:39 +02:00
Xuan-Son Nguyen
ee01d71e58
server : (webui) fix a very small misalignment ( #13387 )
...
* server : (webui) fix a very small misalignment
* restore font-bold
2025-05-08 18:51:45 +02:00
Concedo
af857b1813
updated lite
2025-05-09 00:30:21 +08:00
Concedo
2439014a03
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# examples/embedding/embedding.cpp
# tools/imatrix/imatrix.cpp
# tools/perplexity/perplexity.cpp
2025-05-08 23:41:02 +08:00
Concedo
b6220669f4
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/docker.yml
# Makefile
# examples/CMakeLists.txt
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/convert.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/sync-ggml.last
2025-05-08 23:07:33 +08:00
Jeff Bolz
00784e3d34
CI: increase timeout to accommodate newly-supported tests
2025-05-08 09:04:47 -05:00
Xuan-Son Nguyen
8c83449cb7
server : (webui) revamp the input area, plus many small UI improvements ( #13365 )
...
* rework the input area
* process selected file
* change all icons to heroicons
* fix thought process collapse
* move conversation more menu to sidebar
* sun icon --> moon icon
* rm default system message
* stricter upload file check, only allow image if server has mtmd
* build it
* add renaming
* better autoscroll
* build
* add conversation group
* fix scroll
* extra context first, then user input in the end
* fix <hr> tag
* clean up a bit
* build
* add mb-3 for <pre>
* throttle adjustTextareaHeight to make it less laggy
* (nits) missing padding in sidebar
* rm stray console log
2025-05-08 15:37:29 +02:00
Sigbjørn Skjæret
1a844be132
convert : support rope_scaling type and rope_type ( #13349 )
2025-05-08 15:34:29 +02:00
welix
0ccc121354
mtmd : fix the calculation of n_tokens for smolvlm ( #13381 )
...
Co-authored-by: Taichi Nishimura <Taichi.A.Nishimura@sony.com>
2025-05-08 15:03:53 +02:00
Georgi Gerganov
6562e5a4d6
context : allow cache-less context for embeddings ( #13108 )
...
* context : allow cache-less context for embeddings
ggml-ci
* context : enable reranking with encode()
ggml-ci
* context : encode() clears embd_seq
ggml-ci
* examples : use llama_encode() when appropriate
ggml-ci
* models : nomic bert moe does not require KV cache
* llama : update comments for llama_decode/llama_encode
ggml-ci
* context : update warning log [no ci]
2025-05-08 14:28:33 +03:00
Georgi Gerganov
51fb96b1ff
context : remove logits_all flag ( #13284 )
...
* context : remove logits_all flag
ggml-ci
* llama : remove logits_all flag + reorder llama_context_params
ggml-ci
2025-05-08 14:26:50 +03:00
Diego Devesa
70a6991edf
ci : move release workflow to a separate file ( #13362 )
2025-05-08 13:15:28 +02:00
Diego Devesa
f061021206
llama : print size and type of overridden tensors ( #13364 )
2025-05-08 13:15:15 +02:00
Alberto Cabrera Pérez
8733e0cf6e
sycl: addressing non-contiguous src1 mul_mats (nc and batched) ( #13343 )
...
* sycl: fixed non-contiguous src1 mul_mats (nc and batched)
* Fixed wrong static_cast inside kernel
2025-05-08 10:08:01 +01:00
Jeff Bolz
e66094276b
vulkan: support q4_0/q8_0 KV in scalar FA
2025-05-07 23:53:38 -05:00
Jeff Bolz
989bfb18fc
vulkan: load each Q value once. optimize O reduction. more tuning
2025-05-07 15:57:38 -05:00
Jeff Bolz
c747227a57
vulkan: reduce register usage in scalar FA, but perf may be slightly worse
2025-05-07 15:02:11 -05:00
Jeff Bolz
a6c940bb79
vulkan: remove PV matrix, helps with register usage
2025-05-07 13:46:35 -05:00
Jeff Bolz
876e6617a7
vulkan: use vector loads in scalar flash attention shader
2025-05-07 13:35:13 -05:00
Concedo
7c5d47f688
multigpu warning only once
2025-05-08 00:55:09 +08:00
Diego Devesa
814f795e06
docker : disable arm64 and intel images ( #13356 )
2025-05-07 16:36:33 +02:00
Georgi Gerganov
d879433824
sync : ggml
...
ggml-ci
2025-05-07 17:28:36 +03:00
Daniel Bevenius
13b0a04597
whisper: remove MSVC warnings pragmas (whisper/3090)
...
* ggml : remove MSVC warnings pragmas
This commit removes the MSVC-specific pragmas as these are now handled
in ggml/CMakeLists.txt.
* whisper : remove MSVC warning pragmas
This commit removes the MSVC-specific pragmas. These are now handled in
the ggml/CMakeLists.txt file.
2025-05-07 17:28:36 +03:00
Jared Tweed
bba9d945c1
cmake : removed stdc++fs (whisper/3097)
...
* removed stdc++fs
* kept line, but removed stdc++fs
2025-05-07 17:28:36 +03:00