Xuan-Son Nguyen
d9c4accaff
server : (webui) rename has_multimodal --> modalities ( #13393 )
...
* server : (webui) rename has_multimodal --> modalities
* allow converting SVG to PNG
* less complicated code
2025-05-09 09:06:37 +02:00
Concedo
2f5f4ee65a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# common/CMakeLists.txt
2025-05-09 14:18:20 +08:00
Diego Devesa
15e03282bb
ci : limit write permission to only the release step + fixes ( #13392 )
...
* ci : limit write permission to only the release step
* fix win cuda file name
* fix license file copy on multi-config generators
2025-05-08 23:45:22 +02:00
Jeff Bolz
20a6246f29
vulkan: avoid using Float16 capability in scalar FA
2025-05-08 14:55:52 -05:00
Jeff Bolz
615958f42c
vulkan: for scalar FA, select between 1 and 8 rows
2025-05-08 14:34:59 -05:00
Matt Clayton
f05a6d71a0
mtmd : Expose helper_decode_image_chunk ( #13366 )
...
* mtmd: Expose helper_decode_image, output_embd_copy, image_tokens_copy/free
* Slim down
* Cleanups
2025-05-08 20:25:39 +02:00
Xuan-Son Nguyen
ee01d71e58
server : (webui) fix a very small misalignment ( #13387 )
...
* server : (webui) fix a very small misalignment
* restore font-bold
2025-05-08 18:51:45 +02:00
Concedo
af857b1813
updated lite
2025-05-09 00:30:21 +08:00
Concedo
2439014a03
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# examples/embedding/embedding.cpp
# tools/imatrix/imatrix.cpp
# tools/perplexity/perplexity.cpp
2025-05-08 23:41:02 +08:00
Concedo
b6220669f4
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/docker.yml
# Makefile
# examples/CMakeLists.txt
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/convert.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/sync-ggml.last
2025-05-08 23:07:33 +08:00
Jeff Bolz
00784e3d34
CI: increase timeout to accommodate newly-supported tests
2025-05-08 09:04:47 -05:00
Xuan-Son Nguyen
8c83449cb7
server : (webui) revamp the input area, plus many small UI improvements ( #13365 )
...
* rework the input area
* process selected file
* change all icons to heroicons
* fix thought process collapse
* move conversation more menu to sidebar
* sun icon --> moon icon
* rm default system message
* stricter upload file check, only allow image if server has mtmd
* build it
* add renaming
* better autoscroll
* build
* add conversation group
* fix scroll
* extra context first, then user input in the end
* fix <hr> tag
* clean up a bit
* build
* add mb-3 for <pre>
* throttle adjustTextareaHeight to make it less laggy
* (nits) missing padding in sidebar
* rm stray console log
2025-05-08 15:37:29 +02:00
Sigbjørn Skjæret
1a844be132
convert : support rope_scaling type and rope_type ( #13349 )
2025-05-08 15:34:29 +02:00
welix
0ccc121354
mtmd : fix the calculation of n_tokens for smolvlm ( #13381 )
...
Co-authored-by: Taichi Nishimura <Taichi.A.Nishimura@sony.com>
2025-05-08 15:03:53 +02:00
Georgi Gerganov
6562e5a4d6
context : allow cache-less context for embeddings ( #13108 )
...
* context : allow cache-less context for embeddings
ggml-ci
* context : enable reranking with encode()
ggml-ci
* context : encode() clears embd_seq
ggml-ci
* examples : use llama_encode() when appropriate
ggml-ci
* models : nomic bert moe does not require KV cache
* llama : update comments for llama_decode/llama_encode
ggml-ci
* context : update warning log [no ci]
2025-05-08 14:28:33 +03:00
Georgi Gerganov
51fb96b1ff
context : remove logits_all flag ( #13284 )
...
* context : remove logits_all flag
ggml-ci
* llama : remove logits_all flag + reorder llama_context_params
ggml-ci
2025-05-08 14:26:50 +03:00
Diego Devesa
70a6991edf
ci : move release workflow to a separate file ( #13362 )
2025-05-08 13:15:28 +02:00
Diego Devesa
f061021206
llama : print size and type of overridden tensors ( #13364 )
2025-05-08 13:15:15 +02:00
Alberto Cabrera Pérez
8733e0cf6e
sycl: addressing non-contiguous src1 mul_mats (nc and batched) ( #13343 )
...
* sycl: fixed non-contiguous src1 mul_mats (nc and batched)
* Fixed wrong static_cast inside kernel
2025-05-08 10:08:01 +01:00
Jeff Bolz
e66094276b
vulkan: support q4_0/q8_0 KV in scalar FA
2025-05-07 23:53:38 -05:00
Jeff Bolz
989bfb18fc
vulkan: load each Q value once. optimize O reduction. more tuning
2025-05-07 15:57:38 -05:00
Jeff Bolz
c747227a57
vulkan: reduce register usage in scalar FA, but perf may be slightly worse
2025-05-07 15:02:11 -05:00
Jeff Bolz
a6c940bb79
vulkan: remove PV matrix, helps with register usage
2025-05-07 13:46:35 -05:00
Jeff Bolz
876e6617a7
vulkan: use vector loads in scalar flash attention shader
2025-05-07 13:35:13 -05:00
Concedo
7c5d47f688
multigpu warning only once
2025-05-08 00:55:09 +08:00
Diego Devesa
814f795e06
docker : disable arm64 and intel images ( #13356 )
2025-05-07 16:36:33 +02:00
Georgi Gerganov
d879433824
sync : ggml
...
ggml-ci
2025-05-07 17:28:36 +03:00
Daniel Bevenius
13b0a04597
whisper: remove MSVC warnings pragmas (whisper/3090)
...
* ggml : remove MSVC warnings pragmas
This commit removes the MSVC-specific pragmas as these are now handled
in ggml/CMakeLists.txt.
* whisper : remove MSVC warning pragmas
This commit removes the MSVC-specific pragmas. These are now handled in
the ggml/CMakeLists.txt file.
2025-05-07 17:28:36 +03:00
Jared Tweed
bba9d945c1
cmake : removed stdc++fs (whisper/3097)
...
* removed stdc++fs
* kept line, but removed stdc++fs
2025-05-07 17:28:36 +03:00
Concedo
38b3bffcef
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CMakePresets.json
# ggml/src/ggml-cuda/CMakeLists.txt
# tests/test-sampling.cpp
# tools/mtmd/clip.cpp
2025-05-07 19:47:44 +08:00
Sigbjørn Skjæret
bc4e1128f7
llama : deci : support ffn-free with attention ( #13296 )
2025-05-07 12:49:27 +02:00
Concedo
4e97b69657
updated lite
2025-05-07 18:43:35 +08:00
Concedo
fa22c1a5a4
fixed cfg scale, but turns out it sucks. embedded aria2c into pyinstaller
2025-05-07 18:30:36 +08:00
Ycros
39e73ae0d6
common : Add a warning when we can't match samplers from a string or char. ( #13330 )
2025-05-07 11:23:28 +03:00
R0CKSTAR
1f73301b63
cuda : remove nrows_x in mul_mat_q_process_tile ( #13325 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-05-07 09:48:23 +02:00
Georgi Gerganov
4773d7a02f
examples : remove infill ( #13283 )
...
ggml-ci
2025-05-07 10:28:02 +03:00
piDack
6c7fd67b64
llama : support tie embedding for chatglm models ( #13328 )
2025-05-07 09:23:11 +02:00
Concedo
b951310ca5
tryout smaller binaries
2025-05-07 14:56:34 +08:00
Jeff Bolz
3a8d954e0c
vulkan: always use fp32 for scalar flash attention
2025-05-06 23:08:39 -05:00
Johannes Gäßler
141a908a59
CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF ( #13135 )
2025-05-06 23:35:51 +02:00
Xuan-Son Nguyen
32916a4907
clip : refactor graph builder ( #13321 )
...
* mtmd : refactor graph builder
* fix qwen2vl
* clean up siglip cgraph
* pixtral migrated
* move minicpmv to a dedicated build function
* move max_feature_layer to build_llava
* use build_attn for minicpm resampler
* fix windows build
* add comment for batch_size
* also support tinygemma3 test model
* qwen2vl does not use RMS norm
* fix qwen2vl norm (2)
2025-05-06 22:40:24 +02:00
DocShotgun
ffc727203a
sampling : make top_n_sigma no-op at <=0 or a single candidate ( #13345 )
2025-05-06 22:36:24 +02:00
oobabooga
91a86a6f35
sampling : don't consider -infinity values in top_n_sigma ( #13344 )
2025-05-06 20:24:15 +02:00
Diego Devesa
f4ed10b69c
cmake : remove arm64 msvc presets ( #13342 )
2025-05-06 20:15:31 +02:00
Concedo
a5b6f372a3
cfg scale wip
2025-05-07 00:36:00 +08:00
Concedo
ffe23f0e93
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-sycl/ggml-sycl.cpp
# pyproject.toml
2025-05-06 23:39:45 +08:00
Concedo
0fa435b2a6
Merge commit '9b61acf060' into concedo_experimental
...
# Conflicts:
# Makefile
# docs/multimodal/MobileVLM.md
# docs/multimodal/glmedge.md
# docs/multimodal/llava.md
# docs/multimodal/minicpmo2.6.md
# docs/multimodal/minicpmv2.5.md
# docs/multimodal/minicpmv2.6.md
# requirements/requirements-all.txt
# tools/mtmd/CMakeLists.txt
# tools/mtmd/README.md
# tools/mtmd/android/adb_run.sh
# tools/mtmd/android/build_64.sh
# tools/mtmd/clip-quantize-cli.cpp
2025-05-06 23:34:21 +08:00
Concedo
1377a93a73
Merge commit '5215b91e93' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# cmake/x64-windows-llvm.cmake
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/CMakeLists.txt
# tools/imatrix/imatrix.cpp
# tools/llava/clip.cpp
# tools/rpc/rpc-server.cpp
2025-05-06 23:15:04 +08:00
Concedo
38a8778f24
wip cfg scale
2025-05-06 23:06:25 +08:00
Akarshan Biswas
1e333d5bba
SYCL: Disable reorder optimize by default and stop setting tensor extras when optimize is disabled ( #13254 )
...
* SYCL: Do not set tensor extras when reorder optimize is disabled
* SYCL: Disable reorder optimize by default
2025-05-06 20:27:06 +05:30