LostRuins Concedo
a5cc934f04
cherry pick patch in https://github.com/leejet/stable-diffusion.cpp/pull/957 and https://github.com/leejet/stable-diffusion.cpp/pull/935
2025-11-10 23:05:38 +08:00
Georgi Gerganov
c27efd2bd1
metal : enable tensor API for A19 ( #17087 )
2025-11-10 15:38:42 +02:00
LostRuins Concedo
8c60a886e1
add jinja2 to build environments
2025-11-10 21:24:48 +08:00
fj-y-saito
df70bedda7
arm64: add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_… ( #15277 )
...
* add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_q8_K
* Surround SVE function with compiler directive
* fix compile switch
* fix coding style
* ggml : fix indent
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-10 15:12:59 +02:00
LostRuins Concedo
cdc18f0945
linting (+1 squashed commits)
...
Squashed commits:
[994427d3c] linting
2025-11-10 20:54:44 +08:00
Georgi Gerganov
f914544b16
batched-bench : add "separate text gen" mode ( #17103 )
2025-11-10 12:59:29 +02:00
Xuan-Son Nguyen
4b13a684c5
mtmd: fix patch_size initialized to random value in audio models ( #17128 )
...
* mtmd: fix patch_size initialized to random value in audio models
* add default hparams
2025-11-10 11:41:05 +01:00
Georgi Gerganov
9898b57cbe
editorconfig : ignore benches/ ( #17140 )
...
[no ci]
2025-11-10 12:17:19 +02:00
Wagner Bruna
2ae6bff5bd
split memory detection functions and add debug command ( #1832 )
2025-11-10 18:07:15 +08:00
Acly
1032256ec9
cuda/vulkan : bicubic interpolation ( #17022 )
...
* vulkan : implement upscale with bicubic interpolation
* cuda : implement upscale with bicubic interpolation
* tests : add ggml_interpolate with GGML_SCALE_MODE_BICUBIC to backend tests
* adapt OpenCL backend to not support the OP in that case so tests don't fail
* print scale mode & flags in test-backend-ops
2025-11-10 10:19:39 +01:00
Georgi Gerganov
15274c0c50
benches : add eval results ( #17139 )
...
[no ci]
2025-11-10 10:44:10 +02:00
LostRuins Concedo
df6e303fd3
merge https://github.com/ggml-org/llama.cpp/pull/17128
2025-11-10 11:24:04 +08:00
LostRuins Concedo
d02cb1b117
Revert "fix divide by zero error"
...
This reverts commit 6cce98eca5.
2025-11-10 11:22:50 +08:00
LostRuins Concedo
6cce98eca5
fix divide by zero error
2025-11-10 01:38:55 +08:00
Georgi Gerganov
b8595b16e6
mtmd : fix embedding size for image input ( #17123 )
2025-11-09 18:31:02 +02:00
Ruben Ortlam
392e09a608
vulkan: fix memory allocations ( #17122 )
2025-11-09 16:14:41 +01:00
compilade
802cef44bf
convert : parse safetensors directly ( #15667 )
...
* convert : parse safetensors directly
* gguf-py : order safetensors tensors by name
Applies to both local and remote safetensors custom parsing.
This matches the behavior of the official safetensors implementation.
* convert : rename from_safetensors_meta to from_local_tensor
For consistency with from_remote_tensor
* convert : fix no-lazy dtypes from direct safetensors
2025-11-09 09:49:40 -05:00
compilade
1c07c0c68c
convert : handle compressed-tensors quant method ( #17069 )
...
* convert : handle compressed-tensors quant method
* convert : handle int-quantized models
* convert : handle naive-quantized models
* gguf-py : __pos__ is also unary
* convert : fix flake8 lint
* convert : use F32 for dequant of pack-quantized tensors
2025-11-09 09:45:50 -05:00
Georgi Gerganov
cb1adf8851
server : handle failures to restore host cache ( #17078 )
...
* server : handle failures to restore host cache
* server : add tests for the prompt cache
2025-11-09 14:27:05 +02:00
Georgi Gerganov
ef1d826997
benches : add folder with benchmarks ( #16931 )
...
* benches : add folder with benchmarks
* benches : update dgx-spark bench
2025-11-09 12:53:29 +02:00
Eric Curtin
86fde91e62
Switch to using Ubuntu 25.10 vulkan/mesa ( #16497 )
...
Because "Ubuntu packages to be discontinued in Vulkan SDK"
Signed-off-by: Eric Curtin <eric.curtin@docker.com>
2025-11-09 10:25:38 +01:00
LostRuins Concedo
60a74bdd89
make tool calling work with jinja. but still need to fix qwen omni first (+1 squashed commits)
...
Squashed commits:
[e394da61e] make tool calling work with jinja. but still need to fix qwen omni first
2025-11-09 16:56:14 +08:00
Ruben Ortlam
7f3e9d339c
vulkan: iGPU memory reporting fix ( #17110 )
...
* vulkan: use all device-local heaps for memory availability reporting
Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com>
* use all available heaps for iGPU memory reporting
* Allow multiple memory types per buffer request for devices with split heaps
---------
Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com>
2025-11-09 09:54:47 +01:00
Ruben Ortlam
8a3519b708
vulkan: fix mmq out of bounds reads ( #17108 )
...
* vulkan: fix mmq out of bounds reads, streamline outdated matmul host code
* fix mul_mat_id quantization call
* Fix compiler warnings
2025-11-09 09:52:57 +01:00
Jeff Bolz
80a6cf6347
vulkan: fuse mul_mat_id + mul ( #17095 )
...
* vulkan: fuse mul_mat_id + mul
This comes up in qwen3 moe.
* split mul_mat_id fusion tests into a separate class
2025-11-09 09:48:42 +01:00
Georgi Gerganov
0750a59903
metal : retain src and dst buffers during async ops ( #17101 )
2025-11-09 08:28:51 +02:00
Xuan-Son Nguyen
aa3b7a90b4
arg: add --cache-list argument to list cached models ( #17073 )
...
* arg: add --cache-list argument to list cached models
* new manifest naming format
* improve naming
* Update common/arg.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-08 21:54:14 +01:00
chansikpark
333f2595a3
webui: fix keyboard shortcuts for new chat & edit chat title ( #17007 )
2025-11-08 20:52:35 +01:00
Jeff Bolz
53d7d21e61
vulkan: Use spec constants for conv2d s/d/p and kernel W/H ( #16978 )
...
* vulkan: Use spec constants for conv2d s/d/p and kernel W/H
Also add some additional unroll hints, which seems to help.
* lock around map lookup
2025-11-08 13:24:29 -06:00
LostRuins Concedo
4fc022a51f
revert qwen vl warmup size
2025-11-09 02:24:49 +08:00
LostRuins Concedo
d6a2ad8455
still not really working right
2025-11-09 01:57:48 +08:00
LostRuins Concedo
e6ca0aa8d0
Merge commit '2f0c2db43e' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# README.md
# docs/backend/OPENCL.md
# docs/ops.md
# docs/ops/CUDA.csv
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/set_rows.tmpl.wgsl
# scripts/sync-ggml.last
# src/CMakeLists.txt
# tools/server/README.md
2025-11-08 23:27:59 +08:00
LostRuins Concedo
055fdcef63
update model path
...
jinja tojson
2025-11-08 21:51:50 +08:00
Aidan
eeee367de5
server: fix correct time_ms calculation in prompt_progress ( #17093 )
...
* fix: correct time_ms calculation in send_partial_response
The time_ms field was incorrectly calculated: the division happened
before the subtraction, leading to incorrect values.
Before: (ggml_time_us() - slot.t_start_process_prompt / 1000)
After: (ggml_time_us() - slot.t_start_process_prompt) / 1000
* docs : document time_ms field in prompt_progress
2025-11-08 15:12:11 +02:00
Aman Gupta
64fe17fbb8
Revert "CUDA: add expert reduce kernel ( #16857 )" ( #17100 )
2025-11-08 21:05:19 +08:00
Aman Gupta
c1b187688d
CUDA: skip fusion for repeating adds in bias ( #17080 )
2025-11-08 16:58:05 +08:00
SavicStefan
b8a5cfd11a
vulkan: Increase BK to 32; use BK/4 for non-CM mul_mm.comp ( #16636 )
...
Signed-off-by: Stefan Savic <stefan.savic@huawei.com>
Co-authored-by: Stefan Savic <stefan.savic@huawei.com>
2025-11-08 09:28:22 +01:00
Aleksei Nikiforov
08416ebe7f
ggml: disable vxe for cross-compilation by default ( #16966 )
...
Otherwise compilation will fail due to enabling -mvx -mzvector
and not setting corresponding -march options.
2025-11-08 16:00:20 +08:00
Jeff Bolz
b4e335d8dc
vulkan: fuse rms_norm + mul + rope (+ view + set_rows) ( #16977 )
...
This change combines the rms_norm+mul and rope+view+set_rows fusions to
allow fusing the whole sequence together. This comes up in Qwen3, Bailing,
and some other models.
2025-11-08 08:52:15 +01:00
Jeff Bolz
d6fe40fa00
vulkan: Fix test-thread-safety crashes ( #17024 )
...
The std::map pipeline_flash_attn_f32_f16 could be searched and inserted at the
same time, which needs to hold the lock. To be safe, hold the lock for all of
ggml_vk_load_shaders.
2025-11-08 08:39:45 +01:00
Johannes Gäßler
e14e842e87
CUDA: fix MMQ stream-k fixup ne1 indices ( #17089 )
2025-11-08 08:26:18 +01:00
Reese Levine
647b960bd8
ggml webgpu: faster matrix multiplication/matrix-vector multiplication ( #17031 )
...
* Faster tensors (#8 )
Add fast matrix and matrix/vector multiplication.
* Use map for shader replacements instead of pair of strings
2025-11-07 19:27:20 -08:00
LostRuins Concedo
64a1cd95a7
fixed missing headers
2025-11-08 11:09:49 +08:00
LostRuins Concedo
dfb0966ed2
not working
2025-11-08 10:49:10 +08:00
LostRuins Concedo
fdcb281a3a
Merge commit '2f966b8ed8' into concedo_experimental
...
# Conflicts:
# .github/workflows/release.yml
# docs/docker.md
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-thread-safety.cpp
# tools/batched-bench/batched-bench.cpp
# tools/mtmd/clip.cpp
2025-11-08 10:34:17 +08:00
LostRuins Concedo
7061cd1cc9
Merge commit 'e4a71599e5' into concedo_experimental
...
# Conflicts:
# CODEOWNERS
# tools/mtmd/clip.cpp
2025-11-08 10:28:49 +08:00
LostRuins Concedo
7e787c2083
Revert "Kcpp triage for rowsplit: revert https://github.com/ggml-org/llama.cpp/pull/16715 until https://github.com/ggml-org/llama.cpp/issues/16799 is resolved"
...
This reverts commit 3aec5ed0fd.
2025-11-08 10:16:54 +08:00
LostRuins Concedo
af94884971
update props
2025-11-08 10:15:13 +08:00
bssrdf
299f5d782c
CUDA: properly handle nb00=nb02 case for cpy ( #17081 )
2025-11-07 23:41:58 +01:00
Acly
ac76d36201
vulkan : refactor buffer handling in vk_op_f32 ( #16840 )
...
* vulkan : refactor/simplify buffer handling in vk_op_* functions
* Combine UMA handling into ggml_vk_tensor_subbuffer
2025-11-07 21:08:50 +01:00