Concedo
f545f4df75
with forced wmma for cu11
2025-08-22 17:36:29 +08:00
Concedo
257992d6b8
possibly unstable, needs testing for fa
2025-08-22 17:35:32 +08:00
Aaron Teo
ad5c975c2d
ggml-cpu: Support Q5_0 and Q5_1 on s390x ( #15486 )
...
* ggml-cpu: initial q5_0 impl for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: updated q5_0 code for better performance
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: use optimised hsum for better performance
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: introduce q5_1 simd + refactor q5_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix incorrect return type vec_hsum
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: q5_0 incomplete refactor + table_b2b_0 activation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: refactor q5_1
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: q5_1 update loop unroll to 4
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: update q5_0 unroll to 4
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: update build-s390x docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: update unused variables q5_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update the last update date
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-08-22 16:11:04 +08:00
Chenguang Li
a0f98dd604
CANN: Optimize RMS_NORM using cache ( #15419 )
...
* [CANN] Optimize RMS_NORM using cache
Signed-off-by: noemotiovon <757486878@qq.com>
* fix typo
Signed-off-by: noemotiovon <757486878@qq.com>
* fix review comment
Signed-off-by: noemotiovon <757486878@qq.com>
* codestyle adjustment
Signed-off-by: noemotiovon <757486878@qq.com>
---------
Signed-off-by: noemotiovon <757486878@qq.com>
2025-08-22 14:12:07 +08:00
Concedo
d8c174bcae
try fix vulkan build
2025-08-22 10:04:47 +08:00
Diego Devesa
54a241f505
sched : fix possible use of wrong ids tensor when offloading moe prompt processing ( #15488 )
2025-08-21 23:09:32 +02:00
Acly
97ae5961a4
vulkan : support conv_2d_dw with f16 weights ( #15392 )
2025-08-21 17:01:51 +02:00
Dong Won Kim
20c2dac8c6
vulkan: add exp operation ( #15456 )
...
Co-authored-by: aeseulgi <kim2h7903@gmail.com>
2025-08-21 17:00:16 +02:00
Jeff Bolz
96452a3fa4
vulkan: Reuse conversion results in prealloc_y ( #15410 )
...
* vulkan: Reuse conversion results in prealloc_y
Cache the pipeline and tensor that were most recently used to fill prealloc_y,
and skip the conversion if the current pipeline/tensor match.
* don't use shared pointer for prealloc_y_last_pipeline_used
2025-08-21 16:55:00 +02:00
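Note on 96452a3fa4: the caching idea in that message can be sketched as below. This is a minimal illustration, not the backend's code: `Pipeline`, `Tensor`, `ConversionCache`, and `convert_if_needed` are hypothetical names, and the sketch assumes the tensor's contents do not change between calls (a real implementation would also need to invalidate the cache when they do).

```cpp
#include <cassert>

// Hypothetical stand-ins for the backend's pipeline and tensor types.
struct Pipeline { int id; };
struct Tensor   { int id; };

// Remembers which (pipeline, tensor) pair last filled the scratch buffer.
struct ConversionCache {
    const Pipeline* last_pipeline = nullptr; // plain pointer, used only for identity comparison
    const Tensor*   last_tensor   = nullptr;
    int conversions_run = 0;                 // counts dispatches, for illustration only

    // Returns true if a conversion was dispatched, false if it was skipped.
    bool convert_if_needed(const Pipeline* p, const Tensor* t) {
        assert(p != nullptr && t != nullptr);
        if (p == last_pipeline && t == last_tensor) {
            return false; // scratch buffer already holds this exact result
        }
        ++conversions_run; // stand-in for recording the conversion dispatch
        last_pipeline = p;
        last_tensor   = t;
        return true;
    }
};
```

Comparing both the pipeline and the tensor matters: the same tensor converted by a different pipeline produces different contents in the scratch buffer.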
Concedo
b50f94ae27
this commit removes ggml_cuda_f16 targets. Merge commit '7a6e91ad26' into concedo_experimental
...
# Conflicts:
# docs/build.md
# docs/multimodal/MobileVLM.md
# ggml/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
2025-08-21 19:25:29 +08:00
Xuan-Son Nguyen
945e1f12a6
ggml : fix condition of im2col on Metal backend ( #15460 )
2025-08-21 08:32:26 +03:00
R0CKSTAR
8ad038c0fd
musa: add GGML_UNUSED_VARS ( #15446 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-08-21 11:06:05 +08:00
Diego Devesa
5682a3745f
sched : copy only the used experts when offloading prompt processing ( #15346 )
2025-08-21 01:35:28 +02:00
Johannes Gäßler
13aeb7aef2
CUDA: refactor FA support/selection code ( #15454 )
2025-08-20 23:14:14 +02:00
Johannes Gäßler
7a6e91ad26
CUDA: replace GGML_CUDA_F16 with CUDA arch checks ( #15433 )
2025-08-20 16:58:49 +02:00
Jeff Bolz
fec9519802
vulkan: shorten pipeline name strings ( #15431 )
...
These detailed strings were causing increased build time on gcc.
2025-08-20 16:33:14 +02:00
Concedo
80480316fc
early merge of https://github.com/ggml-org/llama.cpp/pull/15431, thanks jeff
2025-08-20 20:40:18 +08:00
Concedo
1c41c38a6a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/cuda.Dockerfile
# CODEOWNERS
# README.md
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-opencl/ggml-opencl.cpp
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/test-chat.cpp
# tools/batched-bench/batched-bench.cpp
# tools/mtmd/clip.h
2025-08-20 20:34:45 +08:00
Concedo
35707f4e97
split vulkan into two compilation units for faster build
2025-08-20 12:12:47 +08:00
R0CKSTAR
a094f38143
musa: fix build warnings ( #15258 )
...
* musa: fix build warnings
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* fix warning: comparison of integers of different signs: 'const int' and 'unsigned int' [-Wsign-compare]
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-08-20 10:17:37 +08:00
lhez
fb22dd07a6
opencl: mark argsort unsupported if cols exceed workgroup limit ( #15375 )
2025-08-19 11:25:51 -07:00
SHUAI YANG
a6d3cfe7fa
CANN: optimize rope operator ( #15335 )
...
* optimize rope ops
* amendment
* delete trailing whitespace
* change the variable name
2025-08-19 21:28:22 +08:00
R0CKSTAR
67f09a3a27
musa: handle __hgt2_mask, available starting from MUSA SDK rc4.3.0 ( #15413 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-08-19 12:33:47 +02:00
Marvin Gießing
6424594c56
ggml-cpu: add mxfp4 VSX intrinsics for Power9+ (ppc64le) hardware ( #15385 )
...
* Added VSX intrinsics for Power9+ systems
Signed-off-by: mgiessing <marvin.giessing@gmail.com>
* Manual unrolling for minor perf improvement
Signed-off-by: mgiessing <marvin.giessing@gmail.com>
* Update ggml/src/ggml-cpu/arch/powerpc/quants.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Signed-off-by: mgiessing <marvin.giessing@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-08-19 11:54:31 +03:00
Concedo
a6bbd449c4
Merge commit '19f4decae0' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/release.yml
# tests/test-backend-ops.cpp
2025-08-19 15:00:32 +08:00
Concedo
140ae92886
Merge commit '65349f26f2' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# tests/test-backend-ops.cpp
2025-08-19 11:35:32 +08:00
compilade
f44f793172
ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors ( #15379 )
...
* ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors
* ggml-quants : avoid division by zero in make_q3_quants
2025-08-18 09:23:56 +02:00
Jeff Bolz
ae532eac2c
vulkan: disable spirv-opt for bfloat16 shaders ( #15352 )
2025-08-18 07:56:29 +02:00
Jeff Bolz
21c17b5bef
vulkan: Use larger workgroups for mul_mat_vec when M is small ( #15355 )
...
* vulkan: Use larger workgroups for mul_mat_vec when M is small
Also use subgroup instructions for (part of) the reduction when supported.
Without this, the more expensive reductions would eat into the benefits of
the larger workgroups.
* update heuristic for amd/intel
Co-authored-by: 0cc4m <picard12@live.de>
---------
Co-authored-by: 0cc4m <picard12@live.de>
2025-08-17 18:08:57 +02:00
Dong Won Kim
19f4decae0
vulkan: support sqrt ( #15370 )
2025-08-17 16:03:09 +02:00
Jeff Bolz
de5627910d
vulkan: Optimize argsort ( #15354 )
...
- Launch an appropriate number of invocations (next larger power of two).
32 invocations is common and the barrier is much cheaper there.
- Specialize for "needs bounds checking" vs not.
- Make the code less branchy and [[unroll]] the loops. In the final code,
I see no branches inside the main loop (only predicated stores) when
needs_bounds_check is false.
- Always sort ascending, then apply the ascending vs descending option when
doing the final stores to memory.
- Copy the values into shared memory, makes them slightly cheaper to access.
2025-08-17 10:41:45 +02:00
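Note on de5627910d: two of the points in that message (rounding the work size up to a power of two, and always sorting ascending with the direction applied only at the final store) can be illustrated on the CPU. The sketch below assumes a bitonic sorting network, which is an assumption about the shader and not taken from the message; `next_pow2` and `argsort_sketch` are hypothetical names, and the bounds-checking specialization and shared-memory copy are not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Smallest power of two that is >= n (for n >= 1).
static uint32_t next_pow2(uint32_t n) {
    uint32_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Returns the indices that order `vals` ascending, or descending on request.
static std::vector<int> argsort_sketch(const std::vector<float>& vals, bool descending) {
    const uint32_t n = (uint32_t) vals.size();
    if (n == 0) return {};
    const uint32_t m = next_pow2(n); // one "invocation" per slot
    std::vector<int> idx(m);
    for (uint32_t i = 0; i < m; ++i) idx[i] = (int) i;

    // True if slot a must come after slot b in ascending order.
    // Padding slots (index >= n) compare as larger than every real value.
    auto after = [&](int a, int b) {
        const bool a_pad = (uint32_t) a >= n;
        const bool b_pad = (uint32_t) b >= n;
        if (a_pad || b_pad) return a_pad && !b_pad;
        return vals[a] > vals[b];
    };

    // Bitonic network; the last stage (k == m) leaves everything ascending.
    for (uint32_t k = 2; k <= m; k <<= 1) {
        for (uint32_t j = k >> 1; j > 0; j >>= 1) {
            for (uint32_t i = 0; i < m; ++i) {
                const uint32_t l = i ^ j;
                if (l <= i) continue;
                const bool up = (i & k) == 0;
                if (up ? after(idx[i], idx[l]) : after(idx[l], idx[i])) {
                    std::swap(idx[i], idx[l]);
                }
            }
        }
    }

    // Final store: the requested direction is applied only here.
    std::vector<int> out(n);
    for (uint32_t i = 0; i < n; ++i) {
        const int src = descending ? idx[n - 1 - i] : idx[i];
        assert((uint32_t) src < n); // padding slots all sit past the first n entries
        out[i] = src;
    }
    return out;
}
```

For example, `argsort_sketch({3, 1, 2, 5, 4}, false)` pads five values to eight slots and returns the indices of the values in increasing order.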
Concedo
52606e9b1d
tts cpp model is now loadable in kcpp
2025-08-17 15:47:22 +08:00
Concedo
cfc1a0d4ef
tts cpp cli builds and runs fine.
2025-08-17 13:53:27 +08:00
Jeff Bolz
1fe00296f5
vulkan: fuse adds ( #15252 )
...
* vulkan: fuse adds
Fuse adds that have the same shape, which are common in MoE models.
It will currently fuse up to 6 adds, because we assume no more than
8 descriptors per dispatch. But this could be changed.
* check runtimeDescriptorArray feature
* disable multi_add for Intel due to likely driver bug
2025-08-16 11:48:22 -05:00
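Note on 1fe00296f5: the limit of 6 adds is consistent with a simple descriptor count, assuming each operand and the result take one descriptor: N chained adds read N + 1 operands and write 1 result, so they need N + 2 descriptors, and 8 descriptors allow 6 adds. That accounting is an inference from the message, not something it states. The sketch below shows the arithmetic and a CPU analogue of a fused add; `max_fused_adds` and `multi_add` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// N chained adds need N + 1 inputs plus 1 output, i.e. N + 2 descriptors
// (assumed accounting, see the note above).
constexpr int max_fused_adds(int max_descriptors) {
    return max_descriptors - 2;
}

// CPU analogue of a fused add: sum all same-shape operands in a single pass
// instead of writing an intermediate result after every add.
static std::vector<float> multi_add(const std::vector<std::vector<float>>& srcs) {
    assert(!srcs.empty());
    std::vector<float> dst(srcs[0].size(), 0.0f);
    for (const auto& s : srcs) {
        assert(s.size() == dst.size()); // only same-shape adds are fused
        for (std::size_t i = 0; i < dst.size(); ++i) {
            dst[i] += s[i];
        }
    }
    return dst;
}
```

Three operands passed to `multi_add` correspond to two fused adds, so a dispatch limited to 8 descriptors would take at most seven operands.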
Jeff Bolz
de2192794f
vulkan: Support mul_mat_id with f32 accumulators ( #15337 )
...
* vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id
* vulkan: Support mul_mat_id with f32 accumulators, but they are not hooked up
- There's no explicit way to request f32 precision for mul_mat_id, but there
probably should be, and this gets the code in place for that.
- A couple fixes to check_results.
- Remove casts to fp16 in coopmat1 FA shader (found by inspection).
2025-08-16 11:18:31 +02:00
Jeff Bolz
2e2b22ba66
vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id ( #15334 )
2025-08-16 10:58:38 +02:00
Concedo
2bf128587d
modify ggml core to support tts
2025-08-16 16:52:34 +08:00
rmatif
912ff8c119
OpenCL: add initial FA support ( #14987 )
...
* add F16/F16 fa support
* fix kernel init
* use mad instead of fma
* use inline function
* mark FA with sinks as unsupported for now
* add pragma unroll to loops
2025-08-16 01:05:55 -07:00
Concedo
d876898476
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/cpu.Dockerfile
# .devops/cuda.Dockerfile
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# .github/labeler.yml
# .github/workflows/build.yml
# .github/workflows/release.yml
# CODEOWNERS
# README.md
# docs/build-s390x.md
# docs/ops.md
# examples/eval-callback/eval-callback.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/transpose.cl
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tests/test-opt.cpp
2025-08-16 12:39:25 +08:00
lhez
e2c1bfff53
opencl: add initial mxfp4 support via mv ( #15270 )
...
* opencl: add reference `mul_mv_mxfp4_f32`
* opencl: add reference `mul_mv_id` for mxfp4
* Q4_0 transpose fix for Adreno
---------
Co-authored-by: shawngu-quic <shawngu@qti.qualcomm.com>
2025-08-15 09:52:14 -07:00
Georgi Gerganov
5edf1592fd
vulkan : fix out-of-bounds access in argmax kernel ( #15342 )
...
ggml-ci
2025-08-15 16:16:36 +02:00
Georgi Gerganov
db3010bd23
vulkan : fix compile warnings on macos ( #15340 )
...
ggml-ci
2025-08-15 15:28:28 +02:00
Aaron Teo
ff27f80a74
ggml: initial IBM zDNN backend ( #14975 )
...
* ggml-zdnn: initial backend impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: temp change z17 to arch15
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: fix build bugs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: tensor->extra logging check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: add layout name mapping, ztensor information
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: separate logging into its own line
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: add shape comparison
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: add ggml_tensor shape log
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: fix incorrect shape logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add output buffer check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: run compute and store into tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add more loggers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: update set_tensor logging to check only for matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: last working matmul version
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add comments to prevent accidentally deleting lines
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: support op out_prod
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: update op out_prod to use tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: rewrite the backend implementation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: bugfix new impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix compiler warnings and bugfixes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: test ztensor finding in init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: implement at least 1 op to test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: assign tensor->extra to buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add check for view tensors to prevent init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: rework init_tensor to create new buffers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: switch to std vector instead of array
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: switch buffers back and set to arbitrary number
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: impl init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: update supports_op matmul matrix
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix incorrect ztensor shape, reduce memory padding
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: impl matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix compiler error missing type
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix missing data transform call
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add bias init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: tighten memory usage, change string allocation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add bias ztensor and data free
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add bias data transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add more debug info for extra buffer transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add logger to check if mat mul ops go through set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: activate bias transform in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: move weights transform into mulmat
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add more safeguards in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix sequencing of transforms
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: bugfix transform ztensor vs origtensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: figure out why sigtrap is happening
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix sigsegv
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: move everything back to local declaration
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: move bias data to local also
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: bring back working matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: rewrite into mre
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix missing vector import
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix missing vector import in header
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt to fix sigsegv
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix missing load tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix invalid ztensor buffer release
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add logging to debug free buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: remove free_buffer debug info
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add parmblkformat detections
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add nnpa installed detection
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add zdnn_init call for static libs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt at fixing invalid buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: switch to using deque to fix pointer deref problem
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add weights logging to check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt to use unique ptr
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add tensor to pre_tfm_desc logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add inputs logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: disable op_none initialisation for testing
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix missing return from init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: load ztensors in cgraph exec
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: work on moving output ztensor as well
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: disable logging and breakpoints for full test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt at manually changing the layout
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt at using default nwhc format instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: disable global load ztensor for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix erroneous output load tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add guards to prevent loading ztensor if transformed
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: bring load ztensor back to init routine
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix ztensor deallocation abort
stabilise ggml <-> zdnn api
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: clean up matmul selection
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: clean up project structure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: update documentation, prepare for upstream
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* chore: add codeowners
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: disable batched matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt at fixing tensor views during matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: deny all view tensors directly
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix pr comments
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update ops docs for zdnn
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: redo test-backend-ops for ops.md
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix typo in build-s390x.md
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* codeowners: remove taronaeo for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "codeowners: remove taronaeo for now"
This reverts commit 411ea4ed78d08778967bd0bd33a6538cfcbe082f.
* ggml-zdnn: remove unused ggml_zdnn macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-08-15 21:11:22 +08:00
Johannes Gäßler
4227c9be42
CUDA: fix negative KV_max values in FA ( #15321 )
2025-08-14 23:21:24 +02:00
uvos
5ba36f6103
HIP: Cleanup hipification header ( #15285 )
...
add explicit conversion operator to support older versions of rocm
Switch over to hip_bf16 from legacy hip_bfloat16
Simplify RDNA3 define
Reduce swap over of new hipblas api to rocm 6.5 as this version is used for rocm 7.0 previews
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-08-14 16:23:56 +02:00
Concedo
7ac0102ed3
hope i didn't break anything
2025-08-14 21:42:24 +08:00
Jeff Bolz
863d341eeb
vulkan: perf_logger improvements ( #15246 )
...
* vulkan: perf_logger improvements
- Account for batch dimension in flops calculation.
- Fix how "_VEC" is detected for mat_mul_id.
- Fix "n" dimension for mat_mul_id (in case of broadcasting).
- Include a->type in name.
* use <=mul_mat_vec_max_cols rather than ==1
2025-08-14 08:38:10 -05:00
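Note on 863d341eeb: "account for batch dimension in flops calculation" can be illustrated with the usual matrix-multiply operation count. The formula below is the standard one (k multiplies and k - 1 adds per output element) and is an assumption about what the logger computes; `matmul_flops` is a hypothetical name, and k is assumed to be at least 1.

```cpp
#include <cstdint>

// Floating-point operations for `batch` independent (m x k) * (k x n) products:
// each of the m * n outputs takes k multiplies and k - 1 adds.
// Assumes k >= 1; with k == 0 the unsigned subtraction would wrap.
constexpr uint64_t matmul_flops(uint64_t m, uint64_t n, uint64_t k, uint64_t batch) {
    return m * n * (k + (k - 1)) * batch;
}
```

Leaving out the `batch` factor under-reports the work by exactly that factor, which would make batched matrix multiplies look slower than they are in a FLOPS-per-second readout.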
Concedo
d5876024ec
Merge commit 'f4586ee598' into concedo_experimental
...
# Conflicts:
# README.md
# docs/multimodal/minicpmo2.6.md
# docs/multimodal/minicpmv2.6.md
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/add.cl
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tools/perplexity/perplexity.cpp
# tools/server/README.md
2025-08-14 21:29:52 +08:00
Jason Ni
5ade3000bd
ggml: fix ggml_conv_1d_dw bug (ggml/1323)
...
* ggml: fix ggml_conv_1d_dw bug
* Fixed conv1d_dw weight tensor dimension.
2025-08-14 14:59:27 +03:00
Sigbjørn Skjæret
4ebd0c125b
cuda : fix GGML_CUDA_GRAPHS=OFF ( #15300 )
...
* fix USE_CUDA_GRAPH=OFF
ggml-ci
* check capture status
* completely disable capturing check instead
2025-08-14 13:22:07 +03:00