Concedo
ca99f79ea9
cu11 just always stick to wmma
2025-06-06 14:02:34 +08:00
Concedo
eec5a8ad16
breaking change: due to cuda12 upgrade, release filenames will change. standardize them to windows naming for the future. (+1 squashed commits)
Squashed commits:
[75842919a] cuda12.4 test
2025-06-06 14:02:34 +08:00
Concedo
50a27793d3
upgrade windows runners to windows 2022, cu11 still uses vs2019
this should finally work (+21 squashed commits)
Squashed commits:
[5edac5b59] Revert "quick dbg"
This reverts commit fd62a997cc6684bb89242d5e7b0ae2aed83fd27f.
[fd62a997c] quick dbg
[bcccae7e6] sanity check 2
[568e2eb08] sanity check
[2f30d573a] please work 2
[cf8765221] please work
[c535e60d9] try a small trick
[d4ba79b80] 2022 test
[3f146b000] t2
[4a3b9a9b4] revert and test
[4bdc9a149] reverted test2
[5081cb4a3] reverted test
[ea9a826f3] broken test
[3c11ae389] compare 2019
[8ecec4fec] not for cu12
[0be964f3a] added vs2019 for the other runners
[5d24641cb] debugging 4
[1dee79207] debugging 3
[ab172f133] more debugging 2
[b1a895e84] more debugging
[5d21d8bd0] vs2019 setup
2025-06-06 14:02:34 +08:00
Concedo
bc89b465a8
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/release.yml
# .github/workflows/server.yml
# README.md
# docs/build.md
# docs/install.md
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
2025-06-05 11:03:34 +08:00
Concedo
a341188f84
add install for vs2019
2025-06-05 10:32:57 +08:00
Concedo
f6bbc350f2
various qol fixes
2025-06-05 10:26:02 +08:00
Concedo
a74d8669b3
try hardcoded path (+1 squashed commits)
Squashed commits:
[711b43d9d] let's see if VS2019 can work
2025-06-05 10:26:02 +08:00
Ervin Áron Tasnádi
0d3984424f
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813)
* ggml-vulkan: adds op CONV_TRANSPOSE_1D
* test-backend-ops: adds more sophisticated tests for CONV_TRANSPOSE_1D
* Missing barrier added to shader.
Number of additional tests reduced to 108.
* Fixes typo in variable name.
* Removes extra whitespaces.
* Adds int64->int32 casts to prevent possible warnings.
* Problem size reduced in tests to pass tests with llvmpipe.
* supports_op condition moved from unintended position
2025-06-04 22:02:00 +02:00
Georgi Gerganov
3e63a58ef7
kv-cache : refactor the update/defrag mechanism (#13988)
* kv-cache : refactor update mechanism
ggml-ci
* memory : improve status handling
* defrag : reset head + add comments
ggml-ci
* cont : minor fixes
ggml-ci
2025-06-04 18:58:20 +03:00
Concedo
736030bb9f
save and load state upgraded to 3 available states
2025-06-04 22:09:40 +08:00
Diego Devesa
2589ad3704
ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997)
2025-06-04 15:37:40 +02:00
Concedo
06d2bc3404
ollama compat fixes
2025-06-04 19:22:29 +08:00
Diego Devesa
482548716f
releases : use dl backend for linux release, remove arm64 linux release (#13996)
2025-06-04 13:15:54 +02:00
Concedo
2fdb0acd59
slightly clean up cmake file (+3 squashed commits)
Squashed commits:
[e050f83db] Revert "test cu11 build on win2022"
This reverts commit 1bf989f2b3789c99aa9883cfe70550de6c26db23.
[1bf989f2b] test cu11 build on win2022
[5dc94eae8] updated lite
2025-06-04 18:42:07 +08:00
Xuan-Son Nguyen
3ac67535c8
llama-graph : use ggml_repeat_4d (#13998)
2025-06-04 10:11:26 +02:00
Johannes Gäßler
0b4be4c435
CUDA: fix FTZ in FA for Gemma 3 (#13991)
2025-06-04 08:57:05 +02:00
Georgi Gerganov
e0e806f52e
kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985)
ggml-ci
2025-06-04 09:50:32 +03:00
Jeff Bolz
7e00e60ef8
vulkan: fix warnings in perf logger querypool code (#13937)
2025-06-03 20:30:22 +02:00
Concedo
53f1511396
use a static buffer for kv reloads instead. also, added into lite ui
2025-06-03 22:32:46 +08:00
Xuan-Son Nguyen
ea1431b0fa
docs : add "Quick start" section for new users (#13862)
* docs : add "Quick start" section for non-technical users
* rm flox
* Update README.md
2025-06-03 13:09:36 +02:00
Concedo
4b57108508
Save KV State and Load KV State to memory added. GUI not yet updated
2025-06-03 17:46:29 +08:00
lhez
71e74a3ac9
opencl: add backend_synchronize (#13939)
* This is not needed by the normal use where the result is read
using `tensor_get`, but it allows perf mode of `test-backend-ops`
to properly measure performance.
2025-06-02 16:54:58 -07:00
rmatif
bfb1e012a0
OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840)
* add concat, pad, repeat, tsembd, tanh, upscale
* small fixes
2025-06-02 16:53:36 -07:00
Georgi Gerganov
3637576288
server : disable speculative decoding for SWA models (#13970)
* server : use swa-full for draft context
ggml-ci
* server : disable speculative decoding for SWA models
2025-06-02 21:34:40 +03:00
Georgi Gerganov
ea394d7ab1
metal : use F32 accumulators in FA kernels (#13975)
ggml-ci
2025-06-02 21:33:40 +03:00
Georgi Gerganov
5582c49c39
gemma : more consistent attention scaling for v2 and v3 (#13951)
* gemma : fix attn scale for 27B
* cont : apply scale before attn
* cont : consistent attention scaling
2025-06-02 20:54:26 +03:00
Olivier Chafik
c9bbc77931
server: update deepseek reasoning format (pass reasoning_content as diffs) (#13933)
* server: update deepseek reasoning format (now in reasoning_content diffs), add legacy option for compat
* update unit/test_tool_call.py::test_thoughts
2025-06-02 10:15:44 -07:00
Concedo
b42b618897
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# README.md
# examples/parallel/parallel.cpp
# ggml/src/CMakeLists.txt
# ggml/src/ggml-blas/CMakeLists.txt
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/gguf.cpp
# scripts/sync-ggml.last
# tests/test-gguf.cpp
2025-06-02 23:26:43 +08:00
Concedo
5e667659ec
Merge commit '0fc16b42e8' into concedo_experimental
# Conflicts:
# src/CMakeLists.txt
# src/llama-kv-cache.cpp
2025-06-02 23:14:23 +08:00
Xuan-Son Nguyen
bfd322796c
mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961)
* mtmd : fix memory leak in mtmd_helper_eval_chunk_single
* mtmd-cli : fix mem leak
* Update tools/mtmd/mtmd-cli.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-02 16:29:28 +02:00
Concedo
6ce85c54d6
not working correctly
2025-06-02 22:12:10 +08:00
shalinib-ibm
093e3f1feb
cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13966)
Some systems report the CPU implementation as "Power11" instead of "POWER11".
The existing CMake logic uses a case-sensitive regular expression to extract
the CPU generation, which fails when the casing doesn't exactly match "POWER".
This patch provides a fix by first converting the string to uppercase before applying the regex.
Signed-off-by: root <root@rheldb2v.pperf.tadn.ibm.com>
Co-authored-by: root <root@rheldb2v.pperf.tadn.ibm.com>
2025-06-02 15:18:36 +03:00
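The actual fix in #13966 lives in CMake; the Python sketch below only mirrors the idea described in its commit message: uppercase the reported CPU string first, then apply the existing case-sensitive regex so "Power11" and "POWER11" yield the same generation. The function name `power_generation` and the exact pattern are hypothetical, not the project's code.

```python
import re


def power_generation(cpu_string):
    """Return the POWER generation number from a CPU string, or None.

    Hypothetical helper: the input is uppercased before matching, so a
    case-sensitive pattern on "POWER" also accepts "Power11" or "power9".
    """
    match = re.search(r"POWER([0-9]+)", cpu_string.upper())
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for s in ["POWER11", "Power11", "power9", "x86_64"]:
        print(s, "->", power_generation(s))
```

Without the `.upper()` call, "Power11" would not match the pattern and detection would fail, which is the failure mode the commit describes.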
Concedo
c7d42a5a07
updated lite
2025-06-02 17:28:36 +08:00
Atharva Dubey
663445b0de
sycl: quantize and reorder the input to q8_1 when reorder is enabled (#13826)
* [WIP]: fuse q8 quantization and reorder
* wip2: fuse q8 quantization and reorder
* working q8 reorder commit
* restored common.hpp
* remove debug prints
* remove unnecessary headers and remove trailing whitespace
* Update ggml/src/ggml-sycl/ggml-sycl.cpp
Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com>
---------
Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@intel.com>
2025-06-02 10:12:20 +01:00
Concedo
8e1ebc55b5
dropped support for lora base as upstream no longer uses it. If provided it will be silently ignored
2025-06-02 12:49:53 +08:00
Concedo
51dc1cf920
added scale for text lora
2025-06-02 00:13:42 +08:00
Johannes Gäßler
7675c555a1
gguf: fix failure on version == 0 (#13956)
2025-06-01 18:08:05 +02:00
Sigbjørn Skjæret
5e1c3aed40
convert : fix nomic-bert-moe mask token (#13757)
2025-06-01 18:07:21 +02:00
Sigbjørn Skjæret
c496fe0b1d
convert : fix vocab padding code for bert models (#13954)
2025-06-01 17:23:11 +02:00
Aaron Teo
e57bb87ced
ggml: check if non-native endian model is being loaded (#13943)
* gguf: prevent non-native endian models from being loaded
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* gguf: update error message
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* gguf: make the non-native endian check more verbose
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: move ggml_assert location
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: reword the endianness check error message
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-01 16:53:57 +02:00
Concedo
74ef097c4a
added ability to set koboldcpp as default handler for gguf and kcpps
2025-06-01 22:36:41 +08:00
Georgi Gerganov
f3a4b1659c
sync : ggml
ggml-ci
2025-06-01 13:43:57 +03:00
Kai Pastor
108009f5c7
vulkan : Remove unexpected ; (ggml/1253)
2025-06-01 13:43:57 +03:00
Kai Pastor
d337252acf
cmake : Fix broken CMake error messages (ggml/1252)
2025-06-01 13:43:57 +03:00
Radoslav Gerganov
af6f91db47
ggml : remove ggml_graph_import and ggml_graph_export declarations (ggml/1247)
The implementation is already deleted with commit 9d0762e.
closes : #1235
2025-06-01 13:43:57 +03:00
Georgi Gerganov
a7b8d35f78
sync : whisper.cpp (ggml/1250)
* ggml : Fix backtrace breaking Windows build (whisper/3203)
* sync : whisper.cpp
ggml-ci
---------
Co-authored-by: Daniel Tang <danielzgtg.opensource@gmail.com>
2025-06-01 13:43:57 +03:00
Radoslav Gerganov
6eba72b71c
ggml : install dynamic backends (ggml/1240)
* ggml : install dynamic backends
Make sure dynamic backends are installed in $CMAKE_INSTALL_BINDIR
2025-06-01 13:43:57 +03:00
Daniel Tang
fedf034a98
ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)
The goal is to have what users call "full logs" contain the backtrace.
This is registered upon ggml_init. Also fixes a minor fd leak on Linux.
2025-06-01 13:43:57 +03:00
ddh0
8726392d3d
readme : update bindings (#13950)
2025-06-01 11:44:30 +03:00
Georgi Gerganov
c04621711a
parallel : fix n_junk == 0 (#13952)
2025-06-01 11:42:16 +03:00