Commit graph

8329 commits

Author SHA1 Message Date
Concedo
6effb65cfe change singleinstance order 2025-06-06 21:20:30 +08:00
Concedo
d18938fc70 fixed build 2025-06-06 18:05:44 +08:00
Concedo
d33c88b1f4 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	ci/run.sh
#	examples/embedding/embedding.cpp
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	src/CMakeLists.txt
2025-06-06 17:56:51 +08:00
Concedo
2b5d8e467b updated lite 2025-06-06 17:49:56 +08:00
Concedo
740f91e3fd lower aria interval 2025-06-06 17:43:38 +08:00
Concedo
8b141d8647 stick to cu12.1 for linux for now 2025-06-06 17:38:28 +08:00
Sigbjørn Skjæret
d17a809ef0
llama : support multiple classifier outputs and labels (#13940) 2025-06-06 09:03:25 +02:00
Concedo
9cf32e5fee step limits over adapter for sd 2025-06-06 14:12:43 +08:00
Concedo
5f38594dc0 remove debug prints 2025-06-06 14:08:57 +08:00
Concedo
ca99f79ea9 cu11 just always stick to wmma 2025-06-06 14:02:34 +08:00
Concedo
eec5a8ad16 breaking change: due to cuda12 upgrade, release filenames will change. standardize them to windows naming for the future. (+1 squashed commit)
Squashed commit:

[75842919a] cuda12.4 test
2025-06-06 14:02:34 +08:00
Concedo
50a27793d3 upgrade windows runners to windows 2022, cu11 still uses vs2019
this should finally work (+21 squashed commits)

Squashed commits:

[5edac5b59] Revert "quick dbg"

This reverts commit fd62a997cc6684bb89242d5e7b0ae2aed83fd27f.

[fd62a997c] quick dbg

[bcccae7e6] sanity check 2

[568e2eb08] sanity check

[2f30d573a] please work 2

[cf8765221] please work

[c535e60d9] try a small trick

[d4ba79b80] 2022 test

[3f146b000] t2

[4a3b9a9b4] revert and test

[4bdc9a149] reverted test2

[5081cb4a3] reverted test

[ea9a826f3] broken test

[3c11ae389] compare 2019

[8ecec4fec] not for cu12

[0be964f3a] added vs2019 for the other runners

[5d24641cb] debugging 4

[1dee79207] debugging 3

[ab172f133] more debugging 2

[b1a895e84] more debugging

[5d21d8bd0] vs2019 setup
2025-06-06 14:02:34 +08:00
Sigbjørn Skjæret
1caae7fc6c
gguf-py : add add_classifier_output_labels method to writer (#14031)
* add add_classifier_output_labels

* use add_classifier_output_labels
2025-06-05 17:42:31 +02:00
Masato Nakasaka
669c13e0f6
vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001)
* allowing B580 and U9-288V

* experimenting code to detect Xe2

* allowing coopmat only for Xe2 GPUs

* fixed comment wording

* fixed comment wording

* removed unnecessary driver check
2025-06-05 16:00:29 +02:00
pockers21
146b88e8b3
ci: fix CUDA build failure on autodl cloud machines (#14005)
Replace CMAKE_CUDA_ARCHITECTURES=native with nvidia-smi detection
as 'native' fails on autodl cloud environments.

Co-authored-by: pockers21 <liyang2@uniontech.com>
2025-06-05 16:25:29 +03:00
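The fix above swaps `CMAKE_CUDA_ARCHITECTURES=native` for explicit detection via `nvidia-smi`. A minimal sketch of the mapping step (the helper name is illustrative, not from the patch): `nvidia-smi --query-gpu=compute_cap` reports a value like `8.6`, while `CMAKE_CUDA_ARCHITECTURES` expects the dotless form `86`.

```python
import re

def compute_cap_to_cmake_arch(cap: str) -> str:
    """Map an `nvidia-smi --query-gpu=compute_cap` value like "8.6"
    to the "86" form expected by CMAKE_CUDA_ARCHITECTURES."""
    m = re.fullmatch(r"(\d+)\.(\d+)", cap.strip())
    if not m:
        raise ValueError(f"unrecognized compute capability: {cap!r}")
    return m.group(1) + m.group(2)

# In CI this value would come from nvidia-smi; hardcoded here for illustration.
print(compute_cap_to_cmake_arch("8.6"))  # → 86
```

On machines where `nvidia-smi` is absent (or `native` probing fails, as on the autodl hosts), a script using this mapping would fall back to a fixed architecture list instead.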
Georgi Gerganov
7f37b6cf1e
memory : migrate from llama_kv_cache to more generic llama_memory (#14006)
* memory : merge llama_kv_cache into llama_memory + new `llama_memory` API

ggml-ci

* context : fix casts

ggml-ci
2025-06-05 15:29:22 +03:00
Diego Devesa
3a077146a4
llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013) 2025-06-05 11:57:42 +02:00
Olexandr88
d01d112abb
readme : add badge (#13938) 2025-06-05 10:50:55 +03:00
Sigbjørn Skjæret
9f47fa5792
vocab : warn about missing mask token (#14022) 2025-06-05 09:29:18 +02:00
Georgi Gerganov
9e31bec4fd
context : fix pos_min initialization upon error decode (#14008)
ggml-ci
2025-06-05 09:06:29 +03:00
Jeff Bolz
5a8ae3053c
vulkan: automatically deduce size of push constants (#13936) 2025-06-05 07:17:58 +02:00
Concedo
bc89b465a8 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/release.yml
#	.github/workflows/server.yml
#	README.md
#	docs/build.md
#	docs/install.md
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/mmvq.cpp
#	ggml/src/ggml-sycl/vecdotq.hpp
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
2025-06-05 11:03:34 +08:00
Concedo
a341188f84 add install for vs2019 2025-06-05 10:32:57 +08:00
Concedo
f6bbc350f2 various qol fixes 2025-06-05 10:26:02 +08:00
Concedo
a74d8669b3 try hardcoded path (+1 squashed commits)
Squashed commits:

[711b43d9d] let's see if VS2019 can work
2025-06-05 10:26:02 +08:00
Ervin Áron Tasnádi
0d3984424f
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813)
* ggml-vulkan: adds op CONV_TRANSPOSE_1D

* test-backend-ops: adds more sophisticated tests for CONV_TRANSPOSE_1D

* Missing barrier added to shader.
Number of additional tests reduced to 108.

* Fixes typo in variable name.

* Removes extra whitespaces.

* Adds int64->int32 casts to prevent possible warnings.

* Problem size reduced in tests to pass tests with llvmpipe.

* supports_op condition moved from unintended position
2025-06-04 22:02:00 +02:00
Georgi Gerganov
3e63a58ef7
kv-cache : refactor the update/defrag mechanism (#13988)
* kv-cache : refactor update mechanism

ggml-ci

* memory : improve status handling

* defrag : reset head + add comments

ggml-ci

* cont : minor fixes

ggml-ci
2025-06-04 18:58:20 +03:00
Concedo
736030bb9f save and load state upgraded to 3 available states 2025-06-04 22:09:40 +08:00
Diego Devesa
2589ad3704
ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997) 2025-06-04 15:37:40 +02:00
Concedo
06d2bc3404 ollama compat fixes 2025-06-04 19:22:29 +08:00
Diego Devesa
482548716f
releases : use dl backend for linux release, remove arm64 linux release (#13996) 2025-06-04 13:15:54 +02:00
Concedo
2fdb0acd59 slightly clean up cmake file (+3 squashed commits)
Squashed commits:

[e050f83db] Revert "test cu11 build on win2022"

This reverts commit 1bf989f2b3789c99aa9883cfe70550de6c26db23.

[1bf989f2b] test cu11 build on win2022

[5dc94eae8] updated lite
2025-06-04 18:42:07 +08:00
Xuan-Son Nguyen
3ac67535c8
llama-graph : use ggml_repeat_4d (#13998) 2025-06-04 10:11:26 +02:00
Johannes Gäßler
0b4be4c435
CUDA: fix FTZ in FA for Gemma 3 (#13991) 2025-06-04 08:57:05 +02:00
Georgi Gerganov
e0e806f52e
kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985)
ggml-ci
2025-06-04 09:50:32 +03:00
Jeff Bolz
7e00e60ef8
vulkan: fix warnings in perf logger querypool code (#13937) 2025-06-03 20:30:22 +02:00
Concedo
53f1511396 use a static buffer for kv reloads instead. also, added into lite ui 2025-06-03 22:32:46 +08:00
Xuan-Son Nguyen
ea1431b0fa
docs : add "Quick start" section for new users (#13862)
* docs : add "Quick start" section for non-technical users

* rm flox

* Update README.md
2025-06-03 13:09:36 +02:00
Concedo
4b57108508 Save KV State and Load KV State to memory added. GUI not yet updated 2025-06-03 17:46:29 +08:00
lhez
71e74a3ac9
opencl: add backend_synchronize (#13939)
* This is not needed by the normal use where the result is read
  using `tensor_get`, but it allows perf mode of `test-backend-ops`
  to properly measure performance.
2025-06-02 16:54:58 -07:00
rmatif
bfb1e012a0
OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840)
* add concat, pad, repeat, tsembd, tanh, upscale

* small fixes
2025-06-02 16:53:36 -07:00
Georgi Gerganov
3637576288
server : disable speculative decoding for SWA models (#13970)
* server : use swa-full for draft context

ggml-ci

* server : disable speculative decoding for SWA models
2025-06-02 21:34:40 +03:00
Georgi Gerganov
ea394d7ab1
metal : use F32 accumulators in FA kernels (#13975)
ggml-ci
2025-06-02 21:33:40 +03:00
Georgi Gerganov
5582c49c39
gemma : more consistent attention scaling for v2 and v3 (#13951)
* gemma : fix attn scale for 27B

* cont : apply scale before attn

* cont : consistent attention scaling
2025-06-02 20:54:26 +03:00
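The "apply scale before attn" step works because, in the usual softmax(q·kᵀ·scale) formulation, a scalar scale distributes over the dot product, so folding it into q ahead of time changes nothing numerically. A toy sketch of that equivalence (assuming the standard formulation; not code from the patch):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

q = [0.5, -1.0, 2.0]
keys = [[1.0, 0.0, 1.0], [0.0, 2.0, -1.0], [1.5, 1.0, 0.5]]
scale = 1 / math.sqrt(len(q))

# (1) scale applied to the attention logits after q·k
after = softmax([dot(q, k) * scale for k in keys])
# (2) scale folded into q before the dot product
q_scaled = [x * scale for x in q]
before = softmax([dot(q_scaled, k) for k in keys])

assert all(abs(a - b) < 1e-12 for a, b in zip(after, before))
print("equivalent")
```

Scaling once, up front, lets the two Gemma generations share one attention path even when their scale factors differ.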
Olivier Chafik
c9bbc77931
server: update deepseek reasoning format (pass reasoning_content as diffs) (#13933)
* server: update deepseek reasoning format (now in reasoning_content diffs), add legacy option for compat
* update unit/test_tool_call.py::test_thoughts
2025-06-02 10:15:44 -07:00
Concedo
b42b618897 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	examples/parallel/parallel.cpp
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-blas/CMakeLists.txt
#	ggml/src/ggml-sycl/CMakeLists.txt
#	ggml/src/gguf.cpp
#	scripts/sync-ggml.last
#	tests/test-gguf.cpp
2025-06-02 23:26:43 +08:00
Concedo
5e667659ec Merge commit '0fc16b42e8' into concedo_experimental
# Conflicts:
#	src/CMakeLists.txt
#	src/llama-kv-cache.cpp
2025-06-02 23:14:23 +08:00
Xuan-Son Nguyen
bfd322796c
mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961)
* mtmd : fix memory leak in mtmd_helper_eval_chunk_single

* mtmd-cli : fix mem leak

* Update tools/mtmd/mtmd-cli.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-02 16:29:28 +02:00
Concedo
6ce85c54d6 not working correctly 2025-06-02 22:12:10 +08:00
shalinib-ibm
093e3f1feb
cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13966)
Some systems report the CPU implementation as "Power11" instead of "POWER11".
The existing CMake logic uses a case-sensitive regular expression to extract
the CPU generation, which fails when the casing doesn't exactly match "POWER".

This patch provides a fix by first converting the string to uppercase before applying the regex.

Signed-off-by: root <root@rheldb2v.pperf.tadn.ibm.com>
Co-authored-by: root <root@rheldb2v.pperf.tadn.ibm.com>
2025-06-02 15:18:36 +03:00
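The commit message describes normalizing case before running a case-sensitive regex, so "Power11" matches as well as "POWER11". A minimal sketch of the same idea in Python (the CMake patch uses `string(TOUPPER ...)` before its regex; the function name here is illustrative):

```python
import re

def power_generation(cpu_impl: str):
    """Extract the POWER generation number, tolerating mixed case like
    "Power11" by uppercasing before matching (mirrors the CMake fix)."""
    m = re.search(r"POWER(\d+)", cpu_impl.upper())
    return int(m.group(1)) if m else None

print(power_generation("Power11"))  # → 11
print(power_generation("POWER9"))   # → 9
```

Without the uppercase step, the same pattern would silently miss "Power11" and CPU-specific optimizations would not be enabled.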