Commit graph

8342 commits

Author SHA1 Message Date
Concedo
6c5c8be48d try to make rocm work for the github ci, requires disabling rocwmma 2025-06-08 21:52:29 +08:00
Concedo
7f57846c2f update bundled vcrts 2025-06-08 19:39:42 +08:00
Concedo
2d4c1aa5a0 chroma support is now usable 2025-06-08 18:53:59 +08:00
Concedo
30cf433ab4 merge base support for chroma, however its not working correctly 2025-06-08 18:06:23 +08:00
Concedo
dcf88d6e78 Revert "make tts use gpu by default. use --ttscpu to disable"
This reverts commit 669f80265b.
2025-06-08 17:08:04 +08:00
Concedo
669f80265b make tts use gpu by default. use --ttscpu to disable 2025-06-08 17:06:19 +08:00
Concedo
7132d6b15c test rocm rolling (+1 squashed commits)
Squashed commits:

[43c8f7fc6] test rocm rolling (+4 squashed commit)

Squashed commit:

[16a60aa77] test clobber 4

[a6c866450] test clobber 3

[9322f17f6] test clobber 2

[b7a420cbe] testing clobber
2025-06-08 15:33:05 +08:00
henk717
5d8f499f03
Remove 32GB of rocm dependencies with this one special trick (#1585)
* One file to remove them all

* That one lib wasn't versioned
2025-06-08 11:16:15 +08:00
Concedo
a80dfa5c10 various minor fixes 2025-06-08 01:11:42 +08:00
Concedo
301450b1eb attempt to use system glslc first before using bundled glslc 2025-06-07 16:54:25 +08:00
Concedo
38ce7e06cc updated readme 2025-06-07 10:23:41 +08:00
Concedo
cfcdfd69bd allow embeddings models to use mmap 2025-06-07 10:14:00 +08:00
Concedo
abc272d89f breaking change: standardize ci binary names 2025-06-07 00:40:46 +08:00
Concedo
6effb65cfe change singleinstance order 2025-06-06 21:20:30 +08:00
Concedo
d18938fc70 fixed build 2025-06-06 18:05:44 +08:00
Concedo
d33c88b1f4 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	ci/run.sh
#	examples/embedding/embedding.cpp
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	src/CMakeLists.txt
2025-06-06 17:56:51 +08:00
Concedo
2b5d8e467b updated lite 2025-06-06 17:49:56 +08:00
Concedo
740f91e3fd lower aria interval 2025-06-06 17:43:38 +08:00
Concedo
8b141d8647 stick to cu12.1 for linux for now 2025-06-06 17:38:28 +08:00
Sigbjørn Skjæret
d17a809ef0
llama : support multiple classifier outputs and labels (#13940) 2025-06-06 09:03:25 +02:00
Concedo
9cf32e5fee step limits over adapter for sd 2025-06-06 14:12:43 +08:00
Concedo
5f38594dc0 remove debug prints 2025-06-06 14:08:57 +08:00
Concedo
ca99f79ea9 cu11 just always stick to wmma 2025-06-06 14:02:34 +08:00
Concedo
eec5a8ad16 breaking change: due to cuda12 upgrade, release filenames will change. standardize them to windows naming for the future. (+1 squashed commits)
Squashed commits:

[75842919a] cuda12.4 test
2025-06-06 14:02:34 +08:00
Concedo
50a27793d3 upgrade windows runners to windows 2022, cu11 still uses vs2019
this should finally work (+21 squashed commit)

Squashed commit:

[5edac5b59] Revert "quick dbg"

This reverts commit fd62a997cc6684bb89242d5e7b0ae2aed83fd27f.

[fd62a997c] quick dbg

[bcccae7e6] sanity check 2

[568e2eb08] sanity check

[2f30d573a] please work 2

[cf8765221] please work

[c535e60d9] try a small trick

[d4ba79b80] 2022 test

[3f146b000] t2

[4a3b9a9b4] revert and test

[4bdc9a149] reverted test2

[5081cb4a3] reverted test

[ea9a826f3] broken test

[3c11ae389] compare 2019

[8ecec4fec] not for cu12

[0be964f3a] added vs2019 for the other runners

[5d24641cb] debugging 4

[1dee79207] debugging 3

[ab172f133] more debugging 2

[b1a895e84] more debugging

[5d21d8bd0] vs2019 setup
2025-06-06 14:02:34 +08:00
Sigbjørn Skjæret
1caae7fc6c
gguf-py : add add_classifier_output_labels method to writer (#14031)
* add add_classifier_output_labels

* use add_classifier_output_labels
2025-06-05 17:42:31 +02:00
Masato Nakasaka
669c13e0f6
vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001)
* allowing B580 and U9-288V

* experimenting code to detect Xe2

* allowing coopmat only for Xe2 GPUs

* fixed comment wording

* fixed comment wording

* removed unnecessary driver check
2025-06-05 16:00:29 +02:00
pockers21
146b88e8b3
ci: fix CUDA build failure on autodl cloud machines (#14005)
Replace CMAKE_CUDA_ARCHITECTURES=native with nvidia-smi detection
as 'native' fails on autodl cloud environments.

Co-authored-by: pockers21 <liyang2@uniontech.com>
2025-06-05 16:25:29 +03:00
Georgi Gerganov
7f37b6cf1e
memory : migrate from llama_kv_cache to more generic llama_memory (#14006)
* memory : merge llama_kv_cache into llama_memory + new `llama_memory` API

ggml-ci

* context : fix casts

ggml-ci
2025-06-05 15:29:22 +03:00
Diego Devesa
3a077146a4
llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013) 2025-06-05 11:57:42 +02:00
Olexandr88
d01d112abb
readme : add badge (#13938) 2025-06-05 10:50:55 +03:00
Sigbjørn Skjæret
9f47fa5792
vocab : warn about missing mask token (#14022) 2025-06-05 09:29:18 +02:00
Georgi Gerganov
9e31bec4fd
context : fix pos_min initialization upon error decode (#14008)
ggml-ci
2025-06-05 09:06:29 +03:00
Jeff Bolz
5a8ae3053c
vulkan: automatically deduce size of push constants (#13936) 2025-06-05 07:17:58 +02:00
Concedo
bc89b465a8 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/release.yml
#	.github/workflows/server.yml
#	README.md
#	docs/build.md
#	docs/install.md
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/mmvq.cpp
#	ggml/src/ggml-sycl/vecdotq.hpp
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
2025-06-05 11:03:34 +08:00
Concedo
a341188f84 add install for vs2019 2025-06-05 10:32:57 +08:00
Concedo
f6bbc350f2 various qol fixes 2025-06-05 10:26:02 +08:00
Concedo
a74d8669b3 try hardcoded path (+1 squashed commits)
Squashed commits:

[711b43d9d] let's see if VS2019 can work
2025-06-05 10:26:02 +08:00
Ervin Áron Tasnádi
0d3984424f
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813)
* * ggml-vulkan: adds op CONV_TRANSPOSE_1D

* test-backend-ops: adds more spohisticated tests for CONV_TRANSPOSE_1D

* Missing barrier added to shader.
Number of additional tests reduced to 108.

* * Fixes typo in variable name.

* Removes extra whitespaces.

* Adds int64->int32 casts to prevent possible warnings.

* Problem size reduced in tests to pass tests with llvmpipe.

* supports_op condition moved from unintended position
2025-06-04 22:02:00 +02:00
Georgi Gerganov
3e63a58ef7
kv-cache : refactor the update/defrag mechanism (#13988)
* kv-cache : refactor update mechanism

ggml-ci

* memory : improve status handling

* defrag : reset head + add comments

ggml-ci

* cont : minor fixes

ggml-ci
2025-06-04 18:58:20 +03:00
Concedo
736030bb9f save and load state upgraded to 3 available states 2025-06-04 22:09:40 +08:00
Diego Devesa
2589ad3704
ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997) 2025-06-04 15:37:40 +02:00
Concedo
06d2bc3404 ollama compat fixes 2025-06-04 19:22:29 +08:00
Diego Devesa
482548716f
releases : use dl backend for linux release, remove arm64 linux release (#13996) 2025-06-04 13:15:54 +02:00
Concedo
2fdb0acd59 slightly clean up cmake file (+3 squashed commit)
Squashed commit:

[e050f83db] Revert "test cu11 build on win2022"

This reverts commit 1bf989f2b3789c99aa9883cfe70550de6c26db23.

[1bf989f2b] test cu11 build on win2022

[5dc94eae8] updated lite
2025-06-04 18:42:07 +08:00
Xuan-Son Nguyen
3ac67535c8
llama-graph : use ggml_repeat_4d (#13998) 2025-06-04 10:11:26 +02:00
Johannes Gäßler
0b4be4c435
CUDA: fix FTZ in FA for Gemma 3 (#13991) 2025-06-04 08:57:05 +02:00
Georgi Gerganov
e0e806f52e
kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985)
ggml-ci
2025-06-04 09:50:32 +03:00
Jeff Bolz
7e00e60ef8
vulkan: fix warnings in perf logger querypool code (#13937) 2025-06-03 20:30:22 +02:00
Concedo
53f1511396 use a static buffer for kv reloads instead. also, added into lite ui 2025-06-03 22:32:46 +08:00