Concedo
dcf88d6e78
Revert "make tts use gpu by default. use --ttscpu to disable"
This reverts commit 669f80265b.
2025-06-08 17:08:04 +08:00
Concedo
669f80265b
make tts use gpu by default. use --ttscpu to disable
2025-06-08 17:06:19 +08:00
Concedo
7132d6b15c
test rocm rolling (+1 squashed commit)
Squashed commits:
[43c8f7fc6] test rocm rolling (+4 squashed commits)
Squashed commit:
[16a60aa77] test clobber 4
[a6c866450] test clobber 3
[9322f17f6] test clobber 2
[b7a420cbe] testing clobber
2025-06-08 15:33:05 +08:00
henk717
5d8f499f03
Remove 32GB of rocm dependencies with this one special trick ( #1585 )
* One file to remove them all
* That one lib wasn't versioned
2025-06-08 11:16:15 +08:00
Concedo
a80dfa5c10
various minor fixes
2025-06-08 01:11:42 +08:00
Concedo
301450b1eb
attempt to use system glslc first before using bundled glslc
2025-06-07 16:54:25 +08:00
Concedo
38ce7e06cc
updated readme
2025-06-07 10:23:41 +08:00
Concedo
cfcdfd69bd
allow embeddings models to use mmap
2025-06-07 10:14:00 +08:00
Concedo
abc272d89f
breaking change: standardize ci binary names
2025-06-07 00:40:46 +08:00
Concedo
6effb65cfe
change singleinstance order
2025-06-06 21:20:30 +08:00
Concedo
d18938fc70
fixed build
2025-06-06 18:05:44 +08:00
Concedo
d33c88b1f4
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# README.md
# ci/run.sh
# examples/embedding/embedding.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# src/CMakeLists.txt
2025-06-06 17:56:51 +08:00
Concedo
2b5d8e467b
updated lite
2025-06-06 17:49:56 +08:00
Concedo
740f91e3fd
lower aria interval
2025-06-06 17:43:38 +08:00
Concedo
8b141d8647
stick to cu12.1 for linux for now
2025-06-06 17:38:28 +08:00
Sigbjørn Skjæret
d17a809ef0
llama : support multiple classifier outputs and labels ( #13940 )
2025-06-06 09:03:25 +02:00
Concedo
9cf32e5fee
step limits over adapter for sd
2025-06-06 14:12:43 +08:00
Concedo
5f38594dc0
remove debug prints
2025-06-06 14:08:57 +08:00
Concedo
ca99f79ea9
cu11 just always stick to wmma
2025-06-06 14:02:34 +08:00
Concedo
eec5a8ad16
breaking change: due to cuda12 upgrade, release filenames will change. standardize them to windows naming for the future. (+1 squashed commit)
Squashed commits:
[75842919a] cuda12.4 test
2025-06-06 14:02:34 +08:00
Concedo
50a27793d3
upgrade windows runners to windows 2022, cu11 still uses vs2019
this should finally work (+21 squashed commits)
Squashed commit:
[5edac5b59] Revert "quick dbg"
This reverts commit fd62a997cc6684bb89242d5e7b0ae2aed83fd27f.
[fd62a997c] quick dbg
[bcccae7e6] sanity check 2
[568e2eb08] sanity check
[2f30d573a] please work 2
[cf8765221] please work
[c535e60d9] try a small trick
[d4ba79b80] 2022 test
[3f146b000] t2
[4a3b9a9b4] revert and test
[4bdc9a149] reverted test2
[5081cb4a3] reverted test
[ea9a826f3] broken test
[3c11ae389] compare 2019
[8ecec4fec] not for cu12
[0be964f3a] added vs2019 for the other runners
[5d24641cb] debugging 4
[1dee79207] debugging 3
[ab172f133] more debugging 2
[b1a895e84] more debugging
[5d21d8bd0] vs2019 setup
2025-06-06 14:02:34 +08:00
Sigbjørn Skjæret
1caae7fc6c
gguf-py : add add_classifier_output_labels method to writer ( #14031 )
* add add_classifier_output_labels
* use add_classifier_output_labels
2025-06-05 17:42:31 +02:00
Masato Nakasaka
669c13e0f6
vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs ( #14001 )
* allowing B580 and U9-288V
* experimenting code to detect Xe2
* allowing coopmat only for Xe2 GPUs
* fixed comment wording
* fixed comment wording
* removed unnecessary driver check
2025-06-05 16:00:29 +02:00
pockers21
146b88e8b3
ci: fix CUDA build failure on autodl cloud machines ( #14005 )
Replace CMAKE_CUDA_ARCHITECTURES=native with nvidia-smi detection
as 'native' fails on autodl cloud environments.
Co-authored-by: pockers21 <liyang2@uniontech.com>
2025-06-05 16:25:29 +03:00
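The CI fix above swaps CMake's `native` architecture detection for a value queried from `nvidia-smi`. A minimal sketch of the idea, not the actual CI patch (the fallback value and variable names are illustrative):

```shell
# CMAKE_CUDA_ARCHITECTURES expects the integer form of the compute
# capability ("86"), while nvidia-smi reports it dotted ("8.6").
# Stand-in value used here; on a real machine it would come from:
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1
cap="8.6"
arch=$(printf '%s' "$cap" | tr -d '.')
echo "cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=${arch}"
```

The same conversion would then feed the configure step in place of `-DCMAKE_CUDA_ARCHITECTURES=native`.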
Georgi Gerganov
7f37b6cf1e
memory : migrate from llama_kv_cache to more generic llama_memory ( #14006 )
* memory : merge llama_kv_cache into llama_memory + new `llama_memory` API
ggml-ci
* context : fix casts
ggml-ci
2025-06-05 15:29:22 +03:00
Diego Devesa
3a077146a4
llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources ( #14013 )
2025-06-05 11:57:42 +02:00
Olexandr88
d01d112abb
readme : add badge ( #13938 )
2025-06-05 10:50:55 +03:00
Sigbjørn Skjæret
9f47fa5792
vocab : warn about missing mask token ( #14022 )
2025-06-05 09:29:18 +02:00
Georgi Gerganov
9e31bec4fd
context : fix pos_min initialization upon error decode ( #14008 )
ggml-ci
2025-06-05 09:06:29 +03:00
Jeff Bolz
5a8ae3053c
vulkan: automatically deduce size of push constants ( #13936 )
2025-06-05 07:17:58 +02:00
Concedo
bc89b465a8
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/release.yml
# .github/workflows/server.yml
# README.md
# docs/build.md
# docs/install.md
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
2025-06-05 11:03:34 +08:00
Concedo
a341188f84
add install for vs2019
2025-06-05 10:32:57 +08:00
Concedo
f6bbc350f2
various qol fixes
2025-06-05 10:26:02 +08:00
Concedo
a74d8669b3
try hardcoded path (+1 squashed commits)
Squashed commits:
[711b43d9d] let's see if VS2019 can work
2025-06-05 10:26:02 +08:00
Ervin Áron Tasnádi
0d3984424f
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D ( #13813 )
* ggml-vulkan: adds op CONV_TRANSPOSE_1D
* test-backend-ops: adds more sophisticated tests for CONV_TRANSPOSE_1D
* Missing barrier added to shader.
Number of additional tests reduced to 108.
* Fixes typo in variable name.
* Removes extra whitespaces.
* Adds int64->int32 casts to prevent possible warnings.
* Problem size reduced in tests to pass tests with llvmpipe.
* supports_op condition moved from unintended position
2025-06-04 22:02:00 +02:00
Georgi Gerganov
3e63a58ef7
kv-cache : refactor the update/defrag mechanism ( #13988 )
* kv-cache : refactor update mechanism
ggml-ci
* memory : improve status handling
* defrag : reset head + add comments
ggml-ci
* cont : minor fixes
ggml-ci
2025-06-04 18:58:20 +03:00
Concedo
736030bb9f
save and load state upgraded to 3 available states
2025-06-04 22:09:40 +08:00
Diego Devesa
2589ad3704
ci : remove cuda 11.7 releases, switch runner to windows 2022 ( #13997 )
2025-06-04 15:37:40 +02:00
Concedo
06d2bc3404
ollama compat fixes
2025-06-04 19:22:29 +08:00
Diego Devesa
482548716f
releases : use dl backend for linux release, remove arm64 linux release ( #13996 )
2025-06-04 13:15:54 +02:00
Concedo
2fdb0acd59
slightly clean up cmake file (+3 squashed commit)
Squashed commit:
[e050f83db] Revert "test cu11 build on win2022"
This reverts commit 1bf989f2b3789c99aa9883cfe70550de6c26db23.
[1bf989f2b] test cu11 build on win2022
[5dc94eae8] updated lite
2025-06-04 18:42:07 +08:00
Xuan-Son Nguyen
3ac67535c8
llama-graph : use ggml_repeat_4d ( #13998 )
2025-06-04 10:11:26 +02:00
Johannes Gäßler
0b4be4c435
CUDA: fix FTZ in FA for Gemma 3 ( #13991 )
2025-06-04 08:57:05 +02:00
Georgi Gerganov
e0e806f52e
kv-cache : fix unified::seq_rm to work with seq_id < 0 ( #13985 )
ggml-ci
2025-06-04 09:50:32 +03:00
Jeff Bolz
7e00e60ef8
vulkan: fix warnings in perf logger querypool code ( #13937 )
2025-06-03 20:30:22 +02:00
Concedo
53f1511396
use a static buffer for kv reloads instead. also, added into lite ui
2025-06-03 22:32:46 +08:00
Xuan-Son Nguyen
ea1431b0fa
docs : add "Quick start" section for new users ( #13862 )
* docs : add "Quick start" section for non-technical users
* rm flox
* Update README.md
2025-06-03 13:09:36 +02:00
Concedo
4b57108508
Save KV State and Load KV State to memory added. GUI not yet updated
2025-06-03 17:46:29 +08:00
lhez
71e74a3ac9
opencl: add backend_synchronize ( #13939 )
* This is not needed for normal use, where the result is read
using `tensor_get`, but it allows the perf mode of `test-backend-ops`
to properly measure performance.
2025-06-02 16:54:58 -07:00
rmatif
bfb1e012a0
OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat ( #13840 )
* add concat, pad, repeat, tsembd, tanh, upscale
* small fixes
2025-06-02 16:53:36 -07:00