Reithan
f1c9db4174
fix-loss-of-destroyed-tokens-in-grammar-pre-pass (#1600)
2025-06-13 18:46:38 +08:00
Concedo
5bac0fb3d5
remove debug prints for now, they were kind of cluttered
2025-06-13 16:00:23 +08:00
Reithan
5af9138ebe
Improve GBNF performance by attempting culled grammar search first (#1597)
...
* cull tokens with top_3k first before running grammar, fallback to unculled if none found
* fix errors
* fix improvement and test against concedo's GBNF
* revert non-culling changes
2025-06-13 15:57:27 +08:00
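The culled-first strategy from #1597 can be pictured as follows. This is a hypothetical simplification, not KoboldCpp's actual code: `grammar_allows` and the plain-list token representation are assumptions for illustration.

```python
def cull_then_grammar(logits, grammar_allows, k=3000):
    """Check the grammar against only the top-k tokens first; if the
    grammar rejects all of them, fall back to the full vocabulary."""
    # Indices of the k highest-logit tokens (the "top_3k" cull).
    top_k = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:k]
    allowed = [t for t in top_k if grammar_allows(t)]
    if allowed:
        return allowed
    # Fallback: the grammar only accepts tokens outside the culled set,
    # so run the (slower) check over every token.
    return [t for t in range(len(logits)) if grammar_allows(t)]
```

The win is that grammar checks run over a few thousand candidates instead of the whole vocabulary in the common case, with the fallback preserving correctness.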
Concedo
1cbe716e45
allow setting maingpu
2025-06-12 17:53:43 +08:00
Concedo
7a688e07cd
remove gfx12 until amd wakes up
2025-06-12 16:52:55 +08:00
Concedo
1970d8c9e8
uvos said it might work
2025-06-12 16:44:46 +08:00
Concedo
5cdb2d3fc6
cleanup
2025-06-11 01:35:40 +08:00
henk717
f151648f03
Pyinstaller launcher and dependency updates
...
This PR adds a new launcher executable to the unpack feature, eliminating the need to bundle Python and its dependencies in the unpacked version. It also makes a few dependency changes for future-proofing.
2025-06-10 23:08:02 +08:00
Concedo
8386546e08
Switched to VS2019 for the reverted cu12.1 build, hopefully solves dll issues
...
try change order (+3 squashed commits)
Squashed commit:
[457f02507] try newer jimver
[64af28862] windows pyinstaller shim. the final loader will be moved into the packed directory later.
[0272ecf2d] try alternative way of getting cuda toolkit 12.4 since jimver wont work, also fix rocm
try again (+3 squashed commits)
Squashed commit:
[133e81633] try without pwsh
[4d99cefba] try without pwsh
[bdfa91e7d] try alternative way of getting cuda toolkit 12.4, also fix rocm
2025-06-10 23:08:02 +08:00
Concedo
28b35ca879
allow wmma flag for rocm
2025-06-10 01:23:48 +08:00
Concedo
7d8aa31f1f
fixed embeddings, added new parameter to limit max embeddings context
2025-06-10 01:11:55 +08:00
Concedo
8780b33c64
consolidate imports
2025-06-09 17:48:54 +08:00
Concedo
deece4be69
missed a build target
2025-06-09 17:05:56 +08:00
Concedo
68ec00909b
updated lite (+1 squashed commit)
...
Squashed commits:
[375c5768b] updated lite
2025-06-09 16:33:42 +08:00
Concedo
82d7c53b85
embeddings handle base64
2025-06-09 00:26:40 +08:00
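"embeddings handle base64" likely refers to the OpenAI-compatible `encoding_format: "base64"` option, where the embedding vector is returned as base64-packed little-endian float32 bytes. A client-side decode sketch, assuming that packing (the log itself does not spell it out):

```python
import base64
import struct

def decode_base64_embedding(b64: str) -> list[float]:
    """Decode a base64 string into a list of floats, assuming the
    OpenAI-style packing: consecutive little-endian float32 values."""
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```

The base64 form is more compact over the wire than a JSON array of decimal floats, which is why clients often request it.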
Concedo
7de88802f9
revert padding change for sd chroma
2025-06-08 23:48:46 +08:00
Concedo
1cf7648305
fixed adapter
2025-06-08 23:24:11 +08:00
Concedo
771bd7197b
updated lite (+1 squashed commit)
...
Squashed commits:
[907f10f2f] updated lite
2025-06-08 23:22:26 +08:00
Concedo
6c5c8be48d
try to make rocm work for the github ci, requires disabling rocwmma
2025-06-08 21:52:29 +08:00
Concedo
7f57846c2f
update bundled vcrts
2025-06-08 19:39:42 +08:00
Concedo
2d4c1aa5a0
chroma support is now usable
2025-06-08 18:53:59 +08:00
Concedo
30cf433ab4
merge base support for chroma, however it's not working correctly
2025-06-08 18:06:23 +08:00
Concedo
dcf88d6e78
Revert "make tts use gpu by default. use --ttscpu to disable"
...
This reverts commit 669f80265b.
2025-06-08 17:08:04 +08:00
Concedo
669f80265b
make tts use gpu by default. use --ttscpu to disable
2025-06-08 17:06:19 +08:00
Concedo
7132d6b15c
test rocm rolling (+1 squashed commit)
...
Squashed commits:
[43c8f7fc6] test rocm rolling (+4 squashed commits)
Squashed commit:
[16a60aa77] test clobber 4
[a6c866450] test clobber 3
[9322f17f6] test clobber 2
[b7a420cbe] testing clobber
2025-06-08 15:33:05 +08:00
henk717
5d8f499f03
Remove 32GB of rocm dependencies with this one special trick (#1585)
...
* One file to remove them all
* That one lib wasn't versioned
2025-06-08 11:16:15 +08:00
Concedo
a80dfa5c10
various minor fixes
2025-06-08 01:11:42 +08:00
Concedo
301450b1eb
attempt to use system glslc first before using bundled glslc
2025-06-07 16:54:25 +08:00
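The system-first, bundled-fallback lookup for `glslc` can be sketched with a generic helper. This is an illustrative sketch, not KoboldCpp's actual implementation; `find_tool` and its parameters are hypothetical names.

```python
import shutil

def find_tool(name: str, bundled_path: str) -> str:
    """Prefer a system-installed copy of a tool (found via PATH);
    otherwise fall back to the bundled one."""
    return shutil.which(name) or bundled_path
```

Usage would look like `find_tool("glslc", "/path/to/bundled/glslc")`: a system `glslc` wins if present, which lets users pick up a newer shader compiler than the one shipped with the package.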
Concedo
38ce7e06cc
updated readme
2025-06-07 10:23:41 +08:00
Concedo
cfcdfd69bd
allow embeddings models to use mmap
2025-06-07 10:14:00 +08:00
Concedo
abc272d89f
breaking change: standardize ci binary names
2025-06-07 00:40:46 +08:00
Concedo
6effb65cfe
change singleinstance order
2025-06-06 21:20:30 +08:00
Concedo
d18938fc70
fixed build
2025-06-06 18:05:44 +08:00
Concedo
d33c88b1f4
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# ci/run.sh
# examples/embedding/embedding.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# src/CMakeLists.txt
2025-06-06 17:56:51 +08:00
Concedo
2b5d8e467b
updated lite
2025-06-06 17:49:56 +08:00
Concedo
740f91e3fd
lower aria interval
2025-06-06 17:43:38 +08:00
Concedo
8b141d8647
stick to cu12.1 for linux for now
2025-06-06 17:38:28 +08:00
Sigbjørn Skjæret
d17a809ef0
llama : support multiple classifier outputs and labels (#13940)
2025-06-06 09:03:25 +02:00
Concedo
9cf32e5fee
step limits over adapter for sd
2025-06-06 14:12:43 +08:00
Concedo
5f38594dc0
remove debug prints
2025-06-06 14:08:57 +08:00
Concedo
ca99f79ea9
cu11 just always stick to wmma
2025-06-06 14:02:34 +08:00
Concedo
eec5a8ad16
breaking change: due to cuda12 upgrade, release filenames will change. standardize them to windows naming for the future. (+1 squashed commits)
...
Squashed commits:
[75842919a] cuda12.4 test
2025-06-06 14:02:34 +08:00
Concedo
50a27793d3
upgrade windows runners to windows 2022, cu11 still uses vs2019
...
this should finally work (+21 squashed commits)
Squashed commit:
[5edac5b59] Revert "quick dbg"
This reverts commit fd62a997cc6684bb89242d5e7b0ae2aed83fd27f.
[fd62a997c] quick dbg
[bcccae7e6] sanity check 2
[568e2eb08] sanity check
[2f30d573a] please work 2
[cf8765221] please work
[c535e60d9] try a small trick
[d4ba79b80] 2022 test
[3f146b000] t2
[4a3b9a9b4] revert and test
[4bdc9a149] reverted test2
[5081cb4a3] reverted test
[ea9a826f3] broken test
[3c11ae389] compare 2019
[8ecec4fec] not for cu12
[0be964f3a] added vs2019 for the other runners
[5d24641cb] debugging 4
[1dee79207] debugging 3
[ab172f133] more debugging 2
[b1a895e84] more debugging
[5d21d8bd0] vs2019 setup
2025-06-06 14:02:34 +08:00
Sigbjørn Skjæret
1caae7fc6c
gguf-py : add add_classifier_output_labels method to writer (#14031)
...
* add add_classifier_output_labels
* use add_classifier_output_labels
2025-06-05 17:42:31 +02:00
Masato Nakasaka
669c13e0f6
vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001)
...
* allowing B580 and U9-288V
* experimenting code to detect Xe2
* allowing coopmat only for Xe2 GPUs
* fixed comment wording
* fixed comment wording
* removed unnecessary driver check
2025-06-05 16:00:29 +02:00
pockers21
146b88e8b3
ci: fix CUDA build failure on autodl cloud machines (#14005)
...
Replace CMAKE_CUDA_ARCHITECTURES=native with nvidia-smi detection
as 'native' fails on autodl cloud environments.
Co-authored-by: pockers21 <liyang2@uniontech.com>
2025-06-05 16:25:29 +03:00
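The substitution described in #14005 can be pictured as deriving an explicit architecture list from `nvidia-smi` instead of relying on `native`. This is a hypothetical CMake sketch, not the PR's actual change (which lives in the CI scripts); note that `--query-gpu=compute_cap` requires a reasonably recent NVIDIA driver.

```cmake
# Sketch: derive CUDA architectures from nvidia-smi instead of
# CMAKE_CUDA_ARCHITECTURES=native, which fails on some cloud machines.
execute_process(
  COMMAND nvidia-smi --query-gpu=compute_cap --format=csv,noheader
  OUTPUT_VARIABLE GPU_COMPUTE_CAP
  OUTPUT_STRIP_TRAILING_WHITESPACE
)
string(REPLACE "." "" GPU_COMPUTE_CAP "${GPU_COMPUTE_CAP}")  # e.g. "8.6" -> "86"
set(CMAKE_CUDA_ARCHITECTURES "${GPU_COMPUTE_CAP}")
```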
Georgi Gerganov
7f37b6cf1e
memory : migrate from llama_kv_cache to more generic llama_memory (#14006)
...
* memory : merge llama_kv_cache into llama_memory + new `llama_memory` API
ggml-ci
* context : fix casts
ggml-ci
2025-06-05 15:29:22 +03:00
Diego Devesa
3a077146a4
llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013)
2025-06-05 11:57:42 +02:00
Olexandr88
d01d112abb
readme : add badge (#13938)
2025-06-05 10:50:55 +03:00
Sigbjørn Skjæret
9f47fa5792
vocab : warn about missing mask token (#14022)
2025-06-05 09:29:18 +02:00