Commit graph

11039 commits

Author / SHA1 / Message / Date
Concedo
76ef726ec8 adaptive p sharpness to 10.0f 2025-12-31 17:28:30 +08:00
Concedo
20ea081594 updated lite (+3 squashed commits)
Squashed commit:

[605fef9ca] updated lite

[dad606fad] updated sdui

[22246d7eb] updated lite
2025-12-30 22:38:56 +08:00
Concedo
329c0e7e32 mini qol to prevent fake tool calls 2025-12-29 17:54:27 +08:00
Concedo
0e26e4d354 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/ISSUE_TEMPLATE/010-bug-compilation.yml
#	.github/ISSUE_TEMPLATE/011-bug-results.yml
#	.github/ISSUE_TEMPLATE/019-bug-misc.yml
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cuda/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-rpc/ggml-rpc.cpp
2025-12-28 23:47:55 +08:00
Concedo
58d8635827 fixed autofit 2025-12-28 23:15:06 +08:00
Concedo
82d562ad7b unstable merge 2025-12-28 23:03:03 +08:00
Concedo
9082403a43 disable vk events until directio PR or Jeff's fix is added. (+1 squashed commits)
Squashed commits:

[4796db21a] disable vk events until directio PR or Jeff's fix is added.
2025-12-28 21:54:25 +08:00
Concedo
a94d5ffbec Revert "Triage: revert https://github.com/ggml-org/llama.cpp/pull/18047 and https://github.com/ggml-org/llama.cpp/pull/18302"
This reverts commit dfa1b72d2f.
2025-12-28 21:48:55 +08:00
Concedo
4c1daf886a updated lite 2025-12-28 21:43:18 +08:00
Concedo
07fb18a04b handle case differences 2025-12-28 21:41:56 +08:00
Aman Gupta
07a0c4ba92 Revert "ggml-cuda: use CMAKE_CUDA_ARCHITECTURES if set when GGML_NATIVE=ON (#18413)" (#18426) 2025-12-28 20:53:36 +08:00
o7si
60f17f56da rpc: fix segfault on invalid endpoint format (#18387)
* rpc: fix segfault on invalid endpoint format

* rpc: add error log for failed endpoint connection
2025-12-28 12:34:41 +02:00
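The fix above validates the endpoint string before any connection attempt. As a rough illustration only (not the actual ggml-rpc code), a defensive host:port parser with the error log from the second bullet might look like:

    // hypothetical sketch; the real fix lives in ggml/src/ggml-rpc/ggml-rpc.cpp
    #include <cstdio>
    #include <cstdlib>
    #include <string>

    // split "host:port", rejecting malformed input instead of indexing past
    // the end of the string (the kind of bug that produces a segfault)
    static bool parse_endpoint(const std::string & endpoint, std::string & host, int & port) {
        const size_t pos = endpoint.rfind(':');
        if (pos == std::string::npos || pos == 0 || pos + 1 >= endpoint.size()) {
            fprintf(stderr, "invalid endpoint format: '%s' (expected host:port)\n", endpoint.c_str());
            return false;
        }
        host = endpoint.substr(0, pos);
        char * end = nullptr;
        const long p = strtol(endpoint.c_str() + pos + 1, &end, 10);
        if (*end != '\0' || p < 1 || p > 65535) {
            fprintf(stderr, "invalid port in endpoint: '%s'\n", endpoint.c_str());
            return false;
        }
        port = (int) p;
        return true;
    }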
Concedo
46891b3c0a updated lite 2025-12-28 18:07:13 +08:00
Johannes Gäßler
f8d561eb87 llama-fit-params: fix step size for last device (#18415) 2025-12-28 10:52:09 +01:00
Johannes Gäßler
e59efe6a78 github: update issue templates [no ci] (#18410)
* github: update issue templates [no ci]

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-28 10:50:56 +01:00
Xuan-Son Nguyen
cffa5c46ea mtmd: clarify that we no longer accept AI-generated PRs (#18406) 2025-12-28 09:57:04 +01:00
Boian Berberov
94de74e7b1 cmake: Added more x86_64 CPU backends when building with GGML_CPU_ALL_VARIANTS=On (#18186)
* minor: Consolidated `#include <immintrin.h>` under `ggml-cpu-impl.h`

* cmake: Added more x86-64 CPU backends when building with `GGML_CPU_ALL_VARIANTS=On`

- `ivybridge`
- `piledriver`
- `cannonlake`
- `cascadelake`
- `cooperlake`
- `zen4`

Resolves: #17966
2025-12-28 09:33:29 +02:00
Concedo
21d801f6d5 init total weight for adaptive p 2025-12-28 15:33:06 +08:00
Concedo
ec95655f3c fixed default handling for special keys 2025-12-28 13:56:05 +08:00
Concedo
27261bfc26 adaptive decay as an overridable param (+1 squashed commits)
Squashed commits:

[d94df7843] adaptive decay as an overridable param
2025-12-28 13:34:20 +08:00
QDelta
4fd59e8427 ggml-cuda: use CMAKE_CUDA_ARCHITECTURES if set when GGML_NATIVE=ON (#18413) 2025-12-28 09:33:14 +08:00
lhez
08566977a7 opencl: allow resizing transpose buffers (#18384)
* opencl: allow resizing transpose buffers instead of using fixed sizes

* opencl: remove commented code
2025-12-27 15:51:14 -08:00
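The change swaps fixed-size transpose scratch buffers for ones that grow on demand. A minimal sketch of the grow-on-demand pattern in plain C++ (the real code manages OpenCL buffer objects inside ggml-opencl.cpp):

    #include <cstddef>
    #include <cstdlib>

    struct scratch_buffer {
        void * data = nullptr;
        size_t cap  = 0;
    };

    // reallocate only when the request exceeds current capacity, so repeated
    // transposes of varying shapes reuse one allocation
    static void * scratch_reserve(scratch_buffer & buf, size_t size) {
        if (size > buf.cap) {
            free(buf.data);
            buf.data = malloc(size);
            buf.cap  = buf.data ? size : 0;
        }
        return buf.data;
    }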
Johannes Gäßler
a4bf35889e llama-fit-params: fix overflow check (#18354) 2025-12-27 20:20:45 +01:00
Johannes Gäßler
026d2ad472 llama: fix magic number of 999 for GPU layers (#18266)
* llama: fix magic number of 999 for GPU layers

* use strings for -ngl, -ngld

* encapsulate n_gpu_layers, split_mode
2025-12-27 20:18:35 +01:00
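Before this change, -ngl relied on 999 as an informal "offload everything" value. A sketch of the string-based parsing idea; the keyword "all" and the INT_MAX sentinel are assumptions for illustration, not necessarily the exact spelling llama.cpp accepts:

    #include <climits>
    #include <cstdlib>
    #include <string>

    static int parse_n_gpu_layers(const std::string & arg) {
        if (arg == "all") {
            return INT_MAX; // explicit "offload everything" marker, no magic 999
        }
        return (int) strtol(arg.c_str(), nullptr, 10);
    }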
Concedo
1051313cb2 added deprecated item sdgendefaults (+1 squashed commits)
Squashed commits:

[efc14a5d9] fixed sd error
2025-12-27 22:47:43 +08:00
Aman Gupta
06705fdcb3 ggml-cuda: Use same regex for GGML_NATIVE=OFF (#18407) 2025-12-27 19:56:27 +08:00
Concedo
f5282e114d allow ANY api field to have specified defaults, and to be overwritten by value specified at load time 2025-12-27 18:57:04 +08:00
Concedo
6548645aaa rename power law sampler to adaptive p 2025-12-27 17:50:58 +08:00
Johannes Gäßler
a52dc60ba3 llama_fit_params: return enum for fail vs. error (#18374) 2025-12-27 09:59:19 +01:00
Johannes Gäßler
9045c9afe5 llama-fit-params: fix Gemma 3 calculation (#18372) 2025-12-27 09:56:04 +01:00
Concedo
445aad5e00 remove sdcpp qwen image lora hack 2025-12-27 16:31:29 +08:00
Wagner Bruna
84765f5967 sd: sync to master-447-ccb6b0a (#1898)
* sd: sync to master-438-298b110

* sd: sync to master-440-3e81246

* sd: sync to master-444-a0adcfb

* sd: sync to master-447-ccb6b0a
2025-12-27 16:30:52 +08:00
Concedo
9bb362cce9 revised power law sampling 2025-12-27 10:59:46 +08:00
Concedo
91d8863f18 power law sampler added 2025-12-27 09:46:06 +08:00
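The log never spells out what the "power law" (later "adaptive p") sampler computes; this and the surrounding commits only mention a sharpness, a decay, and a total weight. Purely to illustrate the general idea of power-law reweighting, under the assumption that token probabilities are raised to a power and renormalized:

    #include <cmath>
    #include <vector>

    // generic power-law reweighting; NOT Concedo's actual implementation
    static void power_law_reweight(std::vector<float> & probs, float sharpness) {
        if (probs.empty()) return;
        float total = 0.0f; // running total weight, initialized before accumulation
        for (float & p : probs) {
            p = powf(p, sharpness);
            total += p;
        }
        for (float & p : probs) {
            p /= total; // renormalize into a distribution
        }
    }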
Jeff Bolz
c9ced4910b vulkan: preprocess mul_mat_id experts and discard workgroups more quickly (#18352)
Run a preprocess to count how many times each expert is used, and use this to
quickly discard workgroups that aren't needed.
2025-12-26 16:12:58 -06:00
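A CPU-side analogy of that preprocess; the real implementation is a Vulkan compute pass, so the types and names here are illustrative only:

    #include <cstdint>
    #include <vector>

    // count how often each expert appears in the routing ids
    static std::vector<uint32_t> count_expert_uses(const std::vector<int32_t> & ids, int n_expert) {
        std::vector<uint32_t> counts(n_expert, 0);
        for (int32_t id : ids) {
            if (id >= 0 && id < n_expert) {
                counts[id]++;
            }
        }
        return counts;
    }

    // a workgroup assigned to expert e can then exit immediately when counts[e] == 0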
Jeff Bolz
7ac8902133 vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader (#18349)
* vulkan: Use BK=32 for coopmat2 mul_mat_id

* vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader

Disable robustness, remove the OOB check in decodeFuncB, and initialize the
row_ids to zero to avoid OOB access.

Don't slice/offset the B matrix to ic * BN, only to adjust the coord back down
to the range [0, BN) in decodeFuncB. Instead just slice with a row offset of
zero and remove the '& (BN - 1)'. This allows the compiler to common some of
the shared memory loads.
2025-12-26 18:15:50 +01:00
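A scalar analogy for the indexing change (the shader specifics, BN, and the buffers are stand-ins): the old path sliced B at an offset and masked every coordinate back into [0, BN), while the new path slices at row offset zero and indexes directly, letting the compiler merge identical shared-memory loads.

    #include <cstdint>

    static const uint32_t BN = 32;

    // old shape: offset slice plus '& (BN - 1)' mask on every access
    static float load_b_masked(const float * B, uint32_t ic, uint32_t r) {
        return B[ic * BN + (r & (BN - 1))];
    }

    // new shape: pre-sliced block, direct index
    static float load_b_direct(const float * B_block, uint32_t r) {
        return B_block[r];
    }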
Jeff Bolz
9bf20d8ac3 vulkan: Use BK=32 for coopmat2 mul_mat_id (#18332) 2025-12-26 18:15:02 +01:00
Eve
cb999704fb vulkan: small dequantization improvements (#18380)
* iq4_xs

* quants
2025-12-26 18:12:11 +01:00
Jeff Bolz
b96b82fc85 vulkan: Support UPSCALE w/antialias (#18327) 2025-12-26 17:00:57 +01:00
Jeff Bolz
10dc500bdb vulkan: handle rope with large number of rows (#18306) 2025-12-26 16:53:46 +01:00
o7si
4893cc07bb server : fix crash when seq_rm fails for hybrid/recurrent models (#18391)
* server : fix crash when seq_rm fails for hybrid/recurrent models

* server : add allow_processing param to clear_slot
2025-12-26 16:35:29 +01:00
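The crash came from assuming partial-range removal always succeeds; recurrent and hybrid caches cannot remove part of a sequence. A sketch of the recovery path with stand-in names (only seq_rm corresponds to a real llama.cpp concept):

    #include <cstdio>

    struct toy_cache {
        bool supports_partial_rm = false; // hybrid/recurrent models report false
        bool seq_rm(int /*seq*/, int /*p0*/, int /*p1*/) { return supports_partial_rm; }
        void clear_slot(int seq, bool /*allow_processing*/) {
            printf("cleared slot %d entirely\n", seq);
        }
    };

    // on failure, clear the whole slot instead of continuing with stale state
    static void trim_slot(toy_cache & cache, int seq, int keep_until) {
        if (!cache.seq_rm(seq, keep_until, -1)) {
            cache.clear_slot(seq, /*allow_processing=*/true);
        }
    }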
Francisco Herrera
af3be131c0 docs: added note for pre-SYCL Intel hardware (#18016)
Specify that it's for pre-SYCL hardware
2025-12-26 10:34:30 +08:00
0Marble
b07cda687c CANN: implement the SSM_CONV operator (#17737)
* CANN: implement SSM_CONV operator

Co-authored-by: Aleksei Lobanov <zeromarblectm@gmail.com>
Co-authored-by: Sujin Kang <waterjin326@gmail.com>

* CANN: remove custom error limit for SSM_CONV

* CANN: merge SSM_CONV tensor shape/strides into one line

---------

Co-authored-by: Sujin Kang <waterjin326@gmail.com>
2025-12-26 09:12:04 +08:00
Aman Gupta
85c40c9b02 ggml-cuda: fix regex for arch list (#18371)
* ggml-cuda: fix regex for arch list

* make regex exact
2025-12-26 01:35:14 +08:00
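The pitfall behind "make regex exact" generalizes: a loose pattern such as "86" also matches "860" inside an arch list. A C++ sketch of token-anchored matching over a semicolon-separated list (the actual fix is CMake string handling, and the -real/-virtual suffixes are an assumption):

    #include <regex>
    #include <string>

    // match arch as a whole token, not a substring; assumes arch is plain digits
    static bool has_arch(const std::string & arch_list, const std::string & arch) {
        const std::regex re("(^|;)" + arch + "(-real|-virtual)?(;|$)");
        return std::regex_search(arch_list, re);
    }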
Concedo
dfa1b72d2f Triage: revert https://github.com/ggml-org/llama.cpp/pull/18047 and https://github.com/ggml-org/llama.cpp/pull/18302
Revert "vulkan: Implement set_tensor_async and the event interfaces (#18047)"

This reverts commit e1f15b454f. (+1 squashed commits)

Squashed commits:

[3cfbc7b1a] Revert "vulkan: fix command buffer corruption in ggml_backend_vk_event_wait (#18302)"

This reverts commit 2a9ea2020c.
2025-12-26 01:20:31 +08:00
Concedo
399fc9c57e rename tokens tab to context, move fa to hardware 2025-12-26 00:06:07 +08:00
Aman Gupta
83b3b1c271 cuda: optimize cumsum cub path (#18362)
* cuda: optimize cumsum cub path

* remove heavy perf test
2025-12-25 23:55:38 +08:00
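For reference, cumsum is an inclusive prefix sum; the CUDA path delegates to CUB's device-wide scan. A host-side mirror of the operation being optimized, using the standard library:

    #include <numeric>
    #include <vector>

    // out[i] = x[0] + x[1] + ... + x[i]
    static std::vector<float> cumsum(const std::vector<float> & x) {
        std::vector<float> out(x.size());
        std::inclusive_scan(x.begin(), x.end(), out.begin());
        return out;
    }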
Concedo
062f8b28eb fixed sdui gen queue 2025-12-25 23:21:33 +08:00
Aman Gupta
b0fb0f0aee ggml-cuda: fix blackwell native builds (#18361)
* ggml-cuda: fix blackwell native builds

Replace 12x in native architectures by 12xa

* replace for GGML_NATIVE=OFF too

* only replace for native

* remove 120f-virtual for default compilation

---------

Co-authored-by: Aman Gupta <aman>
2025-12-25 22:12:11 +08:00
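The substitution described in the body rewrites a native Blackwell arch such as "120" to its family-specific form "120a" within a semicolon-separated list. A C++ mirror of that string rewrite (the real logic is CMake; entries already carrying a suffix letter are left alone):

    #include <regex>
    #include <string>

    static std::string use_family_specific_blackwell(const std::string & archs) {
        // "120;90" -> "120a;90"; "120a;90" is unchanged
        return std::regex_replace(archs, std::regex("(^|;)(12[0-9])(?=;|$)"), "$1$2a");
    }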
Concedo
cf4201e213 wip power law sampling 2025-12-25 22:01:16 +08:00