Concedo
82d562ad7b
unstable merge
2025-12-28 23:03:03 +08:00
Concedo
9082403a43
disable vk events until directio pr or jeff's fix is added. (+1 squashed commits)
...
Squashed commits:
[4796db21a] disable vk events until directio pr or jeff's fix is added.
2025-12-28 21:54:25 +08:00
Concedo
a94d5ffbec
Revert "Triage: revert https://github.com/ggml-org/llama.cpp/pull/18047 and https://github.com/ggml-org/llama.cpp/pull/18302"
...
This reverts commit dfa1b72d2f .
2025-12-28 21:48:55 +08:00
Concedo
4c1daf886a
updated lite
2025-12-28 21:43:18 +08:00
Concedo
07fb18a04b
handle case differences
2025-12-28 21:41:56 +08:00
Concedo
46891b3c0a
updated lite
2025-12-28 18:07:13 +08:00
Concedo
21d801f6d5
init total weight for adaptive p
2025-12-28 15:33:06 +08:00
Concedo
ec95655f3c
fixed default handling for special keys
2025-12-28 13:56:05 +08:00
Concedo
27261bfc26
adaptive decay as an overridable param (+1 squashed commits)
...
Squashed commits:
[d94df7843] adaptive decay as an overridable param
2025-12-28 13:34:20 +08:00
Concedo
1051313cb2
added deprecated item sdgendefaults (+1 squashed commits)
...
Squashed commits:
[efc14a5d9] fixed sd error
2025-12-27 22:47:43 +08:00
Concedo
f5282e114d
allow ANY api field to have specified defaults, and to be overwritten by value specified at load time
2025-12-27 18:57:04 +08:00
Concedo
6548645aaa
rename power law sampler to adaptive p
2025-12-27 17:50:58 +08:00
Johannes Gäßler
9045c9afe5
llama-fit-params: fix Gemma 3 calculation ( #18372 )
2025-12-27 09:56:04 +01:00
Concedo
445aad5e00
remove sdcpp qwen image lora hack
2025-12-27 16:31:29 +08:00
Wagner Bruna
84765f5967
sd: sync to master-447-ccb6b0a ( #1898 )
...
* sd: sync to master-438-298b110
* sd: sync to master-440-3e81246
* sd: sync to master-444-a0adcfb
* sd: sync to master-447-ccb6b0a
2025-12-27 16:30:52 +08:00
Concedo
9bb362cce9
revised power law sampling
2025-12-27 10:59:46 +08:00
Concedo
91d8863f18
power law sampler added
2025-12-27 09:46:06 +08:00
Jeff Bolz
c9ced4910b
vulkan: preprocess mul_mat_id experts and discard workgroups more quickly ( #18352 )
...
Run a preprocess to count how many times each expert is used, and use this to
quickly discard workgroups that aren't needed.
2025-12-26 16:12:58 -06:00
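The preprocess described above can be sketched as follows (a minimal Python illustration; the real implementation is a Vulkan shader, and all names and shapes here are invented):

```python
# Count how many tokens route to each expert, so that a workgroup
# assigned to an unused expert can be discarded immediately.
def count_expert_usage(expert_ids, n_experts):
    counts = [0] * n_experts
    for row in expert_ids:        # per-token list of selected expert ids
        for e in row:
            counts[e] += 1
    return counts

def workgroup_needed(counts, expert):
    # A zero count means no token selected this expert, so the
    # corresponding workgroup has no work and can exit early.
    return counts[expert] > 0
```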
Jeff Bolz
7ac8902133
vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader ( #18349 )
...
* vulkan: Use BK=32 for coopmat2 mul_mat_id
* vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader
Disable robustness, remove the OOB check in decodeFuncB, and initialize the
row_ids to zero to avoid OOB access.
Don't slice/offset the B matrix to ic * BN, only to adjust the coord back down
to the range [0, BN) in decodeFuncB. Instead just slice with a row offset of
zero and remove the '& (BN - 1)'. This allows the compiler to common some of
the shared memory loads.
2025-12-26 18:15:50 +01:00
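The coordinate change described above rests on a small arithmetic identity, sketched here in Python with invented names (the real code is a coopmat2 GLSL shader): with BN a power of two, slicing at an offset of ic * BN and wrapping with '& (BN - 1)' produces the same in-tile column as a zero-offset slice, so the mask and the extra address math can be dropped.

```python
BN = 64  # tile width, assumed to be a power of two as in the shader

def coord_with_slice(ic, col):
    # original scheme: offset by ic * BN, then wrap back into [0, BN)
    return (ic * BN + col) & (BN - 1)

def coord_zero_offset(ic, col):
    # optimized scheme: slice with a row offset of zero, no mask needed
    return col
```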
Jeff Bolz
9bf20d8ac3
vulkan: Use BK=32 for coopmat2 mul_mat_id ( #18332 )
2025-12-26 18:15:02 +01:00
Eve
cb999704fb
vulkan: small dequantization improvements ( #18380 )
...
* iq4_xs
* quants
2025-12-26 18:12:11 +01:00
Jeff Bolz
b96b82fc85
vulkan: Support UPSCALE w/antialias ( #18327 )
2025-12-26 17:00:57 +01:00
Jeff Bolz
10dc500bdb
vulkan: handle rope with large number of rows ( #18306 )
2025-12-26 16:53:46 +01:00
o7si
4893cc07bb
server : fix crash when seq_rm fails for hybrid/recurrent models ( #18391 )
...
* server : fix crash when seq_rm fails for hybrid/recurrent models
* server : add allow_processing param to clear_slot
2025-12-26 16:35:29 +01:00
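The fix above can be sketched as follows (a hedged Python illustration with invented class and method names; the actual server code is C++). Hybrid/recurrent caches cannot erase a token sub-range, so seq_rm may fail; instead of crashing, the slot falls back to a full clear and reprocesses the prompt.

```python
class KVCache:
    def __init__(self, can_partial_erase):
        self.can_partial_erase = can_partial_erase
        self.tokens = ["a", "b", "c"]

    def seq_rm(self, p0, p1):
        # Hybrid/recurrent models: partial removal is unsupported.
        if not self.can_partial_erase:
            return False
        del self.tokens[p0:p1]
        return True

    def clear(self):
        self.tokens = []

def trim_slot(cache, p0, p1):
    if cache.seq_rm(p0, p1):
        return True      # kept the shared prefix
    cache.clear()        # fallback: wipe the slot, reprocess from scratch
    return False
```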
Francisco Herrera
af3be131c0
docs: added note for pre SYCL Intel hardware ( #18016 )
...
Specify that it's for pre-SYCL hardware
2025-12-26 10:34:30 +08:00
0Marble
b07cda687c
CANN: implement the SSM_CONV operator ( #17737 )
...
* CANN: implement SSM_CONV operator
Co-authored-by: Aleksei Lobanov, <zeromarblectm@gmail.com>
Co-authored-by: Sujin Kang, <waterjin326@gmail.com>
* CANN: remove custom error limit for SSM_CONV
* CANN: merge SSM_CONV tensor shape/strides into one line
---------
Co-authored-by: Sujin Kang, <waterjin326@gmail.com>
2025-12-26 09:12:04 +08:00
Aman Gupta
85c40c9b02
ggml-cuda: fix regex for arch list ( #18371 )
...
* ggml-cuda: fix regex for arch list
* make regex exact
2025-12-26 01:35:14 +08:00
Concedo
dfa1b72d2f
Triage: revert https://github.com/ggml-org/llama.cpp/pull/18047 and https://github.com/ggml-org/llama.cpp/pull/18302
...
Revert "vulkan: Implement set_tensor_async and the event interfaces (#18047 )"
This reverts commit e1f15b454f . (+1 squashed commits)
Squashed commits:
[3cfbc7b1a] Revert "vulkan: fix command buffer corruption in ggml_backend_vk_event_wait (#18302 )"
This reverts commit 2a9ea2020c .
2025-12-26 01:20:31 +08:00
Concedo
399fc9c57e
rename tokens tab to context, move fa to hardware
2025-12-26 00:06:07 +08:00
Aman Gupta
83b3b1c271
cuda: optimize cumsum cub path ( #18362 )
...
* cuda: optimize cumsum cub path
* remove heavy perf test
2025-12-25 23:55:38 +08:00
Concedo
062f8b28eb
fixed sdui gen queue
2025-12-25 23:21:33 +08:00
Aman Gupta
b0fb0f0aee
ggml-cuda: fix blackwell native builds ( #18361 )
...
* ggml-cuda: fix blackwell native builds
Replace 12x in native architectures by 12xa
* replace for GGML_NATIVE=OFF too
* only replace for native
* remove 120f-virtual for default compilation
---------
Co-authored-by: Aman Gupta <aman>
2025-12-25 22:12:11 +08:00
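The substitution described above ("Replace 12x in native architectures by 12xa") can be illustrated with a hedged Python sketch (the real change lives in the CUDA CMake logic; this only mirrors its intent). Bare Blackwell architecture entries such as "120" are mapped to their "a" variants, while entries that already carry a suffix are left untouched.

```python
import re

def fix_native_archs(archs):
    # Append "a" to exact three-digit 12x entries only.
    return [re.sub(r"^(12\d)$", r"\1a", a) for a in archs]
```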
Concedo
cf4201e213
wip power law sampling
2025-12-25 22:01:16 +08:00
Penglin Cai
e68c19b0fd
CANN: Add support for CONV_TRANSPOSE_1D when kernel size > 255 ( #17934 )
...
* CONV_TRANSPOSE_1D kernel_size>255
* remove condition check
* fix the bug of type conversion
* removing trailing whitespaces
* fix: return true in the switch case
2025-12-25 16:46:09 +08:00
Aadeshveer Singh
c54bba869d
ggml : optimize cuda cumsum fallback kernel ( #18343 )
2025-12-25 12:11:13 +08:00
Xuan-Son Nguyen
f5acfb2ffa
server: (router) add stop-timeout option ( #18350 )
...
* server: (router) add stop-timeout option
* also allow stop while loading
* add docs
* unload_lru: also wait for unload to complete
2025-12-24 23:47:49 +01:00
Xuan-Son Nguyen
4cbafad4f0
model: support MiMo-V2-Flash ( #18328 )
...
* mimov2: convert ok
* rename mimov2 --> mimo2
* fix conversion
* runnable but incorrect
* use sink
* add_sliding_window_pattern
* add swa and per-layer n_head_kv
* correct params
* somewhat working
* correct gating func
* nits
* mimo2: wire RMS eps + MoE bias + converter guards
* add co-author
Co-authored-by: Aaryan-Kapoor <Aaryan-Kapoor@users.noreply.github.com>
* use add_rope_freq_base_swa
---------
Co-authored-by: Aaryan Kapoor <aaryankapoor2006@gmail.com>
Co-authored-by: Aaryan-Kapoor <Aaryan-Kapoor@users.noreply.github.com>
2025-12-24 23:07:08 +01:00
Concedo
6cc71db85a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# docs/backend/SYCL.md
# examples/model-conversion/Makefile
# examples/model-conversion/scripts/causal/run-org-model.py
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
2025-12-25 00:06:27 +08:00
Concedo
3589a5e136
Merge commit '12ee1763a6' into concedo_experimental
...
# Conflicts:
# docs/backend/hexagon/README.md
# docs/backend/hexagon/developer.md
# examples/gen-docs/gen-docs.cpp
# examples/model-conversion/scripts/embedding/run-original-model.py
# examples/model-conversion/scripts/utils/semantic_check.py
# examples/sycl/run-llama2.sh
# examples/sycl/run-llama3.sh
# examples/sycl/win-run-llama2.bat
# examples/sycl/win-run-llama3.bat
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp-utils.h
# ggml/src/ggml-hexagon/htp/act-ops.c
# ggml/src/ggml-hexagon/htp/htp-dma.c
# ggml/src/ggml-hexagon/htp/htp-dma.h
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-opencl/kernels/transpose.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# scripts/snapdragon/adb/run-cli.sh
# src/CMakeLists.txt
# tests/test-backend-ops.cpp
# tools/cli/README.md
# tools/completion/README.md
# tools/server/README.md
2025-12-24 23:57:41 +08:00
Concedo
afe41b6eea
Merge branch 'concedo_experimental' of https://github.com/LostRuins/koboldcpp into concedo_experimental
2025-12-24 23:42:52 +08:00
Concedo
d1983959d2
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/release.yml
# AGENTS.md
# common/CMakeLists.txt
# docs/development/parsing.md
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-vulkan/ggml-vulkan.cpp
# tests/test-arg-parser.cpp
# tests/test-backend-ops.cpp
# tests/test-grammar-llguidance.cpp
# tests/test-tokenizer-0.cpp
# tests/test-tokenizer-1-bpe.cpp
# tests/test-tokenizer-1-spm.cpp
# tools/batched-bench/batched-bench.cpp
# tools/cli/cli.cpp
# tools/llama-bench/llama-bench.cpp
# tools/server/README.md
2025-12-24 23:42:28 +08:00
Aadeshveer Singh
c184284230
fit-params : fix race condition in fit-params output ( #18276 )
2025-12-24 15:57:38 +01:00
Aman Gupta
c8a2417d7b
CUDA: experimental native mxfp4 support for blackwell ( #17906 )
...
* CUDA: experimental native mxfp4 support for blackwell
* optimize load_tiles
* optimize quantize_mxfp4
* cleanup
* first pass review: formatting
* use interleaved layout for mma
* mmq: add assert for size
* use __nv_fp4x4_e2m1
* use iter_k as 512, cleanup
* Use 1200 as blackwell instead of 1000
* address review comments
* mmq: fix stride
* quantize.cu: use reference impl of e8m0 scale
* address review comments
* add 120f-virtual + minor fixes
---------
Co-authored-by: Aman Gupta <aman>
2025-12-24 22:28:26 +08:00
Wagner Bruna
f30da43b7f
sd: get the available schedulers directly from sd.cpp ( #1900 )
...
Avoids a hardcoded list on the Python side.
2025-12-24 21:55:24 +08:00
Saba Fallah
54132f1b1f
model : support for LlamaBidirectionalModel architecture ( #18220 )
...
* model: llama-embed-nemotron
* minor: python lint
* changed arch-name
* templated llm_build_llama to be used for both llama and llama-embed arch
2025-12-24 14:02:36 +01:00
Jeff Bolz
2a9ea2020c
vulkan: fix command buffer corruption in ggml_backend_vk_event_wait ( #18302 )
2025-12-24 12:36:34 +01:00
Concedo
26d89bf589
support for downloading AVI from sdui
2025-12-24 18:40:10 +08:00
Wang Weixuan
ce7a6dc0fc
CANN : refactor ACL graph cache ( #17752 )
...
Move the graph property checking code into methods of LRU cache.
Signed-off-by: Wang Weixuan <wangweixvan@gmail.com>
2025-12-24 17:50:24 +08:00
Jesse Ikonen
1ce0126b18
docs: Fix typos in SYCL documentation ( #18269 )
2025-12-24 17:19:47 +08:00
Ruben Ortlam
7f459c98e7
vulkan: use fewer FA rows for small cache runs ( #18280 )
2025-12-24 08:59:14 +01:00