Concedo
962daa1fdc
updated lite (+1 squashed commits)
...
Squashed commits:
[8da24c85b] updated lite
2025-08-11 21:41:28 +08:00
Concedo
30e2f25c05
alias tensorsplit, fixed python error
2025-08-10 22:38:14 +08:00
Concedo
300e20be6c
allow termux to launch existing downloaded models
2025-08-10 21:29:51 +08:00
Concedo
8e6d27f629
handle if assistant_message_gen and assistant_message_gen!=assistant_message_start, replace final output tag with unspaced (gen) version if exists
2025-08-10 16:51:34 +08:00
kallewoof
204739e7f1
Adapter fixes ( #1659 )
...
* test adapters
* add assistant_gen adapter key
* add support for chat templates stored as .jinja files
* removed mistakenly committed gated-tokenizers link
* autoguess: Harmony: add missing newline prefixes to system_end
2025-08-10 16:19:50 +08:00
Concedo
57db0ce9cd
allow uploading tagged pinned versions for rocm
2025-08-10 11:04:49 +08:00
Concedo
1515d67c2c
oldpc build is now fixed (+2 squashed commit)
...
Squashed commit:
[d11ac6cef] temp test
[cfbc008b1] test no f16 as well
2025-08-10 10:52:45 +08:00
Concedo
89266ac6b8
make autoguess adapter case insensitive
2025-08-10 00:58:47 +08:00
Concedo
487d509b44
try fix oldpc cuda broken without flash attn since upstream pr14361 between 1.94 and 1.95 (+1 squashed commits)
...
Squashed commits:
[940f0c639] try fix oldpc cuda broken without flash attn since upstream pr14361 between 1.94 and 1.95
2025-08-10 00:10:37 +08:00
Concedo
4c1faf61b2
increment version (+1 squashed commits)
...
Squashed commits:
[6e5080ad2] increment version
2025-08-09 20:53:26 +08:00
Concedo
0fb25bb165
Merge branch 'upstream' into concedo_experimental
2025-08-09 20:31:36 +08:00
Concedo
5f95fc1122
update lite
2025-08-09 20:31:15 +08:00
Aman Gupta
34c9d765bf
CUDA: add attention sinks for tile and wmma ( #15178 )
...
* CUDA: add attention sinks for tile and wmma
* Review: formatting changes + remove syncthreads from tile + remove warp_reduce_max from wmma
2025-08-09 20:00:24 +08:00
Concedo
ced98823a1
kai api tool calling
2025-08-09 10:51:10 +08:00
Concedo
4c7b82e982
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# scripts/server-bench.py
2025-08-09 10:34:24 +08:00
Concedo
fc551470d4
updated lite
2025-08-09 10:33:59 +08:00
compilade
e54d41befc
gguf-py : add Numpy MXFP4 de/quantization support ( #15111 )
...
* gguf-py : add MXFP4 de/quantization support
* ggml-quants : handle zero amax for MXFP4
2025-08-08 17:48:26 -04:00
Johannes Gäßler
4850b52aed
server-bench: external OAI servers, sqlite ( #15179 )
...
* server-bench: external OAI servers, sqlite
* Update scripts/server-bench.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update scripts/server-bench.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update scripts/server-bench.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* raise_for_status
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-08-08 23:04:36 +02:00
Concedo
9e7a940ce4
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/softmax_4_f16.cl
# ggml/src/ggml-opencl/kernels/softmax_4_f32.cl
# ggml/src/ggml-opencl/kernels/softmax_f16.cl
# ggml/src/ggml-opencl/kernels/softmax_f32.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
2025-08-09 01:24:52 +08:00
Concedo
7087aeb4bc
anti bsod only for nvidia
2025-08-09 01:23:38 +08:00
Concedo
67e0072245
fixed clblast repacking
2025-08-09 01:08:02 +08:00
Concedo
3468c2834d
fixed adv mode
2025-08-08 22:26:36 +08:00
kallewoof
866cc346ab
tweak OpenAI Harmony autoguess developer prefix and assistant end token ( #1673 )
...
* tweak OpenAI Harmony autoguess developer prefix
* use <|end|> for adapter end
2025-08-08 21:15:11 +08:00
AN Long
cd6983d56d
ggml : fix field name when new ggml_backend ( #14944 )
2025-08-08 14:37:22 +02:00
Olivier Chafik
6c7e9a5440
vendor: sync minja ( #15161 )
...
* vendor: sync minja
* Update minja.hpp
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-08-08 10:45:18 +01:00
Johannes Gäßler
1425f587a8
CUDA: attention sinks for mma FlashAttention ( #15157 )
2025-08-08 08:19:58 +02:00
lhez
aaa3d07ae7
opencl: support sink in soft_max (attn sinks) ( #15152 )
2025-08-07 21:47:03 -07:00
Concedo
d5b5e79035
should fix vulkan bsod
2025-08-08 10:57:50 +08:00
Wagner Bruna
eed5577aaa
fix unintended sd model quantization ( #1672 )
...
The recent ggml update added another quant type, GGML_TYPE_MXFP4,
which got the same value as SD_TYPE_COUNT. That made the embedded
sd.cpp quantize to GGML_TYPE_MXFP4 by default.
Photomaker in particular ends up crashing due to
"Missing CPY op for types: f32 mxfp4".
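The collision described above can be illustrated with a minimal sketch. The type tables, their sizes, and the `requested_quant` helper are hypothetical stand-ins, not the real ggml or sd.cpp enums; the only assumption carried over from the message is that a frozen "count" sentinel ended up equal to the value of a newly added type.

```python
from typing import List, Optional

# Hypothetical type tables; the real enums have many more entries.
GGML_TYPES_OLD: List[str] = ["f32", "f16", "q4_0"]
GGML_TYPES_NEW: List[str] = GGML_TYPES_OLD + ["mxfp4"]  # new type appended

# Sentinel frozen at the old table size, intended to mean "no type chosen".
SD_TYPE_COUNT = len(GGML_TYPES_OLD)


def requested_quant(wtype: int, ggml_types: List[str]) -> Optional[str]:
    """Return the type to quantize to, or None if wtype is the 'unset' sentinel."""
    if wtype == len(ggml_types):  # compared against the current count
        return None
    return ggml_types[wtype]


# Against the old table, the sentinel is recognised as "unset".
print(requested_quant(SD_TYPE_COUNT, GGML_TYPES_OLD))
# Against the new table, the same value selects the appended type.
print(requested_quant(SD_TYPE_COUNT, GGML_TYPES_NEW))
```

In this sketch the stale sentinel is no longer recognised once the table grows, so the default silently becomes a real quantization request.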
2025-08-08 10:19:58 +08:00
Xuan-Son Nguyen
50aa938901
convert : support non-mxfp4 HF model ( #15153 )
...
* convert : support non-mxfp4 HF model
* rm redundant check
* disable debug check
2025-08-07 23:26:03 +02:00
Jeff Bolz
c4f53563df
vulkan: support fattn sinks ( #15126 )
2025-08-07 22:44:20 +02:00
Jeff Bolz
a0552c8bee
vulkan: Add env var to disable host visible vidmem ( #15109 )
2025-08-07 22:07:11 +02:00
RunningLeon
99acbc9921
llama : Support intern-s1 ( #14875 )
...
* support internvl
* support interns1
* resolve comments
* put interns1 in tensor mapping
* resolve comment
* move tokenizer changes to sub class
2025-08-07 18:20:40 +02:00
Concedo
8f15461bea
updated lite
2025-08-08 00:04:47 +08:00
uvos
7ad67ba9fe
HIP: add cmake option to enable compiler output of kernel resource usage metrics ( #15103 )
2025-08-07 16:44:14 +02:00
Concedo
8a71eb03c0
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# ggml/cmake/ggml-config.cmake.in
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/fattn.cu
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# requirements/requirements-convert_hf_to_gguf.txt
# scripts/compare-llama-bench.py
# tests/test-chat-template.cpp
# tests/test-chat.cpp
# tools/llama-bench/llama-bench.cpp
2025-08-07 21:23:09 +08:00
Concedo
338b1fe97e
readjusted mistral and oai template, fixed compile issue on termux, updated lite, show generated token ids in debug mode
2025-08-07 21:14:48 +08:00
Christian Kastner
9a96389544
ggml: Skip backend library linking code when GGML_BACKEND_DL=ON ( #15094 )
...
Any available libraries are found and loaded dynamically at runtime.
2025-08-07 13:45:41 +02:00
Johannes Gäßler
1d72c84188
CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 ( #15131 )
...
* CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16
2025-08-07 10:53:21 +02:00
Johannes Gäßler
20638e4f16
scripts: fix crash when --tool is not set ( #15133 )
2025-08-07 08:50:30 +02:00
Daniel Bevenius
36d3f00e14
requirements : fix PyTorch uint64 compatibility ( #15134 )
...
This commit addresses an issue with the convert_hf_to_gguf script
which is currently failing with:
```console
AttributeError: module 'torch' has no attribute 'uint64'
```
This occurs because safetensors expects torch.uint64 to be available
in the public API, but PyTorch 2.2.x appears to provide only limited
support for unsigned types beyond uint8. The torch.uint64 dtype exists but
is not exposed in the standard torch namespace
(see pytorch/pytorch#58734).
PyTorch 2.4.0 properly exposes torch.uint64 in the public API, resolving
the compatibility issue with safetensors. This also required torchvision
to be updated to =0.19.0 for compatibility.
Refs: https://huggingface.co/spaces/ggml-org/gguf-my-repo/discussions/186#68938de803e47d990aa087fb
Refs: https://github.com/pytorch/pytorch/issues/58734
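A quick way to check an environment for the condition described above is to test whether the dtype name is present on the module. This is a minimal sketch: `missing_dtypes` is a hypothetical helper, not part of the conversion script, and it only inspects attribute names.

```python
from types import SimpleNamespace
from typing import Iterable, List


def missing_dtypes(module: object, names: Iterable[str] = ("uint64",)) -> List[str]:
    """Return the dtype names that are not exposed as attributes of module."""
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    try:
        import torch  # optional; only needed for the real check
    except ImportError:
        # Stand-in object that mimics a namespace lacking uint64.
        torch = SimpleNamespace(uint8=object())
    print("missing dtypes:", missing_dtypes(torch))
```

An empty result means the installed module exposes the name; a non-empty result matches the AttributeError shown in the message.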
2025-08-07 05:31:48 +02:00
Reese Levine
5fd160bbd9
ggml: Add basic SET_ROWS support in WebGPU ( #15137 )
...
* Begin work on set_rows
* Work on set rows
* Add error buffers for reporting unsupported SET_ROWS indices
* Remove extra comments
2025-08-06 15:14:40 -07:00
rmatif
756cfea826
fix profiling crash ( #15072 )
2025-08-06 14:17:51 -07:00
lhez
e725a1a982
opencl: add swiglu_oai and add_id ( #15121 )
...
* opencl: add `swiglu-oai`
* opencl: add `add_id`
* opencl: add missing `add_id.cl`
2025-08-06 12:12:17 -07:00
Sachin Desai
3db4da56a5
chat : support Granite model reasoning and tool call ( #14864 )
2025-08-06 20:27:30 +02:00
Juk Armstrong
476aa3fd57
Fixed name -override-tensors to -override-tensor ( #15129 )
2025-08-06 17:28:48 +01:00
Diego Devesa
0d8831543c
ggml : fix fallback to CPU for unsupported ops ( #15118 )
2025-08-06 14:37:35 +02:00
Sigbjørn Skjæret
65c797c4fa
chat : fix yandex chat template ( #15116 )
2025-08-06 13:26:49 +02:00
stevenkuang
25726898e8
chat : fix hunyuan auto-detection ( #15114 )
...
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
2025-08-06 11:48:30 +02:00
Concedo
61c19fea56
fixed glm4 sop, lower regex max stacks (+2 squashed commit)
...
Squashed commit:
[47e39ae5d] lower regex max stack again
[0a32ca232] lower regex max stack again
2025-08-06 17:10:57 +08:00