Concedo
4d8a7a6594
fix occasional clip segfault, fix glm4 (+1 squashed commits)
...
Squashed commits:
[bd71cd688] GLM4 fix wip
2025-04-29 01:42:50 +08:00
Concedo
e659cadf48
more sanitization for user inputs
2025-04-28 15:01:50 +08:00
Concedo
94c2572cb5
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# examples/llama-bench/llama-bench.cpp
2025-04-28 14:42:57 +08:00
Concedo
a9bc1a2ee2
do not use shell true instead
2025-04-28 14:26:55 +08:00
4onen
c0a97b762e
llama-bench : Add --override-tensors arg ( #12922 )
...
* Add --override-tensors option to llama-bench
* Correct llama-bench --override-tensors to --override-tensor
* llama-bench: Update --override-tensors parsing to match --tensor-split and appear in the test matrix.
* Make new llama-bench util functions static to fix Ubuntu CI
* llama-bench: Correct -ot corner cases (No -ot calls, leading and trailing empty -ot spans, etc.)
2025-04-27 23:48:26 +02:00
matteo
ced44be342
llama-chat : fix wrong template in GLM4-0414 ( #13140 )
...
* fix wrong template in GLM4-0414
* fix spaces
* no bos token since it is already in the template
* moved the chatglm4 check to higher priority
* restored template for old GLM models
* moved the GLM4 template check in the correct place with correct check
2025-04-27 21:57:32 +02:00
Concedo
ca281bd5ba
fix sanity check
2025-04-28 00:00:07 +08:00
Concedo
87cd8e6a00
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# examples/llava/clip.cpp
2025-04-27 23:51:19 +08:00
Concedo
5fa9e02bc3
add debugging info to zenity check
2025-04-27 23:48:23 +08:00
R0CKSTAR
e291450b76
musa: fix build warning ( #13129 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-27 13:22:49 +02:00
LostRuins Concedo
59e991c23c
Fixes Qwen2.5VL segfault during inference with https://github.com/ggml-org/llama.cpp/pull/12402 , as the has_qwen2vl_merger migration was incomplete ( #13133 )
2025-04-27 12:43:37 +02:00
Concedo
f77574765e
termux script fix
2025-04-27 17:56:00 +08:00
Concedo
37060f54da
backwards compat: handle older HimariO quants
2025-04-27 17:38:22 +08:00
Concedo
f8b7ddeac0
emergency fix for q25vl
2025-04-27 16:46:33 +08:00
HimariO
ca2bb89eac
clip : Add Qwen2.5VL support ( #12402 )
...
* implement vision model architecture, gguf converter
* handle window attention inputs
* add debug utils
* fix a few incorrect tensor memory layouts
* move position id remap out of ggml to avoid int32 cuda operations
* cleaning up
* ignore transformers Qwen2_5_xxx type check
* remove rarely used `qwen2vl-cli` debug functions
* remove commented-out code blocks
* fix attn weight scaling after rebase
* add `PROJECTOR_TYPE_QWEN2_5_VL`
* remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`
* replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`
* remove `attn_window_size` from gguf
* fix model conversion
* clean up
* fix merging problem
* add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-27 10:10:34 +02:00
Concedo
1b0481f4b1
wip qwen25vl merge
2025-04-27 13:07:07 +08:00
Concedo
36c8db1248
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# examples/llava/clip-impl.h
# examples/llava/clip.cpp
# tests/test-arg-parser.cpp
# tests/test-json-schema-to-grammar.cpp
2025-04-27 12:51:02 +08:00
Xuan Son Nguyen
53a15d014f
add test
2025-04-26 23:00:41 +02:00
Xuan-Son Nguyen
2d451c8059
common : add common_remote_get_content ( #13123 )
...
* common : add common_remote_get_content
* support max size and timeout
* add tests
2025-04-26 22:58:12 +02:00
Xuan Son Nguyen
89be919988
fix merging problem
2025-04-26 22:54:41 +02:00
Xuan Son Nguyen
82f8e72ecd
Merge branch 'master' into qwen25-vl
2025-04-26 22:45:06 +02:00
Xuan-Son Nguyen
4753791e70
clip : improve projector naming ( #13118 )
...
* clip : improve projector naming
* no more kv has_llava_projector
* rm unused kv
* rm more unused
2025-04-26 22:39:47 +02:00
Xuan Son Nguyen
0c74ea54f5
clean up
2025-04-26 22:37:05 +02:00
Xuan Son Nguyen
5085dbb293
Merge branch 'master' into qwen25-vl
2025-04-26 22:24:04 +02:00
Xuan Son Nguyen
516735ad21
fix model conversion
2025-04-26 22:23:48 +02:00
Concedo
378c3dd40c
updated lite
2025-04-27 01:46:52 +08:00
Concedo
77b9a83956
tryout termux autoinstaller (+7 squashed commits)
...
Squashed commits:
[9aeb5e902] tryout termux autoinstaller
[0e33b5934] tryout termux autoinstaller
[70232ea70] tryout termux autoinstaller
[050770315] tryout termux autoinstaller
[27bfc75a2] tryout termux autoinstaller
[6a32c1f93] tryout termux autoinstaller
[1e53b9d48] tryout termux autoinstaller
2025-04-27 01:27:23 +08:00
SXX
77d5e9a76a
ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs ( #13107 )
...
* ggml: dynamic x86_64 feature detection for FP32 <-> FP16/BF16 conversion
* move fp converter to ggml-cpu
* Switch ggml_compute_forward_get_rows_f16/bf16 to new ggml_cpu_fp16/bf16_to_fp32
2025-04-26 16:05:31 +02:00
HimariO
7e1bb0437a
remove attn_window_size from gguf
2025-04-26 20:19:51 +08:00
frob
d5fe4e81bd
grammar : handle maxItems == 0 in JSON schema ( #13117 )
...
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-04-26 10:10:20 +02:00
Concedo
4dcd215b27
handle explicit null
2025-04-26 13:06:38 +08:00
Concedo
cb1c182673
add more warmup (+1 squashed commits)
...
Squashed commits:
[9578d5352] updated lite
2025-04-26 10:22:09 +08:00
Concedo
4decd6bea1
GLM4 batch clamp
2025-04-26 09:42:17 +08:00
Concedo
3f545eadbe
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-backend-ops.cpp
2025-04-26 09:12:40 +08:00
kallewoof
7cb815b727
AutoGuess: GLM-4 ( #1502 )
...
* AutoGuess: GLM-4
* add 'chat_start' field to adapters
* GLM-4 fix
2025-04-26 08:47:42 +08:00
Concedo
35dc8387e9
fixed rwkv7 handling
2025-04-26 02:13:06 +08:00
Concedo
5e87c04056
improved memory estimation (+2 squashed commit)
...
Squashed commit:
[3319540f9] mem estimation
[43bad21db] mem estimation
2025-04-26 02:03:09 +08:00
Diego Devesa
295354ea68
llama : fix K-shift with quantized K and BLAS backend ( #13113 )
2025-04-25 19:40:11 +02:00
HimariO
77b144a8e7
replace KEY_FULLATTN_BLK_IDX with KEY_WIN_ATTN_PATTERN
2025-04-26 01:00:00 +08:00
HimariO
f69e9fa04d
remove KEY_USE_GLU_MLP, KEY_USE_RMS_NORM
2025-04-26 00:16:27 +08:00
HimariO
caa7e57ec5
add PROJECTOR_TYPE_QWEN2_5_VL
2025-04-26 00:03:02 +08:00
HimariO
a3cd0e52f2
fix attn weight scaling after rebase
2025-04-25 22:12:55 +08:00
HimariO
7f530ac040
remove commented-out code blocks
2025-04-25 22:12:55 +08:00
HimariO
2de5dc3a14
remove rarely used qwen2vl-cli debug functions
2025-04-25 22:12:55 +08:00
HimariO
91fbdd781d
ignore transformers Qwen2_5_xxx type check
2025-04-25 22:12:26 +08:00
HimariO
d1af45988a
cleaning up
2025-04-25 22:12:26 +08:00
HimariO
2eb32933ea
move position id remap out of ggml to avoid int32 cuda operations
2025-04-25 22:12:26 +08:00
HimariO
444e47c088
fix a few incorrect tensor memory layouts
2025-04-25 22:11:48 +08:00
HimariO
69b39addd2
add debug utils
2025-04-25 22:11:48 +08:00
HimariO
3d5198ee05
handle window attention inputs
2025-04-25 22:11:13 +08:00