Commit graph

7739 commits

Author SHA1 Message Date
Concedo
f8b7ddeac0 emergency fix for q25vl 2025-04-27 16:46:33 +08:00
Concedo
1b0481f4b1 wip qwen25vl merge 2025-04-27 13:07:07 +08:00
Concedo
36c8db1248 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/llava/clip-impl.h
#	examples/llava/clip.cpp
#	tests/test-arg-parser.cpp
#	tests/test-json-schema-to-grammar.cpp
2025-04-27 12:51:02 +08:00
Xuan Son Nguyen
53a15d014f add test 2025-04-26 23:00:41 +02:00
Xuan-Son Nguyen
2d451c8059
common : add common_remote_get_content (#13123)
* common : add common_remote_get_content

* support max size and timeout

* add tests
2025-04-26 22:58:12 +02:00
Xuan Son Nguyen
89be919988 fix merging problem 2025-04-26 22:54:41 +02:00
Xuan Son Nguyen
82f8e72ecd Merge branch 'master' into qwen25-vl 2025-04-26 22:45:06 +02:00
Xuan-Son Nguyen
4753791e70
clip : improve projector naming (#13118)
* clip : improve projector naming

* no more kv has_llava_projector

* rm unused kv

* rm more unused
2025-04-26 22:39:47 +02:00
Xuan Son Nguyen
0c74ea54f5 clean up 2025-04-26 22:37:05 +02:00
Xuan Son Nguyen
5085dbb293 Merge branch 'master' into qwen25-vl 2025-04-26 22:24:04 +02:00
Xuan Son Nguyen
516735ad21 fix model conversion 2025-04-26 22:23:48 +02:00
Concedo
378c3dd40c updated lite 2025-04-27 01:46:52 +08:00
Concedo
77b9a83956 tryout termux autoinstaller (+1 squashed commits)
Squashed commits:

[9aeb5e902] tryout termux autoinstaller (+1 squashed commits)

Squashed commits:

[0e33b5934] tryout termux autoinstaller (+1 squashed commits)

Squashed commits:

[70232ea70] tryout termux autoinstaller (+1 squashed commits)

Squashed commits:

[050770315] tryout termux autoinstaller (+1 squashed commits)

Squashed commits:

[27bfc75a2] tryout termux autoinstaller (+1 squashed commits)

Squashed commits:

[6a32c1f93] tryout termux autoinstaller (+1 squashed commits)

Squashed commits:

[1e53b9d48] tryout termux autoinstaller
2025-04-27 01:27:23 +08:00
SXX
77d5e9a76a
ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (#13107)
* ggml: dynamic x86_64 feature detection for FP32 <-> FP16/BF16 conversion

* move fp converter to ggml-cpu

* Switch ggml_compute_forward_get_rows_f16/bf16 to new ggml_cpu_fp16/bf16_to_fp32
2025-04-26 16:05:31 +02:00
HimariO
7e1bb0437a remove attn_window_size from gguf 2025-04-26 20:19:51 +08:00
frob
d5fe4e81bd
grammar : handle maxItems == 0 in JSON schema (#13117)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-04-26 10:10:20 +02:00
Concedo
4dcd215b27 handle explicit null 2025-04-26 13:06:38 +08:00
Concedo
cb1c182673 add more warmup (+1 squashed commits)
Squashed commits:

[9578d5352] updated lite
2025-04-26 10:22:09 +08:00
Concedo
4decd6bea1 GLM4 batch clamp 2025-04-26 09:42:17 +08:00
Concedo
3f545eadbe Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tests/test-backend-ops.cpp
2025-04-26 09:12:40 +08:00
kallewoof
7cb815b727
AutoGuess: GLM-4 (#1502)
* AutoGuess: GLM-4

* add 'chat_start' field to adapters

* GLM-4 fix
2025-04-26 08:47:42 +08:00
Concedo
35dc8387e9 fixed rwkv7 handling 2025-04-26 02:13:06 +08:00
Concedo
5e87c04056 improved memory estimation (+2 squashed commit)
Squashed commit:

[3319540f9] mem estimation

[43bad21db] mem estimation
2025-04-26 02:03:09 +08:00
Diego Devesa
295354ea68
llama : fix K-shift with quantized K and BLAS backend (#13113) 2025-04-25 19:40:11 +02:00
HimariO
77b144a8e7 replace KEY_FULLATTN_BLK_IDX with KEY_WIN_ATTN_PATTERN 2025-04-26 01:00:00 +08:00
HimariO
f69e9fa04d remove KEY_USE_GLU_MLP, KEY_USE_RMS_NORM 2025-04-26 00:16:27 +08:00
HimariO
caa7e57ec5 add PROJECTOR_TYPE_QWEN2_5_VL 2025-04-26 00:03:02 +08:00
HimariO
a3cd0e52f2 fix attn weight scaling after rebase 2025-04-25 22:12:55 +08:00
HimariO
7f530ac040 remove commented-out code blocks 2025-04-25 22:12:55 +08:00
HimariO
2de5dc3a14 remove not so often use qwen2vl-cli debug functions 2025-04-25 22:12:55 +08:00
HimariO
91fbdd781d ignore transformers Qwen2_5_xxx type check 2025-04-25 22:12:26 +08:00
HimariO
d1af45988a cleaning up 2025-04-25 22:12:26 +08:00
HimariO
2eb32933ea move position id remap out of ggml to avoid int32 cuda operations 2025-04-25 22:12:26 +08:00
HimariO
444e47c088 fix few incorrect tensor memory layout 2025-04-25 22:11:48 +08:00
HimariO
69b39addd2 add debug utils 2025-04-25 22:11:48 +08:00
HimariO
3d5198ee05 handle window attention inputs 2025-04-25 22:11:13 +08:00
HimariO
d9f2d71bc2 implement vision model architecture, gguf convertor 2025-04-25 22:11:13 +08:00
City
558a764713
Force FP32 compute in GLM4 FFN Down (#13101)
* Force FP32 compute in cuBLAS GEMM

* Revert "Force FP32 compute in cuBLAS GEMM"

This reverts commit 6efd872732159ab88ee7b3c1d77ba5ebc83079bd.

* Force F32 compute in GLM4 ffn down

* Edit comment to clarify issue

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-04-25 14:38:34 +02:00
Xuan-Son Nguyen
edb18b6e8f
clip : fix pixtral on some GPU backends (#13097)
* clip : fix pixtral on some GPU backends

* refactor inp_raw set

* rm outdated comment

* fix dynamic size

* add TODO
2025-04-25 14:31:42 +02:00
Neo Zhang Jianyu
514c45608f
change the reorder tensor from init to execute OP (#13003) 2025-04-25 17:37:51 +08:00
Concedo
6b6597ebf1 allow for single token prompt processing (actual batch size 1) 2025-04-25 16:54:46 +08:00
Radoslav Gerganov
553a5c3a9f
rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943)
RPC_CMD_SET_TENSOR always returns an empty response and we send this 4
times per token. We can improve TG speed if we don't wait for this empty
response.

The performance impact of this change depends on the network latency.
2025-04-25 10:08:08 +03:00
Xuan-Son Nguyen
13be08daf9
clip : remove boi/eoi embeddings for GLM-edge model (#13081) 2025-04-24 22:17:04 +02:00
Georgi Gerganov
226251ed56
embeddings : fix batch sizes (#13076)
ggml-ci
2025-04-24 22:29:22 +03:00
Concedo
d32d0b382a glm4 template 2025-04-25 00:41:15 +08:00
Georgi Gerganov
87616f0680 ggml : fix trailing whitespaces (#0) 2025-04-24 17:32:47 +03:00
Georgi Gerganov
63b4911494 sync : ggml
ggml-ci
2025-04-24 17:32:47 +03:00
Acly
c6e8cc28c1 ggml : Depthwise 2D convolution (ggml/1152)
* ggml-cpu : kernels for faster depthwise 2D convolution

* fix compile: remove static after moving to ops.cpp

* add dilation for depthwise_conv_2d

* review: rename to ggml_conv_2d_dw_direct, remove redundant struct keywords, pass by ref, whitespace

* review: rename depthwise_conv_2d -> conv_2d_dw everywhere
2025-04-24 17:32:47 +03:00
Johannes Gäßler
b10d8bfdb1
CUDA: use switch statements in constexpr functions (#13095) 2025-04-24 15:57:10 +02:00
Georgi Gerganov
13b4548877
cmake : do not include ./src as public for libllama (#13062)
* cmake : do not include ./src as public for libllama

ggml-ci

* cmake : rework tests

ggml-ci

* llguidance : remove unicode include

ggml-ci

* cmake : make c++17 private

ggml-ci
2025-04-24 16:00:10 +03:00