Concedo
378c3dd40c
updated lite
2025-04-27 01:46:52 +08:00
Concedo
77b9a83956
tryout termux autoinstaller (+6 squashed commits)
...
Squashed commits:
[9aeb5e902] tryout termux autoinstaller
[0e33b5934] tryout termux autoinstaller
[70232ea70] tryout termux autoinstaller
[050770315] tryout termux autoinstaller
[27bfc75a2] tryout termux autoinstaller
[6a32c1f93] tryout termux autoinstaller
[1e53b9d48] tryout termux autoinstaller
2025-04-27 01:27:23 +08:00
SXX
77d5e9a76a
ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (#13107)
...
* ggml: dynamic x86_64 feature detection for FP32 <-> FP16/BF16 conversion
* move fp converter to ggml-cpu
* Switch ggml_compute_forward_get_rows_f16/bf16 to new ggml_cpu_fp16/bf16_to_fp32
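The FP32 <-> BF16 conversion this commit moves into the CPU backend can be sketched in a few lines; this is a minimal illustration of the bit-level relationship, not ggml's actual API (the function names here are made up, and real implementations typically round-to-nearest-even rather than truncate):

```python
import struct

def bf16_to_fp32(h):
    # bf16 is the top 16 bits of an IEEE-754 float32, so widening is a shift
    return struct.unpack("<f", struct.pack("<I", h << 16))[0]

def fp32_to_bf16(f):
    # Truncating narrow (real converters usually round-to-nearest-even)
    (bits,) = struct.unpack("<I", struct.pack("<f", f))
    return bits >> 16

one = fp32_to_bf16(1.0)        # 0x3F80
assert bf16_to_fp32(one) == 1.0
```

Because bf16 keeps the float32 exponent, any float32 whose low 16 mantissa bits are zero round-trips exactly through this pair.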
2025-04-26 16:05:31 +02:00
HimariO
7e1bb0437a
remove attn_window_size from gguf
2025-04-26 20:19:51 +08:00
frob
d5fe4e81bd
grammar : handle maxItems == 0 in JSON schema (#13117)
...
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-04-26 10:10:20 +02:00
Concedo
4dcd215b27
handle explicit null
2025-04-26 13:06:38 +08:00
Concedo
cb1c182673
add more warmup (+1 squashed commit)
...
Squashed commits:
[9578d5352] updated lite
2025-04-26 10:22:09 +08:00
Concedo
4decd6bea1
GLM4 batch clamp
2025-04-26 09:42:17 +08:00
Concedo
3f545eadbe
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-backend-ops.cpp
2025-04-26 09:12:40 +08:00
kallewoof
7cb815b727
AutoGuess: GLM-4 (#1502)
...
* AutoGuess: GLM-4
* add 'chat_start' field to adapters
* GLM-4 fix
2025-04-26 08:47:42 +08:00
Concedo
35dc8387e9
fixed rwkv7 handling
2025-04-26 02:13:06 +08:00
Concedo
5e87c04056
improved memory estimation (+2 squashed commits)
...
Squashed commits:
[3319540f9] mem estimation
[43bad21db] mem estimation
2025-04-26 02:03:09 +08:00
Diego Devesa
295354ea68
llama : fix K-shift with quantized K and BLAS backend (#13113)
2025-04-25 19:40:11 +02:00
HimariO
77b144a8e7
replace KEY_FULLATTN_BLK_IDX with KEY_WIN_ATTN_PATTERN
2025-04-26 01:00:00 +08:00
HimariO
f69e9fa04d
remove KEY_USE_GLU_MLP, KEY_USE_RMS_NORM
2025-04-26 00:16:27 +08:00
HimariO
caa7e57ec5
add PROJECTOR_TYPE_QWEN2_5_VL
2025-04-26 00:03:02 +08:00
HimariO
a3cd0e52f2
fix attn weight scaling after rebase
2025-04-25 22:12:55 +08:00
HimariO
7f530ac040
remove commented-out code blocks
2025-04-25 22:12:55 +08:00
HimariO
2de5dc3a14
remove rarely used qwen2vl-cli debug functions
2025-04-25 22:12:55 +08:00
HimariO
91fbdd781d
ignore transformers Qwen2_5_xxx type check
2025-04-25 22:12:26 +08:00
HimariO
d1af45988a
cleaning up
2025-04-25 22:12:26 +08:00
HimariO
2eb32933ea
move position id remap out of ggml to avoid int32 cuda operations
2025-04-25 22:12:26 +08:00
HimariO
444e47c088
fix a few incorrect tensor memory layouts
2025-04-25 22:11:48 +08:00
HimariO
69b39addd2
add debug utils
2025-04-25 22:11:48 +08:00
HimariO
3d5198ee05
handle window attention inputs
2025-04-25 22:11:13 +08:00
HimariO
d9f2d71bc2
implement vision model architecture, gguf converter
2025-04-25 22:11:13 +08:00
City
558a764713
Force FP32 compute in GLM4 FFN Down (#13101)
...
* Force FP32 compute in cuBLAS GEMM
* Revert "Force FP32 compute in cuBLAS GEMM"
This reverts commit 6efd872732159ab88ee7b3c1d77ba5ebc83079bd.
* Force F32 compute in GLM4 ffn down
* Edit comment to clarify issue
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-04-25 14:38:34 +02:00
Xuan-Son Nguyen
edb18b6e8f
clip : fix pixtral on some GPU backends (#13097)
...
* clip : fix pixtral on some GPU backends
* refactor inp_raw set
* rm outdated comment
* fix dynamic size
* add TODO
2025-04-25 14:31:42 +02:00
Neo Zhang Jianyu
514c45608f
change the reorder tensor from init to execute OP (#13003)
2025-04-25 17:37:51 +08:00
Concedo
6b6597ebf1
allow for single token prompt processing (actual batch size 1)
2025-04-25 16:54:46 +08:00
Radoslav Gerganov
553a5c3a9f
rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943)
...
RPC_CMD_SET_TENSOR always returns an empty response, and we send this
command 4 times per token. We can improve TG speed by not waiting for
this empty response.
The performance impact of this change depends on the network latency.
2025-04-25 10:08:08 +03:00
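The fire-and-forget pattern this commit describes can be sketched as follows; the length-prefixed framing, the server loop, and all names here are illustrative assumptions, not ggml-rpc's actual wire format:

```python
import socket
import struct
import threading

def recv_exact(conn, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def server(conn, n_cmds, send_acks):
    # Receives length-prefixed SET_TENSOR-style payloads; optionally acks
    # each one with an empty (zero-length) response, as the commit describes.
    for _ in range(n_cmds):
        (size,) = struct.unpack("<I", recv_exact(conn, 4))
        recv_exact(conn, size)
        if send_acks:
            conn.sendall(struct.pack("<I", 0))
    conn.close()

def set_tensor(conn, data, wait_for_response):
    conn.sendall(struct.pack("<I", len(data)) + data)
    if wait_for_response:
        (resp_len,) = struct.unpack("<I", recv_exact(conn, 4))
        assert resp_len == 0  # the reply carries no payload anyway

a, b = socket.socketpair()
t = threading.Thread(target=server, args=(b, 4, False))
t.start()
n_sent = 0
for _ in range(4):  # e.g. the 4 SET_TENSOR sends per generated token
    set_tensor(a, b"\x00" * 16, wait_for_response=False)
    n_sent += 1
t.join()
a.close()
```

With `wait_for_response=False` the sender never blocks on a round trip, so the per-token cost of these commands no longer includes network latency.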
Xuan-Son Nguyen
13be08daf9
clip : remove boi/eoi embeddings for GLM-edge model (#13081)
2025-04-24 22:17:04 +02:00
Georgi Gerganov
226251ed56
embeddings : fix batch sizes (#13076)
...
ggml-ci
2025-04-24 22:29:22 +03:00
Concedo
d32d0b382a
glm4 template
2025-04-25 00:41:15 +08:00
Georgi Gerganov
87616f0680
ggml : fix trailing whitespaces (#0)
2025-04-24 17:32:47 +03:00
Georgi Gerganov
63b4911494
sync : ggml
...
ggml-ci
2025-04-24 17:32:47 +03:00
Acly
c6e8cc28c1
ggml : Depthwise 2D convolution (ggml/1152)
...
* ggml-cpu : kernels for faster depthwise 2D convolution
* fix compile: remove static after moving to ops.cpp
* add dilation for depthwise_conv_2d
* review: rename to ggml_conv_2d_dw_direct, remove redundant struct keywords, pass by ref, whitespace
* review: rename depthwise_conv_2d -> conv_2d_dw everywhere
2025-04-24 17:32:47 +03:00
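The operation the commit above adds, depthwise 2D convolution, convolves each channel independently with its own filter. A minimal pure-Python sketch (illustrative names and layouts, not ggml's `ggml_conv_2d_dw_direct` signature), including the dilation parameter the commit mentions:

```python
def conv_2d_dw(x, w, stride=1, dilation=1):
    # x: [C][H][W] input, w: [C][KH][KW] -- one filter per channel
    C, H, W = len(x), len(x[0]), len(x[0][0])
    KH, KW = len(w[0]), len(w[0][0])
    OH = (H - dilation * (KH - 1) - 1) // stride + 1
    OW = (W - dilation * (KW - 1) - 1) // stride + 1
    out = [[[0.0] * OW for _ in range(OH)] for _ in range(C)]
    for c in range(C):  # each channel is convolved independently
        for oy in range(OH):
            for ox in range(OW):
                s = 0.0
                for ky in range(KH):
                    for kx in range(KW):
                        iy = oy * stride + ky * dilation
                        ix = ox * stride + kx * dilation
                        s += x[c][iy][ix] * w[c][ky][kx]
                out[c][oy][ox] = s
    return out

# 1 channel, 3x3 input, 2x2 kernel of ones -> each output is a window sum
x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]
w = [[[1.0, 1.0], [1.0, 1.0]]]
y = conv_2d_dw(x, w)
```

Compared with a regular convolution, there is no sum over input channels, which is what makes the dedicated CPU kernels in this change cheaper than expressing the op via grouped general convolution.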
Johannes Gäßler
b10d8bfdb1
CUDA: use switch statements in constexpr functions (#13095)
2025-04-24 15:57:10 +02:00
Georgi Gerganov
13b4548877
cmake : do not include ./src as public for libllama (#13062)
...
* cmake : do not include ./src as public for libllama
ggml-ci
* cmake : rework tests
ggml-ci
* llguidance : remove unicode include
ggml-ci
* cmake : make c++17 private
ggml-ci
2025-04-24 16:00:10 +03:00
Georgi Gerganov
572b3141d3
clang-tidy : disable warning about missing math parenthesis (#13091)
2025-04-24 15:44:05 +03:00
Xuan-Son Nguyen
7c727fbe39
arg : add --no-mmproj-offload (#13093)
...
* arg : add --no-mmproj-offload
* Update common/arg.cpp
2025-04-24 14:04:14 +02:00
Concedo
25e747e9d8
up version
2025-04-24 18:44:29 +08:00
Xuan-Son Nguyen
80982e815e
arg : clean up handling --mmproj with -hf (#13082)
...
* arg : clean up handling --mmproj with -hf
* rm change about no_mmproj
* Revert "rm change about no_mmproj"
This reverts commit 2cac8e0efb629d66c612f137e75d562f94bb9e6c.
* handle no_mmproj explicitly
* skip download mmproj on examples not using it
2025-04-24 12:14:13 +02:00
Concedo
c21c8cd00a
Merge branch 'upstream' into concedo_experimental
2025-04-24 18:00:29 +08:00
Concedo
2f645bb1b4
pixtral is working only on CPU, however the images are distorted
2025-04-24 17:59:47 +08:00
Concedo
f1eb6c4e36
mtmd for debug
2025-04-24 16:27:24 +08:00
Georgi Gerganov
7604a7d6b8
metal : fix floating-point range of attention scores in FA kernels (#13090)
...
ggml-ci
2025-04-24 10:38:30 +03:00
Concedo
28a2723100
merged pixtral support, not fully working
2025-04-24 15:27:02 +08:00
Eve
b3b6d862cf
vulkan: matmul gcn tuning (#13016)
...
* tune matmul for gcn
* this one is more power efficient
* Update ggml/src/ggml-vulkan/ggml-vulkan.cpp
Co-authored-by: 0cc4m <picard12@live.de>
* disable this tune for the proprietary driver
---------
Co-authored-by: 0cc4m <picard12@live.de>
2025-04-24 09:18:33 +02:00
Concedo
8f1edcbdac
Merge commit 'dc39a5e7a8' into concedo_experimental
...
# Conflicts:
# README.md
# SECURITY.md
# docs/multimodal/MobileVLM.md
# examples/llava/CMakeLists.txt
# examples/llava/README.md
# examples/llava/android/adb_run.sh
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/rope.cpp
# ggml/src/ggml-sycl/rope.hpp
2025-04-24 11:49:08 +08:00