Commit graph

1766 commits

Author SHA1 Message Date
Concedo
dbb6bbf8ea fixed clip quantize 2025-04-30 20:45:40 +08:00
Concedo
8273739412 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/cpu.Dockerfile
#	.devops/cuda.Dockerfile
#	.devops/intel.Dockerfile
#	.devops/llama-cli-cann.Dockerfile
#	.devops/musa.Dockerfile
#	.devops/rocm.Dockerfile
#	.devops/vulkan.Dockerfile
#	examples/llama-bench/llama-bench.cpp
#	examples/rpc/rpc-server.cpp
#	scripts/compare-llama-bench.py
#	tests/test-quantize-stats.cpp
2025-04-30 17:22:18 +08:00
xiaofei
a0f7016d17
rpc : fix cache directory initialization (#13188)
Signed-off-by: xiaofei <hbuxiaofei@gmail.com>
2025-04-30 09:29:22 +03:00
matteo
e2e1ddb93a
server : Prefilling assistant message in openai compatible API (#13174)
* Prefilling assistant message in openai compatible API

* fixed indentation

* fixed code convention

* simplify method usage

* no more than one assistant message at end of messages

* merge checks into prefill code

* Update examples/server/utils.hpp

---------

Co-authored-by: matteo <matteo@naspc.lan>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-04-29 20:33:10 +02:00
Alberto Cabrera Pérez
5a63980117
llama-bench: fixed size of fields to correctly map to values (#13183) 2025-04-29 17:24:36 +02:00
Concedo
be66a77ca5 add f16 quantclip 2025-04-29 22:25:52 +08:00
Concedo
b2ecfa0f55 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	examples/llama-bench/README.md
#	examples/llama-bench/llama-bench.cpp
#	examples/llava/CMakeLists.txt
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/element_wise.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tests/test-chat-template.cpp
2025-04-29 21:05:16 +08:00
Xuan-Son Nguyen
00e3e5a194
mtmd : add qwen2vl and qwen2.5vl (#13141)
* llava : add clip_n_output_tokens, deprecate clip_n_patches

* mtmd : add qwen2vl and qwen2.5vl

* decode_embd_batch::set_position_...

* working version

* deprecate llama-qwen2vl-cli

* correct order W, H of clip_embd_nbytes_by_img

* edit existing line in hot topics
2025-04-29 11:47:04 +02:00
Xuan-Son Nguyen
eaea325324
clip : fix model size display (#13153) 2025-04-28 21:23:19 +02:00
Concedo
4d8a7a6594 fix occasional clip segfault, fix glm4 (+1 squashed commit)
Squashed commits:

[bd71cd688] GLM4 fix wip
2025-04-29 01:42:50 +08:00
Vishal Agarwal
1831f538f7
llama-bench: add -d depth arg (#13096)
* add depth param

* update llama-bench README and add depth param

* llama-bench: default params for depth arg for faster execution

* Update examples/llama-bench/README.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* fix buffer print ub

* use user provided args

* remove extra whitespaces

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-04-28 16:50:39 +02:00
Xuan-Son Nguyen
4e87962e34
mtmd : fix glm-edge redundant token count (#13139)
* mtmd : fix glm-edge redundant token count

* fix chat template

* temporary disable GLMEdge test chat tmpl
2025-04-28 16:12:56 +02:00
Xuan-Son Nguyen
d2b2031e5f
llama : (mrope) allow using normal 1D position for text token (#13138)
* llama : (mrope) use normal position for text token

* rm n_pos_per_embd from llm_graph_input_attn_temp
2025-04-28 14:20:56 +02:00
Xuan-Son Nguyen
5fa9e63be8
clip : refactor set input for cgraph + fix qwen2.5vl input (#13136)
* clip : refactor set input for cgraph

* more strict assert

* minicpmv : use clip_n_mmproj_embd instead of copying the same code everywhere

* split qwen2 and qwen2.5 code blocks

* minor style fix
2025-04-28 12:18:59 +02:00
4onen
c0a97b762e
llama-bench : Add --override-tensors arg (#12922)
* Add --override-tensors option to llama-bench

* Correct llama-bench --override-tensors to --override-tensor

* llama-bench: Update --override-tensors parsing to match --tensor-split, appear in test matrix.

* Make new llama-bench util functions static to fix Ubuntu CI

* llama-bench: Correct -ot corner cases (No -ot calls, leading and trailing empty -ot spans, etc.)
2025-04-27 23:48:26 +02:00
LostRuins Concedo
59e991c23c
Fixes Qwen2.5VL segfault during inference with https://github.com/ggml-org/llama.cpp/pull/12402, as the has_qwen2vl_merger migration was incomplete (#13133) 2025-04-27 12:43:37 +02:00
Concedo
37060f54da backwards compat handle older HimarIO quants 2025-04-27 17:38:22 +08:00
Concedo
f8b7ddeac0 emergency fix for q25vl 2025-04-27 16:46:33 +08:00
HimariO
ca2bb89eac
clip : Add Qwen2.5VL support (#12402)
* implement vision model architecture, gguf converter

* handle window attention inputs

* add debug utils

* fix a few incorrect tensor memory layouts

* move position id remap out of ggml to avoid int32 cuda operations

* cleaning up

* ignore transformers Qwen2_5_xxx type check

* remove rarely used `qwen2vl-cli` debug functions

* remove commented-out code blocks

* fix attn weight scaling after rebase

* add `PROJECTOR_TYPE_QWEN2_5_VL`

* remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`

* replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`

* remove `attn_window_size` from gguf

* fix model conversion

* clean up

* fix merging problem

* add test

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-27 10:10:34 +02:00
Concedo
1b0481f4b1 wip qwen25vl merge 2025-04-27 13:07:07 +08:00
Concedo
36c8db1248 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/llava/clip-impl.h
#	examples/llava/clip.cpp
#	tests/test-arg-parser.cpp
#	tests/test-json-schema-to-grammar.cpp
2025-04-27 12:51:02 +08:00
Xuan Son Nguyen
53a15d014f add test 2025-04-26 23:00:41 +02:00
Xuan Son Nguyen
89be919988 fix merging problem 2025-04-26 22:54:41 +02:00
Xuan Son Nguyen
82f8e72ecd Merge branch 'master' into qwen25-vl 2025-04-26 22:45:06 +02:00
Xuan-Son Nguyen
4753791e70
clip : improve projector naming (#13118)
* clip : improve projector naming

* no more kv has_llava_projector

* rm unused kv

* rm more unused
2025-04-26 22:39:47 +02:00
Xuan Son Nguyen
0c74ea54f5 clean up 2025-04-26 22:37:05 +02:00
Xuan Son Nguyen
5085dbb293 Merge branch 'master' into qwen25-vl 2025-04-26 22:24:04 +02:00
HimariO
7e1bb0437a remove attn_window_size from gguf 2025-04-26 20:19:51 +08:00
frob
d5fe4e81bd
grammar : handle maxItems == 0 in JSON schema (#13117)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-04-26 10:10:20 +02:00
Concedo
3f545eadbe Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tests/test-backend-ops.cpp
2025-04-26 09:12:40 +08:00
HimariO
77b144a8e7 replace KEY_FULLATTN_BLK_IDX with KEY_WIN_ATTN_PATTERN 2025-04-26 01:00:00 +08:00
HimariO
f69e9fa04d remove KEY_USE_GLU_MLP, KEY_USE_RMS_NORM 2025-04-26 00:16:27 +08:00
HimariO
caa7e57ec5 add PROJECTOR_TYPE_QWEN2_5_VL 2025-04-26 00:03:02 +08:00
HimariO
a3cd0e52f2 fix attn weight scaling after rebase 2025-04-25 22:12:55 +08:00
HimariO
7f530ac040 remove commented-out code blocks 2025-04-25 22:12:55 +08:00
HimariO
2de5dc3a14 remove rarely used qwen2vl-cli debug functions 2025-04-25 22:12:55 +08:00
HimariO
91fbdd781d ignore transformers Qwen2_5_xxx type check 2025-04-25 22:12:26 +08:00
HimariO
d1af45988a cleaning up 2025-04-25 22:12:26 +08:00
HimariO
2eb32933ea move position id remap out of ggml to avoid int32 cuda operations 2025-04-25 22:12:26 +08:00
HimariO
444e47c088 fix a few incorrect tensor memory layouts 2025-04-25 22:11:48 +08:00
HimariO
69b39addd2 add debug utils 2025-04-25 22:11:48 +08:00
HimariO
3d5198ee05 handle window attention inputs 2025-04-25 22:11:13 +08:00
HimariO
d9f2d71bc2 implement vision model architecture, gguf converter 2025-04-25 22:11:13 +08:00
Xuan-Son Nguyen
edb18b6e8f
clip : fix pixtral on some GPU backends (#13097)
* clip : fix pixtral on some GPU backends

* refactor inp_raw set

* rm outdated comment

* fix dynamic size

* add TODO
2025-04-25 14:31:42 +02:00
Concedo
6b6597ebf1 allow for single token prompt processing (actual batch size 1) 2025-04-25 16:54:46 +08:00
Xuan-Son Nguyen
13be08daf9
clip : remove boi/eoi embeddings for GLM-edge model (#13081) 2025-04-24 22:17:04 +02:00
Georgi Gerganov
226251ed56
embeddings : fix batch sizes (#13076)
ggml-ci
2025-04-24 22:29:22 +03:00
Georgi Gerganov
13b4548877
cmake : do not include ./src as public for libllama (#13062)
* cmake : do not include ./src as public for libllama

ggml-ci

* cmake : rework tests

ggml-ci

* llguidance : remove unicode include

ggml-ci

* cmake : make c++17 private

ggml-ci
2025-04-24 16:00:10 +03:00
Xuan-Son Nguyen
7c727fbe39
arg : add --no-mmproj-offload (#13093)
* arg : add --no-mmproj-offload

* Update common/arg.cpp
2025-04-24 14:04:14 +02:00
Xuan-Son Nguyen
80982e815e
arg : clean up handling --mmproj with -hf (#13082)
* arg : clean up handling --mmproj with -hf

* rm change about no_mmproj

* Revert "rm change about no_mmproj"

This reverts commit 2cac8e0efb629d66c612f137e75d562f94bb9e6c.

* handle no_mmproj explicitly

* skip download mmproj on examples not using it
2025-04-24 12:14:13 +02:00