Commit graph

1785 commits

Author SHA1 Message Date
HimariO
f69e9fa04d remove KEY_USE_GLU_MLP, KEY_USE_RMS_NORM 2025-04-26 00:16:27 +08:00
HimariO
caa7e57ec5 add PROJECTOR_TYPE_QWEN2_5_VL 2025-04-26 00:03:02 +08:00
HimariO
a3cd0e52f2 fix attn weight scaling after rebase 2025-04-25 22:12:55 +08:00
HimariO
7f530ac040 remove commented-out code blocks 2025-04-25 22:12:55 +08:00
HimariO
2de5dc3a14 remove not so often use qwen2vl-cli debug functions 2025-04-25 22:12:55 +08:00
HimariO
91fbdd781d ignore transformers Qwen2_5_xxx type check 2025-04-25 22:12:26 +08:00
HimariO
d1af45988a cleaning up 2025-04-25 22:12:26 +08:00
HimariO
2eb32933ea move position id remap out of ggml to avoid int32 cuda operations 2025-04-25 22:12:26 +08:00
HimariO
444e47c088 fix few incorrect tensor memory layout 2025-04-25 22:11:48 +08:00
HimariO
69b39addd2 add debug utils 2025-04-25 22:11:48 +08:00
HimariO
3d5198ee05 handle window attention inputs 2025-04-25 22:11:13 +08:00
HimariO
d9f2d71bc2 implment vision model architecture, gguf convertor 2025-04-25 22:11:13 +08:00
Xuan-Son Nguyen
edb18b6e8f
clip : fix pixtral on some GPU backends (#13097)
* clip : fix pixtral on some GPU backends

* refactor inp_raw set

* rm outdated comment

* fix dynamic size

* add TODO
2025-04-25 14:31:42 +02:00
Concedo
6b6597ebf1 allow for single token prompt processing (actual batch size 1) 2025-04-25 16:54:46 +08:00
Xuan-Son Nguyen
13be08daf9
clip : remove boi/eoi embeddings for GLM-edge model (#13081) 2025-04-24 22:17:04 +02:00
Georgi Gerganov
226251ed56
embeddings : fix batch sizes (#13076)
ggml-ci
2025-04-24 22:29:22 +03:00
Georgi Gerganov
13b4548877
cmake : do not include ./src as public for libllama (#13062)
* cmake : do not include ./src as public for libllama

ggml-ci

* cmake : rework tests

ggml-ci

* llguidance : remove unicode include

ggml-ci

* cmake : make c++17 private

ggml-ci
2025-04-24 16:00:10 +03:00
Xuan-Son Nguyen
7c727fbe39
arg : add --no-mmproj-offload (#13093)
* arg : add --no-mmproj-offload

* Update common/arg.cpp
2025-04-24 14:04:14 +02:00
Xuan-Son Nguyen
80982e815e
arg : clean up handling --mmproj with -hf (#13082)
* arg : clean up handling --mmproj with -hf

* rm change about no_mmproj

* Revert "rm change about no_mmproj"

This reverts commit 2cac8e0efb629d66c612f137e75d562f94bb9e6c.

* handle no_mmproj explicitly

* skip download mmproj on examples not using it
2025-04-24 12:14:13 +02:00
Concedo
2f645bb1b4 pixtral is working only on cpu, however the images are distorted 2025-04-24 17:59:47 +08:00
Concedo
f1eb6c4e36 mtmd for debug 2025-04-24 16:27:24 +08:00
Concedo
28a2723100 merged pixtral support, not fully working 2025-04-24 15:27:02 +08:00
Concedo
8f1edcbdac Merge commit 'dc39a5e7a8' into concedo_experimental
# Conflicts:
#	README.md
#	SECURITY.md
#	docs/multimodal/MobileVLM.md
#	examples/llava/CMakeLists.txt
#	examples/llava/README.md
#	examples/llava/android/adb_run.sh
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/rope.cpp
#	ggml/src/ggml-sycl/rope.hpp
2025-04-24 11:49:08 +08:00
pl752
5630406959
llama-mtmd-cli: Sigint rework in mtmd vision example (#13080)
* Sigint rework in mtmd vision example

* Applied suggestions on mtmd-cli PR

* Forgot to invert one of the conditions

* Update examples/llava/mtmd-cli.cpp

* Removed redundant exit check

---------

Co-authored-by: pl752 <maximpl752@gmail.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-04-23 23:32:35 +02:00
Xuan-Son Nguyen
ecda2ec4b3
mtmd : Support Pixtral 12B (#13065)
* add pixtral text model (vision is wip)

* cgraph ok, just missing 2D RoPE

* fix bad rebase

* first working version

* fix problem with img_break token

* support dynamic image size

* update docs

* update test script
2025-04-23 20:21:59 +02:00
Radoslav Gerganov
2cca6c01e4
rpc : add command line option for number of threads for the CPU backend (#13060)
closes #13051
2025-04-23 10:32:49 +03:00
Xuan-Son Nguyen
dc39a5e7a8
mtmd : support SmolVLM (version 1 and 2) (#13050)
* mtmd : support SmolVLM (version 1 and 2)

* correct chat template

* fix n_patches

* scale_factor is an int

* add more models to test
2025-04-22 16:24:54 +02:00
Xuan-Son Nguyen
243453533e
llava : update documentations (#13055)
* llava : update documentations

* fix typo
2025-04-22 10:37:00 +02:00
Xuan-Son Nguyen
84a9bf2fc2
mtmd : merge llava, gemma3 and minicpmv CLI into single llama-mtmd-cli (#13012)
* mtmd : merge `llava-cli` and `gemma3-cli` into single `mtmd-cli`

* support for minicpmv

* remove cpp files of llava and minicpmv

* update hot topics

* mtmd : add not supported msg for qwen2vl

* Update examples/llava/mtmd.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-21 15:32:58 +02:00
Xuan-Son Nguyen
2016f07bd1
convert : experimental support for --mmproj flag (#13023)
* convert : experimental support for `--mmproj` flag

* fix bad ctrl+f replace

* fix style

* split into subclasses TextModel and VisionModel

* rename Mode --> ModelBase

* small fix

* correct CLIP_VISION arch name (because existing GGUF already use it)

* Apply suggestions from code review

Co-authored-by: compilade <git@compilade.net>

* fix Mistral3Model

* fix typo

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
2025-04-20 23:29:36 +02:00
Concedo
687bb5375c Merge branch 'upstream' into concedo_experimental 2025-04-20 20:57:11 +08:00
Jeffrey Morgan
6602304814
llava: fix errors in clip.h on certain compilers (#13030) 2025-04-20 12:15:41 +02:00
Concedo
17360a3b32 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	examples/llava/clip.cpp
2025-04-20 17:59:58 +08:00
Xuan-Son Nguyen
37b9f0d29d
clip : refactor, add image_manipulation and llava_uhd classes (#13011)
* clip : refactor, add `image_manipulation` and `llava_uhd`

* refactor llava-1.6 preprocessing

* simplify logic for llava-1.5

* missing include
2025-04-19 09:15:45 +02:00
Concedo
95d1aaf4d4 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/rpc/rpc-server.cpp
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/backend.hpp
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/element_wise.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	requirements/requirements-all.txt
2025-04-19 13:17:13 +08:00
Daniel Tang
6408210082
main : Fix Ctrl+D/newline handling (#12951)
This restores the behavior from #491. This does not affect Ctrl+D's ability to
terminate --multiline-input lines (#1040).

This also actually implements #587: "If the user wants the text to end in a
newline, this should be accomplished by explicitly adding a newline by using
\ followed by return, then returning control by pressing return again."

Fixes #12949
2025-04-18 22:02:55 +02:00
Xuan-Son Nguyen
35370ba945
server : use std::move whenever possible (#12936)
* server : use std::move whenever possible

* use r-value ref

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* make task creation scoped

* restore std::move

* fix task_id not set correctly

* apply changes from suggestion

Co-authored-by: ggerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-18 19:58:12 +02:00
Xuan-Son Nguyen
b9154ecff9
mtmd : add methods to access mtmd_image_tokens (#12906)
* mtmd : add more api around mtmd_image_tokens

* mtmd : ability to calc image hash

* shared_ptr for mtmd_image_tokens

* move hash to user-define ID (fixed)

* fix prompt_modified

* rm redundant data member
2025-04-18 10:04:51 +02:00
Radoslav Gerganov
2db9ba1464
rpc : add RPC_CMD_HELLO (#12955)
Add RPC_CMD_HELLO for getting the version of the protocol implemend by
the server. Follow the semantic versioning rules at https://semver.org

Hopefully this bring better user experience when we make breaking
changes at the protocol level and avoid issues like #12465
2025-04-18 10:13:42 +03:00
Concedo
06159939d9 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	Makefile
#	docs/build.md
#	examples/rpc/rpc-server.cpp
#	examples/sycl/build.sh
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-hip/CMakeLists.txt
#	scripts/sync-ggml.last
2025-04-17 00:52:37 +08:00
Russyyds
d6d2c2ab8c
Add performance print for gemma3 in example (#12929) 2025-04-14 19:18:20 +02:00
Neo Zhang Jianyu
81c7e64fc2
dsiable curl lib check, this action is missed by commit bd3f59f812 (#12761) (#12937) 2025-04-14 18:19:07 +08:00
Ed Addario
71e90e8813
quantize: Handle user-defined quantization levels for additional tensors (#12511)
* Add llama_model_quantize_params parameters

* Add new quantize parameters parsing and validation

* Update usage

* Add new parameters defaults

* Add new quantization parameters logic

* Add llama_model_quantize_params parameters

* Add new quantize parameters parsing and validation

* Update usage

* Add new parameters defaults

* Add new quantization parameters logic

* Minor refactoring as per the contributors' coding guidelines

* Update descriptions to match existing style

* Add llama_model_quantize_params parameters

* Add new quantize parameters parsing and validation

* Update usage

* Add new parameters defaults

* Add new quantization parameters logic

* Minor refactoring as per the contributors' guidelines

* Implement general --tensor-type instead of tensor-specific command option

* Fix implied type bug

* Restore missing #includes

* Add regex capability for tensor selection

* Refactor function name and update ALLOWED_TENSOR_TYPE

* Add missing #include

* Handle edge case when tensor name is cls.output

* Minor logging improvement
2025-04-13 21:29:28 +03:00
Prajwal B Mehendarkar
bc091a4dc5
common : Define cache directory on AIX (#12915) 2025-04-12 17:33:39 +02:00
Concedo
a6149ad0fc fixed g3 adapter back 2025-04-12 23:17:54 +08:00
Concedo
9f94f62768 fixed segfault 2025-04-12 19:08:27 +08:00
Concedo
7b4254bef9 not working on cpu 2025-04-12 18:55:29 +08:00
Matt Clayton
e59ea539b8
llava: Fix cpu-only clip image encoding sefault (#12907)
* llava: Fix cpu-only clip image encoding

* clip : no smart ptr for ggml_backend_t

* Fix for backend_ptr push_back

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-12 07:29:03 +02:00
Concedo
7e1289ade8 fixes for sdcpp 2025-04-12 10:08:23 +08:00
Concedo
a0ae187563 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/docker.yml
#	README.md
#	build-xcframework.sh
#	examples/llava/CMakeLists.txt
#	examples/llava/clip.cpp
#	examples/rpc/rpc-server.cpp
#	examples/run/run.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
2025-04-12 10:06:47 +08:00