Xuan Son Nguyen
0c74ea54f5
clean up
2025-04-26 22:37:05 +02:00
HimariO
7e1bb0437a
remove attn_window_size from gguf
2025-04-26 20:19:51 +08:00
HimariO
77b144a8e7
replace KEY_FULLATTN_BLK_IDX with KEY_WIN_ATTN_PATTERN
2025-04-26 01:00:00 +08:00
HimariO
f69e9fa04d
remove KEY_USE_GLU_MLP, KEY_USE_RMS_NORM
2025-04-26 00:16:27 +08:00
HimariO
caa7e57ec5
add PROJECTOR_TYPE_QWEN2_5_VL
2025-04-26 00:03:02 +08:00
HimariO
a3cd0e52f2
fix attn weight scaling after rebase
2025-04-25 22:12:55 +08:00
HimariO
7f530ac040
remove commented-out code blocks
2025-04-25 22:12:55 +08:00
HimariO
2de5dc3a14
remove rarely used qwen2vl-cli debug functions
2025-04-25 22:12:55 +08:00
HimariO
91fbdd781d
ignore transformers Qwen2_5_xxx type check
2025-04-25 22:12:26 +08:00
HimariO
d1af45988a
cleaning up
2025-04-25 22:12:26 +08:00
HimariO
2eb32933ea
move position id remap out of ggml to avoid int32 cuda operations
2025-04-25 22:12:26 +08:00
HimariO
444e47c088
fix a few incorrect tensor memory layouts
2025-04-25 22:11:48 +08:00
HimariO
69b39addd2
add debug utils
2025-04-25 22:11:48 +08:00
HimariO
3d5198ee05
handle window attention inputs
2025-04-25 22:11:13 +08:00
HimariO
d9f2d71bc2
implement vision model architecture, gguf converter
2025-04-25 22:11:13 +08:00
Xuan-Son Nguyen
edb18b6e8f
clip : fix pixtral on some GPU backends ( #13097 )
...
* clip : fix pixtral on some GPU backends
* refactor inp_raw set
* rm outdated comment
* fix dynamic size
* add TODO
2025-04-25 14:31:42 +02:00
Xuan-Son Nguyen
13be08daf9
clip : remove boi/eoi embeddings for GLM-edge model ( #13081 )
2025-04-24 22:17:04 +02:00
Xuan-Son Nguyen
7c727fbe39
arg : add --no-mmproj-offload ( #13093 )
...
* arg : add --no-mmproj-offload
* Update common/arg.cpp
2025-04-24 14:04:14 +02:00
Xuan-Son Nguyen
80982e815e
arg : clean up handling --mmproj with -hf ( #13082 )
...
* arg : clean up handling --mmproj with -hf
* rm change about no_mmproj
* Revert "rm change about no_mmproj"
This reverts commit 2cac8e0efb629d66c612f137e75d562f94bb9e6c.
* handle no_mmproj explicitly
* skip download mmproj on examples not using it
2025-04-24 12:14:13 +02:00
pl752
5630406959
llama-mtmd-cli: Sigint rework in mtmd vision example ( #13080 )
...
* Sigint rework in mtmd vision example
* Applied suggestions on mtmd-cli PR
* Forgot to invert one of the conditions
* Update examples/llava/mtmd-cli.cpp
* Removed redundant exit check
---------
Co-authored-by: pl752 <maximpl752@gmail.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-04-23 23:32:35 +02:00
Xuan-Son Nguyen
ecda2ec4b3
mtmd : Support Pixtral 12B ( #13065 )
...
* add pixtral text model (vision is wip)
* cgraph ok, just missing 2D RoPE
* fix bad rebase
* first working version
* fix problem with img_break token
* support dynamic image size
* update docs
* update test script
2025-04-23 20:21:59 +02:00
Xuan-Son Nguyen
dc39a5e7a8
mtmd : support SmolVLM (version 1 and 2) ( #13050 )
...
* mtmd : support SmolVLM (version 1 and 2)
* correct chat template
* fix n_patches
* scale_factor is an int
* add more models to test
2025-04-22 16:24:54 +02:00
Xuan-Son Nguyen
243453533e
llava : update documentations ( #13055 )
...
* llava : update documentations
* fix typo
2025-04-22 10:37:00 +02:00
Xuan-Son Nguyen
84a9bf2fc2
mtmd : merge llava, gemma3 and minicpmv CLI into single llama-mtmd-cli ( #13012 )
...
* mtmd : merge `llava-cli` and `gemma3-cli` into single `mtmd-cli`
* support for minicpmv
* remove cpp files of llava and minicpmv
* update hot topics
* mtmd : add not supported msg for qwen2vl
* Update examples/llava/mtmd.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-21 15:32:58 +02:00
Xuan-Son Nguyen
2016f07bd1
convert : experimental support for --mmproj flag ( #13023 )
...
* convert : experimental support for `--mmproj` flag
* fix bad ctrl+f replace
* fix style
* split into subclasses TextModel and VisionModel
* rename Mode --> ModelBase
* small fix
* correct CLIP_VISION arch name (because existing GGUF already use it)
* Apply suggestions from code review
Co-authored-by: compilade <git@compilade.net>
* fix Mistral3Model
* fix typo
Co-authored-by: compilade <git@compilade.net>
---------
Co-authored-by: compilade <git@compilade.net>
2025-04-20 23:29:36 +02:00
Jeffrey Morgan
6602304814
llava: fix errors in clip.h on certain compilers ( #13030 )
2025-04-20 12:15:41 +02:00
Xuan-Son Nguyen
37b9f0d29d
clip : refactor, add image_manipulation and llava_uhd classes ( #13011 )
...
* clip : refactor, add `image_manipulation` and `llava_uhd`
* refactor llava-1.6 preprocessing
* simplify logic for llava-1.5
* missing include
2025-04-19 09:15:45 +02:00
Xuan-Son Nguyen
b9154ecff9
mtmd : add methods to access mtmd_image_tokens ( #12906 )
...
* mtmd : add more api around mtmd_image_tokens
* mtmd : ability to calc image hash
* shared_ptr for mtmd_image_tokens
* move hash to user-defined ID (fixed)
* fix prompt_modified
* rm redundant data member
2025-04-18 10:04:51 +02:00
Russyyds
d6d2c2ab8c
Add performance print for gemma3 in example ( #12929 )
2025-04-14 19:18:20 +02:00
Matt Clayton
e59ea539b8
llava: Fix cpu-only clip image encoding segfault ( #12907 )
...
* llava: Fix cpu-only clip image encoding
* clip : no smart ptr for ggml_backend_t
* Fix for backend_ptr push_back
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-12 07:29:03 +02:00
Xuan-Son Nguyen
0c50923944
clip : use smart pointer ( ⚠️ breaking change) ( #12869 )
...
* clip : use smart pointers
* fix warmup
* add forward declaration
* missing include
* fix include (2)
* composite
* simplify batch ptr
* fix conflict
2025-04-11 12:09:39 +02:00
Xuan-Son Nguyen
8b9cc7cdd8
llava : introduce libmtmd ( #12849 )
...
* wip llava2
* migrated gemma3 to llava2
* add timings
* correct pre/postfix
* fix missing include
* fix compilation unused var warn
* update llava2_tokenize
* change name llava2 --> mtmd
* improve api
* refine helpers
* Update examples/llava/mtmd.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-10 22:57:16 +02:00
Xuan-Son Nguyen
65a69e6e1b
clip : do not print ftype ( #12832 )
2025-04-09 10:09:53 +02:00
Matt Clayton
b32efad2bc
llava: improve clip_ctx destructor to not memleak load_image_size ( #12834 )
2025-04-08 22:01:58 +02:00
dm4
2dabf759e7
llava: add more helper functions to check projector types in clip context ( #12824 )
...
Signed-off-by: dm4 <sunrisedm4@gmail.com>
2025-04-08 15:49:13 +02:00
Sergey Fedorov
f1e3eb4249
common : fix includes in arg.cpp and gemma3-cli.cpp ( #12766 )
...
* arg.cpp: add a missing include
* gemma3-cli.cpp: fix cinttypes include
2025-04-05 17:46:00 +02:00
Xuan-Son Nguyen
0364178ca2
clip : refactor clip_init, add tests ( #12757 )
...
* refactor clip_init
* fix loading file
* fix style
* test ok
* better test with report
* add missing headers
* clarify
* add KEY_MM_PATCH_MERGE_TYPE
* remove bool has_* pattern
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/llava/clip.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* use ggml_soft_max_ext
* refactor logging system
* add minicpm-v-o 2.6 for testing
* use nullptr everywhere
* fix Yi-VL model
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-05 17:17:40 +02:00
Xuan-Son Nguyen
267c1399f1
common : refactor downloading system, handle mmproj with -hf option ( #12694 )
...
* (wip) refactor downloading system [no ci]
* fix all examples
* fix mmproj with -hf
* gemma3: update readme
* only handle mmproj in llava example
* fix multi-shard download
* windows: fix problem with std::min and std::max
* fix 2
2025-04-01 23:44:05 +02:00
Sigbjørn Skjæret
1a85949067
llava : proper description fix ( #12668 )
2025-03-31 11:28:30 +02:00
Sigbjørn Skjæret
f52d59d771
llava : fix clip loading GGUFs with missing description ( #12660 )
2025-03-31 11:07:07 +02:00
Ivy233
02082f1519
clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend ( #12566 )
...
* [Fix] Running a compiled clip-quantize-cli in a CUDA environment causes ggml_fp16_to_fp32 to report an error when it tries to access video memory, so quantization has to run on the CPU backend.
After the fix, quantization runs on the CPU backend automatically and is no longer bound to CUDA.
* [Fix] Roll back the signature and implementation of clip_model_load, and change the call in clip_model_quantize to clip_init.
2025-03-26 15:06:04 +01:00
Georgi Gerganov
e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context ( #12181 )
...
* llama : refactor llama_context, llama_kv_cache, llm_build_context
ggml-ci
* graph : don't mutate the KV cache during defrag
ggml-ci
* context : reduce virtuals + remove test function
ggml-ci
* context : move interface implementation to source file + factory
ggml-ci
* graph : move KV cache build functions to llama_context impl
ggml-ci
* graph : remove model reference from build_pooling
ggml-ci
* graph : remove llama_model reference
ggml-ci
* kv_cache : provide rope factors
ggml-ci
* graph : rework inputs to use only unique_ptr, remove attn input abstraction
ggml-ci
* context : remove llama_context_i abstraction
ggml-ci
* context : clean-up
ggml-ci
* graph : clean-up
ggml-ci
* llama : remove redundant keywords (struct, enum)
ggml-ci
* model : adapt gemma3
ggml-ci
* graph : restore same attention ops as on master
ggml-ci
* llama : remove TODO + fix indent
ggml-ci
2025-03-13 12:35:44 +02:00
Xuan-Son Nguyen
7841fc723e
llama : Add Gemma 3 support (+ experimental vision capability) ( #12343 )
...
* llama : Add Gemma 3 text-only support
* fix python coding style
* fix compile on ubuntu
* python: fix style
* fix ubuntu compile
* fix build on ubuntu (again)
* fix ubuntu build, finally
* clip : Experimental support for Gemma 3 vision (#12344 )
* clip : Experimental support for Gemma 3 vision
* fix build
* PRId64
2025-03-12 09:30:24 +01:00
Xuan-Son Nguyen
96e1280839
clip : bring back GPU support ( #12322 )
...
* clip : bring back GPU support
* use n_gpu_layers param
* fix double free
* ggml_backend_init_by_type
* clean up
2025-03-11 09:20:16 +01:00
tc-mb
8352cdc87b
llava : fix bug in minicpm-v code ( #11513 )
...
* fix bug in minicpm-v code
* update readme of minicpm-v
2025-03-10 10:33:24 +02:00
Aaron Teo
e9b2f84f14
llava: add big-endian conversion for image encoder ( #12218 )
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-03-06 09:33:21 +01:00
Alex Brooks
84d5f4bc19
Update granite vision docs for 3.2 model ( #12105 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-28 11:31:47 +00:00
Ting Lou
a800ae46da
llava : add struct for FFI bindgen ( #12079 )
...
* add struct for FFI bindgen
* Apply suggestions from code review
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-02-26 15:26:52 +01:00
Alex Brooks
4d1051a40f
Add Doc for Converting Granite Vision -> GGUF ( #12006 )
...
* Add example docs for granite vision
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-25 10:46:05 +01:00
Alex Brooks
7a2c913e66
llava : Add Granite Vision Support ( #11794 )
...
* Add super wip scripts for multimodal granite gguf
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add example for converting mmgranite to gguf
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* remove hardcoded path
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add vision feature layer to gguf params
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Clean up llava surgery and remove name substitution hacks
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add transformers llava next tensor name mapping
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Make siglip / openclip mutually exclusive
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix projector linear substitution
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix linear 2 substitution index
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Increase max flattened gridpoints to 64
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix hardcoded concat for multiple feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Pull vision feature layers out of gguf keys
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* fix num gridpoints and use all layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Avoid dropping last image encoder layer in llava models
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use 10 for max number of patches
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Standardize vision feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Cleanup logs
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Update comment for vision feature layer init
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Update notes for alternative to legacy llm conversion script
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix notes rendering
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add v prefix to vision feature layer log
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use current defaults for feature layer
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use constant for max gridpoints / feat layers, style fixes
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* clarify non-negative feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Remove CLIP_API from func signature
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* USE MAX_IMAGE_FEATURE_LAYERS const in layer calc
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Clarify feature layers are non negative ints and not uint
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix condition for reading feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* pop last llava layer when feature layers are unset
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix unset vision layer 0
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Update examples/llava/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Reenable assertion for out of bounds get_rows
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use std vector for gridpoints and feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Calculate max feature layer at load time
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Include base patch for granite vision allocation
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix trailing whitespace
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add max num patches = 10 back for minicpmv
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use unordered set to store feature layers
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use max feature layer for postnorm
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Apply suggestions from code review
---------
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-02-24 17:09:51 +01:00