Concedo
e88bf41fdc
Merge commit '12280ae905' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# common/CMakeLists.txt
# docs/docker.md
# examples/model-conversion/scripts/causal/compare-logits.py
# ggml/src/ggml-hexagon/htp/rope-ops.c
# tests/test-backend-ops.cpp
# tests/test-barrier.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2025-12-16 16:29:01 +08:00
Xuan-Son Nguyen
c6b2c9310c
mtmd: some small clean up (#17909)
...
* clip: add support for fused qkv in build_vit
* use build_ffn whenever possible
* fix internvl
* mtmd-cli: move image to beginning
* test script: support custom args
2025-12-10 22:20:06 +01:00
Georgi Gerganov
4dff236a52
ggml : remove GGML_KQ_MASK_PAD constant (#17910)
...
* ggml : remove GGML_KQ_MASK_PAD constant
* cont : remove comment
2025-12-10 20:53:16 +02:00
Concedo
03cec02a3d
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/release.yml
# .github/workflows/winget.yml
# CODEOWNERS
# README.md
# ci/run.sh
# docs/build.md
# docs/ops.md
# docs/ops/Vulkan.csv
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/sync_vendor.py
# src/CMakeLists.txt
# tests/test-json-schema-to-grammar.cpp
# tests/test-quantize-stats.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2025-12-03 18:56:31 +08:00
Concedo
83269df91b
Merge commit '649495c9d9' into concedo_experimental
...
# Conflicts:
# CONTRIBUTING.md
# SECURITY.md
# docs/backend/SYCL.md
# examples/sycl/run-llama2.sh
# examples/sycl/run-llama3.sh
# examples/sycl/win-run-llama2.bat
# examples/sycl/win-run-llama3.bat
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/cpy.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-backend-ops.cpp
# tests/test-json-schema-to-grammar.cpp
# tools/server/CMakeLists.txt
2025-12-03 18:43:46 +08:00
Xuan-Son Nguyen
a96283adc4
mtmd: fix --no-warmup (#17695)
2025-12-02 22:48:08 +01:00
Xuan-Son Nguyen
ecf74a8417
mtmd: add mtmd_context_params::warmup option (#17652)
...
* mtmd: add mtmd_context_params::warmup option
* reuse the common_params::warmup
2025-12-01 21:32:25 +01:00
Tarek Dakhran
2ba719519d
model: LFM2-VL fixes (#17577)
...
* Adjust to pytorch
* Add antialiasing upscale
* Increase number of patches to 1024
* Handle default marker insertion for LFM2
* Switch to flag
* Reformat
* Cuda implementation of antialias kernel
* Change placement in ops.cpp
* consistent float literals
* Pad only for LFM2
* Address PR feedback
* Rollback default marker placement changes
* Fallback to CPU implementation for antialias implementation of upscale
2025-11-30 21:57:31 +01:00
Xuan-Son Nguyen
7f8ef50cce
clip: fix nb calculation for qwen3-vl (#17594)
2025-11-30 15:33:55 +01:00
Ruben Garcia
06d39dff73
Fix warnings (#1864)
2025-11-29 20:18:38 +08:00
Concedo
eda4a312cb
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/vulkan.Dockerfile
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/common.hpp
# tests/test-backend-ops.cpp
# tools/server/README.md
2025-11-28 13:22:02 +08:00
Han Qingzhe
1d594c295c
clip: (minicpmv) fix resampler kq_scale (#17516)
...
* debug:"solve minicpmv precision problem"
* “debug minicpmv”
* Apply suggestion from @ngxson
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-11-26 21:44:07 +01:00
LostRuins Concedo
3fe0e39b62
Merge commit '4dca015b7e' into concedo_experimental
...
# Conflicts:
# .github/copilot-instructions.md
# README.md
# docs/ops.md
# docs/ops/CPU.csv
# docs/ops/CUDA.csv
# docs/ops/Vulkan.csv
# ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp
# src/CMakeLists.txt
# tests/test-backend-ops.cpp
2025-11-16 18:33:58 +08:00
Xuan-Son Nguyen
9b17d74ab7
mtmd: add mtmd_log_set (#17268)
2025-11-14 15:56:19 +01:00
Xuan-Son Nguyen
4b13a684c5
mtmd: fix patch_size initialized to random value in audio models (#17128)
...
* mtmd: fix patch_size initialized to random value in audio models
* add default hparams
2025-11-10 11:41:05 +01:00
LostRuins Concedo
df6e303fd3
merge https://github.com/ggml-org/llama.cpp/pull/17128
2025-11-10 11:24:04 +08:00
LostRuins Concedo
d02cb1b117
Revert "fix divide by zero error"
...
This reverts commit 6cce98eca5.
2025-11-10 11:22:50 +08:00
LostRuins Concedo
6cce98eca5
fix divide by zero error
2025-11-10 01:38:55 +08:00
LostRuins Concedo
60a74bdd89
make tool calling work with jinja. but still need to fix qwen omni first (+1 squashed commits)
...
Squashed commits:
[e394da61e] make tool calling work with jinja. but still need to fix qwen omni first
2025-11-09 16:56:14 +08:00
LostRuins Concedo
4fc022a51f
revert qwen vl warmup size
2025-11-09 02:24:49 +08:00
LostRuins Concedo
d6a2ad8455
still not really working right
2025-11-09 01:57:48 +08:00
LostRuins Concedo
e6ca0aa8d0
Merge commit '2f0c2db43e' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# README.md
# docs/backend/OPENCL.md
# docs/ops.md
# docs/ops/CUDA.csv
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/set_rows.tmpl.wgsl
# scripts/sync-ggml.last
# src/CMakeLists.txt
# tools/server/README.md
2025-11-08 23:27:59 +08:00
LostRuins Concedo
64a1cd95a7
fixed missing headers
2025-11-08 11:09:49 +08:00
LostRuins Concedo
dfb0966ed2
not working
2025-11-08 10:49:10 +08:00
LostRuins Concedo
fdcb281a3a
Merge commit '2f966b8ed8' into concedo_experimental
...
# Conflicts:
# .github/workflows/release.yml
# docs/docker.md
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-thread-safety.cpp
# tools/batched-bench/batched-bench.cpp
# tools/mtmd/clip.cpp
2025-11-08 10:34:17 +08:00
LostRuins Concedo
7061cd1cc9
Merge commit 'e4a71599e5' into concedo_experimental
...
# Conflicts:
# CODEOWNERS
# tools/mtmd/clip.cpp
2025-11-08 10:28:49 +08:00
Xuan-Son Nguyen
4882f0ff78
clip: implement minicpm-v sinusoidal embd using GGML (#17036)
...
* clip: implement minicpm-v sinusoidal embd using GGML
* fix repeat op
2025-11-06 11:02:54 +01:00
Xuan-Son Nguyen
92bb84f775
mtmd: allow QwenVL to process larger image by default (#17020)
2025-11-05 14:26:49 +01:00
Xuan-Son Nguyen
2f0c2db43e
mtmd: improve struct initialization (#16981)
2025-11-05 11:26:37 +01:00
Xuan-Son Nguyen
070ff4d535
mtmd: add --image-min/max-tokens (#16921)
2025-11-03 11:11:18 +01:00
Xuan-Son Nguyen
bf7b0c9725
mtmd: pad mask for qwen2.5vl (#16954)
...
* mtmd: pad mask for qwen2.5vl
* improve
2025-11-03 10:25:55 +01:00
Zhiyong Wang
6b9a52422b
model: add Janus Pro for image understanding (#16906)
...
* Add support for Janus Pro
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Address reviewer suggestions
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Add JANUS_PRO constant
* Update clip model handling
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
* Update tools/mtmd/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Refactor JANUS_PRO handling in clip.cpp
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
* Update tools/mtmd/clip.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* em whitespace
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-11-02 22:08:04 +01:00
Georgi Gerganov
2f966b8ed8
clip : use FA (#16837)
...
* clip : use FA
* cont : add warning about unsupported ops
* implement "auto" mode for clip flash attn
* clip : print more detailed op support info during warmup
* cont : remove obsolete comment [no ci]
* improve debugging message
* trailing space
* metal : remove stray return
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-11-02 21:21:48 +01:00
Concedo
af327857ec
handle loading very old mmproj that broke after https://github.com/ggml-org/llama.cpp/pull/14928
2025-11-02 02:11:17 +08:00
Xuan-Son Nguyen
cf659bbb8e
mtmd: refactor preprocessing + support max/min pixels (#16878)
...
* mtmd: refactor preprocessing + support max/min pixels
* fix mlp type
* implement min/max pixels
* improve hparams
* better image preproc for qwen
* fix
* fix out of bound composite
* fix (2)
* fix token calculation
* get_merge_kernel_size()
* fix llama4 and lfm2
* gonna fix them all
* use simple resize for qwen
* qwen: increase min tokens
* no resize if dst size == src size
* restore to initial min/max tokens value for qwen
2025-11-01 15:51:36 +01:00
Concedo
2b00e55356
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/docker.yml
# ggml/src/ggml-opencl/kernels/mul_mm_f16_f32_l4_lm.cl
# ggml/src/ggml-opencl/kernels/mul_mm_f32_f32_l4_lm.cl
# ggml/src/ggml-sycl/rope.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/rope.tmpl.wgsl
# requirements/requirements-convert_legacy_llama.txt
# tests/test-backend-ops.cpp
# tests/test-rope.cpp
# tools/server/README.md
2025-10-31 10:52:57 +08:00
JJJYmmm
d261223d24
model: add support for qwen3vl series (#16780)
...
* support qwen3vl series.
Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com>
Co-authored-by: yairpatch <yairpatch@users.noreply.github.com>
Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com>
* bugfix: fix the arch check for qwen3vl-moe.
* use build_ffn
* optimize deepstack structure
* optimize deepstack feature saving
* Revert "optimize deepstack feature saving" for temporal fix
This reverts commit f321b9fdf13e59527408152e73b1071e19a87e71.
* code clean
* use fused qkv in clip
* clean up / rm is_deepstack_layers for simplification
* add test model
* move test model to "big" section
* fix imrope check
* remove trailing whitespace
* fix rope fail
* metal : add imrope support
* add imrope support for sycl
* vulkan: add imrope w/o check
* fix vulkan
* webgpu: add imrope w/o check
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* fix tensor mapping
---------
Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com>
Co-authored-by: yairpatch <yairpatch@users.noreply.github.com>
Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-30 16:19:14 +01:00
Tianyue-Zhao
bacddc049a
model: Add support for CogVLM model (#15002)
...
* Added GGUF mappings for CogVLM model
* Add tensor mapping for CogVLM visual encoder
* Add CogVLM to conversion script, no vision part yet
* Added CogVLM vision model to conversion script
* Add graph for CogVLM CLIP model
* Add graph for CogVLM
* Fixes for CogVLM. Now compiles.
* Model now runs
* Fixes for cogvlm graph
* Account for graph context change after rebase
* Changes for whitespace
* Changes in convert script according to comments
* Switch CogVLM LLM graph to merged QKV tensor
* Use rope_type variable instead of direct definition
* Change CogVLM CLIP encoder to use SWIGLU
* Switch CogVLM CLIP to use merged QKV
* Apply rebase edits and remove ggml_cont call that is now unnecessary
* clean up
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-10-30 12:18:50 +01:00
Concedo
16cbe9f24e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CODEOWNERS
# docs/ops.md
# docs/ops/SYCL.csv
# examples/embedding/README.md
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/norm.cpp
# ggml/src/ggml-sycl/norm.hpp
# scripts/snapdragon/adb/run-bench.sh
# scripts/snapdragon/adb/run-cli.sh
# src/llama-batch.cpp
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
# tools/llama-bench/README.md
2025-10-30 13:44:46 +08:00
Concedo
472438aad3
Merge commit '5a4ff43e7d' into concedo_experimental
...
# Conflicts:
# docs/build.md
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# src/llama-context.cpp
# tests/test-backend-ops.cpp
2025-10-30 13:13:00 +08:00
Xuan-Son Nguyen
e1ab084803
mtmd : fix idefics3 preprocessing (#16806)
...
* mtmd : fix idefics3 preprocessing
* disable granite test
* fix test for granite
2025-10-27 23:12:16 +01:00
Xuan-Son Nguyen
c55d53acec
model : add LightOnOCR-1B model (#16764)
...
* model : add LightOnOCR-1B model
* add test
2025-10-27 16:02:58 +01:00
Concedo
85556118b5
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-cann/acl_tensor.cpp
# ggml/src/ggml-cann/acl_tensor.h
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/element_wise.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/presets.hpp
2025-10-18 10:56:55 +08:00
Xuan-Son Nguyen
1bb4f43380
mtmd : support home-cooked Mistral Small Omni (#14928)
2025-10-16 19:00:31 +02:00
Concedo
bb5cef1756
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# ci/run.sh
# ggml/src/ggml-cpu/amx/amx.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/rms_norm.wgsl
# tools/server/README.md
2025-10-06 22:41:46 +08:00
Gabe Goodhart
ca71fb9b36
model : Granite docling + Idefics3 preprocessing (SmolVLM) (#16206)
...
* feat: Add granite-docling conversion using trillion pretokenizer
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Add granite-docling vocab pre enum
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Use granite-docling pre
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Add clip_is_idefics3
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Allow multi-token boundary sequences for image templating
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Add tiling support for idefics3 in clip.cpp
This should likely be moved into llava_uhd::get_slice_instructions, but for
now this avoids disrupting the logic there.
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Partial support for full templating for idefics3 in mtmd
There are still errors encoding some of the image chunks, but the token
sequence now matches transformers _almost_ perfectly, except for the double
newline before the global image which shows up as two consecutive newline
tokens instead of a single double-newline token. I think this is happening
because the blocks are tokenized separately then concatenated.
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Fully working image preprocessing for idefics3 w/ resize and slicing
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Parse the preprocessor config's longest side and add it to the mmproj hparams
Branch: GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Use the longest side instead of size * scale_factor
For Granite Docling, these come out to the same value, but that was just a
coincidence.
Branch: GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Allow batch encoding and remove clip_is_idefics3
Branch: GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* refactor: Remove unnecessary conditionals for empty token vectors
Branch: GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* refactor: Use image_manipulation util
Branch: GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* add test model
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-10-05 14:57:47 +02:00
Concedo
b120e107f9
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .clang-tidy
# .devops/musa.Dockerfile
# .github/workflows/build-linux-cross.yml
# .github/workflows/build.yml
# .github/workflows/docker.yml
# .gitignore
# CODEOWNERS
# CONTRIBUTING.md
# README.md
# build-xcframework.sh
# ci/README-MUSA.md
# ci/run.sh
# common/CMakeLists.txt
# docs/docker.md
# examples/CMakeLists.txt
# examples/eval-callback/CMakeLists.txt
# examples/model-conversion/Makefile
# examples/model-conversion/README.md
# examples/model-conversion/logits.cpp
# examples/model-conversion/scripts/causal/compare-logits.py
# examples/model-conversion/scripts/causal/run-org-model.py
# examples/model-conversion/scripts/embedding/compare-embeddings-logits.sh
# examples/model-conversion/scripts/embedding/run-converted-model.sh
# examples/model-conversion/scripts/embedding/run-original-model.py
# examples/model-conversion/scripts/utils/check-nmse.py
# examples/model-conversion/scripts/utils/inspect-org-model.py
# examples/model-conversion/scripts/utils/semantic_check.py
# ggml/CMakeLists.txt
# ggml/include/ggml-zdnn.h
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/set_rows.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/set_rows.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-zdnn/ggml-zdnn.cpp
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-quantize-perf.cpp
# tests/test-tokenizers-repo.sh
# tools/perplexity/perplexity.cpp
# tools/server/tests/README.md
2025-09-27 17:09:14 +08:00
Aleksei Nikiforov
cc1cfa277b
mtmd : fix uninitialized variable in bicubic_resize (#16275)
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>
2025-09-26 15:00:44 +02:00
Diego Devesa
50f4281a6f
llama : allow using iGPUs with --device (#15951)
...
* llama : allow using iGPUs with --device
* mtmd : allow iGPU
* rpc-server : allow iGPU
2025-09-13 16:49:49 +02:00
Concedo
575eb40950
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# docs/multimodal/minicpmv4.0.md
# examples/model-conversion/Makefile
# examples/model-conversion/README.md
# examples/model-conversion/logits.cpp
# examples/model-conversion/scripts/causal/modelcard.template
# examples/model-conversion/scripts/utils/hf-create-model.py
# ggml/src/ggml-opencl/ggml-opencl.cpp
# tests/test-backend-ops.cpp
# tools/batched-bench/batched-bench.cpp
2025-08-26 19:09:48 +08:00