Concedo
9c0b9b0bb1
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# docs/development/HOWTO-add-model.md
# docs/multimodal.md
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/dequantize.hpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/gated_delta_net.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/upscale.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# tests/test-backend-ops.cpp
# tests/test-llama-archs.cpp
# tools/mtmd/CMakeLists.txt
2026-04-14 20:06:04 +08:00
Xuan-Son Nguyen
21a4933042
mtmd: qwen3 audio support (qwen3-omni and qwen3-asr) ( #19441 )
...
* add qwen3a
* wip
* vision ok
* no more deepstack for audio
* convert ASR model ok
* qwen3 asr working
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* nits
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* fix bad merge
* fix multi inheritance
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-12 23:57:25 +02:00
Stephen Cox
547765a93e
mtmd: add Gemma 4 audio conformer encoder support ( #21421 )
...
* mtmd: add Gemma 4 audio conformer encoder support
Add audio processing for Gemma 4 E2B/E4B via a USM-style Conformer.
Architecture:
- 12-layer Conformer: FFN → Self-Attention → Causal Conv1D → FFN → Norm
- Subsampling Conv Projection: 2x Conv2D(stride=2) with LayerNorm
- Full self-attention with sinusoidal RPE and sliding window mask (24)
- Logit softcapping at 50.0, ClippableLinear clamping
- Output: 1024 → 1536 → RMSNorm → multimodal embedder
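A minimal sketch of two items from the architecture list above: logit softcapping at 50.0 and a 24-position sliding window mask. The softcap uses the common `cap * tanh(x / cap)` form. The commit does not say how the 24 positions are laid out around the query, so the purely causal window below is an assumption, and the function names are hypothetical rather than taken from the mtmd code.

```python
import math


def softcap(logit, cap=50.0):
    """Smoothly bound a logit to the open interval (-cap, cap)."""
    return cap * math.tanh(logit / cap)


def sliding_window_mask(n_positions, window=24):
    """Boolean mask where mask[i][j] is True if query i may attend to key j.

    ASSUMPTION: the window is causal, i.e. it covers the query itself and
    the window - 1 positions before it. The real layout may differ.
    """
    return [
        [0 <= i - j < window for j in range(n_positions)]
        for i in range(n_positions)
    ]


if __name__ == "__main__":
    print(softcap(10.0), softcap(1000.0))
    mask = sliding_window_mask(32)
    print(sum(mask[31]))  # number of keys visible to the last query
```

Softcapping keeps small logits nearly unchanged while preventing very large ones from dominating the softmax, which is why it is applied before the mask and the softmax rather than after.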
Mel preprocessing (dedicated mtmd_audio_preprocessor_gemma4a):
- HTK mel scale, 128 bins, magnitude STFT, mel_floor=1e-3
- Standard periodic Hann window (320 samples), zero-padded to FFT size
- Semicausal left-padding (frame_length/2 samples)
- Frame count matched to PyTorch (unfold formula)
- No pre-emphasis, no Whisper-style normalization
- Mel cosine similarity vs PyTorch: 0.9998
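The preprocessing steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the code of `mtmd_audio_preprocessor_gemma4a`: the helper names are hypothetical, and the hop length of 160 samples in the usage lines is an assumed value that the commit message does not state. The HTK mel formula, the periodic Hann definition, and the unfold frame-count formula are the standard ones.

```python
import math


def hz_to_mel_htk(freq_hz):
    """HTK mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def periodic_hann(length):
    """Periodic Hann window: the cosine period is `length`, not `length - 1`."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def unfold_frame_count(n_samples, frame_length, hop_length, left_pad):
    """Frames produced by an unfold over the left-padded signal."""
    padded = n_samples + left_pad
    if padded < frame_length:
        return 0
    return (padded - frame_length) // hop_length + 1


if __name__ == "__main__":
    window = periodic_hann(320)                 # 320-sample window, as in the commit
    left_pad = 320 // 2                         # semicausal left-padding: frame_length / 2
    hop = 160                                   # ASSUMPTION: hop length is not given in the commit
    print(unfold_frame_count(16000, 320, hop, left_pad))
    print(hz_to_mel_htk(1000.0))
```

The periodic window differs from the symmetric one only in the denominator of the cosine argument, but that difference is enough to shift mel values measurably, which is presumably why the commit calls it out.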
Key fixes:
- Tensor loading dedup: prevent get_tensor() from creating duplicate
entries in ctx_data. Fixed with std::set guard.
- ClippableLinear clamp_info loading moved after per-layer tensors.
- Sliding window mask (24 positions) matching PyTorch context_size.
- Skip Whisper normalization for Gemma4 mel output.
Tested on E2B and E4B with CPU and Vulkan backends.
Transcribes: "Glad to see things are going well and business is starting
to pick up" (matching ground truth).
Ref: #21325
2026-04-12 14:15:26 +02:00
Concedo
5361b45fba
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# requirements/requirements-tool_bench.txt
2026-04-12 16:22:26 +08:00
Sirui He
073bb2c20b
mtmd : add MERaLiON-2 multimodal audio support ( #21756 )
...
* mtmd : add MERaLiON-2 multimodal audio support
Adds support for A*STAR's MERaLiON-2 audio-language model (3B and 10B)
to the multimodal framework.
Architecture:
- Whisper large-v2 encoder for audio feature extraction
- Gated MLP adaptor: ln_speech -> frame stack (x15) -> Linear+SiLU -> GLU -> out_proj
- Gemma2 3B / 27B decoder
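The "frame stack (x15)" step in the adaptor can be illustrated with a small sketch. It assumes that stacking means concatenating every 15 consecutive encoder frames into one wider vector and that any incomplete trailing group is dropped; neither detail is stated in the commit, and `stack_frames` is a hypothetical name.

```python
def stack_frames(frames, factor=15):
    """Concatenate each run of `factor` consecutive frames into one vector.

    ASSUMPTION: trailing frames that do not fill a whole group are dropped.
    """
    usable = len(frames) - len(frames) % factor
    return [
        [value for frame in frames[start:start + factor] for value in frame]
        for start in range(0, usable, factor)
    ]


if __name__ == "__main__":
    # 32 toy frames with 2 features each -> 2 stacked frames with 30 features each
    frames = [[i, i + 0.5] for i in range(32)]
    stacked = stack_frames(frames)
    print(len(stacked), len(stacked[0]))
```

Stacking shortens the sequence by the stacking factor while widening each vector by the same factor, so the decoder sees far fewer audio tokens per second of input.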
The mmproj GGUF is generated via convert_hf_to_gguf.py --mmproj on the full
MERaLiON-2 model directory (architecture: MERaLiON2ForConditionalGeneration).
The decoder is converted separately as a standard Gemma2 model after stripping
the text_decoder. weight prefix.
New projector type: PROJECTOR_TYPE_MERALION
Supports tasks: speech transcription (EN/ZH/MS/TA), translation, spoken QA.
Model: https://huggingface.co/MERaLiON/MERaLiON-2-3B
https://huggingface.co/MERaLiON/MERaLiON-2-10B
* simplify comments in meralion adaptor
* meralion: use format_tensor_name, ascii arrows in comments
2026-04-11 14:15:48 +02:00
Concedo
8b90bfe094
Merge commit '4ef9301e4d' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# docs/multimodal.md
# embd_res/ggml-vocab-gemma-4.gguf
# embd_res/ggml-vocab-gemma-4.gguf.inp
# embd_res/ggml-vocab-gemma-4.gguf.out
# ggml/src/ggml-sycl/fattn-tile.cpp
# ggml/src/ggml-sycl/fattn-tile.hpp
# ggml/src/ggml-sycl/fattn-vec.hpp
# ggml/src/ggml-sycl/fattn.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q8_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q8_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q8_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q8_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q8_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q8_0.cpp
# tests/CMakeLists.txt
# tests/test-jinja.cpp
# tools/mtmd/CMakeLists.txt
2026-04-11 09:38:50 +08:00
Xuan-Son Nguyen
501aeed18f
mtmd: support dots.ocr ( #17575 )
...
* convert gguf
* clip impl
* fix conversion
* wip
* corrections
* update docs
* add gguf to test script
2026-04-09 12:16:38 +02:00
Concedo
c82c0b463a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# .github/workflows/release.yml
# examples/debug/debug.cpp
# ggml/src/ggml-cuda/common.cuh
# ggml/src/ggml-cuda/mmq.cuh
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# src/llama-vocab.cpp
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
# tools/mtmd/CMakeLists.txt
2026-04-09 17:45:04 +08:00
forforever73
09343c0198
model : support step3-vl-10b ( #21287 )
...
* feat: support step3-vl-10b
* use fused QKV && mapping tensor in tensor_mapping.py
* guard hardcoded params and drop crop metadata
* get understand_projector_stride from global config
* img_u8_resize_bilinear_to_f32 move in step3vl class
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* fix the \r\n mess
* add width and heads to MmprojModel.set_gguf_parameters
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-08 09:51:31 +02:00
Concedo
a395af65db
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-riscv.yml
# .github/workflows/build.yml
# ggml/src/ggml-hexagon/htp/argsort-ops.c
# ggml/src/ggml-sycl/fattn-tile.hpp
# tools/mtmd/CMakeLists.txt
2026-04-06 20:56:02 +08:00
Richard Davison
af76639f72
model : add HunyuanOCR support ( #21395 )
...
* HunyuanOCR: add support for text and vision models
- Add HunyuanOCR vision projector (perceiver-based) with Conv2d merge
- Add separate HUNYUAN_OCR chat template (content-before-role format)
- Handle HunyuanOCR's invalid pad_token_id=-1 in converter
- Fix EOS/EOT token IDs from generation_config.json
- Support xdrope RoPE scaling type
- Add tensor mappings for perceiver projector (mm.before_rms, mm.after_rms, etc.)
- Register HunYuanVLForConditionalGeneration for both text and mmproj conversion
* fix proper mapping
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Update tools/mtmd/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* address comments
* update
* Fix typecheck
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-05 23:32:14 +02:00
Concedo
e8cffa37c8
fixed gemma4v image crashing on encode, however images are not yet working correctly
2026-04-03 15:56:35 +08:00
Concedo
34ad53e950
merged support for gemma4. the e2b, e4b and 26b work, the 31b does not
2026-04-03 11:07:46 +08:00
Xuan-Son Nguyen
63f8fe0ef4
model, mtmd: fix gguf conversion for audio/vision mmproj ( #21309 )
...
* fix gguf conversion for audio/vision mmproj
* fix test
2026-04-02 17:10:32 +02:00
Concedo
42ad89cd86
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/cann.Dockerfile
# .devops/cpu.Dockerfile
# .devops/llama-cli-cann.Dockerfile
# .devops/nix/package.nix
# .github/workflows/build-android.yml
# .github/workflows/build-cann.yml
# .github/workflows/build-msys.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/gguf-publish.yml
# .github/workflows/python-lint.yml
# .github/workflows/release.yml
# CMakeLists.txt
# docs/backend/CANN.md
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-rpc/ggml-rpc.cpp
# scripts/sync_vendor.py
# tests/test-chat-auto-parser.cpp
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-reasoning-budget.cpp
# tools/cli/cli.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2026-03-30 20:45:38 +08:00
Concedo
923d5fc5d0
warning: clip_image_preprocess has been moved, now you must manually copy init_vision from mtmd into clip.cpp's setup_init_vision_shim_kcpp
2026-03-30 20:39:55 +08:00
Concedo
4a09f3805b
prepare for breaking merge
2026-03-29 14:09:29 +08:00
Concedo
aac220f7e3
Merge commit '0fac87b157' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-android.yml
# .github/workflows/hip-quality-check.yml
# docs/multimodal.md
# scripts/hip/gcn-cdna-vgpr-check.py
# scripts/snapdragon/windows/run-bench.ps1
# scripts/snapdragon/windows/run-cli.ps1
# scripts/snapdragon/windows/run-tool.ps1
# tests/test-backend-ops.cpp
# tests/test-llama-archs.cpp
# tools/imatrix/imatrix.cpp
# tools/mtmd/CMakeLists.txt
2026-03-29 01:14:33 +08:00
Xuan-Son Nguyen
871f1a2d2f
mtmd: add more sanity checks ( #21047 )
2026-03-27 11:00:52 +01:00
Xuan-Son Nguyen
a73bbd5d92
mtmd: refactor image preprocessing ( #21031 )
...
* mtmd: refactor image pre-processing
* correct some places
* correct lfm2
* fix deepseek-ocr on server
* add comment to clarify about mtmd_image_preprocessor_dyn_size
2026-03-26 19:49:20 +01:00
Saba Fallah
a970515bdb
mtmd: Add DeepSeekOCR Support ( #17400 )
...
* mtmd: llama.cpp DeepSeekOCR support
init commit
* loading sam tensors
* mtmd: fix vision model processing
* deepseek-ocr clip-vit model impl
* mtmd: add DeepSeek-OCR LM support with standard attention
* mtmd: successfully runs DeepSeek-OCR LM in llama-cli
* mtmd: Fix RoPE type for DeepSeek-OCR LM.
* loading LM
testing Vision model loading
* sam warmup working
* sam erroneous return corrected
* clip-vit: corrected cls_embd concat
* clip-vit: model convert qkv_proj split
* corrected combining of image encoders' results
* fix: update callback for ffn_moe_weighted and add callback for attn_out in deepseek2 model
* concat image_newline and image_seperator tokens
* visual_model warmup (technically) works
* window partitioning using standard ggml ops
* sam implementation without using CPU only ops
* clip: fixed warnings
* Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr
* mtmd: fix get_rel_pos
* mtmd: fixed the wrong scaler for get_rel_pos
* image encoding technically works but the output can't be checked since image decoding fails
* mtmd: minor changes
* mtmd: add native resolution support
* - image encoding debugged
- issues fixed mainly related to wrong config like n_patches etc.
- configs need to be corrected in the converter
* mtmd: correct token order
* - dynamic resizing
- changes are concerning PR https://github.com/sfallah/llama.cpp/pull/4
* mtmd: quick fix token order
* mtmd: fix dangling pointer
* mtmd: SAM numerically works
* mtmd: debug CLIP-L (vit_pre_ln)
* mtmd: debug CLIP-L & first working DeepSeek-OCR model
* mtmd : add --dsocr-mode CLI argument for DeepSeek-OCR resolution control & all native resolution modes work
* mtmd: simplify SAM patch embedding
* mtmd: adapt Pillow image resizing function
* mtmd: simplify DeepSeek-OCR dynamic resolution preprocessing
* mtmd: remove --dsocr-mode argument
* mtmd: refactor code & remove unused helper functions
* mtmd: fix tensor names for image newlines and view separator
* clean up
* reverting automatically removed spaces
* reverting automatically removed spaces
* mtmd: fixed bad ocr check in Deepseek2 (LM)
* mtmd: support combined QKV projection in build_vit
* using common build_attn in sam
* corrected code-branch when flash-attn disabled
enabling usage of --flash-attn option
* mtmd: minor fix
* minor formatting and style
* fixed flake8 lint issues
* minor editorconfig-check fixes
* minor editorconfig-check fixes
* mtmd: simplify get_rel_pos
* mtmd: make sam hparams configurable
* mtmd: add detailed comments for resize_bicubic_pillow
* mtmd: fixed wrong input setting
* mtmd: convert model in FP16
* mtmd: minor fix
* mtmd: remove tweak to llama-mtmd-cli & deepseek-ocr template
* fix: test-1.jpg OCR issue with small (640) resolution
setting min-resolution base (1024) max large (1280) for dynamic-resolution
* minor: editorconfig-check fix
* merge with changes from https://github.com/ggml-org/llama.cpp/pull/17909
added new opt to tests.sh to disable flash-attn
* minor: editorconfig-check fix
* testing deepseek-ocr
quick and dirty test script comparing results of Qwen2.5-VL vs DeepSeek-OCR
* quick and (potential) dirty merge with https://github.com/ggml-org/llama.cpp/pull/17909
* refactoring, one single builder function and static helpers
* added deepseek-ocr test to tests.sh
* minor formatting fixes
* check with fixed expected results
* minor formatting
* editorconfig-check fix
* merge with changes from https://github.com/ggml-org/llama.cpp/pull/18042
* minor
- added GLM-4.6V to big tests
- added missing deps for python test
* convert: minor fix
* mtmd: format code
* convert: quick fix
* convert: quick fix
* minor python formatting
* fixed merge build issue
* merge resolved
- fixed issues in convert
- tested several deepseek models
* minor fix
* minor
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* - removed clip_is_deepseekocr
- removed redundant RESIZE_ALGO_BICUBIC_PILLOW resize-algo
- simplified image-preprocessing
- removed/simplified debug functions
* - cleaning commented out code
* fixing instability issues, reintroducing resize_bicubic_pillow
* - use f16 model for deepseek-ocr test
- ignore llama-arch test for deepseek-ocr
* rename fc_w --> mm_fc_w
* add links to OCR discussion
* cleaner loading code
* add missing .weight to some tensors
* add default jinja template (to be used by server)
* move test model to ggml-org
* rolling back upscale change
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: bluebread <hotbread70127@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-03-25 19:57:40 +01:00
Concedo
8a6c41dc5c
Merge commit '841bc203e2' into concedo_experimental
...
# Conflicts:
# .github/workflows/ai-issues.yml
# embd_res/templates/HuggingFaceTB-SmolLM3-3B.jinja
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-openvino/ggml-openvino.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-chat-auto-parser.cpp
# tests/test-jinja.cpp
# tools/cli/README.md
# tools/completion/README.md
# tools/server/README.md
2026-03-25 22:49:53 +08:00
bssrdf
ec2b787ebe
mtmd: Add dynamic high-resolution image preprocessing for InternVL model ( #20847 )
...
* added support for internvl's dynamic high-resolution (Qianfan-OCR needed)
* add min/max dynamic patch to gguf meta
* clean up
* simplified handling min/max dynamic patch
* reuse llava_uhd logic for slice images
* provide default values for older models
* flake8
* prevent writing 0 value to gguf
* remove duplicated resolution candidates with a better algorithm
* fix indentation
* format
* add protection from divide by zero
* change to 0 to be safe
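The dynamic high-resolution steps listed above (candidate tile grids bounded by a min/max patch count, no duplicated candidates, and a divide-by-zero guard) can be sketched roughly as follows. This is a simplified illustration under assumptions, not the mtmd implementation: the function names are hypothetical, the fallback grid `(1, 1)` is an assumed choice, and ties between equally good aspect ratios are resolved here by taking the smallest grid, which may differ from the real tie-breaking rule.

```python
def candidate_grids(min_patches, max_patches):
    """Enumerate (cols, rows) tile grids with min_patches <= cols * rows <= max_patches.

    Each grid is generated exactly once, so no de-duplication pass is needed.
    """
    grids = [
        (cols, rows)
        for cols in range(1, max_patches + 1)
        for rows in range(1, max_patches // cols + 1)
        if cols * rows >= min_patches
    ]
    return sorted(grids, key=lambda g: (g[0] * g[1], g))


def closest_grid(width, height, grids):
    """Pick the grid whose aspect ratio is closest to the image's.

    ASSUMPTION: degenerate input (zero size, no candidates) falls back to a
    single tile instead of dividing by zero.
    """
    if width <= 0 or height <= 0 or not grids:
        return (1, 1)
    aspect = width / height
    return min(grids, key=lambda g: abs(aspect - g[0] / g[1]))


if __name__ == "__main__":
    grids = candidate_grids(1, 4)
    print(grids)
    print(closest_grid(800, 400, grids))
```

Enumerating `rows` only up to `max_patches // cols` is what avoids generating the same grid more than once; an approach that loops over every patch count and every factor pair would need a separate de-duplication step.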
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-03-23 01:06:30 +01:00
DorianRudolph
d3ac030a5d
mtmd : fix LightOnOCR image preprocessing ( #20877 )
2026-03-23 01:04:14 +01:00
Concedo
98f099aecc
Merge commit 'c1258830b2' into concedo_experimental
...
# Conflicts:
# docs/docker.md
# docs/ops.md
# docs/ops/WebGPU.csv
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/row_norm.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
2026-03-21 12:00:52 +08:00
Xuan-Son Nguyen
1e64534570
mtmd: add clip_graph::build_mm() ( #20751 )
...
* clip: add build_mm()
* apply to all models
* add TODO for bias overload
2026-03-19 13:11:39 +01:00
Concedo
f3d2f58fa8
note: smartcache is broken for rnn currently
2026-03-15 11:31:47 +08:00
Concedo
b1c500ae2b
Merge commit '2948e6049a' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CONTRIBUTING.md
# docs/backend/VirtGPU/development.md
# docs/ops.md
# docs/ops/WebGPU.csv
# embd_res/templates/GigaChat3-10B-A1.8B.jinja
# embd_res/templates/GigaChat3.1-10B-A1.8B.jinja
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/sync_vendor.py
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tests/test-grammar-integration.cpp
# tests/test-quantize-fns.cpp
2026-03-15 11:21:24 +08:00
Xuan-Son Nguyen
94d0262277
mtmd: add llama-mtmd-debug binary ( #20508 )
...
* mtmd: add llama-mtmd-debug binary
* adapt
* fixes
* fix compile error
* fix windows compile error
* rm legacy clip_debug_encode()
* add MTMD_API to fix build
2026-03-14 15:52:29 +01:00
DAN™
fdb17643d3
model : add support for Phi4ForCausalLMV ( #20168 )
...
* Add support for Phi4ForCausalLMV.
* Fix Phi-4 vision parity (correcting SigLIP2 patch-kernel export layout) and matching HF NaFlex resize behavior in mtmd.
* Rename constants + fix tokenizer label
* Clean-ups.
* Fix GGUF export.
* Set tokenizer.ggml.pre explicitly.
* Default vocab name rather than forcing it.
* Clean-ups.
* Fix indent.
* Fix subscriptable error.
* remove overcomplicated code path
* Clean-ups.
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-03-12 00:25:54 +01:00
Concedo
746664fde6
Merge commit '2cd20b72ed' into concedo_experimental
...
# Conflicts:
# CONTRIBUTING.md
# docs/backend/CANN.md
# docs/backend/SYCL.md
# docs/backend/snapdragon/README.md
# docs/backend/snapdragon/windows.md
# docs/build.md
# docs/multimodal/MobileVLM.md
# docs/ops.md
# docs/ops/WebGPU.csv
# examples/debug/README.md
# examples/llama.vim
# examples/model-conversion/README.md
# examples/sycl/README.md
# ggml/src/ggml-cpu/amx/mmq.cpp
# ggml/src/ggml-cpu/arch/x86/repack.cpp
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp-drv.cpp
# ggml/src/ggml-hexagon/htp/flash-attn-ops.c
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/hvx-copy.h
# ggml/src/ggml-hexagon/htp/hvx-inverse.h
# ggml/src/ggml-hexagon/htp/hvx-reduce.h
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-hexagon/htp/rope-ops.c
# ggml/src/ggml-hexagon/htp/worker-pool.c
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cpy.cl
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/quants.hpp
# ggml/src/ggml-sycl/softmax.cpp
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/pr2wt.sh
# scripts/server-bench.py
# scripts/snapdragon/windows/run-cli.ps1
# tests/test-alloc.cpp
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tools/cli/cli.cpp
# tools/completion/README.md
# tools/cvector-generator/cvector-generator.cpp
# tools/imatrix/README.md
# tools/perplexity/README.md
# tools/server/public_simplechat/readme.md
# tools/server/tests/README.md
2026-03-10 22:11:08 +08:00
JustCommitRandomness
2fbc3b2ae5
Adjust int types in format strings ( #2009 )
...
* tweak format string types
This may not be all of them, but it's the ones which warn on OpenBSD
* complete the changes needed to fix the format string specifiers
* avoid using inttypes, directly cast to size_t (u64 usually) instead
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2026-03-06 19:06:18 +08:00
Marcel Petrick
92f7da00b4
chore : correct typos [no ci] ( #20041 )
...
* fix(docs): correct typos found during code review
Non-functional changes only:
- Fixed minor spelling mistakes in comments
- Corrected typos in user-facing strings
- No variables, logic, or functional code was modified.
Signed-off-by: Marcel Petrick <mail@marcelpetrick.it>
* Update docs/backend/CANN.md
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
* Revert "Auxiliary commit to revert individual files from 846d1c301281178efbc6ce6060ad34c1ebe45af8"
This reverts commit 02fcf0c7db661d5ff3eff96b2b2db9fdb7213256.
* Update tests/test-backend-ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-backend-ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Signed-off-by: Marcel Petrick <mail@marcelpetrick.it>
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-05 08:50:21 +01:00
Concedo
e626de2430
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# docs/ops.md
# docs/ops/WebGPU.csv
# embd_res/templates/stepfun-ai-Step-3.5-Flash.jinja
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
# src/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tools/mtmd/CMakeLists.txt
2026-02-20 15:16:26 +08:00
Concedo
9eb9e4eb83
Merge commit '8a70973557' into concedo_experimental
...
# Conflicts:
# docs/backend/CANN.md
# docs/backend/SYCL.md
# examples/model-conversion/scripts/utils/tensor-info.py
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/expm1.cl
# ggml/src/ggml-opencl/kernels/mean.cl
# ggml/src/ggml-opencl/kernels/softplus.cl
# ggml/src/ggml-opencl/kernels/sum_rows.cl
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py
# ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/scale.wgsl
# tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte
2026-02-20 14:36:49 +08:00
megemini
237958db33
model: Add PaddleOCR-VL model support ( #18825 )
...
* support PaddleOCR-VL
* clip: update PaddleOCR model loader parameters to prevent OOM during warmup
* [update] add paddleocr vl text model instead of ernie4.5
* [update] restore change of minicpmv
* [update] format
* [update] format
* [update] positions and patch merge permute
* [update] mtmd_decode_use_mrope for paddleocr
* [update] image min/max pixels
* [update] remove set_limit_image_tokens
* update: preprocess without padding
* clean up
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-19 17:05:25 +01:00
Saba Fallah
e6267a9359
mtmd: build_attn modified, flash_attn on/off via ctx_params ( #19729 )
2026-02-19 13:50:29 +01:00
Xuan-Son Nguyen
eeef3cfced
model: support GLM-OCR ( #19677 )
...
* model: support GLM-OCR
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-18 17:51:40 +01:00
Concedo
72f7e01b27
Merge commit '01d8eaa28d' into concedo_experimental
...
# Conflicts:
# build-xcframework.sh
# scripts/sync_vendor.py
# tests/test-backend-ops.cpp
# tools/mtmd/CMakeLists.txt
# tools/rpc/rpc-server.cpp
2026-02-16 15:36:59 +08:00
Anav Prasad
01d8eaa28d
mtmd : Add Nemotron Nano 12B v2 VL support ( #19547 )
...
* nemotron nano v2 vlm support added
* simplified code; addressed reviews
* pre-downsample position embeddings during GGUF conversion for fixed input size
2026-02-14 14:07:00 +01:00
Concedo
55524e160b
temp merge, not working
2026-02-13 12:11:26 +08:00
Concedo
261d78eaaa
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# README.md
# docs/speculative.md
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tools/mtmd/clip.cpp
2026-02-12 18:05:20 +08:00
AesSedai
e463bbdf65
model: Add Kimi-K2.5 support ( #19170 )
...
* Move dequant_model to after the text_config merge
Add new kimi-k2.5 keys to mtmd convert
Update V_MMPROJ tensor mapping for new mm_projector.proj keys
Update V_M_IMP_NORM for new mm_projector.pre_norm key
* Fix a couple of oversights
* Add image support for Kimi-K2.5
* Revert changes to KimiVLForConditionalGeneration
* Fix an assert crash
* Fix permute swapping w / h by accident
* Kimi-K2.5: Use merged QKV for vision
* Kimi-K2.5: pre-convert vision QK to use build_rope_2d
* Kimi-K2.5: support non-interleaved rope for vision
* Kimi-K2.5: fix min / max pixel
* Kimi-K2.5: remove v/o permutes, unnecessary
* Kimi-K2.5: update permute name to match
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Kimi-K2.5: replace build_rope_2d ggml_cont with ggml_view_3d pointers
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-11 16:47:30 +01:00
Tarek Dakhran
262364e31d
mtmd: Implement tiling for LFM2-VL ( #19454 )
2026-02-09 17:30:32 +01:00
Concedo
ddce19db72
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/package-gguf-py.nix
# .devops/nix/scope.nix
# common/CMakeLists.txt
# docs/backend/SYCL.md
# examples/lookahead/lookahead.cpp
# examples/lookup/lookup.cpp
# examples/sycl/run-llama2.sh
# examples/sycl/win-run-llama2.bat
# examples/sycl/win-test.bat
# ggml/src/ggml-hexagon/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/flash-attn-ops.c
# ggml/src/ggml-hexagon/htp/hvx-dump.h
# ggml/src/ggml-hexagon/htp/hvx-reduce.h
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-hexagon/htp/softmax-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# scripts/sync-ggml.last
2026-02-01 22:35:25 +08:00
tc-mb
ec6c7421e4
mtmd: support MiniCPM-o 4.5(vision only) ( #19211 )
...
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
2026-01-30 23:19:30 +01:00
Concedo
22af5f1250
Merge commit '2a13180100' into concedo_experimental
...
# Conflicts:
# .devops/cann.Dockerfile
# .devops/cpu.Dockerfile
# .devops/cuda-new.Dockerfile
# .devops/cuda.Dockerfile
# .devops/intel.Dockerfile
# .devops/llama-cli-cann.Dockerfile
# .devops/musa.Dockerfile
# .devops/nix/package.nix
# .devops/rocm.Dockerfile
# .devops/s390x.Dockerfile
# .devops/vulkan.Dockerfile
# .github/workflows/build-cmake-pkg.yml
# .github/workflows/build-linux-cross.yml
# .github/workflows/build.yml
# .github/workflows/copilot-setup-steps.yml
# .github/workflows/release.yml
# .github/workflows/server-webui.yml
# .github/workflows/server.yml
# CMakeLists.txt
# README.md
# build-xcframework.sh
# ci/run.sh
# cmake/common.cmake
# common/CMakeLists.txt
# docs/backend/hexagon/CMakeUserPresets.json
# docs/backend/hexagon/README.md
# docs/build-riscv64-spacemit.md
# docs/build.md
# examples/debug/debug.cpp
# examples/eval-callback/CMakeLists.txt
# examples/eval-callback/eval-callback.cpp
# examples/llama.android/lib/build.gradle.kts
# examples/sycl/build.sh
# examples/sycl/win-build-sycl.bat
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/act-ops.c
# ggml/src/ggml-hexagon/htp/binary-ops.c
# ggml/src/ggml-hexagon/htp/flash-attn-ops.c
# ggml/src/ggml-hexagon/htp/get-rows-ops.c
# ggml/src/ggml-hexagon/htp/hex-dma.c
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-hexagon/htp/rope-ops.c
# ggml/src/ggml-hexagon/htp/set-rows-ops.c
# ggml/src/ggml-hexagon/htp/softmax-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-hexagon/htp/worker-pool.c
# scripts/debug-test.sh
# scripts/serve-static.js
# scripts/snapdragon/adb/run-bench.sh
# scripts/snapdragon/adb/run-cli.sh
# scripts/snapdragon/adb/run-mtmd.sh
# scripts/snapdragon/adb/run-tool.sh
# scripts/tool_bench.py
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tools/mtmd/clip.cpp
2026-01-16 21:52:01 +08:00
Piotr Wilkin (ilintar)
d98b548120
Restore clip's cb() to its rightful glory - extract common debugging elements in llama ( #17914 )
...
* Extract common debugging functions; plug eval-callback and mtmd's MTMD_DEBUG_GRAPH with same functionality
* Move to common
* Remove unneeded header
* Unlink from common
* chore: update webui build output
* Cleanup; properly pass params to mtmd without depending on common; factorize debug.cpp to use common debug code.
* Revert change to webapp
* Post-merge adjust
* Apply suggestions from code review
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Apply code review changes
* Remove changes to server-context
* Remove mtmd.h include
* Remove utility functions from header
* Apply suggestions from code review
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Rename functions
* Update tools/mtmd/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Update tools/mtmd/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Update tools/mtmd/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-01-14 20:29:35 +01:00
Concedo
7d2c1c4f46
note: clip_is_mrope was moved to mtmd_decode_use_mrope upstream and no longer syncs since https://github.com/ggml-org/llama.cpp/pull/18793
...
Merge commit 'c1e79e610f' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/release.yml
# CMakeLists.txt
# CONTRIBUTING.md
# MIT_LICENSE_GGML_SDCPP_LLAMACPP_ONLY.md
# README.md
# SECURITY.md
# ci/run.sh
# common/CMakeLists.txt
# common/arg.cpp
# docs/ops.md
# docs/ops/BLAS.csv
# docs/ops/zDNN.csv
# docs/preset.md
# examples/batched/batched.cpp
# examples/debug/debug.cpp
# ggml/src/ggml-blas/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# licenses/LICENSE-curl
# licenses/LICENSE-httplib
# scripts/pr2wt.sh
# scripts/sync_vendor.py
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tools/cli/README.md
# tools/completion/README.md
# tools/llama-bench/llama-bench.cpp
# tools/server/README.md
# vendor/cpp-httplib/LICENSE
2026-01-13 23:31:14 +08:00
Concedo
0dc18c668c
Merge commit 'a61c8bc3bf' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/pr2wt.sh
# src/llama-model.cpp
# tools/CMakeLists.txt
# tools/mtmd/CMakeLists.txt
# tools/mtmd/clip.cpp
# tools/mtmd/clip.h
2026-01-13 23:06:50 +08:00