Concedo
eb30b29d69
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/gguf-publish.yml
# CODEOWNERS
# examples/sycl/test.sh
# pyproject.toml
# tools/mtmd/CMakeLists.txt
# tools/mtmd/README.md
2026-05-08 14:48:57 +08:00
Pascal
cc97e45a14
mtmd: fix whisper audio tail truncation by exposing padded buffer to FFT ( #22770 )
2026-05-07 14:01:01 +02:00
tc-mb
2496f9c149
mtmd : support MiniCPM-V 4.6 ( #22529 )
...
* Support MiniCPM-V 4.6 in new branch
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* fix code bug
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* fix pre-commit
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* fix convert
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* rename clip_graph_minicpmv4_6
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* use new TYPE_MINICPMV4_6
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* use build_attn to allow flash attention support
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* no use legacy code, restored here.
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* use the existing tensors name
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* unused ctx->model.hparams.minicpmv_version
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* use n_merge for slice alignment
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* borrow wa_layer_indexes for vit_merger insertion point
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* fix code style
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* use filter_tensors and add model.vision_tower
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* fix chkhsh
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
* fix type check
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
---------
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-05-06 21:54:09 +02:00
Concedo
9e9497f0cc
Merge remote-tracking branch 'origin/upstream' into concedo_experimental
...
# Conflicts:
# examples/save-load-state/save-load-state.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/gemm_noshuffle_q4_0_f32.cl
# ggml/src/ggml-opencl/kernels/gemm_noshuffle_q8_0_f32.cl
# ggml/src/ggml-opencl/kernels/gemv_noshuffle_q4_0_f32.cl
# ggml/src/ggml-opencl/kernels/gemv_noshuffle_q4_0_f32_spec.cl
# ggml/src/ggml-opencl/kernels/gemv_noshuffle_q8_0_f32.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# scripts/sync-ggml.last
# scripts/sync_vendor.py
# src/llama-graph.cpp
# tests/test-backend-ops.cpp
# tests/test-state-restore-fragmented.cpp
2026-05-06 21:20:06 +08:00
Yakine Tahtah
a00e47e422
mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) ( #22101 )
...
* mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech)
Conformer encoder with Shaw relative position encoding,
QFormer projector, log-mel spectrogram with frame stacking.
Encoder uses GLU gating, folded batch norm, and SSM depthwise
conv. QFormer compresses encoder output via windowed
cross-attention (window=15, queries=3) into the LLM embedding
space.
Audio preprocessing: reflect-padded STFT, 80-bin mel filterbank,
dynamic range compression, 2x frame stacking (80->160 mel).
GGUF converter handles batch norm folding at export time,
fused K/V split, and Conv1d weight reshaping.
Tested against HF transformers reference: token-for-token match
on 30s/60s audio clips with greedy decoding.
* mtmd: rename gs_ prefixed tensors to generic/architecture names
* mtmd: use tensor_mapping.py for all granite_speech tensors
* convert: fold GraniteSpeechTextModel into GraniteModel
* mtmd: replace n_layer hack with explicit has_standard_layers flag
* mtmd: replace hardcoded magic numbers with GGUF hparams for granite speech
* mtmd: align KEY_A_ define spacing
* convert: register GraniteModel for GraniteSpeechForConditionalGeneration
* convert: fix ty type-check for GraniteSpeechMmprojModel registration
* mtmd: align TN_ define spacing
* mtmd: use generic layer loop for granite speech tensor loading
* mtmd: merge qformer_proj_layer into clip_layer
* mtmd: granite_speech remove redundant ggml_build_forward_expand on inputs
* mtmd: granite_speech add comment explaining why build_attn is not used
* mtmd: granite_speech hard-code eps in cpp, remove from GGUF metadata
* gguf: add spacing between granite_speech tensor mapping blocks
* mtmd: make generic audio layer_norm_eps read optional
* mtmd: granite_speech keep encoder eps in GGUF, only hard-code projector eps
* mtmd: align defines and struct fields in clip-impl.h and clip-model.h
* mtmd: fix alignment and ordering issues across granite speech files
* convert: granite_speech use filter_tensors instead of modify_tensors for skipping
2026-05-06 14:40:59 +02:00
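The 2x frame stacking step named in the granite-speech commit above (80->160 mel) can be illustrated with a minimal sketch. It assumes stacking means concatenating each pair of consecutive 80-bin frames and that a trailing unpaired frame is dropped. Both details, and the name `stack_frames`, are assumptions for illustration and are not taken from the commit.

```python
from typing import List


def stack_frames(mel: List[List[float]], factor: int = 2) -> List[List[float]]:
    """Concatenate every `factor` consecutive frames into one wider frame.

    Hypothetical helper: frames that do not fill a complete group at the
    end of the sequence are dropped (an assumption, not from the commit).
    """
    n_groups = len(mel) // factor
    stacked: List[List[float]] = []
    for g in range(n_groups):
        merged: List[float] = []
        for frame in mel[g * factor:(g + 1) * factor]:
            merged.extend(frame)
        stacked.append(merged)
    return stacked


# Five frames of 80 mel bins each; the fifth frame has no partner.
frames = [[float(t)] * 80 for t in range(5)]
stacked = stack_frames(frames)
print(len(stacked), len(stacked[0]))  # prints "2 160"
```

Stacking halves the number of time steps the encoder sees while doubling the feature width, which is why the commit describes the result as 160 mel features per frame.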
Adrien Gallouët
bf76ac77be
common : only load backends when required ( #22290 )
...
* common : only load backends when required
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* llama : call ggml_backend_load_all() directly from llama_backend_init()
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Add ggml_backend_load_all() where llama_backend_init() is not used
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-05-05 09:23:50 +02:00
Concedo
70be589894
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CODEOWNERS
# examples/debug/debug.cpp
# examples/eval-callback/eval-callback.cpp
# ggml/src/ggml-cpu/amx/mmq.cpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# scripts/pr2wt.sh
2026-04-28 21:13:40 +08:00
Max Krasnyansky
5594d13224
common: fix missing exports in llama-common ( #22340 )
...
* common: refactor common/debug to move abort_on_nan into base_callback_data
Passing bool abort_on_nan as template parameter for common_debug_cb_eval is unnecessary and creates an issue with LTO.
It should just be a member of the base_callback_data instead.
* cont : cleanup
* common : use pimpl in debug.h to reduce header dependencies
Move common_debug_cb_user_data's data members (std::regex,
std::vector<uint8_t>) into a private impl struct in debug.cpp.
This removes the includes of common.h and <regex> from debug.h,
reducing transitive dependencies for any translation unit that
includes the header.
Assisted-by: llama.cpp:local pi
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-27 08:06:39 +03:00
Concedo
0755f27372
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/openvino.Dockerfile
# .github/workflows/build-self-hosted.yml
# .github/workflows/build.yml
# common/chat.cpp
# docs/backend/OPENVINO.md
# examples/speculative-simple/speculative-simple.cpp
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/libggml-htp.inf
# ggml/src/ggml-openvino/ggml-decoder.cpp
# ggml/src/ggml-openvino/ggml-openvino-extra.cpp
# ggml/src/ggml-openvino/ggml-openvino.cpp
# ggml/src/ggml-openvino/ggml-quants.cpp
# ggml/src/ggml-openvino/openvino/op/rope.cpp
# ggml/src/ggml-openvino/openvino/op_table.cpp
# ggml/src/ggml-openvino/openvino/op_table.h
# ggml/src/ggml-openvino/openvino/translate_session.cpp
# ggml/src/ggml-openvino/openvino/utils.cpp
# ggml/src/ggml-openvino/openvino/utils.h
# ggml/src/ggml-openvino/utils.cpp
# ggml/src/ggml-openvino/utils.h
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/convert.hpp
# ggml/src/ggml-sycl/gemm.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/set_rows.cpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/sync_vendor.py
# tests/CMakeLists.txt
# tests/test-chat.cpp
# tools/cli/cli.cpp
# tools/mtmd/CMakeLists.txt
# tools/server/CMakeLists.txt
2026-04-23 00:55:05 +08:00
Concedo
becf70d49b
fixed logspam for fit
2026-04-23 00:43:09 +08:00
Xuan-Son Nguyen
82d3f4d3b2
mtmd: also support LLAMA_ROPE_TYPE_NONE ( #22242 )
2026-04-22 12:16:29 +02:00
manayang
7bfe60fdf9
mtmd, llama : Update HunyuanVL vision-language model support ( #22037 )
...
* mtmd, llama : add HunyuanVL vision-language model support
- add LLM_ARCH_HUNYUAN_VL with M-RoPE (XD-RoPE) support
- add PROJECTOR_TYPE_HUNYUANVL with PatchMerger vision encoder
- add HunyuanVL-specific M-RoPE position encoding for image tokens
- add GGUF conversion for HunyuanVL vision and text models
- add smoke test in tools/mtmd/tests.sh
* fix: fix HunyuanVL XD-RoPE h/w section order
* fix: Remove redundant code
* convert : fix HunyuanOCR / HunyuanVL conversion
- Tested locally: both HunyuanOCR and HunyuanVL-4B convert to GGUF successfully and produce correct inference output on Metal (F16 / Q8_0).
* clip : fix -Werror=misleading-indentation in bilinear resize
* fix CI: convert_hf_to_gguf type check error
- convert_hf_to_gguf.py: give HunyuanVLTextModel.__init__ an explicit `dir_model: Path` parameter so ty can infer the type for load_hparams instead of reporting `Unknown | None`.
---------
Co-authored-by: wendadawen <wendadawen@tencent.com>
2026-04-22 11:58:43 +02:00
Kwa Jie Hao
98d2d2884e
mtmd: Add support for Reka Edge 2603 ( #21616 )
...
* feat: (vocab) fix stray text appended in llama_decode_text
Remove accidental concatenation of the full `text` string when
formatting UNK_BYTE hex escapes. Only the closing "]" should be appended.
* feat(mtmd): add Yasa2 vision encoder support
Add a Yasa2 (ConvNeXtV2-based) vision encoder for reka-edge:
- Register PROJECTOR_TYPE_YASA2 and tensor name definitions
- Add yasa2_block/yasa2_stage model structs
- Implement graph builder with ConvNeXt stages, GRN, adaptive pooling
- Wire into clip.cpp switch statements and mtmd.cpp init_vision
- Use mtmd_image_preprocessor_fixed_size for image preprocessing
* feat(chat): add reka-edge template handler (tools, thinking)
- Add chat-reka.cpp/h implementing PEG-based parser for reka-edge format
- Add Reka-Edge.jinja chat template
- Detect reka-edge template in try_specialized_template()
- Add LLAMA_EXAMPLE_MTMD to chat-template-file arg
* feat: add reka vlm to gguf conversion script
Converts Reka Yasa2 hf checkpoints to GGUF format:
- Text decoder: Llama-arch with tiktoken/BPE vocab
- Mmproj (--mmproj): ConvNeXt vision backbone + language_projection
- Generates 2D sincos positional embeddings for vision encoder
* test: add Reka Edge chat template and parser tests
- test-chat-template: oracle tests comparing Jinja engine output vs
common_chat_templates_apply for text, tools, thinking, images, video
- test-chat: PEG parser tests for Reka Edge format, round-trip tests
for image/video content parts, common path integration tests
* scripts: add Reka Edge mixed quantization helper
Q4_0 base quantization with Q8_0 override for the last 8 transformer
blocks (layers 24-31) via --tensor-type regex.
* fix: adapt chat-reka and tests to upstream API
- Use autoparser::generation_params (not templates_params)
- Add p.prefix(generation_prompt) to PEG parser
- Simplify reasoning parser to match LFM2 pattern
- Remove image/video oracle tests (unsupported by oaicompat parser;
no other multimodal models test this path)
* fix: avoid duplicate tensor loading in yasa2 vision encoder
TN_YASA_PATCH_W and TN_PATCH_EMBD both resolve to "v.patch_embd.weight",
causing the same tensor to be loaded twice into ctx_data and overflowing
the memory pool. Reuse the tensors already loaded by the common section.
* chore: update image pre-processing settings
The reka-edge model depends on the following settings in an older
fork of llama.cpp:
1. Fixed square resize
2. BICUBIC
3. add_padding=false
In current llama.cpp, this means setting:
- image_resize_algo = RESIZE_ALGO_BICUBIC
- image_resize_pad = false
* chore: remove reka gguf conversion script
* chore: remove reka quantization script
* chore: remove unnecessary changes from PR scope
This commit removes a couple of unnecessary changes for the PR scope:
1. BPE decoder bug fix - this affects reka edge because there's a bug
in our tokenization that doesn't represent <think> tokens as special
tokens. However this isn't meant to be a thinking model so when run
with --reasoning off the edge case does not affect us
2. --chat-template-file support from llama-mtmd-cli - the focus is on
llama-server and the reka edge gguf contains the necessary metadata
to detect the chat template
3. reka edge oracle test cases - no other model has similar test cases,
so I removed it for standardization
* chore: remove unnecessary ggml_cast
This commit removes unnecessary ggml_cast after updating the
reka vlm -> gguf conversion script on hugging face.
* chore: remove redundant code
* chore: remove unnecessary ggml_cont calls
This commit removes all ggml_cont calls except the four that
precede ggml_reshape_3d/ggml_reshape_4d. Those are necessary
because ggml_reshape recomputes strides assuming contiguous
layout and asserts ggml_is_contiguous.
Other operations (ggml_mean, ggml_add, ggml_mul etc.) use
stride-based indexing and handle non-contiguous inputs
correctly and so we are ok to remove ggml_cont for those.
* chore: remove unnecessary ggml_repeat calls
This commit removes unnecessary ggml_repeat calls because the underlying
ops already broadcast automatically.
Every ggml_repeat in yasa2.cpp was expanding a smaller tensor to match
a larger one's shape before passing both to an elementwise op (ggml_add,
ggml_sub, ggml_mul, or ggml_div). This is unnecessary because all four
of these ops already support broadcasting internally.
* chore: restore ggml_cont needed for cpu operations
* refactor: locate reka chat template handler in chat.cpp
* chore: remove unnecessary warmup tokens
* chore: add code comments on image_resize_pad
* chore: remove custom reka parsing code
* chore: revert common/chat.cpp
* Uncomment debug logging for PEG input parsing
---------
Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
2026-04-21 20:02:49 +02:00
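The mixed-quantization idea mentioned in the Reka Edge commit above (Q4_0 base with a Q8_0 override for layers 24-31, selected by a regex) can be sketched as follows. The commit does not show the actual pattern, so the regex and the helper `pick_quant` are hypothetical, and the `blk.<N>.` tensor naming is assumed from the usual GGUF convention.

```python
import re

# Hypothetical pattern: matches tensors in blocks 24..31,
# assuming tensor names start with 'blk.<N>.'.
LAST_8_BLOCKS = re.compile(r"^blk\.(2[4-9]|3[01])\.")


def pick_quant(tensor_name: str) -> str:
    """Return Q8_0 for tensors in blocks 24-31 and Q4_0 for everything else."""
    return "Q8_0" if LAST_8_BLOCKS.match(tensor_name) else "Q4_0"


for name in (
    "blk.3.attn_q.weight",
    "blk.24.ffn_down.weight",
    "blk.31.attn_k.weight",
    "blk.240.attn_q.weight",
):
    print(name, "->", pick_quant(name))
```

The trailing `\.` in the pattern matters: without it, a block index such as 240 would also match the `24` alternative.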
Concedo
4629b49afb
updated to handle changes for clip_is_mrope
2026-04-21 19:34:32 +08:00
Concedo
19a12bb080
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CODEOWNERS
# common/CMakeLists.txt
# ggml/CMakeLists.txt
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
# scripts/sync-ggml.last
# tools/cli/cli.cpp
# tools/llama-bench/llama-bench.cpp
# tools/perplexity/perplexity.cpp
2026-04-21 18:53:03 +08:00
Xuan-Son Nguyen
9998d88bc8
mtmd: correct mtmd_decode_use_mrope() ( #22188 )
2026-04-21 10:53:37 +02:00
Xuan-Son Nguyen
86f8daacfe
mtmd: correct get_n_pos / get_decoder_pos ( #22175 )
2026-04-20 23:29:19 +02:00
Xuan-Son Nguyen
a678916623
mtmd: refactor mtmd_decode_use_mrope ( #22161 )
2026-04-20 14:45:11 +02:00
Concedo
cd6788007e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-cross.yml
# .github/workflows/build-self-hosted.yml
# .github/workflows/release.yml
# examples/llama.android/lib/src/main/cpp/CMakeLists.txt
# ggml/CMakeLists.txt
# ggml/src/ggml-rpc/CMakeLists.txt
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/sync_vendor.py
# tests/test-chat.cpp
# tests/test-mtmd-c-api.c
# tools/server/README.md
2026-04-20 20:19:11 +08:00
Xuan-Son Nguyen
19124078be
mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos (breaking change) ( #22082 )
...
* mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos
* fix build
2026-04-19 11:57:21 +02:00
Concedo
79882d669a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-android.yml
# .github/workflows/build.yml
# .github/workflows/release.yml
# CMakeLists.txt
# CODEOWNERS
# common/CMakeLists.txt
# common/common.h
# docs/ops.md
# docs/ops/Metal.csv
# examples/batched/CMakeLists.txt
# examples/convert-llama2c-to-ggml/CMakeLists.txt
# examples/debug/CMakeLists.txt
# examples/diffusion/CMakeLists.txt
# examples/embedding/CMakeLists.txt
# examples/eval-callback/CMakeLists.txt
# examples/gen-docs/CMakeLists.txt
# examples/idle/CMakeLists.txt
# examples/lookahead/CMakeLists.txt
# examples/lookup/CMakeLists.txt
# examples/parallel/CMakeLists.txt
# examples/passkey/CMakeLists.txt
# examples/retrieval/CMakeLists.txt
# examples/save-load-state/CMakeLists.txt
# examples/speculative-simple/CMakeLists.txt
# examples/speculative/CMakeLists.txt
# examples/sycl/CMakeLists.txt
# examples/training/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# pocs/vdot/CMakeLists.txt
# src/CMakeLists.txt
# tests/CMakeLists.txt
# tests/test-quantize-stats.cpp
# tools/batched-bench/CMakeLists.txt
# tools/cli/CMakeLists.txt
# tools/cli/cli.cpp
# tools/completion/CMakeLists.txt
# tools/cvector-generator/CMakeLists.txt
# tools/cvector-generator/cvector-generator.cpp
# tools/export-lora/CMakeLists.txt
# tools/gguf-split/CMakeLists.txt
# tools/gguf-split/gguf-split.cpp
# tools/imatrix/CMakeLists.txt
# tools/llama-bench/CMakeLists.txt
# tools/llama-bench/llama-bench.cpp
# tools/mtmd/CMakeLists.txt
# tools/perplexity/CMakeLists.txt
# tools/quantize/CMakeLists.txt
# tools/quantize/quantize.cpp
# tools/results/CMakeLists.txt
# tools/server/CMakeLists.txt
# tools/tokenize/CMakeLists.txt
# tools/tts/CMakeLists.txt
2026-04-17 22:37:37 +08:00
Concedo
768527b031
Merge commit '1e796eb41f' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# .github/workflows/build-riscv.yml
# .github/workflows/build-vulkan.yml
# .github/workflows/build.yml
# docs/backend/SYCL.md
# docs/build.md
# docs/development/HOWTO-add-model.md
# embd_res/templates/Reka-Edge.jinja
# ggml/CMakeLists.txt
# ggml/src/ggml-rpc/CMakeLists.txt
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/dequantize.hpp
# ggml/src/ggml-sycl/dmmv.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_id.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
# tests/test-chat.cpp
# tools/rpc/README.md
2026-04-17 21:47:29 +08:00
Yuri Khrustalev
a279d0f0f4
ci : add android arm64 build and release ( #21647 )
...
* server: respect the ignore eos flag
* ci: add android arm64 build and release
* patch
* pin android-setup actions to v4
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* lf in the suggestion
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-17 11:32:24 +02:00
65a
268d61e178
mtmd: add missing struct tag ( #22023 )
2026-04-17 10:48:33 +02:00
Georgi Gerganov
6990e2f1f7
libs : rename libcommon -> libllama-common ( #21936 )
...
* cmake : allow libcommon to be shared
* cmake : rename libcommon to libllama-common
* cont : set -fPIC for httplib
* cont : export all symbols
* cont : fix build_info exports
* libs : add libllama-common-base
* log : add common_log_get_verbosity_thold()
2026-04-17 11:11:46 +03:00
Xuan-Son Nguyen
408225bb1a
server: use random media marker ( #21962 )
...
* server: use random media marker
* nits
* remove legacy <__image__> token
* revert special char in random
2026-04-15 23:52:22 +02:00
Concedo
ac29e6f0c0
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/vulkan.Dockerfile
# .github/workflows/build-self-hosted.yml
# .github/workflows/build.yml
# .github/workflows/release.yml
# .github/workflows/server-self-hosted.yml
# docs/build.md
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/hex-utils.h
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/hmx-utils.h
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# tests/test-backend-ops.cpp
# tests/test-mtmd-c-api.c
2026-04-15 15:15:19 +08:00
Xuan-Son Nguyen
707c0b7a6e
mtmd: add mtmd_image_tokens_get_decoder_pos() API ( #21851 )
...
* mtmd: add mtmd_image_tokens_get_decoder_pos() API
* consistent naming
* fix build
2026-04-14 16:07:41 +02:00
Concedo
236ae27329
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/close-issue.yml
# docs/multimodal.md
# embd_res/templates/deepseek-ai-DeepSeek-V3.2.jinja
# ggml/CMakeLists.txt
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl
# tests/peg-parser/test-gbnf-generation.cpp
# tests/test-chat.cpp
2026-04-14 21:01:41 +08:00
Concedo
9c0b9b0bb1
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# docs/development/HOWTO-add-model.md
# docs/multimodal.md
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/dequantize.hpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/gated_delta_net.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/upscale.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# tests/test-backend-ops.cpp
# tests/test-llama-archs.cpp
# tools/mtmd/CMakeLists.txt
2026-04-14 20:06:04 +08:00
Xuan-Son Nguyen
e974923698
docs: listing qwen3-asr and qwen3-omni as supported ( #21857 )
...
* docs: listing qwen3-asr and qwen3-omni as supported
* nits
2026-04-13 22:28:17 +02:00
Xuan-Son Nguyen
920b3e78cb
mtmd: use causal attn for gemma 4 audio ( #21824 )
2026-04-13 09:47:55 +02:00
Sergiu
82764d8f40
mtmd: fix crash when sending image under 2x2 pixels ( #21711 )
2026-04-12 23:59:21 +02:00
Xuan-Son Nguyen
21a4933042
mtmd: qwen3 audio support (qwen3-omni and qwen3-asr) ( #19441 )
...
* add qwen3a
* wip
* vision ok
* no more deepstack for audio
* convert ASR model ok
* qwen3 asr working
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* nits
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* fix bad merge
* fix multi inheritance
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-12 23:57:25 +02:00
Xuan-Son Nguyen
aa4695c5e5
mtmd: add gemma 4 test (vision + audio) [no ci] ( #21806 )
...
* mtmd: add gemma 4 test (vision + audio)
* add to docs
2026-04-12 16:29:03 +02:00
Stephen Cox
547765a93e
mtmd: add Gemma 4 audio conformer encoder support ( #21421 )
...
* mtmd: add Gemma 4 audio conformer encoder support
Add audio processing for Gemma 4 E2B/E4B via a USM-style Conformer.
Architecture:
- 12-layer Conformer: FFN → Self-Attention → Causal Conv1D → FFN → Norm
- Subsampling Conv Projection: 2x Conv2D(stride=2) with LayerNorm
- Full self-attention with sinusoidal RPE and sliding window mask (24)
- Logit softcapping at 50.0, ClippableLinear clamping
- Output: 1024 → 1536 → RMSNorm → multimodal embedder
Mel preprocessing (dedicated mtmd_audio_preprocessor_gemma4a):
- HTK mel scale, 128 bins, magnitude STFT, mel_floor=1e-3
- Standard periodic Hann window (320 samples), zero-padded to FFT size
- Semicausal left-padding (frame_length/2 samples)
- Frame count matched to PyTorch (unfold formula)
- No pre-emphasis, no Whisper-style normalization
- Mel cosine similarity vs PyTorch: 0.9998
Key fixes:
- Tensor loading dedup: prevent get_tensor() from creating duplicate
entries in ctx_data. Fixed with std::set guard.
- ClippableLinear clamp_info loading moved after per-layer tensors.
- Sliding window mask (24 positions) matching PyTorch context_size.
- Skip Whisper normalization for Gemma4 mel output.
Tested on E2B and E4B with CPU and Vulkan backends.
Transcribes: "Glad to see things are going well and business is starting
to pick up" (matching ground truth).
Ref: #21325
2026-04-12 14:15:26 +02:00
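The logit softcapping at 50.0 listed in the Gemma 4 audio commit above can be sketched as below. This assumes the common tanh-based form `cap * tanh(x / cap)`; the commit does not spell out the formula, and `softcap` is an illustrative name rather than a function from the codebase.

```python
import math


def softcap(logit: float, cap: float = 50.0) -> float:
    """Smoothly squash a logit into the range [-cap, cap] using tanh.

    Assumed formulation: cap * tanh(logit / cap).
    """
    return cap * math.tanh(logit / cap)


# Small logits pass through almost unchanged; large ones saturate near +/-cap.
for x in (0.0, 10.0, 100.0, -1000.0):
    print(x, softcap(x))
```

Unlike a hard clamp, this keeps the mapping smooth and monotonic, so the ordering of attention logits is preserved while their magnitude stays bounded.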
Concedo
5361b45fba
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# requirements/requirements-tool_bench.txt
2026-04-12 16:22:26 +08:00
Sirui He
073bb2c20b
mtmd : add MERaLiON-2 multimodal audio support ( #21756 )
...
* mtmd : add MERaLiON-2 multimodal audio support
Adds support for A*STAR's MERaLiON-2 audio-language model (3B and 10B)
to the multimodal framework.
Architecture:
- Whisper large-v2 encoder for audio feature extraction
- Gated MLP adaptor: ln_speech -> frame stack (x15) -> Linear+SiLU -> GLU -> out_proj
- Gemma2 3B / 27B decoder
The mmproj GGUF is generated via convert_hf_to_gguf.py --mmproj on the full
MERaLiON-2 model directory (architecture: MERaLiON2ForConditionalGeneration).
The decoder is converted separately as a standard Gemma2 model after stripping
the `text_decoder.` weight prefix.
New projector type: PROJECTOR_TYPE_MERALION
Supports tasks: speech transcription (EN/ZH/MS/TA), translation, spoken QA.
Model: https://huggingface.co/MERaLiON/MERaLiON-2-3B
https://huggingface.co/MERaLiON/MERaLiON-2-10B
* simplify comments in meralion adaptor
* meralion: use format_tensor_name, ascii arrows in comments
2026-04-11 14:15:48 +02:00
Concedo
8b90bfe094
Merge commit '4ef9301e4d' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# docs/multimodal.md
# embd_res/ggml-vocab-gemma-4.gguf
# embd_res/ggml-vocab-gemma-4.gguf.inp
# embd_res/ggml-vocab-gemma-4.gguf.out
# ggml/src/ggml-sycl/fattn-tile.cpp
# ggml/src/ggml-sycl/fattn-tile.hpp
# ggml/src/ggml-sycl/fattn-vec.hpp
# ggml/src/ggml-sycl/fattn.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q8_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q8_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q8_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q8_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q8_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-f16.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q4_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q4_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q5_0.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q5_1.cpp
# ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q8_0.cpp
# tests/CMakeLists.txt
# tests/test-jinja.cpp
# tools/mtmd/CMakeLists.txt
2026-04-11 09:38:50 +08:00
Xuan-Son Nguyen
501aeed18f
mtmd: support dots.ocr ( #17575 )
...
* convert gguf
* clip impl
* fix conversion
* wip
* corrections
* update docs
* add gguf to test script
2026-04-09 12:16:38 +02:00
Concedo
c82c0b463a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# .github/workflows/release.yml
# examples/debug/debug.cpp
# ggml/src/ggml-cuda/common.cuh
# ggml/src/ggml-cuda/mmq.cuh
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# src/llama-vocab.cpp
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
# tools/mtmd/CMakeLists.txt
2026-04-09 17:45:04 +08:00
forforever73
09343c0198
model : support step3-vl-10b ( #21287 )
...
* feat: support step3-vl-10b
* use fused QKV && mapping tensor in tensor_mapping.py
* guard hardcoded params and drop crop metadata
* get understand_projector_stride from global config
* img_u8_resize_bilinear_to_f32 move in step3vl class
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* fix the \r\n mess
* add width and heads to MmprojModel.set_gguf_parameters
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-08 09:51:31 +02:00
Concedo
15d269197e
Merge commit '506200cf8b' into concedo_experimental
...
# Conflicts:
# docs/multimodal.md
# scripts/compare-llama-bench.py
# src/llama-vocab.cpp
# tools/llama-bench/README.md
# tools/llama-bench/llama-bench.cpp
2026-04-07 14:58:36 +08:00
Concedo
a395af65db
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-riscv.yml
# .github/workflows/build.yml
# ggml/src/ggml-hexagon/htp/argsort-ops.c
# ggml/src/ggml-sycl/fattn-tile.hpp
# tools/mtmd/CMakeLists.txt
2026-04-06 20:56:02 +08:00
Xuan-Son Nguyen
3979f2bb08
docs: add hunyuan-ocr gguf, also add test [no ci] ( #21490 )
2026-04-06 14:02:37 +02:00
anchortense
58190cc84d
llama : correct platform-independent loading of BOOL metadata ( #21428 )
...
* model-loader : fix GGUF bool array conversion
* model-loader : fix remaining GGUF bool pointer uses
2026-04-06 01:40:38 +02:00
Richard Davison
af76639f72
model : add HunyuanOCR support ( #21395 )
...
* HunyuanOCR: add support for text and vision models
- Add HunyuanOCR vision projector (perceiver-based) with Conv2d merge
- Add separate HUNYUAN_OCR chat template (content-before-role format)
- Handle HunyuanOCR's invalid pad_token_id=-1 in converter
- Fix EOS/EOT token IDs from generation_config.json
- Support xdrope RoPE scaling type
- Add tensor mappings for perceiver projector (mm.before_rms, mm.after_rms, etc.)
- Register HunYuanVLForConditionalGeneration for both text and mmproj conversion
* fix proper mapping
* Update gguf-py/gguf/tensor_mapping.py
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Update tools/mtmd/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* address comments
* update
* Fix typecheck
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-05 23:32:14 +02:00
Concedo
e8cffa37c8
fixed gemma4v image crashing on encode, however images are not yet working correctly
2026-04-03 15:56:35 +08:00
Concedo
17678748ac
fixed mtmd build
2026-04-03 14:41:24 +08:00
Concedo
34ad53e950
merged support for gemma4. the e2b, e4b and 26b work, the 31b does not
2026-04-03 11:07:46 +08:00