Commit graph

219 commits

Author SHA1 Message Date
Concedo
746664fde6 Merge commit '2cd20b72ed' into concedo_experimental
# Conflicts:
#	CONTRIBUTING.md
#	docs/backend/CANN.md
#	docs/backend/SYCL.md
#	docs/backend/snapdragon/README.md
#	docs/backend/snapdragon/windows.md
#	docs/build.md
#	docs/multimodal/MobileVLM.md
#	docs/ops.md
#	docs/ops/WebGPU.csv
#	examples/debug/README.md
#	examples/llama.vim
#	examples/model-conversion/README.md
#	examples/sycl/README.md
#	ggml/src/ggml-cpu/amx/mmq.cpp
#	ggml/src/ggml-cpu/arch/x86/repack.cpp
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp-drv.cpp
#	ggml/src/ggml-hexagon/htp/flash-attn-ops.c
#	ggml/src/ggml-hexagon/htp/hvx-base.h
#	ggml/src/ggml-hexagon/htp/hvx-copy.h
#	ggml/src/ggml-hexagon/htp/hvx-inverse.h
#	ggml/src/ggml-hexagon/htp/hvx-reduce.h
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-hexagon/htp/rope-ops.c
#	ggml/src/ggml-hexagon/htp/worker-pool.c
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cpy.cl
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/quants.hpp
#	ggml/src/ggml-sycl/softmax.cpp
#	ggml/src/ggml-vulkan/CMakeLists.txt
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/pr2wt.sh
#	scripts/server-bench.py
#	scripts/snapdragon/windows/run-cli.ps1
#	tests/test-alloc.cpp
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tools/cli/cli.cpp
#	tools/completion/README.md
#	tools/cvector-generator/cvector-generator.cpp
#	tools/imatrix/README.md
#	tools/perplexity/README.md
#	tools/server/public_simplechat/readme.md
#	tools/server/tests/README.md
2026-03-10 22:11:08 +08:00
Concedo
d20e60ddd5 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	docs/build.md
#	examples/batched/batched.cpp
#	examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp
#	examples/deprecation-warning/deprecation-warning.cpp
#	examples/eval-callback/eval-callback.cpp
#	examples/gen-docs/gen-docs.cpp
#	examples/gguf-hash/gguf-hash.cpp
#	examples/gguf/gguf.cpp
#	examples/lookahead/lookahead.cpp
#	examples/lookup/lookup-create.cpp
#	examples/lookup/lookup-merge.cpp
#	examples/lookup/lookup-stats.cpp
#	examples/lookup/lookup.cpp
#	examples/parallel/parallel.cpp
#	examples/passkey/passkey.cpp
#	examples/retrieval/retrieval.cpp
#	examples/save-load-state/save-load-state.cpp
#	examples/simple-chat/simple-chat.cpp
#	examples/simple/simple.cpp
#	examples/speculative-simple/speculative-simple.cpp
#	examples/speculative/speculative.cpp
#	examples/sycl/ls-sycl-device.cpp
#	examples/training/finetune.cpp
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-cpu/amx/common.h
#	ggml/src/ggml-cpu/kleidiai/kernels.cpp
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cvt.cl
#	ggml/src/ggml-opencl/kernels/gemv_noshuffle_general_q8_0_f32.cl
#	ggml/src/ggml-opencl/kernels/transpose.cl
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl
#	scripts/get-wikitext-2.sh
#	tests/test-backend-ops.cpp
#	tools/batched-bench/batched-bench.cpp
#	tools/cvector-generator/cvector-generator.cpp
#	tools/export-lora/export-lora.cpp
#	tools/imatrix/imatrix.cpp
#	tools/llama-bench/llama-bench.cpp
#	tools/perplexity/perplexity.cpp
#	tools/rpc/rpc-server.cpp
#	tools/tokenize/tokenize.cpp
2026-03-06 21:19:49 +08:00
JustCommitRandomness
2fbc3b2ae5
Adjust int types in format strings (#2009)
* tweak format sting types
This may not be all of them, but it's the ones which warn on OpenBSD

* complete the changes needed to fix the format string specifers

* avoid using inttypes, directly cast to size_t (u64 usually) instead

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2026-03-06 19:06:18 +08:00
Marcel Petrick
92f7da00b4
chore : correct typos [no ci] (#20041)
* fix(docs): correct typos found during code review

Non-functional changes only:
- Fixed minor spelling mistakes in comments
- Corrected typos in user-facing strings
- No variables, logic, or functional code was modified.

Signed-off-by: Marcel Petrick <mail@marcelpetrick.it>

* Update docs/backend/CANN.md

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

* Revert "Auxiliary commit to revert individual files from 846d1c301281178efbc6ce6060ad34c1ebe45af8"

This reverts commit 02fcf0c7db661d5ff3eff96b2b2db9fdb7213256.

* Update tests/test-backend-ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update tests/test-backend-ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Signed-off-by: Marcel Petrick <mail@marcelpetrick.it>
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-05 08:50:21 +01:00
Sigbjørn Skjæret
d969e933e1
tools : add missing clocale include in mtmd-cli [no ci] (#20107) 2026-03-04 14:18:04 +01:00
SamareshSingh
cb8f4fa3f8
Fix locale-dependent float printing in GGUF metadata (#17331)
* Set C locale for consistent float formatting across all binaries.

* Add C locale setting to all tools binaries

Add std::setlocale(LC_NUMERIC, "C") to all 16 binaries in the tools/
directory to ensure consistent floating-point formatting.

* Apply suggestion from @JohannesGaessler

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-03-04 09:30:40 +01:00
Concedo
4e358265a3 Merge commit '8387ffb28d' into concedo_experimental
# Conflicts:
#	docs/backend/VirtGPU.md
#	docs/backend/ZenDNN.md
#	ggml/src/ggml-cpu/amx/amx.cpp
#	ggml/src/ggml-cpu/amx/mmq.cpp
#	ggml/src/ggml-sycl/add-id.cpp
#	ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp
#	ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer-type.cpp
#	ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp
#	ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp
#	ggml/src/ggml-virtgpu/backend/backend-dispatched.gen.h
#	ggml/src/ggml-virtgpu/backend/backend-dispatched.h
#	ggml/src/ggml-virtgpu/backend/backend-virgl-apir.h
#	ggml/src/ggml-virtgpu/backend/backend.cpp
#	ggml/src/ggml-virtgpu/backend/shared/api_remoting.h
#	ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h
#	ggml/src/ggml-virtgpu/backend/shared/apir_backend.h
#	ggml/src/ggml-virtgpu/backend/shared/apir_cs.h
#	ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h
#	ggml/src/ggml-virtgpu/backend/shared/apir_cs_rpc.h
#	ggml/src/ggml-virtgpu/ggml-backend-buffer-type.cpp
#	ggml/src/ggml-virtgpu/ggml-backend-device.cpp
#	ggml/src/ggml-virtgpu/ggml-backend-reg.cpp
#	ggml/src/ggml-virtgpu/ggml-backend.cpp
#	ggml/src/ggml-virtgpu/ggml-remoting.h
#	ggml/src/ggml-virtgpu/include/apir_hw.h
#	ggml/src/ggml-virtgpu/regenerate_remoting.py
#	ggml/src/ggml-virtgpu/virtgpu-forward-backend.cpp
#	ggml/src/ggml-virtgpu/virtgpu-forward-buffer-type.cpp
#	ggml/src/ggml-virtgpu/virtgpu-forward-buffer.cpp
#	ggml/src/ggml-virtgpu/virtgpu-forward-device.cpp
#	ggml/src/ggml-virtgpu/virtgpu-forward-impl.h
#	ggml/src/ggml-virtgpu/virtgpu-forward.gen.h
#	ggml/src/ggml-virtgpu/virtgpu.cpp
#	ggml/src/ggml-virtgpu/virtgpu.h
#	ggml/src/ggml-zendnn/CMakeLists.txt
#	ggml/src/ggml-zendnn/ggml-zendnn.cpp
#	src/CMakeLists.txt
#	tests/CMakeLists.txt
#	tests/test-tokenizer-0.sh
#	tools/cli/README.md
#	tools/completion/README.md
#	tools/imatrix/imatrix.cpp
#	tools/server/README.md
2026-02-28 12:45:16 +08:00
Georgi Gerganov
37964f44f9
mtmd : fix padding of n_tokens (#19930) 2026-02-26 18:39:49 +02:00
Concedo
e626de2430 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	docs/ops.md
#	docs/ops/WebGPU.csv
#	embd_res/templates/stepfun-ai-Step-3.5-Flash.jinja
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
#	src/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tools/mtmd/CMakeLists.txt
2026-02-20 15:16:26 +08:00
Concedo
9eb9e4eb83 Merge commit '8a70973557' into concedo_experimental
# Conflicts:
#	docs/backend/CANN.md
#	docs/backend/SYCL.md
#	examples/model-conversion/scripts/utils/tensor-info.py
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/expm1.cl
#	ggml/src/ggml-opencl/kernels/mean.cl
#	ggml/src/ggml-opencl/kernels/softplus.cl
#	ggml/src/ggml-opencl/kernels/sum_rows.cl
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py
#	ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/scale.wgsl
#	tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte
2026-02-20 14:36:49 +08:00
megemini
237958db33
model: Add PaddleOCR-VL model support (#18825)
* support PaddleOCR-VL

* clip: update PaddleOCR model loader parameters to prevent OOM during warmup

* [update] add paddleocr vl text model instead of ernie4.5

* [update] restore change of minicpmv

* [update] format

* [update] format

* [update] positions and patch merge permute

* [update] mtmd_decode_use_mrope for paddleocr

* [update] image min/max pixels

* [update] remove set_limit_image_tokens

* upate: preprocess without padding

* clean up

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-19 17:05:25 +01:00
Saba Fallah
e6267a9359
mtmd: build_attn modified, flash_attn on/off via ctx_params (#19729) 2026-02-19 13:50:29 +01:00
Xuan-Son Nguyen
eeef3cfced
model: support GLM-OCR (#19677)
* model: support GLM-OCR

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-18 17:51:40 +01:00
Concedo
72f7e01b27 Merge commit '01d8eaa28d' into concedo_experimental
# Conflicts:
#	build-xcframework.sh
#	scripts/sync_vendor.py
#	tests/test-backend-ops.cpp
#	tools/mtmd/CMakeLists.txt
#	tools/rpc/rpc-server.cpp
2026-02-16 15:36:59 +08:00
Anav Prasad
01d8eaa28d
mtmd : Add Nemotron Nano 12B v2 VL support (#19547)
* nemotron nano v2 vlm support added

* simplified code; addressed reviews

* pre-downsample position embeddings during GGUF conversion for fixed input size
2026-02-14 14:07:00 +01:00
Concedo
55524e160b temp merge, not working 2026-02-13 12:11:26 +08:00
Concedo
261d78eaaa Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	README.md
#	docs/speculative.md
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tools/mtmd/clip.cpp
2026-02-12 18:05:20 +08:00
AesSedai
e463bbdf65
model: Add Kimi-K2.5 support (#19170)
* Move dequant_model to after the text_config merge
Add new kimi-k2.5 keys to mtmd convert
Update V_MMPROJ tensor mapping for new mm_projector.proj keys
Update V_M_IMP_NORM for new mm_projector.pre_norm key

* Fix a couple of oversights

* Add image support for Kimi-K2.5

* Revert changes to KimiVLForConditionalGeneration

* Fix an assert crash

* Fix permute swapping w / h on accident

* Kimi-K2.5: Use merged QKV for vision

* Kimi-K2.5: pre-convert vision QK to use build_rope_2d

* Kimi-K2.5: support non-interleaved rope for vision

* Kimi-K2.5: fix min / max pixel

* Kimi-K2.5: remove v/o permutes, unnecessary

* Kimi-K2.5: update permute name to match

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Kimi-K2.5: replace build_rope_2d ggml_cont with ggml_view_3d pointers

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-11 16:47:30 +01:00
JJJYmmm
fc0fe40049
models : support qwen3.5 series (#19468)
* support qwen3.5 series

* remove deepstack for now, and some code clean

* code clean

* add FULL_ATTENTION_INTERVAL metadata

* code clean

* reorder v heads for linear attention to avoid expensive interleaved repeat
2026-02-10 18:00:26 +02:00
Tarek Dakhran
262364e31d
mtmd: Implement tiling for LFM2-VL (#19454) 2026-02-09 17:30:32 +01:00
Concedo
7b393fa487 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	AUTHORS
#	ci/run.sh
#	docs/backend/SYCL.md
#	docs/build.md
#	docs/multimodal/minicpmo2.6.md
#	docs/multimodal/minicpmo4.0.md
#	docs/multimodal/minicpmv2.5.md
#	docs/multimodal/minicpmv2.6.md
#	docs/multimodal/minicpmv4.0.md
#	docs/multimodal/minicpmv4.5.md
#	docs/ops.md
#	docs/ops/SYCL.csv
#	docs/speculative.md
#	examples/deprecation-warning/README.md
#	examples/deprecation-warning/deprecation-warning.cpp
#	examples/model-conversion/Makefile
#	examples/model-conversion/scripts/causal/convert-model.sh
#	ggml/include/ggml-cann.h
#	ggml/src/ggml-cann/acl_tensor.cpp
#	ggml/src/ggml-cann/acl_tensor.h
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-cann/common.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-metal/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/concat.cl
#	ggml/src/ggml-opencl/kernels/repeat.cl
#	ggml/src/ggml-opencl/kernels/scale.cl
#	ggml/src/ggml-opencl/kernels/tanh.cl
#	ggml/src/ggml-sycl/CMakeLists.txt
#	ggml/src/ggml-sycl/dpct/helper.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/outprod.cpp
#	ggml/src/ggml-sycl/rope.cpp
#	ggml/src/ggml-sycl/wkv.cpp
#	src/llama-vocab.cpp
#	tests/test-autorelease.cpp
#	tests/test-backend-ops.cpp
#	tools/cvector-generator/pca.hpp
#	tools/export-lora/export-lora.cpp
#	tools/perplexity/README.md
2026-02-03 19:00:42 +08:00
Xuan-Son Nguyen
07a7412a3b
mtmd: add min/max pixels gguf metadata (#19273) 2026-02-02 20:59:06 +01:00
Concedo
ddce19db72 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/nix/package-gguf-py.nix
#	.devops/nix/scope.nix
#	common/CMakeLists.txt
#	docs/backend/SYCL.md
#	examples/lookahead/lookahead.cpp
#	examples/lookup/lookup.cpp
#	examples/sycl/run-llama2.sh
#	examples/sycl/win-run-llama2.bat
#	examples/sycl/win-test.bat
#	ggml/src/ggml-hexagon/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/flash-attn-ops.c
#	ggml/src/ggml-hexagon/htp/hvx-dump.h
#	ggml/src/ggml-hexagon/htp/hvx-reduce.h
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-hexagon/htp/softmax-ops.c
#	ggml/src/ggml-hexagon/htp/unary-ops.c
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cvt.cl
#	scripts/sync-ggml.last
2026-02-01 22:35:25 +08:00
tc-mb
ec6c7421e4
mtmd: support MiniCPM-o 4.5(vision only) (#19211)
Some checks failed
Python Type-Check / pyright type-check (push) Has been cancelled
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
2026-01-30 23:19:30 +01:00
Concedo
e8e7c357c9 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build-cache.yml
#	.github/workflows/build-cmake-pkg.yml
#	.github/workflows/build-linux-cross.yml
#	.github/workflows/build.yml
#	.github/workflows/check-vendor.yml
#	.github/workflows/close-issue.yml
#	.github/workflows/copilot-setup-steps.yml
#	.github/workflows/docker.yml
#	.github/workflows/editorconfig.yml
#	.github/workflows/gguf-publish.yml
#	.github/workflows/labeler.yml
#	.github/workflows/pre-tokenizer-hashes.yml
#	.github/workflows/python-check-requirements.yml
#	.github/workflows/python-lint.yml
#	.github/workflows/python-type-check.yml
#	.github/workflows/release.yml
#	.github/workflows/server-webui.yml
#	.github/workflows/server.yml
#	.github/workflows/update-ops-docs.yml
#	.github/workflows/winget.yml
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-zdnn/ggml-zdnn.cpp
#	requirements/requirements-tool_bench.txt
#	src/CMakeLists.txt
#	src/llama-quant.cpp
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tools/cli/cli.cpp
#	tools/server/README.md
2026-01-23 14:27:04 +08:00
Xuan-Son Nguyen
9eb5bfec1a
mtmd : update docs to use llama_model_n_embd_inp (#18999) 2026-01-22 14:36:32 +01:00
Concedo
8855a7f52b Merge commit 'c945aaaef2' into concedo_experimental
# Conflicts:
#	.devops/cann.Dockerfile
#	.github/workflows/build.yml
#	.github/workflows/release.yml
#	README.md
#	common/CMakeLists.txt
#	common/chat.cpp
#	docs/function-calling.md
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-cann/common.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	models/templates/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16.jinja
#	scripts/sync_vendor.py
#	tests/CMakeLists.txt
#	tests/peg-parser/tests.h
#	tests/test-chat-peg-parser.cpp
#	tests/test-chat-template.cpp
#	tests/test-chat.cpp
#	tests/testing.h
#	tools/llama-bench/llama-bench.cpp
2026-01-17 10:24:03 +08:00
Concedo
22af5f1250 Merge commit '2a13180100' into concedo_experimental
# Conflicts:
#	.devops/cann.Dockerfile
#	.devops/cpu.Dockerfile
#	.devops/cuda-new.Dockerfile
#	.devops/cuda.Dockerfile
#	.devops/intel.Dockerfile
#	.devops/llama-cli-cann.Dockerfile
#	.devops/musa.Dockerfile
#	.devops/nix/package.nix
#	.devops/rocm.Dockerfile
#	.devops/s390x.Dockerfile
#	.devops/vulkan.Dockerfile
#	.github/workflows/build-cmake-pkg.yml
#	.github/workflows/build-linux-cross.yml
#	.github/workflows/build.yml
#	.github/workflows/copilot-setup-steps.yml
#	.github/workflows/release.yml
#	.github/workflows/server-webui.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	README.md
#	build-xcframework.sh
#	ci/run.sh
#	cmake/common.cmake
#	common/CMakeLists.txt
#	docs/backend/hexagon/CMakeUserPresets.json
#	docs/backend/hexagon/README.md
#	docs/build-riscv64-spacemit.md
#	docs/build.md
#	examples/debug/debug.cpp
#	examples/eval-callback/CMakeLists.txt
#	examples/eval-callback/eval-callback.cpp
#	examples/llama.android/lib/build.gradle.kts
#	examples/sycl/build.sh
#	examples/sycl/win-build-sycl.bat
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/act-ops.c
#	ggml/src/ggml-hexagon/htp/binary-ops.c
#	ggml/src/ggml-hexagon/htp/flash-attn-ops.c
#	ggml/src/ggml-hexagon/htp/get-rows-ops.c
#	ggml/src/ggml-hexagon/htp/hex-dma.c
#	ggml/src/ggml-hexagon/htp/hex-dma.h
#	ggml/src/ggml-hexagon/htp/htp-ctx.h
#	ggml/src/ggml-hexagon/htp/htp-msg.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/hvx-utils.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-hexagon/htp/rope-ops.c
#	ggml/src/ggml-hexagon/htp/set-rows-ops.c
#	ggml/src/ggml-hexagon/htp/softmax-ops.c
#	ggml/src/ggml-hexagon/htp/unary-ops.c
#	ggml/src/ggml-hexagon/htp/worker-pool.c
#	scripts/debug-test.sh
#	scripts/serve-static.js
#	scripts/snapdragon/adb/run-bench.sh
#	scripts/snapdragon/adb/run-cli.sh
#	scripts/snapdragon/adb/run-mtmd.sh
#	scripts/snapdragon/adb/run-tool.sh
#	scripts/tool_bench.py
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tools/mtmd/clip.cpp
2026-01-16 21:52:01 +08:00
Tarek Dakhran
c945aaaef2
mtmd : Fix ASR for LFM2.5-Audio-1.5B (#18876) 2026-01-16 11:23:08 +01:00
Piotr Wilkin (ilintar)
d98b548120
Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914)
* Extract common debugging functions; plug eval-callback and mtmd's MTMD_DEBUG_GRAPH with same functionality

* Move to common

* Remove unneeded header

* Unlink from common

* chore: update webui build output

* Cleanup; properly pass params to mtmd without depending on common; factorize debug.cpp to use common debug code.

* Revert change to webapp

* Post-merge adjust

* Apply suggestions from code review

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Apply code review changes

* Remove changes to server-context

* Remove mtmd.h include

* Remove utility functions from header

* Apply suggestions from code review

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Rename functions

* Update tools/mtmd/clip.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Update tools/mtmd/clip.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Update tools/mtmd/clip.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-01-14 20:29:35 +01:00
Concedo
7d2c1c4f46 note: clip_is_mrope was moved to mtmd_decode_use_mrope upstream and no longer syncs since https://github.com/ggml-org/llama.cpp/pull/18793
Merge commit 'c1e79e610f' into concedo_experimental

# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/release.yml
#	CMakeLists.txt
#	CONTRIBUTING.md
#	MIT_LICENSE_GGML_SDCPP_LLAMACPP_ONLY.md
#	README.md
#	SECURITY.md
#	ci/run.sh
#	common/CMakeLists.txt
#	common/arg.cpp
#	docs/ops.md
#	docs/ops/BLAS.csv
#	docs/ops/zDNN.csv
#	docs/preset.md
#	examples/batched/batched.cpp
#	examples/debug/debug.cpp
#	ggml/src/ggml-blas/CMakeLists.txt
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	licenses/LICENSE-curl
#	licenses/LICENSE-httplib
#	scripts/pr2wt.sh
#	scripts/sync_vendor.py
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tools/cli/README.md
#	tools/completion/README.md
#	tools/llama-bench/llama-bench.cpp
#	tools/server/README.md
#	vendor/cpp-httplib/LICENSE
2026-01-13 23:31:14 +08:00
Concedo
0dc18c668c Merge commit 'a61c8bc3bf' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/pr2wt.sh
#	src/llama-model.cpp
#	tools/CMakeLists.txt
#	tools/mtmd/CMakeLists.txt
#	tools/mtmd/clip.cpp
#	tools/mtmd/clip.h
2026-01-13 23:06:50 +08:00
Xuan-Son Nguyen
e047f9ee9d
mtmd: fix use_non_causal being reported incorrectly (#18793)
* mtmd: fix use_non_causal being reported incorrectly

* move clip_is_mrope to mtmd_decode_use_mrope

* fix sloppy code ggml_cpy
2026-01-13 12:19:38 +01:00
Simranjeet Singh
a61c8bc3bf
mtmd: Add Gemma3n multimodal support with MobileNetV5 vision encoder (#18256)
* Add Gemma3nVisionModel - MobileNetV5 vision encoder convertor to convert_hf_to_gguf.py. Add gemma3n to vision projectors in gguf-py/gguf/constants.py.

* Add mobilenetv5 impl

* Fix comments, remove unused vars

* Fix permute and remove transpose of projection weights

* Fix comments, remove debugging prints from hf_to_gguf

* 1. Hard-code image_mean = 0 and image_std = 1
2. Use available tensor mapping logic
3. Remove redundant chat template replacement of soft tokens placeholder with media placeholder

* 1. Move mobilenetv5 helpers declarations to `clip_graph_mobilenetv5` struct and definitions to mobilenetv5.cpp
2.Remove unused `clip_is_gemma3n` func declarations and definitions
3. Remove redundant `rescale_image_u8_to_f32` func and use `normalize_image_u8_to_f32` with zero mean and unit std
4. Calculate n_patches using image_size / patch_size

* Remove obsolete comments

* - convert_hf_to_gguf.py & constants.py & tensor_mapping.py: Use explicit mapping: Custom map for double indexed blocks and tensor_mapping.py for rest
- convert_hf_to_gguf.py: Unsqueeze Stem Bias and Layer scale tensors to correct shape while converting to gguf
- mobilenetv5.cpp: Remove explicit reshaping of Stem Bias and Layer scale which are now handled while converting to gguf, replace fprintf with LOG_*
- clip.cpp: Remove unused embedding and hard_emb_norm tensor loading

* - Rename tensors to v.conv..., v.blk..., v.msfa... to better align with already existing terminology

* Fix stem conv bias name

* Remove explicit handling of bias term for stem conv

* - Change order of addition in "project_per_layer_inputs" to support broadcasting of vision inp_per_layer
- Simplify the vision embeddings path of "get_per_layer_inputs" to output [n_embd_altup, n_layer, 1], broadcastable

* clean up conversion script

* fix code style

* also preserve audio tensors

* trailing space

* split arch A and V

* rm unused gemma3 func

* fix alignment

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-01-09 23:42:38 +01:00
Concedo
956ab99934 Merge commit '56d2fed2b3' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.gitignore
#	README.md
#	examples/CMakeLists.txt
#	examples/debug/CMakeLists.txt
#	examples/model-conversion/scripts/causal/compare-logits.py
#	examples/model-conversion/scripts/causal/run-casual-gen-embeddings-org.py
#	examples/model-conversion/scripts/causal/run-converted-model-embeddings-logits.sh
#	examples/model-conversion/scripts/causal/run-converted-model.sh
#	examples/model-conversion/scripts/causal/run-org-model.py
#	examples/model-conversion/scripts/embedding/run-converted-model.sh
#	examples/model-conversion/scripts/embedding/run-original-model.py
#	examples/model-conversion/scripts/utils/common.py
#	examples/model-conversion/scripts/utils/semantic_check.py
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/common.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/htp-ctx.h
#	ggml/src/ggml-hexagon/htp/htp-msg.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/hvx-utils.c
#	ggml/src/ggml-hexagon/htp/hvx-utils.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-hexagon/htp/softmax-ops.c
#	ggml/src/ggml-hexagon/htp/unary-ops.c
#	scripts/snapdragon/adb/run-bench.sh
#	tests/test-arg-parser.cpp
#	tools/CMakeLists.txt
2026-01-09 00:30:53 +08:00
Tarek Dakhran
ccbc84a537
mtmd: mtmd_audio_streaming_istft (#18645)
Change is decoupled from https://github.com/ggml-org/llama.cpp/pull/18641.

[LFM2.5-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B)
needs streaming istft for generating output audio.

* add streaming ISTFT class (`mtmd_audio_streaming_istft`) with overlap-add for audio reconstruction
* replace global audio cache with per-instance cache, the model requires
  two independent caches, for preprocessing (audio input) and for istft
  (audio output).
* unified templated FFT/IFFT implementation supporting both forward and inverse transforms
2026-01-06 21:00:29 +01:00
Concedo
4889f3a11d Merge commit '67e3f6f601' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/release.yml
#	.github/workflows/server.yml
#	ci/run.sh
#	docs/backend/CANN.md
#	docs/backend/OPENCL.md
#	examples/batched/batched.cpp
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cuda/CMakeLists.txt
#	src/llama-context.cpp
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
2026-01-05 20:52:20 +08:00
Tarek Dakhran
4974bf53cf
model : mtmd : make input norm optional in LFM2-VL (#18594)
Upcoming LFM2-VL releases will have configurable input norm.
See https://github.com/huggingface/transformers/pull/43087 for details.
2026-01-04 18:50:02 +01:00
Concedo
7e1ae49e7d Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-cuda/ggml-cuda.cu
#	tests/test-backend-ops.cpp
#	tools/mtmd/CMakeLists.txt
2026-01-02 11:05:20 +08:00
tt
ced765be44
model: support youtu-vl model (#18479)
* Support Youtu-VL Model

* merge code

* fix bug

* revert qwen2 code & support rsplit in minja.hpp

* update warm info

* fix annotation

* u

* revert minja.hpp

* fix

* Do not write routed_scaling_factor to gguf when routed_scaling_factor is None

* fix expert_weights_scale

* LGTM after whitespace fixes

* fix

* fix

* fix

* layers to layer_index

* enum fix

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-01 19:25:54 +01:00
Concedo
54e419f587 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/docker.yml
#	docs/ops.md
#	docs/ops/Metal.csv
#	ggml/CMakeLists.txt
#	ggml/src/ggml-sycl/CMakeLists.txt
#	grammars/README.md
#	models/templates/llama-cpp-deepseek-r1.jinja
#	scripts/sync-ggml.last
#	tests/test-chat.cpp
2026-01-01 15:34:10 +08:00
Henry147147
9b8329de7a
mtmd : Adding support for Nvidia Music Flamingo Model (#18470)
* Inital commit, debugging q5_k_s quant

* Made hf_to_gguf extend whisper to reduce code duplication

* addressed convert_hf_to_gguf pull request issue

---------

Co-authored-by: Henry D <henrydorsey147@gmail.com>
2025-12-31 12:13:23 +01:00
Concedo
0e26e4d354 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/ISSUE_TEMPLATE/010-bug-compilation.yml
#	.github/ISSUE_TEMPLATE/011-bug-results.yml
#	.github/ISSUE_TEMPLATE/019-bug-misc.yml
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cuda/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-rpc/ggml-rpc.cpp
2025-12-28 23:47:55 +08:00
Xuan-Son Nguyen
cffa5c46ea
mtmd: clarify that we no longer accept AI-generated PRs (#18406) 2025-12-28 09:57:04 +01:00
Concedo
51b1d12914 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	tests/test-backend-ops.cpp
#	tools/mtmd/CMakeLists.txt
2025-12-19 11:11:19 +08:00
Xuan-Son Nguyen
8ea958d4d9
model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106)
* ASR with LFM2-Audio-1.5B

* Set rope_theta

* Fix comment

* Remove rope_theta setting

* Address PR feedback

* rename functions to conformer

* remove some redundant ggml_cont

* fix missing tensor

* add prefix "a." for conv tensors

* remove redundant reshape

* clean up

* add test model

---------

Co-authored-by: Tarek Dakhran <tarek@liquid.ai>
2025-12-19 00:18:01 +01:00
Concedo
e005fc2587 Merge commit '8dcc3662a2' into concedo_experimental
Keep changes from https://github.com/ggml-org/llama.cpp/pull/18096 without https://github.com/ggml-org/llama.cpp/pull/14904
Reason is to maintain compatibility with 2023 w64devkit

# Conflicts:
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# examples/model-conversion/scripts/causal/run-org-model.py
# examples/speculative/speculative.cpp
# ggml/src/ggml-cpu/arch-fallback.h
# ggml/src/ggml-cpu/repack.cpp
# ggml/src/ggml-cpu/repack.h
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/act-ops.c
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/hvx-utils.c
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
2025-12-19 02:11:55 +08:00
Concedo
cefb32df19 track clip img patch nx and ny 2025-12-18 22:58:10 +08:00
HonestQiao
15dd67d869
model: fix GLM-ASR-Nano-2512 load error (#18130) (#18142) 2025-12-17 16:34:35 +01:00
Concedo
1f2c9f6b62 gpt4v not working correctly 2025-12-17 21:02:16 +08:00