Commit graph

75 commits

Author SHA1 Message Date
Concedo
af327857ec handle loading very old mmproj that broke after https://github.com/ggml-org/llama.cpp/pull/14928 2025-11-02 02:11:17 +08:00
Concedo
2b00e55356 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/docker.yml
#	ggml/src/ggml-opencl/kernels/mul_mm_f16_f32_l4_lm.cl
#	ggml/src/ggml-opencl/kernels/mul_mm_f32_f32_l4_lm.cl
#	ggml/src/ggml-sycl/rope.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/rope.tmpl.wgsl
#	requirements/requirements-convert_legacy_llama.txt
#	tests/test-backend-ops.cpp
#	tests/test-rope.cpp
#	tools/server/README.md
2025-10-31 10:52:57 +08:00
JJJYmmm
d261223d24
model: add support for qwen3vl series (#16780)
* support qwen3vl series.

Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com>
Co-authored-by: yairpatch <yairpatch@users.noreply.github.com>
Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com>

* bugfix: fix the arch check for qwen3vl-moe.

* use build_ffn

* optimize deepstack structure

* optimize deepstack feature saving

* Revert "optimize deepstack feature saving" for temporal fix

This reverts commit f321b9fdf13e59527408152e73b1071e19a87e71.

* code clean

* use fused qkv in clip

* clean up / rm is_deepstack_layers for simplification

* add test model

* move test model to "big" section

* fix imrope check

* remove trailing whitespace

* fix rope fail

* metal : add imrope support

* add imrope support for sycl

* vulkan: add imrope w/o check

* fix vulkan

* webgpu: add imrope w/o check

* Update gguf-py/gguf/tensor_mapping.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix tensor mapping

---------

Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com>
Co-authored-by: yairpatch <yairpatch@users.noreply.github.com>
Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-30 16:19:14 +01:00
Tianyue-Zhao
bacddc049a
model: Add support for CogVLM model (#15002)
* Added GGUF mappings for CogVLM model

* Add tensor mapping for CogVLM visual encoder

* Add CogVLM to conversion script, no vision part yet

* Added CogVLM vision model to conversion script

* Add graph for CogVLM CLIP model

* Add graph for CogVLM

* Fixes for CogVLM. Now compiles.

* Model now runs

* Fixes for cogvlm graph

* Account for graph context change after rebase

* Changes for whitespace

* Changes in convert script according to comments

* Switch CogVLM LLM graph to merged QKV tensor

* Use rope_type variable instead of direct definition

* Change CogVLM CLIP encoder to use SWIGLU

* Switch CogVLM CLIP to use merged QKV

* Apply rebase edits and remove ggml_cont call that is now unnecessary

* clean up

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-10-30 12:18:50 +01:00
Concedo
16cbe9f24e Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	CODEOWNERS
#	docs/ops.md
#	docs/ops/SYCL.csv
#	examples/embedding/README.md
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-sycl/backend.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/norm.cpp
#	ggml/src/ggml-sycl/norm.hpp
#	scripts/snapdragon/adb/run-bench.sh
#	scripts/snapdragon/adb/run-cli.sh
#	src/llama-batch.cpp
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tools/llama-bench/README.md
2025-10-30 13:44:46 +08:00
Concedo
472438aad3 Merge commit '5a4ff43e7d' into concedo_experimental
# Conflicts:
#	docs/build.md
#	ggml/src/ggml-hip/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	src/llama-context.cpp
#	tests/test-backend-ops.cpp
2025-10-30 13:13:00 +08:00
Xuan-Son Nguyen
e1ab084803
mtmd : fix idefics3 preprocessing (#16806)
* mtmd : fix idefics3 preprocessing

* disable granite test

* fix test for granite
2025-10-27 23:12:16 +01:00
Xuan-Son Nguyen
c55d53acec
model : add LightOnOCR-1B model (#16764)
* model : add LightOnOCR-1B model

* add test
2025-10-27 16:02:58 +01:00
Concedo
85556118b5 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-cann/acl_tensor.cpp
#	ggml/src/ggml-cann/acl_tensor.h
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-cann/common.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/element_wise.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/presets.hpp
2025-10-18 10:56:55 +08:00
Xuan-Son Nguyen
1bb4f43380
mtmd : support home-cooked Mistral Small Omni (#14928) 2025-10-16 19:00:31 +02:00
Concedo
bb5cef1756 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/nix/package.nix
#	ci/run.sh
#	ggml/src/ggml-cpu/amx/amx.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/rms_norm.wgsl
#	tools/server/README.md
2025-10-06 22:41:46 +08:00
Gabe Goodhart
ca71fb9b36
model : Granite docling + Idefics3 preprocessing (SmolVLM) (#16206)
* feat: Add granite-docling conversion using trillion pretokenizer

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Add granite-docling vocab pre enum

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Use granite-docling pre

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Add clip_is_idefics3

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Allow multi-token boundary sequences for image templating

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Add tiling support for idefices3 in clip.cpp

This should likely be moved into llava_uhd::get_slice_instructions, but for
now this avoids disrupting the logic there.

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Partial support for full templating for idefics3 in mtmd

There are still errors encoding some of the image chunks, but the token
sequence now matches transformers _almost_ perfectly, except for the double
newline before the global image which shows up as two consecutive newline
tokens instead of a single double-newline token. I think this is happening
because the blocks are tokenized separately then concatenated.

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Fully working image preprocessing for idefics3 w/ resize and slicing

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: Parse the preprocessor config's longest side and add it to the mmproj hparams

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Use the longest side instead of size * scale_factor

For Granite Docling, these come out to the same value, but that was just a
conicidence.

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Allow batch encoding and remove clip_is_idefics3

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* refactor: Remove unnecessary conditionals for empty token vectors

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* refactor: Use image_manipulation util

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* add test model

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-10-05 14:57:47 +02:00
Concedo
b120e107f9 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.clang-tidy
#	.devops/musa.Dockerfile
#	.github/workflows/build-linux-cross.yml
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	.gitignore
#	CODEOWNERS
#	CONTRIBUTING.md
#	README.md
#	build-xcframework.sh
#	ci/README-MUSA.md
#	ci/run.sh
#	common/CMakeLists.txt
#	docs/docker.md
#	examples/CMakeLists.txt
#	examples/eval-callback/CMakeLists.txt
#	examples/model-conversion/Makefile
#	examples/model-conversion/README.md
#	examples/model-conversion/logits.cpp
#	examples/model-conversion/scripts/causal/compare-logits.py
#	examples/model-conversion/scripts/causal/run-org-model.py
#	examples/model-conversion/scripts/embedding/compare-embeddings-logits.sh
#	examples/model-conversion/scripts/embedding/run-converted-model.sh
#	examples/model-conversion/scripts/embedding/run-original-model.py
#	examples/model-conversion/scripts/utils/check-nmse.py
#	examples/model-conversion/scripts/utils/inspect-org-model.py
#	examples/model-conversion/scripts/utils/semantic_check.py
#	ggml/CMakeLists.txt
#	ggml/include/ggml-zdnn.h
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/set_rows.cl
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/set_rows.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-zdnn/ggml-zdnn.cpp
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-quantize-perf.cpp
#	tests/test-tokenizers-repo.sh
#	tools/perplexity/perplexity.cpp
#	tools/server/tests/README.md
2025-09-27 17:09:14 +08:00
Aleksei Nikiforov
cc1cfa277b
mtmd : fix uninitialized variable in bicubic_resize (#16275)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>
2025-09-26 15:00:44 +02:00
Diego Devesa
50f4281a6f
llama : allow using iGPUs with --device (#15951)
* llama : allow using iGPUs with --device

* mtmd : allow iGPU

* rpc-server : allow iGPU
2025-09-13 16:49:49 +02:00
Concedo
575eb40950 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	docs/multimodal/minicpmv4.0.md
#	examples/model-conversion/Makefile
#	examples/model-conversion/README.md
#	examples/model-conversion/logits.cpp
#	examples/model-conversion/scripts/causal/modelcard.template
#	examples/model-conversion/scripts/utils/hf-create-model.py
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	tests/test-backend-ops.cpp
#	tools/batched-bench/batched-bench.cpp
2025-08-26 19:09:48 +08:00
Xuan-Son Nguyen
79a546220c
mtmd : support Kimi VL model (#15458)
Some checks failed
Python check requirements.txt / check-requirements (push) Has been cancelled
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Has been cancelled
Python Type-Check / pyright type-check (push) Has been cancelled
* convert : fix tensor naming conflict for llama 4 vision

* convert ok

* support kimi vision model

* clean up

* fix style

* fix calc number of output tokens

* refactor resize_position_embeddings

* add test case

* rename build fn

* correct a small bug
2025-08-26 12:54:19 +02:00
tc-mb
c4e9239064
model : support MiniCPM-V 4.5 (#15575) 2025-08-26 10:05:55 +02:00
Concedo
8b8396c30c Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	docs/build-s390x.md
#	examples/llama.vim
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/common.h
#	scripts/compare-llama-bench.py
#	src/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tools/llama-bench/README.md
#	tools/llama-bench/llama-bench.cpp
#	tools/server/README.md
2025-08-23 11:35:28 +08:00
Concedo
257992d6b8 possibly unstable, needs testing for fa 2025-08-22 17:35:32 +08:00
Tarek Dakhran
e288693669
readme : model : mtdm : lfm2 improvements (#15476)
* Support untied embeddings

* Increase number of image tokens to 1024

* Add LFM2-VL to readme

* Actually use untied embeddings
2025-08-22 09:29:08 +02:00
Michael Giba
b108e42904
ci : fix -Werror=return-type in clip.cpp so ci/run.sh can run without issue (#15221)
* Fix -Werror=return-type so ci/run.sh can run

* Update tools/mtmd/clip.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* Remove false now that we have abort

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-08-21 12:06:46 +02:00
Concedo
1c41c38a6a Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/cuda.Dockerfile
#	CODEOWNERS
#	README.md
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/common.h
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	scripts/sync-ggml.sh
#	tests/test-chat.cpp
#	tools/batched-bench/batched-bench.cpp
#	tools/mtmd/clip.h
2025-08-20 20:34:45 +08:00
Concedo
140ae92886 Merge commit '65349f26f2' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	tests/test-backend-ops.cpp
2025-08-19 11:35:32 +08:00
Xuan-Son Nguyen
f08c4c0d8d
mtmd : clean up clip_n_output_tokens (#15391) 2025-08-18 22:53:52 +02:00
Sigbjørn Skjæret
baa9255a45
llama : merge conts and reshapes and remove unnecessary cont (#15380)
* remove unnecessary conts and merge reshapes

* restore necessary conts

* merge more conts and reshapes

* merge even more conts and reshapes
2025-08-18 19:30:17 +02:00
Tarek Dakhran
65349f26f2
model : support vision LiquidAI LFM2-VL family (#15347)
* wip lfm2 vision model

* Fix conv weight

* Implement dynamic resolution

* Fix cuda

* support LFM2-VL-450M

* happy CI

* Remove extra `ggml_conv` and put others into the right place

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-08-16 23:33:54 +02:00
Concedo
d5876024ec Merge commit 'f4586ee598' into concedo_experimental
# Conflicts:
#	README.md
#	docs/multimodal/minicpmo2.6.md
#	docs/multimodal/minicpmv2.6.md
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
#	ggml/src/ggml-cuda/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/add.cl
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tools/perplexity/perplexity.cpp
#	tools/server/README.md
2025-08-14 21:29:52 +08:00
rainred
cf9e5648a7
mtmd : Fix MinicpmV model converter and clip to avoid using hardcode. (#14750)
* Fix MinicpmV model converter and clip to avoid using hardcode.

* Code update for pr/14750

* Remove unused field, update script path in docs.

* Add version 5 for fallback code.

---------

Co-authored-by: lzhang <zhanglei@modelbest.cn>
2025-08-11 16:12:12 +02:00
Concedo
f430916a71 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	docs/backend/CANN.md
#	docs/multimodal/minicpmo2.6.md
#	docs/multimodal/minicpmv2.5.md
#	docs/multimodal/minicpmv2.6.md
#	examples/speculative-simple/speculative-simple.cpp
#	ggml/cmake/ggml-config.cmake.in
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cpu/repack.cpp
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/add.cl
#	ggml/src/ggml-opencl/kernels/mul.cl
#	scripts/compare-commits.sh
#	scripts/compare-llama-bench.py
#	scripts/sync-ggml.last
#	tools/server/README.md
2025-08-02 10:25:10 +08:00
tc-mb
952a47f455
mtmd : support MiniCPM-V 4.0 (#14983)
* support minicpm-v 4

* add md

* support MiniCPM-o 4.0

* add default location

* temp rm MiniCPM-o 4.0

* fix code

* fix "minicpmv_projector" default path
2025-07-31 17:22:17 +02:00
Concedo
3284757b56 voxstral mini is really bad 2025-07-29 21:22:17 +08:00
Concedo
b8425f5a9c merge but voxtral not working 2025-07-28 22:08:05 +08:00
Xuan-Son Nguyen
00fa15fedc
mtmd : add support for Voxtral (#14862)
* mtmd : add support for Voxtral

* clean up

* fix python requirements

* add [BEGIN_AUDIO] token

* also support Devstral conversion

* add docs and tests

* fix regression for ultravox

* minor coding style improvement

* correct project activation fn

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-07-28 15:01:48 +02:00
Concedo
21b7d0a899 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/rocm.Dockerfile
#	docs/build-s390x.md
#	docs/development/HOWTO-add-model.md
#	docs/ops.md
#	docs/ops/CPU.csv
#	docs/ops/CUDA.csv
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cann/acl_tensor.cpp
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/rms_norm.cl
#	scripts/create_ops_docs.py
#	tests/test-backend-ops.cpp
#	tools/export-lora/export-lora.cpp
2025-07-27 17:10:53 +08:00
Concedo
0d72c794fa Merge commit 'c8ade30036' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-cuda/CMakeLists.txt
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/im2col_f16.cl
#	ggml/src/ggml-opencl/kernels/im2col_f32.cl
#	ggml/src/ggml-sycl/im2col.cpp
#	tools/mtmd/clip.cpp
2025-07-25 19:42:45 +08:00
kiwi
749e0d27f0
mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (#14503)
* [fix] Fix 32-bit narrowing issue in export-lora and mtmd clip

* Update export-lora.cpp

* Update clip.cpp

* Update export-lora.cpp

* format: use space to replace tab
2025-07-25 13:08:04 +02:00
stduhpf
c8ade30036
Mtmd: add a way to select device for vision encoder (#14236)
* Mtmd: add a way to select device for vision encoder

* simplify

* format

* Warn user if manual device selection failed

* initialize backend to nullptr
2025-07-22 12:51:03 +02:00
Concedo
8cd72ea924 fix for clip first so that it loads qwen omni (expects dual backends) 2025-07-10 22:53:38 +08:00
Concedo
57ce374240 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/ISSUE_TEMPLATE/010-bug-compilation.yml
#	.github/ISSUE_TEMPLATE/011-bug-results.yml
#	.github/labeler.yml
#	.github/workflows/build.yml
#	.github/workflows/release.yml
#	.gitmodules
#	CMakeLists.txt
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/softmax_4_f16.cl
#	ggml/src/ggml-opencl/kernels/softmax_4_f32.cl
#	ggml/src/ggml-opencl/kernels/softmax_f16.cl
#	ggml/src/ggml-opencl/kernels/softmax_f32.cl
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/element_wise.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	scripts/sync-ggml.sh
#	tests/test-backend-ops.cpp
#	tests/test-c.c
2025-07-05 12:16:28 +08:00
Sigbjørn Skjæret
28657a8229
ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445) 2025-07-03 23:07:22 +02:00
Concedo
fb13e3e51b Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	src/llama-context.cpp
#	tests/test-backend-ops.cpp
2025-06-22 23:26:15 +08:00
yuiseki
5d5c066de8
mtmd : fix Pixtral OOM with large images by capping image_size to 1024 (#14326)
Mistral Small 2506 models using Pixtral vision encoder were running out
of GPU memory when processing images larger than 1024x1024 pixels due to
exponential memory growth from unlimited image size.

This fix applies the same 1024x1024 limit used by Qwen2VL models to
prevent OOM issues while maintaining compatibility with existing models.
2025-06-22 14:44:57 +02:00
Concedo
b925bbfc6d add simple api example 2025-06-19 23:05:28 +08:00
Concedo
5f0a7a84ae Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-vulkan/CMakeLists.txt
#	scripts/sync-ggml.last
2025-06-18 21:22:51 +08:00
Xuan-Son Nguyen
413977de32
mtmd : refactor llava-uhd preprocessing logic (#14247)
* mtmd : refactor llava-uhd preprocessing logic

* fix editorconfig
2025-06-18 10:43:57 +02:00
Concedo
0c108f6054 Merge commit '34b7c0439e' into concedo_experimental
# Conflicts:
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/element_wise.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	scripts/sync-ggml.last
#	src/CMakeLists.txt
#	tools/mtmd/clip.cpp
2025-05-31 12:27:45 +08:00
Xuan-Son Nguyen
10961339b2
mtmd : move helpers to dedicated library (⚠️ breaking change) (#13866)
* mtmd : move helpers to dedicated library

* fix server build

* rm leftover cmakelist code
2025-05-28 22:35:22 +02:00
Concedo
868cb6aff7 Merge commit 'e121edc432' into concedo_experimental
# Conflicts:
#	.github/workflows/release.yml
#	common/CMakeLists.txt
#	docs/function-calling.md
#	ggml/src/ggml-sycl/binbcast.cpp
#	models/templates/README.md
#	scripts/tool_bench.py
#	src/llama-kv-cache.cpp
#	tests/CMakeLists.txt
#	tests/test-chat.cpp
#	tools/mtmd/clip.h
#	tools/rpc/rpc-server.cpp
#	tools/server/README.md
2025-05-28 00:20:45 +08:00
Xuan-Son Nguyen
bc583e3c63
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) (#13784)
* mtmd : allow multiple modalities at the same time

* refactor mtmd tokenizer

* fix compile

* ok, missing SinusoidsPositionEmbedding

* first working version

* fix style

* more strict validate of n_embd

* refactor if..else to switch

* fix regression

* add test for 3B

* update docs

* fix tokenizing with add_special

* add more tests

* fix test case "huge"

* rm redundant code

* set_position_mrope_1d rm n_tokens
2025-05-27 14:06:10 +02:00