Concedo
7b393fa487
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# AUTHORS
# ci/run.sh
# docs/backend/SYCL.md
# docs/build.md
# docs/multimodal/minicpmo2.6.md
# docs/multimodal/minicpmo4.0.md
# docs/multimodal/minicpmv2.5.md
# docs/multimodal/minicpmv2.6.md
# docs/multimodal/minicpmv4.0.md
# docs/multimodal/minicpmv4.5.md
# docs/ops.md
# docs/ops/SYCL.csv
# docs/speculative.md
# examples/deprecation-warning/README.md
# examples/deprecation-warning/deprecation-warning.cpp
# examples/model-conversion/Makefile
# examples/model-conversion/scripts/causal/convert-model.sh
# ggml/include/ggml-cann.h
# ggml/src/ggml-cann/acl_tensor.cpp
# ggml/src/ggml-cann/acl_tensor.h
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-metal/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/concat.cl
# ggml/src/ggml-opencl/kernels/repeat.cl
# ggml/src/ggml-opencl/kernels/scale.cl
# ggml/src/ggml-opencl/kernels/tanh.cl
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/dpct/helper.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/outprod.cpp
# ggml/src/ggml-sycl/rope.cpp
# ggml/src/ggml-sycl/wkv.cpp
# src/llama-vocab.cpp
# tests/test-autorelease.cpp
# tests/test-backend-ops.cpp
# tools/cvector-generator/pca.hpp
# tools/export-lora/export-lora.cpp
# tools/perplexity/README.md
2026-02-03 19:00:42 +08:00
Xuan-Son Nguyen
07a7412a3b
mtmd: add min/max pixels gguf metadata (#19273)
2026-02-02 20:59:06 +01:00
Concedo
ddce19db72
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/package-gguf-py.nix
# .devops/nix/scope.nix
# common/CMakeLists.txt
# docs/backend/SYCL.md
# examples/lookahead/lookahead.cpp
# examples/lookup/lookup.cpp
# examples/sycl/run-llama2.sh
# examples/sycl/win-run-llama2.bat
# examples/sycl/win-test.bat
# ggml/src/ggml-hexagon/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/flash-attn-ops.c
# ggml/src/ggml-hexagon/htp/hvx-dump.h
# ggml/src/ggml-hexagon/htp/hvx-reduce.h
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-hexagon/htp/softmax-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# scripts/sync-ggml.last
2026-02-01 22:35:25 +08:00
tc-mb
ec6c7421e4
mtmd: support MiniCPM-o 4.5 (vision only) (#19211)
...
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
2026-01-30 23:19:30 +01:00
Concedo
e8e7c357c9
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-cache.yml
# .github/workflows/build-cmake-pkg.yml
# .github/workflows/build-linux-cross.yml
# .github/workflows/build.yml
# .github/workflows/check-vendor.yml
# .github/workflows/close-issue.yml
# .github/workflows/copilot-setup-steps.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/gguf-publish.yml
# .github/workflows/labeler.yml
# .github/workflows/pre-tokenizer-hashes.yml
# .github/workflows/python-check-requirements.yml
# .github/workflows/python-lint.yml
# .github/workflows/python-type-check.yml
# .github/workflows/release.yml
# .github/workflows/server-webui.yml
# .github/workflows/server.yml
# .github/workflows/update-ops-docs.yml
# .github/workflows/winget.yml
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-zdnn/ggml-zdnn.cpp
# requirements/requirements-tool_bench.txt
# src/CMakeLists.txt
# src/llama-quant.cpp
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tools/cli/cli.cpp
# tools/server/README.md
2026-01-23 14:27:04 +08:00
Xuan-Son Nguyen
9eb5bfec1a
mtmd : update docs to use llama_model_n_embd_inp (#18999)
2026-01-22 14:36:32 +01:00
Concedo
8855a7f52b
Merge commit 'c945aaaef2' into concedo_experimental
...
# Conflicts:
# .devops/cann.Dockerfile
# .github/workflows/build.yml
# .github/workflows/release.yml
# README.md
# common/CMakeLists.txt
# common/chat.cpp
# docs/function-calling.md
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# models/templates/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16.jinja
# scripts/sync_vendor.py
# tests/CMakeLists.txt
# tests/peg-parser/tests.h
# tests/test-chat-peg-parser.cpp
# tests/test-chat-template.cpp
# tests/test-chat.cpp
# tests/testing.h
# tools/llama-bench/llama-bench.cpp
2026-01-17 10:24:03 +08:00
Concedo
22af5f1250
Merge commit '2a13180100' into concedo_experimental
...
# Conflicts:
# .devops/cann.Dockerfile
# .devops/cpu.Dockerfile
# .devops/cuda-new.Dockerfile
# .devops/cuda.Dockerfile
# .devops/intel.Dockerfile
# .devops/llama-cli-cann.Dockerfile
# .devops/musa.Dockerfile
# .devops/nix/package.nix
# .devops/rocm.Dockerfile
# .devops/s390x.Dockerfile
# .devops/vulkan.Dockerfile
# .github/workflows/build-cmake-pkg.yml
# .github/workflows/build-linux-cross.yml
# .github/workflows/build.yml
# .github/workflows/copilot-setup-steps.yml
# .github/workflows/release.yml
# .github/workflows/server-webui.yml
# .github/workflows/server.yml
# CMakeLists.txt
# README.md
# build-xcframework.sh
# ci/run.sh
# cmake/common.cmake
# common/CMakeLists.txt
# docs/backend/hexagon/CMakeUserPresets.json
# docs/backend/hexagon/README.md
# docs/build-riscv64-spacemit.md
# docs/build.md
# examples/debug/debug.cpp
# examples/eval-callback/CMakeLists.txt
# examples/eval-callback/eval-callback.cpp
# examples/llama.android/lib/build.gradle.kts
# examples/sycl/build.sh
# examples/sycl/win-build-sycl.bat
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/act-ops.c
# ggml/src/ggml-hexagon/htp/binary-ops.c
# ggml/src/ggml-hexagon/htp/flash-attn-ops.c
# ggml/src/ggml-hexagon/htp/get-rows-ops.c
# ggml/src/ggml-hexagon/htp/hex-dma.c
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-hexagon/htp/rope-ops.c
# ggml/src/ggml-hexagon/htp/set-rows-ops.c
# ggml/src/ggml-hexagon/htp/softmax-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-hexagon/htp/worker-pool.c
# scripts/debug-test.sh
# scripts/serve-static.js
# scripts/snapdragon/adb/run-bench.sh
# scripts/snapdragon/adb/run-cli.sh
# scripts/snapdragon/adb/run-mtmd.sh
# scripts/snapdragon/adb/run-tool.sh
# scripts/tool_bench.py
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tools/mtmd/clip.cpp
2026-01-16 21:52:01 +08:00
Tarek Dakhran
c945aaaef2
mtmd : Fix ASR for LFM2.5-Audio-1.5B (#18876)
2026-01-16 11:23:08 +01:00
Piotr Wilkin (ilintar)
d98b548120
Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914)
...
* Extract common debugging functions; plug eval-callback and mtmd's MTMD_DEBUG_GRAPH with same functionality
* Move to common
* Remove unneeded header
* Unlink from common
* chore: update webui build output
* Cleanup; properly pass params to mtmd without depending on common; factorize debug.cpp to use common debug code.
* Revert change to webapp
* Post-merge adjust
* Apply suggestions from code review
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Apply code review changes
* Remove changes to server-context
* Remove mtmd.h include
* Remove utility functions from header
* Apply suggestions from code review
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Rename functions
* Update tools/mtmd/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Update tools/mtmd/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Update tools/mtmd/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-01-14 20:29:35 +01:00
Concedo
7d2c1c4f46
note: clip_is_mrope was moved to mtmd_decode_use_mrope upstream and no longer syncs since https://github.com/ggml-org/llama.cpp/pull/18793
...
Merge commit 'c1e79e610f' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/release.yml
# CMakeLists.txt
# CONTRIBUTING.md
# MIT_LICENSE_GGML_SDCPP_LLAMACPP_ONLY.md
# README.md
# SECURITY.md
# ci/run.sh
# common/CMakeLists.txt
# common/arg.cpp
# docs/ops.md
# docs/ops/BLAS.csv
# docs/ops/zDNN.csv
# docs/preset.md
# examples/batched/batched.cpp
# examples/debug/debug.cpp
# ggml/src/ggml-blas/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# licenses/LICENSE-curl
# licenses/LICENSE-httplib
# scripts/pr2wt.sh
# scripts/sync_vendor.py
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tools/cli/README.md
# tools/completion/README.md
# tools/llama-bench/llama-bench.cpp
# tools/server/README.md
# vendor/cpp-httplib/LICENSE
2026-01-13 23:31:14 +08:00
Concedo
0dc18c668c
Merge commit 'a61c8bc3bf' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/pr2wt.sh
# src/llama-model.cpp
# tools/CMakeLists.txt
# tools/mtmd/CMakeLists.txt
# tools/mtmd/clip.cpp
# tools/mtmd/clip.h
2026-01-13 23:06:50 +08:00
Xuan-Son Nguyen
e047f9ee9d
mtmd: fix use_non_causal being reported incorrectly (#18793)
...
* mtmd: fix use_non_causal being reported incorrectly
* move clip_is_mrope to mtmd_decode_use_mrope
* fix sloppy code ggml_cpy
2026-01-13 12:19:38 +01:00
Simranjeet Singh
a61c8bc3bf
mtmd: Add Gemma3n multimodal support with MobileNetV5 vision encoder (#18256)
...
* Add Gemma3nVisionModel - MobileNetV5 vision encoder converter to convert_hf_to_gguf.py. Add gemma3n to vision projectors in gguf-py/gguf/constants.py.
* Add mobilenetv5 impl
* Fix comments, remove unused vars
* Fix permute and remove transpose of projection weights
* Fix comments, remove debugging prints from hf_to_gguf
* 1. Hard-code image_mean = 0 and image_std = 1
2. Use available tensor mapping logic
3. Remove redundant chat template replacement of soft tokens placeholder with media placeholder
* 1. Move mobilenetv5 helpers declarations to `clip_graph_mobilenetv5` struct and definitions to mobilenetv5.cpp
2. Remove unused `clip_is_gemma3n` func declarations and definitions
3. Remove redundant `rescale_image_u8_to_f32` func and use `normalize_image_u8_to_f32` with zero mean and unit std
4. Calculate n_patches using image_size / patch_size
* Remove obsolete comments
* - convert_hf_to_gguf.py & constants.py & tensor_mapping.py: Use explicit mapping: Custom map for double indexed blocks and tensor_mapping.py for rest
- convert_hf_to_gguf.py: Unsqueeze Stem Bias and Layer scale tensors to correct shape while converting to gguf
- mobilenetv5.cpp: Remove explicit reshaping of Stem Bias and Layer scale which are now handled while converting to gguf, replace fprintf with LOG_*
- clip.cpp: Remove unused embedding and hard_emb_norm tensor loading
* - Rename tensors to v.conv..., v.blk..., v.msfa... to better align with already existing terminology
* Fix stem conv bias name
* Remove explicit handling of bias term for stem conv
* - Change order of addition in "project_per_layer_inputs" to support broadcasting of vision inp_per_layer
- Simplify the vision embeddings path of "get_per_layer_inputs" to output [n_embd_altup, n_layer, 1], broadcastable
* clean up conversion script
* fix code style
* also preserve audio tensors
* trailing space
* split arch A and V
* rm unused gemma3 func
* fix alignment
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-01-09 23:42:38 +01:00
Concedo
956ab99934
Merge commit '56d2fed2b3' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .gitignore
# README.md
# examples/CMakeLists.txt
# examples/debug/CMakeLists.txt
# examples/model-conversion/scripts/causal/compare-logits.py
# examples/model-conversion/scripts/causal/run-casual-gen-embeddings-org.py
# examples/model-conversion/scripts/causal/run-converted-model-embeddings-logits.sh
# examples/model-conversion/scripts/causal/run-converted-model.sh
# examples/model-conversion/scripts/causal/run-org-model.py
# examples/model-conversion/scripts/embedding/run-converted-model.sh
# examples/model-conversion/scripts/embedding/run-original-model.py
# examples/model-conversion/scripts/utils/common.py
# examples/model-conversion/scripts/utils/semantic_check.py
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/hvx-utils.c
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-hexagon/htp/softmax-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# scripts/snapdragon/adb/run-bench.sh
# tests/test-arg-parser.cpp
# tools/CMakeLists.txt
2026-01-09 00:30:53 +08:00
Tarek Dakhran
ccbc84a537
mtmd: mtmd_audio_streaming_istft (#18645)
...
Change is decoupled from https://github.com/ggml-org/llama.cpp/pull/18641.
[LFM2.5-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B)
needs streaming istft for generating output audio.
* add streaming ISTFT class (`mtmd_audio_streaming_istft`) with overlap-add for audio reconstruction
* replace global audio cache with per-instance cache; the model requires
two independent caches, one for preprocessing (audio input) and one for
istft (audio output).
* unified templated FFT/IFFT implementation supporting both forward and inverse transforms
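A minimal sketch of the overlap-add step named above, written in Python for illustration and not taken from the C++ implementation. `StreamingOverlapAdd`, `push`, and `flush` are hypothetical names. Frames are assumed to be already inverse-transformed and windowed, and window-sum normalization is left out.

```python
class StreamingOverlapAdd:
    """Illustrative streaming overlap-add; names and layout are assumptions."""

    def __init__(self, frame_len: int, hop: int):
        if not 0 < hop <= frame_len:
            raise ValueError("hop must be in (0, frame_len]")
        self.frame_len = frame_len
        self.hop = hop
        # Per-instance state: samples still waiting for later frames to overlap.
        self.tail = [0.0] * (frame_len - hop)

    def push(self, frame):
        """Add one time-domain frame and return the `hop` finished samples."""
        if len(frame) != self.frame_len:
            raise ValueError("unexpected frame length")
        acc = list(frame)
        # Sum the carried-over tail into the start of the new frame.
        for i, v in enumerate(self.tail):
            acc[i] += v
        out = acc[:self.hop]
        self.tail = acc[self.hop:]
        return out

    def flush(self):
        """Return the remaining tail and reset the state."""
        out = self.tail
        self.tail = [0.0] * (self.frame_len - self.hop)
        return out
```

Each instance owns its own tail buffer, so two instances can run side by side without sharing state, which is the property the per-instance cache bullet describes.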
2026-01-06 21:00:29 +01:00
Concedo
4889f3a11d
Merge commit '67e3f6f601' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/release.yml
# .github/workflows/server.yml
# ci/run.sh
# docs/backend/CANN.md
# docs/backend/OPENCL.md
# examples/batched/batched.cpp
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# src/llama-context.cpp
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
2026-01-05 20:52:20 +08:00
Tarek Dakhran
4974bf53cf
model : mtmd : make input norm optional in LFM2-VL (#18594)
...
Upcoming LFM2-VL releases will have configurable input norm.
See https://github.com/huggingface/transformers/pull/43087 for details.
2026-01-04 18:50:02 +01:00
Concedo
7e1ae49e7d
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-cuda/ggml-cuda.cu
# tests/test-backend-ops.cpp
# tools/mtmd/CMakeLists.txt
2026-01-02 11:05:20 +08:00
tt
ced765be44
model: support youtu-vl model (#18479)
...
* Support Youtu-VL Model
* merge code
* fix bug
* revert qwen2 code & support rsplit in minja.hpp
* update warm info
* fix annotation
* u
* revert minja.hpp
* fix
* Do not write routed_scaling_factor to gguf when routed_scaling_factor is None
* fix expert_weights_scale
* LGTM after whitespace fixes
* fix
* fix
* fix
* layers to layer_index
* enum fix
---------
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-01 19:25:54 +01:00
Concedo
54e419f587
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/docker.yml
# docs/ops.md
# docs/ops/Metal.csv
# ggml/CMakeLists.txt
# ggml/src/ggml-sycl/CMakeLists.txt
# grammars/README.md
# models/templates/llama-cpp-deepseek-r1.jinja
# scripts/sync-ggml.last
# tests/test-chat.cpp
2026-01-01 15:34:10 +08:00
Henry147147
9b8329de7a
mtmd : Adding support for Nvidia Music Flamingo Model (#18470)
...
* Initial commit, debugging q5_k_s quant
* Made hf_to_gguf extend whisper to reduce code duplication
* addressed convert_hf_to_gguf pull request issue
---------
Co-authored-by: Henry D <henrydorsey147@gmail.com>
2025-12-31 12:13:23 +01:00
Concedo
0e26e4d354
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-rpc/ggml-rpc.cpp
2025-12-28 23:47:55 +08:00
Xuan-Son Nguyen
cffa5c46ea
mtmd: clarify that we no longer accept AI-generated PRs (#18406)
2025-12-28 09:57:04 +01:00
Concedo
51b1d12914
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# tests/test-backend-ops.cpp
# tools/mtmd/CMakeLists.txt
2025-12-19 11:11:19 +08:00
Xuan-Son Nguyen
8ea958d4d9
model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106)
...
* ASR with LFM2-Audio-1.5B
* Set rope_theta
* Fix comment
* Remove rope_theta setting
* Address PR feedback
* rename functions to conformer
* remove some redundant ggml_cont
* fix missing tensor
* add prefix "a." for conv tensors
* remove redundant reshape
* clean up
* add test model
---------
Co-authored-by: Tarek Dakhran <tarek@liquid.ai>
2025-12-19 00:18:01 +01:00
Concedo
e005fc2587
Merge commit '8dcc3662a2' into concedo_experimental
...
Keep changes from https://github.com/ggml-org/llama.cpp/pull/18096 without https://github.com/ggml-org/llama.cpp/pull/14904
Reason is to maintain compatibility with 2023 w64devkit
# Conflicts:
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# examples/model-conversion/scripts/causal/run-org-model.py
# examples/speculative/speculative.cpp
# ggml/src/ggml-cpu/arch-fallback.h
# ggml/src/ggml-cpu/repack.cpp
# ggml/src/ggml-cpu/repack.h
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/act-ops.c
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/hvx-utils.c
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
2025-12-19 02:11:55 +08:00
Concedo
cefb32df19
track clip img patch nx and ny
2025-12-18 22:58:10 +08:00
HonestQiao
15dd67d869
model: fix GLM-ASR-Nano-2512 load error (#18130) (#18142)
2025-12-17 16:34:35 +01:00
Concedo
1f2c9f6b62
gpt4v not working correctly
2025-12-17 21:02:16 +08:00
Concedo
1daeed5d4d
Merge commit '9963b81f63' into concedo_experimental
...
# Conflicts:
# .github/workflows/server.yml
# SECURITY.md
# docs/backend/SYCL.md
# examples/model-conversion/README.md
# examples/model-conversion/scripts/embedding/compare-embeddings-logits.sh
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# tests/CMakeLists.txt
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
2025-12-17 20:30:34 +08:00
Concedo
050a5b1f52
Merge commit '4aced7a631' into concedo_experimental
...
# Conflicts:
# .devops/cann.Dockerfile
# .devops/cpu.Dockerfile
# .devops/cuda.Dockerfile
# .devops/intel.Dockerfile
# .devops/musa.Dockerfile
# .devops/rocm.Dockerfile
# .devops/tools.sh
# .devops/vulkan.Dockerfile
# .github/workflows/build.yml
# .github/workflows/release.yml
# .gitignore
# docs/ops.md
# docs/ops/SYCL.csv
# examples/batched/batched.cpp
# examples/eval-callback/eval-callback.cpp
# examples/gen-docs/gen-docs.cpp
# examples/lookahead/lookahead.cpp
# examples/lookup/lookup-create.cpp
# examples/lookup/lookup-stats.cpp
# examples/lookup/lookup.cpp
# examples/model-conversion/scripts/causal/compare-logits.py
# examples/model-conversion/scripts/causal/run-org-model.py
# examples/model-conversion/scripts/utils/check-nmse.py
# examples/parallel/parallel.cpp
# examples/retrieval/retrieval.cpp
# examples/save-load-state/save-load-state.cpp
# examples/speculative-simple/speculative-simple.cpp
# examples/speculative/speculative.cpp
# examples/training/finetune.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/repack.cpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/dequantize.hpp
# ggml/src/ggml-sycl/dpct/helper.hpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/element_wise.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/pad.cpp
# ggml/src/ggml-sycl/ssm_conv.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# pyrightconfig.json
# scripts/sync-ggml.last
# tests/test-arg-parser.cpp
# tests/test-backend-ops.cpp
# tools/cvector-generator/cvector-generator.cpp
# tools/imatrix/imatrix.cpp
# tools/mtmd/CMakeLists.txt
# tools/mtmd/clip.cpp
# tools/perplexity/perplexity.cpp
# tools/server/README.md
2025-12-16 23:14:12 +08:00
Xuan-Son Nguyen
7b1db3d3b7
arg: clarify auto kvu/np being set on server (#17997)
...
* arg: clarify auto kvu/np being set on server
* improve docs
* use invalid_argument
2025-12-16 12:01:27 +01:00
Xuan-Son Nguyen
3d86c6c2b5
model: support GLM4V vision encoder (#18042)
...
* convert ok
* no deepstack
* less new tensors
* cgraph ok
* add mrope for text model
* faster patch merger
* add GGML_ROPE_TYPE_MRNORM
* add support for metal
* move glm4v to dedicated graph
* convert: add norm_embd
* clip: add debugging fn
* working correctly
* fix style
* use bicubic
* fix mrope metal
* improve cpu
* convert to neox ordering on conversion
* revert backend changes
* force stop if using old weight
* support moe variant
* fix conversion
* fix convert (2)
* Update tools/mtmd/clip-graph.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* process mrope_section on TextModel base class
* resolve conflict merge
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-16 11:25:26 +01:00
Concedo
e88bf41fdc
Merge commit '12280ae905' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# common/CMakeLists.txt
# docs/docker.md
# examples/model-conversion/scripts/causal/compare-logits.py
# ggml/src/ggml-hexagon/htp/rope-ops.c
# tests/test-backend-ops.cpp
# tests/test-barrier.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2025-12-16 16:29:01 +08:00
Xuan-Son Nguyen
96a181a933
mtmd: refactor audio preprocessing (#17978)
...
* mtmd: refactor audio preprocessing
* refactor
Co-authored-by: Tarek <tdakhran@users.noreply.github.com>
* wip
* wip (2)
* improve constructor
* fix use_natural_log
* fix padding for short input
* clean up
* remove need_chunking
---------
Co-authored-by: Tarek <tdakhran@users.noreply.github.com>
2025-12-15 14:16:52 +01:00
piDack
745fa0e78b
model : add glm-asr support (#17901)
...
* [model] add glm-asr support
* fix format for ci
* fix convert format for ci
* update glm_asr convert script & use build_ffn for glm_asr clip & use build_stack for padding and review
* check root architecture for convert hf script
* fix conflict with upstream
* fix convert script for glm asr & format clip-impl
* format
* restore hparams text
* improved conversion
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-15 03:18:46 +01:00
Haowei Wu
37f5a1093b
mtmd: enhance image resizing in llava_uhd (#18014)
2025-12-14 15:57:52 +01:00
Georgi Gerganov
254098a279
common : refactor common_sampler + grammar logic changes (#17937)
...
* common : refactor common_sampler + grammar logic changes
* tests : increase max_tokens to get needed response
* batched : fix uninitialized samplers
2025-12-14 10:11:13 +02:00
Xuan-Son Nguyen
e39a2ce66d
clip: move model cgraphs into their own files (#17965)
...
* clip: move model cgraphs into their own files
* more explicit enums
* fix linux build
* fix naming
* missing headers
* nits: add comments for contributors
2025-12-12 21:14:48 +01:00
Xuan-Son Nguyen
17158965ac
mtmd: explicitly forbid inclusion of private header and libcommon (#17946)
2025-12-12 15:16:06 +01:00
Concedo
010995c967
Merge commit '4df6e859e9' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# ci/run.sh
# examples/gen-docs/gen-docs.cpp
# scripts/snapdragon/adb/run-cli.sh
# tests/test-lora-conversion-inference.sh
# tools/CMakeLists.txt
# tools/completion/CMakeLists.txt
# tools/completion/README.md
# tools/server/CMakeLists.txt
2025-12-12 17:23:25 +08:00
Xuan-Son Nguyen
c6b2c9310c
mtmd: some small clean up (#17909)
...
* clip: add support for fused qkv in build_vit
* use build_ffn whenever possible
* fix internvl
* mtmd-cli: move image to beginning
* test script: support custom args
2025-12-10 22:20:06 +01:00
Xuan-Son Nguyen
34a6d86982
cli: enable jinja by default (#17911)
...
* cli: enable jinja by default
* Update common/arg.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-10 22:19:42 +01:00
Georgi Gerganov
4dff236a52
ggml : remove GGML_KQ_MASK_PAD constant (#17910)
...
* ggml : remove GGML_KQ_MASK_PAD constant
* cont : remove comment
2025-12-10 20:53:16 +02:00
Xuan-Son Nguyen
6c2131773c
cli: new CLI experience (#17824)
...
* wip
* wip
* fix logging, add display info
* handle commands
* add args
* wip
* move old cli to llama-completion
* rm deprecation notice
* move server to a shared library
* move ci to llama-completion
* add loading animation
* add --show-timings arg
* add /read command, improve LOG_ERR
* add args for speculative decoding, enable show timings by default
* add arg --image and --audio
* fix windows build
* support reasoning_content
* fix llama2c workflow
* color default is auto
* fix merge conflicts
* properly fix color problem
Co-authored-by: bandoti <bandoti@users.noreply.github.com>
* better loading spinner
* make sure to clean color on force-exit
* also clear input files on "/clear"
* simplify common_log_flush
* add warning in mtmd-cli
* implement console writer
* fix data race
* add attribute
* fix llama-completion and mtmd-cli
* add some notes about console::log
* fix compilation
---------
Co-authored-by: bandoti <bandoti@users.noreply.github.com>
2025-12-10 15:28:59 +01:00
Rhys-T
63908b631a
cmake: fix Mach-O current version number (#17877)
...
PR #17091 set the VERSION of various libraries to 0.0.abcd, where abcd
is the LLAMA_BUILD_NUMBER. That build number is too large to fit in the
Mach-O 'current version' field's 'micro' part, which only goes up to
255. This just sets the Mach-O current version to 0 to get it building
properly again.
Fixes #17258.
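To illustrate the overflow described above: a Mach-O version X.Y.Z is packed into 32 bits as 16 bits for X and 8 bits each for Y and Z, so the last part cannot exceed 255. The Python sketch below is only an illustration of that packing; `encode_macho_version` is a hypothetical helper, not part of the build scripts, and the build number 7000 in the usage note is a made-up value.

```python
def encode_macho_version(major: int, minor: int, micro: int) -> int:
    """Pack X.Y.Z into 32 bits as 16.8.8, rejecting out-of-range parts."""
    if not 0 <= major <= 0xFFFF:
        raise ValueError(f"major {major} does not fit in 16 bits")
    for name, part in (("minor", minor), ("micro", micro)):
        if not 0 <= part <= 0xFF:
            raise ValueError(f"{name} {part} does not fit in 8 bits")
    return (major << 16) | (minor << 8) | micro
```

Calling `encode_macho_version(0, 0, 7000)` raises `ValueError`, while `encode_macho_version(0, 0, 0)` succeeds, which matches the workaround of setting the current version to 0.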
2025-12-09 13:17:41 +02:00
Concedo
03cec02a3d
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/release.yml
# .github/workflows/winget.yml
# CODEOWNERS
# README.md
# ci/run.sh
# docs/build.md
# docs/ops.md
# docs/ops/Vulkan.csv
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/sync_vendor.py
# src/CMakeLists.txt
# tests/test-json-schema-to-grammar.cpp
# tests/test-quantize-stats.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2025-12-03 18:56:31 +08:00
Concedo
83269df91b
Merge commit '649495c9d9' into concedo_experimental
...
# Conflicts:
# CONTRIBUTING.md
# SECURITY.md
# docs/backend/SYCL.md
# examples/sycl/run-llama2.sh
# examples/sycl/run-llama3.sh
# examples/sycl/win-run-llama2.bat
# examples/sycl/win-run-llama3.bat
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/cpy.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-backend-ops.cpp
# tests/test-json-schema-to-grammar.cpp
# tools/server/CMakeLists.txt
2025-12-03 18:43:46 +08:00
Xuan-Son Nguyen
a96283adc4
mtmd: fix --no-warmup (#17695)
2025-12-02 22:48:08 +01:00