Commit graph

173 commits

Author SHA1 Message Date
Concedo
e005fc2587 Merge commit '8dcc3662a2' into concedo_experimental
Keep changes from https://github.com/ggml-org/llama.cpp/pull/18096 without https://github.com/ggml-org/llama.cpp/pull/14904
Reason is to maintain compatibility with 2023 w64devkit

# Conflicts:
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# examples/model-conversion/scripts/causal/run-org-model.py
# examples/speculative/speculative.cpp
# ggml/src/ggml-cpu/arch-fallback.h
# ggml/src/ggml-cpu/repack.cpp
# ggml/src/ggml-cpu/repack.h
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/act-ops.c
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/hvx-utils.c
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
2025-12-19 02:11:55 +08:00
Concedo
cefb32df19 track clip img patch nx and ny 2025-12-18 22:58:10 +08:00
HonestQiao
15dd67d869
model: fix GLM-ASR-Nano-2512 load error (#18130) (#18142) 2025-12-17 16:34:35 +01:00
Concedo
1f2c9f6b62 gpt4v not working correctly 2025-12-17 21:02:16 +08:00
Concedo
1daeed5d4d Merge commit '9963b81f63' into concedo_experimental
# Conflicts:
#	.github/workflows/server.yml
#	SECURITY.md
#	docs/backend/SYCL.md
#	examples/model-conversion/README.md
#	examples/model-conversion/scripts/embedding/compare-embeddings-logits.sh
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	tests/CMakeLists.txt
#	tests/test-chat.cpp
#	tests/test-json-schema-to-grammar.cpp
2025-12-17 20:30:34 +08:00
Concedo
050a5b1f52 Merge commit '4aced7a631' into concedo_experimental
# Conflicts:
#	.devops/cann.Dockerfile
#	.devops/cpu.Dockerfile
#	.devops/cuda.Dockerfile
#	.devops/intel.Dockerfile
#	.devops/musa.Dockerfile
#	.devops/rocm.Dockerfile
#	.devops/tools.sh
#	.devops/vulkan.Dockerfile
#	.github/workflows/build.yml
#	.github/workflows/release.yml
#	.gitignore
#	docs/ops.md
#	docs/ops/SYCL.csv
#	examples/batched/batched.cpp
#	examples/eval-callback/eval-callback.cpp
#	examples/gen-docs/gen-docs.cpp
#	examples/lookahead/lookahead.cpp
#	examples/lookup/lookup-create.cpp
#	examples/lookup/lookup-stats.cpp
#	examples/lookup/lookup.cpp
#	examples/model-conversion/scripts/causal/compare-logits.py
#	examples/model-conversion/scripts/causal/run-org-model.py
#	examples/model-conversion/scripts/utils/check-nmse.py
#	examples/parallel/parallel.cpp
#	examples/retrieval/retrieval.cpp
#	examples/save-load-state/save-load-state.cpp
#	examples/speculative-simple/speculative-simple.cpp
#	examples/speculative/speculative.cpp
#	examples/training/finetune.cpp
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cpu/repack.cpp
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/convert.cpp
#	ggml/src/ggml-sycl/dequantize.hpp
#	ggml/src/ggml-sycl/dpct/helper.hpp
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/element_wise.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/mmvq.cpp
#	ggml/src/ggml-sycl/pad.cpp
#	ggml/src/ggml-sycl/ssm_conv.cpp
#	ggml/src/ggml-sycl/vecdotq.hpp
#	pyrightconfig.json
#	scripts/sync-ggml.last
#	tests/test-arg-parser.cpp
#	tests/test-backend-ops.cpp
#	tools/cvector-generator/cvector-generator.cpp
#	tools/imatrix/imatrix.cpp
#	tools/mtmd/CMakeLists.txt
#	tools/mtmd/clip.cpp
#	tools/perplexity/perplexity.cpp
#	tools/server/README.md
2025-12-16 23:14:12 +08:00
Xuan-Son Nguyen
7b1db3d3b7
arg: clarify auto kvu/np being set on server (#17997)
* arg: clarify auto kvu/np being set on server

* improve docs

* use invalid_argument
2025-12-16 12:01:27 +01:00
Xuan-Son Nguyen
3d86c6c2b5
model: support GLM4V vision encoder (#18042)
* convert ok

* no deepstack

* less new tensors

* cgraph ok

* add mrope for text model

* faster patch merger

* add GGML_ROPE_TYPE_MRNORM

* add support for metal

* move glm4v to dedicated graph

* convert: add norm_embd

* clip: add debugging fn

* working correctly

* fix style

* use bicubic

* fix mrope metal

* improve cpu

* convert to neox ordering on conversion

* revert backend changes

* force stop if using old weight

* support moe variant

* fix conversion

* fix convert (2)

* Update tools/mtmd/clip-graph.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* process mrope_section on TextModel base class

* resolve conflict merge

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-16 11:25:26 +01:00
Concedo
e88bf41fdc Merge commit '12280ae905' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	common/CMakeLists.txt
#	docs/docker.md
#	examples/model-conversion/scripts/causal/compare-logits.py
#	ggml/src/ggml-hexagon/htp/rope-ops.c
#	tests/test-backend-ops.cpp
#	tests/test-barrier.cpp
#	tools/server/CMakeLists.txt
#	tools/server/README.md
2025-12-16 16:29:01 +08:00
Xuan-Son Nguyen
96a181a933
mtmd: refactor audio preprocessing (#17978)
* mtmd: refactor audio preprocessing

* refactor

Co-authored-by: Tarek <tdakhran@users.noreply.github.com>

* wip

* wip (2)

* improve constructor

* fix use_natural_log

* fix padding for short input

* clean up

* remove need_chunking

---------

Co-authored-by: Tarek <tdakhran@users.noreply.github.com>
2025-12-15 14:16:52 +01:00
piDack
745fa0e78b
model : add glm-asr support (#17901)
* [model] add glm-asr support

* fix format for ci

* fix convert format for ci

* update glm_asr convert script & use build_ffn for glm_asr clip & use build_stack for padding and review

* check root architecture for convert hf script

* fix conflict with upstream

* fix convert script for glm asr & format clip-impl

* format

* restore hparams text

* improved conversion

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-15 03:18:46 +01:00
Haowei Wu
37f5a1093b
mtmd: enhance image resizing in llava_uhd (#18014) 2025-12-14 15:57:52 +01:00
Georgi Gerganov
254098a279
common : refactor common_sampler + grammar logic changes (#17937)
* common : refactor common_sampler + grammar logic changes

* tests : increase max_tokens to get needed response

* batched : fix uninitialized samplers
2025-12-14 10:11:13 +02:00
Xuan-Son Nguyen
e39a2ce66d
clip: move model cgraphs into their own files (#17965)
* clip: move model cgraphs into their own files

* more explicit enums

* fix linux build

* fix naming

* missing headers

* nits: add comments for contributors
2025-12-12 21:14:48 +01:00
Xuan-Son Nguyen
17158965ac
mtmd: explicitly forbid inclusion of private header and libcommon (#17946) 2025-12-12 15:16:06 +01:00
Concedo
010995c967 Merge commit '4df6e859e9' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	README.md
#	ci/run.sh
#	examples/gen-docs/gen-docs.cpp
#	scripts/snapdragon/adb/run-cli.sh
#	tests/test-lora-conversion-inference.sh
#	tools/CMakeLists.txt
#	tools/completion/CMakeLists.txt
#	tools/completion/README.md
#	tools/server/CMakeLists.txt
2025-12-12 17:23:25 +08:00
Xuan-Son Nguyen
c6b2c9310c
mtmd: some small clean up (#17909)
* clip: add support for fused qkv in build_vit

* use build_ffn whenever possible

* fix internvl

* mtmd-cli: move image to beginning

* test script: support custom args
2025-12-10 22:20:06 +01:00
Xuan-Son Nguyen
34a6d86982
cli: enable jinja by default (#17911)
* cli: enable jinja by default

* Update common/arg.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-10 22:19:42 +01:00
Georgi Gerganov
4dff236a52
ggml : remove GGML_KQ_MASK_PAD constant (#17910)
* ggml : remove GGML_KQ_MASK_PAD constant

* cont : remove comment
2025-12-10 20:53:16 +02:00
Xuan-Son Nguyen
6c2131773c
cli: new CLI experience (#17824)
* wip

* wip

* fix logging, add display info

* handle commands

* add args

* wip

* move old cli to llama-completion

* rm deprecation notice

* move server to a shared library

* move ci to llama-completion

* add loading animation

* add --show-timings arg

* add /read command, improve LOG_ERR

* add args for speculative decoding, enable show timings by default

* add arg --image and --audio

* fix windows build

* support reasoning_content

* fix llama2c workflow

* color default is auto

* fix merge conflicts

* properly fix color problem

Co-authored-by: bandoti <bandoti@users.noreply.github.com>

* better loading spinner

* make sure to clean color on force-exit

* also clear input files on "/clear"

* simplify common_log_flush

* add warning in mtmd-cli

* implement console writer

* fix data race

* add attribute

* fix llama-completion and mtmd-cli

* add some notes about console::log

* fix compilation

---------

Co-authored-by: bandoti <bandoti@users.noreply.github.com>
2025-12-10 15:28:59 +01:00
Rhys-T
63908b631a
cmake: fix Mach-O current version number (#17877)
PR #17091 set the VERSION of various libraries to 0.0.abcd, where abcd
is the LLAMA_BUILD_NUMBER. That build number is too large to fit in the
Mach-O 'current version' field's 'micro' part, which only goes up to
255. This just sets the Mach-O current version to 0 to get it building
properly again.

Fixes #17258.
2025-12-09 13:17:41 +02:00
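The size limit described in the commit above can be illustrated with a short sketch. It assumes the usual Mach-O packing of a library version X.Y.Z into one 32-bit value (16 bits for X, 8 bits each for Y and Z); that layout is background knowledge, not something stated in the commit message. The function names and the build number 7000 are hypothetical and chosen only for illustration.

```python
def encode_macho_version(major: int, minor: int = 0, micro: int = 0) -> int:
    """Pack X.Y.Z into a 32-bit value, assuming a 16/8/8-bit layout."""
    if not 0 <= major <= 0xFFFF:
        raise ValueError(f"major {major} does not fit in 16 bits")
    if not 0 <= minor <= 0xFF:
        raise ValueError(f"minor {minor} does not fit in 8 bits")
    if not 0 <= micro <= 0xFF:
        raise ValueError(f"micro {micro} does not fit in 8 bits (max 255)")
    return (major << 16) | (minor << 8) | micro


def fits_macho_version(major: int, minor: int, micro: int) -> bool:
    """Return True if X.Y.Z can be represented in the assumed layout."""
    try:
        encode_macho_version(major, minor, micro)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    hypothetical_build_number = 7000  # illustrative value, not the real build number
    # A version of the form 0.0.<build number> overflows the micro part.
    print(fits_macho_version(0, 0, hypothetical_build_number))
    # A current version of 0 always fits.
    print(fits_macho_version(0, 0, 0))
```

Under this assumption, any build number above 255 placed in the third component is rejected, which matches the commit's description of why the current version was set to 0 instead.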
Concedo
03cec02a3d Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/release.yml
#	.github/workflows/winget.yml
#	CODEOWNERS
#	README.md
#	ci/run.sh
#	docs/build.md
#	docs/ops.md
#	docs/ops/Vulkan.csv
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	scripts/sync_vendor.py
#	src/CMakeLists.txt
#	tests/test-json-schema-to-grammar.cpp
#	tests/test-quantize-stats.cpp
#	tools/server/CMakeLists.txt
#	tools/server/README.md
2025-12-03 18:56:31 +08:00
Concedo
83269df91b Merge commit '649495c9d9' into concedo_experimental
# Conflicts:
#	CONTRIBUTING.md
#	SECURITY.md
#	docs/backend/SYCL.md
#	examples/sycl/run-llama2.sh
#	examples/sycl/run-llama3.sh
#	examples/sycl/win-run-llama2.bat
#	examples/sycl/win-run-llama3.bat
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-sycl/CMakeLists.txt
#	ggml/src/ggml-sycl/cpy.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tests/test-backend-ops.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tools/server/CMakeLists.txt
2025-12-03 18:43:46 +08:00
Xuan-Son Nguyen
a96283adc4
mtmd: fix --no-warmup (#17695) 2025-12-02 22:48:08 +01:00
Xuan-Son Nguyen
ecf74a8417
mtmd: add mtmd_context_params::warmup option (#17652)
* mtmd: add mtmd_context_params::warmup option

* reuse the common_params::warmup
2025-12-01 21:32:25 +01:00
Tarek Dakhran
2ba719519d
model: LFM2-VL fixes (#17577)
* Adjust to pytorch

* Add antialiasing upscale

* Increase number of patches to 1024

* Handle default marker insertion for LFM2

* Switch to flag

* Reformat

* Cuda implementation of antialias kernel

* Change placement in ops.cpp

* consistent float literals

* Pad only for LFM2

* Address PR feedback

* Rollback default marker placement changes

* Fallback to CPU implementation for antialias implementation of upscale
2025-11-30 21:57:31 +01:00
Xuan-Son Nguyen
7f8ef50cce
clip: fix nb calculation for qwen3-vl (#17594) 2025-11-30 15:33:55 +01:00
Ruben Garcia
06d39dff73
Fix warnings (#1864) 2025-11-29 20:18:38 +08:00
Concedo
eda4a312cb Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/vulkan.Dockerfile
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-sycl/common.hpp
#	tests/test-backend-ops.cpp
#	tools/server/README.md
2025-11-28 13:22:02 +08:00
Han Qingzhe
1d594c295c
clip: (minicpmv) fix resampler kq_scale (#17516)
* debug:"solve minicpmv precision problem"

* “debug minicpmv”

* Apply suggestion from @ngxson

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-11-26 21:44:07 +01:00
LostRuins Concedo
7aea1d7c02 clean up unused llava functions, fix qwen3vl loading 2025-11-18 10:34:55 +08:00
LostRuins Concedo
3fe0e39b62 Merge commit '4dca015b7e' into concedo_experimental
# Conflicts:
#	.github/copilot-instructions.md
#	README.md
#	docs/ops.md
#	docs/ops/CPU.csv
#	docs/ops/CUDA.csv
#	docs/ops/Vulkan.csv
#	ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp
#	src/CMakeLists.txt
#	tests/test-backend-ops.cpp
2025-11-16 18:33:58 +08:00
Ankur Verma
c7b7db0445
mtmd-cli: Avoid logging to stdout for model loading messages in mtmd-cli (#17277) 2025-11-15 12:41:16 +01:00
Xuan-Son Nguyen
9b17d74ab7
mtmd: add mtmd_log_set (#17268) 2025-11-14 15:56:19 +01:00
Mike Abbott
4a5b8aff40
cmake : add version to all shared object files (#17091)
When compiling llama.cpp in Yocto, it fails QA checks because the generated .so files aren't versioned. This applies a version to all generated .so files, allowing the package to build without errors.
2025-11-11 13:19:50 +02:00
LostRuins Concedo
5125c0b879 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/vulkan.Dockerfile
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/set_rows.cl
#	ggml/src/ggml-vulkan/ggml-vulkan.cpp
#	ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp
#	tests/test-backend-ops.cpp
#	tools/batched-bench/batched-bench.cpp
2025-11-11 17:10:11 +08:00
Xuan-Son Nguyen
4b13a684c5
mtmd: fix patch_size initialized to random value in audio models (#17128)
* mtmd: fix patch_size initialized to random value in audio models

* add default hparams
2025-11-10 11:41:05 +01:00
LostRuins Concedo
df6e303fd3 merge https://github.com/ggml-org/llama.cpp/pull/17128 2025-11-10 11:24:04 +08:00
LostRuins Concedo
d02cb1b117 Revert "fix divide by zero error"
This reverts commit 6cce98eca5.
2025-11-10 11:22:50 +08:00
LostRuins Concedo
6cce98eca5 fix divide by zero error 2025-11-10 01:38:55 +08:00
Georgi Gerganov
b8595b16e6
mtmd : fix embedding size for image input (#17123) 2025-11-09 18:31:02 +02:00
LostRuins Concedo
60a74bdd89 make tool calling work with jinja. but still need to fix qwen omni first (+1 squashed commits)
Squashed commits:

[e394da61e] make tool calling work with jinja. but still need to fix qwen omni first
2025-11-09 16:56:14 +08:00
LostRuins Concedo
4fc022a51f revert qwen vl warmup size 2025-11-09 02:24:49 +08:00
LostRuins Concedo
d6a2ad8455 still not really working right 2025-11-09 01:57:48 +08:00
LostRuins Concedo
e6ca0aa8d0 Merge commit '2f0c2db43e' into concedo_experimental
# Conflicts:
#	.github/labeler.yml
#	README.md
#	docs/backend/OPENCL.md
#	docs/ops.md
#	docs/ops/CUDA.csv
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/set_rows.tmpl.wgsl
#	scripts/sync-ggml.last
#	src/CMakeLists.txt
#	tools/server/README.md
2025-11-08 23:27:59 +08:00
LostRuins Concedo
64a1cd95a7 fixed missing headers 2025-11-08 11:09:49 +08:00
LostRuins Concedo
dfb0966ed2 not working 2025-11-08 10:49:10 +08:00
LostRuins Concedo
fdcb281a3a Merge commit '2f966b8ed8' into concedo_experimental
# Conflicts:
#	.github/workflows/release.yml
#	docs/docker.md
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cpu/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-thread-safety.cpp
#	tools/batched-bench/batched-bench.cpp
#	tools/mtmd/clip.cpp
2025-11-08 10:34:17 +08:00
LostRuins Concedo
7061cd1cc9 Merge commit 'e4a71599e5' into concedo_experimental
# Conflicts:
#	CODEOWNERS
#	tools/mtmd/clip.cpp
2025-11-08 10:28:49 +08:00
Sigbjørn Skjæret
9008027aa3
hparams : add n_embd_inp() to support extended embed (#16928)
* add n_embd_full to support extended embed

* don't change output

* rename to n_embd_inp

* restore n_embd where applicable
2025-11-07 19:27:58 +01:00