koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-20 09:25:53 +00:00

Author	SHA1	Message	Date
Concedo	9203b6a051	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/labeler.yml # .github/workflows/build-self-hosted.yml # .github/workflows/release.yml # .github/workflows/server-sanitize.yml # .github/workflows/server-self-hosted.yml # .github/workflows/server.yml # .github/workflows/ui-build.yml # .github/workflows/ui-ci.yml # .github/workflows/ui-publish.yml # .gitignore # CMakeLists.txt # CODEOWNERS # scripts/ui-download.cmake # scripts/xxd.cmake # tests/test-backend-ops.cpp # tests/test-reasoning-budget.cpp # tools/CMakeLists.txt # tools/server/CMakeLists.txt # tools/server/README.md	2026-05-16 22:56:33 +08:00
Xuan-Son Nguyen	72e60f500d	mtmd: add chunks and fix preproc for qwen3a (#23073 ) * mtmd: add chunks and fix preproc for qwen3a * add attn_mask * limit mtmd_chunk size (avoid blow up memory) * correct audio tokens * re-order the set_input case * remove attn_mask	2026-05-15 19:32:47 +02:00
Concedo	cc82c3164e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/intel.Dockerfile # .github/workflows/build-cross.yml # .github/workflows/build-sycl.yml # .github/workflows/build.yml # .github/workflows/editorconfig.yml # .github/workflows/release.yml # cmake/riscv64-spacemit-linux-gnu-gcc.cmake # docs/backend/OPENVINO.md # docs/backend/SYCL.md # docs/build-riscv64-spacemit.md # docs/ops.md # docs/ops/WebGPU.csv # embd_res/ggml-vocab-qwen35.gguf # embd_res/ggml-vocab-qwen35.gguf.inp # embd_res/ggml-vocab-qwen35.gguf.out # examples/model-conversion/Makefile # ggml/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp/hmx-flash-attn-ops.c # ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c # ggml/src/ggml-hexagon/htp/hmx-utils.h # ggml/src/ggml-hexagon/htp/htp-ops.h # ggml/src/ggml-hexagon/htp/hvx-utils.h # ggml/src/ggml-hexagon/htp/main.c # ggml/src/ggml-hexagon/htp/unary-ops.c # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cvt.cl # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/common.cpp # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl # ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_tile.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_reduce.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_split.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec_acc.tmpl # ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl # ggml/src/ggml-zendnn/CMakeLists.txt # ggml/src/ggml-zendnn/ggml-zendnn.cpp # scripts/snapdragon/adb/run-completion.sh # tests/CMakeLists.txt # tools/cli/README.md # tools/completion/README.md # tools/mtmd/clip-impl.h # tools/mtmd/clip.cpp 
# tools/mtmd/clip.h # tools/server/README.md	2026-05-14 19:04:04 +08:00
Georgi Gerganov	67b2b7f2f2	logs : reduce (#23021 ) * logs : reduce * args : fix envs * server : fix build * common : print verbosity level at start * server : clean-up logs * server : print prompt processing timings + sampling params * minor : whitespaces	2026-05-14 13:05:52 +03:00
Xuan-Son Nguyen	7bfe120c21	mtmd, server, common: expose modalities to /v1/models (#22952 ) * mtmd, server, common: expose modalities to /v1/models * fix build * rename to mtmd_caps	2026-05-12 19:08:07 +02:00
Concedo	f7923b261f	need to fix cuda compile. Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/python-type-check.yml # examples/speculative-simple/README.md # examples/speculative-simple/speculative-simple.cpp # ggml/src/ggml-cuda/im2col.cu # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cvt.cl # tests/test-backend-ops.cpp # tools/cli/README.md # tools/mtmd/CMakeLists.txt # tools/server/README.md	2026-05-12 20:47:07 +08:00
AesSedai	4178259130	mtmd: add MiMo v2.5 vision (#22883 ) * mimo-v2.5: vision support * mimo-v2.5: use fused qkv for vision * mimi-v2.5: fix f16 vision overflow * mimo-v2.5: comment cleanups * mimo-v2.5: Flash doesn't have mmproj more cleanup remember to use filter_tensors * mimo-v2.5: fix trailing whitespace	2026-05-12 11:11:14 +02:00
Concedo	eb30b29d69	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/gguf-publish.yml # CODEOWNERS # examples/sycl/test.sh # pyproject.toml # tools/mtmd/CMakeLists.txt # tools/mtmd/README.md	2026-05-08 14:48:57 +08:00
Pascal	cc97e45a14	mtmd: fix whisper audio tail truncation by exposing padded buffer to FFT (#22770 )	2026-05-07 14:01:01 +02:00
tc-mb	2496f9c149	mtmd : support MiniCPM-V 4.6 (#22529 ) * Support MiniCPM-V 4.6 in new branch Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix code bug Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix pre-commit Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix convert Signed-off-by: tc-mb <tianchi_cai@icloud.com> * rename clip_graph_minicpmv4_6 Signed-off-by: tc-mb <tianchi_cai@icloud.com> * use new TYPE_MINICPMV4_6 Signed-off-by: tc-mb <tianchi_cai@icloud.com> * use build_attn to allow flash attention support Signed-off-by: tc-mb <tianchi_cai@icloud.com> * no use legacy code, restored here. Signed-off-by: tc-mb <tianchi_cai@icloud.com> * use the existing tensors name Signed-off-by: tc-mb <tianchi_cai@icloud.com> * unused ctx->model.hparams.minicpmv_version Signed-off-by: tc-mb <tianchi_cai@icloud.com> * use n_merge for slice alignment Signed-off-by: tc-mb <tianchi_cai@icloud.com> * borrow wa_layer_indexes for vit_merger insertion point Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix code style Signed-off-by: tc-mb <tianchi_cai@icloud.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * use filter_tensors and add model.vision_tower Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix chkhsh Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix type check Signed-off-by: tc-mb <tianchi_cai@icloud.com> --------- Signed-off-by: tc-mb <tianchi_cai@icloud.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-06 21:54:09 +02:00
Concedo	9e9497f0cc	Merge remote-tracking branch 'origin/upstream' into concedo_experimental # Conflicts: # examples/save-load-state/save-load-state.cpp # ggml/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c # ggml/src/ggml-hexagon/htp/matmul-ops.c # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/gemm_noshuffle_q4_0_f32.cl # ggml/src/ggml-opencl/kernels/gemm_noshuffle_q8_0_f32.cl # ggml/src/ggml-opencl/kernels/gemv_noshuffle_q4_0_f32.cl # ggml/src/ggml-opencl/kernels/gemv_noshuffle_q4_0_f32_spec.cl # ggml/src/ggml-opencl/kernels/gemv_noshuffle_q8_0_f32.cl # ggml/src/ggml-rpc/ggml-rpc.cpp # scripts/sync-ggml.last # scripts/sync_vendor.py # src/llama-graph.cpp # tests/test-backend-ops.cpp # tests/test-state-restore-fragmented.cpp	2026-05-06 21:20:06 +08:00
Yakine Tahtah	a00e47e422	mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) (#22101 ) * mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) Conformer encoder with Shaw relative position encoding, QFormer projector, log-mel spectrogram with frame stacking. Encoder uses GLU gating, folded batch norm, and SSM depthwise conv. QFormer compresses encoder output via windowed cross-attention (window=15, queries=3) into the LLM embedding space. Audio preprocessing: reflect-padded STFT, 80-bin mel filterbank, dynamic range compression, 2x frame stacking (80->160 mel). GGUF converter handles batch norm folding at export time, fused K/V split, and Conv1d weight reshaping. Tested against HF transformers reference: token-for-token match on 30s/60s audio clips with greedy decoding. * mtmd: rename gs_ prefixed tensors to generic/architecture names * mtmd: use tensor_mapping.py for all granite_speech tensors * convert: fold GraniteSpeechTextModel into GraniteModel * mtmd: replace n_layer hack with explicit has_standard_layers flag * mtmd: replace hardcoded magic numbers with GGUF hparams for granite speech * mtmd: align KEY_A_ define spacing * convert: register GraniteModel for GraniteSpeechForConditionalGeneration * convert: fix ty type-check for GraniteSpeechMmprojModel registration * mtmd: align TN_ define spacing * mtmd: use generic layer loop for granite speech tensor loading * mtmd: merge qformer_proj_layer into clip_layer * mtmd: granite_speech remove redundant ggml_build_forward_expand on inputs * mtmd: granite_speech add comment explaining why build_attn is not used * mtmd: granite_speech hard-code eps in cpp, remove from GGUF metadata * gguf: add spacing between granite_speech tensor mapping blocks * mtmd: make generic audio layer_norm_eps read optional * mtmd: granite_speech keep encoder eps in GGUF, only hard-code projector eps * mtmd: align defines and struct fields in clip-impl.h and clip-model.h * mtmd: fix alignment and ordering 
issues across granite speech files * convert: granite_speech use filter_tensors instead of modify_tensors for skipping	2026-05-06 14:40:59 +02:00
Adrien Gallouët	bf76ac77be	common : only load backends when required (#22290 ) * common : only load backends when required Signed-off-by: Adrien Gallouët <angt@huggingface.co> * llama : call ggml_backend_load_all() directly from llama_backend_init() Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Add ggml_backend_load_all() where llama_backend_init() is not used Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-05-05 09:23:50 +02:00
Concedo	70be589894	Merge branch 'upstream' into concedo_experimental # Conflicts: # CODEOWNERS # examples/debug/debug.cpp # examples/eval-callback/eval-callback.cpp # ggml/src/ggml-cpu/amx/mmq.cpp # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # scripts/pr2wt.sh	2026-04-28 21:13:40 +08:00
Max Krasnyansky	5594d13224	common: fix missing exports in llama-common (#22340 ) * common: refactor common/debug to move abort_on_nan into base_callback_data Passing bool abort_on_nan as template parameter for common_debug_cb_eval is unnecessary and creates an issue with LTO. It should just be a member of the base_callback_data instead. * cont : cleanup * common : use pimpl in debug.h to reduce header dependencies Move common_debug_cb_user_data's data members (std::regex, std::vector<uint8_t>) into a private impl struct in debug.cpp. This removes the includes of common.h and <regex> from debug.h, reducing transitive dependencies for any translation unit that includes the header. Assisted-by: llama.cpp:local pi --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-04-27 08:06:39 +03:00
Concedo	0755f27372	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/openvino.Dockerfile # .github/workflows/build-self-hosted.yml # .github/workflows/build.yml # common/chat.cpp # docs/backend/OPENVINO.md # examples/speculative-simple/speculative-simple.cpp # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp/CMakeLists.txt # ggml/src/ggml-hexagon/htp/htp-ctx.h # ggml/src/ggml-hexagon/htp/htp-ops.h # ggml/src/ggml-hexagon/htp/main.c # ggml/src/ggml-hexagon/libggml-htp.inf # ggml/src/ggml-openvino/ggml-decoder.cpp # ggml/src/ggml-openvino/ggml-openvino-extra.cpp # ggml/src/ggml-openvino/ggml-openvino.cpp # ggml/src/ggml-openvino/ggml-quants.cpp # ggml/src/ggml-openvino/openvino/op/rope.cpp # ggml/src/ggml-openvino/openvino/op_table.cpp # ggml/src/ggml-openvino/openvino/op_table.h # ggml/src/ggml-openvino/openvino/translate_session.cpp # ggml/src/ggml-openvino/openvino/utils.cpp # ggml/src/ggml-openvino/openvino/utils.h # ggml/src/ggml-openvino/utils.cpp # ggml/src/ggml-openvino/utils.h # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/convert.hpp # ggml/src/ggml-sycl/gemm.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/set_rows.cpp # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # scripts/sync_vendor.py # tests/CMakeLists.txt # tests/test-chat.cpp # tools/cli/cli.cpp # tools/mtmd/CMakeLists.txt # tools/server/CMakeLists.txt	2026-04-23 00:55:05 +08:00
Concedo	becf70d49b	fixed logspam for fit	2026-04-23 00:43:09 +08:00
Xuan-Son Nguyen	82d3f4d3b2	mtmd: also support LLAMA_ROPE_TYPE_NONE (#22242 )	2026-04-22 12:16:29 +02:00
manayang	7bfe60fdf9	mtmd, llama : Update HunyuanVL vision-language model support (#22037 ) * mtmd, llama : add HunyuanVL vision-language model support - add LLM_ARCH_HUNYUAN_VL with M-RoPE (XD-RoPE) support - add PROJECTOR_TYPE_HUNYUANVL with PatchMerger vision encoder - add HunyuanVL-specific M-RoPE position encoding for image tokens - add GGUF conversion for HunyuanVL vision and text models - add smoke test in tools/mtmd/tests.sh * fix: fix HunyuanVL XD-RoPE h/w section order * fix: Remove redundant code * convert : fix HunyuanOCR / HunyuanVL conversion - Tested locally: both HunyuanOCR and HunyuanVL-4B convert to GGUF - successfully and produce correct inference output on Metal (F16 / Q8_0). * clip : fix -Werror=misleading-indentation in bilinear resize * fix CI: convert_hf_to_gguf type check error - convert_hf_to_gguf.py: give HunyuanVLTextModel.__init__ an explicit `dir_model: Path` parameter so ty can infer the type for load_hparams instead of reporting `Unknown \| None`. --------- Co-authored-by: wendadawen <wendadawen@tencent.com>	2026-04-22 11:58:43 +02:00
Kwa Jie Hao	98d2d2884e	mtmd: Add support for Reka Edge 2603 (#21616 ) * feat: (vocab) fix stray text appended in llama_decode_text Remove accidental concatenation of the full `text` string when formatting UNK_BYTE hex escapes. Only the closing "]" should be appended. * feat(mtmd): add Yasa2 vision encoder support Add a Yasa2 (ConvNeXtV2-based) vision encoder for reka-edge: - Register PROJECTOR_TYPE_YASA2 and tensor name definitions - Add yasa2_block/yasa2_stage model structs - Implement graph builder with ConvNeXt stages, GRN, adaptive pooling - Wire into clip.cpp switch statements and mtmd.cpp init_vision - Use mtmd_image_preprocessor_fixed_size for image preprocessing * feat(chat): add reka-edge template handler (tools, thinking) - Add chat-reka.cpp/h implementing PEG-based parser for reka-edge format - Add Reka-Edge.jinja chat template - Detect reka-edge template in try_specialized_template() - Add LLAMA_EXAMPLE_MTMD to chat-template-file arg * feat: add reka vlm to gguf conversion script Converts Reka Yasa2 hf checkpoints to GGUF format: - Text decoder: Llama-arch with tiktoken/BPE vocab - Mmproj (--mmproj): ConvNeXt vision backbone + language_projection - Generates 2D sincos positional embeddings for vision encoder * test: add Reka Edge chat template and parser tests - test-chat-template: oracle tests comparing Jinja engine output vs common_chat_templates_apply for text, tools, thinking, images, video - test-chat: PEG parser tests for Reka Edge format, round-trip tests for image/video content parts, common path integration tests * scripts: add Reka Edge mixed quantization helper Q4_0 base quantization with Q8_0 override for the last 8 transformer blocks (layers 24-31) via --tensor-type regex. 
* fix: adapt chat-reka and tests to upstream API - Use autoparser::generation_params (not templates_params) - Add p.prefix(generation_prompt) to PEG parser - Simplify reasoning parser to match LFM2 pattern - Remove image/video oracle tests (unsupported by oaicompat parser; no other multimodal models test this path) * fix: avoid duplicate tensor loading in yasa2 vision encoder TN_YASA_PATCH_W and TN_PATCH_EMBD both resolve to "v.patch_embd.weight", causing the same tensor to be loaded twice into ctx_data and overflowing the memory pool. Reuse the tensors already loaded by the common section. * chore: update image pre-processing settings The reka-edge model depends on the following settings in an older fork of llama.cpp: 1. Fixed square resize 2. BICUBIC 3. add_padding=false In current llama.cpp, this means setting: - image_resize_algo = RESIZE_ALGO_BICUBIC - image_resize_pad = false * chore: remove reka gguf conversion script * chore: remove reka quantization script * chore: remove unnecessary changes from PR scope This commit removes a couple of unnecessary changes for the PR scope: 1. BPE decoder bug fix - this affects reka edge because there's a bug in our tokenization that doesn't represent <think> tokens as special tokens. However this isn't meant to be a thinking model so when run with --reasoning off the edge case does not affect us 2. --chat-template-file support from llama-mtmd-cli - the focus is on llama-server and the reka edge gguf contains the necessary metadata to detect the chat template 3. reka edge oracle test cases - no other model has similar test cases, so I removed it for standardization * chore: remove unnecessary ggml_cast This commit removes unnecessary ggml_cast after updating the reka vlm -> gguf conversion script on hugging face. * chore: remove redundant code * chore: remove unnecessary ggml_cont calls This commit removes all ggml_cont calls except the four that precede ggml_reshape_3d/ggml_reshape_4d. 
Those are necessary because ggml_reshape recomputes strides assuming contiguous layout and asserts ggml_is_contiguous. Other operations (ggml_mean, ggml_add, ggml_mul etc.) use stride-based indexing and handle non-contiguous inputs correctly and so we are ok to remove ggml_cont for those. * chore: remove unnecessary ggml_repeat calls This commit removes unnecessary ggml_repeat calls because the underlying ops already broadcast automatically. Every ggml_repeat in yasa2.cpp was expanding a smaller tensor to match a larger one's shape before passing both to an elementwise op (ggml_add, ggml_sub, ggml_mul, or ggml_div). This is unnecessary because all four of these ops already support broadcasting internally. * chore: restore ggml_cont needed for cpu operations * refactor: locate reka chat template handler in chat.cpp * chore: remove unnecessary warmup tokens * chore: add code comments on image_resize_pad * chore: remove custom reka parsing code * chore: revert common/chat.cpp * Uncomment debug logging for PEG input parsing --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>	2026-04-21 20:02:49 +02:00
Concedo	4629b49afb	updated to handle changes for clip_is_mrope	2026-04-21 19:34:32 +08:00
Concedo	19a12bb080	Merge branch 'upstream' into concedo_experimental # Conflicts: # CODEOWNERS # common/CMakeLists.txt # ggml/CMakeLists.txt # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl # scripts/sync-ggml.last # tools/cli/cli.cpp # tools/llama-bench/llama-bench.cpp # tools/perplexity/perplexity.cpp	2026-04-21 18:53:03 +08:00
Xuan-Son Nguyen	9998d88bc8	mtmd: correct mtmd_decode_use_mrope() (#22188 )	2026-04-21 10:53:37 +02:00
Xuan-Son Nguyen	86f8daacfe	mtmd: correct get_n_pos / get_decoder_pos (#22175 )	2026-04-20 23:29:19 +02:00
Xuan-Son Nguyen	a678916623	mtmd: refactor mtmd_decode_use_mrope (#22161 )	2026-04-20 14:45:11 +02:00
Concedo	cd6788007e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build-cross.yml # .github/workflows/build-self-hosted.yml # .github/workflows/release.yml # examples/llama.android/lib/src/main/cpp/CMakeLists.txt # ggml/CMakeLists.txt # ggml/src/ggml-rpc/CMakeLists.txt # ggml/src/ggml-rpc/ggml-rpc.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # scripts/sync_vendor.py # tests/test-chat.cpp # tests/test-mtmd-c-api.c # tools/server/README.md	2026-04-20 20:19:11 +08:00
Xuan-Son Nguyen	19124078be	mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos (breaking change) (#22082 ) * mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos * fix build	2026-04-19 11:57:21 +02:00
Concedo	79882d669a	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build-android.yml # .github/workflows/build.yml # .github/workflows/release.yml # CMakeLists.txt # CODEOWNERS # common/CMakeLists.txt # common/common.h # docs/ops.md # docs/ops/Metal.csv # examples/batched/CMakeLists.txt # examples/convert-llama2c-to-ggml/CMakeLists.txt # examples/debug/CMakeLists.txt # examples/diffusion/CMakeLists.txt # examples/embedding/CMakeLists.txt # examples/eval-callback/CMakeLists.txt # examples/gen-docs/CMakeLists.txt # examples/idle/CMakeLists.txt # examples/lookahead/CMakeLists.txt # examples/lookup/CMakeLists.txt # examples/parallel/CMakeLists.txt # examples/passkey/CMakeLists.txt # examples/retrieval/CMakeLists.txt # examples/save-load-state/CMakeLists.txt # examples/speculative-simple/CMakeLists.txt # examples/speculative/CMakeLists.txt # examples/sycl/CMakeLists.txt # examples/training/CMakeLists.txt # ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c # ggml/src/ggml-hexagon/htp/htp-ops.h # ggml/src/ggml-hexagon/htp/main.c # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cvt.cl # pocs/vdot/CMakeLists.txt # src/CMakeLists.txt # tests/CMakeLists.txt # tests/test-quantize-stats.cpp # tools/batched-bench/CMakeLists.txt # tools/cli/CMakeLists.txt # tools/cli/cli.cpp # tools/completion/CMakeLists.txt # tools/cvector-generator/CMakeLists.txt # tools/cvector-generator/cvector-generator.cpp # tools/export-lora/CMakeLists.txt # tools/gguf-split/CMakeLists.txt # tools/gguf-split/gguf-split.cpp # tools/imatrix/CMakeLists.txt # tools/llama-bench/CMakeLists.txt # tools/llama-bench/llama-bench.cpp # tools/mtmd/CMakeLists.txt # tools/perplexity/CMakeLists.txt # tools/quantize/CMakeLists.txt # tools/quantize/quantize.cpp # tools/results/CMakeLists.txt # tools/server/CMakeLists.txt # tools/tokenize/CMakeLists.txt # tools/tts/CMakeLists.txt	2026-04-17 22:37:37 +08:00
Concedo	768527b031	Merge commit '1e796eb41f' into concedo_experimental # Conflicts: # .devops/nix/package.nix # .github/workflows/build-riscv.yml # .github/workflows/build-vulkan.yml # .github/workflows/build.yml # docs/backend/SYCL.md # docs/build.md # docs/development/HOWTO-add-model.md # embd_res/templates/Reka-Edge.jinja # ggml/CMakeLists.txt # ggml/src/ggml-rpc/CMakeLists.txt # ggml/src/ggml-rpc/ggml-rpc.cpp # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/dequantize.hpp # ggml/src/ggml-sycl/dmmv.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl # ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_id.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl # tests/test-chat.cpp # tools/rpc/README.md	2026-04-17 21:47:29 +08:00
Yuri Khrustalev	a279d0f0f4	ci : add android arm64 build and release (#21647 ) * server: respect the ignore eos flag * ci: add android arm64 build and release * patch * pin android-setup actions to v4 * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * lf in the suggestion --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-17 11:32:24 +02:00
65a	268d61e178	mtmd: add missing struct tag (#22023 )	2026-04-17 10:48:33 +02:00
Georgi Gerganov	6990e2f1f7	libs : rename libcommon -> libllama-common (#21936 ) * cmake : allow libcommon to be shared * cmake : rename libcommon to libllama-common * cont : set -fPIC for httplib * cont : export all symbols * cont : fix build_info exports * libs : add libllama-common-base * log : add common_log_get_verbosity_thold()	2026-04-17 11:11:46 +03:00
Xuan-Son Nguyen	408225bb1a	server: use random media marker (#21962 ) * server: use random media marker * nits * remove legacy <__image__> token * revert special char in random	2026-04-15 23:52:22 +02:00
Concedo	ac29e6f0c0	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/vulkan.Dockerfile # .github/workflows/build-self-hosted.yml # .github/workflows/build.yml # .github/workflows/release.yml # .github/workflows/server-self-hosted.yml # docs/build.md # ggml/src/ggml-hexagon/htp/CMakeLists.txt # ggml/src/ggml-hexagon/htp/hex-utils.h # ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c # ggml/src/ggml-hexagon/htp/hmx-utils.h # ggml/src/ggml-hexagon/htp/htp-ctx.h # ggml/src/ggml-hexagon/htp/htp-ops.h # ggml/src/ggml-hexagon/htp/hvx-base.h # ggml/src/ggml-hexagon/htp/main.c # ggml/src/ggml-webgpu/ggml-webgpu.cpp # tests/test-backend-ops.cpp # tests/test-mtmd-c-api.c	2026-04-15 15:15:19 +08:00
Xuan-Son Nguyen	707c0b7a6e	mtmd: add mtmd_image_tokens_get_decoder_pos() API (#21851 ) * mtmd: add mtmd_image_tokens_get_decoder_pos() API * consistent naming * fix build	2026-04-14 16:07:41 +02:00
Concedo	236ae27329	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/close-issue.yml # docs/multimodal.md # embd_res/templates/deepseek-ai-DeepSeek-V3.2.jinja # ggml/CMakeLists.txt # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl # tests/peg-parser/test-gbnf-generation.cpp # tests/test-chat.cpp	2026-04-14 21:01:41 +08:00
Concedo	9c0b9b0bb1	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/development/HOWTO-add-model.md # docs/multimodal.md # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/dequantize.hpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/gated_delta_net.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/upscale.cpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # tests/test-backend-ops.cpp # tests/test-llama-archs.cpp # tools/mtmd/CMakeLists.txt	2026-04-14 20:06:04 +08:00
Xuan-Son Nguyen	e974923698	docs: listing qwen3-asr and qwen3-omni as supported (#21857 ) * docs: listing qwen3-asr and qwen3-omni as supported * nits	2026-04-13 22:28:17 +02:00
Xuan-Son Nguyen	920b3e78cb	mtmd: use causal attn for gemma 4 audio (#21824 )	2026-04-13 09:47:55 +02:00
Sergiu	82764d8f40	mtmd: fix crash when sending image under 2x2 pixels (#21711 )	2026-04-12 23:59:21 +02:00
Xuan-Son Nguyen	21a4933042	mtmd: qwen3 audio support (qwen3-omni and qwen3-asr) (#19441 ) * add qwen3a * wip * vision ok * no more deepstack for audio * convert ASR model ok * qwen3 asr working * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * nits * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix bad merge * fix multi inheritance --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-12 23:57:25 +02:00
Xuan-Son Nguyen	aa4695c5e5	mtmd: add gemma 4 test (vision + audio) [no ci] (#21806 ) * mtmd: add gemma 4 test (vision + audio) * add to docs	2026-04-12 16:29:03 +02:00
Stephen Cox	547765a93e	mtmd: add Gemma 4 audio conformer encoder support (#21421 ) * mtmd: add Gemma 4 audio conformer encoder support Add audio processing for Gemma 4 E2B/E4B via a USM-style Conformer. Architecture: - 12-layer Conformer: FFN → Self-Attention → Causal Conv1D → FFN → Norm - Subsampling Conv Projection: 2x Conv2D(stride=2) with LayerNorm - Full self-attention with sinusoidal RPE and sliding window mask (24) - Logit softcapping at 50.0, ClippableLinear clamping - Output: 1024 → 1536 → RMSNorm → multimodal embedder Mel preprocessing (dedicated mtmd_audio_preprocessor_gemma4a): - HTK mel scale, 128 bins, magnitude STFT, mel_floor=1e-3 - Standard periodic Hann window (320 samples), zero-padded to FFT size - Semicausal left-padding (frame_length/2 samples) - Frame count matched to PyTorch (unfold formula) - No pre-emphasis, no Whisper-style normalization - Mel cosine similarity vs PyTorch: 0.9998 Key fixes: - Tensor loading dedup: prevent get_tensor() from creating duplicate entries in ctx_data. Fixed with std::set guard. - ClippableLinear clamp_info loading moved after per-layer tensors. - Sliding window mask (24 positions) matching PyTorch context_size. - Skip Whisper normalization for Gemma4 mel output. Tested on E2B and E4B with CPU and Vulkan backends. Transcribes: "Glad to see things are going well and business is starting to pick up" (matching ground truth). Ref: #21325	2026-04-12 14:15:26 +02:00
Concedo	5361b45fba	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cvt.cl # requirements/requirements-tool_bench.txt	2026-04-12 16:22:26 +08:00
Sirui He	073bb2c20b	mtmd : add MERaLiON-2 multimodal audio support (#21756 ) * mtmd : add MERaLiON-2 multimodal audio support Adds support for ASTAR's MERaLiON-2 audio-language model (3B and 10B) to the multimodal framework. Architecture: - Whisper large-v2 encoder for audio feature extraction - Gated MLP adaptor: ln_speech -> frame stack (x15) -> Linear+SiLU -> GLU -> out_proj - Gemma2 3B / 27B decoder The mmproj GGUF is generated via convert_hf_to_gguf.py --mmproj on the full MERaLiON-2 model directory (architecture: MERaLiON2ForConditionalGeneration). The decoder is converted separately as a standard Gemma2 model after stripping the text_decoder. weight prefix. New projector type: PROJECTOR_TYPE_MERALION Supports tasks: speech transcription (EN/ZH/MS/TA), translation, spoken QA. Model: https://huggingface.co/MERaLiON/MERaLiON-2-3B https://huggingface.co/MERaLiON/MERaLiON-2-10B simplify comments in meralion adaptor * meralion: use format_tensor_name, ascii arrows in comments	2026-04-11 14:15:48 +02:00
Concedo	8b90bfe094	Merge commit '4ef9301e4d' into concedo_experimental # Conflicts: # .github/labeler.yml # docs/multimodal.md # embd_res/ggml-vocab-gemma-4.gguf # embd_res/ggml-vocab-gemma-4.gguf.inp # embd_res/ggml-vocab-gemma-4.gguf.out # ggml/src/ggml-sycl/fattn-tile.cpp # ggml/src/ggml-sycl/fattn-tile.hpp # ggml/src/ggml-sycl/fattn-vec.hpp # ggml/src/ggml-sycl/fattn.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-f16.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q4_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q4_1.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q5_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q5_1.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-f16-q8_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-f16.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q4_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q4_1.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q5_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q5_1.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_0-q8_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-f16.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q4_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q4_1.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q5_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q5_1.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q4_1-q8_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-f16.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q4_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q4_1.cpp # 
ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q5_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q5_1.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_0-q8_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-f16.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q4_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q4_1.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q5_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q5_1.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q5_1-q8_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-f16.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q4_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q4_1.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q5_0.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q5_1.cpp # ggml/src/ggml-sycl/template-instances/fattn-vec-instance-q8_0-q8_0.cpp # tests/CMakeLists.txt # tests/test-jinja.cpp # tools/mtmd/CMakeLists.txt	2026-04-11 09:38:50 +08:00
Xuan-Son Nguyen	501aeed18f	mtmd: support dots.ocr (#17575 ) * convert gguf * clip impl * fix conversion * wip * corrections * update docs * add gguf to test script	2026-04-09 12:16:38 +02:00
Concedo	c82c0b463a	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/labeler.yml # .github/workflows/release.yml # examples/debug/debug.cpp # ggml/src/ggml-cuda/common.cuh # ggml/src/ggml-cuda/mmq.cuh # ggml/src/ggml-webgpu/ggml-webgpu.cpp # src/llama-vocab.cpp # tests/test-backend-ops.cpp # tests/test-chat.cpp # tests/test-json-schema-to-grammar.cpp # tools/mtmd/CMakeLists.txt	2026-04-09 17:45:04 +08:00
forforever73	09343c0198	model : support step3-vl-10b (#21287 ) * feat: support step3-vl-10b * use fused QKV && mapping tensor in tensor_mapping.py * guard hardcoded params and drop crop metadata * get understand_projector_stride from global config * img_u8_resize_bilinear_to_f32 move in step3vl class * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix the \r\n mess * add width and heads to MmprojModel.set_gguf_parameters --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-08 09:51:31 +02:00
Concedo	15d269197e	Merge commit '506200cf8b' into concedo_experimental # Conflicts: # docs/multimodal.md # scripts/compare-llama-bench.py # src/llama-vocab.cpp # tools/llama-bench/README.md # tools/llama-bench/llama-bench.cpp	2026-04-07 14:58:36 +08:00

296 commits