koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-26 07:33:39 +00:00

Author	SHA1	Message	Date
Aditya Singh	c0c7e147e7	requirements : bump torch to 2.11.0 (#23503 ) * requirements: relax torch~=2.6.0 to torch>=2.6.0 for convert_hf_to_gguf The ~=2.6.0 operator resolves to >=2.6.0, <2.7.0, which fails on PyPI for platform/CPython combinations where 2.6.x is not present. The accompanying comment already says 'PyTorch 2.6.0 or later', so the looser >=2.6.0 matches the documented intent and unblocks pip install -r requirements/requirements-convert_hf_to_gguf.txt. Fixes #23408 * requirements: bump torch floor to 2.11.0 per maintainer * requirements: pin torch to ==2.11.0 per project policy * requirements: pin mtmd torch and torchvision to 2.11.0/0.26.0 per project policy * requirements: suppress check_requirements pin warning on mtmd The check_requirements script flags '==' on lines in files matched by //requirements.txt. Append the documented suppression comment to the pinned torch and torchvision lines (and to the s390x platform marker lines) so the check passes while keeping the pins required by project policy. * ty: silence Tensor/Module union check on model[0].auto_model With torch 2.11.0 stubs, nn.Sequential.__getitem__ now returns Tensor \| Module rather than Module, so model[0].auto_model fails ty on the SentenceTransformer code path. The runtime behavior is unchanged because SentenceTransformer always wraps a Module at index 0. Adding a targeted unresolved-attribute ignore keeps the type-check green without altering behavior. A follow-up issue tracks typing the variable explicitly.	2026-05-23 18:24:39 +02:00
wendadawen	6a257d4463	mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329 ) - HunyuanOCR shares the same HF arch and vision layout as HunyuanVL butwas split into a separate path that skipped the +0.1 bilinear sampler used by the HF reference. - Collapse OCR into the HUNYUANVL projector + HUNYUAN_VL text arch	2026-05-21 00:35:37 +02:00
Saba Fallah	a8681a0ed2	mtmd : DeepSeek-OCR image processing fixes, img_tool::resize padding refactor (#23345 ) * mtmd : deepseek-ocr fixes, improvements and refactoring - image processing changes to achieve full parity with Pillow (reference impl) - SAM mask casting only when flash-attn is on - SAM refactor (build_sam() extracted so deepseek-ocr-2 can reuse it) - llama-chat changes to fix server/WebUI issue (new media_markers_first()) - adapted test-chat-template and added test cases for deepseek-ocr - changed regression test for deepseek-ocr to use CER+chrF scores for ground-truth comparison; removed embedding-model - ty.toml ignore unresolved-import for tools/mtmd/tests/** * image-text reordering fix removed * refactor bool add_padding + pad_rounding enum into a single pad_style enum	2026-05-20 17:37:10 +02:00
Xuan-Son Nguyen	e2b129e1bf	mtmd: fit_params now take into account mmproj (#21489 ) * mtmd: fit_params now take into account mmproj * rename alloc_compute_meta to reserve_compute_meta * rm unused functions * add ggml_backend_dev_t support * add debug log	2026-05-20 11:27:44 +02:00
Xuan-Son Nguyen	72e60f500d	mtmd: add chunks and fix preproc for qwen3a (#23073 ) * mtmd: add chunks and fix preproc for qwen3a * add attn_mask * limit mtmd_chunk size (avoid blow up memory) * correct audio tokens * re-order the set_input case * remove attn_mask	2026-05-15 19:32:47 +02:00
Georgi Gerganov	67b2b7f2f2	logs : reduce (#23021 ) Some checks failed Python Type-Check / python type-check (push) Waiting to run Details Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Has been cancelled Details Python check requirements.txt / check-requirements (push) Has been cancelled Details Update Operations Documentation / update-ops-docs (push) Has been cancelled Details * logs : reduce * args : fix envs * server : fix build * common : print verbosity level at start * server : clean-up logs * server : print prompt processing timings + sampling params * minor : whitespaces	2026-05-14 13:05:52 +03:00
Xuan-Son Nguyen	7bfe120c21	mtmd, server, common: expose modalities to /v1/models (#22952 ) * mtmd, server, common: expose modalities to /v1/models * fix build * rename to mtmd_caps	2026-05-12 19:08:07 +02:00
AesSedai	4178259130	mtmd: add MiMo v2.5 vision (#22883 ) * mimo-v2.5: vision support * mimo-v2.5: use fused qkv for vision * mimi-v2.5: fix f16 vision overflow * mimo-v2.5: comment cleanups * mimo-v2.5: Flash doesn't have mmproj more cleanup remember to use filter_tensors * mimo-v2.5: fix trailing whitespace	2026-05-12 11:11:14 +02:00
Pascal	cc97e45a14	mtmd: fix whisper audio tail truncation by exposing padded buffer to FFT (#22770 )	2026-05-07 14:01:01 +02:00
tc-mb	2496f9c149	mtmd : support MiniCPM-V 4.6 (#22529 ) * Support MiniCPM-V 4.6 in new branch Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix code bug Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix pre-commit Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix convert Signed-off-by: tc-mb <tianchi_cai@icloud.com> * rename clip_graph_minicpmv4_6 Signed-off-by: tc-mb <tianchi_cai@icloud.com> * use new TYPE_MINICPMV4_6 Signed-off-by: tc-mb <tianchi_cai@icloud.com> * use build_attn to allow flash attention support Signed-off-by: tc-mb <tianchi_cai@icloud.com> * no use legacy code, restored here. Signed-off-by: tc-mb <tianchi_cai@icloud.com> * use the existing tensors name Signed-off-by: tc-mb <tianchi_cai@icloud.com> * unused ctx->model.hparams.minicpmv_version Signed-off-by: tc-mb <tianchi_cai@icloud.com> * use n_merge for slice alignment Signed-off-by: tc-mb <tianchi_cai@icloud.com> * borrow wa_layer_indexes for vit_merger insertion point Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix code style Signed-off-by: tc-mb <tianchi_cai@icloud.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * use filter_tensors and add model.vision_tower Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix chkhsh Signed-off-by: tc-mb <tianchi_cai@icloud.com> * fix type check Signed-off-by: tc-mb <tianchi_cai@icloud.com> --------- Signed-off-by: tc-mb <tianchi_cai@icloud.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-06 21:54:09 +02:00
Yakine Tahtah	a00e47e422	mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) (#22101 ) * mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) Conformer encoder with Shaw relative position encoding, QFormer projector, log-mel spectrogram with frame stacking. Encoder uses GLU gating, folded batch norm, and SSM depthwise conv. QFormer compresses encoder output via windowed cross-attention (window=15, queries=3) into the LLM embedding space. Audio preprocessing: reflect-padded STFT, 80-bin mel filterbank, dynamic range compression, 2x frame stacking (80->160 mel). GGUF converter handles batch norm folding at export time, fused K/V split, and Conv1d weight reshaping. Tested against HF transformers reference: token-for-token match on 30s/60s audio clips with greedy decoding. * mtmd: rename gs_ prefixed tensors to generic/architecture names * mtmd: use tensor_mapping.py for all granite_speech tensors * convert: fold GraniteSpeechTextModel into GraniteModel * mtmd: replace n_layer hack with explicit has_standard_layers flag * mtmd: replace hardcoded magic numbers with GGUF hparams for granite speech * mtmd: align KEY_A_ define spacing * convert: register GraniteModel for GraniteSpeechForConditionalGeneration * convert: fix ty type-check for GraniteSpeechMmprojModel registration * mtmd: align TN_ define spacing * mtmd: use generic layer loop for granite speech tensor loading * mtmd: merge qformer_proj_layer into clip_layer * mtmd: granite_speech remove redundant ggml_build_forward_expand on inputs * mtmd: granite_speech add comment explaining why build_attn is not used * mtmd: granite_speech hard-code eps in cpp, remove from GGUF metadata * gguf: add spacing between granite_speech tensor mapping blocks * mtmd: make generic audio layer_norm_eps read optional * mtmd: granite_speech keep encoder eps in GGUF, only hard-code projector eps * mtmd: align defines and struct fields in clip-impl.h and clip-model.h * mtmd: fix alignment and ordering issues across granite speech files * convert: granite_speech use filter_tensors instead of modify_tensors for skipping	2026-05-06 14:40:59 +02:00
Adrien Gallouët	bf76ac77be	common : only load backends when required (#22290 ) * common : only load backends when required Signed-off-by: Adrien Gallouët <angt@huggingface.co> * llama : call ggml_backend_load_all() directly from llama_backend_init() Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Add ggml_backend_load_all() where llama_backend_init() is not used Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-05-05 09:23:50 +02:00
Max Krasnyansky	5594d13224	common: fix missing exports in llama-common (#22340 ) * common: refactor common/debug to move abort_on_nan into base_callback_data Passing bool abort_on_nan as template parameter for common_debug_cb_eval is unnecessary and creates an issue with LTO. It should just be a member of the base_callback_data instead. * cont : cleanup * common : use pimpl in debug.h to reduce header dependencies Move common_debug_cb_user_data's data members (std::regex, std::vector<uint8_t>) into a private impl struct in debug.cpp. This removes the includes of common.h and <regex> from debug.h, reducing transitive dependencies for any translation unit that includes the header. Assisted-by: llama.cpp:local pi --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-04-27 08:06:39 +03:00
Xuan-Son Nguyen	82d3f4d3b2	mtmd: also support LLAMA_ROPE_TYPE_NONE (#22242 )	2026-04-22 12:16:29 +02:00
manayang	7bfe60fdf9	mtmd, llama : Update HunyuanVL vision-language model support (#22037 ) * mtmd, llama : add HunyuanVL vision-language model support - add LLM_ARCH_HUNYUAN_VL with M-RoPE (XD-RoPE) support - add PROJECTOR_TYPE_HUNYUANVL with PatchMerger vision encoder - add HunyuanVL-specific M-RoPE position encoding for image tokens - add GGUF conversion for HunyuanVL vision and text models - add smoke test in tools/mtmd/tests.sh * fix: fix HunyuanVL XD-RoPE h/w section order * fix: Remove redundant code * convert : fix HunyuanOCR / HunyuanVL conversion - Tested locally: both HunyuanOCR and HunyuanVL-4B convert to GGUF - successfully and produce correct inference output on Metal (F16 / Q8_0). * clip : fix -Werror=misleading-indentation in bilinear resize * fix CI: convert_hf_to_gguf type check error - convert_hf_to_gguf.py: give HunyuanVLTextModel.__init__ an explicit `dir_model: Path` parameter so ty can infer the type for load_hparams instead of reporting `Unknown \| None`. --------- Co-authored-by: wendadawen <wendadawen@tencent.com>	2026-04-22 11:58:43 +02:00
Kwa Jie Hao	98d2d2884e	mtmd: Add support for Reka Edge 2603 (#21616 ) * feat: (vocab) fix stray text appended in llama_decode_text Remove accidental concatenation of the full `text` string when formatting UNK_BYTE hex escapes. Only the closing "]" should be appended. * feat(mtmd): add Yasa2 vision encoder support Add a Yasa2 (ConvNeXtV2-based) vision encoder for reka-edge: - Register PROJECTOR_TYPE_YASA2 and tensor name definitions - Add yasa2_block/yasa2_stage model structs - Implement graph builder with ConvNeXt stages, GRN, adaptive pooling - Wire into clip.cpp switch statements and mtmd.cpp init_vision - Use mtmd_image_preprocessor_fixed_size for image preprocessing * feat(chat): add reka-edge template handler (tools, thinking) - Add chat-reka.cpp/h implementing PEG-based parser for reka-edge format - Add Reka-Edge.jinja chat template - Detect reka-edge template in try_specialized_template() - Add LLAMA_EXAMPLE_MTMD to chat-template-file arg * feat: add reka vlm to gguf conversion script Converts Reka Yasa2 hf checkpoints to GGUF format: - Text decoder: Llama-arch with tiktoken/BPE vocab - Mmproj (--mmproj): ConvNeXt vision backbone + language_projection - Generates 2D sincos positional embeddings for vision encoder * test: add Reka Edge chat template and parser tests - test-chat-template: oracle tests comparing Jinja engine output vs common_chat_templates_apply for text, tools, thinking, images, video - test-chat: PEG parser tests for Reka Edge format, round-trip tests for image/video content parts, common path integration tests * scripts: add Reka Edge mixed quantization helper Q4_0 base quantization with Q8_0 override for the last 8 transformer blocks (layers 24-31) via --tensor-type regex. * fix: adapt chat-reka and tests to upstream API - Use autoparser::generation_params (not templates_params) - Add p.prefix(generation_prompt) to PEG parser - Simplify reasoning parser to match LFM2 pattern - Remove image/video oracle tests (unsupported by oaicompat parser; no other multimodal models test this path) * fix: avoid duplicate tensor loading in yasa2 vision encoder TN_YASA_PATCH_W and TN_PATCH_EMBD both resolve to "v.patch_embd.weight", causing the same tensor to be loaded twice into ctx_data and overflowing the memory pool. Reuse the tensors already loaded by the common section. * chore: update image pre-processing settings The reka-edge model depends on the following settings in an older fork of llama.cpp: 1. Fixed square resize 2. BICUBIC 3. add_padding=false In current llama.cpp, this means setting: - image_resize_algo = RESIZE_ALGO_BICUBIC - image_resize_pad = false * chore: remove reka gguf conversion script * chore: remove reka quantization script * chore: remove unnecessary changes from PR scope This commit removes a couple of unnecessary changes for the PR scope: 1. BPE decoder bug fix - this affects reka edge because there's a bug in our tokenization that doesn't represent <think> tokens as special tokens. However this isn't meant to be a thinking model so when run with --reasoning off the edge case does not affect us 2. --chat-template-file support from llama-mtmd-cli - the focus is on llama-server and the reka edge gguf contains the necessary metadata to detect the chat template 3. reka edge oracle test cases - no other model has similar test cases, so I removed it for standardization * chore: remove unnecessary ggml_cast This commit removes unnecessary ggml_cast after updating the reka vlm -> gguf conversion script on hugging face. * chore: remove redundant code * chore: remove unnecessary ggml_cont calls This commit removes all ggml_cont calls except the four that precede ggml_reshape_3d/ggml_reshape_4d. Those are necessary because ggml_reshape recomputes strides assuming contiguous layout and asserts ggml_is_contiguous. Other operations (ggml_mean, ggml_add, ggml_mul etc.) use stride-based indexing and handle non-contiguous inputs correctly and so we are ok to remove ggml_cont for those. * chore: remove unnecessary ggml_repeat calls This commit removes unnecessary ggml_repeat calls because the underlying ops already broadcast automatically. Every ggml_repeat in yasa2.cpp was expanding a smaller tensor to match a larger one's shape before passing both to an elementwise op (ggml_add, ggml_sub, ggml_mul, or ggml_div). This is unnecessary because all four of these ops already support broadcasting internally. * chore: restore ggml_cont needed for cpu operations * refactor: locate reka chat template handler in chat.cpp * chore: remove unnecessary warmup tokens * chore: add code comments on image_resize_pad * chore: remove custom reka parsing code * chore: revert common/chat.cpp * Uncomment debug logging for PEG input parsing --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>	2026-04-21 20:02:49 +02:00
Xuan-Son Nguyen	9998d88bc8	mtmd: correct mtmd_decode_use_mrope() (#22188 )	2026-04-21 10:53:37 +02:00
Xuan-Son Nguyen	86f8daacfe	mtmd: correct get_n_pos / get_decoder_pos (#22175 )	2026-04-20 23:29:19 +02:00
Xuan-Son Nguyen	a678916623	mtmd: refactor mtmd_decode_use_mrope (#22161 )	2026-04-20 14:45:11 +02:00
Xuan-Son Nguyen	19124078be	mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos (breaking change) (#22082 ) * mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos * fix build	2026-04-19 11:57:21 +02:00
Yuri Khrustalev	a279d0f0f4	ci : add android arm64 build and release (#21647 ) Some checks failed Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Has been cancelled Details Python check requirements.txt / check-requirements (push) Has been cancelled Details Python Type-Check / python type-check (push) Has been cancelled Details Update Operations Documentation / update-ops-docs (push) Has been cancelled Details * server: respect the ignore eos flag * ci: add android arm64 build and release * patch * pin android-setup actions to v4 * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * lf in the suggestion --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-17 11:32:24 +02:00
65a	268d61e178	mtmd: add missing struct tag (#22023 )	2026-04-17 10:48:33 +02:00
Georgi Gerganov	6990e2f1f7	libs : rename libcommon -> libllama-common (#21936 ) * cmake : allow libcommon to be shared * cmake : rename libcommon to libllama-common * cont : set -fPIC for httplib * cont : export all symbols * cont : fix build_info exports * libs : add libllama-common-base * log : add common_log_get_verbosity_thold()	2026-04-17 11:11:46 +03:00
Xuan-Son Nguyen	408225bb1a	server: use random media marker (#21962 ) * server: use random media marker * nits * remove legacy <__image__> token * revert special char in random	2026-04-15 23:52:22 +02:00
Xuan-Son Nguyen	707c0b7a6e	mtmd: add mtmd_image_tokens_get_decoder_pos() API (#21851 ) * mtmd: add mtmd_image_tokens_get_decoder_pos() API * consistent naming * fix build	2026-04-14 16:07:41 +02:00
Xuan-Son Nguyen	e974923698	docs: listing qwen3-asr and qwen3-omni as supported (#21857 ) * docs: listing qwen3-asr and qwen3-omni as supported * nits	2026-04-13 22:28:17 +02:00
Xuan-Son Nguyen	920b3e78cb	mtmd: use causal attn for gemma 4 audio (#21824 )	2026-04-13 09:47:55 +02:00
Sergiu	82764d8f40	mtmd: fix crash when sending image under 2x2 pixels (#21711 )	2026-04-12 23:59:21 +02:00
Xuan-Son Nguyen	21a4933042	mtmd: qwen3 audio support (qwen3-omni and qwen3-asr) (#19441 ) * add qwen3a * wip * vision ok * no more deepstack for audio * convert ASR model ok * qwen3 asr working * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * nits * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix bad merge * fix multi inheritance --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-12 23:57:25 +02:00
Xuan-Son Nguyen	aa4695c5e5	mtmd: add gemma 4 test (vision + audio) [no ci] (#21806 ) * mtmd: add gemma 4 test (vision + audio) * add to docs	2026-04-12 16:29:03 +02:00
Stephen Cox	547765a93e	mtmd: add Gemma 4 audio conformer encoder support (#21421 ) * mtmd: add Gemma 4 audio conformer encoder support Add audio processing for Gemma 4 E2B/E4B via a USM-style Conformer. Architecture: - 12-layer Conformer: FFN → Self-Attention → Causal Conv1D → FFN → Norm - Subsampling Conv Projection: 2x Conv2D(stride=2) with LayerNorm - Full self-attention with sinusoidal RPE and sliding window mask (24) - Logit softcapping at 50.0, ClippableLinear clamping - Output: 1024 → 1536 → RMSNorm → multimodal embedder Mel preprocessing (dedicated mtmd_audio_preprocessor_gemma4a): - HTK mel scale, 128 bins, magnitude STFT, mel_floor=1e-3 - Standard periodic Hann window (320 samples), zero-padded to FFT size - Semicausal left-padding (frame_length/2 samples) - Frame count matched to PyTorch (unfold formula) - No pre-emphasis, no Whisper-style normalization - Mel cosine similarity vs PyTorch: 0.9998 Key fixes: - Tensor loading dedup: prevent get_tensor() from creating duplicate entries in ctx_data. Fixed with std::set guard. - ClippableLinear clamp_info loading moved after per-layer tensors. - Sliding window mask (24 positions) matching PyTorch context_size. - Skip Whisper normalization for Gemma4 mel output. Tested on E2B and E4B with CPU and Vulkan backends. Transcribes: "Glad to see things are going well and business is starting to pick up" (matching ground truth). Ref: #21325	2026-04-12 14:15:26 +02:00
Sirui He	073bb2c20b	mtmd : add MERaLiON-2 multimodal audio support (#21756 ) * mtmd : add MERaLiON-2 multimodal audio support Adds support for ASTAR's MERaLiON-2 audio-language model (3B and 10B) to the multimodal framework. Architecture: - Whisper large-v2 encoder for audio feature extraction - Gated MLP adaptor: ln_speech -> frame stack (x15) -> Linear+SiLU -> GLU -> out_proj - Gemma2 3B / 27B decoder The mmproj GGUF is generated via convert_hf_to_gguf.py --mmproj on the full MERaLiON-2 model directory (architecture: MERaLiON2ForConditionalGeneration). The decoder is converted separately as a standard Gemma2 model after stripping the text_decoder. weight prefix. New projector type: PROJECTOR_TYPE_MERALION Supports tasks: speech transcription (EN/ZH/MS/TA), translation, spoken QA. Model: https://huggingface.co/MERaLiON/MERaLiON-2-3B https://huggingface.co/MERaLiON/MERaLiON-2-10B simplify comments in meralion adaptor * meralion: use format_tensor_name, ascii arrows in comments	2026-04-11 14:15:48 +02:00
Xuan-Son Nguyen	501aeed18f	mtmd: support dots.ocr (#17575 ) * convert gguf * clip impl * fix conversion * wip * corrections * update docs * add gguf to test script	2026-04-09 12:16:38 +02:00
forforever73	09343c0198	model : support step3-vl-10b (#21287 ) * feat: support step3-vl-10b * use fused QKV && mapping tensor in tensor_mapping.py * guard hardcoded params and drop crop metadata * get understand_projector_stride from global config * img_u8_resize_bilinear_to_f32 move in step3vl class * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix the \r\n mess * add width and heads to MmprojModel.set_gguf_parameters --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-08 09:51:31 +02:00
Xuan-Son Nguyen	3979f2bb08	docs: add hunyuan-ocr gguf, also add test [no ci] (#21490 )	2026-04-06 14:02:37 +02:00
anchortense	58190cc84d	llama : correct platform-independent loading of BOOL metadata (#21428 ) * model-loader : fix GGUF bool array conversion * model-loader : fix remaining GGUF bool pointer uses	2026-04-06 01:40:38 +02:00
Richard Davison	af76639f72	model : add HunyuanOCR support (#21395 ) * HunyuanOCR: add support for text and vision models - Add HunyuanOCR vision projector (perceiver-based) with Conv2d merge - Add separate HUNYUAN_OCR chat template (content-before-role format) - Handle HunyuanOCR's invalid pad_token_id=-1 in converter - Fix EOS/EOT token IDs from generation_config.json - Support xdrope RoPE scaling type - Add tensor mappings for perceiver projector (mm.before_rms, mm.after_rms, etc.) - Register HunYuanVLForConditionalGeneration for both text and mmproj conversion * fix proper mapping * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Update tools/mtmd/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * address comments * update * Fix typecheck * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-04-05 23:32:14 +02:00
Xuan-Son Nguyen	63f8fe0ef4	model, mtmd: fix gguf conversion for audio/vision mmproj (#21309 ) * fix gguf conversion for audio/vision mmproj * fix test	2026-04-02 17:10:32 +02:00
Adrien Gallouët	41361c8599	common : move up common_init() and fix Windows UTF-8 logs (#21176 ) The build info is now only for debug, so we avoid the duplicate with `--version`. The UTF-8 setup at the beginning is needed to avoid logging garbage on Windows. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-31 12:53:41 +02:00
Xuan-Son Nguyen	871f1a2d2f	mtmd: add more sanity checks (#21047 )	2026-03-27 11:00:52 +01:00
Xuan-Son Nguyen	a73bbd5d92	mtmd: refactor image preprocessing (#21031 ) * mtmd: refactor image pre-processing * correct some places * correct lfm2 * fix deepseek-ocr on server * add comment to clarify about mtmd_image_preprocessor_dyn_size	2026-03-26 19:49:20 +01:00
Saba Fallah	a970515bdb	mtmd: Add DeepSeekOCR Support (#17400 ) * mtmd: llama.cpp DeepSeekOCR support init commit * loading sam tensors * mtmd: fix vision model processing * deepseek-ocr clip-vit model impl * mtmd: add DeepSeek-OCR LM support with standard attention * mtmd: successfully runs DeepSeek-OCR LM in llama-cli * mtmd: Fix RoPE type for DeepSeek-OCR LM. * loading LM testing Vision model loading * sam warmup working * sam erroneous return corrected * clip-vit: corrected cls_embd concat * clip-vit: model convert qkv_proj split * corrected combining of image encoders' results * fix: update callback for ffn_moe_weighted and add callback for attn_out in deepseek2 model * concat image_newline and image_seperator tokens * visual_model warmup (technically) works * window partitioning using standard ggml ops * sam implementation without using CPU only ops * clip: fixed warnings * Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr * mtmd: fix get_rel_pos * mtmd: fixed the wrong scaler for get_rel_pos * image encoding technically works but the output can't be checked singe image decoding fails * mtmd: minor changed * mtmd: add native resolution support * - image encoding debugged - issues fixed mainly related wrong config like n_patches etc. - configs need to be corrected in the converter * mtmd: correct token order * - dynamic resizing - changes are concerning PR https://github.com/sfallah/llama.cpp/pull/4 * mtmd: quick fix token order * mtmd: fix danling pointer * mtmd: SAM numerically works * mtmd: debug CLIP-L (vit_pre_ln) * mtmd: debug CLIP-L & first working DeepSeek-OCR model * mtmd : add --dsocr-mode CLI argument for DeepSeek-OCR resolution control & all native resolution modes work * mtmd: simplify SAM patch embedding * mtmd: adapt Pillow image resizing function * mtmd: simplify DeepSeek-OCR dynamic resolution preprocessing * mtmd: remove --dsocr-mode argument * mtmd: refactor code & remove unused helper functions * mtmd: fix tensor names for image newlines and view separator * clean up * reverting automatically removed spaces * reverting automatically removed spaces * mtmd: fixed bad ocr check in Deepseek2 (LM) * mtmd: support combined QKV projection in buid_vit * using common build_attn in sam * corrected code-branch when flash-attn disabled enabling usage of --flash-attn option * mtmd: minor fix * minor formatting and style * fixed flake8 lint issues * minor editorconfig-check fixes * minor editorconfig-check fixes * mtmd: simplify get_rel_pos * mtmd: make sam hparams configurable * mtmd: add detailed comments for resize_bicubic_pillow * mtmd: fixed wrong input setting * mtmd: convert model in FP16 * mtmd: minor fix * mtmd: remove tweak to llama-mtmd-cli & deepseek-ocr template * fix: test-1.jpg ORC issue with small (640) resolution setting min-resolution base (1024) max large (1280) for dynamic-resolution * minor: editconfig-check fix * merge with changes from https://github.com/ggml-org/llama.cpp/pull/17909 added new opt to tests.sh to disable flash-attn * minor: editconfig-check fix * testing deepseek-ocr quick and dirty test script comparing results of Qwen2.5-VL vs DeepSeek-OCR * quick and (potential) dirty merge with https://github.com/ggml-org/llama.cpp/pull/17909 * refactoring, one single builder function and static helpers * added deepseek-ocr test to tests.sh * minor formatting fixes * check with fixed expected resutls * minor formatting * editorconfig-check fix * merge with changes from https://github.com/ggml-org/llama.cpp/pull/18042 * minor - added GLM-4.6V to big tests - added missing deps for python test * convert: minor fix * mtmd: format code * convert: quick fix * convert: quick fix * minor python formatting * fixed merge build issue * merge resolved - fixed issues in convert - tested several deepseek models * minor fix * minor * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * - removed clip_is_deepseekocr - removed redundant RESIZE_ALGO_BICUBIC_PILLOW resize-algo - simplified image-preprocessing - removed/simplified debug functions * - cleaning commented out code * fixing instabilities issues reintroducing resize_bicubic_pillow * - use f16 model for deepseek-ocr test - ignore llama-arch test for deepseek-ocr * rename fc_w --> mm_fc_w * add links to OCR discussion * cleaner loading code * add missing .weight to some tensors * add default jinja template (to be used by server) * move test model to ggml-org * rolling back upscale change * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: bluebread <hotbread70127@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2026-03-25 19:57:40 +01:00
bssrdf	ec2b787ebe	mtmd: Add dynamic high-resolution image preprocessing for InternVL model (#20847 ) * added support for internvl's dynamic high-resolution (Qianfan-OCR needed) * add min/max dynamic patch to gguf meta * clean up * simplified handling min/max dynamic patch * reuse llava_uhd logic for slice images * provide default values for older models * flake8 * prevent writing 0 value to gguf * remove duplicated resolution candidates with a better algorithm * fix indentation * format * add protection from divide by zero * change to 0 to be safe --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-03-23 01:06:30 +01:00
DorianRudolph	d3ac030a5d	mtmd : fix LightOnOCR image preprocessing (#20877 )	2026-03-23 01:04:14 +01:00
Xuan-Son Nguyen	1e64534570	mtmd: add clip_graph::build_mm() (#20751 ) * clip: add build_mm() * apply to all models * add TODO for bias overload	2026-03-19 13:11:39 +01:00
Xuan-Son Nguyen	94d0262277	mtmd: add llama-mtmd-debug binary (#20508 ) * mtmd: add llama-mtmd-debug binary * adapt * fixes * fix compile error * fix windows compile error * rm legacy clip_debug_encode() * add MTMD_API to fix build	2026-03-14 15:52:29 +01:00
Daniel Bevenius	8f974d2392	mtmd : rename mtmd_get_audio_bitrate to mtmd_get_audio_sample_rate (#20105 ) This commit renames the the function `mtmd_get_audio_bitrate` to `mtmd_get_audio_sample_rate` to better reflect its purpose. The motivation for this is that the function currently returns the audio sample rate, not the bitrate (sample_rate × bit_depth × channels), and that is how it is used in the code as well. This is a breaking change, but I believe mtmd is still in experimental/development phase so it might be alright to simply rename.	2026-03-13 12:30:02 +01:00
DAN™	fdb17643d3	model : add support for Phi4ForCausalLMV (#20168 ) * Add support for Phi4ForCausalLMV. * Fix Phi-4 vision parity (correcting SigLIP2 patch-kernel export layout) and matching HF NaFlex resize behavior in mtmd. * Rename contants + fix tokenizer label * Clean-ups. * Fix GGUF export. * Set tokenizer.ggml.pre explicitly. * Default vocab name rather than forcing it. * Clean-ups. * Fix indent. * Fix subscriptable error. * remov overcomplicated code path * Clean-ups. --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-03-12 00:25:54 +01:00
Marcel Petrick	92f7da00b4	chore : correct typos [no ci] (#20041 ) * fix(docs): correct typos found during code review Non-functional changes only: - Fixed minor spelling mistakes in comments - Corrected typos in user-facing strings - No variables, logic, or functional code was modified. Signed-off-by: Marcel Petrick <mail@marcelpetrick.it> * Update docs/backend/CANN.md Co-authored-by: Aaron Teo <taronaeo@gmail.com> * Revert "Auxiliary commit to revert individual files from 846d1c301281178efbc6ce6060ad34c1ebe45af8" This reverts commit 02fcf0c7db661d5ff3eff96b2b2db9fdb7213256. * Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Signed-off-by: Marcel Petrick <mail@marcelpetrick.it> Co-authored-by: Aaron Teo <taronaeo@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-05 08:50:21 +01:00
Sigbjørn Skjæret	d969e933e1	tools : add missing clocale include in mtmd-cli [no ci] (#20107 )	2026-03-04 14:18:04 +01:00

1 2 3 4

165 commits