Saba Fallah
da3f990a47
mtmd: Add DeepSeekOCR 2 Support (#20975)
* mtmd: DeepSeek-OCR 2 support, with multi-tile dynamic resolution
* introduced clip_image_f32::add_viewsep
* address PR review
- drop redundant ggml_cpy ops in the build for both deepseekocr versions
- drop no-op ggml_cont in build_sam
- assert num_image_tokens for deepseekocr2
- view_seperator as (1, n_embd) at conversion (for both versions)
- drop redundant ggml_reshape_2d
* Update tools/mtmd/models/deepseekocr2.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-05-29 16:13:51 +02:00
fairydreaming
1f0aa2a696
model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346)
* llama : support DeepSeek V3.2 model family (with DSA lightning indexer)
* convert : handle DeepseekV32ForCausalLM architecture
* ggml : support for f16 GGML_OP_FILL
* memory : separate hparams argument in llama_kv_cache constructor
* memory : add llama_kv_cache_dsa memory (KV cache + lightning indexer cache)
* llama : support for LLM_ARCH_DEEPSEEK32
* model : llama_model_deepseek32 implementation
* model : merge two scale operations into one in DSA lightning indexer implementation
* chore : remove unused code
* model : support NVFP4 in DeepSeek V3.2
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* memory : refactoring TODO
Co-authored-by: ggerganov <ggerganov@users.noreply.github.com>
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: ggerganov <ggerganov@users.noreply.github.com>
2026-05-29 10:15:17 +02:00
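The DSA commit above mentions merging two scale operations in the lightning indexer into one. This works because scalar multiplication is associative: scaling by `a` and then by `b` gives the same result as a single scale by `a * b`, up to floating-point rounding. The following is a minimal Python sketch of that identity only. Plain float lists stand in for tensors, and the helper `scale` and the factor values are hypothetical, not taken from the llama.cpp implementation:

```python
import math

def scale(values, factor):
    """One 'scale op': multiply every element by a scalar factor."""
    return [v * factor for v in values]

# Hypothetical input and two successive scale factors (illustrative values).
x = [0.5, -1.25, 3.0, 2.0]
a = 1.0 / math.sqrt(128)
b = 0.75

two_ops = scale(scale(x, a), b)  # two passes over the data
one_op = scale(x, a * b)         # one pass with a pre-multiplied factor

# The results agree up to floating-point rounding.
assert all(math.isclose(p, q, rel_tol=1e-12) for p, q in zip(two_ops, one_op))
```

Folding the factors together saves one full pass over the tensor (one fewer graph node) without changing the math.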
ghleg
dbe9c0c8ce
convert : support Gemma4ForCausalLM architecture (#23682)
* convert : support Gemma4ForCausalLM architecture (#23674)
* fix indent
---------
Co-authored-by: Oleg Afonin <your.email@example.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-05-26 08:00:31 +03:00
Niklas Sheth
c9d98295a3
model : add support for talkie-1930-13b (#22596)
* initial talkie support, coherent
* reorder to follow convention
* absorb inverse rope
* stop folding scalars to improve quantization
* use broadcasting instead of duplication
* style cleanup
* add scaling support to LoraTorchTensor; use that path in conversion
* use layer_out_scale instead of embd_skip_scale
2026-05-26 07:57:38 +03:00
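One bullet in the talkie-1930-13b commit replaces duplication with broadcasting. With duplication, N copies of a small operand are materialized so its shape matches a larger one. With broadcasting, the operation reuses the single small operand along the missing dimension. The following is a minimal pure-Python sketch of that general idea, using 2-D lists as stand-ins for tensors. The helpers `repeat_rows`, `elementwise_mul`, and `broadcast_mul` are hypothetical and are not llama.cpp, ggml, or PyTorch APIs:

```python
def repeat_rows(row, n):
    """Duplication: materialize n copies of a single row."""
    return [list(row) for _ in range(n)]

def elementwise_mul(a, b):
    """Multiply two same-shape 2-D lists element by element."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def broadcast_mul(matrix, row):
    """Broadcasting: reuse the one row for every matrix row, no copies made."""
    return [[x * y for x, y in zip(r, row)] for r in matrix]

m = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
scale_row = [10.0, 0.5, -1.0]

dup = elementwise_mul(m, repeat_rows(scale_row, len(m)))
bc = broadcast_mul(m, scale_row)
assert dup == bc  # same values; broadcasting skips allocating the copies
```

Both paths produce identical values. The broadcast version avoids storing the repeated operand, which matters when that operand would otherwise be written out in full.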
Piotr Wilkin (ilintar)
cc7200bf12
Refactor: convert_hf_to_gguf.py (#17114)
* move conversion code to a dedicated conversion directory and split the files akin to the src/models architecture
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-05-15 15:18:12 +02:00