* mtmd, llama : add HunyuanVL vision-language model support
- add LLM_ARCH_HUNYUAN_VL with M-RoPE (XD-RoPE) support
- add PROJECTOR_TYPE_HUNYUANVL with PatchMerger vision encoder
- add HunyuanVL-specific M-RoPE position encoding for image tokens
- add GGUF conversion for HunyuanVL vision and text models
- add smoke test in tools/mtmd/tests.sh
* fix: correct HunyuanVL XD-RoPE h/w section order
* fix: remove redundant code
* convert : fix HunyuanOCR / HunyuanVL conversion
- Tested locally: both HunyuanOCR and HunyuanVL-4B convert to GGUF successfully and produce correct inference output on Metal (F16 / Q8_0).
* clip : fix -Werror=misleading-indentation in bilinear resize
* fix CI: convert_hf_to_gguf type check error
- convert_hf_to_gguf.py: give HunyuanVLTextModel.__init__ an explicit `dir_model: Path` parameter so ty can infer the type for load_hparams instead of reporting `Unknown | None`.
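
A minimal sketch of the typing pattern described above, not the actual converter code. `TextModelBase`, `HunyuanVLTextModelSketch`, and this `load_hparams` are hypothetical stand-ins; the point is only that naming `dir_model: Path` in the signature gives a type checker a concrete type to pass along, where a value pulled out of `*args` / `**kwargs` would be inferred as unknown or optional.

```python
from pathlib import Path
from typing import Any


class TextModelBase:
    """Hypothetical stand-in for the converter's base model class."""

    def __init__(self, dir_model: Path, *args: Any, **kwargs: Any) -> None:
        self.dir_model = dir_model


def load_hparams(dir_model: Path) -> dict[str, Any]:
    """Hypothetical loader; a real one would read config files from dir_model."""
    return {"source": dir_model.name}


class HunyuanVLTextModelSketch(TextModelBase):
    # dir_model is an explicit, annotated parameter, so the checker sees
    # a Path at the load_hparams call instead of an unknown/optional value.
    def __init__(self, dir_model: Path, *args: Any, **kwargs: Any) -> None:
        self.hparams = load_hparams(dir_model)
        super().__init__(dir_model, *args, **kwargs)


model = HunyuanVLTextModelSketch(Path("models/hunyuan-vl"))
```

The remaining positional and keyword arguments are still forwarded unchanged, so callers of the subclass are not affected by the signature change.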
---------
Co-authored-by: wendadawen <wendadawen@tencent.com>