common : add standard Hugging Face cache support (#20775)

* common : add standard Hugging Face cache support - Use HF API to find all files - Migrate all manifests to hugging face cache at startup Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Check with the quant tag Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Cleanup Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Improve error handling and report API errors Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Restore common_cached_model_info and align mmproj filtering Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Prefer main when getting cached ref Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Use cached files when HF API fails Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Use final_path.. Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Check all inputs Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-05-05 15:31:04 +00:00 · 2026-03-24 07:30:33 +01:00 · 2026-03-24 07:30:33 +01:00 · 8c7957ca33
commit 8c7957ca33
parent e852eb4901
8 changed files with 1061 additions and 330 deletions
--- a/tools/server/tests/unit/test_router.py
+++ b/tools/server/tests/unit/test_router.py
@ -103,8 +103,8 @@ def test_router_models_max_evicts_lru():

    candidate_models = [
        "ggml-org/tinygemma3-GGUF:Q8_0",
-        "ggml-org/test-model-stories260K",
-        "ggml-org/test-model-stories260K-infill",
+        "ggml-org/test-model-stories260K:F32",
+        "ggml-org/test-model-stories260K-infill:F32",
    ]

    # Load only the first 2 models to fill the cache