Concedo
5a2808ffaf
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .flake8
# .github/labeler.yml
# .github/workflows/bench.yml.disabled
# .github/workflows/build-linux-cross.yml
# .github/workflows/build.yml
# .github/workflows/server.yml
# .gitignore
# CMakeLists.txt
# CODEOWNERS
# Makefile
# README.md
# SECURITY.md
# build-xcframework.sh
# ci/run.sh
# docs/development/HOWTO-add-model.md
# docs/multimodal/MobileVLM.md
# docs/multimodal/glmedge.md
# docs/multimodal/llava.md
# docs/multimodal/minicpmo2.6.md
# docs/multimodal/minicpmv2.5.md
# docs/multimodal/minicpmv2.6.md
# examples/CMakeLists.txt
# examples/pydantic_models_to_grammar_examples.py
# grammars/README.md
# pyrightconfig.json
# requirements/requirements-all.txt
# scripts/fetch_server_test_models.py
# scripts/tool_bench.py
# scripts/xxd.cmake
# tests/CMakeLists.txt
# tests/run-json-schema-to-grammar.mjs
# tools/batched-bench/CMakeLists.txt
# tools/batched-bench/README.md
# tools/batched-bench/batched-bench.cpp
# tools/cvector-generator/CMakeLists.txt
# tools/cvector-generator/README.md
# tools/cvector-generator/completions.txt
# tools/cvector-generator/cvector-generator.cpp
# tools/cvector-generator/mean.hpp
# tools/cvector-generator/negative.txt
# tools/cvector-generator/pca.hpp
# tools/cvector-generator/positive.txt
# tools/export-lora/CMakeLists.txt
# tools/export-lora/README.md
# tools/export-lora/export-lora.cpp
# tools/gguf-split/CMakeLists.txt
# tools/gguf-split/README.md
# tools/imatrix/CMakeLists.txt
# tools/imatrix/README.md
# tools/imatrix/imatrix.cpp
# tools/llama-bench/CMakeLists.txt
# tools/llama-bench/README.md
# tools/llama-bench/llama-bench.cpp
# tools/llava/CMakeLists.txt
# tools/llava/README.md
# tools/llava/android/adb_run.sh
# tools/llava/android/build_64.sh
# tools/llava/clip-quantize-cli.cpp
# tools/main/CMakeLists.txt
# tools/main/README.md
# tools/perplexity/CMakeLists.txt
# tools/perplexity/README.md
# tools/perplexity/perplexity.cpp
# tools/quantize/CMakeLists.txt
# tools/rpc/CMakeLists.txt
# tools/rpc/README.md
# tools/rpc/rpc-server.cpp
# tools/run/CMakeLists.txt
# tools/run/README.md
# tools/run/linenoise.cpp/linenoise.cpp
# tools/run/linenoise.cpp/linenoise.h
# tools/run/run.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
# tools/server/bench/README.md
# tools/server/public_simplechat/readme.md
# tools/server/tests/README.md
# tools/server/themes/README.md
# tools/server/themes/buttons-top/README.md
# tools/server/themes/wild/README.md
# tools/tokenize/CMakeLists.txt
# tools/tokenize/tokenize.cpp
2025-05-03 12:15:36 +08:00
Diego Devesa
1d36b3670b
llama : move end-user examples to tools directory ( #13249 )
* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00
Concedo
d8f1f73dd7
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build-linux-cross.yml
# .github/workflows/build.yml
# cmake/build-info.cmake
# common/CMakeLists.txt
# examples/llava/README.md
# examples/server/README.md
# ggml/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
2025-05-02 16:54:15 +08:00
Concedo
ca53d1bedc
Merge commit '13c9a3319b' into concedo_experimental
# Conflicts:
# ggml/src/ggml-cpu/CMakeLists.txt
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
2025-05-02 16:42:16 +08:00
Georgi Gerganov
fab647e884
server : add cache reuse card link to help ( #13230 )
* server : add cache reuse card link to help
* args : use short url
2025-05-02 09:48:31 +03:00
Diego Devesa
d7a14c42a1
build : fix build info on windows ( #13239 )
* build : fix build info on windows
* fix cuda host compiler msg
2025-05-01 21:48:08 +02:00
Xuan-Son Nguyen
13c9a3319b
arg : remove CURLINFO_EFFECTIVE_METHOD ( #13228 )
2025-05-01 10:23:25 +02:00
Xuan-Son Nguyen
6f67cf1f48
arg : -hf do not fail if url mismatch ( #13219 )
* arg : -hf do not fail if url mismatch
* do not return if cannot parse metadata json
2025-04-30 21:29:15 +01:00
Olivier Chafik
3b127c7385
common : add -jf / --json-schema-file flag ( #12011 )
2025-04-30 14:52:35 +02:00
Concedo
8273739412
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .devops/cpu.Dockerfile
# .devops/cuda.Dockerfile
# .devops/intel.Dockerfile
# .devops/llama-cli-cann.Dockerfile
# .devops/musa.Dockerfile
# .devops/rocm.Dockerfile
# .devops/vulkan.Dockerfile
# examples/llama-bench/llama-bench.cpp
# examples/rpc/rpc-server.cpp
# scripts/compare-llama-bench.py
# tests/test-quantize-stats.cpp
2025-04-30 17:22:18 +08:00
Xuan-Son Nguyen
5933e6fdc9
arg : allow using -hf offline ( #13202 )
* arg : allow using -hf offline
* add more comments in code [no ci]
2025-04-30 10:46:32 +02:00
Concedo
b2ecfa0f55
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# README.md
# examples/llama-bench/README.md
# examples/llama-bench/llama-bench.cpp
# examples/llava/CMakeLists.txt
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/element_wise.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-chat-template.cpp
2025-04-29 21:05:16 +08:00
Georgi Gerganov
43f2b07193
common : fix noreturn compile warning ( #13151 )
ggml-ci
2025-04-28 11:57:19 +03:00
Xuan-Son Nguyen
85f36e5e71
arg : fix unused variable ( #13142 )
2025-04-28 08:16:59 +03:00
Concedo
36c8db1248
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# examples/llava/clip-impl.h
# examples/llava/clip.cpp
# tests/test-arg-parser.cpp
# tests/test-json-schema-to-grammar.cpp
2025-04-27 12:51:02 +08:00
Xuan-Son Nguyen
2d451c8059
common : add common_remote_get_content ( #13123 )
* common : add common_remote_get_content
* support max size and timeout
* add tests
2025-04-26 22:58:12 +02:00
frob
d5fe4e81bd
grammar : handle maxItems == 0 in JSON schema ( #13117 )
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-04-26 10:10:20 +02:00
Concedo
6b6597ebf1
allow for single token prompt processing (actual batch size 1)
2025-04-25 16:54:46 +08:00
Georgi Gerganov
13b4548877
cmake : do not include ./src as public for libllama ( #13062 )
* cmake : do not include ./src as public for libllama
ggml-ci
* cmake : rework tests
ggml-ci
* llguidance : remove unicode include
ggml-ci
* cmake : make c++17 private
ggml-ci
2025-04-24 16:00:10 +03:00
Xuan-Son Nguyen
7c727fbe39
arg : add --no-mmproj-offload ( #13093 )
* arg : add --no-mmproj-offload
* Update common/arg.cpp
2025-04-24 14:04:14 +02:00
Xuan-Son Nguyen
80982e815e
arg : clean up handling --mmproj with -hf ( #13082 )
* arg : clean up handling --mmproj with -hf
* rm change about no_mmproj
* Revert "rm change about no_mmproj"
This reverts commit 2cac8e0efb629d66c612f137e75d562f94bb9e6c.
* handle no_mmproj explicitly
* skip download mmproj on examples not using it
2025-04-24 12:14:13 +02:00
Concedo
8f1edcbdac
Merge commit 'dc39a5e7a8' into concedo_experimental
# Conflicts:
# README.md
# SECURITY.md
# docs/multimodal/MobileVLM.md
# examples/llava/CMakeLists.txt
# examples/llava/README.md
# examples/llava/android/adb_run.sh
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/rope.cpp
# ggml/src/ggml-sycl/rope.hpp
2025-04-24 11:49:08 +08:00
Xuan-Son Nguyen
243453533e
llava : update documentations ( #13055 )
* llava : update documentations
* fix typo
2025-04-22 10:37:00 +02:00
Xuan-Son Nguyen
84a9bf2fc2
mtmd : merge llava, gemma3 and minicpmv CLI into single llama-mtmd-cli ( #13012 )
* mtmd : merge `llava-cli` and `gemma3-cli` into single `mtmd-cli`
* support for minicpmv
* remove cpp files of llava and minicpmv
* update hot topics
* mtmd : add not supported msg for qwen2vl
* Update examples/llava/mtmd.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-21 15:32:58 +02:00
Concedo
06159939d9
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# Makefile
# docs/build.md
# examples/rpc/rpc-server.cpp
# examples/sycl/build.sh
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-hip/CMakeLists.txt
# scripts/sync-ggml.last
2025-04-17 00:52:37 +08:00
Prajwal B Mehendarkar
bc091a4dc5
common : Define cache directory on AIX ( #12915 )
2025-04-12 17:33:39 +02:00
Concedo
a0ae187563
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/docker.yml
# README.md
# build-xcframework.sh
# examples/llava/CMakeLists.txt
# examples/llava/clip.cpp
# examples/rpc/rpc-server.cpp
# examples/run/run.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
2025-04-12 10:06:47 +08:00
Olivier Chafik
b6930ebc42
tool-call : fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 templates ( #12900 )
* `tool-call`: don't call common_chat_params_init_hermes_2_pro when there aren't tools (or when there's a schema)
* test all chat formats w/o tools
2025-04-11 21:47:52 +02:00
yuri@FreeBSD
68b08f36d0
common : Define cache directory on FreeBSD ( #12892 )
2025-04-11 21:45:44 +02:00
tastelikefeet
b2034c2b55
contrib: support modelscope community ( #12664 )
* support download from modelscope
* support login
* remove comments
* add arguments
* fix code
* fix win32
* test passed
* fix readme
* revert readme
* change to MODEL_ENDPOINT
* revert tail line
* fix readme
* refactor model endpoint
* remove blank line
* fix header
* fix as comments
* update comment
* update readme
---------
Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc.com>
2025-04-11 14:01:56 +02:00
Concedo
ebf924c5d1
Merge branch 'upstream' into concedo_experimental
2025-04-08 21:46:30 +08:00
Concedo
822cf2430e
Merge commit 'f1e3eb4249' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# README.md
# docs/backend/SYCL.md
# examples/llava/clip.cpp
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-vulkan/cmake/host-toolchain.cmake.in
2025-04-08 20:48:53 +08:00
Prajwal B Mehendarkar
1d343b4069
arg : Including limits file on AIX ( #12822 )
2025-04-08 14:30:59 +02:00
Xuan-Son Nguyen
bd3f59f812
cmake : enable curl by default ( #12761 )
* cmake : enable curl by default
* no curl if no examples
* fix build
* fix build-linux-cross
* add windows-setup-curl
* fix
* shell
* fix path
* fix windows-latest-cmake*
* run: include_directories
* LLAMA_RUN_EXTRA_LIBS
* sycl: no llama_curl
* no test-arg-parser on windows
* clarification
* try riscv64 / arm64
* windows: include libcurl inside release binary
* add msg
* fix mac / ios / android build
* will this fix xcode?
* try clearing the cache
* add bunch of licenses
* revert clear cache
* fix xcode
* fix xcode (2)
* fix typo
2025-04-07 13:35:19 +02:00
Sergey Fedorov
f1e3eb4249
common : fix includes in arg.cpp and gemma3-cli.cpp ( #12766 )
* arg.cpp: add a missing include
* gemma3-cli.cpp: fix cinttypes include
2025-04-05 17:46:00 +02:00
エシュナヴァリシア
c6ff5d2a8d
common: custom hf endpoint support ( #12769 )
* common: custom hf endpoint support
Add support for custom huggingface endpoints via HF_ENDPOINT environment variable
You can now specify a custom huggingface endpoint using the HF_ENDPOINT environment variable when using the --hf-repo flag, which works similarly to huggingface-cli's endpoint configuration.
Example usage:
HF_ENDPOINT=https://hf-mirror.com/ ./bin/llama-cli --hf-repo Qwen/Qwen1.5-0.5B-Chat-GGUF --hf-file qwen1_5-0_5b-chat-q2_k.gguf -p "The meaning to life and the universe is"
The trailing slash in the URL is optional:
HF_ENDPOINT=https://hf-mirror.com ./bin/llama-cli --hf-repo Qwen/Qwen1.5-0.5B-Chat-GGUF --hf-file qwen1_5-0_5b-chat-q2_k.gguf -p "The meaning to life and the universe is"
* Update common/arg.cpp
readability Improvement
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Apply suggestions from code review
---------
Co-authored-by: ベアトリーチェ <148695646+MakiSonomura@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-04-05 15:31:42 +02:00
Olivier Chafik
7a84777f42
sync: minja ( #12739 )
* sync: minja
https://github.com/google/minja/pull/57
* fix json include
2025-04-04 21:16:39 +01:00
Concedo
4e740311fe
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# ci/run.sh
# docs/backend/SYCL.md
# docs/build.md
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
# tests/test-chat-template.cpp
2025-04-04 15:07:47 +08:00
R0CKSTAR
5f696e88e0
sync : minja (inclusionAI/Ling) and update tests ( #12699 )
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-03 13:51:35 +02:00
Concedo
103d60ed2c
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# common/common.cpp
# examples/batched-bench/batched-bench.cpp
# examples/batched/batched.cpp
# examples/export-lora/export-lora.cpp
# examples/gritlm/gritlm.cpp
# examples/parallel/parallel.cpp
# examples/passkey/passkey.cpp
# examples/speculative-simple/speculative-simple.cpp
# examples/speculative/speculative.cpp
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cann/acl_tensor.cpp
# ggml/src/ggml-cann/acl_tensor.h
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-vulkan/CMakeLists.txt
# tests/test-arg-parser.cpp
# tests/test-backend-ops.cpp
2025-04-03 18:57:49 +08:00
Diego Devesa
e0e912f49b
llama : add option to override model tensor buffers ( #11397 )
* llama : add option to override tensor buffers
* ggml : fix possible underflow in ggml_nbytes
2025-04-02 14:52:01 +02:00
Xuan-Son Nguyen
42eb248f46
common : remove json.hpp from common.cpp ( #12697 )
* common : remove json.hpp from common.cpp
* fix comment
2025-04-02 09:58:34 +02:00
Xuan-Son Nguyen
267c1399f1
common : refactor downloading system, handle mmproj with -hf option ( #12694 )
* (wip) refactor downloading system [no ci]
* fix all examples
* fix mmproj with -hf
* gemma3: update readme
* only handle mmproj in llava example
* fix multi-shard download
* windows: fix problem with std::min and std::max
* fix 2
2025-04-01 23:44:05 +02:00
Concedo
8a4a9b8c19
Merge branch 'upstream' into concedo_experimental
2025-04-01 20:16:16 +08:00
R0CKSTAR
a6f32f0b34
Fix clang warning in gguf_check_reserved_keys ( #12686 )
* Fix clang warning in gguf_check_reserved_keys
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Fix typo
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-01 13:12:53 +02:00
Concedo
396875e1c4
update api docs and lite
2025-03-29 15:39:25 +08:00
Johannes Gäßler
dd373dd3bf
llama: fix error on bad grammar ( #12628 )
2025-03-28 18:08:52 +01:00
Piotr
2099a9d5db
server : Support listening on a unix socket ( #12613 )
* server : Bump cpp-httplib to include AF_UNIX windows support
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
* server : Allow running the server example on a unix socket
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
---------
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-03-27 23:41:04 +01:00
Michał Moskal
2447ad8a98
upgrade to llguidance 0.7.10 ( #12576 )
2025-03-26 11:06:09 -07:00
Concedo
5d7c5e9e33
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# examples/tts/tts.cpp
2025-03-16 15:42:39 +08:00