Commit graph

610 commits

Author SHA1 Message Date
Concedo
06159939d9 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	Makefile
#	docs/build.md
#	examples/rpc/rpc-server.cpp
#	examples/sycl/build.sh
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-hip/CMakeLists.txt
#	scripts/sync-ggml.last
2025-04-17 00:52:37 +08:00
Prajwal B Mehendarkar
bc091a4dc5
common : Define cache directory on AIX (#12915) 2025-04-12 17:33:39 +02:00
Concedo
a0ae187563 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/docker.yml
#	README.md
#	build-xcframework.sh
#	examples/llava/CMakeLists.txt
#	examples/llava/clip.cpp
#	examples/rpc/rpc-server.cpp
#	examples/run/run.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
2025-04-12 10:06:47 +08:00
Olivier Chafik
b6930ebc42
tool-call: fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 templates (#12900)
* `tool-call`: don't call common_chat_params_init_hermes_2_pro when there aren't tools (or when there's a schema)

* test all chat formats w/o tools
2025-04-11 21:47:52 +02:00
yuri@FreeBSD
68b08f36d0
common : Define cache directory on FreeBSD (#12892) 2025-04-11 21:45:44 +02:00
tastelikefeet
b2034c2b55
contrib: support modelscope community (#12664)
* support download from modelscope

* support login

* remove comments

* add arguments

* fix code

* fix win32

* test passed

* fix readme

* revert readme

* change to MODEL_ENDPOINT

* revert tail line

* fix readme

* refactor model endpoint

* remove blank line

* fix header

* fix as comments

* update comment

* update readme

---------

Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc/com>
2025-04-11 14:01:56 +02:00
Concedo
ebf924c5d1 Merge branch 'upstream' into concedo_experimental 2025-04-08 21:46:30 +08:00
Concedo
822cf2430e Merge commit 'f1e3eb4249' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	README.md
#	docs/backend/SYCL.md
#	examples/llava/clip.cpp
#	ggml/src/ggml-sycl/CMakeLists.txt
#	ggml/src/ggml-vulkan/cmake/host-toolchain.cmake.in
2025-04-08 20:48:53 +08:00
Prajwal B Mehendarkar
1d343b4069
arg : Including limits file on AIX (#12822) 2025-04-08 14:30:59 +02:00
Xuan-Son Nguyen
bd3f59f812
cmake : enable curl by default (#12761)
* cmake : enable curl by default

* no curl if no examples

* fix build

* fix build-linux-cross

* add windows-setup-curl

* fix

* shell

* fix path

* fix windows-latest-cmake*

* run: include_directories

* LLAMA_RUN_EXTRA_LIBS

* sycl: no llama_curl

* no test-arg-parser on windows

* clarification

* try riscv64 / arm64

* windows: include libcurl inside release binary

* add msg

* fix mac / ios / android build

* will this fix xcode?

* try clearing the cache

* add bunch of licenses

* revert clear cache

* fix xcode

* fix xcode (2)

* fix typo
2025-04-07 13:35:19 +02:00
Sergey Fedorov
f1e3eb4249
common : fix includes in arg.cpp and gemma3-cli.cpp (#12766)
* arg.cpp: add a missing include

* gemma3-cli.cpp: fix cinttypes include
2025-04-05 17:46:00 +02:00
エシュナヴァリシア
c6ff5d2a8d
common: custom hf endpoint support (#12769)
* common: custom hf endpoint support

Add support for custom huggingface endpoints via HF_ENDPOINT environment variable

You can now specify a custom huggingface endpoint using the HF_ENDPOINT environment variable when using the --hf-repo flag, which works similarly to huggingface-cli's endpoint configuration.

Example usage:
HF_ENDPOINT=https://hf-mirror.com/ ./bin/llama-cli --hf-repo Qwen/Qwen1.5-0.5B-Chat-GGUF --hf-file qwen1_5-0_5b-chat-q2_k.gguf -p "The meaning to life and the universe is"

The trailing slash in the URL is optional:
HF_ENDPOINT=https://hf-mirror.com ./bin/llama-cli --hf-repo Qwen/Qwen1.5-0.5B-Chat-GGUF --hf-file qwen1_5-0_5b-chat-q2_k.gguf -p "The meaning to life and the universe is"

* Update common/arg.cpp

readability Improvement

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Apply suggestions from code review

---------

Co-authored-by: ベアトリーチェ <148695646+MakiSonomura@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-04-05 15:31:42 +02:00
Olivier Chafik
7a84777f42
sync: minja (#12739)
* sync: minja

https://github.com/google/minja/pull/57

* fix json include
2025-04-04 21:16:39 +01:00
Concedo
4e740311fe Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ci/run.sh
#	docs/backend/SYCL.md
#	docs/build.md
#	ggml/src/ggml-vulkan/CMakeLists.txt
#	ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
#	tests/test-chat-template.cpp
2025-04-04 15:07:47 +08:00
R0CKSTAR
5f696e88e0
sync : minja (inclusionAI/Ling) and update tests (#12699)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-03 13:51:35 +02:00
Concedo
103d60ed2c Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	common/common.cpp
#	examples/batched-bench/batched-bench.cpp
#	examples/batched/batched.cpp
#	examples/export-lora/export-lora.cpp
#	examples/gritlm/gritlm.cpp
#	examples/parallel/parallel.cpp
#	examples/passkey/passkey.cpp
#	examples/speculative-simple/speculative-simple.cpp
#	examples/speculative/speculative.cpp
#	ggml/src/ggml-cann/CMakeLists.txt
#	ggml/src/ggml-cann/acl_tensor.cpp
#	ggml/src/ggml-cann/acl_tensor.h
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-vulkan/CMakeLists.txt
#	tests/test-arg-parser.cpp
#	tests/test-backend-ops.cpp
2025-04-03 18:57:49 +08:00
Diego Devesa
e0e912f49b
llama : add option to override model tensor buffers (#11397)
* llama : add option to override tensor buffers

* ggml : fix possible underflow in ggml_nbytes
2025-04-02 14:52:01 +02:00
Xuan-Son Nguyen
42eb248f46
common : remove json.hpp from common.cpp (#12697)
* common : remove json.hpp from common.cpp

* fix comment
2025-04-02 09:58:34 +02:00
Xuan-Son Nguyen
267c1399f1
common : refactor downloading system, handle mmproj with -hf option (#12694)
* (wip) refactor downloading system [no ci]

* fix all examples

* fix mmproj with -hf

* gemma3: update readme

* only handle mmproj in llava example

* fix multi-shard download

* windows: fix problem with std::min and std::max

* fix 2
2025-04-01 23:44:05 +02:00
Concedo
8a4a9b8c19 Merge branch 'upstream' into concedo_experimental 2025-04-01 20:16:16 +08:00
R0CKSTAR
a6f32f0b34
Fix clang warning in gguf_check_reserved_keys (#12686)
* Fix clang warning in gguf_check_reserved_keys

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Fix typo

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-01 13:12:53 +02:00
Concedo
396875e1c4 update api docs and lite 2025-03-29 15:39:25 +08:00
Johannes Gäßler
dd373dd3bf
llama: fix error on bad grammar (#12628) 2025-03-28 18:08:52 +01:00
Piotr
2099a9d5db
server : Support listening on a unix socket (#12613)
* server : Bump cpp-httplib to include AF_UNIX windows support

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>

* server : Allow running the server example on a unix socket

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>

---------

Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-03-27 23:41:04 +01:00
Michał Moskal
2447ad8a98
upgrade to llguidance 0.7.10 (#12576) 2025-03-26 11:06:09 -07:00
Concedo
5d7c5e9e33 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/tts/tts.cpp
2025-03-16 15:42:39 +08:00
marcoStocchi
f4c3dd5daa
llama-tts : add '-o' option (#12398)
* added -o option to specify an output file name

* llama-tts returns ENOENT in case of file write error

note : PR #12042 is closed as superseded with this one.
2025-03-15 17:23:11 +01:00
Concedo
67851e5415 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/run/run.cpp
#	ggml/src/ggml-cann/aclnn_ops.cpp
2025-03-15 19:54:19 +08:00
Sigbjørn Skjæret
774973b8f3
main : add -sysf / --system-prompt-file (#12249) (#12250)
* add system_prompt_file

* add -sysf / --system-prompt-file

* remove system_prompt_file
2025-03-14 16:57:05 +01:00
fairydreaming
8fcb563613
Load all MoE experts during warmup (#11571)
* llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup

* common : use new API to enable warmup mode during model warmup

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-03-14 13:47:05 +01:00
Concedo
7dc72db9de Merge branch 'upstream' into concedo_experimental 2025-03-14 11:58:53 +08:00
Concedo
0db4ae6237 traded my ink for a pen 2025-03-14 11:58:15 +08:00
Xuan-Son Nguyen
be7c303410
arg : no n_predict = -2 for examples except for main and infill (#12364) 2025-03-13 12:34:54 +01:00
Georgi Gerganov
e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)
* llama : refactor llama_context, llama_kv_cache, llm_build_context

ggml-ci

* graph : don't mutate the KV cache during defrag

ggml-ci

* context : reduce virtuals + remove test function

ggml-ci

* context : move interface implementation to source file + factory

ggml-ci

* graph : move KV cache build functions to llama_context impl

ggml-ci

* graph : remove model reference from build_pooling

ggml-ci

* graph : remove llama_model reference

ggml-ci

* kv_cache : provide rope factors

ggml-ci

* graph : rework inputs to use only unique_ptr, remove attn input abstraction

ggml-ci

* context : remove llama_context_i abstraction

ggml-ci

* context : clean-up

ggml-ci

* graph : clean-up

ggml-ci

* llama : remove redundant keywords (struct, enum)

ggml-ci

* model : adapt gemma3

ggml-ci

* graph : restore same attention ops as on master

ggml-ci

* llama : remove TODO + fix indent

ggml-ci
2025-03-13 12:35:44 +02:00
Concedo
77debb1b1b gemma3 vision works, but is using more tokens than expected - may need resizing 2025-03-13 00:31:16 +08:00
marcoStocchi
6ef79a67ca
common : refactor '-o' option (#12278)
As discussed in PR 'llama-tts : add -o option' (#12042):

* common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option.

* cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program.
2025-03-10 13:34:13 +02:00
Olivier Chafik
4e39a3c332
server: extract <think> tags from qwq outputs (#12297)
* extract <think> tags from qwq outputs

* const for all static regexes in chat.cpp
2025-03-10 10:59:03 +00:00
Olivier Chafik
87c2630546
allow missing content in message if tool_calls provided (#12293) 2025-03-10 09:45:07 +00:00
Concedo
6b7c3ae1d3 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	AUTHORS
#	README.md
#	ci/run.sh
#	docs/build.md
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-metal/CMakeLists.txt
#	scripts/sync-ggml.last
2025-03-10 10:32:41 +08:00
Georgi Gerganov
1e2f78a004
server : add speculative decoding presets for FIM (#12287) 2025-03-09 19:08:20 +02:00
Olivier Chafik
7cf64f6bee
sync: minja - support QwQ-32B (#12235)
8a76f7815e
2025-03-07 09:33:37 +00:00
Concedo
ec43d2b147 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	README.md
#	common/common.cpp
#	examples/embedding/embedding.cpp
#	examples/json_schema_to_grammar.py
#	examples/llama.android/llama/src/main/cpp/llama-android.cpp
#	examples/llama.swiftui/README.md
#	examples/llama.swiftui/llama.swiftui.xcodeproj/project.pbxproj
#	examples/lookahead/lookahead.cpp
#	examples/parallel/parallel.cpp
#	examples/passkey/passkey.cpp
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cpu/CMakeLists.txt
#	requirements.txt
#	requirements/requirements-all.txt
#	scripts/fetch_server_test_models.py
#	tests/test-chat.cpp
#	tests/test-json-schema-to-grammar.cpp
2025-03-06 18:54:58 +08:00
Olivier Chafik
669912d9a5
tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
* sampler: turn lazy grammar trigger words to regexes

* add scripts/tool_bench.sh & .py

* constrain llama json output regardless of function name if matches at beginning

* update relaxed newline space rule in grammar tests

* support add_generation_prompt query parameter (useful for /apply_template)

* Update src/llama-grammar.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-05 13:05:13 +00:00
mgroeber9110
5bbe6a9fe9
ggml : portability fixes for VS 2017 (#12150)
* Add include files for std::min/max and std::toupper/tolower

* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined

* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode

* win32: only use __restrict in MSVC if C11/C17 support is not enabled

---------

Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>
2025-03-04 18:53:26 +02:00
Concedo
0cddbe1f0b Merge branch 'upstream' into concedo_experimental 2025-03-05 00:22:06 +08:00
Sigbjørn Skjæret
56d7a9f812
main: allow preloading conversation with -p and add -st / --single-turn (#12145)
* Add chat template formatting to -no-cnv

* only enable prompt formatting if explicitly enabled

* add -st / --single-turn

* add --single-turn and -p in conversation mode

* fix -sys + -p

* reword warning

* small readability change and fix (long) outdated example usage

* only activate single turn in conversation mode
2025-03-04 12:19:39 -04:00
dm4
c43af9276b
tts: add speaker file support (#12048)
* tts: add speaker file support

Signed-off-by: dm4 <sunrisedm4@gmail.com>

* tts: handle outetts-0.3

* tts : add new line in error message

---------

Signed-off-by: dm4 <sunrisedm4@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-03 15:09:29 +02:00
Sigbjørn Skjæret
45a8e76745
common : add --system-prompt parameter, replace behavior of -p in conversation mode (#12131)
* Add --system-prompt parameter

* use user defined system prompt

* clarify

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* add warning

* clarify

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-03-01 13:56:45 +01:00
Concedo
6d7ef10671 Merge branch 'upstream' into concedo_experimental
Renable qwen2vl GPU for vulkan https://github.com/ggml-org/llama.cpp/pull/11902

# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	.gitignore
#	CONTRIBUTING.md
#	Makefile
#	common/CMakeLists.txt
#	common/arg.cpp
#	common/common.cpp
#	examples/main/main.cpp
#	examples/run/run.cpp
#	examples/server/tests/README.md
#	ggml/src/ggml-cuda/mma.cuh
#	scripts/get_chat_template.py
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-chat.cpp
2025-02-20 23:17:20 +08:00
Daniel Bevenius
d07c621393
common : add llama.vim preset for Qwen2.5 Coder (#11945)
This commit adds a preset for llama.vim to use the default Qwen 2.5
Coder models.

The motivation for this change is to make it easier to start a server
suitable to be used with the llama.vim plugin. For example, the server
can be started with a command like the following:
```console
$ llama.vim --fim-qwen-1.5b-default
```

Refs: https://github.com/ggml-org/llama.cpp/issues/10932
2025-02-19 12:29:52 +01:00