Concedo
db6db9dff9
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/close-issue.yml
# .github/workflows/server.yml
# AUTHORS
# CMakeLists.txt
# Makefile
# README.md
# cmake/llama.pc.in
# common/CMakeLists.txt
# docs/build.md
# examples/batched.swift/Sources/main.swift
# examples/llama.swiftui/llama.cpp.swift/LibLlama.swift
# examples/llava/CMakeLists.txt
# examples/llava/clip.h
# examples/run/run.cpp
# examples/server/README.md
# ggml/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# scripts/sync-ggml.last
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
2025-02-07 00:52:31 +08:00
Olivier Chafik
9f4cc8f8d3
sync
: minja (#11641 )
...
* `sync`: minja
182de30cda
https://github.com/google/minja/pull/46
https://github.com/google/minja/pull/45
2025-02-05 01:00:12 +00:00
Radoslav Gerganov
1bef571f6a
arg : list RPC devices first when using --list-devices ( #11655 )
...
List devices in the same order as they appear when evaluating the model
and splitting tensors across devices, i.e. RPC devices come first in the
list.
ref #11435
2025-02-04 18:16:20 +02:00
Olivier Chafik
db288b60cb
tool-call
: command r7b fix for normal responses (#11608 )
...
* fix command r7b normal response regex + add to server test
* test multiline non-tool-call responses in test-chat
2025-02-04 15:48:53 +00:00
Olivier Chafik
cde3833239
tool-call
: allow --chat-template chatml
w/ --jinja
, default to chatml upon parsing issue, avoid double bos (#11616 )
...
* tool-call: allow `--jinja --chat-template chatml`
* fix double bos issue (drop bos/eos tokens from jinja template)
* add missing try catch around jinja parsing to default to chatml
* Simplify default chatml logic
2025-02-03 23:49:27 +00:00
Eric Curtin
84ec8a58f7
Name colors ( #11573 )
...
It's more descriptive, use #define's so we can use compile-time
concatenations.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-02-02 15:14:48 +00:00
Olivier Chafik
bfcce4d693
tool-call
: support Command R7B (+ return tool_plan "thoughts" in API) (#11585 )
...
* `tool-call`: support Command R7B (w/ tool_plan return)
* `tool-call`: cleaner preservation of tokens + warn when likely bad chat template override
* `tool-call`: test cleanup / handle lazy grammar triggers
2025-02-02 09:25:38 +00:00
Olivier Chafik
69804487e0
Fix exotic ci env that lacks ostringstream::str ( #11581 )
2025-02-02 09:10:15 +00:00
Michał Moskal
ff227703d6
sampling : support for llguidance grammars ( #10224 )
...
* initial porting of previous LLG patch
* update for new APIs
* build: integrate llguidance as an external project
* use '%llguidance' as marker to enable llg lark syntax
* add some docs
* clarify docs
* code style fixes
* remove llguidance.h from .gitignore
* fix tests when llg is enabled
* pass vocab not model to llama_sampler_init_llg()
* copy test-grammar-integration.cpp to test-llguidance.cpp
* clang fmt
* fix ref-count bug
* build and run test
* gbnf -> lark syntax
* conditionally include llguidance test based on LLAMA_LLGUIDANCE flag
* rename llguidance test file to test-grammar-llguidance.cpp
* add gh action for llg test
* align tests with LLG grammar syntax and JSON Schema spec
* llama_tokenizer() in fact requires valid utf8
* update llg
* format file
* add $LLGUIDANCE_LOG_LEVEL support
* fix whitespace
* fix warning
* include <cmath> for INFINITY
* add final newline
* fail llama_sampler_init_llg() at runtime
* Link gbnf_to_lark.py script; fix links; refer to llg docs for lexemes
* simplify #includes
* improve doc string for LLAMA_LLGUIDANCE
* typo in merge
* bump llguidance to 0.6.12
2025-02-02 09:55:32 +02:00
Olivier Chafik
cfd74c86db
sync
: minja (418a2364b5
) ( #11574 )
2025-02-01 12:24:51 +00:00
Concedo
f13498df13
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/tools.sh
# .devops/vulkan.Dockerfile
# .github/workflows/build.yml
# .github/workflows/docker.yml
# .github/workflows/server.yml
# Makefile
# README.md
# cmake/llama-config.cmake.in
# common/CMakeLists.txt
# examples/gbnf-validator/gbnf-validator.cpp
# examples/run/run.cpp
# examples/server/README.md
# examples/server/tests/README.md
# ggml/src/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# scripts/sync-ggml.last
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
2025-02-01 17:14:59 +08:00
Olivier Chafik
a83f528688
tool-call
: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539 )
...
* An empty tool_call_id is better than none!
* sync: minja (tool call name optional https://github.com/google/minja/pull/36 )
* Force-disable parallel_tool_calls if template doesn't support it
* More debug logs
* Llama 3.x tools: accept / trigger on more varied spaced outputs
* Fix empty content for functionary v3.2 tool call
* Add proper tool call docs to server README
* readme: function calling *is* supported now
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-31 14:15:25 +00:00
Steve Grubb
1bd3047a93
common: Add missing va_end ( #11529 )
...
The va_copy man page states that va_end must be called to revert
whatever the copy did. For some implementaions, not calling va_end
has no consequences. For others it could leak memory.
2025-01-31 07:58:55 +02:00
Olivier Chafik
8b576b6c55
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars ( #9639 )
...
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00
Olivier Chafik
3d804dec76
sync: minja ( #11499 )
2025-01-30 10:30:27 +00:00
Daniel Bevenius
b636228c0a
embedding : enable --no-warmup option ( #11475 )
...
This commit enables the `--no-warmup` option for the llama-embeddings.
The motivation for this change is to allow the user to disable the
warmup when running the the program.
2025-01-29 10:38:54 +02:00
Concedo
bec231422a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README.md
# common/CMakeLists.txt
# docs/backend/SYCL.md
# docs/build.md
# docs/docker.md
# examples/export-lora/export-lora.cpp
# examples/main/README.md
# examples/main/main.cpp
# examples/run/README.md
# examples/run/run.cpp
# examples/server/README.md
# examples/simple-chat/simple-chat.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# src/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
2025-01-25 14:16:50 +08:00
Olivier Chafik
c64d2becb1
minja
: sync at 0f5f7f2b37
( #11352 )
2025-01-22 16:16:27 +00:00
Olivier Chafik
a94f3b2727
common
: utils to split / join / repeat strings (from json converter) (#11342 )
...
* Factor string_join, string_split, string_repeat into common
* json: refactor to surface a versatile builder
* Update common.cpp
2025-01-22 09:51:44 +00:00
Olivier Chafik
6171c9d258
Add Jinja template support ( #11016 )
...
* Copy minja from 58f0ca6dd7
* Add --jinja and --chat-template-file flags
* Add missing <optional> include
* Avoid print in get_hf_chat_template.py
* No designated initializers yet
* Try and work around msvc++ non-macro max resolution quirk
* Update test_chat_completion.py
* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template
* Refactor test-chat-template
* Test templates w/ minja
* Fix deprecation
* Add --jinja to llama-run
* Update common_chat_format_example to use minja template wrapper
* Test chat_template in e2e test
* Update utils.py
* Update test_chat_completion.py
* Update run.cpp
* Update arg.cpp
* Refactor common_chat_* functions to accept minja template + use_jinja option
* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
* Revert LLAMA_CHATML_TEMPLATE refactor
* Normalize newlines in test-chat-templates for windows tests
* Forward decl minja::chat_template to avoid eager json dep
* Flush stdout in chat template before potential crash
* Fix copy elision warning
* Rm unused optional include
* Add missing optional include to server.cpp
* Disable jinja test that has a cryptic windows failure
* minja: fix vigogne (https://github.com/google/minja/pull/22 )
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626
* Update minja to https://github.com/google/minja/pull/25
* Update minja from https://github.com/google/minja/pull/27
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-21 13:18:51 +00:00
Georgi Gerganov
80d0d6b4b7
common : add -hfd option for the draft model ( #11318 )
...
* common : add -hfd option for the draft model
* cont : fix env var
* cont : more fixes
2025-01-20 22:29:43 +02:00
LostRuins Concedo
6390a998bf
tts : add guide tokens support ( #11186 )
...
* Added the ability to use guide tokens for OuteTTS, greatly improving TTS recitation accuracy over long input sequences.
* applied linting suggestions, updated to latest llama_vocab changes, added a safety check, added newline to guide token start
2025-01-18 12:20:57 +02:00
Concedo
96407502cd
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# examples/llama-bench/llama-bench.cpp
# examples/llama.android/llama/src/main/cpp/llama-android.cpp
# examples/llama.android/llama/src/main/java/android/llama/cpp/LLamaAndroid.kt
# src/llama-vocab.cpp
# tests/test-backend-ops.cpp
2025-01-17 23:13:50 +08:00
Radoslav Gerganov
667d72846c
rpc : early register backend devices ( #11262 )
...
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.
ref: #10609
2025-01-17 10:57:09 +02:00
Concedo
11cd7c7bb0
survived the storm, again
2025-01-16 22:25:18 +08:00
Concedo
2a00ee8fa8
broken commit
2025-01-16 21:41:18 +08:00
Xuan Son Nguyen
84a44815f7
cli : auto activate conversation mode if chat template is available ( #11214 )
...
* cli : auto activate conversation mode if chat template is detected
* add warn on bad template
* update readme (writing with the help of chatgpt)
* update readme (2)
* do not activate -cnv for non-instruct models
2025-01-13 20:18:12 +01:00
Xuan Son Nguyen
00b4c3da62
common : support tag-based --hf-repo like on ollama ( #11195 )
...
* common : support tag-based hf_repo like on ollama
* fix build
* various fixes
* small fixes
* fix style
* fix windows build?
* move common_get_hf_file to common.cpp
* fix complain with noreturn
2025-01-13 13:56:23 +01:00
Xuan Son Nguyen
9a483999a6
llama : fix chat template gguf key ( #11201 )
2025-01-12 13:45:14 +01:00
Georgi Gerganov
afa8a9ec9b
llama : add llama_vocab
, functions -> methods, naming ( #11110 )
...
* llama : functions -> methods (#11110 )
* llama : add struct llama_vocab to the API (#11156 )
ggml-ci
* hparams : move vocab params to llama_vocab (#11159 )
ggml-ci
* vocab : more pimpl (#11165 )
ggml-ci
* vocab : minor tokenization optimizations (#11160 )
ggml-ci
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* lora : update API names (#11167 )
ggml-ci
* llama : update API names to use correct prefix (#11174 )
* llama : update API names to use correct prefix
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* minor [no ci]
* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174 )
ggml-ci
* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174 )
ggml-ci
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-12 11:32:42 +02:00
Concedo
07173e84a0
add ability to use guide tokens for TTS, ref: https://github.com/ggerganov/llama.cpp/pull/11186
2025-01-11 17:18:21 +08:00
Concedo
1b49dc305f
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# examples/run/run.cpp
# examples/server/README.md
# scripts/sync-ggml.last
2025-01-09 16:50:29 +08:00
Concedo
e788b8289a
You'll never take us alive
...
We swore that death will do us part
They'll call our crimes a work of art
2025-01-09 11:27:06 +08:00
Concedo
dcfa1eca4e
Merge commit ' 017cc5f446
' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# CODEOWNERS
# examples/batched-bench/batched-bench.cpp
# examples/batched/batched.cpp
# examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp
# examples/gritlm/gritlm.cpp
# examples/llama-bench/llama-bench.cpp
# examples/passkey/passkey.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/run/run.cpp
# examples/simple-chat/simple-chat.cpp
# examples/simple/simple.cpp
# examples/tokenize/tokenize.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-metal/CMakeLists.txt
# ggml/src/ggml-vulkan/CMakeLists.txt
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-autorelease.cpp
# tests/test-model-load-cancel.cpp
# tests/test-tokenizer-0.cpp
# tests/test-tokenizer-1-bpe.cpp
# tests/test-tokenizer-1-spm.cpp
2025-01-08 23:15:21 +08:00
Georgi Gerganov
a3c1232c3f
arg : option to exclude arguments from specific examples ( #11136 )
...
* arg : option to exclude arguments from specific examples
ggml-ci
* readme : remove old args [no ci]
2025-01-08 12:55:36 +02:00
Johannes Gäßler
53ff6b9b9f
GGUF: C++ refactor, backend support, misc fixes ( #11030 )
...
* GGUF: C++ refactor, backend support, misc fixes
remove ggml_tensor.backend
update CODEOWNERS [no ci]
remove gguf_get_data from API
revise GGUF API data types
2025-01-07 18:01:58 +01:00
Georgi Gerganov
47182dd03f
llama : update llama_model API names ( #11063 )
...
* llama : deprecate llama_free_model, add llama_model_free
ggml-ci
* llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`
ggml-ci
2025-01-06 10:55:18 +02:00
Georgi Gerganov
727368c60f
llama : use LLAMA_TOKEN_NULL ( #11062 )
...
ggml-ci
2025-01-06 10:52:15 +02:00
Concedo
dca7ab5d9e
fixed tools compile error
2025-01-04 18:45:21 +08:00
Concedo
f9f1585a7f
broken merge - kcpp changes will be applied above this commit for better tracking.
2025-01-03 23:49:17 +08:00
Molly Sophia
4b0c638b9a
common : disable KV cache shifting automatically for unsupported models ( #11053 )
...
* Disable KV cache shifting automatically for unsupported models
instead of exiting directly
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-03 14:13:18 +02:00
Georgi Gerganov
f66f582927
llama : refactor src/llama.cpp
( #10902 )
...
* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci]
2025-01-03 10:18:53 +02:00
Concedo
911da8765f
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# examples/llama.android/llama/src/main/cpp/llama-android.cpp
# examples/run/run.cpp
# examples/server/README.md
# examples/server/bench/README.md
# examples/server/tests/README.md
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# tests/test-backend-ops.cpp
2025-01-03 11:56:20 +08:00
Xuan Son Nguyen
45095a61bf
server : clean up built-in template detection ( #11026 )
...
* server : clean up built-in template detection
* fix compilation
* add chat template test
* fix condition
2024-12-31 15:22:01 +01:00
Peter
6e1531aca5
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON ( #11013 )
...
In common/common.cpp:
* Convert usage of stat() function call to check if file exists to standard library function std::filesystem::exists (error unable to match to correct function signature)
* Additional conditions to check if PATH_MAX is already defined in WIN32 environment (warning it is already defined in MSYS2)
In examples/run/run.cpp:
* Add io.h header inclusion (error cannot find function _get_osfhandle)
* Change initialisers for OVERLAPPED to empty struct (warning about uninitialised members)
* Add initialiser for hFile (warning it may be uninitialised)
* Add cast for curl_off_t percentage value to long int in generate_progress_prefix function (warning that curl_off_t is long long int)
In ggml/src/ggml-opencl/ggml-opencl.cpp:
* Initialise certain declared cl_mem variables to nullptr for greater safety (warning about B_d variable possibly used unassigned)
2024-12-31 01:46:06 +01:00
Concedo
4c56b7cada
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# examples/gbnf-validator/gbnf-validator.cpp
# examples/llava/clip.cpp
# examples/run/README.md
# examples/run/run.cpp
# examples/server/README.md
# ggml/src/ggml-cpu/CMakeLists.txt
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-llama-grammar.cpp
2024-12-21 09:41:49 +08:00
Molly Sophia
0a11f8b7b5
convert : fix RWKV v6 model conversion ( #10913 )
...
* Enable --no-context-shift for llama-perplexity example
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV 6: Fix error in ggml_cuda_op_bin_bcast
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-12-20 11:44:58 +02:00
Georgi Gerganov
36319dec5d
tts : small QoL for easy model fetch ( #10903 )
2024-12-19 17:35:15 +02:00
Concedo
ee486bad3e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# examples/CMakeLists.txt
# examples/batched/batched.cpp
# examples/gritlm/gritlm.cpp
# examples/llama.android/llama/build.gradle.kts
# examples/main/README.md
# examples/retrieval/retrieval.cpp
# examples/server/CMakeLists.txt
# examples/server/README.md
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml.c
# scripts/compare-commits.sh
# scripts/sync-ggml.last
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-sampling.cpp
2024-12-19 11:57:43 +08:00
Georgi Gerganov
0bf2d10c55
tts : add OuteTTS support ( #10784 )
...
* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : be explicit about the pooling type in the tests
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* llama : add OuteTTS support (wip)
* wip
* extract features
* first conv
* group norm
* resnet conv
* resnet
* attn
* pos net
* layer norm
* convnext
* head
* hann window
* fix n_embd + remove llama.cpp hacks
* compute hann window
* fft
* spectrum processing
* clean-up
* tts : receive input text and generate codes
* clip : fix new conv name
* tts : minor fix
* tts : add header + minor fixes
ggml-ci
* tts : add matchematical constant
ggml-ci
* tts : fix sampling + cut initial noise
* tts : fixes
* tts : update default samplers
ggml-ci
* tts : text pre-processing
* tts : outetts-voc -> wavtokenizer-dec
* tts : remove hardcoded constants
ggml-ci
* tts : fix tensor shapes
* llama : refactor wavtokenizer tensors
ggml-ci
* cont
ggml-ci
* cont [no ci]
* llama : update WavTokenizer to non-causal attn
* llama : handle no-vocab detokenization
* tts : add Python example for OuteTTS (wip)
* tts : extend python example to generate spectrogram
ggml-ci
* server : fix rebase artifacts
* tts : enable "return_tokens" in Python example
ggml-ci
* tts : minor fixes
* common : support HF download for vocoder
2024-12-18 19:27:21 +02:00