koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-12 01:54:37 +00:00

Author	SHA1	Message	Date
Concedo	db6db9dff9	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/close-issue.yml # .github/workflows/server.yml # AUTHORS # CMakeLists.txt # Makefile # README.md # cmake/llama.pc.in # common/CMakeLists.txt # docs/build.md # examples/batched.swift/Sources/main.swift # examples/llama.swiftui/llama.cpp.swift/LibLlama.swift # examples/llava/CMakeLists.txt # examples/llava/clip.h # examples/run/run.cpp # examples/server/README.md # ggml/CMakeLists.txt # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-hip/CMakeLists.txt # ggml/src/ggml-musa/CMakeLists.txt # scripts/sync-ggml.last # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat-template.cpp # tests/test-grammar-integration.cpp # tests/test-json-schema-to-grammar.cpp	2025-02-07 00:52:31 +08:00
Olivier Chafik	9f4cc8f8d3	`sync`: minja (#11641 ) * `sync`: minja `182de30cda` https://github.com/google/minja/pull/46 https://github.com/google/minja/pull/45	2025-02-05 01:00:12 +00:00
Radoslav Gerganov	1bef571f6a	arg : list RPC devices first when using --list-devices (#11655 ) List devices in the same order as they appear when evaluating the model and splitting tensors across devices, i.e. RPC devices come first in the list. ref #11435	2025-02-04 18:16:20 +02:00
Olivier Chafik	db288b60cb	`tool-call`: command r7b fix for normal responses (#11608 ) * fix command r7b normal response regex + add to server test * test multiline non-tool-call responses in test-chat	2025-02-04 15:48:53 +00:00
Olivier Chafik	cde3833239	`tool-call`: allow `--chat-template chatml` w/ `--jinja`, default to chatml upon parsing issue, avoid double bos (#11616 ) * tool-call: allow `--jinja --chat-template chatml` * fix double bos issue (drop bos/eos tokens from jinja template) * add missing try catch around jinja parsing to default to chatml * Simplify default chatml logic	2025-02-03 23:49:27 +00:00
Eric Curtin	84ec8a58f7	Name colors (#11573 ) It's more descriptive, use #define's so we can use compile-time concatenations. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-02-02 15:14:48 +00:00
Olivier Chafik	bfcce4d693	`tool-call`: support Command R7B (+ return tool_plan "thoughts" in API) (#11585 ) * `tool-call`: support Command R7B (w/ tool_plan return) * `tool-call`: cleaner preservation of tokens + warn when likely bad chat template override * `tool-call`: test cleanup / handle lazy grammar triggers	2025-02-02 09:25:38 +00:00
Olivier Chafik	69804487e0	Fix exotic ci env that lacks ostringstream::str (#11581 )	2025-02-02 09:10:15 +00:00
Michał Moskal	ff227703d6	sampling : support for llguidance grammars (#10224 ) * initial porting of previous LLG patch * update for new APIs * build: integrate llguidance as an external project * use '%llguidance' as marker to enable llg lark syntax * add some docs * clarify docs * code style fixes * remove llguidance.h from .gitignore * fix tests when llg is enabled * pass vocab not model to llama_sampler_init_llg() * copy test-grammar-integration.cpp to test-llguidance.cpp * clang fmt * fix ref-count bug * build and run test * gbnf -> lark syntax * conditionally include llguidance test based on LLAMA_LLGUIDANCE flag * rename llguidance test file to test-grammar-llguidance.cpp * add gh action for llg test * align tests with LLG grammar syntax and JSON Schema spec * llama_tokenizer() in fact requires valid utf8 * update llg * format file * add $LLGUIDANCE_LOG_LEVEL support * fix whitespace * fix warning * include <cmath> for INFINITY * add final newline * fail llama_sampler_init_llg() at runtime * Link gbnf_to_lark.py script; fix links; refer to llg docs for lexemes * simplify #includes * improve doc string for LLAMA_LLGUIDANCE * typo in merge * bump llguidance to 0.6.12	2025-02-02 09:55:32 +02:00
Olivier Chafik	cfd74c86db	`sync`: minja (`418a2364b5`) (#11574 )	2025-02-01 12:24:51 +00:00
Concedo	f13498df13	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/tools.sh # .devops/vulkan.Dockerfile # .github/workflows/build.yml # .github/workflows/docker.yml # .github/workflows/server.yml # Makefile # README.md # cmake/llama-config.cmake.in # common/CMakeLists.txt # examples/gbnf-validator/gbnf-validator.cpp # examples/run/run.cpp # examples/server/README.md # examples/server/tests/README.md # ggml/src/CMakeLists.txt # ggml/src/ggml-hip/CMakeLists.txt # scripts/sync-ggml.last # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat-template.cpp # tests/test-grammar-integration.cpp	2025-02-01 17:14:59 +08:00
Olivier Chafik	a83f528688	`tool-call`: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539 ) * An empty tool_call_id is better than none! * sync: minja (tool call name optional https://github.com/google/minja/pull/36) * Force-disable parallel_tool_calls if template doesn't support it * More debug logs * Llama 3.x tools: accept / trigger on more varied spaced outputs * Fix empty content for functionary v3.2 tool call * Add proper tool call docs to server README * readme: function calling is supported now * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-31 14:15:25 +00:00
Steve Grubb	1bd3047a93	common: Add missing va_end (#11529 ) The va_copy man page states that va_end must be called to revert whatever the copy did. For some implementaions, not calling va_end has no consequences. For others it could leak memory.	2025-01-31 07:58:55 +02:00
Olivier Chafik	8b576b6c55	Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639 ) --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-01-30 19:13:58 +00:00
Olivier Chafik	3d804dec76	sync: minja (#11499 )	2025-01-30 10:30:27 +00:00
Daniel Bevenius	b636228c0a	embedding : enable --no-warmup option (#11475 ) This commit enables the `--no-warmup` option for the llama-embeddings. The motivation for this change is to allow the user to disable the warmup when running the the program.	2025-01-29 10:38:54 +02:00
Concedo	bec231422a	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # Makefile # README.md # common/CMakeLists.txt # docs/backend/SYCL.md # docs/build.md # docs/docker.md # examples/export-lora/export-lora.cpp # examples/main/README.md # examples/main/main.cpp # examples/run/README.md # examples/run/run.cpp # examples/server/README.md # examples/simple-chat/simple-chat.cpp # ggml/CMakeLists.txt # ggml/src/ggml-hip/CMakeLists.txt # src/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat-template.cpp	2025-01-25 14:16:50 +08:00
Olivier Chafik	c64d2becb1	`minja`: sync at `0f5f7f2b37` (#11352 )	2025-01-22 16:16:27 +00:00
Olivier Chafik	a94f3b2727	`common`: utils to split / join / repeat strings (from json converter) (#11342 ) * Factor string_join, string_split, string_repeat into common * json: refactor to surface a versatile builder * Update common.cpp	2025-01-22 09:51:44 +00:00
Olivier Chafik	6171c9d258	Add Jinja template support (#11016 ) * Copy minja from `58f0ca6dd7` * Add --jinja and --chat-template-file flags * Add missing <optional> include * Avoid print in get_hf_chat_template.py * No designated initializers yet * Try and work around msvc++ non-macro max resolution quirk * Update test_chat_completion.py * Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template * Refactor test-chat-template * Test templates w/ minja * Fix deprecation * Add --jinja to llama-run * Update common_chat_format_example to use minja template wrapper * Test chat_template in e2e test * Update utils.py * Update test_chat_completion.py * Update run.cpp * Update arg.cpp * Refactor common_chat_* functions to accept minja template + use_jinja option * Attempt to fix linkage of LLAMA_CHATML_TEMPLATE * Revert LLAMA_CHATML_TEMPLATE refactor * Normalize newlines in test-chat-templates for windows tests * Forward decl minja::chat_template to avoid eager json dep * Flush stdout in chat template before potential crash * Fix copy elision warning * Rm unused optional include * Add missing optional include to server.cpp * Disable jinja test that has a cryptic windows failure * minja: fix vigogne (https://github.com/google/minja/pull/22) * Apply suggestions from code review Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Finish suggested renamings * Move chat_templates inside server_context + remove mutex * Update --chat-template-file w/ recent change to --chat-template * Refactor chat template validation * Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr) * Warn against missing eos / bos tokens when jinja template references them * rename: common_chat_template[s] * reinstate assert on chat_templates.template_default * Update minja to `b8437df626` * Update minja to https://github.com/google/minja/pull/25 * Update minja from https://github.com/google/minja/pull/27 * rm unused optional header --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-21 13:18:51 +00:00
Georgi Gerganov	80d0d6b4b7	common : add -hfd option for the draft model (#11318 ) * common : add -hfd option for the draft model * cont : fix env var * cont : more fixes	2025-01-20 22:29:43 +02:00
LostRuins Concedo	6390a998bf	tts : add guide tokens support (#11186 ) * Added the ability to use guide tokens for OuteTTS, greatly improving TTS recitation accuracy over long input sequences. * applied linting suggestions, updated to latest llama_vocab changes, added a safety check, added newline to guide token start	2025-01-18 12:20:57 +02:00
Concedo	96407502cd	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # examples/llama-bench/llama-bench.cpp # examples/llama.android/llama/src/main/cpp/llama-android.cpp # examples/llama.android/llama/src/main/java/android/llama/cpp/LLamaAndroid.kt # src/llama-vocab.cpp # tests/test-backend-ops.cpp	2025-01-17 23:13:50 +08:00
Radoslav Gerganov	667d72846c	rpc : early register backend devices (#11262 ) Early register RPC devices and do not propagate RPC specifics in the llama model structures. ref: #10609	2025-01-17 10:57:09 +02:00
Concedo	11cd7c7bb0	survived the storm, again	2025-01-16 22:25:18 +08:00
Concedo	2a00ee8fa8	broken commit	2025-01-16 21:41:18 +08:00
Xuan Son Nguyen	84a44815f7	cli : auto activate conversation mode if chat template is available (#11214 ) * cli : auto activate conversation mode if chat template is detected * add warn on bad template * update readme (writing with the help of chatgpt) * update readme (2) * do not activate -cnv for non-instruct models	2025-01-13 20:18:12 +01:00
Xuan Son Nguyen	00b4c3da62	common : support tag-based --hf-repo like on ollama (#11195 ) * common : support tag-based hf_repo like on ollama * fix build * various fixes * small fixes * fix style * fix windows build? * move common_get_hf_file to common.cpp * fix complain with noreturn	2025-01-13 13:56:23 +01:00
Xuan Son Nguyen	9a483999a6	llama : fix chat template gguf key (#11201 )	2025-01-12 13:45:14 +01:00
Georgi Gerganov	afa8a9ec9b	llama : add `llama_vocab`, functions -> methods, naming (#11110 ) * llama : functions -> methods (#11110) * llama : add struct llama_vocab to the API (#11156) ggml-ci * hparams : move vocab params to llama_vocab (#11159) ggml-ci * vocab : more pimpl (#11165) ggml-ci * vocab : minor tokenization optimizations (#11160) ggml-ci Co-authored-by: Diego Devesa <slarengh@gmail.com> * lora : update API names (#11167) ggml-ci * llama : update API names to use correct prefix (#11174) * llama : update API names to use correct prefix ggml-ci * cont ggml-ci * cont ggml-ci * minor [no ci] * vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174) ggml-ci * vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174) ggml-ci --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-01-12 11:32:42 +02:00
Concedo	07173e84a0	add ability to use guide tokens for TTS, ref: https://github.com/ggerganov/llama.cpp/pull/11186	2025-01-11 17:18:21 +08:00
Concedo	1b49dc305f	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/docker.yml # .github/workflows/editorconfig.yml # examples/run/run.cpp # examples/server/README.md # scripts/sync-ggml.last	2025-01-09 16:50:29 +08:00
Concedo	e788b8289a	You'll never take us alive We swore that death will do us part They'll call our crimes a work of art	2025-01-09 11:27:06 +08:00
Concedo	dcfa1eca4e	Merge commit '`017cc5f446`' into concedo_experimental # Conflicts: # .github/ISSUE_TEMPLATE/010-bug-compilation.yml # .github/ISSUE_TEMPLATE/019-bug-misc.yml # CODEOWNERS # examples/batched-bench/batched-bench.cpp # examples/batched/batched.cpp # examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp # examples/gritlm/gritlm.cpp # examples/llama-bench/llama-bench.cpp # examples/passkey/passkey.cpp # examples/quantize-stats/quantize-stats.cpp # examples/run/run.cpp # examples/simple-chat/simple-chat.cpp # examples/simple/simple.cpp # examples/tokenize/tokenize.cpp # ggml/CMakeLists.txt # ggml/src/ggml-metal/CMakeLists.txt # ggml/src/ggml-vulkan/CMakeLists.txt # scripts/sync-ggml.last # src/llama.cpp # tests/test-autorelease.cpp # tests/test-model-load-cancel.cpp # tests/test-tokenizer-0.cpp # tests/test-tokenizer-1-bpe.cpp # tests/test-tokenizer-1-spm.cpp	2025-01-08 23:15:21 +08:00
Georgi Gerganov	a3c1232c3f	arg : option to exclude arguments from specific examples (#11136 ) * arg : option to exclude arguments from specific examples ggml-ci * readme : remove old args [no ci]	2025-01-08 12:55:36 +02:00
Johannes Gäßler	53ff6b9b9f	GGUF: C++ refactor, backend support, misc fixes (#11030 ) * GGUF: C++ refactor, backend support, misc fixes remove ggml_tensor.backend update CODEOWNERS [no ci] remove gguf_get_data from API revise GGUF API data types	2025-01-07 18:01:58 +01:00
Georgi Gerganov	47182dd03f	llama : update llama_model API names (#11063 ) * llama : deprecate llama_free_model, add llama_model_free ggml-ci * llama : change `llama_load_model_from_file` -> `llama_model_load_from_file` ggml-ci	2025-01-06 10:55:18 +02:00
Georgi Gerganov	727368c60f	llama : use LLAMA_TOKEN_NULL (#11062 ) ggml-ci	2025-01-06 10:52:15 +02:00
Concedo	dca7ab5d9e	fixed tools compile error	2025-01-04 18:45:21 +08:00
Concedo	f9f1585a7f	broken merge - kcpp changes will be applied above this commit for better tracking.	2025-01-03 23:49:17 +08:00
Molly Sophia	4b0c638b9a	common : disable KV cache shifting automatically for unsupported models (#11053 ) * Disable KV cache shifting automatically for unsupported models instead of exiting directly Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update common/common.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-03 14:13:18 +02:00
Georgi Gerganov	f66f582927	llama : refactor `src/llama.cpp` (#10902 ) * llama : scatter llama.cpp into multiple modules (wip) * llama : control-vector -> adapter * llama : arch * llama : mmap ggml-ci * ci : remove BUILD_SHARED_LIBS=OFF ggml-ci * llama : arch (cont) ggml-ci * llama : chat ggml-ci * llama : model ggml-ci * llama : hparams ggml-ci * llama : adapter ggml-ci * examples : fix ggml-ci * rebase ggml-ci * minor * llama : kv cache ggml-ci * llama : impl ggml-ci * llama : batch ggml-ci * cont ggml-ci * llama : context ggml-ci * minor * llama : context (cont) ggml-ci * llama : model loader ggml-ci * common : update lora ggml-ci * llama : quant ggml-ci * llama : quant (cont) ggml-ci * minor [no ci]	2025-01-03 10:18:53 +02:00
Concedo	911da8765f	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # examples/llama.android/llama/src/main/cpp/llama-android.cpp # examples/run/run.cpp # examples/server/README.md # examples/server/bench/README.md # examples/server/tests/README.md # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # tests/test-backend-ops.cpp	2025-01-03 11:56:20 +08:00
Xuan Son Nguyen	45095a61bf	server : clean up built-in template detection (#11026 ) * server : clean up built-in template detection * fix compilation * add chat template test * fix condition	2024-12-31 15:22:01 +01:00
Peter	6e1531aca5	common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013 ) In common/common.cpp: * Convert usage of stat() function call to check if file exists to standard library function std::filesystem::exists (error unable to match to correct function signature) * Additional conditions to check if PATH_MAX is already defined in WIN32 environment (warning it is already defined in MSYS2) In examples/run/run.cpp: * Add io.h header inclusion (error cannot find function _get_osfhandle) * Change initialisers for OVERLAPPED to empty struct (warning about uninitialised members) * Add initialiser for hFile (warning it may be uninitialised) * Add cast for curl_off_t percentage value to long int in generate_progress_prefix function (warning that curl_off_t is long long int) In ggml/src/ggml-opencl/ggml-opencl.cpp: * Initialise certain declared cl_mem variables to nullptr for greater safety (warning about B_d variable possibly used unassigned)	2024-12-31 01:46:06 +01:00
Concedo	4c56b7cada	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # examples/gbnf-validator/gbnf-validator.cpp # examples/llava/clip.cpp # examples/run/README.md # examples/run/run.cpp # examples/server/README.md # ggml/src/ggml-cpu/CMakeLists.txt # src/llama.cpp # tests/test-grammar-integration.cpp # tests/test-llama-grammar.cpp	2024-12-21 09:41:49 +08:00
Molly Sophia	0a11f8b7b5	convert : fix RWKV v6 model conversion (#10913 ) * Enable --no-context-shift for llama-perplexity example Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV 6: Fix error in ggml_cuda_op_bin_bcast Signed-off-by: Molly Sophia <mollysophia379@gmail.com> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2024-12-20 11:44:58 +02:00
Georgi Gerganov	36319dec5d	tts : small QoL for easy model fetch (#10903 )	2024-12-19 17:35:15 +02:00
Concedo	ee486bad3e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # README.md # examples/CMakeLists.txt # examples/batched/batched.cpp # examples/gritlm/gritlm.cpp # examples/llama.android/llama/build.gradle.kts # examples/main/README.md # examples/retrieval/retrieval.cpp # examples/server/CMakeLists.txt # examples/server/README.md # ggml/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml.c # scripts/compare-commits.sh # scripts/sync-ggml.last # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat-template.cpp # tests/test-sampling.cpp	2024-12-19 11:57:43 +08:00
Georgi Gerganov	0bf2d10c55	tts : add OuteTTS support (#10784 ) * server : add "tokens" output ggml-ci * server : output embeddings for all tokens when pooling = none ggml-ci * server : be explicit about the pooling type in the tests ggml-ci * server : do not normalize embeddings when there is no pooling ggml-ci * llama : add OuteTTS support (wip) * wip * extract features * first conv * group norm * resnet conv * resnet * attn * pos net * layer norm * convnext * head * hann window * fix n_embd + remove llama.cpp hacks * compute hann window * fft * spectrum processing * clean-up * tts : receive input text and generate codes * clip : fix new conv name * tts : minor fix * tts : add header + minor fixes ggml-ci * tts : add matchematical constant ggml-ci * tts : fix sampling + cut initial noise * tts : fixes * tts : update default samplers ggml-ci * tts : text pre-processing * tts : outetts-voc -> wavtokenizer-dec * tts : remove hardcoded constants ggml-ci * tts : fix tensor shapes * llama : refactor wavtokenizer tensors ggml-ci * cont ggml-ci * cont [no ci] * llama : update WavTokenizer to non-causal attn * llama : handle no-vocab detokenization * tts : add Python example for OuteTTS (wip) * tts : extend python example to generate spectrogram ggml-ci * server : fix rebase artifacts * tts : enable "return_tokens" in Python example ggml-ci * tts : minor fixes * common : support HF download for vocoder	2024-12-18 19:27:21 +02:00

1 2 3 4 5 ...

589 commits