Concedo
34a0fab87c
updated to latest clinfo from https://github.com/Oblomov/clinfo
...
direct link: https://ci.appveyor.com/api/projects/oblomov/clinfo/artifacts/clinfo.exe?job=platform%3a+x64
2025-02-21 19:51:27 +08:00
Concedo
f2ac10c014
added nsigma to lite
2025-02-21 15:11:24 +08:00
EquinoxPsychosis
2740af3660
add top n sigma sampler from llama.cpp ( #1384 )
...
* Add N Sigma Sampler
* update nsigma sampler chain
* xtc position fix
* remove stray newline
---------
Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>
2025-02-21 14:31:42 +08:00
Concedo
5f74ee3c3b
merge sd fix
2025-02-21 11:16:26 +08:00
Concedo
6d7ef10671
Merge branch 'upstream' into concedo_experimental
...
Re-enable qwen2vl GPU for vulkan https://github.com/ggml-org/llama.cpp/pull/11902
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# .gitignore
# CONTRIBUTING.md
# Makefile
# common/CMakeLists.txt
# common/arg.cpp
# common/common.cpp
# examples/main/main.cpp
# examples/run/run.cpp
# examples/server/tests/README.md
# ggml/src/ggml-cuda/mma.cuh
# scripts/get_chat_template.py
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-chat.cpp
2025-02-20 23:17:20 +08:00
Concedo
41350df81f
updated lite, added ability to export kcpps via CLI
2025-02-20 22:58:12 +08:00
Johannes Gäßler
d04e7163c8
doc: add links to ggml examples [no ci] ( #11958 )
2025-02-19 20:45:17 +01:00
Daniel Bevenius
d07c621393
common : add llama.vim preset for Qwen2.5 Coder ( #11945 )
...
This commit adds a preset for llama.vim to use the default Qwen 2.5
Coder models.
The motivation for this change is to make it easier to start a server
suitable to be used with the llama.vim plugin. For example, the server
can be started with a command like the following:
```console
$ llama-server --fim-qwen-1.5b-default
```
Refs: https://github.com/ggml-org/llama.cpp/issues/10932
2025-02-19 12:29:52 +01:00
Georgi Gerganov
abd4d0bc4f
speculative : update default params ( #11954 )
...
* speculative : update default params
* speculative : do not discard the last drafted token
2025-02-19 13:29:42 +02:00
Daniel Bevenius
9626d9351a
llama : fix indentation in llama-grammar [no ci] ( #11943 )
...
This commit adjusts the indentation for the functions `parse_sequence`
and `parse_rule` in src/llama-grammar.cpp.
The motivation is consistency and improved readability.
2025-02-19 06:16:23 +01:00
igardev
b58934c183
server : (webui) Enable communication with parent html (if webui is in iframe) ( #11940 )
...
* Webui: Enable communication with parent html (if webui is in iframe):
- Listens for "setText" command from parent with "text" and "context" fields. "text" is set in inputMsg, "context" is used as hidden context on the following requests to the llama.cpp server
- On pressing the Escape key, sends the command "escapePressed" to the parent
Example handling from the parent html side:
- Send command "setText" from parent html to webui in iframe:
const iframe = document.getElementById('askAiIframe');
if (iframe) {
    iframe.contentWindow.postMessage({ command: 'setText', text: text, context: context }, '*');
}
- Listen for Escape key from webui on parent html:
// Listen for escape key event in the iframe
window.addEventListener('keydown', (event) => {
    if (event.key === 'Escape') {
        // Process case when Escape is pressed inside webui
    }
});
* Move the extraContext from storage to app.context.
* Fix formatting.
* add Message.extra
* format + build
* MessageExtraContext
* build
* fix display
* rm console.log
---------
Co-authored-by: igardev <ivailo.gardev@akros.ch>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-02-18 23:01:44 +01:00
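A minimal sketch of the parent-page side of the iframe messaging described in #11940, for illustration only. The commit confirms only the "setText" payload (with "text" and "context" fields) and that the webui sends an "escapePressed" command to the parent. The exact shape of that message, assumed here to be { command: 'escapePressed' }, and the helper names buildSetTextMessage and handleWebuiMessage are hypothetical. The sketch listens for the standard 'message' event, which is how a parent receives postMessage data from an iframe.

```javascript
// Sketch only: everything except the 'setText' payload shape is an assumption.

// Build the payload the parent posts to the webui iframe (shape from the commit).
function buildSetTextMessage(text, context) {
  return { command: 'setText', text: text, context: context };
}

// Decide how the parent reacts to data received from the iframe.
// Assumes the webui posts { command: 'escapePressed' } (hypothetical shape).
function handleWebuiMessage(data, onEscape) {
  if (data && typeof data === 'object' && data.command === 'escapePressed') {
    onEscape();
    return true;
  }
  return false;
}

// Browser wiring; skipped when no DOM is available (for example under Node).
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  const iframe = document.getElementById('askAiIframe');
  window.addEventListener('message', (event) => {
    // Only accept messages that come from the embedded webui.
    if (!iframe || event.source !== iframe.contentWindow) return;
    handleWebuiMessage(event.data, () => {
      // Hypothetical reaction: hide the iframe when Escape is pressed inside it.
      iframe.style.display = 'none';
    });
  });
}
```

Keeping the decision logic in a plain function separates it from the DOM wiring, so it can be exercised without a browser. In production the '*' target origin used when posting "setText" would normally be replaced by the webui's actual origin.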
Olivier Chafik
63e489c025
tool-call: refactor common chat / tool-call api (+ tests / fixes) ( #11900 )
...
* tool-call refactoring: moved common_chat_* to chat.h; common_chat_templates_init returns a unique_ptr to an opaque type
* addressed clang-tidy lints in [test-]chat.*
* rm minja deps from util & common & move it to common/minja/
* add name & tool_call_id to common_chat_msg
* add common_chat_tool
* added json <-> tools, msgs conversions to chat.h
* fix double bos/eos jinja avoidance hack (was preventing inner bos/eos tokens)
* fix deepseek r1 slow test (no longer <think> opening w/ new template)
* allow empty tools w/ auto + grammar
* fix & test server grammar & json_schema params w/ & w/o --jinja
2025-02-18 18:03:23 +00:00
Xuan-Son Nguyen
63ac128563
server : add TEI API format for /rerank endpoint ( #11942 )
...
* server : add TEI API format for /rerank endpoint
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix
* also gitignore examples/server/*.gz.hpp
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-18 14:21:41 +01:00
MoonRide303
5137da7b8c
scripts: corrected encoding when getting chat template ( #11866 ) ( #11907 )
...
Signed-off-by: MoonRide303 <moonride303@gmail.com>
2025-02-18 10:30:16 +01:00
xiaobing318
09aaf4f1f5
docs : Fix duplicated file extension in test command ( #11935 )
...
This commit fixes an issue in the llama.cpp project where the command for testing the llama-server object contained a duplicated file extension. The original command was:
./tests.sh unit/test_chat_completion.py.py -v -x
It has been corrected to:
./tests.sh unit/test_chat_completion.py -v -x
This change ensures that the test script correctly locates and executes the intended test file, preventing test failures due to an incorrect file name.
2025-02-18 10:12:49 +01:00
Johannes Gäßler
73e2ed3ce3
CUDA: use async data loading for FlashAttention ( #11894 )
...
* CUDA: use async data loading for FlashAttention
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-17 14:03:24 +01:00
Eve
f7b1116af1
update release requirements ( #11897 )
2025-02-17 12:20:23 +01:00
Antoine Viallon
c4d29baf32
server : fix divide-by-zero in metrics reporting ( #11915 )
2025-02-17 11:25:12 +01:00
Rémy O
2eea03d86a
vulkan: implement several ops relevant for ggml_opt ( #11769 )
...
* vulkan: support memset_tensor
* vulkan: support GGML_OP_SUM
* vulkan: implement GGML_OP_ARGMAX
* vulkan: implement GGML_OP_SUB
* vulkan: implement GGML_OP_COUNT_EQUAL
* vulkan: implement GGML_OP_OPT_STEP_ADAMW
* vulkan: fix check_results RWKV_WKV6 crash and memory leaks
* vulkan: implement GGML_OP_REPEAT_BACK
* tests: remove invalid test-backend-ops REPEAT_BACK tests
* vulkan: fix COUNT_EQUAL memset using a fillBuffer command
2025-02-17 07:55:57 +01:00
Concedo
a67044270a
Merge remote-tracking branch 'jg/cuda-fa-mma-17' into debug4
2025-02-17 09:50:11 +08:00
Xuan-Son Nguyen
0f2bbe6564
server : bump httplib to 0.19.0 ( #11908 )
2025-02-16 17:11:22 +00:00
Concedo
6fa50f78bf
allow kcppt for config switching
2025-02-17 00:48:34 +08:00
Concedo
15ae98c9cd
better error handling for downloads
2025-02-16 23:13:09 +08:00
Concedo
58380153b2
safer autoguess fix
...
verbose outputs (+3 squashed commit)
Squashed commit:
[7bbbfc10] fixed a retry history bug
[824b9bf7] another autoguess fix
2025-02-16 21:13:45 +08:00
standby24x7
fe163d5bf3
common : Fix a typo in help ( #11899 )
...
This patch fixes a typo in command help.
prefx -> prefix
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
2025-02-16 10:51:13 +01:00
Xuan-Son Nguyen
818a340ea8
ci : fix (again) arm64 build fails ( #11895 )
...
* docker : attempt fixing arm64 build on ci
* qemu v7.0.0-28
2025-02-16 10:36:39 +01:00
Jeff Bolz
bf42a23d0a
vulkan: support multi/vision rope, and noncontiguous rope ( #11902 )
2025-02-16 08:52:23 +01:00
Hale Chan
c2ea16f260
metal : fix the crash caused by the lack of residency set support on Intel Macs. ( #11904 )
2025-02-16 08:50:26 +02:00
Concedo
e0bdb2f622
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# examples/imatrix/README.md
# scripts/compare-llama-bench.py
2025-02-16 12:48:54 +08:00
Concedo
5a79dd57b9
add short delay before launching browser
2025-02-16 12:45:14 +08:00
Concedo
299d6ce0ed
horde advertised max ctx
2025-02-16 11:59:08 +08:00
Johannes Gäßler
727db805a2
try CI fix
2025-02-15 22:44:27 +01:00
Johannes Gäßler
eb4f7954b6
CUDA: use async data loading for FlashAttention
2025-02-15 21:39:40 +01:00
Johannes Gäßler
6dde178248
scripts: fix compare-llama-bench commit hash logic ( #11891 )
2025-02-15 20:23:22 +01:00
708-145
fc10c38ded
examples: fix typo in imatrix/README.md ( #11884 )
...
* simple typo fixed
* Update examples/imatrix/README.md
---------
Co-authored-by: Tobias Bergmann <tobias.bergmann@gmx.de>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-15 21:03:30 +02:00
Adrian Kretz
22885105a6
metal : optimize dequant q6_K kernel ( #11892 )
2025-02-15 20:39:20 +02:00
Georgi Gerganov
c2cd24fbfd
readme : add notice about new package registry ( #11890 )
...
* readme : add notice about new package registry
* cont : fix whitespace
2025-02-15 20:29:56 +02:00
Concedo
f144b1f345
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/llama-cpp-cuda.srpm.spec
# .devops/llama-cpp.srpm.spec
# .devops/nix/package.nix
# .devops/rocm.Dockerfile
# .github/ISSUE_TEMPLATE/020-enhancement.yml
# .github/ISSUE_TEMPLATE/030-research.yml
# .github/ISSUE_TEMPLATE/040-refactor.yml
# .github/ISSUE_TEMPLATE/config.yml
# .github/pull_request_template.md
# .github/workflows/bench.yml.disabled
# .github/workflows/build.yml
# .github/workflows/labeler.yml
# CONTRIBUTING.md
# Makefile
# README.md
# SECURITY.md
# ci/README.md
# common/CMakeLists.txt
# docs/android.md
# docs/backend/SYCL.md
# docs/build.md
# docs/cuda-fedora.md
# docs/development/HOWTO-add-model.md
# docs/docker.md
# docs/install.md
# docs/llguidance.md
# examples/cvector-generator/README.md
# examples/imatrix/README.md
# examples/imatrix/imatrix.cpp
# examples/llama.android/llama/src/main/cpp/CMakeLists.txt
# examples/llama.swiftui/README.md
# examples/llama.vim
# examples/lookahead/README.md
# examples/lookup/README.md
# examples/main/README.md
# examples/passkey/README.md
# examples/pydantic_models_to_grammar_examples.py
# examples/retrieval/README.md
# examples/server/CMakeLists.txt
# examples/server/README.md
# examples/simple-cmake-pkg/README.md
# examples/speculative/README.md
# flake.nix
# grammars/README.md
# pyproject.toml
# scripts/check-requirements.sh
2025-02-16 02:08:39 +08:00
Concedo
fd211dbeb3
fixed lite
2025-02-16 01:56:26 +08:00
Concedo
673e33ca03
correction
2025-02-16 00:55:14 +08:00
Concedo
2ca13694f3
trying new ubuntu for ci
2025-02-15 22:59:33 +08:00
Georgi Gerganov
68ff663a04
repo : update links to new url ( #11886 )
...
* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci
2025-02-15 16:40:57 +02:00
Concedo
5b9fc4b3a3
Revert "try use docker to prepare for upcoming deprecation"
...
This reverts commit d4791c5188eb99b3d160f5bbe482a93749cf96a0. (+2 squashed commit)
Squashed commit:
[d4791c51] try use docker to prepare for upcoming deprecation
[1e120978] updated lite
2025-02-15 22:11:33 +08:00
Olivier Chafik
f355229692
server: fix type promotion typo causing crashes w/ --jinja w/o tools ( #11880 )
2025-02-15 10:11:36 +00:00
Concedo
f48bd3f919
added automatic recovery if bad config is loaded, will restore to known good config
2025-02-15 17:16:21 +08:00
Rémy O
fc1b0d0936
vulkan: initial support for IQ1_S and IQ1_M quantizations ( #11528 )
...
* vulkan: initial support for IQ1_S and IQ1_M quantizations
* vulkan: define MMV kernels for IQ1 quantizations
* devops: increase timeout of Vulkan tests again
* vulkan: simplify ifdef for init_iq_shmem
2025-02-15 09:01:40 +01:00
Concedo
302fedc649
updated lite
2025-02-15 13:03:43 +08:00
Concedo
f723b08347
fixed adapter bug
2025-02-15 12:06:45 +08:00
Michał Moskal
89daa2564f
llguidance build fixes for Windows ( #11664 )
...
* setup windows linking for llguidance; thanks @phil-scott-78
* add build instructions for windows and update script link
* change VS Community link from DE to EN
* whitespace fix
2025-02-14 12:46:08 -08:00
lhez
300907b211
opencl: Fix rope and softmax ( #11833 )
...
* opencl: fix `ROPE`
* opencl: fix `SOFT_MAX`
* Add fp16 variant
* opencl: enforce subgroup size for `soft_max`
2025-02-14 12:12:23 -07:00