Commit graph

12412 commits

Author SHA1 Message Date
Concedo
674b7f5eee indicate support for claude messages api 2026-03-29 00:57:58 +08:00
Concedo
e3b7905e1c added anthropic messages api support 2026-03-29 00:55:32 +08:00
Concedo
5ad9e3ee31 crude openai responses streaming 2026-03-29 00:16:30 +08:00
Concedo
94b266a6b0 musicui fix reset defaults 2026-03-28 21:09:40 +08:00
Concedo
1e787cd03a improve responses api 2026-03-28 18:42:15 +08:00
Concedo
f768b2a4bd whatever, i tried 2026-03-28 17:32:07 +08:00
Concedo
f80fdd4314 updated sdui 2026-03-28 11:24:03 +08:00
Concedo
547659fdbf allow planning music with llm (+1 squashed commits)
Squashed commits:

[9a3bbf072] allow planning music with llm
2026-03-28 11:19:39 +08:00
Concedo
3ec6381123 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build-self-hosted.yml
#	.github/workflows/build.yml
#	.github/workflows/copilot-setup-steps.yml
#	.github/workflows/gguf-publish.yml
#	ci/run.sh
#	docs/backend/OPENVINO.md
#	examples/llama.android/lib/src/main/cpp/ai_chat.cpp
#	ggml/src/ggml-sycl/add-id.cpp
#	requirements/requirements-pydantic.txt
#	tests/test-gguf.cpp
#	tests/test-jinja.cpp
#	tests/test-llama-archs.cpp
#	tools/gguf-split/README.md
#	tools/llama-bench/llama-bench.cpp
2026-03-28 01:18:20 +08:00
Concedo
2cdf02102e preserve previous filename 2026-03-28 01:13:03 +08:00
Wagner Bruna
e3c6227d46
sd: report back image generation parameters and metadata (#2062)
* sd: refactor image generation result handling

* sd: report back image generation metadata
2026-03-28 00:49:03 +08:00
Concedo
0c2b679ea3 support bf16 quantkv cache type 2026-03-28 00:01:17 +08:00
Concedo
326542f480 rudimentary responses api, not usable yet 2026-03-27 23:38:08 +08:00
Concedo
81cebb6179 remove unused field 2026-03-27 22:52:36 +08:00
scottf007
f0818e1eae
Add socket timeout to is_port_in_use() to fix ~280s startup delay on WSL2 (#2077)
On WSL2 with networkingMode=mirrored, connect_ex() to non-listening ports
gets black-holed through the Windows host networking stack instead of
returning ECONNREFUSED. Without a timeout, TCP SYN retransmits with
exponential backoff (1+2+4+8+16+32+64 ≈ 127s per port), causing Router
Mode's port scan of 15001-15010 to stall for ~280 seconds on startup.

Adding a 1-second timeout makes connect_ex() fail fast, reducing startup
from ~303s to ~23s on affected systems.

Tested on WSL2 Ubuntu 24.04 with mirrored networking, KoboldCpp v1.110,
RTX 3090 Ti, Qwen3.5-27B Q4_K_M.
2026-03-27 22:50:59 +08:00
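The fix described in the commit above can be sketched as follows — a minimal illustration only, since the actual `is_port_in_use()` signature and defaults in KoboldCpp may differ. The key point is `settimeout()` before `connect_ex()`, so a black-holed SYN fails after one second instead of riding out the full TCP retransmit backoff:

```python
import socket

def is_port_in_use(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP listener accepts connections on (host, port).

    Without settimeout(), connect_ex() to a non-listening port on WSL2 with
    networkingMode=mirrored can hang through SYN retransmits
    (1+2+4+8+16+32+64 s, roughly 127 s) instead of getting ECONNREFUSED.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)  # fail fast instead of waiting out TCP retries
        return s.connect_ex((host, port)) == 0  # 0 means the connect succeeded
```

With a 1-second cap, scanning the ten Router Mode ports (15001-15010) is bounded at ~10 s even when every probe times out, which matches the reported startup drop from ~303 s to ~23 s.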
Concedo
a03998bed6 added jinja kwargs support 2026-03-27 00:28:59 +08:00
Concedo
c91f350ed5 increase max images, take images from the end instead of beginning if too many images 2026-03-26 23:03:52 +08:00
Concedo
4a5c903718 sd model model replacement logic: adjusted approach for easy merge 2026-03-26 21:57:42 +08:00
Concedo
25216a0793 update cuda toolkit to use node24 with a fork 2026-03-26 17:16:22 +08:00
Concedo
633222d2e3 fix tool builds 2026-03-26 15:15:58 +08:00
Concedo
9de6e0db8b up version for github actions except for jimver (not available yet) 2026-03-25 23:46:03 +08:00
Concedo
c00fe0af5a Merge commit '9f102a1407' into concedo_experimental
# Conflicts:
#	.devops/intel.Dockerfile
#	.github/ISSUE_TEMPLATE/010-bug-compilation.yml
#	.github/ISSUE_TEMPLATE/011-bug-results.yml
#	.github/pull_request_template.md
#	CODEOWNERS
#	README.md
#	common/CMakeLists.txt
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/binary-ops.c
#	ggml/src/ggml-hexagon/htp/hex-dma.c
#	ggml/src/ggml-hexagon/htp/hex-dma.h
#	ggml/src/ggml-hexagon/htp/hex-dump.h
#	ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
#	ggml/src/ggml-hexagon/htp/hvx-utils.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/ssm-conv.c
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cvt.cl
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	scripts/snapdragon/adb/run-bench.sh
#	scripts/sync_vendor.py
#	tests/test-backend-ops.cpp
#	tools/llama-bench/llama-bench.cpp
2026-03-25 23:45:41 +08:00
Concedo
39938e19d3 allow router mode to auto-wake other endpoints if put to sleep by auto unload 2026-03-25 23:17:20 +08:00
Concedo
8a6c41dc5c Merge commit '841bc203e2' into concedo_experimental
# Conflicts:
#	.github/workflows/ai-issues.yml
#	embd_res/templates/HuggingFaceTB-SmolLM3-3B.jinja
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-cann/common.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cuda/CMakeLists.txt
#	ggml/src/ggml-hip/CMakeLists.txt
#	ggml/src/ggml-musa/CMakeLists.txt
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cvt.cl
#	ggml/src/ggml-openvino/ggml-openvino.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tests/test-chat-auto-parser.cpp
#	tests/test-jinja.cpp
#	tools/cli/README.md
#	tools/completion/README.md
#	tools/server/README.md
2026-03-25 22:49:53 +08:00
Concedo
c6213e9be6 Revert "Revert "llama : disable graph reuse with pipeline parallelism (#20463)""
This reverts commit 8043f35b22.
2026-03-25 22:25:20 +08:00
Concedo
b81103d6ba clean up colab a bit 2026-03-25 22:14:38 +08:00
Aman Gupta
9c600bcd4b
llama-bench: print -n-cpu-moe when offloaded layers > 1 (#20984)
2026-03-25 21:17:27 +08:00
Masato Nakasaka
b2704f9028
ci: Allow ninja to be used during unit test (#20742)
* Remove make dependency

* Added option to specify Ninja generator

* use ninja-build as default for several CI

* Revert "use ninja-build as default for several CI"

This reverts commit f552c4559b85e222aab37f654da764af4283fee7.

* changed use plain string rather than arrays

* Enabled ninja build by default for experimentation

* ci: add run.sh to test conditions to trigger GitHub CI and self-hosted runners

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Enabled ninja build by default on self-hosted envs for experimentation

* ci: revert generator to ninja instead of ninja multi-config

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ci: install ninja-build for self-hosted workflows

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ci: revert ninja from self-hosted runners

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ci: missed one self-hosted step

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ci: fix windows ci errors from an errenous revert

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Added explicit build types for Ninja

Also reverted some needless change

* ci: use ninja multi-config for vulkan-x64 build

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* added time command to measure build time

* Keeping some configs to use Ninja which show improvement

* minor fix based on review

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

* ci: rm `time` from custom containers

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
2026-03-25 21:00:49 +08:00
Georgi Gerganov
3fab96cd04
ci : disable self-hosted mac jobs (#20985) 2026-03-25 14:46:40 +02:00
Xuan-Son Nguyen
914eb5ff0c
jinja: fix macro with kwargs (#20960)
* jinja: fix macro with kwargs

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix newline problem

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-25 12:22:48 +01:00
Francisco Herrera
8fc17493c3
gguf-split : clarify operation of gguf-split (#19749)
* clarify operation of gguf-split

so that you don't have to find out by trial and error

* formatting
2026-03-25 13:12:50 +02:00
Johannes Gäßler
36dafba5c4
llama: fix llama-model-saver (#20503)
* llama : add fd-based model loading via llama_model_load_from_fd

* llama : address review feedback for fd-based model loading

* llama : use FILE pointer instead of fd in public API

* llama : use FILE pointer consistently, address review feedback

* fixup

* fix tensor names

* fix llama-model-saver

* roundtrip tests

* fixup

* refactor tests

* fix prints

* fix model saving

* fix CI, disable Chameleon

* print seed

---------

Co-authored-by: Siddhesh2377 <siddheshsonar2377@gmail.com>
2026-03-25 12:53:16 +02:00
Aleksander Grygier
69e0ecef06
webui: Fix editing assistant message without branching (#20944)
* fix: Editing assistant response without branching

* chore: update webui build output
2026-03-25 12:47:33 +02:00
Pascal
062cca58fc
Add SLEEPING status to the WebUI model selector (#20949)
* webui: handle sleeping model status, fix favourite -> favorite

* Update tools/server/webui/src/lib/components/app/models/ModelsSelectorOption.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/models/ModelsSelectorOption.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: fix optional event parameter in sleeping model onclick

* typo

* webui: restore orange sleeping indicator dot with hover unload

* chore: update webui build output

* webui: move stopPropagation into ActionIcon onclick, remove svelte-ignore

* chore: update webui build output

* webui: fix favourite -> favorite (UK -> US spelling) everywhere

Address review feedback from WhyNotHugo

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-25 11:02:32 +01:00
yikechayedan
406f4e3f61
android : fix-pointer-dangling (#20974) 2026-03-25 11:51:26 +02:00
Neo Zhang
53dc8b59bf
sycl : fix wrong variable check by assert (#20903)
* fix wrong variable check by assert

* use GGML api
2026-03-25 11:48:37 +02:00
Sigbjørn Skjæret
403c9c9cef
ci : bump gguf publish python version (#20982) 2026-03-25 11:04:59 +02:00
Sigbjørn Skjæret
8fc85db9d2
ci : limit requirements versions (#20980)
* set requests version

* limit versions outside requirements
2026-03-25 10:55:37 +02:00
Dowon
3a60d06ad9
convert : register Qwen3Model architecture (#20967) 2026-03-25 10:37:59 +02:00
Ravi Panchumarthy
abd86ef175
docs : Update OpenVINO backend docs (#20968)
* OpenVINO doc updates

* Update docs/backend/OPENVINO.md

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

---------

Co-authored-by: Aaron Teo <taronaeo@gmail.com>
2026-03-25 10:33:51 +02:00
Concedo
24ab1c1451 upgrade musicui to do tts, show musicui for tts models (+1 squashed commits)
Squashed commits:

[975630b15] upgrade musicui to do tts
2026-03-25 00:24:44 +08:00
Concedo
efdc52fe8b q3tts custom voice support 2026-03-24 23:38:18 +08:00
Georgi Gerganov
9f102a1407
models : move the token embedding norms to the first layer (#20943)
* models : move the token embedding norms to the first layer

* cont : fix LLM_TENSOR_CONV1D + fix il indexing
2026-03-24 17:00:30 +02:00
Aman Gupta
3fc6f1aed1
ggml-backend: re-enable graph reuse with pipeline parallelism (#20927) 2026-03-24 20:47:00 +08:00
Alessandro de Oliveira Faria (A.K.A.CABELO)
29771a0a4c
vendor : update cpp-httplib to 0.39.0 (#20933) 2026-03-24 13:33:33 +01:00
Adrien Gallouët
42ebce3beb
common : fix get_gguf_split_info (#20946)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 13:33:14 +01:00
BlueMöhre
a94fdb090a
WebUI: fix edit msg form textarea height (#20830)
* autoresize textarea on mount

* allow textarea to grow to same height as rendered messages

* add UI build file
2026-03-24 13:17:45 +01:00
Adrien Gallouët
c9dc43333f
readme : clarify MODEL_ENDPOINT usage (#20941)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 10:35:07 +01:00
Adrien Gallouët
2d2d9c2062
common : add a WARNING for HF cache migration (#20935)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 09:24:39 +01:00
nuri
92080b4396
metal : add FLOOR, CEIL, ROUND, TRUNC unary ops (#20930)
Co-authored-by: nryoo <nryoo@nryooui-MacBookPro.local>
2026-03-24 10:13:07 +02:00