Concedo
674b7f5eee
indicate support for claude messages api
2026-03-29 00:57:58 +08:00
Concedo
e3b7905e1c
added anthropic messages api support
2026-03-29 00:55:32 +08:00
Concedo
5ad9e3ee31
crude openai responses streaming
2026-03-29 00:16:30 +08:00
Concedo
94b266a6b0
musicui fix reset defaults
2026-03-28 21:09:40 +08:00
Concedo
1e787cd03a
improve responses api
2026-03-28 18:42:15 +08:00
Concedo
f768b2a4bd
whatever, i tried
2026-03-28 17:32:07 +08:00
Concedo
f80fdd4314
updated sdui
2026-03-28 11:24:03 +08:00
Concedo
547659fdbf
allow planning music with llm (+1 squashed commits)
...
Squashed commits:
[9a3bbf072] allow planning music with llm
2026-03-28 11:19:39 +08:00
Concedo
3ec6381123
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-self-hosted.yml
# .github/workflows/build.yml
# .github/workflows/copilot-setup-steps.yml
# .github/workflows/gguf-publish.yml
# ci/run.sh
# docs/backend/OPENVINO.md
# examples/llama.android/lib/src/main/cpp/ai_chat.cpp
# ggml/src/ggml-sycl/add-id.cpp
# requirements/requirements-pydantic.txt
# tests/test-gguf.cpp
# tests/test-jinja.cpp
# tests/test-llama-archs.cpp
# tools/gguf-split/README.md
# tools/llama-bench/llama-bench.cpp
2026-03-28 01:18:20 +08:00
Concedo
2cdf02102e
preserve previous filename
2026-03-28 01:13:03 +08:00
Wagner Bruna
e3c6227d46
sd: report back image generation parameters and metadata ( #2062 )
...
* sd: refactor image generation result handling
* sd: report back image generation metadata
2026-03-28 00:49:03 +08:00
Concedo
0c2b679ea3
support bf16 quantkv cache type
2026-03-28 00:01:17 +08:00
Concedo
326542f480
rudimentary responses api, not usable yet
2026-03-27 23:38:08 +08:00
Concedo
81cebb6179
remove unused field
2026-03-27 22:52:36 +08:00
scottf007
f0818e1eae
Add socket timeout to is_port_in_use() to fix ~280s startup delay on WSL2 ( #2077 )
...
On WSL2 with networkingMode=mirrored, connect_ex() to non-listening ports
gets black-holed through the Windows host networking stack instead of
returning ECONNREFUSED. Without a timeout, TCP SYN retransmits with
exponential backoff (1+2+4+8+16+32+64 ≈ 127s per port), causing Router
Mode's port scan of 15001-15010 to stall for ~280 seconds on startup.
Adding a 1-second timeout makes connect_ex() fail fast, reducing startup
from ~303s to ~23s on affected systems.
Tested on WSL2 Ubuntu 24.04 with mirrored networking, KoboldCpp v1.110,
RTX 3090 Ti, Qwen3.5-27B Q4_K_M.
2026-03-27 22:50:59 +08:00
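The timeout fix described in the commit above can be sketched as follows. This is a hypothetical minimal version of a port-check helper, not KoboldCpp's actual `is_port_in_use()` implementation; the host/port parameters and 1-second value follow the commit description:

```python
import socket

def is_port_in_use(port: int, host: str = "localhost") -> bool:
    # Without settimeout(), connect_ex() to a non-listening port on WSL2
    # mirrored networking can be black-holed and sit through TCP SYN
    # retransmits (~127s) instead of returning ECONNREFUSED immediately.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)  # fail fast instead of waiting on retransmits
        return s.connect_ex((host, port)) == 0  # 0 means connect succeeded
```

With the timeout set, each probe in a scan of ports 15001-15010 resolves in at most one second, which matches the reported drop from ~303s to ~23s startup.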
Concedo
a03998bed6
added jinja kwargs support
2026-03-27 00:28:59 +08:00
Concedo
c91f350ed5
increase max images, take images from the end instead of beginning if too many images
2026-03-26 23:03:52 +08:00
Concedo
4a5c903718
sd model replacement logic: adjusted approach for easy merge
2026-03-26 21:57:42 +08:00
Concedo
25216a0793
update cuda toolkit to use node24 with a fork
2026-03-26 17:16:22 +08:00
Concedo
633222d2e3
fix tool builds
2026-03-26 15:15:58 +08:00
Concedo
9de6e0db8b
up version for github actions except for jimver (not available yet)
2026-03-25 23:46:03 +08:00
Concedo
c00fe0af5a
Merge commit '9f102a1407' into concedo_experimental
...
# Conflicts:
# .devops/intel.Dockerfile
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# .github/pull_request_template.md
# CODEOWNERS
# README.md
# common/CMakeLists.txt
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/binary-ops.c
# ggml/src/ggml-hexagon/htp/hex-dma.c
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/hex-dump.h
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/ssm-conv.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# scripts/snapdragon/adb/run-bench.sh
# scripts/sync_vendor.py
# tests/test-backend-ops.cpp
# tools/llama-bench/llama-bench.cpp
2026-03-25 23:45:41 +08:00
Concedo
39938e19d3
allow router mode to auto-wake other endpoints if put to sleep by auto unload
2026-03-25 23:17:20 +08:00
Concedo
8a6c41dc5c
Merge commit '841bc203e2' into concedo_experimental
...
# Conflicts:
# .github/workflows/ai-issues.yml
# embd_res/templates/HuggingFaceTB-SmolLM3-3B.jinja
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-openvino/ggml-openvino.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-chat-auto-parser.cpp
# tests/test-jinja.cpp
# tools/cli/README.md
# tools/completion/README.md
# tools/server/README.md
2026-03-25 22:49:53 +08:00
Concedo
c6213e9be6
Revert "Revert "llama : disable graph reuse with pipeline parallelism ( #20463 )""
...
This reverts commit 8043f35b22.
2026-03-25 22:25:20 +08:00
Concedo
b81103d6ba
clean up colab a bit
2026-03-25 22:14:38 +08:00
Aman Gupta
9c600bcd4b
llama-bench: print -n-cpu-moe when offloaded layers > 1 ( #20984 )
2026-03-25 21:17:27 +08:00
Masato Nakasaka
b2704f9028
ci: Allow ninja to be used during unit test ( #20742 )
...
* Remove make dependency
* Added option to specify Ninja generator
* use ninja-build as default for several CI
* Revert "use ninja-build as default for several CI"
This reverts commit f552c4559b85e222aab37f654da764af4283fee7.
* changed to use plain strings rather than arrays
* Enabled ninja build by default for experimentation
* ci: add run.sh to test conditions to trigger GitHub CI and self-hosted runners
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Enabled ninja build by default on self-hosted envs for experimentation
* ci: revert generator to ninja instead of ninja multi-config
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ci: install ninja-build for self-hosted workflows
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ci: revert ninja from self-hosted runners
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ci: missed one self-hosted step
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ci: fix windows ci errors from an erroneous revert
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Added explicit build types for Ninja
Also reverted some needless change
* ci: use ninja multi-config for vulkan-x64 build
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* added time command to measure build time
* Keeping some configs to use Ninja which show improvement
* minor fix based on review
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
* ci: rm `time` from custom containers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
2026-03-25 21:00:49 +08:00
Georgi Gerganov
3fab96cd04
ci : disable self-hosted mac jobs ( #20985 )
2026-03-25 14:46:40 +02:00
Xuan-Son Nguyen
914eb5ff0c
jinja: fix macro with kwargs ( #20960 )
...
* jinja: fix macro with kwargs
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* fix newline problem
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-25 12:22:48 +01:00
Francisco Herrera
8fc17493c3
gguf-split : clarify operation of gguf-split ( #19749 )
...
* clarify operation of gguf-split
so that you don't have to find out by trial and error
* formatting
2026-03-25 13:12:50 +02:00
Johannes Gäßler
36dafba5c4
llama: fix llama-model-saver ( #20503 )
...
* llama : add fd-based model loading via llama_model_load_from_fd
* llama : address review feedback for fd-based model loading
* llama : use FILE pointer instead of fd in public API
* llama : use FILE pointer consistently, address review feedback
* fixup
* fix tensor names
* fix llama-model-saver
* roundtrip tests
* fixup
* refactor tests
* fix prints
* fix model saving
* fix CI, disable Chameleon
* print seed
---------
Co-authored-by: Siddhesh2377 <siddheshsonar2377@gmail.com>
2026-03-25 12:53:16 +02:00
Aleksander Grygier
69e0ecef06
webui: Fix editing assistant message without branching ( #20944 )
...
* fix: Editing assistant response without branching
* chore: update webui build output
2026-03-25 12:47:33 +02:00
Pascal
062cca58fc
Add SLEEPING status to the WebUI model selector ( #20949 )
...
* webui: handle sleeping model status, fix favourite -> favorite
* Update tools/server/webui/src/lib/components/app/models/ModelsSelectorOption.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* Update tools/server/webui/src/lib/components/app/models/ModelsSelectorOption.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* webui: fix optional event parameter in sleeping model onclick
* typo
* webui: restore orange sleeping indicator dot with hover unload
* chore: update webui build output
* webui: move stopPropagation into ActionIcon onclick, remove svelte-ignore
* chore: update webui build output
* webui: fix favourite -> favorite (UK -> US spelling) everywhere
Address review feedback from WhyNotHugo
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-25 11:02:32 +01:00
yikechayedan
406f4e3f61
android : fix-pointer-dangling ( #20974 )
2026-03-25 11:51:26 +02:00
Neo Zhang
53dc8b59bf
sycl : fix wrong variable check by assert ( #20903 )
...
* fix wrong variable check by assert
* use GGML api
2026-03-25 11:48:37 +02:00
Sigbjørn Skjæret
403c9c9cef
ci : bump gguf publish python version ( #20982 )
2026-03-25 11:04:59 +02:00
Sigbjørn Skjæret
8fc85db9d2
ci : limit requirements versions ( #20980 )
...
* set requests version
* limit versions outside requirements
2026-03-25 10:55:37 +02:00
Dowon
3a60d06ad9
convert : register Qwen3Model architecture ( #20967 )
2026-03-25 10:37:59 +02:00
Ravi Panchumarthy
abd86ef175
docs : Update OpenVINO backend docs ( #20968 )
...
* OpenVINO doc updates
* Update docs/backend/OPENVINO.md
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
---------
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
2026-03-25 10:33:51 +02:00
Concedo
24ab1c1451
upgrade musicui to do tts, show musicui for tts models (+1 squashed commits)
...
Squashed commits:
[975630b15] upgrade musicui to do tts
2026-03-25 00:24:44 +08:00
Concedo
efdc52fe8b
q3tts custom voice support
2026-03-24 23:38:18 +08:00
Georgi Gerganov
9f102a1407
models : move the token embedding norms to the first layer ( #20943 )
...
* models : move the token embedding norms to the first layer
* cont : fix LLM_TENSOR_CONV1D + fix il indexing
2026-03-24 17:00:30 +02:00
Aman Gupta
3fc6f1aed1
ggml-backend: re-enable graph reuse with pipeline parallelism ( #20927 )
2026-03-24 20:47:00 +08:00
Alessandro de Oliveira Faria (A.K.A.CABELO)
29771a0a4c
vendor : update cpp-httplib to 0.39.0 ( #20933 )
2026-03-24 13:33:33 +01:00
Adrien Gallouët
42ebce3beb
common : fix get_gguf_split_info ( #20946 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 13:33:14 +01:00
BlueMöhre
a94fdb090a
WebUI: fix edit msg form textarea height ( #20830 )
...
* autoresize textarea on mount
* allow textarea to grow to same height as rendered messages
* add UI build file
2026-03-24 13:17:45 +01:00
Adrien Gallouët
c9dc43333f
readme : clarify MODEL_ENDPOINT usage ( #20941 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 10:35:07 +01:00
Adrien Gallouët
2d2d9c2062
common : add a WARNING for HF cache migration ( #20935 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 09:24:39 +01:00
nuri
92080b4396
metal : add FLOOR, CEIL, ROUND, TRUNC unary ops ( #20930 )
...
Co-authored-by: nryoo <nryoo@nryooui-MacBookPro.local>
2026-03-24 10:13:07 +02:00