Concedo
2cdf02102e
preserve previous filename
2026-03-28 01:13:03 +08:00
Wagner Bruna
e3c6227d46
sd: report back image generation parameters and metadata ( #2062 )
...
* sd: refactor image generation result handling
* sd: report back image generation metadata
2026-03-28 00:49:03 +08:00
Concedo
0c2b679ea3
support bf16 quantkv cache type
2026-03-28 00:01:17 +08:00
Concedo
326542f480
rudimentary responses api, not usable yet
2026-03-27 23:38:08 +08:00
Concedo
81cebb6179
remove unused field
2026-03-27 22:52:36 +08:00
scottf007
f0818e1eae
Add socket timeout to is_port_in_use() to fix ~280s startup delay on WSL2 ( #2077 )
...
On WSL2 with networkingMode=mirrored, connect_ex() to non-listening ports
gets black-holed through the Windows host networking stack instead of
returning ECONNREFUSED. Without a timeout, TCP SYN retransmits with
exponential backoff (1+2+4+8+16+32+64 ≈ 127s per port), causing Router
Mode's port scan of 15001-15010 to stall for ~280 seconds on startup.
Adding a 1-second timeout makes connect_ex() fail fast, reducing startup
from ~303s to ~23s on affected systems.
Tested on WSL2 Ubuntu 24.04 with mirrored networking, KoboldCpp v1.110,
RTX 3090 Ti, Qwen3.5-27B Q4_K_M.
2026-03-27 22:50:59 +08:00
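A minimal sketch of the fix described in the commit above (the `is_port_in_use` name comes from the commit message; the host argument and exact signature here are assumptions):

```python
import socket

def is_port_in_use(port, host="localhost", timeout=1.0):
    """Return True if something is listening on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Without a timeout, connect_ex() on a black-holed port waits out
        # the TCP SYN retransmit backoff (1+2+4+8+16+32+64 ≈ 127s);
        # a 1-second timeout makes it fail fast instead.
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0
```

On a normal network stack non-listening ports return ECONNREFUSED immediately, so the timeout only matters on setups (like WSL2 mirrored networking) where the SYN is silently dropped.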
Concedo
a03998bed6
added jinja kwargs support
2026-03-27 00:28:59 +08:00
Concedo
c91f350ed5
increase max images, take images from the end instead of beginning if too many images
2026-03-26 23:03:52 +08:00
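The selection behavior described in the commit above can be sketched as follows (the function name and signature are hypothetical, for illustration only):

```python
def clamp_images(images, max_images):
    # When more images are supplied than the limit allows, keep the ones
    # at the end of the list (the most recent) instead of the beginning.
    if len(images) <= max_images:
        return images
    return images[len(images) - max_images:]
```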
Concedo
4a5c903718
sd model replacement logic: adjusted approach for easy merge
2026-03-26 21:57:42 +08:00
Concedo
25216a0793
update cuda toolkit to use node24 with a fork
2026-03-26 17:16:22 +08:00
Concedo
633222d2e3
fix tool builds
2026-03-26 15:15:58 +08:00
Concedo
9de6e0db8b
up version for github actions except for jimver (not available yet)
2026-03-25 23:46:03 +08:00
Concedo
c00fe0af5a
Merge commit '9f102a1407' into concedo_experimental
...
# Conflicts:
# .devops/intel.Dockerfile
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# .github/pull_request_template.md
# CODEOWNERS
# README.md
# common/CMakeLists.txt
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/binary-ops.c
# ggml/src/ggml-hexagon/htp/hex-dma.c
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/hex-dump.h
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/ssm-conv.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# scripts/snapdragon/adb/run-bench.sh
# scripts/sync_vendor.py
# tests/test-backend-ops.cpp
# tools/llama-bench/llama-bench.cpp
2026-03-25 23:45:41 +08:00
Concedo
39938e19d3
allow router mode to auto-wake other endpoints if put to sleep by auto unload
2026-03-25 23:17:20 +08:00
Concedo
8a6c41dc5c
Merge commit '841bc203e2' into concedo_experimental
...
# Conflicts:
# .github/workflows/ai-issues.yml
# embd_res/templates/HuggingFaceTB-SmolLM3-3B.jinja
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-openvino/ggml-openvino.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-chat-auto-parser.cpp
# tests/test-jinja.cpp
# tools/cli/README.md
# tools/completion/README.md
# tools/server/README.md
2026-03-25 22:49:53 +08:00
Concedo
c6213e9be6
Revert "Revert "llama : disable graph reuse with pipeline parallelism ( #20463 )""
...
This reverts commit 8043f35b22.
2026-03-25 22:25:20 +08:00
Concedo
b81103d6ba
clean up colab a bit
2026-03-25 22:14:38 +08:00
Concedo
24ab1c1451
upgrade musicui to do tts, show musicui for tts models (+1 squashed commit)
...
Squashed commits:
[975630b15] upgrade musicui to do tts
2026-03-25 00:24:44 +08:00
Concedo
efdc52fe8b
q3tts custom voice support
2026-03-24 23:38:18 +08:00
Georgi Gerganov
9f102a1407
models : move the token embedding norms to the first layer ( #20943 )
...
* models : move the token embedding norms to the first layer
* cont : fix LLM_TENSOR_CONV1D + fix il indexing
2026-03-24 17:00:30 +02:00
Aman Gupta
3fc6f1aed1
ggml-backend: re-enable graph reuse with pipeline parallelism ( #20927 )
2026-03-24 20:47:00 +08:00
Alessandro de Oliveira Faria (A.K.A.CABELO)
29771a0a4c
vendor : update cpp-httplib to 0.39.0 ( #20933 )
2026-03-24 13:33:33 +01:00
Adrien Gallouët
42ebce3beb
common : fix get_gguf_split_info ( #20946 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 13:33:14 +01:00
BlueMöhre
a94fdb090a
WebUI: fix edit msg form textarea height ( #20830 )
...
* autoresize textarea on mount
* allow textarea to grow to same height as rendered messages
* add UI build file
2026-03-24 13:17:45 +01:00
Adrien Gallouët
c9dc43333f
readme : clarify MODEL_ENDPOINT usage ( #20941 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 10:35:07 +01:00
Adrien Gallouët
2d2d9c2062
common : add a WARNING for HF cache migration ( #20935 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 09:24:39 +01:00
nuri
92080b4396
metal : add FLOOR, CEIL, ROUND, TRUNC unary ops ( #20930 )
...
Co-authored-by: nryoo <nryoo@nryooui-MacBookPro.local>
2026-03-24 10:13:07 +02:00
Georgi Gerganov
342d6125bc
metal : add FA instantiations for HSK=512, HSV=512 ( #20902 )
2026-03-24 10:03:09 +02:00
Aaron Teo
c2e224d829
issues: add openvino backends ( #20932 )
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2026-03-24 14:41:10 +08:00
Adrien Gallouët
8c7957ca33
common : add standard Hugging Face cache support ( #20775 )
...
* common : add standard Hugging Face cache support
- Use HF API to find all files
- Migrate all manifests to hugging face cache at startup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check with the quant tag
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Cleanup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Improve error handling and report API errors
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Restore common_cached_model_info and align mmproj filtering
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Prefer main when getting cached ref
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use cached files when HF API fails
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use final_path..
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check all inputs
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 07:30:33 +01:00
Concedo
8437c346a7
fixed tts instruction regex, encapsulate thinking by default
2026-03-24 13:53:46 +08:00
Aman Gupta
e852eb4901
llama-fit: fix regex pattern for gate_up tensors ( #20910 )
...
* llama-fit: fix regex pattern for gate_up tensors
* Apply suggestions from code review
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-03-24 12:57:57 +08:00
Aldehir Rojas
312d870a89
common : replace wrap_for_generation with a prefix convenience function and fix gpt-oss ( #20912 )
2026-03-23 22:21:47 -05:00
Max Krasnyansky
7cadbfce10
hexagon: general DMA and Binary Op fixes for large strides ( #20918 )
...
* hex-dma: make chained dma the default to handle newer models
This also includes some new instrumentation that we can remove later.
* hexagon: add uint32 dump helper
* hexagon: use single-page VTCM allocation to avoid issues with large gather ops in ssm-conv
ssm-conv uses HVX gather instruction and that instruction cannot handle cases where the base+offset
spans page boundaries.
* hexagon: update ssm-conv to make base-addr compute a bit easier to read
* hex-dma: use 1d mode for reshaping; it supports sizes up to 24 bits (>16MB)
* hex-bin: fix incorrect stride logic
* hexagon: make sure repack buffs are dumped for verbose > 2
* hex-bin: consistently use dma_queue_push even for dummy dst transactions
* hex-dma: start using 2d-wide mode on v75 and up
This removes the need to deal with the 16-bit limitation for the strides.
* hex-bin: cleanup kernel selection logic
* hex-bin: cleanup binary op core and fix transposed tensor handling
* snapdragon: update run-bench to use larger ubatch and fa-on
2026-03-23 15:33:49 -07:00
Max Krasnyansky
1fb2290a51
Add codeowners for scripts/snapdragon and docs/snapdragon ( #20915 )
...
* Add codeowners for scripts/snapdragon
* Also add docs/backends/snapdragon
2026-03-23 14:57:18 -07:00
lhez
1772701f99
opencl: add q6_K gemm and gemv kernels for Adreno ( #20089 )
...
* opencl: add q6_K noshuffle kernels, initial q6_K gemv, some host code
* opencl: add q6_K transpose
* opencl: fix cvt kernel name
* opencl: add call to q6_K gemv
* opencl: fix q6_K scale transpose
* opencl: fix loading for gemv q6_K, refactor
* opencl: fix transpose_8_buf kernel assignment, refactor
* opencl: refactor q6_K transpose
* opencl: add gemm_noshuffle_q6_k_f32
* opencl: fix qh loading
* opencl: refactor q6_K gemv host side, release bufs and imgs
* opencl: refactor
* opencl: fix q6_K dequant and scale selection
* opencl: workaround compiler bug, fix dump_tensor
* opencl: refactor q6_K convert kernels
* opencl: unpack transformed q6_K in get_tensor
* opencl: refactor, handle non-uniform workgroups
* opencl: support non-vector subgroup bcast
2026-03-23 12:44:18 -07:00
las7
39bf0d3c6a
rpc : RCE patch ( #20908 )
2026-03-23 19:54:57 +02:00
Xuan-Son Nguyen
bd6992180b
contrib: add "Requirements" section to PR template ( #20841 )
...
* contrib: add "Requirements" section to PR template
* typo [no ci]
* use h2, add "Additional information"
---------
Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
2026-03-23 16:59:02 +01:00
Davi Henrique Linhares
fd18364755
devops: upgraded default oneAPI version ( #20731 )
2026-03-23 21:47:34 +08:00
Concedo
9e9028b1a9
fixed cpu mis-selection
2026-03-23 21:30:57 +08:00
Aleksander Grygier
11fb11b901
webui: Improve chat form positioning ( #20901 )
2026-03-23 14:30:55 +01:00
Geo Maciolek
35b662bb5d
docs: Fix typo in reasoning flag documentation ( #20780 )
...
Tested to verify - the typo is just in the docs, not the actual flag.
2026-03-23 21:24:55 +08:00
Georgi Gerganov
f93c09e267
memory : fix seq_id bounds in llama_memory_recurrent::state_read_meta() ( #20887 )
2026-03-23 14:08:46 +02:00
Eric Zhang
841bc203e2
docs : rerun llama-gen-docs to include new CLI args ( #20892 )
2026-03-23 12:33:38 +01:00
Xuan-Son Nguyen
31a5cf4c3f
server: use httplib dynamic threads ( #20817 )
...
* server: use httplib dynamic threads
* change to n_threads_http + 1024
2026-03-23 12:22:46 +01:00
Georgi Gerganov
e32d243849
ai : update gh permissions ( #20895 )
2026-03-23 13:21:41 +02:00
Concedo
e7ffe718f0
updated lite
2026-03-23 19:01:02 +08:00
Concedo
0d50cafd8b
added CustomVoice support
2026-03-23 18:50:08 +08:00
Pascal
c44a932cf4
webui: fix --webui-config-file settings not applied on load ( #20823 )
...
* webui: fix --webui-config-file settings not applied on load
* chore: update webui build output
2026-03-23 11:25:35 +01:00
Wagner Bruna
abe55fa424
sd: fix metadata for generated images ( #2061 )
...
* sd: fix metadata for generated images
* sd: refactor output image conversion
2026-03-23 17:04:32 +08:00