Concedo
9ba8c7a661
fixed colab
2026-03-21 10:21:18 +08:00
Concedo
1225b1b155
add nothink autoguess
2026-03-20 21:21:06 +08:00
Concedo
2d349723d3
fixed colab
2026-03-20 18:19:59 +08:00
Gustavo Rocha Dias
8e045b33a1
fix - w64devkit vulkan build ( #2048 )
2026-03-20 16:37:22 +08:00
Concedo
d6aae073b6
fixed typo
2026-03-20 12:02:04 +08:00
Concedo
1f73eabb46
fixed colab
2026-03-20 11:45:10 +08:00
Concedo
c4b1a17e1a
tools debug
2026-03-19 23:13:02 +08:00
Concedo
699bc6b278
github actions updates for deprecation of nodejs 20 (+1 squashed commits)
...
Squashed commits:
[0ed5af384] checkout to v4
2026-03-19 14:39:43 +08:00
Concedo
2f63f94fd8
fix router nocertify mode
2026-03-19 12:45:19 +08:00
Concedo
8cf9ba34e9
fixed SSL in routermode
2026-03-19 12:43:11 +08:00
Concedo
48f914e374
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ci/run.sh
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/arch/riscv/repack.cpp
# ggml/src/ggml-cpu/arch/x86/repack.cpp
# ggml/src/ggml-cpu/repack.cpp
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/hvx-exp.h
# ggml/src/ggml-hexagon/htp/hvx-sigmoid.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/softmax-ops.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/sync-ggml.last
# tests/test-backend-sampler.cpp
# tests/test-chat.cpp
# tests/test-jinja.cpp
# tools/cli/cli.cpp
2026-03-19 02:23:06 +08:00
crsawyer
5744d7ec43
Rebuild index.html.gz ( #20724 )
2026-03-18 18:49:57 +01:00
Reese Levine
8ced5f41f9
Move to no timeout for WaitAny in graph submission to avoid deadlocks in some cases on llvm-pipe backends ( #20618 )
2026-03-18 10:23:47 -07:00
Shaw Nguyen
78d550b541
ggml-cpu/x86: fix unused changemask warning in repack ( #20692 )
2026-03-18 18:45:06 +02:00
Concedo
15e86010d8
autofit will clear moecpu and overridetensors
2026-03-18 21:20:57 +08:00
Georgi Gerganov
4efd326e71
sync : ggml
2026-03-18 15:17:28 +02:00
Georgi Gerganov
b08f7322ee
ggml : bump version to 0.9.8 (ggml/1442)
2026-03-18 15:17:28 +02:00
Georgi Gerganov
79187f2fb8
ggml : restore ggml_type_sizef() to avoid major version bump (ggml/1441)
2026-03-18 15:17:28 +02:00
Julien Chaumond
48e61238e1
webui: improve tooltip wording for attachment requirements ( #20688 )
...
* webui: improve tooltip wording for attachment requirements
Co-Authored-By: Claude <Agents+claude@huggingface.co>
* chore: update webui build output
* chore: update webui build output
---------
Co-authored-by: Claude <Agents+claude@huggingface.co>
2026-03-18 14:01:02 +01:00
Pop Flamingo
312cf03328
llama : re-enable manual LoRA adapter free ( #19983 )
...
* Re-enable manual LoRA adapter free
* Remove stale "all adapters must be loaded before context creation" comments
2026-03-18 12:03:26 +02:00
Masato Nakasaka
f4049ad735
tests : fix test-jinja-py Windows failures by bypassing command-line args [no ci] ( #20483 )
...
* Fix errors occurring on Windows
* Reverted fix
#20365 will take care of CRLF issue
* Changed to write directly to stdin
* Prevent fclose from happening twice
2026-03-18 10:43:31 +01:00
Aldehir Rojas
5e8910a0db
common : rework gpt-oss parser ( #20393 )
...
* common : rework gpt-oss parser
* cont : fix gpt-oss tests
* cont : add structured output test
* cont : rename final to final_msg
2026-03-18 10:41:25 +01:00
Aaron Teo
fe00a84b4b
tests: enable kv_unified to prevent cuda oom error on rtx 2060 ( #20645 )
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2026-03-18 17:40:22 +08:00
Aleksander Grygier
7ab321d40d
webui: Fix duplicated messages on q param ( #20715 )
...
* fix: Remove duplicate message sending on `?q` param
* chore: update webui build output
2026-03-18 10:32:43 +01:00
uvos
7533a7d509
HIP : ignore return of hipMemAdvise [no ci] ( #20696 )
2026-03-18 09:53:13 +01:00
Andreas Obersteiner
a69d54f990
context : fix graph not resetting when control vector changes ( #20381 )
2026-03-18 08:10:13 +02:00
Concedo
f8aa711e5c
musicui seed randomizer
2026-03-18 11:28:46 +08:00
Concedo
f796878022
updated lite
2026-03-18 11:20:32 +08:00
Krishna Sridhar
cf23ee2447
hexagon: add neg, exp, sigmoid, softplus ops, cont, repeat ops ( #20701 )
...
Add element-wise unary ops needed by Qwen 3.5's DeltaNet linear
attention layers. These ops follow the existing unary-ops pattern
with VTCM DMA double-buffering.
- neg: negate via scale by -1.0
- exp: uses existing hvx_exp_f32 HVX intrinsics
- sigmoid: uses existing hvx_sigmoid_f32_aa HVX intrinsics
- softplus: log(1 + exp(x)) scalar fallback
- CONT reuses the existing CPY infrastructure since making a tensor
contiguous is equivalent to a same-type copy.
- REPEAT implements tiled memory copy with multi-threaded execution via
the worker pool, supporting f32 and f16 types. The kernel parallelizes
across output rows and uses memcpy for each tile.
Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
2026-03-17 15:34:36 -07:00
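As a rough reference for the element-wise ops described in commit cf23ee2447 above, the following plain-Python sketch mirrors the stated semantics. The function names (`neg`, `softplus`, `repeat_rows`) and the list-based layout are illustrative assumptions, not the Hexagon kernel code, which the commit says uses HVX intrinsics and VTCM DMA double-buffering.

```python
import math

def neg(xs):
    # Negation expressed as a scale by -1.0, as the commit message describes.
    return [x * -1.0 for x in xs]

def softplus(xs):
    # Scalar fallback: log(1 + exp(x)) applied per element.
    # math.exp overflows for very large x; this sketch does not guard against that.
    return [math.log(1.0 + math.exp(x)) for x in xs]

def repeat_rows(src, reps_rows, reps_cols):
    # Tiled copy of a 2D list. Each output row depends only on one source row,
    # so the rows could be divided among workers (hypothetical here).
    n_rows = len(src)
    out = []
    for r in range(n_rows * reps_rows):
        tile = src[r % n_rows]
        out.append(list(tile) * reps_cols)
    return out
```

For example, `repeat_rows([[1, 2], [3, 4]], 2, 2)` tiles the 2x2 input into a 4x4 output, one source row per output row.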
Ruben Ortlam
892e3c333a
vulkan: disable mmvq on Intel Windows driver ( #20672 )
...
* vulkan: disable mmvq on Intel Windows driver
* improve comment
2026-03-17 21:51:43 +01:00
Concedo
ded5486d52
Merge commit 'ab0bb93748' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-apple.yml
# .github/workflows/build-sanitize.yml
# .github/workflows/build-vulkan.yml
# .github/workflows/build.yml
# .github/workflows/copilot-setup-steps.yml
# .github/workflows/release.yml
2026-03-18 01:22:34 +08:00
Kevin Hannon
ee4801e5a6
ggml-blas: set mkl threads from thread context ( #20602 )
...
* ggml blas: set mkl threads from thread context
* add code to run blas locally
2026-03-18 01:16:49 +08:00
Piotr Wilkin (ilintar)
d2ecd2d1cf
common/parser: add --skip-chat-parsing to force a pure content parser. ( #20289 )
...
* Add `--force-pure-content` to force a pure content parser.
* Update common/arg.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Change parameter name [no ci]
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-17 16:16:43 +01:00
Taimur Ahmad
054d8b0f24
ggml-cpu: fix RVV checks in quants and repacking ( #20682 )
...
* ggml-cpu: refactor quants.c; add rvv check
* ggml-cpu: refactor; disable generic fallback
2026-03-17 16:03:40 +02:00
Sigbjørn Skjæret
ab0bb93748
ci : bump ccache [no ci] ( #20679 )
...
* bump ccache
* forgotten
* disable for s390x
* disable also for ppc64le
2026-03-17 14:54:31 +01:00
Ruben Ortlam
3a5cb629b1
vulkan: async and event fixes ( #20518 )
...
* vulkan: fix event wait submission, event command buffer reset
* fix event command buffer reset validation error
* also reset command buffers before reuse
* use timeline semaphores instead of fences for event_synchronize
* don't use initializer list for semaphore wait info
* use multiple events to avoid reset issues
* fix event reuse issue with multiple vectors
* add semaphore wait condition also if compute_ctx already exists
* remove event pending stage
2026-03-17 14:27:23 +01:00
Georgi Gerganov
8cc2d81264
server : fix ctx checkpoint invalidation ( #20671 )
2026-03-17 15:21:14 +02:00
Concedo
40f0c0555b
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
2026-03-17 20:34:27 +08:00
Justin Bradford
627670601a
kleidiai : fix MUL_MAT support for batched (3D) inputs ( #20620 )
...
* kleidiai : fix MUL_MAT support for batched (3D) inputs
The supports_op() check incorrectly rejected MUL_MAT operations with 3D
inputs (ne[2] > 1), but the actual compute_forward_qx() implementation
handles batched inputs correctly via a loop over ne12.
This caused models with Q4_0/Q8_0 weights to crash during graph scheduling
when n_seq_max > 1, because weights were placed in KLEIDIAI buffers during
loading (tested with 2D inputs) but the runtime used 3D inputs.
Also relax the buffer check to allow supports_op() to be called during
weight loading when src[0]->buffer is NULL.
Fixes #20608
* Kleidiai support_ops should only return true for 3D inputs, not also 4D
2026-03-17 14:03:54 +02:00
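The shape check discussed in commit 627670601a above can be pictured with a small Python sketch. It assumes a ggml-style 4-element shape `ne` and uses hypothetical function names; it is not the actual `supports_op()` code from kleidiai.cpp, only an illustration of "accept 2D and 3D, reject 4D".

```python
def supports_mul_mat_before(ne):
    # Old behaviour as described: any batch dimension (ne[2] > 1) was rejected.
    return ne[2] == 1 and ne[3] == 1

def supports_mul_mat_after(ne):
    # New behaviour as described: batched 3D inputs are accepted,
    # while 4D inputs (ne[3] > 1) are still rejected.
    return ne[3] == 1
```

Under these assumptions a shape such as `(64, 32, 4, 1)` is rejected by the old check and accepted by the new one, which matches the crash scenario with `n_seq_max > 1` in the commit message.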
Concedo
8043f35b22
Revert "llama : disable graph reuse with pipeline parallelism ( #20463 )"
...
This reverts commit 57819b8d4b.
2026-03-17 18:51:14 +08:00
Ruben Ortlam
740a447fc3
vulkan: allow graphics queue only through env var ( #20599 )
...
* vulkan: avoid graphics queue on non-RADV AMD drivers
* avoid graphics queues on small GPUs
* change to only use graphics queue if overridden with env var GGML_VK_ALLOW_GRAPHICS_QUEUE
* reenable transfer queue if graphics queue is not used
2026-03-17 10:09:59 +01:00
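The opt-in behaviour from commit 740a447fc3 above might look roughly like the sketch below. Only the variable name `GGML_VK_ALLOW_GRAPHICS_QUEUE` comes from the commit message; the `pick_queues` helper and the choice to treat mere presence of the variable as "enabled" are assumptions, since the log does not say how the value is parsed.

```python
import os

ENV_VAR = "GGML_VK_ALLOW_GRAPHICS_QUEUE"

def pick_queues(env=None):
    # env defaults to the process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    # Assumption: the variable being set at all enables the graphics queue.
    use_graphics = ENV_VAR in env
    # Per the commit message, the transfer queue is re-enabled when the
    # graphics queue is not used.
    return {"graphics": use_graphics, "transfer": not use_graphics}
```

With an empty environment this returns the transfer queue only; setting the variable flips the selection.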
Concedo
d85272a958
fixed wrong encoding (+1 squashed commits)
...
Squashed commits:
[a87d059a8] fixed wrong encoding
2026-03-17 15:54:54 +08:00
Concedo
e09ddc8fff
mcp fix (+1 squashed commits)
...
Squashed commits:
[c5a959a07] mcp fix
2026-03-17 15:45:05 +08:00
Concedo
837fe9d832
mcp stdio fixes
2026-03-17 15:34:05 +08:00
Concedo
39f9007d12
handle notifications in mcp
2026-03-17 15:13:42 +08:00
Concedo
f31b040941
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# .github/workflows/build-self-hosted.yml
# benches/nemotron/nemotron-dgx-spark.md
# docs/ops.md
# docs/ops/SYCL.csv
# ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/element_wise.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/sync-ggml.last
# tests/test-jinja.cpp
# tests/test-llama-archs.cpp
2026-03-17 14:05:23 +08:00
Concedo
6d3f01d139
compact css, fix .py variable name error
2026-03-17 11:11:46 +08:00
Concedo
9084527b36
Merge commit '67a2209fab' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-cache.yml
# .github/workflows/build-cross.yml
# .github/workflows/build-self-hosted.yml
# .github/workflows/build.yml
# .github/workflows/python-lint.yml
# .github/workflows/release.yml
# .github/workflows/server-self-hosted.yml
# .github/workflows/server-webui.yml
# .github/workflows/server.yml
# CODEOWNERS
# ggml/src/ggml-sycl/gated_delta_net.cpp
# scripts/sync_vendor.py
# tools/cli/cli.cpp
2026-03-17 11:11:25 +08:00
henk717
927d3c68bb
502 Loading page ( #2042 )
...
* Proper Loading page
* Loading page wording
* Different wording
2026-03-17 10:59:44 +08:00
Concedo
da4852a734
updated lite
2026-03-17 10:39:56 +08:00