Commit graph

12252 commits

Author SHA1 Message Date
Concedo
9ba8c7a661 fixed colab 2026-03-21 10:21:18 +08:00
Concedo
1225b1b155 add nothink autoguess 2026-03-20 21:21:06 +08:00
Concedo
2d349723d3 fixed colab 2026-03-20 18:19:59 +08:00
Gustavo Rocha Dias
8e045b33a1
fix - w64devkit vulkan build (#2048) 2026-03-20 16:37:22 +08:00
Concedo
d6aae073b6 fixed typo 2026-03-20 12:02:04 +08:00
Concedo
1f73eabb46 fixed colab 2026-03-20 11:45:10 +08:00
Concedo
c4b1a17e1a tools debug 2026-03-19 23:13:02 +08:00
Concedo
699bc6b278 github actions updates for deprecation of nodejs 20 (+1 squashed commits)
Squashed commits:

[0ed5af384] checkout to v4
2026-03-19 14:39:43 +08:00
Concedo
2f63f94fd8 fix router nocertify mode 2026-03-19 12:45:19 +08:00
Concedo
8cf9ba34e9 fixed SSL in routermode 2026-03-19 12:43:11 +08:00
Concedo
48f914e374 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ci/run.sh
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cpu/arch/riscv/repack.cpp
#	ggml/src/ggml-cpu/arch/x86/repack.cpp
#	ggml/src/ggml-cpu/repack.cpp
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/htp-msg.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/hvx-base.h
#	ggml/src/ggml-hexagon/htp/hvx-exp.h
#	ggml/src/ggml-hexagon/htp/hvx-sigmoid.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/softmax-ops.c
#	ggml/src/ggml-hexagon/htp/unary-ops.c
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/sync-ggml.last
#	tests/test-backend-sampler.cpp
#	tests/test-chat.cpp
#	tests/test-jinja.cpp
#	tools/cli/cli.cpp
2026-03-19 02:23:06 +08:00
crsawyer
5744d7ec43
Rebuild index.html.gz (#20724) 2026-03-18 18:49:57 +01:00
Reese Levine
8ced5f41f9
Move to no timeout for WaitAny in graph submission to avoid deadlocks in some cases on llvm-pipe backends (#20618) 2026-03-18 10:23:47 -07:00
Shaw Nguyen
78d550b541
ggml-cpu/x86: fix unused changemask warning in repack (#20692) 2026-03-18 18:45:06 +02:00
Concedo
15e86010d8 autofit will clear moecpu and overridetensors 2026-03-18 21:20:57 +08:00
Georgi Gerganov
4efd326e71 sync : ggml 2026-03-18 15:17:28 +02:00
Georgi Gerganov
b08f7322ee ggml : bump version to 0.9.8 (ggml/1442) 2026-03-18 15:17:28 +02:00
Georgi Gerganov
79187f2fb8 ggml : restore ggml_type_sizef() to aboid major version bump (ggml/1441) 2026-03-18 15:17:28 +02:00
Julien Chaumond
48e61238e1
webui: improve tooltip wording for attachment requirements (#20688)
* webui: improve tooltip wording for attachment requirements

Co-Authored-By: Claude <Agents+claude@huggingface.co>

* chore: update webui build output

* chore: update webui build output

---------

Co-authored-by: Claude <Agents+claude@huggingface.co>
2026-03-18 14:01:02 +01:00
Pop Flamingo
312cf03328
llama : re-enable manual LoRA adapter free (#19983)
* Re-enable manual LoRA adapter free

* Remove stale "all adapters must be loaded before context creation" stale comments
2026-03-18 12:03:26 +02:00
Masato Nakasaka
f4049ad735
tests : fix test-jinja-py Windows failures by bypassing command-line args [no ci] (#20483)
* Fix errors occurring on Windows

* Reverted fix

#20365 will take care of CRLF isue

* Changed to write to directly to stdin

* Prevent fclose to happen twice
2026-03-18 10:43:31 +01:00
Aldehir Rojas
5e8910a0db
common : rework gpt-oss parser (#20393)
* common : rework gpt-oss parser

* cont : fix gpt-oss tests

* cont : add structured output test

* cont : rename final to final_msg
2026-03-18 10:41:25 +01:00
Aaron Teo
fe00a84b4b
tests: enable kv_unified to prevent cuda oom error on rtx 2060 (#20645)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2026-03-18 17:40:22 +08:00
Aleksander Grygier
7ab321d40d
webui: Fix duplicated messages on q param (#20715)
* fix: Remove duplicate message sending on `?q` param

* chore: update webui build output
2026-03-18 10:32:43 +01:00
uvos
7533a7d509
HIP : ignore return of hipMemAdvise [no ci] (#20696) 2026-03-18 09:53:13 +01:00
Andreas Obersteiner
a69d54f990
context : fix graph not resetting when control vector changes (#20381) 2026-03-18 08:10:13 +02:00
Concedo
f8aa711e5c musicui seed randomizer 2026-03-18 11:28:46 +08:00
Concedo
f796878022 updated lite 2026-03-18 11:20:32 +08:00
Krishna Sridhar
cf23ee2447
hexagon: add neg, exp, sigmoid, softplus ops, cont, repeat ops (#20701)
Add element-wise unary ops needed by Qwen 3.5's DeltaNet linear
attention layers. These ops follow the existing unary-ops pattern
with VTCM DMA double-buffering.

- neg: negate via scale by -1.0
- exp: uses existing hvx_exp_f32 HVX intrinsics
- sigmoid: uses existing hvx_sigmoid_f32_aa HVX intrinsics
- softplus: log(1 + exp(x)) scalar fallback
- CONT reuses the existing CPY infrastructure since making a tensor
  contiguous is equivalent to a same-type copy.
- REPEAT implements tiled memory copy with multi-threaded execution via
  the worker pool, supporting f32 and f16 types. The kernel parallelizes
  across output rows and uses memcpy for each tile.

Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
2026-03-17 15:34:36 -07:00
Ruben Ortlam
892e3c333a
vulkan: disable mmvq on Intel Windows driver (#20672)
* vulkan: disable mmvq on Intel Windows driver

* improve comment
2026-03-17 21:51:43 +01:00
Concedo
ded5486d52 Merge commit 'ab0bb93748' into concedo_experimental
# Conflicts:
#	.github/workflows/build-apple.yml
#	.github/workflows/build-sanitize.yml
#	.github/workflows/build-vulkan.yml
#	.github/workflows/build.yml
#	.github/workflows/copilot-setup-steps.yml
#	.github/workflows/release.yml
2026-03-18 01:22:34 +08:00
Kevin Hannon
ee4801e5a6
ggml-blas: set mkl threads from thread context (#20602)
* ggml blas: set mkl threads from thread context

* add code to run blas locally
2026-03-18 01:16:49 +08:00
Piotr Wilkin (ilintar)
d2ecd2d1cf
common/parser: add --skip-chat-parsing to force a pure content parser. (#20289)
* Add `--force-pure-content` to force a pure content parser.

* Update common/arg.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Change parameter name [no ci]

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-17 16:16:43 +01:00
Taimur Ahmad
054d8b0f24
ggml-cpu: fix RVV checks in quants and repacking (#20682)
* ggml-cpu: refactor quants.c; add rvv check

* ggml-cpu: refactor; disable generic fallback
2026-03-17 16:03:40 +02:00
Sigbjørn Skjæret
ab0bb93748
ci : bump ccache [no ci] (#20679)
* bump ccache

* forgotten

* disable for s390x

* disable also for ppc64le
2026-03-17 14:54:31 +01:00
Ruben Ortlam
3a5cb629b1
vulkan: async and event fixes (#20518)
* vulkan: fix event wait submission, event command buffer reset

* fix event command buffer reset validation error

* also reset command buffers before reuse

* use timeline semaphores instead of fences for event_synchronize

* don't use initializer list for semaphore wait info

* use multiple events to avoid reset issues

* fix event reuse issue with multiple vectors

* add semaphore wait condition also if compute_ctx already exists

* remove event pending stage
2026-03-17 14:27:23 +01:00
Georgi Gerganov
8cc2d81264
server : fix ctx checkpoint invalidation (#20671) 2026-03-17 15:21:14 +02:00
Concedo
40f0c0555b Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
2026-03-17 20:34:27 +08:00
Justin Bradford
627670601a
kleidiai : fix MUL_MAT support for batched (3D) inputs (#20620)
* kleidiai : fix MUL_MAT support for batched (3D) inputs

The supports_op() check incorrectly rejected MUL_MAT operations with 3D
inputs (ne[2] > 1), but the actual compute_forward_qx() implementation
handles batched inputs correctly via a loop over ne12.

This caused models with Q4_0/Q8_0 weights to crash during graph scheduling
when n_seq_max > 1, because weights were placed in KLEIDIAI buffers during
loading (tested with 2D inputs) but the runtime used 3D inputs.

Also relax the buffer check to allow supports_op() to be called during
weight loading when src[0]->buffer is NULL.

Fixes #20608

* Kleidiai support_ops should only return true for 3D inputs, not also 4D
2026-03-17 14:03:54 +02:00
Concedo
8043f35b22 Revert "llama : disable graph reuse with pipeline parallelism (#20463)"
This reverts commit 57819b8d4b.
2026-03-17 18:51:14 +08:00
Ruben Ortlam
740a447fc3
vulkan: allow graphics queue only through env var (#20599)
* vulkan: avoid graphics queue on non-RADV AMD drivers

* avoid graphics queues on small GPUs

* change to only use graphics queue if overridden with env var GGML_VK_ALLOW_GRAPHICS_QUEUE

* reenable transfer queue if graphics queue is not used
2026-03-17 10:09:59 +01:00
Concedo
d85272a958 fixed wrong encoding (+1 squashed commits)
Squashed commits:

[a87d059a8] fixed wrong encoding
2026-03-17 15:54:54 +08:00
Concedo
e09ddc8fff mcp fix (+1 squashed commits)
Squashed commits:

[c5a959a07] mcp fix
2026-03-17 15:45:05 +08:00
Concedo
837fe9d832 mcp stdio fixes 2026-03-17 15:34:05 +08:00
Concedo
39f9007d12 handle notifications in mcp 2026-03-17 15:13:42 +08:00
Concedo
f31b040941 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/labeler.yml
#	.github/workflows/build-self-hosted.yml
#	benches/nemotron/nemotron-dgx-spark.md
#	docs/ops.md
#	docs/ops/SYCL.csv
#	ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
#	ggml/src/ggml-sycl/backend.hpp
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/element_wise.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	scripts/sync-ggml.last
#	tests/test-jinja.cpp
#	tests/test-llama-archs.cpp
2026-03-17 14:05:23 +08:00
Concedo
6d3f01d139 compact css, fix .py variable name error 2026-03-17 11:11:46 +08:00
Concedo
9084527b36 Merge commit '67a2209fab' into concedo_experimental
# Conflicts:
#	.github/workflows/build-cache.yml
#	.github/workflows/build-cross.yml
#	.github/workflows/build-self-hosted.yml
#	.github/workflows/build.yml
#	.github/workflows/python-lint.yml
#	.github/workflows/release.yml
#	.github/workflows/server-self-hosted.yml
#	.github/workflows/server-webui.yml
#	.github/workflows/server.yml
#	CODEOWNERS
#	ggml/src/ggml-sycl/gated_delta_net.cpp
#	scripts/sync_vendor.py
#	tools/cli/cli.cpp
2026-03-17 11:11:25 +08:00
henk717
927d3c68bb
502 Loading page (#2042)
* Proper Loading page

* Loading page wording

* Different wording
2026-03-17 10:59:44 +08:00
Concedo
da4852a734 updated lite 2026-03-17 10:39:56 +08:00