Commit graph

9813 commits

Author SHA1 Message Date
Concedo
bb06956b2d allow wan to use img2img via init image 2025-10-04 11:25:46 +08:00
Concedo
db37688b47 qwen image disable VAE tiling as it's broken 2025-10-04 11:19:19 +08:00
henk717
118e589743
ROCm 7 CI (#1752)
* Bump ROCm

* Container experiment

* Can 7.0 compile it on its own?

* Clean the env before pulling docker

* Cleanup attempt 2

* Fix cleanup test 2

* Bing attempts to save ROCm users

* CI binary location fix attempt

* Attempt to fix Docker env vars (make it compile rocm again)

* Update kcpp-build-release-linux-rocm.yaml

* Less fancy ROCm spelling
2025-10-04 09:11:12 +08:00
Concedo
9df2a02c4c preserve rocm6 ci 2025-10-04 09:10:41 +08:00
Concedo
7447a362d1 hide a debug print 2025-10-03 20:56:57 +08:00
Concedo
15249baea1 apply jeffbolz f32 patch https://github.com/leejet/stable-diffusion.cpp/pull/851#issuecomment-3335515302 2025-10-03 19:18:46 +08:00
Concedo
f282362414 added qwen image support (+1 squashed commits)
Squashed commits:

[92df28061] added qwen image support (+1 squashed commits)

Squashed commits:

[1485c71ed] wip adding qwen image
2025-10-03 18:58:48 +08:00
Concedo
e706d33367 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/intel.Dockerfile
#	.devops/rocm.Dockerfile
#	.github/workflows/build.yml
#	.github/workflows/release.yml
#	CODEOWNERS
#	ci/run.sh
#	docs/backend/SYCL.md
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cuda/fattn-wmma-f16.cu
#	ggml/src/ggml-hip/CMakeLists.txt
#	ggml/src/ggml-musa/CMakeLists.txt
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/rms_norm.wgsl
#	tests/test-backend-ops.cpp
#	tests/test-barrier.cpp
#	tests/test-chat.cpp
2025-10-03 16:44:33 +08:00
Concedo
85b46f35e0 cleanup 2025-10-03 16:18:02 +08:00
Concedo
1731a3212c Merge commit 'ded67b9444' into concedo_experimental
# Conflicts:
#	.devops/rocm.Dockerfile
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	.github/workflows/release.yml
#	CODEOWNERS
#	common/CMakeLists.txt
#	common/arg.cpp
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/get_rows.cl
#	ggml/src/ggml-opencl/kernels/pad.cl
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py
#	tests/test-arg-parser.cpp
#	tests/test-backend-ops.cpp
#	tools/run/run.cpp
2025-10-03 16:15:27 +08:00
Aleksander Grygier
136bda78c5
webui : Fix messages payload sent to chat completions (#16402)
Some checks failed
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Has been cancelled
Python check requirements.txt / check-requirements (push) Has been cancelled
Python Type-Check / pyright type-check (push) Has been cancelled
* fix: Include just the currently active message branches instead of all in chat completions request

* chore: Build webui static output

* chore: Formatting

* chore: update webui build output
2025-10-03 10:11:34 +03:00
Concedo
4f8f0e5949 move embeds into their own dir, detach sd vocab into separate files 2025-10-03 14:21:09 +08:00
Pascal
5113efd34c
fix: track viewportHeight via window.innerHeight to avoid unwanted scrolling (#16356)
Use <svelte:window bind:innerHeight> instead of manual resize listener

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-10-03 08:01:31 +02:00
Sigbjørn Skjæret
d64c8104f0
test-barrier : do not use more threads than physically available (#16389)
* do not use more threads than physically available

* ensure n_threads > 0

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>

---------

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-10-02 20:10:12 +02:00
Reese Levine
ef07a40906
ggml webgpu: add support for soft_max, optimize rms_norm (#16357)
* Add inplace softmax

* Move rms_norm to split row approach

* Update debug for supports_op

* clean up debug statements

* Update tests/test-backend-ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-10-02 11:00:31 -07:00
Piotr Wilkin (ilintar)
34fcc5a4ac
model : Apertus model implementation (#15852)
* First attempt

* No permute during convert (fixes qk tensors), proper norm application.

* RoPE = NeoX

* Coherence!

* Migrate xielu params from tensors to hyperparameters

* Simple CUDA kernel

* Revert stupid LLM refactorings

* Chat template support

* configchecker / flake8 errors

* Reorder unary.cu

* I do conclude that LLMs are, in fact, stupid.

* Fix after merge

* Final newline

* Make xIELU an UNARY_OP

* Final newline

* Correctly account for parameter shift

* Argh.

* Update ggml/src/ggml-cpu/unary-ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Refactor: remove unused methods, inline and factorize softplus, add const modifiers

* Revert CUDA changes, implement xIELU as a separate OP

* Pesky newline

* Add float2half / half2float for F16 inputs/outputs

* CUDA variants, attempt 2

* Actually, attempt 3

* Update ggml/src/ggml-cuda/unary.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Missing convert header

* Proper formula and reference for xIELU in the comments.

* Modify unary-ops.cpp to add the functor-based logic besides the template system to retain optimizations

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Add tensor mappings for Apertus to global list instead

* Fix lazy on scalars

* Update ggml/src/ggml-cuda/unary.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Add comment about the constraints on positive/negative alpha

* Change `softplus` to `ggml_softplus`

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-02 20:43:22 +03:00
Concedo
c00ae93421 makefile fix vulkan noext compile (+1 squashed commits)
Squashed commits:

[eae88fd49] makefile fix vulkan noext compile
2025-10-02 23:19:45 +08:00
R0CKSTAR
91a2a56556
musa: update compile flags (#16265)
Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
2025-10-02 16:29:56 +03:00
Concedo
f2ad0b78d4 refactor of added images done 2025-10-02 21:22:35 +08:00
Sigbjørn Skjæret
72ee736c44
ci : fix ubuntu-latest-cmake-rpc (disable ccache) (#16388) 2025-10-02 13:51:36 +02:00
Concedo
df87da4694 wip refactor ref image bufs 2025-10-02 16:37:17 +08:00
Eve
f09aefaa84
ci: update vulkan ci (#16294) 2025-10-02 10:10:07 +02:00
Georgi Gerganov
bbd32bc038
ci : fix clean-up of old logs (#16381) 2025-10-02 10:35:43 +03:00
Neo Zhang Jianyu
2be72c2b12
SYCL: Update to oneAPI 2025.2 (#16371)
* update oneapi to 2025.2, use deep-learning-essentials to replace base-tool

* update to 2025.2 use deeplearn essi to replace base toolkit

* add missed dll

* add deep learning essentials

* add sycl-ls

---------

Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com>
2025-10-02 10:16:25 +03:00
uvos
95ce098544
HIP: add IMbackK to codeowner (#16375) 2025-10-02 05:52:59 +02:00
Concedo
539db70eac clip to cpu by default 2025-10-02 10:56:51 +08:00
Wagner Bruna
ac6be8ab8c
sd: do not force T5 on CPU anymore (#1769)
We now have the clip_cpu config parameter for that. Todo: Will make clip cpu on by default
2025-10-02 10:48:16 +08:00
Concedo
4587ccb71a prepare to refactor reference image 2025-10-02 10:41:29 +08:00
uvos
c8dedc9999
CI: reenable cdna in rocm docker builds (#16376) 2025-10-01 23:32:39 +02:00
uvos
e95fec640f
HIP: Disable ROCWMMA fattn on CDNA when compiled against ROCWMMA 2.0.0 (#16221)
* HIP: Disable ROCWMMA fatt on CDNA when compiled against ROCWMMA 2.0.0

rocwmma 2.0.0 includes a bug in the code fakeing fp16 accumulation on CDNA

* CUDA: Fix volta condition in ggml_cuda_should_use_wmma_fattn
2025-10-01 23:09:25 +02:00
Shunta Saito
ded67b9444
llama : parameter conversion and loading fixes for PLaMo2 variants (#16075)
* Fix to use hidden_size_per_head

* Fix num heads

* Fix array

* Fix loading weights

* Support old GGUF converted by the previous version of llama.cpp

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Move shared parameter definitions to the outside of loop

* Not calculating n_embd_head_k,v by n_embd / n_head

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-01 23:08:15 +02:00
uvos
1fe4e38cc2
ci: Properly install rocwmma for hip builds (#16305)
* CI: Properly install rocwmma for hip builds

on windows we now windows install rocwmma from ubuntu pacakges

* CI: update linux rocm docker build to use rocm 7.0
2025-10-01 20:18:03 +02:00
Adrien Gallouët
4201deae9c
common: introduce http.h for httplib-based client (#16373)
* common: introduce http.h for httplib-based client

This change moves cpp-httplib based URL parsing and client setup into
a new header `common/http.h`, and integrates it in `arg.cpp` and `run.cpp`.

It is an iteration towards removing libcurl, while intentionally
minimizing changes to existing code to guarantee the same behavior when
`LLAMA_CURL` is used.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* tools : add missing WIN32_LEAN_AND_MEAN

Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>
2025-10-01 20:22:18 +03:00
Aleksander Grygier
764799279f
Conversation action dialogs as singletons from Chat Sidebar + apply conditional rendering for Actions Dropdown for Chat Conversation Items (#16369)
* fix: Render Conversation action dialogs as singletons from Chat Sidebar level

* chore: update webui build output

* fix: Render Actions Dropdown conditionally only when user hovers conversation item + remove unused markup

* chore: Update webui static build

* fix: Always truncate conversation names

* chore: Update webui static build
2025-10-01 18:18:10 +02:00
Concedo
e4c40405fb update lite 2025-10-01 23:23:10 +08:00
Aleksander Grygier
2a9b63383a
Improve code block color theming (#16325)
* feat: Improve code block theming

* chore: update webui build output

* chore: Update webui static build
2025-10-01 15:54:42 +02:00
Sigbjørn Skjæret
1104ca1a1c
ci : use registry cache for docker builds (#16366) 2025-10-01 14:09:52 +02:00
Aleksander Grygier
4f1575921c
Add optional setting for showing "Model used:" information (#16337)
* feat: Add a setting to include model name used to generate the message

* feat: UI improvements

* feat: Save model info along with the database message entry creation

* chore: Build webui static output
2025-10-01 12:08:16 +02:00
Concedo
e49ac6b120 allow clip_vision to be loaded via clip_l or clip_g param 2025-10-01 17:57:49 +08:00
Concedo
2fc31d36c0 gif mime type for animated images 2025-10-01 17:18:00 +08:00
Eve
132d673554
vulkan: make ggml_vk_default_dispatcher support older vulkan headers (#16345)
* make ggml_vk_default_dispatcher support older vulkan headers

* simpilfy with using
2025-10-01 09:56:36 +02:00
Aleksander Grygier
aa9538a63a
webui: Remove running llama-server within WebUI dev.sh script (#16363) 2025-10-01 08:40:26 +03:00
Bartowski
e74c92e842
model : support GLM 4.6 (make a few NextN/MTP tensors not required) (#16359)
* Make a few GLM tensors not required

layer.nextn.shared_head_head and layer.nextn.embed_tokens are both excluded from GLM 4.6 resulting in the model not loading after conversion/quantization, this marks those tensors as not required which makes it work

* Update llama-model.cpp

layer.nextn.shared_head_norm also not required in case of future models
2025-09-30 22:24:36 +02:00
Sigbjørn Skjæret
b2ba81dbe0
ci : fix ccache key for ubuntu-cpu-cmake (#16355)
* fix ccache key for ubuntu-cpu-cmake

* set it for release as well [no ci]
2025-09-30 21:41:42 +02:00
Adrien Gallouët
bf6f3b3a19
common : disable progress bar without a tty (#16352)
* common : disable progress bar without a tty

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Add missing headers

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-09-30 20:52:41 +03:00
lhez
7c156df414
opencl: support pad_ext (#15888) 2025-09-30 10:45:45 -07:00
Pascal
16b0ca0d2e
Chatapi ignore empty sampling (#16330)
* fix: skip empty sampling fields instead of coercing to 0 in chat API options

* chore: update webui build output
2025-09-30 19:18:54 +02:00
Reese Levine
8d78cd2613
ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187)
* Work on rope

* Simplify inplace operation generation and combine mul/add generation

* Work on rope variants

* implement neox rope

* rope complete

* Add sub,div,glu operators

* implement scale op

* Update cpy shader to handle cont/more types

* formatting

* Update test vars printing for rope,rms_norm

* Avoid ROPE hardcoded constants

* Add TODO to change ROPE constants to enum

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix TODO comment

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-09-30 09:57:51 -07:00
lhez
d1c84a662d
opencl: support ne3 in get_rows (#15866) 2025-09-30 09:55:13 -07:00
Adrien Gallouët
364a7a6d4a
common : remove common_has_curl() (#16351)
`test-arg-parser.cpp` has been updated to work consistently,
regardless of whether CURL or SSL support is available, and
now always points to `ggml.ai`.

The previous timeout test has been removed, but it can be
added back by providing a dedicated URL under `ggml.ai`.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-09-30 17:39:44 +03:00