Concedo
85b46f35e0
cleanup
2025-10-03 16:18:02 +08:00
Concedo
1731a3212c
Merge commit 'ded67b9444' into concedo_experimental
...
# Conflicts:
# .devops/rocm.Dockerfile
# .github/workflows/build.yml
# .github/workflows/docker.yml
# .github/workflows/release.yml
# CODEOWNERS
# common/CMakeLists.txt
# common/arg.cpp
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/get_rows.cl
# ggml/src/ggml-opencl/kernels/pad.cl
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py
# tests/test-arg-parser.cpp
# tests/test-backend-ops.cpp
# tools/run/run.cpp
2025-10-03 16:15:27 +08:00
Concedo
4f8f0e5949
move embeds into their own dir, detach sd vocab into separate files
2025-10-03 14:21:09 +08:00
Concedo
c00ae93421
makefile fix vulkan noext compile (+1 squashed commits)
...
Squashed commits:
[eae88fd49] makefile fix vulkan noext compile
2025-10-02 23:19:45 +08:00
Concedo
f2ad0b78d4
refactor of added images done
2025-10-02 21:22:35 +08:00
Concedo
df87da4694
wip refactor ref image bufs
2025-10-02 16:37:17 +08:00
Concedo
539db70eac
clip to cpu by default
2025-10-02 10:56:51 +08:00
Wagner Bruna
ac6be8ab8c
sd: do not force T5 on CPU anymore ( #1769 )
...
We now have the clip_cpu config parameter for that. Todo: Will make clip cpu on by default
2025-10-02 10:48:16 +08:00
Concedo
4587ccb71a
prepare to refactor reference image
2025-10-02 10:41:29 +08:00
Shunta Saito
ded67b9444
llama : parameter conversion and loading fixes for PLaMo2 variants ( #16075 )
...
* Fix to use hidden_size_per_head
* Fix num heads
* Fix array
* Fix loading weights
* Support old GGUF converted by the previous version of llama.cpp
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Move shared parameter definitions to the outside of loop
* Not calculating n_embd_head_k,v by n_embd / n_head
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-01 23:08:15 +02:00
uvos
1fe4e38cc2
ci: Properly install rocwmma for hip builds ( #16305 )
...
* CI: Properly install rocwmma for hip builds
on windows we now install rocwmma from ubuntu packages
* CI: update linux rocm docker build to use rocm 7.0
2025-10-01 20:18:03 +02:00
Adrien Gallouët
4201deae9c
common: introduce http.h for httplib-based client ( #16373 )
...
* common: introduce http.h for httplib-based client
This change moves cpp-httplib based URL parsing and client setup into
a new header `common/http.h`, and integrates it in `arg.cpp` and `run.cpp`.
It is an iteration towards removing libcurl, while intentionally
minimizing changes to existing code to guarantee the same behavior when
`LLAMA_CURL` is used.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* tools : add missing WIN32_LEAN_AND_MEAN
Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>
2025-10-01 20:22:18 +03:00
Aleksander Grygier
764799279f
Conversation action dialogs as singletons from Chat Sidebar + apply conditional rendering for Actions Dropdown for Chat Conversation Items ( #16369 )
...
* fix: Render Conversation action dialogs as singletons from Chat Sidebar level
* chore: update webui build output
* fix: Render Actions Dropdown conditionally only when user hovers conversation item + remove unused markup
* chore: Update webui static build
* fix: Always truncate conversation names
* chore: Update webui static build
2025-10-01 18:18:10 +02:00
Concedo
e4c40405fb
update lite
2025-10-01 23:23:10 +08:00
Aleksander Grygier
2a9b63383a
Improve code block color theming ( #16325 )
...
* feat: Improve code block theming
* chore: update webui build output
* chore: Update webui static build
2025-10-01 15:54:42 +02:00
Sigbjørn Skjæret
1104ca1a1c
ci : use registry cache for docker builds ( #16366 )
2025-10-01 14:09:52 +02:00
Aleksander Grygier
4f1575921c
Add optional setting for showing "Model used:" information ( #16337 )
...
* feat: Add a setting to include model name used to generate the message
* feat: UI improvements
* feat: Save model info along with the database message entry creation
* chore: Build webui static output
2025-10-01 12:08:16 +02:00
Concedo
e49ac6b120
allow clip_vision to be loaded via clip_l or clip_g param
2025-10-01 17:57:49 +08:00
Concedo
2fc31d36c0
gif mime type for animated images
2025-10-01 17:18:00 +08:00
Eve
132d673554
vulkan: make ggml_vk_default_dispatcher support older vulkan headers ( #16345 )
...
* make ggml_vk_default_dispatcher support older vulkan headers
* simplify with using
2025-10-01 09:56:36 +02:00
Aleksander Grygier
aa9538a63a
webui: Remove running llama-server within WebUI dev.sh script ( #16363 )
2025-10-01 08:40:26 +03:00
Bartowski
e74c92e842
model : support GLM 4.6 (make a few NextN/MTP tensors not required) ( #16359 )
...
* Make a few GLM tensors not required
layer.nextn.shared_head_head and layer.nextn.embed_tokens are both excluded from GLM 4.6, so the model fails to load after conversion/quantization. This marks those tensors as not required, which makes it work.
* Update llama-model.cpp
layer.nextn.shared_head_norm is also marked as not required, in case of future models
2025-09-30 22:24:36 +02:00
Sigbjørn Skjæret
b2ba81dbe0
ci : fix ccache key for ubuntu-cpu-cmake ( #16355 )
...
* fix ccache key for ubuntu-cpu-cmake
* set it for release as well [no ci]
2025-09-30 21:41:42 +02:00
Adrien Gallouët
bf6f3b3a19
common : disable progress bar without a tty ( #16352 )
...
* common : disable progress bar without a tty
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Add missing headers
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-09-30 20:52:41 +03:00
lhez
7c156df414
opencl: support pad_ext ( #15888 )
2025-09-30 10:45:45 -07:00
Pascal
16b0ca0d2e
Chatapi ignore empty sampling ( #16330 )
...
* fix: skip empty sampling fields instead of coercing to 0 in chat API options
* chore: update webui build output
2025-09-30 19:18:54 +02:00
Reese Levine
8d78cd2613
ggml webgpu: support for rope,div,sub,glu,scale,cont operators ( #16187 )
...
* Work on rope
* Simplify inplace operation generation and combine mul/add generation
* Work on rope variants
* implement neox rope
* rope complete
* Add sub,div,glu operators
* implement scale op
* Update cpy shader to handle cont/more types
* formatting
* Update test vars printing for rope,rms_norm
* Avoid ROPE hardcoded constants
* Add TODO to change ROPE constants to enum
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix TODO comment
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-09-30 09:57:51 -07:00
lhez
d1c84a662d
opencl: support ne3 in get_rows ( #15866 )
2025-09-30 09:55:13 -07:00
Adrien Gallouët
364a7a6d4a
common : remove common_has_curl() ( #16351 )
...
`test-arg-parser.cpp` has been updated to work consistently,
regardless of whether CURL or SSL support is available, and
now always points to `ggml.ai`.
The previous timeout test has been removed, but it can be
added back by providing a dedicated URL under `ggml.ai`.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-09-30 17:39:44 +03:00
Concedo
20c802a198
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CODEOWNERS
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
2025-09-30 22:28:53 +08:00
Sigbjørn Skjæret
2df5bcf357
ci : disable ccache for android ( #16348 )
2025-09-30 15:38:01 +02:00
Georgi Gerganov
075c01567b
ggml : bump version to 0.9.4 (ggml/1363)
2025-09-30 13:53:55 +03:00
Concedo
b3a0ba5e37
adjust max frames
2025-09-30 17:43:27 +08:00
Concedo
2201ddb759
fix tool builds
2025-09-30 16:29:11 +08:00
anavp-nvidia
a014310374
cuda : Enable CUDA Graph usage for Nemotron Nano v2 (NemotronH) ( #16328 )
...
* Fix Nemotron Nano v2 9B not executing as CUDA Graph on NVIDIA GPUs
* fix to ensure test-backend-ops check passes
2025-09-30 11:13:22 +03:00
Georgi Gerganov
35fb82497e
metal : dynamic simdgroups for MV kernels ( #16340 )
...
* metal : dynamic simdgroups for MV kernels
* cont : minor
2025-09-30 11:03:23 +03:00
Adrien Gallouët
3c62aed89f
common : simplify etag tracking by removing json ( #16342 )
...
The JSON parser is temporarily kept only for backward compatibility. It
reads the etag from old .json files to prevent unnecessary re-downloads
for existing users.
This legacy code can be removed in a future version.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-09-30 10:36:33 +03:00
Charles Xu
f1eb1cb1eb
kleidiai : fix work size and threads sync for fp16 ( #16246 )
2025-09-30 10:07:20 +03:00
Concedo
9e4c29fda7
generate both gif and pick smaller (+1 squashed commits)
...
Squashed commits:
[09122d052] generate both gif and pick the smaller one
2025-09-30 14:42:58 +08:00
Concedo
4117542eae
switch to msf gif
2025-09-30 13:56:41 +08:00
lhez
de41f2b7bf
codeowners: add codeowners for opencl backend ( #16344 )
2025-09-30 08:30:16 +03:00
Jeff Bolz
a74a0d69f3
tests: override test_set_rows::max_nmse_err to allow for occasional rounding differences ( #16295 )
...
* tests: override test_set_rows::max_nmse_err to allow for occasional rounding differences
* apply similar error bounds to test_cpy
2025-09-29 19:26:34 -05:00
Pascal
5f7e166cbf
Fix thinking blocks with quotes + add handling [THINK]...[/THINK] blocks ( #16326 )
...
* fix: prevent reasoning blocks with quotes from being truncated
* chore: update webui build output
* feat: Improve thinking content parsing
* test: Adds ChatMessage component stories for different thinking blocks
* chore: update webui build output
* fix: ChatMessage story fix
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-09-29 18:49:47 +02:00
Concedo
4f2b951547
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/musa.Dockerfile
# .github/workflows/build-linux-cross.yml
# .github/workflows/build-riscv-native.yml
# .github/workflows/build.yml
# .github/workflows/docker.yml
# CODEOWNERS
# ci/run.sh
# ggml/CMakeLists.txt
# ggml/src/ggml-blas/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tools/perplexity/perplexity.cpp
# tools/server/README.md
2025-09-30 00:36:38 +08:00
Concedo
1a1ebfc304
Merge commit '75a3a6c2cd' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# common/CMakeLists.txt
# ggml/src/ggml-cuda/fattn-vec-f16.cuh
# ggml/src/ggml-cuda/fattn.cu
# tools/server/CMakeLists.txt
# tools/server/README.md
2025-09-30 00:33:36 +08:00
Concedo
4b1c89ca5c
can save animated gifs
2025-09-29 22:52:42 +08:00
Georgi Gerganov
d72f5f7ba2
ci : add AMD runners and workflows ( #16249 )
...
* ci : add AMD runners and workflows
* ci : move AMD jobs to separate workflow
* cont : fix paths
2025-09-29 17:51:48 +03:00
alex-spacemit
b77e6c18e1
ggml: riscv: add riscv spacemit backend ( #15288 )
...
* ggml: add spacemit backend
Change-Id: I249bdc043485d815a9c351867137bc1e27cc2e23
* add new line at end of file
Change-Id: I889ed1c85fb45e62350ecde0c06f70450cadfbe2
* add riscv zba extension limit
Change-Id: I321eb200f859751727afe5cae13074dfce2bb0ce
* fixed for review comments, file renamed and format
Change-Id: Ia20b6ec24a36638e62e0fe07cf100916a7cce3ce
* fixed for code format, after clang-format
Change-Id: I5dc33a0412da3d3f2d77075d8939185d3009eca2
* use _Float16 instead of __fp16
Change-Id: I039fb02bb95270e641bc4442204e658735859d43
* add ci for riscv64-spacemit-ime-native
Change-Id: I711c1033061df1a289ea77891b2997599dfe8279
* update debian-13-riscv64-spacemit-ime-native ci label
Change-Id: Ifb2b891e2fca57b5da604fce2ac255f27731179a
* remove license comment for spacemit ime
Change-Id: If0dc3ca30a958631ccca0a28b62e0b825f9fb0c3
* upgrade binutils for gcc ime
Change-Id: Ibf2fa74c1064408974cb5b45f044d40987e5fb45
* add spacemit ime cross jobs
Change-Id: I80d74909941d41cb9cd09e51d8baf01c985cbfc6
* remove native compile for riscv64-spacemit-ime
Change-Id: I01920afafdc73fa7424014fd648d243f8ec9e25e
* ci : add caching for spacemit ime cross toolchain
Change-Id: Ic54a192019a2fd982bbd58225ce3bbc38f4053de
* ci: bug fixed for cache path and env
Change-Id: I28c42e10b6fff053bb6580926ca2353448cb042a
* Update .github/workflows/build-linux-cross.yml for cache path
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* bugfixed for build-linux-cross.yml, syntax error
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: cailinxi <linxi.cai@spacemit.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-09-29 17:50:44 +03:00
Georgi Gerganov
2ddd3f2356
sync : ggml
2025-09-29 17:43:58 +03:00
Georgi Gerganov
4d3d455d3c
sync : whisper.cpp (ggml/1359)
...
* ggml : Fix MKL detection by quoting BLAS_INCLUDE_DIRS (whisper/3426)
* sync : whisper.cpp
2025-09-29 17:43:58 +03:00