Commit graph

8381 commits

Author SHA1 Message Date
Concedo
33809c9e82 doing what i must because i can, after the mess that is https://github.com/ggml-org/llama.cpp/pull/13892
there is so much duplicate code in each cpu arch, i expect upstream will prune it eventually
arch detection has no fallback if all the arches are not found, by right we should set GGML_CPU_GENERIC
i should be relaxing its the weekend
2025-06-14 01:41:16 +08:00
Concedo
f50c793140 not working - refactoring 2025-06-14 00:03:21 +08:00
Concedo
c494525b33 update deprecated apis 2025-06-13 22:21:15 +08:00
Concedo
4204f111f7 Merge commit '8f47e25f56' into concedo_experimental
# Conflicts:
#	.github/labeler.yml
#	.github/workflows/build-linux-cross.yml
#	docs/backend/CANN.md
#	examples/batched.swift/Sources/main.swift
#	examples/embedding/embedding.cpp
#	examples/gritlm/gritlm.cpp
#	examples/llama.android/llama/src/main/cpp/llama-android.cpp
#	examples/llama.swiftui/llama.cpp.swift/LibLlama.swift
#	examples/lookahead/lookahead.cpp
#	examples/lookup/lookup.cpp
#	examples/parallel/parallel.cpp
#	examples/passkey/passkey.cpp
#	examples/retrieval/retrieval.cpp
#	examples/save-load-state/save-load-state.cpp
#	examples/simple-chat/simple-chat.cpp
#	examples/speculative-simple/speculative-simple.cpp
#	examples/speculative/speculative.cpp
#	ggml/src/ggml-cann/common.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-sycl/convert.cpp
#	ggml/src/ggml-sycl/cpy.cpp
#	ggml/src/ggml-sycl/dequantize.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/mmvq.cpp
#	ggml/src/ggml-sycl/vecdotq.hpp
#	tools/batched-bench/batched-bench.cpp
#	tools/cvector-generator/cvector-generator.cpp
#	tools/imatrix/imatrix.cpp
#	tools/llama-bench/llama-bench.cpp
#	tools/perplexity/perplexity.cpp
#	tools/run/run.cpp
2025-06-13 22:05:03 +08:00
Wagner Bruna
f6d2d1ce5c
configurable resolution limit (#1586)
* refactor image gen configuration screen

* make image size limit configurable

* fix resolution limits and keep dimensions closer to the original ratio

* use 0.0 for the configured default image size limit

This prevents the current default value from being saved into the
config files, in case we later decide to adopt a different value.

* export image model version when loading

* restore model-specific default image size limit

* change the image area restriction to be specified by a square side

* move image resolution limits down to the C++ level

* Revert "export image model version when loading"

This reverts commit fa65b23de3.

* Linting Fixes:
PY:
- Inconsistent var name sd_restrict_square -> sd_restrict_square_var
- GUI swap back to using absolute row numbers for now.
- fstring fix
- size_limit -> side_limit inconsistency
C++:
- roundup_64 standalone function
- refactor sd_fix_resolution variable names for clarity
- move "anti crashing" hard total megapixel limit always to be applied after soft total megapixel limit instead of conditionally only when sd_restrict_square is unset

* allow unsafe resolutions if debugmode is on

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2025-06-13 20:05:20 +08:00
Reithan
f1c9db4174
fix-loss-of-destroyed-tokens-in-grammar-pre-pass (#1600) 2025-06-13 18:46:38 +08:00
Concedo
5bac0fb3d5 remove debug prints for now, they were kind of cluttered 2025-06-13 16:00:23 +08:00
Reithan
5af9138ebe
Improve GNBF performance by attempting culled grammar search first (#1597)
* cull tokens with top_3k first before running grammar, fallback to unculled if none found

* fix errors

* fix improvement and test against concedo's GBNF

* revert non-culling changes
2025-06-13 15:57:27 +08:00
Concedo
1cbe716e45 allow setting maingpu 2025-06-12 17:53:43 +08:00
Concedo
7a688e07cd remove gfx12 until amd wakes up 2025-06-12 16:52:55 +08:00
Concedo
1970d8c9e8 uvos said it might work 2025-06-12 16:44:46 +08:00
Concedo
5cdb2d3fc6 cleanup 2025-06-11 01:35:40 +08:00
henk717
f151648f03 Pyinstaller launcher and dependency updates
This PR adds a new launcher executable to the unpack feature, eliminating the need to have python and its dependencies in the unpacked version. It also does a few dependency changes to help future proof.
2025-06-10 23:08:02 +08:00
Concedo
8386546e08 Switched VS2019 for revert cu12.1 build, hopefully solves dll issues
try change order (+3 squashed commit)

Squashed commit:

[457f02507] try newer jimver

[64af28862] windows pyinstaller shim. the final loader will be moved into the packed directory later.

[0272ecf2d] try alternative way of getting cuda toolkit 12.4 since jimver wont work, also fix rocm
try again (+3 squashed commit)

Squashed commit:

[133e81633] try without pwsh

[4d99cefba] try without pwsh

[bdfa91e7d] try alternative way of getting cuda toolkit 12.4, also fix rocm
2025-06-10 23:08:02 +08:00
Diego Devesa
7f4fbe5183
llama : allow building all tests on windows when not using shared libs (#13980)
* llama : allow building all tests on windows when not using shared libraries

* add static windows build to ci

* tests : enable debug logs for test-chat

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-09 20:03:09 +02:00
Concedo
28b35ca879 allow wmma flag for rocm 2025-06-10 01:23:48 +08:00
Concedo
7d8aa31f1f fixed embeddings, added new parameter to limit max embeddings context 2025-06-10 01:11:55 +08:00
xctan
f470bc36be
ggml-cpu : split arch-specific implementations (#13892)
* move ggml-cpu-aarch64 to repack

* split quantize_row_q8_0/1

* split helper functions

* split ggml_vec_dot_q4_0_q8_0

* split ggml_vec_dot_q4_1_q8_1

* split ggml_vec_dot_q5_0_q8_0

* split ggml_vec_dot_q5_1_q8_1

* split ggml_vec_dot_q8_0_q8_0

* split ggml_vec_dot_tq1_0_q8_K

* split ggml_vec_dot_tq2_0_q8_K

* split ggml_vec_dot_q2_K_q8_K

* split ggml_vec_dot_q3_K_q8_K

* split ggml_vec_dot_q4_K_q8_K

* split ggml_vec_dot_q5_K_q8_K

* split ggml_vec_dot_q6_K_q8_K

* split ggml_vec_dot_iq2_xxs_q8_K

* split ggml_vec_dot_iq2_xs_q8_K

* split ggml_vec_dot_iq2_s_q8_K

* split ggml_vec_dot_iq3_xxs_q8_K

* split ggml_vec_dot_iq3_s_q8_K

* split ggml_vec_dot_iq1_s_q8_K

* split ggml_vec_dot_iq1_m_q8_K

* split ggml_vec_dot_iq4_nl_q8_0

* split ggml_vec_dot_iq4_xs_q8_K

* fix typos

* fix missing prototypes

* rename ggml-cpu-quants.c

* rename ggml-cpu-traits

* rename arm folder

* move cpu-feats-x86.cpp

* rename ggml-cpu-hbm

* update arm detection macro in quants.c

* move iq quant tables

* split ggml_quantize_mat_q8_0/K

* split ggml_gemv_*

* split ggml_gemm_*

* rename namespace aarch64 to repack

* use weak aliases to replace test macros

* rename GGML_CPU_AARCH64 to GGML_CPU_REPACK

* rename more aarch64 to repack

* clean up rebase leftover

* fix compilation errors

* remove trailing spaces

* try to fix clang compilation errors

* try to fix clang compilation errors again

* try to fix clang compilation errors, 3rd attempt

* try to fix clang compilation errors, 4th attempt

* try to fix clang compilation errors, 5th attempt

* try to fix clang compilation errors, 6th attempt

* try to fix clang compilation errors, 7th attempt

* try to fix clang compilation errors, 8th attempt

* try to fix clang compilation errors, 9th attempt

* more cleanup

* fix compilation errors

* fix apple targets

* fix a typo in arm version of ggml_vec_dot_q4_K_q8_K

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-09 16:47:13 +02:00
Diego Devesa
8f47e25f56
cuda : fix device sync on buffer clear (#14033) 2025-06-09 16:36:26 +02:00
Georgi Gerganov
201b31dc2e
graph : fix geglu (#14077)
ggml-ci
2025-06-09 17:17:31 +03:00
Xinpeng Dou
e21d2d4ae2
CANN: Simplify the environment variable setting(#13104)
* Simplify the environment variable setting to specify the memory pool type.

* Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options.

* update

* fix CI

* update

* delete whitespace

* fix according to review

* update CANN.md

* update CANN.md
2025-06-09 19:47:39 +08:00
R0CKSTAR
dc0623fddb
webui: fix sidebar being covered by main content (#14082)
* webui: fix sidebar being covered by main content

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* webui: update index.html.gz

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-06-09 12:01:17 +02:00
Georgi Gerganov
87d34b381d
server : fix LRU check (#14079)
ggml-ci
2025-06-09 12:57:58 +03:00
Concedo
8780b33c64 consolidate imports 2025-06-09 17:48:54 +08:00
Nicolò Scipione
b460d16ae8
sycl: Add reorder to Q6_K mmvq implementation (#13885)
* Add Reorder to Q6_K mmvq implementation

* Address PR comments: clean up comments

* Remove unused parameter after refactoring q4_k

* Adding inline to function and removing unnecessary reference to int

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-06-09 11:47:07 +02:00
Concedo
deece4be69 missed a build target 2025-06-09 17:05:56 +08:00
Concedo
68ec00909b updated lite (+1 squashed commits)
Squashed commits:

[375c5768b] updated lite
2025-06-09 16:33:42 +08:00
Đinh Trọng Huy
91a8ee6a6f
add geglu activation function (#14074)
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
2025-06-09 05:15:31 +01:00
Yuanhao Ji
056eb74534
CANN: Enable labeler for Ascend NPU (#13914) 2025-06-09 11:20:06 +08:00
Diego Devesa
247e5c6e44
cuda : fix buffer type check with integrated GPUs (#14069) 2025-06-08 11:39:56 -07:00
Concedo
82d7c53b85 embeddings handle base64 2025-06-09 00:26:40 +08:00
Concedo
7de88802f9 revert padding change for sd chroma 2025-06-08 23:48:46 +08:00
Concedo
1cf7648305 fixed adapter 2025-06-08 23:24:11 +08:00
Concedo
771bd7197b updated lite (+1 squashed commits)
Squashed commits:

[907f10f2f] updated lite
2025-06-08 23:22:26 +08:00
Concedo
6c5c8be48d try to make rocm work for the github ci, requires disabling rocwmma 2025-06-08 21:52:29 +08:00
Concedo
7f57846c2f update bundled vcrts 2025-06-08 19:39:42 +08:00
Concedo
2d4c1aa5a0 chroma support is now usable 2025-06-08 18:53:59 +08:00
Concedo
30cf433ab4 merge base support for chroma, however its not working correctly 2025-06-08 18:06:23 +08:00
Concedo
dcf88d6e78 Revert "make tts use gpu by default. use --ttscpu to disable"
This reverts commit 669f80265b.
2025-06-08 17:08:04 +08:00
Concedo
669f80265b make tts use gpu by default. use --ttscpu to disable 2025-06-08 17:06:19 +08:00
Concedo
7132d6b15c test rocm rolling (+1 squashed commits)
Squashed commits:

[43c8f7fc6] test rocm rolling (+4 squashed commit)

Squashed commit:

[16a60aa77] test clobber 4

[a6c866450] test clobber 3

[9322f17f6] test clobber 2

[b7a420cbe] testing clobber
2025-06-08 15:33:05 +08:00
henk717
5d8f499f03
Remove 32GB of rocm dependencies with this one special trick (#1585)
* One file to remove them all

* That one lib wasn't versioned
2025-06-08 11:16:15 +08:00
Concedo
a80dfa5c10 various minor fixes 2025-06-08 01:11:42 +08:00
吴小白
5787b5da57
ci: add LoongArch cross-compile build (#13944) 2025-06-07 10:39:11 -03:00
Akarshan Biswas
228f34c9ce
SYCL: Implement few same quantized type copy kernels (#13739)
* SYCL: Implement few same quantized type copy kernels

* Use memcpy for copying contiguous tensors

ggml-ci

* feat(sycl): add contiguous tensor copy support and device checks

Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance.

* refactor: replace specific block copy functions with template

The changes replace multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function cpy_blck_q_q. This reduces code duplication by using a generic template that works for any block type, improving maintainability while preserving the same functionality. The template is instantiated with specific block types (e.g., block_q8_0) where needed.

* Exclude BF16 support for COPY tensors for now
ggml-ci

* perf: adjust SYCL copy kernel block sizes for efficiency

Use ceil_div to ensure full element coverage and update nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in copy operations.
2025-06-07 18:58:20 +05:30
Sigbjørn Skjæret
0974ad7a7c
llama : fix llama_model_chat_template with template name (LLM_KV with suffix) (#14050) 2025-06-07 14:13:12 +02:00
Concedo
301450b1eb attempt to use system glslc first before using bundled glslc 2025-06-07 16:54:25 +08:00
Concedo
38ce7e06cc updated readme 2025-06-07 10:23:41 +08:00
Concedo
cfcdfd69bd allow embeddings models to use mmap 2025-06-07 10:14:00 +08:00
Concedo
abc272d89f breaking change: standardize ci binary names 2025-06-07 00:40:46 +08:00