Commit graph

7526 commits

Author SHA1 Message Date
askmyteapot
e2fefc373f
Update CMakeLists.txt - Fix source for ggml-cpu (#1474)
* Update CMakeLists.txt - Fix source for ggml-cpu

* Fixes std::min

Adding the compile definition NOMINMAX seems to fix the remaining compile issues
2025-04-10 16:58:12 +08:00
Concedo
8acec907bb revert stbi image write 2025-04-10 10:43:24 +08:00
Concedo
27f575dc83 inpainting support completed, invert mask added 2025-04-09 23:50:17 +08:00
Concedo
23339ace9b inpainting works in kcpp! 2025-04-09 23:01:05 +08:00
Concedo
fea3b2bd4a updated sdcpp prepare for inpaint
fixed img2img (+1 squashed commits)

Squashed commits:

[42c48f14] try update sdcpp, feels kind of buggy
2025-04-09 20:26:10 +08:00
Concedo
ebf924c5d1 Merge branch 'upstream' into concedo_experimental 2025-04-08 21:46:30 +08:00
Concedo
26e1653255 fixed templates not setting gpu when swapped with admin mode 2025-04-08 21:45:18 +08:00
Concedo
88660dd59d merged qwen2.5vl again 2025-04-08 21:32:25 +08:00
Concedo
b99ee451f8 Merge commit '4ccea213bc' into concedo_experimental
# Conflicts:
#	.devops/cpu.Dockerfile
#	.devops/cuda.Dockerfile
#	.devops/intel.Dockerfile
#	.devops/musa.Dockerfile
#	.devops/rocm.Dockerfile
#	.github/workflows/bench.yml.disabled
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	build-xcframework.sh
#	ci/run.sh
#	common/CMakeLists.txt
#	examples/llama.android/llama/build.gradle.kts
#	examples/perplexity/perplexity.cpp
#	examples/run/CMakeLists.txt
#	examples/server/tests/README.md
#	examples/sycl/win-build-sycl.bat
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-cpu/ggml-cpu.c
#	licenses/LICENSE-linenoise
#	scripts/sync-ggml.last
#	tests/CMakeLists.txt
2025-04-08 21:26:23 +08:00
Concedo
822cf2430e Merge commit 'f1e3eb4249' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	README.md
#	docs/backend/SYCL.md
#	examples/llava/clip.cpp
#	ggml/src/ggml-sycl/CMakeLists.txt
#	ggml/src/ggml-vulkan/cmake/host-toolchain.cmake.in
2025-04-08 20:48:53 +08:00
Concedo
c58e9a2be3 revert q2.5vl before merge (+1 squashed commits)
Squashed commits:

[3197ea95] Revert "add tentative support for qwen2.5vl vision from HimariO fork"

This reverts commit 911669087a.
2025-04-08 20:38:41 +08:00
Prajwal B Mehendarkar
1d343b4069
arg : Including limits file on AIX (#12822) 2025-04-08 14:30:59 +02:00
Concedo
8e23a087e7 updated readme, memory detection prints 2025-04-08 20:23:52 +08:00
characharm
8ca6e1c3a4
server : webui : Improve Chat Input with Auto-Sizing Textarea (#12785)
* Update ChatScreen.tsx

* useAutosizeTextarea.ts

useAutosizeTextarea to encapsulate the logic.

* Implement responsive auto-sizing chat textarea

Replaces the manual textarea resizing with an automatic height adjustment based on content.

- `useChatTextarea` hook to manage textarea state and auto-sizing logic via refs, preserving the optimization
- Textarea now grows vertically up to a maximum height (`lg:max-h-48`) on large screens (lg breakpoint and up).
- Disables auto-sizing and enables manual vertical resizing (`resize-vertical`) on smaller screens for better mobile usability.
- Aligns the "Send" button to the bottom of the textarea (`items-end`) for consistent positioning during resize.

* -update compressed index.html.gz after npm run build
-refactor: replace OptimizedTextareaValue with AutosizeTextareaApi in VSCode context hook

* chore: normalize line endings to LF
refactor: AutosizeTextareaApi -> chatTextareaApi

* refactor: Rename interface to PascalCase

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-08 11:14:59 +02:00
Neo Zhang Jianyu
656babd6c2
Revert "sycl:remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor" (#12812)
* Revert "sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_s…"

This reverts commit 518a01480e.

* Update ggml/src/ggml-sycl/ggml-sycl.cpp

* Update ggml/src/ggml-sycl/ggml-sycl.cpp

* rm tail space
2025-04-08 15:03:21 +08:00
compilade
a226bc7a9a
gguf-py : support lazy tensor splitting (#12809)
* gguf-py : support lazy tensor splitting

Splitting usually involves returning tuples of tensors,
which need to be handled properly to avoid early eager evaluation.

* gguf-py : fix flake8 lint
2025-04-08 09:03:07 +02:00
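
Note on a226bc7a9a: the point about tuples can be illustrated with a minimal sketch. This is not gguf-py's actual implementation or API. `Lazy`, `lazy_split` and `load` are hypothetical names, and plain Python lists stand in for tensors. The idea shown is that a split should hand back a tuple of deferred handles, so that nothing is loaded until one of the parts is actually needed.

```python
from typing import Callable, Generic, List, Tuple, TypeVar

T = TypeVar("T")


class Lazy(Generic[T]):
    """Hypothetical deferred value: runs its thunk once, on first access."""

    def __init__(self, thunk: Callable[[], T]) -> None:
        self._thunk = thunk
        self._done = False
        self._value = None

    def get(self) -> T:
        if not self._done:
            self._value = self._thunk()
            self._done = True
        return self._value


def lazy_split(source: "Lazy[List[int]]", parts: int) -> Tuple["Lazy[List[int]]", ...]:
    """Split a deferred list into `parts` equal deferred chunks."""

    def split_all() -> Tuple[List[int], ...]:
        data = source.get()  # the only place the source is evaluated
        size = len(data) // parts
        return tuple(data[i * size:(i + 1) * size] for i in range(parts))

    shared = Lazy(split_all)  # every chunk reuses one cached split
    # Bind i as a default argument so each handle picks its own chunk.
    return tuple(Lazy(lambda i=i: shared.get()[i]) for i in range(parts))


loads: List[str] = []  # records how many times the source was loaded


def load() -> List[int]:
    loads.append("loaded")
    return list(range(6))


first, second = lazy_split(Lazy(load), 2)
nothing_loaded_yet = (loads == [])  # unpacking the tuple triggered no load
second_half = second.get()          # first real access evaluates the source
first_half = first.get()            # reuses the cached split
```

Unpacking the returned tuple does not touch the source; only the first `get()` does, and later parts reuse the cached result.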
Xuan-Son Nguyen
1466621e73
llama : Support llama 4 text-only (#12791)
* llama4 conversion

* initial support, no chat template

* clean up a bit

* fix tokenizer conversion

* correct hparams

* try this

* fix shexp

* ffn_inp_normed

* chat template

* clean up model conversion

* add_bos

* add scale_before_ffn

* fix order

* weight_before_ffn

* llm_graph_input_attn_temp

* add chunk attn mask

* build_inp_attn_scale()

* add comment about ggml_repeat

* clarify comments

* fix build
2025-04-07 23:06:44 +02:00
lhez
82974011f3
opencl: better identify Adreno GPU (#12760) 2025-04-07 13:22:54 -07:00
stduhpf
4ccea213bc
hellaswag: display estimated score confidence interval (#12797) 2025-04-07 18:47:08 +03:00
Georgi Gerganov
1a1ab7e7a4 cuda : fix HIP and MUSA BF16 (#0)
ggml-ci
2025-04-07 18:44:17 +03:00
Georgi Gerganov
a4e46e28f9 sync : ggml
ggml-ci
2025-04-07 18:44:17 +03:00
Georgi Gerganov
ff067dbcb9 ggml : simplify Arm fp16 CPU logic (ggml/1177)
* ggml : simplify Arm fp16 CPU logic

ggml-ci

* cont : bring back CUDA/MUSA checks

ggml-ci
2025-04-07 18:44:17 +03:00
Sigbjørn Skjæret
36ca8b3628 CUDA: don't convert BF16 weights to FP32 (ggml/1174)
* add bf16 support

* use convert_from_bf16_cuda instead of convert_unary_cuda for f32

* revert 7ec5085

* move functionality into convert_unary with constexpr
2025-04-07 18:44:17 +03:00
cmdr2
995083e4ed cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167)
* cpu: refactor SIMD mappings and vectorized op functions into separate files

* Fix warning for ggml_float to float

* Fix warnings

* cpu: move all the operations (except mul_mat) to a separate c++ file

* fix whitespace

* Update ggml/src/ggml-cpu/vec.h

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* Fix PR comments - use GGML_UNUSED, use cassert in ops.cpp

* Reverse the order of import for ops.h and vec.h, to match what was present in ggml-cpu.c previously

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-04-07 18:44:17 +03:00
zhouwg
518a01480e
sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor (#12734) 2025-04-07 17:22:57 +02:00
Concedo
11c4e7c2c4 automatic memory detection for vulkan 2025-04-07 22:56:12 +08:00
HimariO
b28ad7ecca fix attn weight scaling after rebase 2025-04-07 22:07:56 +08:00
HimariO
223edef897 remove commented-out code blocks 2025-04-07 21:52:37 +08:00
HimariO
dde96b4774 remove rarely used qwen2vl-cli debug functions 2025-04-07 21:52:37 +08:00
HimariO
c4898d3dee reuse qwen2vl converter instead 2025-04-07 21:52:37 +08:00
HimariO
8fcf682b28 ignore transformers Qwen2_5_xxx type check 2025-04-07 21:52:37 +08:00
HimariO
fdae70a832 cleaning up 2025-04-07 21:52:37 +08:00
HimariO
c891300c1e move position id remap out of ggml to avoid int32 cuda operations 2025-04-07 21:52:37 +08:00
HimariO
e18f6a3238 fix a few incorrect tensor memory layouts 2025-04-07 21:52:37 +08:00
HimariO
ecd673f0c5 add debug utils 2025-04-07 21:51:18 +08:00
HimariO
7e5d20852d add support for Qwen2_5_VLForConditionalGeneration 2025-04-07 21:51:18 +08:00
HimariO
9c827814e6 handle window attention inputs 2025-04-07 21:51:18 +08:00
HimariO
9c7cc6de9c implement vision model architecture, gguf converter 2025-04-07 21:46:06 +08:00
Concedo
a3f7de7142 fixed outetts docs 2025-04-07 21:31:43 +08:00
Xuan-Son Nguyen
e391d3ee8d
ci : no curl on ggml-ci (#12796) 2025-04-07 15:37:28 +03:00
Xuan-Son Nguyen
bd3f59f812
cmake : enable curl by default (#12761)
* cmake : enable curl by default

* no curl if no examples

* fix build

* fix build-linux-cross

* add windows-setup-curl

* fix

* shell

* fix path

* fix windows-latest-cmake*

* run: include_directories

* LLAMA_RUN_EXTRA_LIBS

* sycl: no llama_curl

* no test-arg-parser on windows

* clarification

* try riscv64 / arm64

* windows: include libcurl inside release binary

* add msg

* fix mac / ios / android build

* will this fix xcode?

* try clearing the cache

* add bunch of licenses

* revert clear cache

* fix xcode

* fix xcode (2)

* fix typo
2025-04-07 13:35:19 +02:00
zhouwg
52b3d71f12
CANN: fix typo in ggml-cann (#12733) 2025-04-07 19:34:14 +08:00
hipudding
d0d5b2232b
CANN: Refactor to reduce duplicate code (#12731)
* CANN: Refactor to reduce duplicate code

* CANN: fix review comment
2025-04-07 17:10:36 +08:00
Concedo
6e42e673c6 attempt to fall back to system glslc 2025-04-07 00:33:52 +08:00
Concedo
5edbacdd0e fix tools (+3 squashed commit)
Squashed commit:

[95a489ee] fix tools build

[1d3d3451] add accelerate

[2837705c] edit a line
2025-04-06 21:30:48 +08:00
R0CKSTAR
916c83bfe7
musa: fix compilation warnings in mp_22/31 (#12780)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-06 15:23:54 +02:00
Jeff Bolz
0c74b04376
vulkan: fix NaN issue in flash attention shader (#12776)
Use -FLT_MAX/2 rather than -inf as the initial value for computing the maximum.
2025-04-06 11:03:47 +02:00
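
Note on 0c74b04376: a small sketch of why the initial value matters, assuming the shader keeps a running maximum and a rescaled running sum in the usual streaming-softmax style. That structure is an assumption here, and this is plain Python rather than the actual GLSL. If the running maximum starts at -inf and the first block of scores is fully masked (all -inf), the rescale factor becomes exp(-inf - (-inf)) = exp(NaN), and the NaN then sticks. A large finite starting value avoids the -inf minus -inf subtraction.

```python
import math

FLT_MAX = 3.4028234663852886e38  # largest finite float32 value
NEG_INF = float("-inf")


def softmax_denominator(blocks, initial_max):
    """Streaming softmax denominator over blocks of scores (illustrative only)."""
    running_max = initial_max
    running_sum = 0.0
    for block in blocks:
        new_max = max(running_max, max(block))
        # Rescale what was accumulated so far to the new reference maximum.
        rescale = math.exp(running_max - new_max)
        running_sum = running_sum * rescale + sum(math.exp(s - new_max) for s in block)
        running_max = new_max
    return running_sum


# First block is fully masked, second block holds real scores.
blocks = [[NEG_INF, NEG_INF], [0.0, 1.0]]

with_inf = softmax_denominator(blocks, NEG_INF)            # NaN poisons the sum
with_finite = softmax_denominator(blocks, -FLT_MAX / 2)    # stays finite
```

With the finite start, the masked block contributes exp(-inf) = 0 and the result is the ordinary denominator for the unmasked scores.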
Jeff Bolz
80b717d493
vulkan: Use unclamped loads for flash attention mask (#12720)
nem1 must be a multiple of GGML_KQ_MASK_PAD, and GGML_KQ_MASK_PAD is a multiple
of the number of rows in the matrix. The KV dim is a multiple of the number of
columns for the aligned shader.
2025-04-06 10:47:13 +02:00
Concedo
11f993ca10 added flag to adjust max request size 2025-04-06 00:13:00 +08:00
0cc4m
6bf28f0111
Vulkan: Tune Vulkan mmq int dot shader for performance (#12767) 2025-04-05 18:04:03 +02:00