askmyteapot
e2fefc373f
Update CMakeLists.txt - Fix source for ggml-cpu ( #1474 )
...
* Update CMakeLists.txt - Fix source for ggml-cpu
* Fixes std::min
adding compile define NOMINMAX seems to fix the further compile issues
2025-04-10 16:58:12 +08:00
Concedo
8acec907bb
revert stbi image write
2025-04-10 10:43:24 +08:00
Concedo
27f575dc83
inpainting support completed, invert mask added
2025-04-09 23:50:17 +08:00
Concedo
23339ace9b
inpainting works in kcpp!
2025-04-09 23:01:05 +08:00
Concedo
fea3b2bd4a
updated sdcpp prepare for inpaint
...
fixed img2img (+1 squashed commits)
Squashed commits:
[42c48f14] try update sdcpp, feels kind of buggy
2025-04-09 20:26:10 +08:00
Concedo
ebf924c5d1
Merge branch 'upstream' into concedo_experimental
2025-04-08 21:46:30 +08:00
Concedo
26e1653255
fixed templates not setting gpu when swapped with admin mode
2025-04-08 21:45:18 +08:00
Concedo
88660dd59d
merged qwen2.5vl again
2025-04-08 21:32:25 +08:00
Concedo
b99ee451f8
Merge commit '4ccea213bc' into concedo_experimental
...
# Conflicts:
# .devops/cpu.Dockerfile
# .devops/cuda.Dockerfile
# .devops/intel.Dockerfile
# .devops/musa.Dockerfile
# .devops/rocm.Dockerfile
# .github/workflows/bench.yml.disabled
# .github/workflows/build.yml
# .github/workflows/server.yml
# CMakeLists.txt
# build-xcframework.sh
# ci/run.sh
# common/CMakeLists.txt
# examples/llama.android/llama/build.gradle.kts
# examples/perplexity/perplexity.cpp
# examples/run/CMakeLists.txt
# examples/server/tests/README.md
# examples/sycl/win-build-sycl.bat
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/ggml-cpu.c
# licenses/LICENSE-linenoise
# scripts/sync-ggml.last
# tests/CMakeLists.txt
2025-04-08 21:26:23 +08:00
Concedo
822cf2430e
Merge commit 'f1e3eb4249' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# docs/backend/SYCL.md
# examples/llava/clip.cpp
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-vulkan/cmake/host-toolchain.cmake.in
2025-04-08 20:48:53 +08:00
Concedo
c58e9a2be3
revert q2.5vl before merge (+1 squashed commits)
...
Squashed commits:
[3197ea95] Revert "add tentative support for qwen2.5vl vision from HimariO fork"
This reverts commit 911669087a.
2025-04-08 20:38:41 +08:00
Prajwal B Mehendarkar
1d343b4069
arg : Including limits file on AIX ( #12822 )
2025-04-08 14:30:59 +02:00
Concedo
8e23a087e7
updated readme, memory detection prints
2025-04-08 20:23:52 +08:00
characharm
8ca6e1c3a4
server : webui : Improve Chat Input with Auto-Sizing Textarea ( #12785 )
...
* Update ChatScreen.tsx
* useAutosizeTextarea.ts
useAutosizeTextarea to encapsulate the logic.
* Implement responsive auto-sizing chat textarea
Replaces the manual textarea resizing with an automatic height adjustment based on content.
- `useChatTextarea` hook to manage textarea state and auto-sizing logic via refs, preserving the optimization
- Textarea now grows vertically up to a maximum height (`lg:max-h-48`) on large screens (lg breakpoint and up).
- Disables auto-sizing and enables manual vertical resizing (`resize-vertical`) on smaller screens for better mobile usability.
- Aligns the "Send" button to the bottom of the textarea (`items-end`) for consistent positioning during resize.
* -update compressed index.html.gz after npm run build
-refactor: replace OptimizedTextareaValue with AutosizeTextareaApi in VSCode context hook
* chore: normalize line endings to LF
refactor: AutosizeTextareaApi -> chatTextareaApi
* refactor: Rename interface to PascalCase
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-08 11:14:59 +02:00
Neo Zhang Jianyu
656babd6c2
Revert "sycl:remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor" ( #12812 )
...
* Revert "sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_s…"
This reverts commit 518a01480e.
* Update ggml/src/ggml-sycl/ggml-sycl.cpp
* Update ggml/src/ggml-sycl/ggml-sycl.cpp
* rm tail space
2025-04-08 15:03:21 +08:00
compilade
a226bc7a9a
gguf-py : support lazy tensor splitting ( #12809 )
...
* gguf-py : support lazy tensor splitting
Splitting usually involves returning tuples of tensors,
which need to be handled properly to avoid early eager evaluation.
* gguf-py : fix flake8 lint
2025-04-08 09:03:07 +02:00
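The point made in a226bc7a9a, that a split returns a tuple and the tuple itself must not force evaluation, can be illustrated with a minimal sketch. The `Lazy` class and `lazy_split` helper below are hypothetical names written for this illustration and are not gguf-py's actual API. The sketch also assumes the number of parts is known up front and uses plain lists in place of tensors.

```python
class Lazy:
    """Hypothetical deferred value: runs its thunk once, on first use."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._value = None
        self._done = False

    def get(self):
        if not self._done:
            self._value = self._thunk()
            self._done = True
        return self._value


def lazy_split(source, n_parts):
    """Split a lazy list into n_parts lazy chunks without evaluating source.

    The tuple is built eagerly, but every element is itself lazy and all of
    them share one deferred split, so the source is evaluated at most once.
    """

    def split_all():
        data = source.get()
        size = len(data) // n_parts
        return tuple(data[i * size:(i + 1) * size] for i in range(n_parts))

    shared = Lazy(split_all)
    # Bind i as a default argument so each lambda keeps its own index.
    return tuple(Lazy(lambda i=i: shared.get()[i]) for i in range(n_parts))


if __name__ == "__main__":
    source = Lazy(lambda: list(range(6)))
    first, second, third = lazy_split(source, 3)  # nothing is computed yet
    print(second.get())  # the source is evaluated here, on first access
```

Returning a tuple of lazy elements, rather than one lazy tuple, lets callers unpack the result immediately while still deferring the work.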
Xuan-Son Nguyen
1466621e73
llama : Support llama 4 text-only ( #12791 )
...
* llama4 conversion
* initial support, no chat template
* clean up a bit
* fix tokenizer conversion
* correct hparams
* try this
* fix shexp
* ffn_inp_normed
* chat template
* clean up model conversion
* add_bos
* add scale_before_ffn
* fix order
* weight_before_ffn
* llm_graph_input_attn_temp
* add chunk attn mask
* build_inp_attn_scale()
* add comment about ggml_repeat
* clarify comments
* fix build
2025-04-07 23:06:44 +02:00
lhez
82974011f3
opencl: better identify Adreno GPU ( #12760 )
2025-04-07 13:22:54 -07:00
stduhpf
4ccea213bc
hellaswag: display estimated score confidence interval ( #12797 )
2025-04-07 18:47:08 +03:00
Georgi Gerganov
1a1ab7e7a4
cuda : fix HIP and MUSA BF16 ( #0 )
...
ggml-ci
2025-04-07 18:44:17 +03:00
Georgi Gerganov
a4e46e28f9
sync : ggml
...
ggml-ci
2025-04-07 18:44:17 +03:00
Georgi Gerganov
ff067dbcb9
ggml : simplify Arm fp16 CPU logic (ggml/1177)
...
* ggml : simplify Arm fp16 CPU logic
ggml-ci
* cont : bring back CUDA/MUSA checks
ggml-ci
2025-04-07 18:44:17 +03:00
Sigbjørn Skjæret
36ca8b3628
CUDA: don't convert BF16 weights to FP32 (ggml/1174)
...
* add bf16 support
* use convert_from_bf16_cuda instead of convert_unary_cuda for f32
* revert 7ec5085
* move functionality into convert_unary with constexpr
2025-04-07 18:44:17 +03:00
cmdr2
995083e4ed
cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167)
...
* cpu: refactor SIMD mappings and vectorized op functions into separate files
* Fix warning for ggml_float to float
* Fix warnings
* cpu: move all the operations (except mul_mat) to a separate c++ file
* fix whitespace
* Update ggml/src/ggml-cpu/vec.h
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* Fix PR comments - use GGML_UNUSED, use cassert in ops.cpp
* Reverse the order of import for ops.h and vec.h, to match what was present in ggml-cpu.c previously
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-04-07 18:44:17 +03:00
zhouwg
518a01480e
sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor ( #12734 )
2025-04-07 17:22:57 +02:00
Concedo
11c4e7c2c4
automatic memory detection for vulkan
2025-04-07 22:56:12 +08:00
HimariO
b28ad7ecca
fix attn weight scaling after rebase
2025-04-07 22:07:56 +08:00
HimariO
223edef897
remove commented-out code blocks
2025-04-07 21:52:37 +08:00
HimariO
dde96b4774
remove rarely used qwen2vl-cli debug functions
2025-04-07 21:52:37 +08:00
HimariO
c4898d3dee
reuse qwen2vl converter instead
2025-04-07 21:52:37 +08:00
HimariO
8fcf682b28
ignore transformers Qwen2_5_xxx type check
2025-04-07 21:52:37 +08:00
HimariO
fdae70a832
cleaning up
2025-04-07 21:52:37 +08:00
HimariO
c891300c1e
move position id remap out of ggml to avoid int32 cuda operations
2025-04-07 21:52:37 +08:00
HimariO
e18f6a3238
fix a few incorrect tensor memory layouts
2025-04-07 21:52:37 +08:00
HimariO
ecd673f0c5
add debug utils
2025-04-07 21:51:18 +08:00
HimariO
7e5d20852d
add support for Qwen2_5_VLForConditionalGeneration
2025-04-07 21:51:18 +08:00
HimariO
9c827814e6
handle window attention inputs
2025-04-07 21:51:18 +08:00
HimariO
9c7cc6de9c
implement vision model architecture, gguf converter
2025-04-07 21:46:06 +08:00
Concedo
a3f7de7142
fixed outetts docs
2025-04-07 21:31:43 +08:00
Xuan-Son Nguyen
e391d3ee8d
ci : no curl on ggml-ci ( #12796 )
2025-04-07 15:37:28 +03:00
Xuan-Son Nguyen
bd3f59f812
cmake : enable curl by default ( #12761 )
...
* cmake : enable curl by default
* no curl if no examples
* fix build
* fix build-linux-cross
* add windows-setup-curl
* fix
* shell
* fix path
* fix windows-latest-cmake*
* run: include_directories
* LLAMA_RUN_EXTRA_LIBS
* sycl: no llama_curl
* no test-arg-parser on windows
* clarification
* try riscv64 / arm64
* windows: include libcurl inside release binary
* add msg
* fix mac / ios / android build
* will this fix xcode?
* try clearing the cache
* add bunch of licenses
* revert clear cache
* fix xcode
* fix xcode (2)
* fix typo
2025-04-07 13:35:19 +02:00
zhouwg
52b3d71f12
CANN: fix typo in ggml-cann ( #12733 )
2025-04-07 19:34:14 +08:00
hipudding
d0d5b2232b
CANN: Refactor to reduce duplicate code ( #12731 )
...
* CANN: Refactor to reduce duplicate code
* CANN: fix review comment
2025-04-07 17:10:36 +08:00
Concedo
6e42e673c6
attempt to fall back to system glslc
2025-04-07 00:33:52 +08:00
Concedo
5edbacdd0e
fix tools (+3 squashed commit)
...
Squashed commit:
[95a489ee] fix tools build
[1d3d3451] add accelerate
[2837705c] edit a line
2025-04-06 21:30:48 +08:00
R0CKSTAR
916c83bfe7
musa: fix compilation warnings in mp_22/31 ( #12780 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-06 15:23:54 +02:00
Jeff Bolz
0c74b04376
vulkan: fix NaN issue in flash attention shader ( #12776 )
...
Use -FLT_MAX/2 rather than -inf as the initial value for computing the maximum.
2025-04-06 11:03:47 +02:00
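Why the initial value in 0c74b04376 matters can be seen in a small CPU-side sketch of a streaming softmax accumulator. This is not the shader code. The function name and the loop structure are assumptions made for illustration, and the fully masked row is only one plausible way such a NaN can arise: with a running maximum of -inf and scores of -inf, the rescale step computes (-inf) - (-inf), which is NaN.

```python
import math

FLT_MAX = 3.4028234663852886e38  # largest finite float32 value


def streaming_softmax_denominator(scores, init_max):
    """Accumulate sum(exp(s - max)) in one pass, rescaling when the max grows.

    Hypothetical helper for illustration; init_max is the starting value of
    the running maximum.
    """
    running_max = init_max
    running_sum = 0.0
    for s in scores:
        new_max = max(running_max, s)
        # Rescale the previous sum to the new maximum, then add this term.
        running_sum = (running_sum * math.exp(running_max - new_max)
                       + math.exp(s - new_max))
        running_max = new_max
    return running_max, running_sum


if __name__ == "__main__":
    masked_row = [float("-inf")] * 4  # every position masked out
    print(streaming_softmax_denominator(masked_row, float("-inf")))
    print(streaming_softmax_denominator(masked_row, -FLT_MAX / 2))
```

With the finite starting value, the difference of the two maxima is 0 rather than NaN, so the accumulator stays finite even when every score is -inf.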
Jeff Bolz
80b717d493
vulkan: Use unclamped loads for flash attention mask ( #12720 )
...
nem1 must be a multiple of GGML_KQ_MASK_PAD, and GGML_KQ_MASK_PAD is a multiple
of the number of rows in the matrix. The KV dim is a multiple of the number of
columns for the aligned shader.
2025-04-06 10:47:13 +02:00
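The alignment argument in 80b717d493 can be checked with a little arithmetic: if the padded mask height is a multiple of the pad size, and the pad size is a multiple of the tile height, then every tile lies fully inside the buffer and no load needs clamping. The concrete numbers below (`MASK_PAD = 64`, `TILE_ROWS = 16`) and the helper names are assumptions chosen for illustration, not values taken from the source.

```python
MASK_PAD = 64   # assumed stand-in for GGML_KQ_MASK_PAD
TILE_ROWS = 16  # assumed number of rows in the matrix tile


def pad_up(x, n):
    """Round x up to the next multiple of n."""
    return (x + n - 1) // n * n


def tiles_fully_in_bounds(extent, tile):
    """True when tiles of size `tile` cover `extent` exactly.

    In that case the last tile ends on the buffer boundary, so an
    unclamped load cannot read past the end.
    """
    return extent % tile == 0


if __name__ == "__main__":
    for n_rows in (1, 17, 64, 100):
        padded = pad_up(n_rows, MASK_PAD)
        print(n_rows, padded, tiles_fully_in_bounds(padded, TILE_ROWS))
```

The guarantee depends only on divisibility: any pad size that is a multiple of the tile height gives the same result.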
Concedo
11f993ca10
added flag to adjust max request size
2025-04-06 00:13:00 +08:00
0cc4m
6bf28f0111
Vulkan: Tune Vulkan mmq int dot shader for performance ( #12767 )
2025-04-05 18:04:03 +02:00