Concedo
fe12b1cbd4
fixed lora, now works quanted too
2025-04-14 23:44:42 +08:00
Concedo
ad2522b319
str splitter
2025-04-14 23:05:36 +08:00
Akarshan Biswas
75afa0ae31
SYCL: Fix im2col ( #12910 )
...
* SYCL: Fix im2col
* restore local workgroup size adjustments for large inputs
* restore format
2025-04-14 14:23:53 +02:00
Radoslav Gerganov
c772d54926
rpc : use ggml_context_ptr ( #12938 )
2025-04-14 13:59:34 +03:00
Neo Zhang Jianyu
81c7e64fc2
disable curl lib check, this action was missed by commit bd3f59f812 ( #12761 ) ( #12937 )
2025-04-14 18:19:07 +08:00
Concedo
6bc2ca4803
added more sanity checks on zenity
2025-04-14 15:06:08 +08:00
Concedo
ffa0bc21e6
workaround for rwkv
2025-04-14 14:46:08 +08:00
Georgi Gerganov
526739b879
sync : ggml
...
ggml-ci
2025-04-14 09:26:15 +03:00
cmdr2
a25355e264
cpu: fix cpu backend's supports-op for GET_ROWS_BACK. fixes a fatal error when running test-backend-ops with only the CPU backend (ggml/1190)
2025-04-14 09:26:15 +03:00
Concedo
3d31d75c8f
clamp and display detected GPU memory
2025-04-14 14:19:23 +08:00
SXX
e959d32b1c
ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register ( #12773 )
...
* ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register
* simplifies the codebase by removing redundant functions
2025-04-14 08:47:55 +03:00
Concedo
e1ee857b1e
allow vulkan to be packaged without coopmat for noavx2
2025-04-14 12:40:00 +08:00
Alan Gray
307bfa253d
ggml: disable CUDA graphs for unsupported DUP and CONT node types ( #12891 )
...
Fixes #12798
2025-04-13 23:12:21 +02:00
Ed Addario
71e90e8813
quantize: Handle user-defined quantization levels for additional tensors ( #12511 )
...
* Add llama_model_quantize_params parameters
* Add new quantize parameters parsing and validation
* Update usage
* Add new parameters defaults
* Add new quantization parameters logic
* Implement general --tensor-type instead of tensor-specific command option
* Fix implied type bug
* Restore missing #includes
* Add regex capability for tensor selection
* Refactor function name and update ALLOWED_TENSOR_TYPE
* Add missing #include
* Handle edge case when tensor name is cls.output
* Minor logging improvement
2025-04-13 21:29:28 +03:00
Concedo
e0aa7aa4d9
updated sdui
2025-04-13 22:37:30 +08:00
Concedo
2d0b7e37f9
fix build
2025-04-13 22:01:48 +08:00
Concedo
895d008c5f
the bloke has retired for a year, it's time to let go
2025-04-13 17:00:00 +08:00
Prajwal B Mehendarkar
bc091a4dc5
common : Define cache directory on AIX ( #12915 )
2025-04-12 17:33:39 +02:00
Concedo
a6149ad0fc
fixed g3 adapter back
2025-04-12 23:17:54 +08:00
Concedo
9f94f62768
fixed segfault
2025-04-12 19:08:27 +08:00
Concedo
7b4254bef9
not working on cpu
2025-04-12 18:55:29 +08:00
Concedo
c94aec1930
update workflows, update gemma default adapter sysprompt
2025-04-12 18:38:23 +08:00
Concedo
956ed89595
fixed build
2025-04-12 17:06:55 +08:00
Jeff Bolz
a4837577aa
vulkan: use aligned loads for flash attention mask ( #12853 )
...
Rewrite the stride logic for the mask tensor in the FA shader to force the
stride to be aligned, to allow using more efficient loads.
2025-04-12 10:44:48 +02:00
Concedo
6302709fbb
discourage but dont prevent vulkan FA (it's occasionally still useful)
2025-04-12 16:23:52 +08:00
Concedo
b42fa821d8
try allow build from commit hash
2025-04-12 13:37:10 +08:00
Matt Clayton
e59ea539b8
llava: Fix cpu-only clip image encoding segfault ( #12907 )
...
* llava: Fix cpu-only clip image encoding
* clip : no smart ptr for ggml_backend_t
* Fix for backend_ptr push_back
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-12 07:29:03 +02:00
Concedo
5908f2ca19
based on occam and henky advice, disabled flash attention entirely on vulkan.
2025-04-12 12:30:48 +08:00
Concedo
7a7bdeab6d
json to gbnf endpoint added
2025-04-12 11:41:11 +08:00
Concedo
7e1289ade8
fixes for sdcpp
2025-04-12 10:08:23 +08:00
Concedo
a0ae187563
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/docker.yml
# README.md
# build-xcframework.sh
# examples/llava/CMakeLists.txt
# examples/llava/clip.cpp
# examples/rpc/rpc-server.cpp
# examples/run/run.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
2025-04-12 10:06:47 +08:00
Concedo
efef14bb82
added llama4 tags
2025-04-12 08:58:04 +08:00
Concedo
ea9bd61e47
Merge commit '64eda5deb9' into concedo_experimental
...
# Conflicts:
# .devops/cuda.Dockerfile
# .devops/intel.Dockerfile
# .devops/llama-cli-cann.Dockerfile
# .devops/musa.Dockerfile
# .devops/rocm.Dockerfile
# .devops/vulkan.Dockerfile
# .github/workflows/build.yml
# .github/workflows/docker.yml
# README.md
# docs/backend/SYCL.md
# examples/llava/clip.cpp
# examples/server_embd.py
# ggml/src/ggml-cann/acl_tensor.cpp
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/ggml-cann.cpp
# src/CMakeLists.txt
# tests/test-chat-template.cpp
2025-04-12 08:31:22 +08:00
Georgi Gerganov
c94085df28
server : add VSCode's GitHub Copilot Chat support ( #12896 )
...
* server : add VSCode's GitHub Copilot Chat support
* cont : update handler name
2025-04-11 23:37:41 +03:00
yuri@FreeBSD
e8a62631b3
rpc : Set cache directory in rpc-server.cpp on FreeBSD ( #12903 )
2025-04-11 22:04:14 +02:00
Olivier Chafik
b6930ebc42
tool-call : fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 templates ( #12900 )
...
* `tool-call`: don't call common_chat_params_init_hermes_2_pro when there aren't tools (or when there's a schema)
* test all chat formats w/o tools
2025-04-11 21:47:52 +02:00
yuri@FreeBSD
68b08f36d0
common : Define cache directory on FreeBSD ( #12892 )
2025-04-11 21:45:44 +02:00
Concedo
a56cc72bd0
added handling for remembering file paths, added gui option to disable zenity in GUI
2025-04-12 00:42:26 +08:00
henk717
f6b7fea979
zentk - folder select workaround ( #1478 )
...
* zentk - folder select workaround
* kcppt extension fix
2025-04-11 22:37:07 +08:00
Ewan Crawford
578754b315
sycl: Support sycl_ext_oneapi_limited_graph ( #12873 )
...
The current usage of the SYCL-Graph extension checks for
the `sycl_ext_oneapi_graph` device aspect. However, it is also
possible to support `sycl_ext_oneapi_limited_graph` devices that
don't support update
2025-04-11 15:32:14 +02:00
tastelikefeet
b2034c2b55
contrib: support modelscope community ( #12664 )
...
* support download from modelscope
* support login
* remove comments
* add arguments
* fix code
* fix win32
* test passed
* fix readme
* revert readme
* change to MODEL_ENDPOINT
* revert tail line
* fix readme
* refactor model endpoint
* remove blank line
* fix header
* fix as comments
* update comment
* update readme
---------
Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc.com>
2025-04-11 14:01:56 +02:00
henk717
8fd70f37bd
Zentk integration (Zenity/yad support) ( #1475 )
...
* Zentk integration (Zenity/yad support)
* Escape incompatible dependencies in zentk
* Properly clean env
2025-04-11 18:23:23 +08:00
Yuxuan Zhang
06bb53ad9b
llama-model : add Glm4Model implementation for GLM-4-0414 ( #12867 )
...
* GLM-4-0414
* use original one
* Using with tensor map
* fix bug
* change order
* change order
* format with flake8
2025-04-11 12:10:10 +02:00
Xuan-Son Nguyen
0c50923944
clip : use smart pointer ( ⚠️ breaking change) ( #12869 )
...
* clip : use smart pointers
* fix warmup
* add forward declaration
* missing include
* fix include (2)
* composite
* simplify batch ptr
* fix conflict
2025-04-11 12:09:39 +02:00
Akarshan Biswas
fccf9cae83
SYCL: Add fp16 type support to unary op kernels ( #12788 )
...
* SYCL: Add fp16 support to some elementwise OP kernels
* remove comment
ggml-ci
* Use static_cast directly
* remove not needed cast from tanh
* Use static cast and remove unneeded castings
* Adjust device_support_op for unary OPs
* Use cast_data and typed_data struct to deduplicate casting code
2025-04-11 16:03:50 +08:00
Daniel Han
ec6c09d0fa
convert : Llama4 RoPE fix ( #12889 )
2025-04-11 09:49:09 +02:00
R0CKSTAR
8ac9f5d765
ci : Replace freediskspace with free_disk_space in docker.yml ( #12861 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-11 09:26:17 +02:00
Daniel Bevenius
12e9158f25
xcf : add check for visionos build version ( #12854 )
...
This commit adds a check for the visionos build version used with vtool
in build-xcframework.sh. The script now checks the Xcode version and
determines whether to use "xros" or "visionos" for the build version.
This commit also uses xcrun for the vtool so that the version of vtool
in xcode command line tools is used instead of the one in the system
path.
Refs: https://github.com/ggml-org/whisper.cpp/pull/2994#issuecomment-2773292223
2025-04-11 09:24:34 +02:00
Xuan-Son Nguyen
5b1f13cb64
convert : proper tensor name mapping for llama4 ( #12870 )
...
* Llama-4 mapping
* remove hacky renaming
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-04-11 09:23:37 +02:00
Xuan-Son Nguyen
8b91d5355a
llama : correct rms norm for llama 4 ( #12882 )
2025-04-11 08:49:50 +02:00