Commit graph

6645 commits

Author SHA1 Message Date
Concedo
96432965c4 add m_pi define if not found 2025-01-14 00:35:04 +08:00
Concedo
4d92b4e98e updated readme and colab 2025-01-14 00:31:52 +08:00
Concedo
e77d566268 some tweaks and cleanup 2025-01-13 23:50:54 +08:00
Concedo
636beac6d2 added a nicer built in voice 2025-01-13 23:26:54 +08:00
Concedo
62e33d0bf7 added support for seeded tts voices 2025-01-13 19:11:34 +08:00
Concedo
b3de1598e7 Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS
tts is functional (+6 squashed commit)

Squashed commit:

[22396311] wip tts

[3a883027] tts not yet working

[0dcfab0e] fix silly bug

[a378d9ef] some long overdue cleanup

[fc5a6fb5] Wip tts

[39f50497] wip TTS integration
2025-01-13 14:23:25 +08:00
Concedo
12cdcf0abe improved browser opening 2025-01-11 22:53:43 +08:00
Concedo
07173e84a0 add ability to use guide tokens for TTS, ref: https://github.com/ggerganov/llama.cpp/pull/11186 2025-01-11 17:18:21 +08:00
Concedo
bd38665e1f some cleanup before starting on TTS 2025-01-10 22:13:44 +08:00
Concedo
93b2bebc2f add more options for context size 2025-01-10 19:08:42 +08:00
Concedo
b154bd3671 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	docs/build.md
#	docs/development/HOWTO-add-model.md
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
2025-01-10 17:57:38 +08:00
Concedo
0305841dd5 added a gguf file analyzer 2025-01-10 16:27:48 +08:00
0cc4m
c3f9d25706
Vulkan: Fix float16 use on devices without float16 support + fix subgroup_size_control validation error (#11161)
* Vulkan: Remove float16 use in shaders

* Fix validation error about subgroup_size_control extension
2025-01-10 06:39:33 +01:00
Molly Sophia
ee7136c6d1
llama: add support for QRWKV6 model architecture (#11001)
llama: add support for QRWKV6 model architecture (#11001)

* WIP: Add support for RWKV6Qwen2

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV: Some graph simplification

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Add support for RWKV6Qwen2 with cpu and cuda GLA

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix some typos

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* code format changes

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix wkv test & add gla test

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix cuda warning

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update README.md

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update ggml/src/ggml-cuda/gla.cu

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Fix fused lerp weights loading with RWKV6

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* better sanity check skipping for QRWKV6 in llama-quant

thanks @compilade

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: compilade <git@compilade.net>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2025-01-10 09:58:08 +08:00
Akarshan Biswas
c6860cc734
SYCL: Refactor ggml_sycl_compute_forward (#11121)
* SYCL: refactor ggml_sycl_compute_forward

* SYCL: add back GGML_USED(dst) to ggml_sycl_cpy

* SYCL: add function name to noop debug

* SYCL: Some device info print refactoring and add details of XMX availability
2025-01-10 08:13:03 +08:00
Concedo
91b6e29af3 added multilingual support for whisper 2025-01-09 23:28:52 +08:00
Concedo
0cb599546e increase max supported llava images to 8 2025-01-09 22:12:06 +08:00
Tei Home
1204f97270
doc: add cuda guide for fedora (#11135)
Since NVIDIA does not release CUDA for in-maintenance versions of Fedora, the process of setting up the CUDA toolkit on Fedora has become quite involved. This guide should help mere mortals install CUDA for development in a Fedora 39 toolbox environment, without affecting the host system.
2025-01-09 11:32:06 +00:00
Daniel Bevenius
8eceb888d7
server : add tooltips to settings and themes btn (#11154)
* server : add tooltips to settings and themes btn

This commit adds tooltips to the settings and themes buttons in the
webui. The tooltip will be displayed below the actual buttons when
hovered over.

The motivation for this change is to clarify the purpose of the themes
button.

* squash! server : add tooltips to settings and themes btn

This commit adds a tooltip to the '...' button when a chat has been
started. The tooltip is "Chat options" which think could be a good
description as the dropdown contains options to delete or download the
current chat.

* rm tooltip for 3 dots button

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-09 11:28:29 +01:00
Pierrick Hymbert
f8feb4b01a
model: Add support for PhiMoE arch (#11003)
* model: support phimoe

* python linter

* doc: minor

Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>

* doc: minor

Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>

* doc: add phimoe as supported model

ggml-ci

---------

Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
2025-01-09 11:21:41 +01:00
Concedo
d3d7dae82b prevent crash if we ever want to build without coopmat 2025-01-09 17:31:38 +08:00
Georgi Gerganov
be0e950c91
media : remove old img [no ci] 2025-01-09 11:15:15 +02:00
Xuan Son Nguyen
d9feae1c06
llama-chat : add phi 4 template (#11148) 2025-01-09 10:07:33 +01:00
Concedo
8fb7a1c21e updated cmakelists 2025-01-09 16:52:31 +08:00
Concedo
1b49dc305f Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	.github/workflows/editorconfig.yml
#	examples/run/run.cpp
#	examples/server/README.md
#	scripts/sync-ggml.last
2025-01-09 16:50:29 +08:00
Concedo
5cce8a5fbc define coopmat or it will segfault 2025-01-09 16:38:21 +08:00
Concedo
e788b8289a You'll never take us alive
We swore that death will do us part
They'll call our crimes a work of art
2025-01-09 11:27:06 +08:00
hydai
8d59d91171
fix: add missing msg in static_assert (#11143)
Signed-off-by: hydai <z54981220@gmail.com>
2025-01-08 20:03:28 +00:00
Vinesh Janarthanan
8a1d9c25fa
gguf-py : move scripts directory (#11116)
* Moved scripts dir and fixed pyproject.toml

* updated readme

* fixed README urls

* bump pypi gguf to v0.14.0

* retrigger ci

* empty commit - trigger ci
2025-01-08 20:54:58 +02:00
Eric Curtin
1bf839b1e8
Enhance user input handling for llama-run (#11138)
The main motivation for this change is it was not handing
ctrl-c/ctrl-d correctly. Modify `read_user_input` to handle EOF,
"/bye" command, and empty input cases. Introduce `get_user_input`
function to manage user input loop and handle different return
cases.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-08 18:47:05 +00:00
Concedo
dcfa1eca4e Merge commit '017cc5f446' into concedo_experimental
# Conflicts:
#	.github/ISSUE_TEMPLATE/010-bug-compilation.yml
#	.github/ISSUE_TEMPLATE/019-bug-misc.yml
#	CODEOWNERS
#	examples/batched-bench/batched-bench.cpp
#	examples/batched/batched.cpp
#	examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp
#	examples/gritlm/gritlm.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/passkey/passkey.cpp
#	examples/quantize-stats/quantize-stats.cpp
#	examples/run/run.cpp
#	examples/simple-chat/simple-chat.cpp
#	examples/simple/simple.cpp
#	examples/tokenize/tokenize.cpp
#	ggml/CMakeLists.txt
#	ggml/src/ggml-metal/CMakeLists.txt
#	ggml/src/ggml-vulkan/CMakeLists.txt
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-autorelease.cpp
#	tests/test-model-load-cancel.cpp
#	tests/test-tokenizer-0.cpp
#	tests/test-tokenizer-1-bpe.cpp
#	tests/test-tokenizer-1-spm.cpp
2025-01-08 23:15:21 +08:00
Xuan Son Nguyen
f7cd13301c
ci : use actions from ggml-org (#11140) 2025-01-08 16:09:20 +01:00
Xuan Son Nguyen
4d2b3d8804
lora : improve compat with mergekit-extract-lora (#11131)
* (wip) support mergekit-extracted lora

* support mergekit-extract-lora

* use lora->get_scale

* correct comment

* correct norm name & condition

* add some hints
2025-01-08 15:59:53 +01:00
Concedo
3732bb2686 taesd now supports flux and sd3 2025-01-08 22:35:50 +08:00
Georgi Gerganov
c07d437bbd
llama : avoid hardcoded QK_K (#11061)
ggml-ci
2025-01-08 16:19:36 +02:00
Georgi Gerganov
99a3755a3c
sync : ggml 2025-01-08 13:40:30 +02:00
Radoslav Gerganov
c792dcf488
ggml : allow loading backend with env variable (ggml/1059)
ref: #1058
2025-01-08 13:40:18 +02:00
Xuan Son Nguyen
80ccf5d725
ci : pin dependency to specific version (#11137)
* ci : pin dependency to specific version

* will this fix ec?
2025-01-08 12:07:20 +01:00
Georgi Gerganov
a3c1232c3f
arg : option to exclude arguments from specific examples (#11136)
* arg : option to exclude arguments from specific examples

ggml-ci

* readme : remove old args [no ci]
2025-01-08 12:55:36 +02:00
amritahs-ibm
8cef75c743
llamafile : ppc64le MMA INT8 implementation (#10912)
This change upstreams llamafile's cpu matrix
multiplication kernels for ppc64le using MMA
builtins for quantised int8 datatype.

This change results in 10% - 70% improvement
in total speed(ie all tokens/total time), across
various batch sizes.

The patch is tested with Meta-Lllama-3-8B,
Mistral-7B, Llama-2-7B-chat-hf models on a
IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2025-01-08 12:54:19 +02:00
Georgi Gerganov
0d52a69e4b
ci : fix cmake option (#11125) 2025-01-08 11:29:34 +02:00
Mathieu Baudier
02f0430141
Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (#11117)
* Disable GL_KHR_cooperative_matrix Vulkan extension if not available.

* Perform Vulkan extensions checks in a more sensible order

* Remove unnecessary #ifdef directive
2025-01-08 09:18:13 +01:00
ag2s20150909
bec2183f2c
fix: Vulkan shader gen binary path when Cross-compiling (#11096)
* fix: Vulkan shader gen binary path when cross compiling
2025-01-08 09:17:29 +01:00
Concedo
c73d99ccac updated lite 2025-01-08 13:35:59 +08:00
Concedo
568e476997 added toggle for vae tiling, use custom memory buffer 2025-01-08 13:12:03 +08:00
Johannes Gäßler
53ff6b9b9f
GGUF: C++ refactor, backend support, misc fixes (#11030)
* GGUF: C++ refactor, backend support, misc fixes

remove ggml_tensor.backend

update CODEOWNERS [no ci]

remove gguf_get_data from API

revise GGUF API data types
2025-01-07 18:01:58 +01:00
Diego Devesa
017cc5f446
ggml-backend : only offload from host buffers (fix) (#11124) 2025-01-07 16:11:57 +01:00
Concedo
d752846116 fixed ask save file 2025-01-07 22:11:15 +08:00
Concedo
bb2e739627 fixed simplercflags 2025-01-07 21:34:38 +08:00
Diego Devesa
a3d50bc022
ggml-backend : only offload from host buffers (#11120) 2025-01-07 12:38:05 +01:00