Commit graph

6714 commits

Author SHA1 Message Date
Concedo
80965bbdd7 rewritten gguf metadata reader from scratch, analyze works now 2025-01-20 15:57:03 +08:00
Concedo
bf4a52383f change of plans, we can't bundle numpy 2025-01-19 22:53:38 +08:00
Concedo
ff64c3060a fixed misc lite bugs, tts parsing issues, klite connectivity process 2025-01-19 22:32:01 +08:00
Concedo
57e8c1433b updated lite 2025-01-19 17:34:15 +08:00
Concedo
5c9714cf40 improve whisper to work on 8 bit and 32bit wav too, also support form data for language 2025-01-19 16:57:41 +08:00
Concedo
fa7e661133 various fixes 2025-01-18 23:52:39 +08:00
Concedo
e90866fd46 always show tts gen time 2025-01-18 18:16:08 +08:00
Concedo
65c5c77a16 fixed a tts parsing bug 2025-01-18 10:33:42 +08:00
Concedo
60308ed9dd fix the ci (+1 squashed commits)
Squashed commits:

[b3d85833] fix ci
2025-01-18 01:06:10 +08:00
Concedo
96407502cd Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	examples/llama-bench/llama-bench.cpp
#	examples/llama.android/llama/src/main/cpp/llama-android.cpp
#	examples/llama.android/llama/src/main/java/android/llama/cpp/LLamaAndroid.kt
#	src/llama-vocab.cpp
#	tests/test-backend-ops.cpp
2025-01-17 23:13:50 +08:00
codezjx
3edfa7d375
llama.android: add field formatChat to control whether to parse special tokens when send message (#11270) 2025-01-17 14:57:56 +02:00
Concedo
e8570de0e6 improved tts default voices quality and sample rate 2025-01-17 18:45:16 +08:00
Radoslav Gerganov
667d72846c
rpc : early register backend devices (#11262)
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.

ref: #10609
2025-01-17 10:57:09 +02:00
Georgi Gerganov
a133566d34
vocab : fix double-eos check (#11273)
ggml-ci
2025-01-17 09:28:00 +02:00
David Renshaw
960ec65273
llama : fix deprecation message: vocabable -> vocab (#11269) 2025-01-17 08:12:01 +01:00
Concedo
8d961bba29 all outetts 0.3 models working 2025-01-17 14:34:07 +08:00
musoles
7a689c415e
README : added kalavai to infrastructure list (#11216) 2025-01-17 01:10:49 +01:00
Jeff Bolz
bd38ddea01
vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166)
* vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl

Shaders are based on cpy.cu.

* vulkan: support copy from q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl to f32

* ggml: copy q->f32 assumes some contiguity in the destination
2025-01-16 22:47:10 +01:00
Jeff Bolz
466300fe14
vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (#11206)
Do masking on whole dwords, fetch all scales at once.
2025-01-16 22:23:49 +01:00
Jeff Bolz
206bc53422
vulkan: optimize coopmat2 q2_k dequant function (#11130) 2025-01-16 22:16:39 +01:00
RunningLeon
4dbc8b9cb7
llama : add internlm3 support (#11233)
* support internlm3

* fix lint
2025-01-16 20:10:38 +02:00
Concedo
828a01d805 wip outetts 0.3 2025-01-17 00:37:09 +08:00
Johannes Gäßler
9c8dcefe17
CUDA: backwards pass for misc. ops, add tests (#11257)
* CUDA: backwards pass for misc. ops, add tests

* remove restrict from pointers
2025-01-16 16:43:38 +01:00
Concedo
f0383c6f8d added newline 2025-01-16 22:46:08 +08:00
Concedo
11cd7c7bb0 survived the storm, again 2025-01-16 22:25:18 +08:00
Concedo
2a00ee8fa8 broken commit 2025-01-16 21:41:18 +08:00
Xuan Son Nguyen
681149ced2
llama : add llama_model_load_from_splits (#11255)
* llama : add `llama_model_load_from_splits`

* update
2025-01-16 13:54:08 +01:00
fj-y-saito
c67cc9837d
ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (#11227)
* Add SVE support for q4_K_q8_K

* Update ggml/src/ggml-cpu/ggml-cpu-quants.c

change to use K_SCALE_SIZE

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-16 11:11:49 +02:00
Concedo
f5cf054335 Merge commit 'c05e8c9934' into concedo_experimental 2025-01-16 15:20:43 +08:00
Concedo
8e3cad1aa2 added audio caching, as a hacky fix for ST TTS bug 2025-01-16 12:04:58 +08:00
Eve
adc5dd92e8
vulkan: scale caching for k quants + misc fixes (#11081)
* q6_k scale caching

* 16 bit unpack

* q4_k test (slow)

* revert it

* q3_k

* q2_k

* little stuff

* try precalculating products of a and q2_k scales

* Revert "try precalculating products of a and q2_k scales"

This reverts commit 65110b81f23f66331a50c6e889a7c1ab9470a86b.

* unpack should be u16, add vim swap to gitignore (about time)

* better q4_k scales

* q5_k

* better q6_k with separate paths for all threads and partial threads in use, plus some more optimizations

* q2_k better dequant

* q3_k optimizations

* q3_k use hmask simd from cpu avx version

* make the caches happy

* q3_k separate out calculation

* q2_k separate out

* little stuff

* use calc_superblock everywhere

* q2_k optimize scale calculation

* more barriers
2025-01-15 19:50:13 +00:00
Georgi Gerganov
f11cfdfd7f
ci : use -no-cnv in gguf-split tests (#11254)
* ci : use -no-cnv in gguf-split tests

ggml-ci

* ci : use -no-cnv in requantize tests

ggml-ci

* scripts : fix [no ci]
2025-01-15 18:28:35 +02:00
Concedo
f8a9634aa2 better xtts and oai speech (+1 squashed commits)
Squashed commits:

[34b9c15f] better xtts and oai speech
2025-01-16 00:26:21 +08:00
Junil Kim
1d8504338e
fix: ggml: fix vulkan-shaders-gen build (#10448)
* fix: ggml: fix vulkan-shaders-gen build

The vulkan-shaders-gen target was not being built correctly
in case of cross-compilation.
Other outputs need to be built for the cross compile target,
but vulkan-shaders-gen needs to be built for the host.

* refactor: ggml: Improve vulkan-shaders-gen toolchain setup

- Add GGML_SHADERS_GEN_TOOLCHAIN CMake option.
- Auto-detect host toolchain if not set.

* refactor: ggml: Improve vulkan-shaders-gen toolchain setup

Use configure_file to generate host_toolchain.cmake from template

* fix: ggml: Fix compile error

Fix compile error not finding vulkan-shaders-gen

* fix: vulkan-shaders-gen build and path handling

Fix build issues with vulkan-shaders-gen:
- Add target dependency for correct build order
- Use CMAKE_HOST_SYSTEM_NAME for executable suffix
- Fix MSVC output directory in host toolchain
- Normalize path handling for cross-compilation

* fix: improve host compiler detection in vulkan shader build

Improve host compiler detection for vulkan shader generation:
- Add NO_CMAKE_FIND_ROOT_PATH to all compiler searches
- Consolidate compiler detection logic
- Fix Windows-specific MSVC detection
- Ensure correct compiler search in cross-compilation

* refactor: Simplify CMake function for detecting host compiler

Simplified the CMake function to improve the process of detecting the host compiler.

* fix: Remove unnecessary Vulkan library linkage in CMakeLists.txt

Since `vulkan-shader-gen.cpp` only requires the `glslc` executable
and not the Vulkan headers or libraries, CMakeLists.txt needs to
be corrected.
(See: ecc93d0558)

* refactor: Rename host_toolchain.cmake.in

- Rename host_toolchain.cmake.in to cmake/host-toolchain.cmake.in

* refactor: GGML_VULKAN_SHADERS_GEN_TOOLCHAIN

Rename the macro GGML_SHADERS_GEN_TOOLCHAIN to GGML_VULKAN_SHADERS_GEN_TOOLCHAIN
2025-01-15 14:17:42 +01:00
Johannes Gäßler
432df2d5f9
RoPE: fix back, CUDA support for back + noncont. (#11240)
* RoPE: fix back, CUDA support for back + noncont.

* fix comments reg. non-cont. RoPE support [no-ci]
2025-01-15 12:51:37 +01:00
Concedo
70ba616ecc browser launch 2025-01-15 17:41:14 +08:00
Daniel Bevenius
0ccd7f3eb2
examples : add embd_to_audio to tts-outetts.py [no ci] (#11235)
This commit contains a suggestion for adding the missing embd_to_audio
function from tts.cpp to tts-outetts.py. This introduces a depencency
numpy which I was not sure if that is acceptable or not (only PyTorch
was mentioned in referened PR).

Also the README has been updated with instructions to run the example
with llama-server and the python script.

Refs: https://github.com/ggerganov/llama.cpp/pull/10784#issuecomment-2548377734
2025-01-15 05:44:38 +01:00
Akarshan Biswas
f446c2cf6a
SYCL: Add gated linear attention kernel (#11175)
* SYCL: Add Gated Linear attention kernel

* glahpp: add a space at the end of file

* gla: Put the barrier inside the main logic loop
2025-01-15 11:20:17 +08:00
Concedo
e07de2ea92 try fix webbrowser again 2025-01-15 00:53:24 +08:00
Concedo
fec3246ca9 make mmap no longer default, archive class.py 2025-01-15 00:38:03 +08:00
Concedo
ed9f7a38ae add some built in voices 2025-01-15 00:17:17 +08:00
Xuan Son Nguyen
b4d92a59a2
ci : add -no-cnv for tests (#11238) 2025-01-14 16:42:23 +02:00
Concedo
0a6ccda203 better fallback browser support 2025-01-14 18:59:17 +08:00
Georgi Gerganov
bbf3e55e35
vocab : add dummy tokens for "no_vocab" type (#11231)
* vocab : add dummy tokens for "no_vocab" type

ggml-ci

* vocab : minor [no ci]
2025-01-14 11:54:58 +01:00
ebraminio
c5bf0d1bd7
server : Improve code snippets direction between RTL text (#11221) 2025-01-14 11:39:33 +01:00
Olivier Chafik
091592d758
Refactor test-chat-template.cpp (#11224)
* Refactor test-chat-template

* Update test-chat-template.cpp
2025-01-14 10:16:41 +00:00
Georgi Gerganov
44d1e796d0
sync : ggml 2025-01-14 10:39:42 +02:00
Georgi Gerganov
a4f3f5d8e6
scripts : sync gguf (cont) 2025-01-14 09:40:52 +02:00
Georgi Gerganov
48e1ae0e61
scripts : sync gguf 2025-01-14 09:36:58 +02:00
Georgi Gerganov
d00a80e89d
scripts : sync opencl 2025-01-14 09:19:58 +02:00