Concedo
03def285db
updated colab
2025-01-23 00:13:55 +08:00
Concedo
0e74db7fd4
fixed another tts bug, clblast selection and quiet mode
2025-01-22 21:36:13 +08:00
kallewoof
1cb9805024
add autoguess adapter for DeepSeek V2.5/R1 ( #1329 )
2025-01-22 20:39:04 +08:00
Concedo
d109d6d8eb
do another patch release for the new deepseek models
2025-01-21 08:24:48 +08:00
Concedo
5329df2bdf
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/server.yml
# CMakeLists.txt
# cmake/build-info.cmake
# examples/run/CMakeLists.txt
# examples/run/run.cpp
# examples/simple-chat/simple-chat.cpp
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-sampling.cpp
2025-01-21 00:25:07 +08:00
Concedo
2c0239fcf2
exploration of alternative wavtokenizer
2025-01-20 23:02:50 +08:00
Georgi Gerganov
9f7add1cde
examples : fix add_special conditions ( #11311 )
2025-01-20 16:36:08 +02:00
Christopher Nielsen
90d987b105
mmap: add include for cerrno ( #11296 )
...
ggml-ci
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-20 16:02:43 +02:00
Michael Podvitskiy
a4251edd6f
cmake: fix shell command quoting in build-info script ( #11309 )
2025-01-20 16:02:15 +02:00
Xuan Son Nguyen
ec7f3ac9ab
llama : add support for Deepseek-R1-Qwen distill model ( #11310 )
...
* llama : add support for Deepseek-R1-Qwen distill model
* coding style
2025-01-20 14:35:07 +01:00
Concedo
02d5bb5b05
allow smaller gguf
2025-01-20 16:20:52 +08:00
Concedo
80965bbdd7
rewritten gguf metadata reader from scratch, analyze works now
2025-01-20 15:57:03 +08:00
Georgi Gerganov
ef6dada60c
cont : fix whitespaces ( #11305 )
2025-01-20 09:29:32 +02:00
Kyle Bruene
ae3c1db2f9
llama : re-add LLM_ARCH_PHIMOE ( #11305 )
...
Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.
2025-01-20 09:21:01 +02:00
Georgi Gerganov
92bc493917
tests : increase timeout when sanitizers are enabled ( #11300 )
...
* tests : increase timeout when sanitizers are enabled
* tests : add DEFAULT_HTTP_TIMEOUT
2025-01-19 20:22:30 +02:00
Georgi Gerganov
b9daaffe02
simple-chat : fix BOS being added to each message ( #11278 )
2025-01-19 18:12:09 +02:00
Concedo
bf4a52383f
change of plans, we can't bundle numpy
2025-01-19 22:53:38 +08:00
Concedo
ff64c3060a
fixed misc lite bugs, tts parsing issues, klite connectivity process
2025-01-19 22:32:01 +08:00
Nicolò Scipione
99487b57d4
SYCL: Introducing memory host pool ( #11251 )
...
* Implement host pool for matrix_info
Creating a new memory pool on the host to store memory location for
matrix_info needed to launch gemm_batch from oneMKL/oneMath.
Removing complex support in gemm_batch since it is not used in llama.cpp
* Remove unnecessary headers and cast
* Reorder member variable to avoid warning on initialization
* Formatting
* Remove unused variable
* Address PR review feedback - remove warning
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-01-19 21:33:34 +08:00
Concedo
57e8c1433b
updated lite
2025-01-19 17:34:15 +08:00
Concedo
5c9714cf40
improve whisper to work on 8-bit and 32-bit wav too, also support form data for language
2025-01-19 16:57:41 +08:00
Concedo
fa7e661133
various fixes
2025-01-18 23:52:39 +08:00
Eric Curtin
a1649cc13f
Adding linenoise.cpp to llama-run ( #11252 )
...
This is a fork of linenoise that is C++17 compatible. I intend on
adding it to llama-run so we can do things like traverse prompt
history via the up and down arrows:
https://github.com/ericcurtin/linenoise.cpp
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-18 14:42:31 +00:00
Georgi Gerganov
4dd34ff831
cmake : add sanitizer flags for llama.cpp ( #11279 )
...
* cmake : add sanitizer flags for llama.cpp
ggml-ci
* tests : fix compile warnings
ggml-ci
* cmake : move sanitizer flags to llama_add_compile_flags
ggml-ci
* cmake : move llama.cpp compile flags to top level lists
ggml-ci
* cmake : apply only sanitizer flags at top level
ggml-ci
* tests : fix gguf context use in same_tensor_data
* gguf-test: tensor data comparison
* dummy : trigger ggml-ci
* unicode : silence gcc warnings
ggml-ci
* ci : use sanitizer builds only in Debug mode
ggml-ci
* cmake : add status messages [no ci]
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-01-18 16:18:15 +02:00
Xuan Son Nguyen
f30f099228
server : implement cancellable request ( #11285 )
...
* server : implement cancellable request
* fix typo
* httplib 0.18.5
* fix i underflow
2025-01-18 14:12:05 +01:00
Georgi Gerganov
f26c874179
scripts : restore hf.sh ( #11288 )
...
ggml-ci
2025-01-18 13:18:32 +02:00
LostRuins Concedo
6390a998bf
tts : add guide tokens support ( #11186 )
...
* Added the ability to use guide tokens for OuteTTS, greatly improving TTS recitation accuracy over long input sequences.
* applied linting suggestions, updated to latest llama_vocab changes, added a safety check, added newline to guide token start
2025-01-18 12:20:57 +02:00
Concedo
e90866fd46
always show tts gen time
2025-01-18 18:16:08 +08:00
Jeff Bolz
44e18ef939
vulkan: fix coopmat2 flash attention for non-contiguous inputs ( #11281 )
...
Add code similar to mul_mm_cm2 to force alignment of strides, to avoid
a performance regression.
Add noncontiguous FA tests in test-backend-ops.
Fixes #11268.
2025-01-18 09:26:50 +01:00
Concedo
65c5c77a16
fixed a tts parsing bug
2025-01-18 10:33:42 +08:00
Concedo
60308ed9dd
fix the ci (+1 squashed commits)
...
Squashed commits:
[b3d85833] fix ci
2025-01-18 01:06:10 +08:00
Concedo
96407502cd
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# examples/llama-bench/llama-bench.cpp
# examples/llama.android/llama/src/main/cpp/llama-android.cpp
# examples/llama.android/llama/src/main/java/android/llama/cpp/LLamaAndroid.kt
# src/llama-vocab.cpp
# tests/test-backend-ops.cpp
2025-01-17 23:13:50 +08:00
codezjx
3edfa7d375
llama.android: add field formatChat to control whether to parse special tokens when send message ( #11270 )
2025-01-17 14:57:56 +02:00
Concedo
e8570de0e6
improved tts default voices quality and sample rate
2025-01-17 18:45:16 +08:00
Radoslav Gerganov
667d72846c
rpc : early register backend devices ( #11262 )
...
Early register RPC devices and do not propagate RPC specifics in the
llama model structures.
ref: #10609
2025-01-17 10:57:09 +02:00
Georgi Gerganov
a133566d34
vocab : fix double-eos check ( #11273 )
...
ggml-ci
2025-01-17 09:28:00 +02:00
David Renshaw
960ec65273
llama : fix deprecation message: vocabable -> vocab ( #11269 )
2025-01-17 08:12:01 +01:00
Concedo
8d961bba29
all outetts 0.3 models working
2025-01-17 14:34:07 +08:00
musoles
7a689c415e
README : added kalavai to infrastructure list ( #11216 )
2025-01-17 01:10:49 +01:00
Jeff Bolz
bd38ddea01
vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl ( #11166 )
...
* vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl
Shaders are based on cpy.cu.
* vulkan: support copy from q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl to f32
* ggml: copy q->f32 assumes some contiguity in the destination
2025-01-16 22:47:10 +01:00
Jeff Bolz
466300fe14
vulkan: optimize coopmat2 q4_k/q5_k dequant functions. ( #11206 )
...
Do masking on whole dwords, fetch all scales at once.
2025-01-16 22:23:49 +01:00
Jeff Bolz
206bc53422
vulkan: optimize coopmat2 q2_k dequant function ( #11130 )
2025-01-16 22:16:39 +01:00
RunningLeon
4dbc8b9cb7
llama : add internlm3 support ( #11233 )
...
* support internlm3
* fix lint
2025-01-16 20:10:38 +02:00
Concedo
828a01d805
wip outetts 0.3
2025-01-17 00:37:09 +08:00
Johannes Gäßler
9c8dcefe17
CUDA: backwards pass for misc. ops, add tests ( #11257 )
...
* CUDA: backwards pass for misc. ops, add tests
* remove restrict from pointers
2025-01-16 16:43:38 +01:00
Concedo
f0383c6f8d
added newline
2025-01-16 22:46:08 +08:00
Concedo
11cd7c7bb0
survived the storm, again
2025-01-16 22:25:18 +08:00
Concedo
2a00ee8fa8
broken commit
2025-01-16 21:41:18 +08:00
Xuan Son Nguyen
681149ced2
llama : add llama_model_load_from_splits ( #11255 )
...
* llama : add `llama_model_load_from_splits`
* update
2025-01-16 13:54:08 +01:00
fj-y-saito
c67cc9837d
ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot ( #11227 )
...
* Add SVE support for q4_K_q8_K
* Update ggml/src/ggml-cpu/ggml-cpu-quants.c
change to use K_SCALE_SIZE
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-16 11:11:49 +02:00