koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-22 03:10:03 +00:00

Author	SHA1	Message	Date
Concedo	03def285db	updated colab	2025-01-23 00:13:55 +08:00
Concedo	0e74db7fd4	fixed another tts bug, clblast selection and quiet mode	2025-01-22 21:36:13 +08:00
kallewoof	1cb9805024	add autoguess adapter for DeepSeek V2.5/R1 (#1329)	2025-01-22 20:39:04 +08:00
Concedo	d109d6d8eb	do another patch release for the new deepseek models	2025-01-21 08:24:48 +08:00
Concedo	5329df2bdf	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/server.yml # CMakeLists.txt # cmake/build-info.cmake # examples/run/CMakeLists.txt # examples/run/run.cpp # examples/simple-chat/simple-chat.cpp # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-sampling.cpp	2025-01-21 00:25:07 +08:00
Concedo	2c0239fcf2	exploration of alternative wavtokenizer	2025-01-20 23:02:50 +08:00
Georgi Gerganov	9f7add1cde	examples : fix add_special conditions (#11311)	2025-01-20 16:36:08 +02:00
Christopher Nielsen	90d987b105	mmap: add include for cerrno (#11296) ggml-ci Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-01-20 16:02:43 +02:00
Michael Podvitskiy	a4251edd6f	cmake: fix shell command quoting in build-info script (#11309)	2025-01-20 16:02:15 +02:00
Xuan Son Nguyen	ec7f3ac9ab	llama : add support for Deepseek-R1-Qwen distill model (#11310) * llama : add support for Deepseek-R1-Qwen distill model * coding style	2025-01-20 14:35:07 +01:00
Concedo	02d5bb5b05	allow smaller gguf	2025-01-20 16:20:52 +08:00
Concedo	80965bbdd7	rewritten gguf metadata reader from scratch, analyze works now	2025-01-20 15:57:03 +08:00
Georgi Gerganov	ef6dada60c	cont : fix whitespaces (#11305)	2025-01-20 09:29:32 +02:00
Kyle Bruene	ae3c1db2f9	llama : re-add LLM_ARCH_PHIMOE (#11305) Phi 3.5 MoE was partially removed during a refactor. The code was originally in llama.cpp and should be in llama-model.cpp after the refactor.	2025-01-20 09:21:01 +02:00
Georgi Gerganov	92bc493917	tests : increase timeout when sanitizers are enabled (#11300) * tests : increase timeout when sanitizers are enabled * tests : add DEFAULT_HTTP_TIMEOUT	2025-01-19 20:22:30 +02:00
Georgi Gerganov	b9daaffe02	simple-chat : fix BOS being added to each message (#11278)	2025-01-19 18:12:09 +02:00
Concedo	bf4a52383f	change of plans, we can't bundle numpy	2025-01-19 22:53:38 +08:00
Concedo	ff64c3060a	fixed misc lite bugs, tts parsing issues, klite connectivity process	2025-01-19 22:32:01 +08:00
Nicolò Scipione	99487b57d4	SYCL: Introducing memory host pool (#11251) * Implement host pool for matrix_info Creating a new memory pool on the host to store memory location for matrix_info needed to launch gemm_batch from oneMKL/oneMath. Removing complex support in gemm_batch since it is not used in llama.cpp * Remove unnecessary headers and cast * Reorder member variable to avoid warning on initialization * Formatting * Remove unused variable * Address PR review feedback - remove warning --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com>	2025-01-19 21:33:34 +08:00
Concedo	57e8c1433b	updated lite	2025-01-19 17:34:15 +08:00
Concedo	5c9714cf40	improve whisper to work on 8 bit and 32bit wav too, also support form data for language	2025-01-19 16:57:41 +08:00
Concedo	fa7e661133	various fixes	2025-01-18 23:52:39 +08:00
Eric Curtin	a1649cc13f	Adding linenoise.cpp to llama-run (#11252) This is a fork of linenoise that is C++17 compatible. I intend on adding it to llama-run so we can do things like traverse prompt history via the up and down arrows: https://github.com/ericcurtin/linenoise.cpp Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-01-18 14:42:31 +00:00
Georgi Gerganov	4dd34ff831	cmake : add sanitizer flags for llama.cpp (#11279) * cmake : add sanitizer flags for llama.cpp ggml-ci * tests : fix compile warnings ggml-ci * cmake : move sanitizer flags to llama_add_compile_flags ggml-ci * cmake : move llama.cpp compile flags to top level lists ggml-ci * cmake : apply only sanitizer flags at top level ggml-ci * tests : fix gguf context use in same_tensor_data * gguf-test: tensor data comparison * dummy : trigger ggml-ci * unicode : silence gcc warnings ggml-ci * ci : use sanitizer builds only in Debug mode ggml-ci * cmake : add status messages [no ci] --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-01-18 16:18:15 +02:00
Xuan Son Nguyen	f30f099228	server : implement cancellable request (#11285) * server : implement cancellable request * fix typo * httplib 0.18.5 * fix i underflow	2025-01-18 14:12:05 +01:00
Georgi Gerganov	f26c874179	scripts : restore hf.sh (#11288) ggml-ci	2025-01-18 13:18:32 +02:00
LostRuins Concedo	6390a998bf	tts : add guide tokens support (#11186) * Added the ability to use guide tokens for OuteTTS, greatly improving TTS recitation accuracy over long input sequences. * applied linting suggestions, updated to latest llama_vocab changes, added a safety check, added newline to guide token start	2025-01-18 12:20:57 +02:00
Concedo	e90866fd46	always show tts gen time	2025-01-18 18:16:08 +08:00
Jeff Bolz	44e18ef939	vulkan: fix coopmat2 flash attention for non-contiguous inputs (#11281) Add code similar to mul_mm_cm2 to force alignment of strides, to avoid a performance regression. Add noncontiguous FA tests in test-backend-ops. Fixes #11268.	2025-01-18 09:26:50 +01:00
Concedo	65c5c77a16	fixed a tts parsing bug	2025-01-18 10:33:42 +08:00
Concedo	60308ed9dd	fix the ci (+1 squashed commits) Squashed commits: [b3d85833] fix ci	2025-01-18 01:06:10 +08:00
Concedo	96407502cd	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # examples/llama-bench/llama-bench.cpp # examples/llama.android/llama/src/main/cpp/llama-android.cpp # examples/llama.android/llama/src/main/java/android/llama/cpp/LLamaAndroid.kt # src/llama-vocab.cpp # tests/test-backend-ops.cpp	2025-01-17 23:13:50 +08:00
codezjx	3edfa7d375	llama.android: add field formatChat to control whether to parse special tokens when send message (#11270)	2025-01-17 14:57:56 +02:00
Concedo	e8570de0e6	improved tts default voices quality and sample rate	2025-01-17 18:45:16 +08:00
Radoslav Gerganov	667d72846c	rpc : early register backend devices (#11262) Early register RPC devices and do not propagate RPC specifics in the llama model structures. ref: #10609	2025-01-17 10:57:09 +02:00
Georgi Gerganov	a133566d34	vocab : fix double-eos check (#11273) ggml-ci	2025-01-17 09:28:00 +02:00
David Renshaw	960ec65273	llama : fix deprecation message: vocabable -> vocab (#11269)	2025-01-17 08:12:01 +01:00
Concedo	8d961bba29	all outetts 0.3 models working	2025-01-17 14:34:07 +08:00
musoles	7a689c415e	README : added kalavai to infrastructure list (#11216)	2025-01-17 01:10:49 +01:00
Jeff Bolz	bd38ddea01	vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166) * vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl Shaders are based on cpy.cu. * vulkan: support copy from q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl to f32 * ggml: copy q->f32 assumes some contiguity in the destination	2025-01-16 22:47:10 +01:00
Jeff Bolz	466300fe14	vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (#11206) Do masking on whole dwords, fetch all scales at once.	2025-01-16 22:23:49 +01:00
Jeff Bolz	206bc53422	vulkan: optimize coopmat2 q2_k dequant function (#11130)	2025-01-16 22:16:39 +01:00
RunningLeon	4dbc8b9cb7	llama : add internlm3 support (#11233) * support internlm3 * fix lint	2025-01-16 20:10:38 +02:00
Concedo	828a01d805	wip outetts 0.3	2025-01-17 00:37:09 +08:00
Johannes Gäßler	9c8dcefe17	CUDA: backwards pass for misc. ops, add tests (#11257) * CUDA: backwards pass for misc. ops, add tests * remove restrict from pointers	2025-01-16 16:43:38 +01:00
Concedo	f0383c6f8d	added newline	2025-01-16 22:46:08 +08:00
Concedo	11cd7c7bb0	survived the storm, again	2025-01-16 22:25:18 +08:00
Concedo	2a00ee8fa8	broken commit	2025-01-16 21:41:18 +08:00
Xuan Son Nguyen	681149ced2	llama : add `llama_model_load_from_splits` (#11255) * llama : add `llama_model_load_from_splits` * update	2025-01-16 13:54:08 +01:00
fj-y-saito	c67cc9837d	ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (#11227) * Add SVE support for q4_K_q8_K * Update ggml/src/ggml-cpu/ggml-cpu-quants.c change to use K_SCALE_SIZE Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-16 11:11:49 +02:00

6736 commits