koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-12 01:54:37 +00:00

Author	SHA1	Message	Date
Concedo	c61fa9155d	handle oversized images by downscaling	2024-08-26 13:58:18 +08:00
Concedo	6acbf1d7f4	macos default to full offload when using gpulayers auto (-1)	2024-08-26 12:12:51 +08:00
Herman Semenov	93bc3839f9	common: fixed not working find argument --n-gpu-layers-draft (#9175)	2024-08-26 00:54:37 +02:00
Johannes Gäßler	f91fc5639b	CUDA: fix Gemma 2 numerical issues for FA (#9166)	2024-08-25 22:11:48 +02:00
Concedo	97aa8648ed	allow launching with no models loaded	2024-08-25 23:57:32 +08:00
Concedo	efb8be013e	fixed swagger	2024-08-25 23:29:55 +08:00
Concedo	7bc87e1f0f	added llava letterboxing feature	2024-08-25 23:15:38 +08:00
Johannes Gäßler	e11bd856d5	CPU/CUDA: Gemma 2 FlashAttention support (#8542) * CPU/CUDA: Gemma 2 FlashAttention support * apply logit_softcap to scale in kernel * disable logit softcapping tests on Metal * remove metal check	2024-08-24 21:34:59 +02:00
João Dinis Ferreira	8f824ffe8e	quantize : fix typo in usage help of `quantize.cpp` (#9145)	2024-08-24 09:22:45 +03:00
Xuan Son Nguyen	3ba780e2a8	lora : fix llama conversion script with ROPE_FREQS (#9117)	2024-08-23 12:58:53 +02:00
piDack	a07c32ea54	llama : use F32 precision in GLM4 attention and no FA (#9130)	2024-08-23 10:27:17 +03:00
Concedo	cca3c4c78b	xtc fixes	2024-08-22 23:18:46 +08:00
Akarshan Biswas	11b84eb457	[SYCL] Add a space to supress a cmake warning (#9133)	2024-08-22 22:09:47 +08:00
luoyu-intel	1731d4238f	[SYCL] Add oneDNN primitive support (#9091) * add onednn * add sycl_f16 * add dnnl stream * add engine map * use dnnl for intel only * use fp16fp16fp16 * update doc	2024-08-22 12:50:10 +08:00
compilade	a1631e53f6	llama : simplify Mamba with advanced batch splits (#8526) * llama : advanced batch splits This includes equal-sequence-length batch splits which are useful to simplify recurrent model operators. * llama : always make recurrent state slots contiguous * ggml : simplify mamba operators * llama : fix integer signedness mixing * llama : logits_all has priority over batch->logits Otherwise, the server embeddings tests failed. This was likely an existing problem but was only detected here because of an additional assertion. * llama : apply suggestions Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * llama : fix t5 segfault * llama : fix Mamba session save and restore * llama : minor cosmetic changes * llama : rename llama_reorder_outputs to llama_output_reorder Also move it closer to llama_output_reserve. * llama : fix pooled embeddings when using batches with equal_seqs * minor : add struct members for clarity ggml-ci * llama : fix T5 segfault again * llama : fix Mamba pooled embeddings with multiple sequences Until the pooled embeddings are refactored to allow splitting across ubatches for causal embeddings, recurrent models can only process a single sequence per ubatch when calculating pooled embeddings. * llama : add llama_model_is_recurrent to simplify figuring that out This will make it easier to more cleanly support RWKV-v6 and Mamba-2. * llama : fix simple splits when the batch contains embeddings --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-08-21 17:58:11 -04:00
Concedo	0b96097439	add version number into help page	2024-08-22 00:52:30 +08:00
Concedo	fc2545dc83	fixed a typo	2024-08-22 00:25:56 +08:00
Concedo	5bf527a6ae	added xtc sampler	2024-08-21 23:57:15 +08:00
Concedo	1a7ecd55e6	timing for init step, clip for vulkan	2024-08-21 18:14:53 +08:00
Concedo	6200b6d64e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .gitignore # README.md # docs/build.md # flake.lock # tests/test-backend-ops.cpp # tests/test-grammar-integration.cpp	2024-08-21 17:17:36 +08:00
Xuan Son Nguyen	fc54ef0d1c	server : support reading arguments from environment variables (#9105) * server : support reading arguments from environment variables * add -fa and -dt * readme : specify non-arg env var	2024-08-21 11:04:34 +02:00
Concedo	cd69ab218e	fixed DRY	2024-08-21 17:01:28 +08:00
Younes Belkada	b40eb84895	llama : support for `falcon-mamba` architecture (#9074) * feat: initial support for llama.cpp * fix: lint * refactor: better refactor * Update src/llama.cpp Co-authored-by: compilade <git@compilade.net> * Update src/llama.cpp Co-authored-by: compilade <git@compilade.net> * fix: address comments * Update convert_hf_to_gguf.py Co-authored-by: compilade <git@compilade.net> * fix: add more cleanup and harmonization * fix: lint * Update gguf-py/gguf/gguf_writer.py Co-authored-by: compilade <git@compilade.net> * fix: change name * Apply suggestions from code review Co-authored-by: compilade <git@compilade.net> * add in operator * fix: add `dt_b_c_rms` in `llm_load_print_meta` * fix: correct printf format for bool * fix: correct print format * Update src/llama.cpp Co-authored-by: compilade <git@compilade.net> * llama : quantize more Mamba tensors * llama : use f16 as the fallback of fallback quant types --------- Co-authored-by: compilade <git@compilade.net>	2024-08-21 11:06:36 +03:00
fairydreaming	f63f603c87	llava : zero-initialize clip_ctx structure fields with aggregate initialization 908) Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>	2024-08-21 09:45:49 +02:00
Daniel Bevenius	8455340b87	llama : std::move llm_bigram_bpe from work_queue (#9062) * llama : std::move llm_bigram_bpe from work_queue This commit updates the retrieval of llm_bigram_bpe objects from work_queue.top() by using std::move. The motivation for this is to avoid the copying of the std::string `text` member of the llm_bigram_bpe struct. * squash! llama : std::move llm_bigram_bpe from work_queue Introduced a MovablePriorityQueue class to allow moving elements out of the priority queue for llm_bigram_bpe. * squash! llama : std::move llm_bigram_bpe from work_queue Rename MovablePriorityQueue to lama_priority_queue. * squash! llama : std::move llm_bigram_bpe from work_queue Rename lama_priority_queue -> llama_priority_queue.	2024-08-21 10:32:58 +03:00
Changyeon Kim	2f3c1466ff	llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. (#8984) * llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. - The CLIP model now prioritizes the Vulkan backend over the CPU when vulkan available. - A GGML_OP_ACC shader has been added. - The encoding performance of the CLIP model improved from 4.2s on the CPU to 0.9s on the GPU. Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com> * fix-up coding style. Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com> * Fix-up the missing initial parameter to resolve the compilation warning. Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com> * [fix] Add missing parameters. Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com> * [fix] Use nb1 and nb2 for dst. Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com> * Fix check results ggml_acc call --------- Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com> Co-authored-by: 0cc4m <picard12@live.de>	2024-08-20 21:00:00 +02:00
Concedo	2cf6d16c40	adjust sleep time	2024-08-21 01:06:41 +08:00
Concedo	6a4becb731	dry is still buggy because token indexes are wrong	2024-08-21 00:59:26 +08:00
Meng, Hengyu	50addec9a5	[SYCL] fallback mmvq (#9088) * fallback mmvq to mul_mat * mmvq in cuda path * Update ggml/src/ggml-sycl.cpp Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com> --------- Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com>	2024-08-20 23:50:17 +08:00
zhentaoyu	4f8d19ff17	[SYCL] Fix SYCL `im2col` and `convert` Overflow with Large Dims (#9052) * sycl: fix im2col overflow and sync with cuda Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * sycl: fix convert overflow Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * sycl: fix convert and dequantize Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * sycl: fix ib in dmmv Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * sycl:refine convert Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * sycl: move downsample global_range into common Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * test: add im2col and convert test cases Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * test: make new cases only in sycl Signed-off-by: zhentaoyu <zhentao.yu@intel.com> * test: comment new test_cases for only local testing Signed-off-by: zhentaoyu <zhentao.yu@intel.com> --------- Signed-off-by: zhentaoyu <zhentao.yu@intel.com>	2024-08-20 23:06:51 +08:00
Concedo	db6ef8d1e1	revert dry state reset	2024-08-20 22:22:21 +08:00
Concedo	c1ae350e5b	fixed race condition when generating	2024-08-20 20:17:55 +08:00
Concedo	7ee359a59b	on multigpu setups, pick lowest free mem instead of highest for auto layers	2024-08-20 19:02:16 +08:00
fairydreaming	90db8146d5	tests : add missing comma in grammar integration tests (#9099) Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>	2024-08-20 12:09:55 +03:00
wangshuai09	cfac111e2b	cann: add doc for cann backend (#8867) Co-authored-by: xuedinge233 <damow890@gmail.com> Co-authored-by: hipudding <huafengchun@gmail.com>	2024-08-19 16:46:38 +08:00
Radoslav Gerganov	1b6ff90ff8	rpc : print error message when failed to connect endpoint (#9042)	2024-08-19 10:11:45 +03:00
Radoslav Gerganov	18eaf29f4c	rpc : prevent crashes on invalid input (#9040) Add more checks which prevent RPC server from crashing if invalid input is received from client	2024-08-19 10:10:21 +03:00
Concedo	3bd70d75ea	fix segfault, kcpp is now debuggable	2024-08-19 13:50:49 +08:00
Concedo	1fbf21eec4	Revert "fix out of bounds access" This reverts commit `3ac183633a`.	2024-08-19 13:06:39 +08:00
Concedo	3ac183633a	fix out of bounds access	2024-08-19 00:30:54 +08:00
Georgi Gerganov	554b049068	flake.lock: Update (#9068)	2024-08-18 07:43:32 -07:00
Concedo	04166d20a4	better quant clip	2024-08-18 22:15:59 +08:00
Concedo	b3b00750b7	update lite	2024-08-18 18:23:21 +08:00
ltoniazzi	2339a0be1c	tests : add integration test for lora adapters (#8957) * Add printing to check weights match torch version * minor code style changes --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2024-08-18 11:58:04 +02:00
Concedo	e9eb6fe51a	move chat compl to models tab	2024-08-18 14:56:10 +08:00
Concedo	314a620e96	added readme for macos	2024-08-18 13:11:49 +08:00
Concedo	06476b8247	Merge branch 'upstream' into concedo_experimental	2024-08-18 12:11:14 +08:00
Concedo	98dff80b9c	update lite	2024-08-18 12:00:06 +08:00
Concedo	e2e6d892b4	fix declaration order	2024-08-18 02:15:34 +08:00
Concedo	d71b5477c5	update lite, cleanup, fix interrogate format	2024-08-18 00:48:53 +08:00

6004 commits