Commit graph

4371 commits

Georgi Gerganov
044ec4b2a5
embedding : add EOS token if not present (#899) 2024-03-14 15:14:14 +02:00
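A minimal sketch of what this change amounts to, assuming the llama.h C API of the time (llama_tokenize output and llama_token_eos); not the exact patch:

```cpp
#include <vector>
#include "llama.h"

// Sketch: append EOS to the tokenized input if it is not already the last
// token, so embeddings are computed over a properly terminated sequence.
static void ensure_eos(std::vector<llama_token> & tokens, const llama_model * model) {
    const llama_token eos = llama_token_eos(model);
    if (tokens.empty() || tokens.back() != eos) {
        tokens.push_back(eos);
    }
}
```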
Georgi Gerganov
77178eedc8
gguf-py : fix dtype check (#6045) 2024-03-14 13:32:14 +02:00
Jian Liao
15a333260a
readme : improve readme for Llava-1.6 example (#6044)
Co-authored-by: Jian Liao <jianliao@adobe.com>
2024-03-14 13:18:23 +02:00
Pierrick Hymbert
43241adf22
server: disable debug release type sanitizer, simplify trigger (#6047)
- increase timeout for server
- do not fail fast
2024-03-14 13:15:39 +02:00
Georgi Gerganov
a44bc969e4
llama : fix typo 2024-03-14 13:13:06 +02:00
Michael Podvitskiy
2c4fb69246
llama : optimize defrag moves + fix fragmentation calculation (#6037)
* attempt to reduce the impact of a worst-case scenario

* fragmentation calculation fix

* Update llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-14 12:56:48 +02:00
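A hedged sketch of the kind of quantity the "fragmentation calculation fix" concerns: fragmentation as the fraction of unused cells inside the occupied span of the KV cache. Names and the exact formula are illustrative, not the patched code:

```cpp
// Illustrative only: fragmentation = share of holes inside the used span.
// A defrag pass is worth triggering once this crosses some threshold.
static float kv_fragmentation(int n_used_cells, int n_span_cells) {
    if (n_span_cells <= 0) {
        return 0.0f;
    }
    return 1.0f - (float) n_used_cells / (float) n_span_cells;
}
```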
Ondřej Čertík
3ca23481dd
gguf-py : add support for I8, I16 and I32 (#6045)
* Refactor dtype handling to be extensible

This code behaves the same as before, but is now structured so that more
NumPy dtypes can be added easily.

* Add support for I8, I16 and I32

These types are allowed in the GGUF specification.

* Add support for I8, I16 and I32 to gguf_writer

* Add support for I8, I16, I32 to gguf_reader
2024-03-14 12:40:14 +02:00
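The gguf-py change is Python, but the shape of the refactor translates directly: replace per-dtype branching with a single lookup table, so supporting a new dtype becomes one new entry. A C++ sketch of that pattern (table abbreviated; the integer-type IDs match the designated values in the next commit, as best as can be read from the log):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// One extensible dtype -> tensor-type table instead of if/else chains.
static const std::unordered_map<std::string, uint32_t> DTYPE_TO_GGML = {
    {"float32", 0},   // GGML_TYPE_F32
    {"float16", 1},   // GGML_TYPE_F16
    {"int8",    24},  // GGML_TYPE_I8
    {"int16",   25},  // GGML_TYPE_I16
    {"int32",   26},  // GGML_TYPE_I32
};
```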
Georgi Gerganov
3fe8d7a17f
ggml : designate enum vals for integer types (#6050) 2024-03-14 12:38:37 +02:00
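The point of designating values: these type IDs are serialized into GGUF files, so they must never shift when earlier enum entries are added or removed. A stripped-down sketch of the pattern, with values as designated at the time:

```cpp
// Explicit values pin the on-disk IDs; inserting a new quant type earlier
// in the enum can no longer renumber the integer types.
enum example_ggml_type : int {
    EXAMPLE_TYPE_F32 = 0,
    EXAMPLE_TYPE_F16 = 1,
    // ... quantized types elided ...
    EXAMPLE_TYPE_I8  = 24,
    EXAMPLE_TYPE_I16 = 25,
    EXAMPLE_TYPE_I32 = 26,
};
```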
Georgi Gerganov
68265ebfc6
embedding : print all resulting embeddings (#899) 2024-03-14 12:37:20 +02:00
Georgi Gerganov
381da2d9f0
metal : build metallib + fix embed path (#6015)
* metal : build metallib + fix embed path

ggml-ci

* metal : fix embed build + update library load logic

ggml-ci

* metal : fix embedded library build

ggml-ci

* ci : fix iOS builds to use embedded library
2024-03-14 11:55:23 +02:00
Georgi Gerganov
0fd6c1f015
embedding : print cosine similarity (#899) 2024-03-14 10:12:29 +02:00
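For reference, the cosine similarity printed here is the usual dot product over the product of norms. A self-contained sketch; the actual helper in the example may differ in details:

```cpp
#include <cmath>
#include <vector>

// cos(a, b) = (a . b) / (|a| * |b|); returns 0 on degenerate input.
static float cosine_similarity(const std::vector<float> & a, const std::vector<float> & b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (size_t i = 0; i < a.size() && i < b.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    const double denom = std::sqrt(na) * std::sqrt(nb);
    return denom > 0.0 ? (float) (dot / denom) : 0.0f;
}
```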
Concedo
f3b7651102 added ignoremissing param 2024-03-14 13:46:42 +08:00
Concedo
ec5dea14d7 merged, try to fix metal build 2024-03-14 11:15:50 +08:00
Linwei Wang
19885d205e
readme : update details about running llama in Termux on Android (#6039) 2024-03-13 20:34:40 +02:00
Georgi Gerganov
76a936c893
readme : update API changes and hot topics 2024-03-13 20:33:56 +02:00
Clint Herron
463628372d
grammar : handle missing "root" node (#6004) 2024-03-13 20:10:40 +02:00
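A hedged sketch of the class of fix: a GBNF grammar is only usable if it defines a "root" rule, so the parser should report its absence instead of dereferencing a missing entry. Types and names here are illustrative, not the actual parser:

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Returns false (with a diagnostic) rather than crashing when the grammar
// text never defined a "root" symbol.
static bool grammar_has_root(const std::map<std::string, uint32_t> & symbol_ids) {
    if (symbol_ids.find("root") == symbol_ids.end()) {
        fprintf(stderr, "grammar error: undefined 'root' rule\n");
        return false;
    }
    return true;
}
```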
slaren
f30ea47a87
llama : add pipeline parallelism support (#6017)
* llama : add pipeline parallelism support for batch processing with multiple CUDA GPUs

ggml-ci

* server : add -ub, --ubatch-size parameter

* fix server embedding test

* llama : fix Mamba inference for pipeline parallelism

Tested to work correctly with both `main` and `parallel` examples.

* llama : limit max batch size to n_batch

* add LLAMA_SCHED_MAX_COPIES to configure the number of input copies for pipeline parallelism
default increased to 4 (from 2)

changing this value may improve performance for some systems, but increases memory usage

* fix hip build

* fix sycl build (disable cpy_tensor_async)

* fix hip build

* llama : limit n_batch and n_ubatch to n_ctx during context creation

* llama : fix norm backend

* batched-bench : sync after decode

* swiftui : sync after decode

* ggml : allow ggml_get_rows to use multiple threads if they are available

* check n_ubatch >= n_tokens with non-causal attention

* llama : do not limit n_batch to n_ctx with non-causal attn

* server : construct batch with size of llama_n_batch

* ggml_backend_cpu_graph_compute : fix return value when alloc fails

* llama : better n_batch and n_ubatch comment

* fix merge

* small fix

* reduce default n_batch to 2048

---------

Co-authored-by: Francis Couture-Harpin <git@compilade.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-13 18:54:21 +01:00
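The n_batch / n_ubatch split described above is the core of the scheme: a logical batch of up to n_batch tokens is decoded as a series of micro-batches of at most n_ubatch tokens, which is what lets multiple GPUs have different micro-batches in flight. A hedged sketch of the splitting loop only, not the scheduler:

```cpp
#include <algorithm>

// Illustrative: walk a logical batch in n_ubatch-sized micro-batches.
// The real scheduler overlaps these across devices; this shows the split.
static void decode_in_ubatches(int n_tokens, int n_ubatch) {
    for (int i = 0; i < n_tokens; i += n_ubatch) {
        const int n_cur = std::min(n_ubatch, n_tokens - i);
        // submit tokens [i, i + n_cur) for decoding ...
        (void) n_cur;
    }
}
```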
slaren
d8fd0ccf6a
test-backend-ops : skip CPU backend by default (#6028) 2024-03-13 15:58:30 +02:00
Concedo
9f102b9db6 update makefile 2024-03-13 21:53:52 +08:00
AidanBeltonS
b3d978600f
Update get version (#6025) 2024-03-13 18:47:54 +05:30
Xuan Son Nguyen
99b71c068f
Server: Use multi-task for embeddings endpoint (#6001)
* use multitask for embd endpoint

* specify types

* remove redundant {"n_predict", 0}
2024-03-13 11:39:11 +01:00
Concedo
7a2de82c96 updated lite 2024-03-13 18:27:19 +08:00
Concedo
a9435163ab fixed uploading non square images 2024-03-13 14:19:51 +08:00
Concedo
85287c7701 handle uploading non square images 2024-03-13 13:57:14 +08:00
Concedo
47c42fd45c fix for mamba processing 2024-03-13 13:27:46 +08:00
Concedo
ba950716a9 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	Package.swift
#	README.md
#	build.zig
#	llama.cpp
#	tests/test-tokenizer-1-bpe.cpp
#	tests/test-tokenizer-1-llama.cpp
2024-03-13 11:21:58 +08:00
slaren
306d34be7a
ci : remove tidy-review (#6021) 2024-03-12 17:55:19 +02:00
Concedo
edb05e761f Update some prints 2024-03-12 21:40:36 +08:00
Concedo
88705cb89a improve quiet mode for SD 2024-03-12 20:50:39 +08:00
Georgi Gerganov
8030da7afe
ggml : reuse quantum structs across backends (#5943)
* ggml : reuse quant blocks across backends

ggml-ci

* ggml : define helper constants only for CUDA and SYCL

ggml-ci

* ggml : define helper quantum constants for SYCL

ggml-ci
2024-03-12 14:27:20 +02:00
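The gist of the reuse: each quantized block layout is defined once in a shared header that every backend includes, instead of being re-declared per backend. As a concrete example, Q4_0's documented layout (one fp16 scale plus 32 4-bit quants) looks roughly like this; the fp16 typedef is a stand-in, since the storage type is backend-dependent:

```cpp
#include <cstdint>

typedef uint16_t ggml_half;    // fp16 storage stand-in

#define QK4_0 32
typedef struct {
    ggml_half d;               // delta / scale
    uint8_t   qs[QK4_0 / 2];   // 32 quants, two 4-bit values per byte
} block_q4_0;
```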
Concedo
60d234550b fix colab 2024-03-12 20:09:49 +08:00
Georgi Gerganov
184215e783
ggml : fix UB in IQ2_S and IQ3_S (#6012) 2024-03-12 13:49:55 +02:00
Concedo
6c6ad93f01 added basic support for password protection (+2 squashed commits)
Squashed commit:

[ff91ca72] added basic support for password protection

[91b0b208] updated docs
2024-03-12 19:47:12 +08:00
Georgi Gerganov
48358b2e5b
sycl : update IQ1_S kernels (WIP - not working!) (#5995)
* sycl : try to fix after IQ1_S changes

* sycl : iq1s_grid -> iq1s_grid_gpu

* sycl : fix grid type
2024-03-12 11:15:05 +02:00
Concedo
a69bc44e7a edit colab (+1 squashed commit)
Squashed commits:

[c7ccb99d] update colab with llava
2024-03-12 15:24:53 +08:00
gliptic
5cdb371731
grammar : fix unnecessarily retained pointer to rules (#6003) 2024-03-11 21:59:03 +02:00
Kawrakow
44ca159faf
1.5 bit: we can do even better (#5999)
* iq1_s: we can do even better

Spent one of the 4 scale bits on the sign of a 0.125 shift.
I.e., quants are now -1 + delta, delta, 1 + delta, where delta
is +/- 0.125.

CUDA works, same performance as before.
PPL(LLaMA-v2-7B) is now 11.85!

* iq1_s: make scalar and AVX2 work with the new version

* iq1_s: make Neon work with the new version.

~10% drop in performance, so this will need some more work.

* iq1_s: make Metal work with new version

* iq1_s: very slightly faster dequantize on Metal

* iq1_s: fix dequantize on the CPU

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-03-11 17:53:15 +02:00
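A worked sketch of the decode rule the commit describes: one of the four scale bits now selects the sign of a 0.125 shift, so each 1.5-bit quant maps to -1+delta, delta, or 1+delta with delta = +/-0.125, scaled by the block scale. A hypothetical helper; the actual kernels differ per backend:

```cpp
// q is the ternary quant in {-1, 0, +1}; shift_up is the repurposed scale bit.
static inline float iq1s_dequant(int q, bool shift_up, float scale) {
    const float delta = shift_up ? 0.125f : -0.125f;
    return scale * ((float) q + delta);
}
```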
Georgi Gerganov
05b06210c9
llama : more consistent names of count variables (#5994)
* llama : more consistent names of count variables

ggml-ci

* llama : n_parallel -> n_seq_max

* common : fix param name

* examples : fix param name
2024-03-11 17:49:47 +02:00
Georgi Gerganov
83796e62bc
llama : refactor unicode stuff (#5992)
* llama : refactor unicode stuff

ggml-ci

* unicode : names

* make : fix c++ compiler

* unicode : names

* unicode : straighten tables

* zig : fix build

* unicode : put nfd normalization behind API

ggml-ci

* swift : fix build

* unicode : add BOM

* unicode : add <cstdint>

ggml-ci

* unicode : pass cpts as const ref
2024-03-11 17:47:47 +02:00
Concedo
6a32c14e86 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	README-sycl.md
#	README.md
#	flake.lock
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	scripts/sync-ggml.sh
#	tests/.gitignore
#	tests/test-backend-ops.cpp
2024-03-11 23:00:47 +08:00
Concedo
9229ea664e if no existing filepath, do not use cwd, use last path instead 2024-03-11 22:19:38 +08:00
Stefan Kapusniak
4dd1c2b81a
Improve launcher file dialog initial paths (#740)
- In the launcher, if an existing value is set for a file value (e.g.
Model), use that file's directory as the initial directory when the
file dialog is opened with 'Browse'.
- In the launcher, always set the initial directory for 'Load' to
cwd.
2024-03-11 22:05:46 +08:00
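The launcher itself is Python, but the rule from this commit plus the follow-up above ("if no existing filepath, do not use cwd, use last path instead") is simple to state: prefer the directory of the file already selected, then fall back to the last-used path. A hypothetical C++ rendering of that rule:

```cpp
#include <filesystem>
#include <string>

// Illustrative helper (the real launcher is Python): pick the initial
// directory for a file dialog from the currently-set file value if any,
// otherwise from the last path used.
static std::string pick_initial_dir(const std::string & current_file, const std::string & last_path) {
    namespace fs = std::filesystem;
    if (!current_file.empty() && fs::exists(current_file)) {
        return fs::path(current_file).parent_path().string();
    }
    return last_path;
}
```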
Concedo
95c8090967 updated lite 2024-03-11 21:59:18 +08:00
Concedo
227f59dab6 added a simple program to do quantization for clip models 2024-03-11 21:50:30 +08:00
Jakub N
828defefb6
Update server docker image URLs (#5997) 2024-03-11 14:40:42 +01:00
Concedo
2dc647f892 updated lite (+1 squashed commit)
Squashed commits:

[f33ea44a] updated lite
2024-03-11 20:10:34 +08:00
Concedo
d59ec68753 added interrogate endpoint (+1 squashed commit)
Squashed commits:

[7bf96261] added interrogate endpoint
2024-03-11 18:50:18 +08:00
Xuan Son Nguyen
caa106d4e0
Server: format error to json (#5961)
* server: format error to json

* server: do not crash on grammar error

* fix api key test case

* revert limit max n_predict

* small fix

* correct coding style

* update completion.js

* launch_slot_with_task

* update docs

* update_slots

* update webui

* update readme
2024-03-11 10:56:41 +01:00
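The server already depends on nlohmann::json, so "format error to json" plausibly amounts to a helper along these lines; the exact field names are an assumption, not lifted from the patch:

```cpp
#include <string>
#include <nlohmann/json.hpp>

// Illustrative: wrap an error into a JSON body sent back to clients
// instead of letting the request crash the slot. Field names assumed.
static nlohmann::json format_error(int code, const std::string & type, const std::string & message) {
    return nlohmann::json {
        {"code",    code},
        {"type",    type},
        {"message", message},
    };
}
```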
Concedo
e4946b96ea support llava with gpt4v openai endpoint 2024-03-11 17:36:10 +08:00
Michael Podvitskiy
3202361c5b
ggml, ci : Windows ARM runner and build fixes (#5979)
* windows arm ci

* fix `error C2078: too many initializers` with ggml_vld1q_u32 macro for MSVC ARM64

* fix `warning C4146: unary minus operator applied to unsigned type, result still unsigned`

* fix `error C2065: '__fp16': undeclared identifier`
2024-03-11 11:28:51 +02:00
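For the C4146 item above, the standard fix pattern is to avoid applying unary minus to an unsigned operand, e.g. by subtracting from zero, which computes the same modular negation without the warning. A sketch of the pattern, not the exact patch:

```cpp
#include <cstdint>

// MSVC warns (C4146) on `-x` when x is unsigned; `0u - x` yields the same
// well-defined modular negation without triggering the warning.
static inline uint32_t neg_u32(uint32_t x) {
    return 0u - x;
}
```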