8263fd7bdb  askmyteapot  2023-08-23 22:15:48 +08:00
    Update llama_v3.cpp (#393)
    Fixing C2065 compiler error.
    Missed '3' on 3 separate identifiers (kB > kB3, MB > MB3)

af170fc2db  Concedo  2023-08-23 17:08:09 +08:00
    Merge branch 'master' into concedo_experimental
    # Conflicts:
    #   README.md
    #   llama.cpp
    #   scripts/sync-ggml.sh
    #   tests/test-tokenizer-0.cpp

981c9131f0  Concedo  2023-08-23 16:07:07 +08:00
    gguf for llama is working

39cc83e8c9  Concedo  2023-08-22 23:12:47 +08:00
    incomplete merge, compiles but generates rubbish

a07e6dd3ad  Concedo  2023-08-09 22:36:41 +08:00
    revert cuda changes as they are bugggy

18bb0ab127  Concedo  2023-08-04 21:47:17 +08:00
    up ver, support 16k ctx

ba2040d1df  Concedo  2023-08-03 12:52:06 +08:00
    compile fix for ARM NEON

34e60be41a  Concedo  2023-08-03 10:36:14 +08:00
    compile fix

c58ffc92e5  Concedo  2023-08-01 18:28:49 +08:00
    fixed compile error

45456fa6ca  Concedo  2023-07-30 16:47:33 +08:00
    switch noavx2 to not use openblas, as it has incompatible instructions

2807d98fd4  Concedo  2023-07-22 22:57:56 +08:00
    touchup (+2 squashed commit)
    Squashed commit:
    [8b06458] fixed broken param order
    [7eabdc0] very broken, do not use

374fffb9c6  Concedo  2023-07-19 00:54:41 +08:00
    Reworking rope WIP

523fc3be52  Concedo  2023-07-10 20:05:53 +08:00
    fixed rwkv, standardized new ctx usage

2827920044  Concedo  2023-07-10 18:23:25 +08:00
    fix compile errors, rwkv not working

27a0907cfa  Concedo  2023-07-06 22:33:46 +08:00
    backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas

ca9a11697c  Concedo  2023-07-04 00:35:02 +08:00
    possibly slower, but cannot use larger batches without modifying ggml library.

bfeb3471d7  Concedo  2023-07-03 21:36:42 +08:00
    fix typos

3d2907d208  Concedo  2023-07-02 18:28:09 +08:00
    make gptneox and gptj work with extended context too

ef3b8dc0d9  Concedo  2023-07-02 00:41:46 +08:00
    GPU accel for rwkv is slow, disable it

e1a7042943  Concedo  2023-07-02 00:10:56 +08:00
    try out the new rwkv but it seems worse, may revert

86469d15c4  Concedo  2023-06-30 12:40:08 +08:00
    fix for yr-rocm, large gpu scratch

86b061b98c  Concedo  2023-06-29 18:35:31 +08:00
    wip on unified cublas integration, add all the small libraries but exclude the large ones

c2f1ed6556  Concedo  2023-06-29 17:54:12 +08:00
    fix compile errors

b4698abafc  Concedo  2023-06-28 18:20:46 +08:00
    Wip, CUDA porting malloc improvements, gpu accel for non-llama, backport old quants

9527a783ea  Concedo  2023-06-27 19:44:33 +08:00
    fix rope inplace

8342fe81b1  Concedo  2023-06-24 12:58:49 +08:00
    revert the wstring tokenization. coherency was affected

0485fa65a2  Concedo  2023-06-24 11:43:42 +08:00
    wstring convert for mpt

490cf395f8  Concedo  2023-06-23 22:51:51 +08:00
    better alloc error

f39a746089  Concedo  2023-06-23 22:45:22 +08:00
    bug fixes for openblas

43c2891afa  Concedo  2023-06-23 19:01:36 +08:00
    option to not use scratch

d5e4cf7ffe  Concedo  2023-06-23 19:01:15 +08:00
    handle ctx manip

df9135e3a9  Concedo  2023-06-23 18:41:23 +08:00
    fixing memory bugs

e6ddb15c3a  Concedo  2023-06-22 10:38:27 +08:00
    cleanup

1b71752a9f  Concedo  2023-06-22 00:43:25 +08:00
    Implemented basic GPU offloading for MPT, GPT-2, GPT-J and GPT-NeoX

dfdd20240c  Concedo  2023-06-21 16:10:31 +08:00
    gpt j use scratch buffers

8e2dc19dc6  Concedo  2023-06-19 21:29:06 +08:00
    updated tokenizer, added support for scratch buffers for neox and gpt2

3ed3e7b7e2  Concedo  2023-06-14 20:03:14 +08:00
    reverted sequence mode for rwkv due to multiple issues with speed loss with bigger quantized models

871009dfab  Concedo  2023-06-13 20:06:19 +08:00
    integrated world tokenizer for RWKV

860fb026df  Concedo  2023-06-12 23:04:40 +08:00
    rwkv compile fix (+1 squashed commits)
    Squashed commits:
    [8b0ebb1] upgraded rwkv + added memory overheads + added state_out bufs

c44b9c3ecf  Concedo  2023-06-11 23:23:24 +08:00
    added the llama_v2 cuda back (+2 squashed commit)
    Squashed commit:
    [1c97fd4] Revert "fix for cublas"
    This reverts commit 994be9a4db.
    [fce03c3] Revert "fix for cublas"
    This reverts commit 33528f5b1d.

a6a0fa338a  Concedo  2023-06-08 22:40:53 +08:00
    cleanup indentation, fixing cublas build

c046db5197  Concedo  2023-06-06 22:38:25 +08:00
    lite bugfixes, buffer size changes, fixed a topk bug.

9270056269  Concedo  2023-06-05 11:48:04 +08:00
    fixed compile error in cmake VS

9aa2d8535b  Concedo  2023-06-04 21:47:17 +08:00
    hide gpu input box when dropdown not selected, minor memory fix for neox and gptj

20803c221e  Concedo  2023-06-04 11:05:46 +08:00
    cleaning up some old junk

b62279cb39  Concedo  2023-06-04 00:41:08 +08:00
    buf size for starcoder still not good

c1b293d31a  Concedo  2023-06-03 18:37:13 +08:00
    fixed MPT ooms

6f82e17b7a  Concedo  2023-06-03 16:14:08 +08:00
    added MPT support

234270bd83  Concedo  2023-06-01 00:14:22 +08:00
    back to 32 block size, not better

446e42a8c6  Concedo  2023-05-31 21:40:12 +08:00
    change dmmv block size