koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-10 17:14:36 +00:00

Author	SHA1	Message	Date
Concedo	886f4eed79	updated lite, up ver, remove bell	2023-08-10 22:01:33 +08:00
Concedo	6659652c9f	lower actual temp used when temp=0	2023-08-07 11:05:06 +08:00
Concedo	bcfdd0e662	fixed bbs -1 and allow bbs = 2048	2023-08-06 17:47:05 +08:00
Concedo	18bb0ab127	up ver, support 16k ctx	2023-08-04 21:47:17 +08:00
Concedo	46682e5cb3	added mmq launch flag	2023-08-01 17:57:13 +08:00
Concedo	e221843147	trying out mmq Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # README.md	2023-07-31 22:51:15 +08:00
Concedo	c7136f03d9	added support for tensor_split parameter as an advanced parameter.	2023-07-24 17:16:19 +08:00
Concedo	280abaf029	added stop reason in the perf endpoint	2023-07-24 11:55:35 +08:00
Concedo	910744e2c0	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # README.md # flake.nix # llama.cpp	2023-07-23 22:37:38 +08:00
Ycros	56995caa48	Fix mirostatv2. (#338)	2023-07-23 09:52:03 +08:00
Concedo	39dc1a46c4	added token count, updated lite	2023-07-20 14:41:06 +08:00
Concedo	e9467f5a44	auto rope scale adjustments, added sched yield fix for apple, adjust warning for mirostat	2023-07-19 16:44:44 +08:00
Concedo	374fffb9c6	Reworking rope WIP	2023-07-19 00:54:41 +08:00
Concedo	a286776435	updated lite	2023-07-11 21:48:01 +08:00
Concedo	1d1111e10f	expose timing info in web api	2023-07-11 18:56:06 +08:00
Concedo	7222877069	Merge remote-tracking branch 'ren/concedo' into concedo_experimental	2023-07-11 18:45:36 +08:00
Concedo	4be167915a	added linear rope option, added warning for bad samplers	2023-07-11 18:08:19 +08:00
Concedo	2827920044	fix compile errors, rwkv not working	2023-07-10 18:23:25 +08:00
callMeMakerRen	4e46673f80	Merge branch 'LostRuins:concedo' into concedo	2023-07-08 09:33:26 +08:00
shutup	1727e652f1	expose some useful info that can be used in statistics of performence	2023-07-07 11:52:58 +08:00
Concedo	8424a35c62	added the ability to ban any substring tokens	2023-07-06 23:24:21 +08:00
Concedo	27a0907cfa	backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas	2023-07-06 22:33:46 +08:00
Concedo	fff705d4f6	Merge remote-tracking branch 'ycros/improve-sampler-api-access' into concedo_experimental	2023-07-04 18:42:02 +08:00
Concedo	c6c0afdf18	refactor to avoid code duplication	2023-07-04 18:35:54 +08:00
Concedo	784628a2be	Merge remote-tracking branch 'ycros/improve-sampler-api-access' into concedo_experimental	2023-07-04 16:38:32 +08:00
Concedo	ca9a11697c	possibly slower, but cannot use larger batches without modifying ggml library.	2023-07-04 00:35:02 +08:00
Ycros	309534dcd0	implement sampler order, expose sampler order and mirostat in api	2023-07-02 18:15:34 +00:00
Concedo	ef3b8dc0d9	GPU accel for rwkv is slow, disable it	2023-07-02 00:41:46 +08:00
Concedo	e1a7042943	try out the new rwkv but it seems worse, may revert	2023-07-02 00:10:56 +08:00
YellowRoseCx	8afa800fb6	Expose low_vram for CUDA Enabling --lowvram instructs the program to not allocate a VRAM scratch buffer for holding temporary results. Reduces VRAM usage at the cost of performance, particularly prompt processing speed. Requires CUDA	2023-06-26 16:47:22 -05:00
Concedo	d2034ced7b	Merge branch 'master' into concedo_experimental # Conflicts: # README.md # build.zig # flake.nix # tests/test-grad0.c # tests/test-sampling.cpp # tests/test-tokenizer-0.cpp	2023-06-25 17:01:15 +08:00
Concedo	0485fa65a2	wstring convert for mpt	2023-06-24 11:43:42 +08:00
Concedo	f39a746089	bug fixes for openblas	2023-06-23 22:45:22 +08:00
Concedo	43c2891afa	option to not use scratch	2023-06-23 19:01:36 +08:00
Concedo	df9135e3a9	fixing memory bugs	2023-06-23 18:41:23 +08:00
Concedo	1b71752a9f	Implemented basic GPU offloading for MPT, GPT-2, GPT-J and GPT-NeoX	2023-06-22 00:43:25 +08:00
Concedo	537ff22ec9	fixed a bug with token timings, updated lite	2023-06-20 20:41:42 +08:00
Concedo	8e2dc19dc6	updated tokenizer, added support for scratch buffers for neox and gpt2	2023-06-19 21:29:06 +08:00
Concedo	b08b371983	allow hordeconfig to set a max ctx length too.	2023-06-18 16:42:32 +08:00
Concedo	8775dd99f4	various debug logging improvements	2023-06-18 15:24:58 +08:00
Concedo	8bc4143e14	Merge branch 'concedo' into concedo_experimental	2023-06-17 22:29:38 +08:00
YellowRoseCx	971fe9f007	add tokens per second output (#246) * add tokens per second output * Update gpttype_adapter.cpp simplify --------- Co-authored-by: LostRuins <39025047+LostRuins@users.noreply.github.com>	2023-06-17 19:54:29 +08:00
Concedo	0971f83bca	added eos token id handling for starcoder models, as they use a different EOS ID	2023-06-15 22:57:14 +08:00
Concedo	3ed3e7b7e2	reverted sequence mode for rwkv due to multiple issues with speed loss with bigger quantized models	2023-06-14 20:03:14 +08:00
Concedo	82cf97ce92	hotfix for rwkv	2023-06-13 23:38:41 +08:00
Concedo	871009dfab	integrated world tokenizer for RWKV	2023-06-13 20:06:19 +08:00
Concedo	9b6c35b651	rwkv speed enhancements (batch processing), fixed a rwkv token processing bug	2023-06-13 16:02:12 +08:00
Concedo	66a3f4e421	added support for lora base	2023-06-10 19:29:45 +08:00
Concedo	43f7e40470	added extra endpoints for abort gen and polled streaming	2023-06-10 18:13:26 +08:00
Concedo	b92f9fe3a2	Merge remote-tracking branch 'sammcheese/sammcheese/tokenstreaming' into concedo_experimental	2023-06-09 20:41:02 +08:00

123 commits