koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-11 17:44:38 +00:00

Author	SHA1	Message	Date
YellowRoseCx	8afa800fb6	Expose low_vram for CUDA. Enabling --lowvram instructs the program to not allocate a VRAM scratch buffer for holding temporary results. Reduces VRAM usage at the cost of performance, particularly prompt processing speed. Requires CUDA.	2023-06-26 16:47:22 -05:00
Concedo	8775dd99f4	various debug logging improvements	2023-06-18 15:24:58 +08:00
Concedo	66a3f4e421	added support for lora base	2023-06-10 19:29:45 +08:00
Concedo	43f7e40470	added extra endpoints for abort gen and polled streaming	2023-06-10 18:13:26 +08:00
SammCheese	e6231c3055	back to http.server, improved implementation	2023-06-09 12:17:55 +02:00
SammCheese	9a8da35ec4	working streaming. TODO: fix lite	2023-06-08 18:34:23 +02:00
SammCheese	97971291e9	draft: token streaming	2023-06-08 18:34:08 +02:00
Concedo	abfdfb702e	added top_a sampler	2023-05-27 17:32:37 +08:00
Concedo	466cd21368	test cmakefile for cublas.	2023-05-15 14:50:38 +08:00
Concedo	8a964e76c8	integrated mirostat as a launch parameter, works on all models	2023-05-06 00:47:17 +08:00
Concedo	851f55325a	Merge remote-tracking branch 'temp/concedo' into concedo_experimental	2023-05-05 23:55:53 +08:00
Concedo	2edbcebe27	added optional force versioning flag	2023-05-05 22:02:00 +08:00
Hendrik Langer	8131bc8b56	add new sampling algorithm mirostat	2023-05-05 13:23:47 +02:00
Concedo	4857739ab5	allow specifying a different thread count for GPU blas	2023-05-03 21:19:59 +08:00
Concedo	966cd2ce91	Merge remote-tracking branch 'temp/concedo' into concedo_experimental (conflicts: koboldcpp.py)	2023-05-02 22:43:34 +08:00
Concedo	7afad2b9b5	integrated the new samplers	2023-04-29 19:41:41 +08:00
Concedo	e8a389f85b	updated kobold lite, added debug mode, changed streaming mode to now use the same url when launching	2023-04-28 11:41:03 +08:00
Concedo	3962eb39c7	added token unbanning	2023-04-24 21:50:20 +08:00
Concedo	6e908c1792	added lora support	2023-04-22 12:29:38 +08:00
Concedo	c200b674f4	updated kobold lite, work on rwkv, added exe path to model load params, added launch parameter	2023-04-18 17:36:44 +08:00
Concedo	525184930d	added a kobold API compatible implementation of stopping sequences	2023-04-16 18:37:49 +08:00
Concedo	ad5676810a	merge CLBlast improvements - GPU dequant	2023-04-16 01:17:40 +08:00
Concedo	adb4df78d6	Added SmartContext mode, a way of prompt context manipulation that avoids frequent context recalculation.	2023-04-14 21:24:16 +08:00
Concedo	23c675b2e6	integrated optional (experimental) CLBlast support	2023-04-11 23:33:44 +08:00
Concedo	f53238f570	Merged the upstream updates for model loading code, and ditched the legacy llama loaders since they were no longer needed.	2023-04-10 12:00:34 +08:00
Concedo	085a9f90a7	still refactoring	2023-04-01 11:56:34 +08:00
Concedo	6b86f5ea22	halfway refactoring, wip adding other model types	2023-04-01 01:13:05 +08:00

27 commits