koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-10 17:14:36 +00:00

Author	SHA1	Message	Date
Concedo	47c42fd45c	fix for mamba processing	2024-03-13 13:27:46 +08:00
Concedo	484d90c330	llava support is now fully functioning	2024-03-11 15:55:32 +08:00
Concedo	d943c739a8	wip submitting of llava image to backend	2024-03-10 17:14:27 +08:00
Concedo	c08d7e5042	wip integration of llava	2024-03-10 11:18:47 +08:00
Concedo	7c64845dea	Merge branch 'master' into concedo_experimental # Conflicts: # .devops/nix/sif.nix # .github/workflows/build.yml # .github/workflows/python-check-requirements.yml # README-sycl.md # README.md # flake.lock # flake.nix # requirements/requirements-convert-hf-to-gguf.txt # scripts/compare-llama-bench.py	2024-03-04 15:33:33 +08:00
Concedo	2d9a90b652	try to fix ci compile errors (+1 squashed commits) Squashed commits: [d0d49663] fixed log multiline (+1 squashed commits) Squashed commits: [81a8befe] try to fix linux build error (+1 squashed commits) Squashed commits: [22850dda] try to fix build (+1 squashed commits) Squashed commits: [b8294611] missing type	2024-03-01 23:38:15 +08:00
Concedo	55af5446ad	Merge branch 'master' into concedo_experimental # Conflicts: # README.md # ci/run.sh # llama.cpp # scripts/sync-ggml.last	2024-03-01 17:41:37 +08:00
Concedo	524ba12abd	refactor - do not use a copy buffer to store generation outputs, instead return a cpp allocated ptr	2024-02-29 14:02:20 +08:00
Concedo	f75e479db0	WIP on sdcpp integration	2024-02-29 00:40:07 +08:00
Concedo	ad638285de	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # README.md # flake.lock # ggml-cuda.cu # llama.cpp # tests/test-backend-ops.cpp # tests/test-quantize-fns.cpp	2024-02-28 13:41:35 +08:00
Concedo	d47e13c892	fixed compile error: GGML_BACKEND_TYPE_GPU (+1 squashed commits) Squashed commits: [00ca282a] fixed compile error: LLAMA_SPLIT_MODE_ROW	2024-02-26 10:55:35 +08:00
Concedo	b5ba6c9ece	test to see if Ofast for ggml library plus batching adjustments fixes speed regression for ggmlv1 models	2024-02-25 21:14:53 +08:00
Concedo	6d6d79f359	fixed a horrible bug in thread counts	2024-02-22 23:57:40 +08:00
Concedo	8d5e25008f	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # README.md # ci/run.sh # tests/test-tokenizer-0-falcon.cpp # tests/test-tokenizer-0-llama.cpp # tests/test-tokenizer-1-bpe.cpp # tests/test-tokenizer-1-llama.cpp	2024-02-17 15:22:05 +08:00
Concedo	066e73d769	context shift even more lenient	2024-02-11 18:30:38 +08:00
Concedo	590af480ab	contextshift more forgiving	2024-02-10 20:49:21 +08:00
Concedo	35111ce01a	row split mode is now a toggle	2024-02-09 18:35:58 +08:00
Concedo	992eea71d7	fixes for vulkan multigpu	2024-02-09 14:42:27 +08:00
Concedo	fe424a5466	tensor split active text	2024-02-09 12:02:23 +08:00
Concedo	4cd571db89	vulkan multigpu, show uptime	2024-02-08 16:54:38 +08:00
Concedo	35c32fd0f2	refactor some old code with batching	2024-02-05 15:54:45 +08:00
Alexander Abushady	4cb956c7db	Quadratic Sampling UI (#652) * Quadratic Sampling UI Kalomaze's Quadratic Sampling, now has a UI within KCPP. * remove debug prints * cleanup, add smooth sampler to dynatemp --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2024-02-04 16:26:27 +08:00
Concedo	2b02cd75c7	reformat debug logging	2024-02-01 23:20:51 +08:00
Concedo	340fbbbb04	show warning if genamt >= ctxsize, show t/s values	2024-01-31 18:51:42 +08:00
Concedo	13dcf4b556	print seed	2024-01-31 14:42:47 +08:00
Concedo	21ab727e83	change split mode to rows	2024-01-30 22:30:08 +08:00
Concedo	ed09a854f0	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .gitignore # CMakeLists.txt # Makefile # README.md # ci/run.sh # ggml-opencl.cpp # tests/CMakeLists.txt	2024-01-27 11:45:07 +08:00
Concedo	762eeb6204	triage for opencl	2024-01-27 11:09:43 +08:00
Concedo	d9a7bd577a	gpu layer offloading disabled for phi models in clblast	2024-01-25 17:40:05 +08:00
Concedo	08236ccc97	better abort handling, added support for dynatemp exponent	2024-01-23 16:56:12 +08:00
Concedo	5ff53507c4	fixed compile issues for cublas	2024-01-21 14:23:48 +08:00
Concedo	5639c1a520	units (+2 squashed commit) Squashed commit: [166979d9] units conversion [038dd5d4] get rid of all warnings (+1 squashed commits) Squashed commits: [6efd1e1b] get rid of all warnings	2024-01-20 23:53:21 +08:00
Concedo	db14de5c32	fossilize ggml library ver 3, to support ggjtv3	2024-01-20 10:49:25 +08:00
kalomaze	123bff9a0f	Full DynaTemp implementation + UI (#600) * move Dynatemp changes to new branch * fix float header * Properly reintroduce variable expert count Controllable through experts.txt * first pass at DynaTemp UI Checkbox partial implemented, Min and Max Temp implemented * DynaTemp UI Checkbox Trigger DynaTemp on checkbox * DynaTemp UI checkbox edition Hell Yeah! DynaTemp! * Remove greedy dynatemp * Fix race condition caused by debug print * Fixed broken presets and miro Fixes broken presets and mirostat * Remove debug function + HHI temp Also removed unnecessary softmax double precision * Fix whitespace (?) for generate function * epic upstream renaming scheme fix * fix stupid indents * Other cleanup Reintroduce unused rep pen function, move temp functions first before entropy dynamic temp * Slight indent fix * revert batch pyinstaller maker to mainline and also delete experts.txt since adjustable routing is also being removed for the PR * compact dynatemp into a single value dynatemp_range. This is a float which represents the allowed deviation from the min and max temperature when using dynatemp. Thus, if we want a value of dynatemp_min=0.3, dynatemp_max=0.5, then we would simply set temperature=0.4 and dynatemp_range=0.1. Functionally dynatemp would operate the same, but it would simplify usage and make it a single easy to adjust value. --------- Co-authored-by: Alexander Abushady <aabushady214@gmail.com> Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2024-01-06 11:13:16 +08:00
Concedo	e49d398f73	use same struct size for cuda and non cuda (+1 squashed commits) Squashed commits: [6eee8e2f] use same struct size for cuda and non cuda	2024-01-03 16:05:54 +08:00
Concedo	94e68fe474	added field to show recent seed	2024-01-02 15:35:04 +08:00
Concedo	5e59112de8	prevent other calls when uninitialized	2023-12-28 12:04:53 +08:00
Concedo	2d5d82e915	allocate gpt_params on heap instead to avoid rare segfault	2023-12-28 11:48:21 +08:00
DebuggingLife46	e733a9e425	Add logit_bias to the OpenAI api (#577) * Add logit_bias to the OpenAI api * Cleanup and refactor, test in swagger. --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2023-12-27 00:26:19 +08:00
Concedo	8823e8b06d	added presence penalty into lite ui	2023-12-23 10:39:40 +08:00
Concedo	77463e0e9c	batch size improvements	2023-12-22 15:27:40 +08:00
Concedo	3f863eed72	add presence penalty	2023-12-19 23:18:56 +08:00
Concedo	7469f202ea	use lowvram flag for offload qkv	2023-12-08 18:16:14 +08:00
Concedo	ec21fa7712	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .gitignore # CMakeLists.txt # Makefile # Package.swift # README.md # ggml-cuda.cu # llama.cpp # llama.h # scripts/sync-ggml.sh # tests/CMakeLists.txt	2023-12-08 17:42:26 +08:00
Concedo	c7511526a2	noscript mode is done	2023-12-07 00:52:25 +08:00
Concedo	6570a2005b	token count includes ids	2023-12-03 15:44:53 +08:00
Concedo	c142c5634a	fixed segfault with clblast by reversing commit in issue https://github.com/ggerganov/llama.cpp/issues/4296	2023-12-03 00:56:00 +08:00
Concedo	12f66eaa1d	adjust fragmentation fix	2023-12-02 15:59:08 +08:00
Concedo	a012342a77	updated docs, shifted kv extra space to be subtracted from user's ctx value instead of added on load.	2023-11-30 14:19:40 +08:00
Concedo	ba5c33319b	Allocate a small amount of extra context for GGUF to deal with KV fragmentation causing issues in some scenarios.	2023-11-28 20:55:14 +08:00


225 commits
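
The message of commit 123bff9a0f describes collapsing the DynaTemp minimum and maximum temperatures into a single dynatemp_range value centred on temperature. The arithmetic it states (min 0.3 and max 0.5 become temperature=0.4 and dynatemp_range=0.1) can be sketched as below. This is only an illustration of that arithmetic, not the code in the repository: the function names are hypothetical, and clamping the lower bound at zero is an assumption made here.

```python
def dynatemp_bounds(temperature: float, dynatemp_range: float) -> tuple[float, float]:
    """Return the (min, max) temperature implied by a centre value and a range.

    Hypothetical helper: mirrors the commit message's description, where
    dynatemp_range is the allowed deviation on either side of temperature.
    """
    if dynatemp_range < 0:
        raise ValueError("dynatemp_range must be non-negative")
    # Assumption: a negative temperature is meaningless, so clamp at 0.
    low = max(0.0, temperature - dynatemp_range)
    high = temperature + dynatemp_range
    return low, high


def from_min_max(temp_min: float, temp_max: float) -> tuple[float, float]:
    """Convert an explicit (min, max) pair into (temperature, dynatemp_range)."""
    if temp_max < temp_min:
        raise ValueError("temp_max must be >= temp_min")
    temperature = (temp_min + temp_max) / 2.0     # midpoint of the interval
    dynatemp_range = (temp_max - temp_min) / 2.0  # half-width of the interval
    return temperature, dynatemp_range


if __name__ == "__main__":
    # The example from the commit message: min=0.3, max=0.5.
    print(from_min_max(0.3, 0.5))
    print(dynatemp_bounds(0.4, 0.1))
```

Under this reading, a dynatemp_range of 0 makes both bounds equal to temperature, which corresponds to an ordinary fixed temperature.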