Commit graph

653 commits

Author SHA1 Message Date
Concedo
53bf0fb32d removed openblas backend, merged into CPU (with llamafile for BLAS). GPU backend is now automatically selected when running from CLI unless noblas is specified. 2024-09-15 19:21:52 +08:00
Concedo
5b658ab6d4 updated lite 2024-09-12 10:47:47 +08:00
Concedo
70cdb55cc9 Merge commit '947538acb8' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	CMakePresets.json
#	examples/llama-bench/llama-bench.cpp
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-quantize-fns.cpp
2024-09-09 11:26:34 +08:00
Concedo
d777995991 able to handle kcpp-protected model name endpoints 2024-09-04 16:26:28 +08:00
Concedo
5d34de0c08 fix basepath 2024-09-02 18:09:58 +08:00
Concedo
3c4fa57026 allow horde worker to work with password protected instances 2024-08-31 21:30:47 +08:00
Concedo
0f9968ef64 fixed some incorrect protocol prefix for localhost 2024-08-29 10:37:43 +08:00
Concedo
5f360f659c Add 5m timeout for horde worker 2024-08-28 23:17:06 +08:00
Concedo
6acbf1d7f4 macOS defaults to full offload when using gpulayers auto (-1) 2024-08-26 12:12:51 +08:00
Concedo
97aa8648ed allow launching with no models loaded 2024-08-25 23:57:32 +08:00
Concedo
0b96097439 add version number into help page 2024-08-22 00:52:30 +08:00
Concedo
5bf527a6ae added xtc sampler 2024-08-21 23:57:15 +08:00
Concedo
cd69ab218e fixed DRY 2024-08-21 17:01:28 +08:00
Concedo
2cf6d16c40 adjust sleep time 2024-08-21 01:06:41 +08:00
Concedo
c1ae350e5b fixed race condition when generating 2024-08-20 20:17:55 +08:00
Concedo
7ee359a59b on multigpu setups, pick lowest free mem instead of highest for auto layers 2024-08-20 19:02:16 +08:00
Concedo
e9eb6fe51a move chat compl to models tab 2024-08-18 14:56:10 +08:00
Concedo
e2e6d892b4 fix declaration order 2024-08-18 02:15:34 +08:00
Concedo
d71b5477c5 update lite, cleanup, fix interrogate format 2024-08-18 00:48:53 +08:00
Concedo
2c108ab17e correct phrasing 2024-08-14 21:55:53 +08:00
Concedo
f4f24d0e14 small text change 2024-08-11 21:30:46 +08:00
Concedo
139ab3d198 generate passes whole object now 2024-08-11 00:08:13 +08:00
Concedo
da8a96199c add a space between the bench prompt to fix an issue with old bpe tokenizer stack overflow (+1 squashed commits)
Squashed commits:

[44a689de] add a space between the bench prompt to fix an issue with old bpe tokenizer stack overflow
2024-08-10 19:35:56 +08:00
Concedo
86e687ae8b updated lite, added promptlimit 2024-08-10 16:05:24 +08:00
Concedo
03adb90dc6 prompt command done 2024-08-07 20:52:28 +08:00
Concedo
853d57c53c wip prompt 2024-08-06 21:54:08 +08:00
Concedo
6b8b50b350 try fix ipv6 (+1 squashed commits)
Squashed commits:

[8d95a639] try fix ipv6
2024-08-06 15:36:46 +08:00
Concedo
381b4a1844 default multiuser true 2024-08-05 20:03:29 +08:00
Concedo
bd4e55eb74 add used memory checks, add gpulayers for metal 2024-08-05 16:32:05 +08:00
Concedo
23caa63f94 up ver 2024-08-04 23:42:22 +08:00
Concedo
bfdf4b021f adjust v4-v6 allocation, default back to localhost 2024-08-04 11:42:16 +08:00
Concedo
40481abf0c allow ipv6 as well 2024-08-04 00:53:19 +08:00
Concedo
9a0976761e use loopback ip instead of localhost 2024-08-03 00:41:32 +08:00
Concedo
6bf78967f9 more janky nonsense 2024-08-02 21:58:28 +08:00
Concedo
3a72410804 Added vulkan support for SD (+1 squashed commits)
Squashed commits:

[13f42f83] Added vulkan support for SD
2024-08-01 17:12:33 +08:00
Concedo
9a04060aaa also apply even if tensor split is set 2024-07-30 23:01:50 +08:00
Concedo
2f04f848e1 if gpuid is specified, force specific order 2024-07-30 22:58:25 +08:00
Concedo
43c55bb7e2 hack to fix bad unicode fragments corrupting streamed output 2024-07-30 22:18:22 +08:00
Concedo
102eec3d22 more bugfixes in auto gpu layers selection 2024-07-29 20:38:24 +08:00
Llama
26f1df5e5f Fix the penultimate token sometimes being lost with SSE streaming (#1031)
The token immediately before an eot token was lost when SSE streaming
was enabled if that token was contained entirely within a stop sequence.
As an example of when this could happen, consider this prompt:
  Type the phrase 'pleas' once.
In a Llama 3-derived model, 'pleas' tokenizes as 'ple' 'as'. The token
'as' is contained within this instruct mode stop sequence:
  <|eot_id|><|start_header_id|>assistant<|end_header_id|>
due to the word 'assistant'. Since `string_contains_sequence_substring`
returns True for 'as', this token is added to `tokenReserve` instead of
being streamed immediately. If the '<|eot_id|>' token was generated
next, the text in `tokenReserve` would be discarded.
2024-07-29 20:16:47 +08:00
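
A minimal sketch of the `tokenReserve` buffering this fix describes, in Python. The names `string_contains_sequence_substring` and `tokenReserve` come from the commit message itself; the loop shape, the eot handling, and the flush-before-stopping fix are assumptions for illustration, not koboldcpp's actual streaming code.

```python
def string_contains_sequence_substring(text: str, stop_sequences: list[str]) -> bool:
    # True if `text` occurs inside any stop sequence, i.e. it could be
    # part of a stop sequence that is still being generated.
    return any(text in stop for stop in stop_sequences)

def stream_tokens(tokens, stop_sequences, emit):
    token_reserve = ""  # text held back in case a stop sequence is forming
    for tok in tokens:
        if tok == "<|eot_id|>":
            # The fix: flush the reserve before stopping. Discarding it here
            # is what lost the penultimate token ('as' in the example above).
            if token_reserve:
                emit(token_reserve)
            return
        if string_contains_sequence_substring(token_reserve + tok, stop_sequences):
            token_reserve += tok  # might be a stop-sequence fragment; hold it
        else:
            emit(token_reserve + tok)  # not a stop fragment; stream it all
            token_reserve = ""
    # (A full implementation would also stop once token_reserve completes an
    # entire stop sequence; omitted here for brevity.)

# The 'pleas' example from the message, streamed correctly with the flush:
emitted = []
stream_tokens(["ple", "as", "<|eot_id|>"],
              ["<|eot_id|><|start_header_id|>assistant<|end_header_id|>"],
              emitted.append)
assert "".join(emitted) == "pleas"
```
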
Concedo
948646ff7a do not offload if auto layers is less than 2, as it's usually slower 2024-07-29 20:13:43 +08:00
Concedo
e39b8aab8b improvements to auto layer calcs 2024-07-29 18:51:10 +08:00
Concedo
f289fb494a bump size of some payload arr sequences from 16 to 24 2024-07-28 20:29:39 +08:00
Concedo
01afb28a63 not working 2024-07-28 11:43:10 +08:00
Concedo
eaa702852d increased padding, it is still way too little but whatever 2024-07-27 22:32:13 +08:00
Concedo
4531ab5465 refactor some fields 2024-07-27 00:04:29 +08:00
Concedo
9f2076b4b3 fix rocminfo error 2024-07-25 22:23:36 +08:00
Concedo
57a98ba308 fixed dict loading 2024-07-25 11:41:05 +08:00
Concedo
0024d9d682 fixed order of selection 2024-07-25 11:15:30 +08:00
Concedo
d1f7832d21 adjusted layer estimation 2024-07-24 22:51:02 +08:00