Concedo
a4249abe5d
alias noblas to usecpu
2024-09-15 21:25:48 +08:00
Concedo
53bf0fb32d
removed openblas backend, merged into CPU (with llamafile for BLAS). GPU backend is now automatically selected when running from CLI unless noblas is specified.
2024-09-15 19:21:52 +08:00
Concedo
5b658ab6d4
updated lite
2024-09-12 10:47:47 +08:00
Concedo
70cdb55cc9
Merge commit '947538acb8' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# CMakePresets.json
# examples/llama-bench/llama-bench.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-quantize-fns.cpp
2024-09-09 11:26:34 +08:00
Concedo
d777995991
able to handle kcpp protected model name endpoints
2024-09-04 16:26:28 +08:00
Concedo
5d34de0c08
fix basepath
2024-09-02 18:09:58 +08:00
Concedo
3c4fa57026
allow horde worker to work with password protected instances
2024-08-31 21:30:47 +08:00
Concedo
0f9968ef64
fixed some incorrect protocol prefix for localhost
2024-08-29 10:37:43 +08:00
Concedo
5f360f659c
Add 5m timeout for horde worker
2024-08-28 23:17:06 +08:00
Concedo
6acbf1d7f4
macos default to full offload when using gpulayers auto (-1)
2024-08-26 12:12:51 +08:00
Concedo
97aa8648ed
allow launching with no models loaded
2024-08-25 23:57:32 +08:00
Concedo
0b96097439
add version number into help page
2024-08-22 00:52:30 +08:00
Concedo
5bf527a6ae
added xtc sampler
2024-08-21 23:57:15 +08:00
Concedo
cd69ab218e
fixed DRY
2024-08-21 17:01:28 +08:00
Concedo
2cf6d16c40
adjust sleep time
2024-08-21 01:06:41 +08:00
Concedo
c1ae350e5b
fixed race condition when generating
2024-08-20 20:17:55 +08:00
Concedo
7ee359a59b
on multigpu setups, pick lowest free mem instead of highest for auto layers
2024-08-20 19:02:16 +08:00
Concedo
e9eb6fe51a
move chat compl to models tab
2024-08-18 14:56:10 +08:00
Concedo
e2e6d892b4
fix declaration order
2024-08-18 02:15:34 +08:00
Concedo
d71b5477c5
update lite, cleanup, fix interrogate format
2024-08-18 00:48:53 +08:00
Concedo
2c108ab17e
correct phrasing
2024-08-14 21:55:53 +08:00
Concedo
f4f24d0e14
small text change
2024-08-11 21:30:46 +08:00
Concedo
139ab3d198
generate passes whole object now
2024-08-11 00:08:13 +08:00
Concedo
da8a96199c
add a space between the bench prompt to fix an issue with old bpe tokenizer stack overflow (+1 squashed commits)
Squashed commits:
[44a689de] add a space between the bench prompt to fix an issue with old bpe tokenizer stack overflow
2024-08-10 19:35:56 +08:00
Concedo
86e687ae8b
updated lite, added promptlimit
2024-08-10 16:05:24 +08:00
Concedo
03adb90dc6
prompt command done
2024-08-07 20:52:28 +08:00
Concedo
853d57c53c
wip prompt
2024-08-06 21:54:08 +08:00
Concedo
6b8b50b350
try fix ipv6 (+1 squashed commits)
Squashed commits:
[8d95a639] try fix ipv6
2024-08-06 15:36:46 +08:00
Concedo
381b4a1844
default multiuser true
2024-08-05 20:03:29 +08:00
Concedo
bd4e55eb74
add used memory checks, add gpulayers for metal
2024-08-05 16:32:05 +08:00
Concedo
23caa63f94
up ver
2024-08-04 23:42:22 +08:00
Concedo
bfdf4b021f
adjust v4-v6 allocation, default back to localhost
2024-08-04 11:42:16 +08:00
Concedo
40481abf0c
allow ipv6 as well
2024-08-04 00:53:19 +08:00
Concedo
9a0976761e
use loopback ip instead of localhost
2024-08-03 00:41:32 +08:00
Concedo
6bf78967f9
more janky nonsense
2024-08-02 21:58:28 +08:00
Concedo
3a72410804
Added vulkan support for SD (+1 squashed commits)
Squashed commits:
[13f42f83] Added vulkan support for SD
2024-08-01 17:12:33 +08:00
Concedo
9a04060aaa
also apply even if tensor split is set
2024-07-30 23:01:50 +08:00
Concedo
2f04f848e1
if gpuid is specified, force specific order
2024-07-30 22:58:25 +08:00
Concedo
43c55bb7e2
hack to fix bad unicode fragments corrupting streamed output
2024-07-30 22:18:22 +08:00
Concedo
102eec3d22
more bugfixes in auto gpu layers selection
2024-07-29 20:38:24 +08:00
Llama
26f1df5e5f
Fix the penultimate token sometimes being lost with SSE streaming (#1031)
The token immediately before an eot token was lost when SSE streaming
was enabled if that token was contained entirely within a stop sequence.
As an example of when this could happen, consider this prompt:
Type the phrase 'pleas' once.
In a Llama 3-derived model, 'pleas' tokenizes as 'ple' + 'as'. The token
'as' is contained within this instruct mode stop sequence:
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
due to the word 'assistant'. Since `string_contains_sequence_substring`
returns True for 'as', this token is added to `tokenReserve` instead of
being streamed immediately. If the '<|eot_id|>' token was generated
next, the text in `tokenReserve` would be discarded.
2024-07-29 20:16:47 +08:00
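The buffering behavior described in the entry above can be sketched roughly as follows. This is a simplified model, not the actual koboldcpp code: apart from `string_contains_sequence_substring` (named in the commit message), all names and the exact control flow here are hypothetical. A token that might be the start of a stop sequence is held in a reserve instead of being streamed; the bug was that this reserve was discarded when the EOT token arrived, and the fix is to flush it first.

```python
EOT = "<|eot_id|>"
STOP_SEQUENCES = ["<|eot_id|><|start_header_id|>assistant<|end_header_id|>"]

def string_contains_sequence_substring(text, sequences):
    # True if text appears inside any stop sequence, e.g. 'as' inside
    # 'assistant' — meaning it cannot safely be streamed yet.
    return any(text in seq for seq in sequences)

def sse_stream(tokens):
    out = []
    token_reserve = ""  # text held back because it may begin a stop sequence
    for tok in tokens:
        if tok == EOT:
            # The fix: flush the held-back text instead of discarding it,
            # so the penultimate token ('as' in the example) is not lost.
            if token_reserve:
                out.append(token_reserve)
            break
        candidate = token_reserve + tok
        if string_contains_sequence_substring(candidate, STOP_SEQUENCES):
            token_reserve = candidate  # hold back; might be a stop sequence
        else:
            out.append(candidate)  # safe to stream, including any reserve
            token_reserve = ""
    return "".join(out)
```

With the commit's example prompt, the tokens 'ple', 'as', '<|eot_id|>' now stream as "pleas"; before the fix, the reserve holding 'as' was dropped at EOT and only "ple" reached the client.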
Concedo
948646ff7a
do not offload if auto layers is less than 2, as it's usually slower
2024-07-29 20:13:43 +08:00
Concedo
e39b8aab8b
improvements to auto layer calcs
2024-07-29 18:51:10 +08:00
Concedo
f289fb494a
bump size of some payload arr sequences from 16 to 24
2024-07-28 20:29:39 +08:00
Concedo
01afb28a63
not working
2024-07-28 11:43:10 +08:00
Concedo
eaa702852d
increased padding, it is still way too little but whatever
2024-07-27 22:32:13 +08:00
Concedo
4531ab5465
refactor some fields
2024-07-27 00:04:29 +08:00
Concedo
9f2076b4b3
fix rocminfo error
2024-07-25 22:23:36 +08:00
Concedo
57a98ba308
fixed dict loading
2024-07-25 11:41:05 +08:00
Concedo
0024d9d682
fixed order of selection
2024-07-25 11:15:30 +08:00