koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-10 17:14:36 +00:00

Author	SHA1	Message	Date
Concedo	d0a6a52855	hide flash attention in quick launch for vulkan, updated lite	2024-10-24 22:00:09 +08:00
Concedo	6da5a63852	fix for uploaded wav files being incomplete due to fragmentation when converting to b64	2024-10-20 17:47:19 +08:00
Concedo	a9dbcdd3ec	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # docs/build.md # examples/infill/infill.cpp # examples/main/README.md # examples/server/README.md # flake.lock # scripts/sync-ggml.last # src/llama.cpp # tests/test-json-schema-to-grammar.cpp # tests/test-sampling.cpp	2024-10-17 16:36:02 +08:00
Maya	8bb220329c	Dynamic sizes for sequences (#1157) * Dynamic sizes for sequences * cleanup PR - move all dynamic fields to end of payload, ensure correct null handling to match existing behavior, add anti abuse limit of max 512 for dynamic fields * adjust anti abuse limits --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2024-10-16 23:55:11 +08:00
JR	8e5ffc5a58	Add header X-Accel-Buffering set to no for SSE stream requests (#1168)	2024-10-16 17:28:05 +08:00
Concedo	21b2f6168e	Merge branch 'concedo_experimental' of https://github.com/LostRuins/koboldcpp into concedo_experimental	2024-10-14 22:09:38 +08:00
Concedo	1d40303050	increase again	2024-10-14 22:09:26 +08:00
YellowRoseCx	f029de6e46	Merge pull request #69 from matoro/main (#1165) Fix gpulayers autodetection for cublas & clblast backends	2024-10-14 20:10:41 +08:00
Concedo	8d81519ca3	direct user to gguf model resources	2024-10-12 18:39:21 +08:00
Concedo	5ad826b82a	updated lite (+2 squashed commit) Squashed commit: [31a99e1f] bump baned phrase a bit more again [c999736b] small fix	2024-10-11 11:05:04 +08:00
Maya	3dab63887f	Add custom_token_bans (#1153)	2024-10-10 23:45:07 +08:00
Concedo	a3b104a422	further increase some limits	2024-10-10 22:27:28 +08:00
Concedo	d75cbd671d	alias banned_tokens with banned_strings from ST increase max bans to 32 for now	2024-10-10 21:52:46 +08:00
Concedo	fe5479f286	unify antislop and token bans	2024-10-10 18:21:07 +08:00
Concedo	a6bf568fda	prevent GUI settings from being overridden	2024-10-10 11:46:57 +08:00
Concedo	65f3c68399	wip antislop	2024-10-07 20:19:22 +08:00
Concedo	3e8bb10e2d	wip on rewind function	2024-10-06 16:21:03 +08:00
Concedo	d9fcb94472	do not suppress stdout if debugmode	2024-10-04 16:04:29 +08:00
Concedo	a785a91e56	every request has timestamp	2024-09-27 22:10:41 +08:00
Concedo	ea55f69dc1	Merge branch 'upstream' into concedo_experimental # Conflicts: # .dockerignore # .github/workflows/build.yml # .github/workflows/docker.yml # Makefile # README.md # examples/infill/infill.cpp # examples/perplexity/perplexity.cpp # examples/server/README.md # examples/speculative/speculative.cpp # flake.lock # ggml/src/CMakeLists.txt # scripts/sync-ggml.last # tests/test-backend-ops.cpp # tests/test-sampling.cpp	2024-09-27 11:21:28 +08:00
Concedo	5a4bc89c8d	quiet mode on perf endpoint	2024-09-22 13:03:02 +08:00
Concedo	c38d1ecc8d	update templates, fix rwkv	2024-09-22 01:32:12 +08:00
Concedo	229108f877	fixed incorrect auto gpu settings, fixed clblast not working	2024-09-21 17:59:52 +08:00
Concedo	004a35b16d	add mutually exclusive group	2024-09-21 15:46:55 +08:00
Concedo	4b6a12e9c0	allow overriding kcpps values with explicit args	2024-09-21 11:00:10 +08:00
Concedo	68aee56498	updated lite, allow GUI to import launcher args and config files with showgui	2024-09-20 17:52:52 +08:00
Concedo	e958b2f78b	updated lite	2024-09-17 00:16:21 +08:00
Concedo	a4249abe5d	alias noblas to usecpu	2024-09-15 21:25:48 +08:00
Concedo	53bf0fb32d	removed openblas backend, merged into CPU (with llamafile for BLAS). GPU backend is now automatically selected when running from CLI unless noblas is specified.	2024-09-15 19:21:52 +08:00
Concedo	5b658ab6d4	updated lite	2024-09-12 10:47:47 +08:00
Concedo	70cdb55cc9	Merge commit '947538acb8' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/docker.yml # CMakePresets.json # examples/llama-bench/llama-bench.cpp # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-quantize-fns.cpp	2024-09-09 11:26:34 +08:00
Concedo	d777995991	able to handle kcpp protected model name endpoints	2024-09-04 16:26:28 +08:00
Concedo	5d34de0c08	fix basepath	2024-09-02 18:09:58 +08:00
Concedo	3c4fa57026	allow horde worker to work with password protected instances	2024-08-31 21:30:47 +08:00
Concedo	0f9968ef64	fixed some incorrect protocol prefix for localhost	2024-08-29 10:37:43 +08:00
Concedo	5f360f659c	Add 5m timeout for horde worker	2024-08-28 23:17:06 +08:00
Concedo	6acbf1d7f4	macos default to full offload when using gpulayers auto (-1)	2024-08-26 12:12:51 +08:00
Concedo	97aa8648ed	allow launching with no models loaded	2024-08-25 23:57:32 +08:00
Concedo	0b96097439	add version number into help page	2024-08-22 00:52:30 +08:00
Concedo	5bf527a6ae	added xtc sampler	2024-08-21 23:57:15 +08:00
Concedo	cd69ab218e	fixed DRY	2024-08-21 17:01:28 +08:00
Concedo	2cf6d16c40	adjust sleep time	2024-08-21 01:06:41 +08:00
Concedo	c1ae350e5b	fixed race condition when generating	2024-08-20 20:17:55 +08:00
Concedo	7ee359a59b	on multigpu setups, pick lowest free mem instead of highest for auto layers	2024-08-20 19:02:16 +08:00
Concedo	e9eb6fe51a	move chat compl to models tab	2024-08-18 14:56:10 +08:00
Concedo	e2e6d892b4	fix declaration order	2024-08-18 02:15:34 +08:00
Concedo	d71b5477c5	update lite, cleanup, fix interrogate format	2024-08-18 00:48:53 +08:00
Concedo	2c108ab17e	correct phrasing	2024-08-14 21:55:53 +08:00
Concedo	f4f24d0e14	small text change	2024-08-11 21:30:46 +08:00
Concedo	139ab3d198	generate passes whole object now	2024-08-11 00:08:13 +08:00

681 commits