Concedo
52606e9b1d
tts cpp model is now loadable in kcpp
2025-08-17 15:47:22 +08:00
Concedo
7b5cf7143f
handle gguf already containing renamed diffusion tensors prefix
2025-08-12 22:42:29 +08:00
Concedo
3468c2834d
fixed adv mode
2025-08-08 22:26:36 +08:00
Concedo
5a3b2e3921
fix for jamba models - they have recurrent layers like rwkv, so context shifting and forwarding won't work on them.
2025-07-12 18:54:40 +08:00
Concedo
c45b8dc56f
fix for gemma3n
2025-07-10 17:39:08 +08:00
Concedo
736030bb9f
save and load state upgraded to 3 available states
2025-06-04 22:09:40 +08:00
Concedo
53f1511396
use a static buffer for KV reloads instead; also added to the Lite UI
2025-06-03 22:32:46 +08:00
Concedo
4b57108508
Added Save KV State and Load KV State (to memory). GUI not yet updated
2025-06-03 17:46:29 +08:00
Concedo
c2802af9e8
fixed qwen3, sd, and glm4
2025-04-29 20:50:46 +08:00
Concedo
4decd6bea1
GLM4 batch clamp
2025-04-26 09:42:17 +08:00
Concedo
3992fb79cc
wip adding embeddings support
2025-03-24 18:01:23 +08:00
Concedo
0460d92cc3
disable context shifting for gemma3
2025-03-13 20:28:26 +08:00
Concedo
b162c25a5e
fixed moe experts to use detected arch for key
2025-02-10 17:46:08 +08:00
Concedo
b3de1598e7
Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS
...
tts is functional (+6 squashed commits)
Squashed commit:
[22396311] wip tts
[3a883027] tts not yet working
[0dcfab0e] fix silly bug
[a378d9ef] some long overdue cleanup
[fc5a6fb5] Wip tts
[39f50497] wip TTS integration
2025-01-13 14:23:25 +08:00
Concedo
00d154b32b
wip on qwen2vl integration, updated msvc runtimes
2024-12-15 23:58:02 +08:00
Concedo
2c1a06a07d
wip ollama emulation, added detokenize endpoint
2024-11-23 22:48:03 +08:00
kallewoof
547ab2aebb
API: add /props route ( #1222 )
...
* API: add an /extra/chat_template route
A lot of manual tweaking is done when swapping between models. We can automate some of it, or at least make better assumptions, by exposing more information such as the chat template. This PR adds an endpoint /extra/chat_template which returns the model's chat template string as-is under a 'chat_template' key. The front end can then use this to derive the proper templates, use it directly, or at least warn the user when they try to use e.g. a Mistral preset with a Llama 3.1 model.
* switch to pre-established /props endpoint for chat template
* bug-fix (upstream): off-by-one in string juggling
2024-11-21 10:58:32 +08:00
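The /props entry above suggests the front end can sniff the returned chat template string to pick a matching preset. A minimal sketch of that idea follows; the function name and the marker-based heuristics are illustrative assumptions, not the actual front-end logic, though the template markers themselves are the well-known Mistral, Llama 3, and ChatML delimiters.

```python
def guess_preset(chat_template: str) -> str:
    """Guess a prompt-format preset from a model's chat template string
    (as returned under the 'chat_template' key of /props).
    Heuristics here are illustrative, not kcpp's real detection code."""
    t = chat_template.lower()
    if "[inst]" in t:                  # Mistral-style instruct markers
        return "mistral"
    if "<|start_header_id|>" in t:     # Llama 3 header tokens
        return "llama3"
    if "<|im_start|>" in t:            # ChatML delimiters
        return "chatml"
    return "unknown"
```

With this in place, the UI could warn when the guessed preset disagrees with the one the user selected, which is exactly the Mistral-preset-on-Llama-3.1 mismatch the PR description calls out.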
Concedo
90f5cd0f67
wip logprobs data
2024-10-30 00:59:34 +08:00
Concedo
fc7fe2e7a0
allow rwkv6 to run although it's broken
2024-09-09 20:50:58 +08:00
Concedo
0dd3907940
qwen2: warning for FA (flash attention)
2024-07-09 20:53:25 +08:00
Nexesenex
cb2336f5d9
Gradient rope formula with offsets ( #938 )
...
* Gradient rope formula with offsets
Positive for Solar models
Negative for Llama 1 and 2 models
* Update gpttype_adapter.cpp
Remove L1/L2
* cleanup PR, skip llama models, keep prints behind debug mode
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-06-25 20:46:34 +08:00
askmyteapot
1e72b65c38
GradientAI Auto ROPE Base calculation ( #910 )
...
* GradientAI Auto ROPE Base calculation
https://gradient.ai/blog/scaling-rotational-embeddings-for-long-context-language-models
has a formula that better fits the ideal rope scaling.
Tested with Llama 3; checked the calculation is correct for Llama 2. Retains logic for not scaling rope if under trained CTX.
* add in solar scaling logic
Solar based models require the context values to be multiplied by 8. This is (I'm guessing) because the positions are based on a 32k context, but with a sliding window of 4k.
* Update model_adapter.h
adding in a tensor count check to identify Solar models (435 tensors).
* Update model_adapter.cpp
add in n_tensor count for solar identification
* refactor and cleanup GradientAI rope scaling
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-06-13 18:12:00 +08:00
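The two rope entries above describe auto-computing a rope frequency base from the trained vs. target context, with a Solar-specific x8 context multiplier and a skip when under the trained context. A rough sketch under assumptions: the PR's exact GradientAI formula is not reproduced here; this uses the widely known NTK-aware form b' = b * s^(d / (d - 2)), and the function name, default base, and head dimension are illustrative.

```python
def auto_rope_base(train_ctx: int, target_ctx: int,
                   base: float = 10000.0, head_dim: int = 128,
                   is_solar: bool = False) -> float:
    """Sketch of auto rope-base scaling in the spirit of the commits above.
    Uses the standard NTK-aware formula, not necessarily the PR's exact one."""
    if is_solar:
        # Per the commit: Solar models need context values multiplied by 8
        # (positions laid out for 32k context, but a 4k sliding window).
        target_ctx *= 8
    if target_ctx <= train_ctx:
        # Retain the logic of not scaling rope when under the trained CTX.
        return base
    scale = target_ctx / train_ctx
    return base * scale ** (head_dim / (head_dim - 2))
```

The later "offsets" commit adds per-family adjustments on top of this (positive for Solar, negative for Llama 1/2), which would slot in as additive tweaks to the computed base.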
Concedo
10b148f4c2
added skip bos for tokenize endpoint
2024-06-05 10:49:11 +08:00
Concedo
f24aef8792
initial whisper integration
2024-05-29 23:13:11 +08:00
Concedo
47c42fd45c
fix for mamba processing
2024-03-13 13:27:46 +08:00
Concedo
5a44d4de2b
refactor and clean identifiers for sd, fix cmake
2024-02-29 18:28:45 +08:00
Concedo
524ba12abd
refactor - do not use a copy buffer to store generation outputs, instead return a cpp allocated ptr
2024-02-29 14:02:20 +08:00
Concedo
f75e479db0
WIP on sdcpp integration
2024-02-29 00:40:07 +08:00
Concedo
762eeb6204
triage for opencl
2024-01-27 11:09:43 +08:00
Concedo
d9a7bd577a
gpu layer offloading disabled for phi models in clblast
2024-01-25 17:40:05 +08:00
Concedo
6570a2005b
token count includes ids
2023-12-03 15:44:53 +08:00
Concedo
8b919b5b57
allow customized rope to use model set values
2023-11-15 16:21:52 +08:00
Concedo
839fc6dac8
handle freq_base_train
2023-10-24 23:44:22 +08:00
Concedo
c1ca1de2ac
fixed support for old falcon models
2023-10-18 17:20:44 +08:00
Concedo
7fb809b94b
fixed auto rope scaling (+1 squashed commit)
...
Squashed commits:
[b1767874] wip
2023-09-07 14:45:08 +08:00
Concedo
d4c22a8b02
updated lite, added autorope config based on trained ctxlen, hotfix for falcon gpu broken
2023-08-30 16:50:55 +08:00
Concedo
b95a4ccb22
added a token counting endpoint, set mmq as default
2023-08-24 20:41:49 +08:00
Concedo
981c9131f0
gguf for llama is working
2023-08-23 16:07:07 +08:00
Concedo
39cc83e8c9
incomplete merge, compiles but generates rubbish
2023-08-22 23:12:47 +08:00
Concedo
43f7e40470
added extra endpoints for abort gen and polled streaming
2023-06-10 18:13:26 +08:00
Concedo
d28ed99e59
remove unused declarations
2023-06-09 18:01:55 +08:00
Concedo
6f82e17b7a
added MPT support
2023-06-03 16:14:08 +08:00
Concedo
5d9f5b28a6
rwkv integration completed
2023-05-28 00:48:56 +08:00
Concedo
c048bcfec4
remove old filever checks (+7 squashed commits)
...
Squashed commit:
[b72627a] new format not working
[e568870] old ver works
[7053b77] compile errors fixed, fixing linkers
[4ae8889] add new ver
[ff82dfd] file format checks
[25b8aa8] refactoring type names
[931063b] still merging
2023-05-21 00:15:39 +08:00
Concedo
b692e4d2a4
wip
2023-05-14 17:21:07 +08:00
Concedo
05cf5f7d6e
partially working, but the blas matmul is broken
2023-05-13 11:35:38 +08:00
Concedo
2f2eff6e13
the dark gods have been sated, and redpajama is integrated... but at what cost?
2023-05-08 20:58:00 +08:00
Concedo
5eec5d6ed9
Added backwards compatibility to an earlier version of NeoX.
2023-04-25 20:34:18 +08:00
Concedo
ef13443047
wip pythia integration
2023-04-22 01:08:23 +08:00
Concedo
45ec09d31b
fast forwarding for rwkv for unmodified contexts
2023-04-19 15:09:35 +08:00