Concedo
64ce5fca15
better approach when the SWA window is exceeded: simply refill the window. This is not 100% correct, but good enough for fast-forward users. Disable FF or increase the window if it is not good enough
2026-04-17 11:44:13 +08:00
Concedo
dc2e6ca2e3
fix header path
2026-04-05 11:02:08 +08:00
Concedo
eb3422996a
BOS fix for gemma4
2026-04-04 22:15:01 +08:00
Concedo
226c79338f
handle glm4.7 flash template
2026-01-28 23:29:08 +08:00
Concedo
b867b67e7e
added mechanics for a full clear if fast forward is not used; this should help recover from bad states
2025-12-05 16:43:37 +08:00
Concedo
8631bbcee3
linting
2025-11-18 18:56:31 +08:00
LostRuins Concedo
5751c30790
add vulkan for whisper
2025-11-13 15:37:58 +08:00
Concedo
3b30f12ca7
future proof handling of rnn models
2025-10-07 19:12:47 +08:00
Concedo
7857578f45
handle more rnn models
2025-10-07 13:47:15 +08:00
Concedo
5d89a48a50
add more rnn models supported
2025-09-24 18:14:59 +08:00
Concedo
52606e9b1d
tts cpp model is now loadable in kcpp
2025-08-17 15:47:22 +08:00
Concedo
7b5cf7143f
handle gguf already containing renamed diffusion tensors prefix
2025-08-12 22:42:29 +08:00
Concedo
3468c2834d
fixed adv mode
2025-08-08 22:26:36 +08:00
Concedo
61c19fea56
fixed glm4 sop, lower regex max stacks (+2 squashed commits)
...
Squashed commit:
[47e39ae5d] lower regex max stack again
[0a32ca232] lower regex max stack again
2025-08-06 17:10:57 +08:00
Concedo
5a3b2e3921
fix for jamba models - they have recurrent layers like rwkv, so context shifting and forwarding won't work on them.
2025-07-12 18:54:40 +08:00
Concedo
c45b8dc56f
fix for gemma3n
2025-07-10 17:39:08 +08:00
Concedo
f125e724eb
fix off-by-one npast during some instances of fast forwarding
2025-05-22 19:51:21 +08:00
Concedo
f841b29c41
fixed unicode paths
2025-05-11 14:05:54 +08:00
Concedo
c2802af9e8
fix qwen3, fixed sd, fixed glm4
2025-04-29 20:50:46 +08:00
Concedo
4decd6bea1
GLM4 batch clamp
2025-04-26 09:42:17 +08:00
Concedo
35dc8387e9
fixed rwkv7 handling
2025-04-26 02:13:06 +08:00
Concedo
0460d92cc3
disable context shifting for gemma3
2025-03-13 20:28:26 +08:00
Concedo
b162c25a5e
fixed moe experts to use detected arch for key
2025-02-10 17:46:08 +08:00
Concedo
e788b8289a
You'll never take us alive
...
We swore that death will do us part
They'll call our crimes a work of art
2025-01-09 11:27:06 +08:00
Concedo
00d154b32b
wip on qwen2vl integration, updated msvc runtimes
2024-12-15 23:58:02 +08:00
Concedo
bb13925f39
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CMakePresets.json
# Makefile
# Package.swift
# ci/run.sh
# common/CMakeLists.txt
# examples/CMakeLists.txt
# flake.lock
# ggml/src/CMakeLists.txt
# ggml/src/ggml-backend.cpp
# ggml/src/ggml.c
# pocs/vdot/q8dot.cpp
# pocs/vdot/vdot.cpp
# tests/test-backend-ops.cpp
# tests/test-grad0.cpp
# tests/test-quantize-fns.cpp
# tests/test-quantize-perf.cpp
# tests/test-rope.cpp
2024-11-04 16:54:53 +08:00
Concedo
fc7fe2e7a0
allow rwkv6 to run although it's broken
2024-09-09 20:50:58 +08:00
Concedo
0dd3907940
qwen2 warning FA
2024-07-09 20:53:25 +08:00
Nexesenex
cb2336f5d9
Gradient rope formula with offsets ( #938 )
...
* Gradient rope formula with offsets
Positive for Solar models
Negative for Llama 1 and 2 models
* Update gpttype_adapter.cpp
Remove L1/L2
* cleanup PR, skip llama models, keep prints behind debug mode
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-06-25 20:46:34 +08:00
askmyteapot
1e72b65c38
GradientAI Auto ROPE Base calculation ( #910 )
...
* GradientAI Auto ROPE Base calculation
https://gradient.ai/blog/scaling-rotational-embeddings-for-long-context-language-models
has a formula that better fits the ideal rope scaling.
Tested with Llama3; checked the calculation is correct for llama2. Retains logic for not scaling rope if under trained CTX.
* add in solar scaling logic
Solar-based models require the context values to be multiplied by 8. This is (I'm guessing) because the positions are based on a 32k context, but with a sliding window of 4k.
* Update model_adapter.h
adding in tensor count to identify Solar models based on a tensor count of 435.
* Update model_adapter.cpp
add in n_tensor count for solar identification
* refactor and cleanup GradientAI rope scaling
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-06-13 18:12:00 +08:00
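The two ROPE entries above describe a gradient-style auto base calculation plus a Solar-specific ×8 context multiplier. A minimal sketch of that idea is below; it is reconstructed only from the commit descriptions, not from the actual gpttype_adapter.cpp code, so the function name, the NTK-style exponent, and where exactly the multiplier is applied are all assumptions.

```cpp
#include <cmath>

// Hypothetical sketch, NOT the real koboldcpp implementation.
// Raises the rotary base when the requested context exceeds the trained
// context; Solar models get their trained context multiplied by 8 first,
// as the commit message describes.
double auto_rope_base(double base, double n_ctx, double n_ctx_train,
                      bool is_solar)
{
    if (is_solar) {
        n_ctx_train *= 8.0; // positions trained on 32k ctx, 4k sliding window
    }
    if (n_ctx <= n_ctx_train) {
        return base; // retain logic: don't scale rope under trained ctx
    }
    const double two_pi = 6.28318530717958647692;
    const double chi_ctx       = n_ctx / two_pi;
    const double chi_ctx_train = n_ctx_train / two_pi;
    // grow the base so the longest rotary period spans the new context
    return std::pow(base, std::log(chi_ctx) / std::log(chi_ctx_train));
}
```

Under this sketch, extending an 8k-trained model to 16k raises a 10000 base to roughly 2.4× its value, while a request at or below the trained length returns the base unchanged, and a Solar model's trained length is treated as 8× larger before that comparison.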
Concedo
47c42fd45c
fix for mamba processing
2024-03-13 13:27:46 +08:00
Concedo
f75e479db0
WIP on sdcpp integration
2024-02-29 00:40:07 +08:00
Concedo
762eeb6204
triage for opencl
2024-01-27 11:09:43 +08:00
Concedo
d9a7bd577a
gpu layer offloading disabled for phi models in clblast
2024-01-25 17:40:05 +08:00
Concedo
375003b458
always show reported arch
2023-12-22 11:15:07 +08:00
Concedo
8b919b5b57
allow customized rope to use model set values
2023-11-15 16:21:52 +08:00
Concedo
5db89b90b7
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .gitignore
# CMakeLists.txt
# Makefile
# README.md
# build.zig
# ggml-opencl.cpp
# tests/CMakeLists.txt
# tests/test-double-float.cpp
# tests/test-sampling.cpp
2023-10-25 23:58:15 +08:00
Concedo
839fc6dac8
handle freq_base_train
2023-10-24 23:44:22 +08:00
Concedo
c1ca1de2ac
fixed support for old falcon models
2023-10-18 17:20:44 +08:00
Concedo
7fb809b94b
fixed auto rope scaling (+1 squashed commit)
...
Squashed commits:
[b1767874] wip
2023-09-07 14:45:08 +08:00
Concedo
d4c22a8b02
updated lite, added autorope config based on trained ctxlen, hotfix for falcon gpu broken
2023-08-30 16:50:55 +08:00
Concedo
4b00916ac7
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .dockerignore
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README.md
# flake.lock
# flake.nix
# tests/CMakeLists.txt
2023-08-28 14:19:05 +08:00
Concedo
bfdc596d58
gguf reader in file format detection
2023-08-23 19:19:52 +08:00
Concedo
39cc83e8c9
incomplete merge, compiles but generates rubbish
2023-08-22 23:12:47 +08:00
Concedo
3a7853d259
handle stablecode-completion-alpha-3b
2023-08-09 21:07:57 +08:00
Concedo
df9135e3a9
fixing memory bugs
2023-06-23 18:41:23 +08:00
Concedo
9b6c35b651
rwkv speed enhancements (batch processing), fixed a rwkv token processing bug
2023-06-13 16:02:12 +08:00
Concedo
6f82e17b7a
added MPT support
2023-06-03 16:14:08 +08:00
Concedo
5d9f5b28a6
rwkv integration completed
2023-05-28 00:48:56 +08:00
Concedo
01a0f206df
added support for starcoder, which is basically gpt2
2023-05-27 13:35:40 +08:00