vrr/kvcache-ai-ktransformers

Author	SHA1	Message	Date
Aubrey Li	def1ec7683	modeling_deepseek_v3: fix GenerationMixin warning. Fix GenerationMixin warning introduced by upgrading transformers to 4.51.3.	2025-05-01 07:48:15 +08:00
Atream	e36ddc36a8	Update modeling_deepseek_v3.py	2025-04-03 17:13:06 +08:00
Atream	25cee5810e	add balance-serve, support concurrence	2025-03-31 22:55:32 +08:00
Atream	5ec33d046d	optimize gguf dequant, save mem, support Q2_K; use marlin for lm_head, lm_head only calc last token for prefill; extend context window to 19K for DeepSeek-V3/R1 within 24GB VRAM	2025-02-22 06:13:01 +00:00
Azure	3897f001f5	update FAQ	2025-02-12 08:50:58 +00:00
Azure	907251c743	done support deepseekv3	2025-02-04 15:53:38 +00:00
Azure	f748cd29f0	fix rope; update moegate	2025-02-01 18:05:45 +00:00
Azure	f873558a89	update rope calculation; update modeling.py; update gate for moe	2025-02-01 07:32:21 +00:00

Renamed from ktransformers/models/modeling_deepseekv3.py