Mirror of https://github.com/kvcache-ai/ktransformers.git (synced 2026-04-29 12:19:50 +00:00)
Use Marlin for lm_head; during prefill, lm_head computes logits only for the last token. This extends the context window to 19K for DeepSeek-V3/R1 within 24GB VRAM.
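
The last-token trick saves memory because prefill only needs logits at the final position to sample the first generated token; projecting every position would allocate a `[seq_len, vocab_size]` buffer (DeepSeek-V3's vocabulary is ~129K entries). Below is a minimal sketch of that idea with hypothetical names; the actual ktransformers code additionally routes the projection through a Marlin quantized GEMM kernel, which is not shown here.

```python
import torch

def lm_head_last_token(hidden_states: torch.Tensor,
                       lm_head: torch.nn.Linear) -> torch.Tensor:
    """Compute lm_head logits for only the last position during prefill.

    hidden_states: [batch, seq_len, hidden_dim]
    returns logits: [batch, 1, vocab_size]
    """
    # Slice to the final token before the projection, so the logits
    # buffer is [batch, 1, vocab_size] instead of
    # [batch, seq_len, vocab_size] -- the VRAM saved here is what
    # allows the longer context window.
    last_hidden = hidden_states[:, -1:, :]
    return lm_head(last_hidden)
```

During decode this is a no-op (seq_len is already 1), so the slice is only meaningful on the prefill pass.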
| File |
|---|
| __init__.py |
| configuration_deepseek.py |
| configuration_deepseek_v3.py |
| configuration_llama.py |
| custom_cache.py |
| modeling_deepseek.py |
| modeling_deepseek_v3.py |
| modeling_llama.py |
| modeling_mixtral.py |
| modeling_qwen2_moe.py |