Commit graph

20 commits

Author | SHA1 | Message | Date
Atream | 7e1fe256c8 | optimize GPU | 2025-02-21 05:06:57 +00:00
Atream | 038bc30888 | fix precision bug imported by position_ids in 0.2.0 | 2025-02-17 09:23:14 +00:00
liam | 4385e85096 | support force thinking | 2025-02-12 12:43:53 +08:00
liam | d07087a7e2 | support R1 force thinking | 2025-02-11 15:43:41 +08:00
liam | 098602b08f | v0.2 ongoing | 2025-02-09 22:41:14 +08:00
liam | c18ecd7b7f | add flush print in local_chat output and change default optimize yaml of deepseekv3 to single gpu | 2025-02-08 13:15:52 +08:00
liam | 0262f954c7 | Merge branch 'feat-DeepSeekV3' of github.com:kvcache-ai/ktransformers into feat-DeepSeekV3 | 2025-02-06 22:41:25 +08:00
liam | 3dca28d23b | fix moe.cpp int overflow problem | 2025-02-06 22:39:16 +08:00
Azure | 027b11266c | modify moeinfer param | 2025-02-06 14:07:38 +00:00
Azure | ee24a27001 | update v3 single gpu rule yaml; | 2025-02-04 16:14:35 +00:00
Azure | f873558a89 | update rope calculation; update modeling.py; update gate for moe | 2025-02-01 07:32:21 +00:00
Azure | 5a50b34627 | fix hard coding caused by rope dim calculation, load from config now | 2025-01-31 15:25:50 +00:00
Azure | 476b1d8dc6 | support deepseekv3; runable but have precition problem | 2025-01-31 08:27:24 +00:00
anyanqilin | a72dc6ed15 | wjh change | 2024-11-04 14:02:19 +08:00
liam | 7c94df4bcf | 🚑️: back transformer.py bugs version, and fix typo error in local_chat.py | 2024-11-04 14:02:19 +08:00
liam | dd1d8667f3 | : refactor local_chat and fix message slice bug in server | 2024-11-04 14:02:19 +08:00
TangJingqi | 6735beb5b6 | Fix cannot offload whole layer in cpu | 2024-08-29 19:10:14 +08:00
chenxl | 4d1d561d28 | [feature] release 0.1.3 | 2024-08-28 16:11:43 +00:00
chenxl | f5f79f5c0e | [ADD] support multi-gpu qlen>1 q5_k | 2024-08-12 11:41:26 +00:00
chenxl | 18c42e67df | Initial commit | 2024-07-27 16:06:58 +08:00