rnwang04
|
142fb7ce6c
|
Enable support for Intel XPU devices, add support for DeepSeek V2/V3 first
|
2025-05-14 19:37:27 +00:00 |
|
Atream
|
25cee5810e
|
add balance-serve, support concurrence
|
2025-03-31 22:55:32 +08:00 |
|
Azure-Tang
|
ed8437413b
|
merge main; Add torch q8 linear
|
2025-03-14 05:52:07 -04:00 |
|
Atream
|
6f43bbe55f
|
fix-singleton
|
2025-03-14 04:16:53 +00:00 |
|
宁鹏涛
|
71286ec1c0
|
Update local_chat.py
Fix config.architectures[0] == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM" always evaluating to true
|
2025-03-01 21:52:48 +08:00 |
|
Atream
|
f35e8d41d8
|
support chunk prefill, support 139K context for 24G VRAM
|
2025-03-01 11:28:25 +00:00 |
|
Atream
|
e645d84794
|
use generation config from json file in official repo
|
2025-02-27 11:48:34 +00:00 |
|
Atream
|
b443c7dfa2
|
Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill
Feat absorb for long prefill
|
2025-02-25 16:53:21 +08:00 |
|
Atream
|
f4c198bd42
|
support absorb for prefill long context
|
2025-02-25 08:52:02 +00:00 |
|
Azure
|
36fbeee341
|
Update doc
|
2025-02-25 08:21:18 +00:00 |
|
Atream
|
e8e02e5ccc
|
support Moonlight
|
2025-02-23 14:21:18 +00:00 |
|
DDong Jianwei
|
95d937c51d
|
tmp
|
2025-02-23 18:51:42 +08:00 |
|
Atream
|
7e1fe256c8
|
optimize GPU
|
2025-02-21 05:06:57 +00:00 |
|
Atream
|
038bc30888
|
fix precision bug imported by position_ids in 0.2.0
|
2025-02-17 09:23:14 +00:00 |
|
liam
|
4385e85096
|
⚡ support force thinking
|
2025-02-12 12:43:53 +08:00 |
|
liam
|
d07087a7e2
|
⚡ support R1 force thinking
|
2025-02-11 15:43:41 +08:00 |
|
liam
|
098602b08f
|
⚡ v0.2 ongoing
|
2025-02-09 22:41:14 +08:00 |
|
liam
|
c18ecd7b7f
|
⚡ add flush print in local_chat output and change default optimize yaml of deepseekv3 to single gpu
|
2025-02-08 13:15:52 +08:00 |
|
liam
|
0262f954c7
|
Merge branch 'feat-DeepSeekV3' of github.com:kvcache-ai/ktransformers into feat-DeepSeekV3
|
2025-02-06 22:41:25 +08:00 |
|
liam
|
3dca28d23b
|
⚡ fix moe.cpp int overflow problem
|
2025-02-06 22:39:16 +08:00 |
|
Azure
|
027b11266c
|
modify moeinfer param
|
2025-02-06 14:07:38 +00:00 |
|
Azure
|
ee24a27001
|
update v3 single gpu rule yaml;
|
2025-02-04 16:14:35 +00:00 |
|
Azure
|
f873558a89
|
update rope calculation; update modeling.py; update gate for moe
|
2025-02-01 07:32:21 +00:00 |
|
Azure
|
5a50b34627
|
fix hard coding caused by rope dim calculation, load from config now
|
2025-01-31 15:25:50 +00:00 |
|
Azure
|
476b1d8dc6
|
support deepseekv3; runnable but has precision problem
|
2025-01-31 08:27:24 +00:00 |
|
anyanqilin
|
a72dc6ed15
|
wjh change
|
2024-11-04 14:02:19 +08:00 |
|
liam
|
7c94df4bcf
|
🚑️: back transformer.py bugs version, and fix typo error in local_chat.py
|
2024-11-04 14:02:19 +08:00 |
|
liam
|
dd1d8667f3
|
✨: refactor local_chat and fix message slice bug in server
|
2024-11-04 14:02:19 +08:00 |
|
TangJingqi
|
6735beb5b6
|
Fix cannot offload whole layer in cpu
|
2024-08-29 19:10:14 +08:00 |
|
chenxl
|
4d1d561d28
|
[feature] release 0.1.3
|
2024-08-28 16:11:43 +00:00 |
|
chenxl
|
f5f79f5c0e
|
[ADD] support multi-gpu qlen>1 q5_k
|
2024-08-12 11:41:26 +00:00 |
|
chenxl
|
18c42e67df
|
Initial commit
|
2024-07-27 16:06:58 +08:00 |
|
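Note on commit 71286ec1c0: its message says the condition `config.architectures[0] == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM"` is always true. The sketch below shows why Python evaluates it that way, along with one common correction. The function names are hypothetical, and the membership test is an assumed fix for illustration; the log does not show the change actually made in local_chat.py.

```python
def buggy_check(arch: str) -> bool:
    # Parsed as (arch == "DeepseekV2ForCausalLM") or "DeepseekV3ForCausalLM".
    # The right-hand operand is a non-empty string, which is always truthy,
    # so the whole expression is truthy for every value of arch.
    return bool(arch == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM")


def fixed_check(arch: str) -> bool:
    # Assumed correction: compare arch against each name explicitly
    # by testing membership in a tuple of accepted architectures.
    return arch in ("DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM")


if __name__ == "__main__":
    for name in ("DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM", "LlamaForCausalLM"):
        print(name, buggy_check(name), fixed_check(name))
```

With the buggy form, any architecture name passes the check, so code guarded by it would run for models it was never meant to handle.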