Jiaqi Liao
|
05f6cede37
|
Merge pull request #943 from SkqLiao/main
fix benchmark params for human eval benchmark
|
2025-03-20 18:49:34 +08:00 |
|
SkqLiao
|
6d4626a5d9
|
fix params
|
2025-03-20 18:48:51 +08:00 |
|
Atream
|
633af5d235
|
Update gate.py
|
2025-03-20 14:54:01 +08:00 |
|
SkqLiao
|
8cc4df980e
|
use DeepSeek V3 instead of R1 for benchmarking
|
2025-03-20 11:59:03 +08:00 |
|
Jiaqi Liao
|
32a91c78c1
|
Merge pull request #935 from SkqLiao/main
Fix benchmarking slow issue on self-hosted actions
|
2025-03-20 10:14:37 +08:00 |
|
SkqLiao
|
19c824f9d0
|
change cpu-infer due to actual cpu cores on self-hosted server.
|
2025-03-20 10:10:52 +08:00 |
|
Jiaqi Liao
|
649489dc67
|
Merge pull request #931 from SkqLiao/main
Add Human Eval Benchmark Test for CI/CD
|
2025-03-19 21:35:24 +08:00 |
|
SkqLiao
|
bc369b256c
|
add CI/CD for human eval score benchmarking
|
2025-03-19 21:25:21 +08:00 |
|
Atream
|
b453333f60
|
Update gate.py
|
2025-03-19 16:14:54 +08:00 |
|
Atream
|
44599229cd
|
Update gate.py
|
2025-03-19 12:16:48 +08:00 |
|
Atream
|
114995355b
|
fix-gate-compile
|
2025-03-19 11:27:18 +08:00 |
|
Atream
|
167506b779
|
Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml
|
2025-03-17 17:05:01 +08:00 |
|
Atream
|
c9a0c44213
|
Update DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml
|
2025-03-17 17:03:52 +08:00 |
|
liam
|
19f058ec9e
|
🔧 update multi-gpu-fp8-linear and multi-gpu marlin yaml
|
2025-03-17 15:08:12 +08:00 |
|
Azure-Tang
|
85c32fdd10
|
Fix rocm example yaml
|
2025-03-15 22:27:02 -04:00 |
|
Azure-Tang
|
4a31237346
|
fix rocm compilation
|
2025-03-15 12:34:03 -04:00 |
|
Atream
|
3934b9dfc1
|
rollback-triton-prefill
|
2025-03-15 14:21:21 +00:00 |
|
ZiWei Yuan
|
9b76cab1a5
|
Merge pull request #898 from kvcache-ai/develop-0.2.3post2
Release 0.2.3post2
|
2025-03-15 18:11:42 +08:00 |
|
liam
|
b5ef7c26dc
|
🔖 release v0.2.3post2
|
2025-03-15 18:04:10 +08:00 |
|
Azure
|
117a8d2f2a
|
fix compilation
|
2025-03-14 19:49:20 +00:00 |
|
SkqLiao
|
0f1684c28d
|
local chat for cicd test
|
2025-03-15 02:31:19 +08:00 |
|
Azure
|
3986e2d2cf
|
Merge pull request #178 from fxzjshm/hip
[Feat] Port to ROCm/HIP
|
2025-03-15 02:31:07 +08:00 |
|
Azure-Tang
|
e5b001d76f
|
Update readme; Format code; Add example yaml.
|
2025-03-14 14:25:52 -04:00 |
|
Atream
|
a889288fc1
|
use compile for gate, slight performance improvement
|
2025-03-14 12:43:28 +00:00 |
|
Azure-Tang
|
ed8437413b
|
merge main; Add torch q8 linear
|
2025-03-14 05:52:07 -04:00 |
|
Atream
|
6f43bbe55f
|
fix-singleton
|
2025-03-14 04:16:53 +00:00 |
|
Lander-Hatsune
|
d166fb9f6e
|
cpuinfer: filter repeated backend instantiation
|
2025-03-10 22:03:04 +08:00 |
|
Atream
|
09c043d8a6
|
Merge pull request #842 from BITcyman/fix-openai_chat_completion
[fix] thread context bug
|
2025-03-07 22:56:19 +08:00 |
|
BITcyman
|
08a8b553d6
|
[fix] thread context bug
|
2025-03-07 14:52:16 +00:00 |
|
Atream
|
f8c1821f1d
|
Update __init__.py
|
2025-03-07 22:08:48 +08:00 |
|
Atream
|
d453c320f1
|
fix flashinfer precision
|
2025-03-07 14:07:00 +00:00 |
|
BITcyman
|
299c4dca64
|
[update] support openai chat completion api
|
2025-03-07 08:51:09 +00:00 |
|
ZiWei Yuan
|
63b1c8525b
|
Merge pull request #820 from kvcache-ai/develop-0.2.3
Develop 0.2.3 ready to release
|
2025-03-06 14:46:09 +08:00 |
|
liam
|
8eeb6dd432
|
⚡ update compile option for avx512vpopcntdq
|
2025-03-06 12:18:04 +08:00 |
|
chenmz00
|
b2ba795cfd
|
fix: list models API
Fix the list models API to match the corresponding OpenAI API format.
|
2025-03-05 21:49:27 +08:00 |
|
liam
|
9c343b4f71
|
🔖 release v0.2.3
|
2025-03-05 20:24:11 +08:00 |
|
liam
|
848fe8ab97
|
⚡ release v0.2.3
|
2025-03-05 20:21:04 +08:00 |
|
Azure
|
d7becadcf7
|
Merge branch 'develop-0.2.3' of https://github.com/kvcache-ai/ktransformers into develop-0.2.3
|
2025-03-05 09:26:23 +00:00 |
|
Azure
|
662c1e4c14
|
small fix about max new token
|
2025-03-05 09:25:41 +00:00 |
|
liam
|
dc10480ef6
|
⚡ add humaneval support
|
2025-03-04 20:54:49 +08:00 |
|
Yi Pan
|
01755a60c0
|
fix: wrong shape in KLinearMarlin.
|
2025-03-03 17:34:45 +08:00 |
|
Atream
|
8963ae7817
|
Update __init__.py
|
2025-03-03 16:49:50 +08:00 |
|
wang jiahao
|
48b9800790
|
Merge pull request #759 from 3wweiweiwu/fix_top_p_typo
fix typo for top_p
|
2025-03-02 13:58:11 +08:00 |
|
1668068727@qq.com
|
7cdf8139f0
|
fix ollama api temperature bug
|
2025-03-02 13:55:26 +08:00 |
|
Wix Woo
|
3aa0cfc29d
|
fix typo for top_p
|
2025-03-01 20:15:36 +00:00 |
|
Atream
|
ca1dc1e7d1
|
Merge branch 'main' into main
|
2025-03-01 23:24:10 +08:00 |
|
宁鹏涛
|
71286ec1c0
|
Update local_chat.py
Fix config.architectures[0] == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM" always evaluating to true
|
2025-03-01 21:52:48 +08:00 |
|
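The local_chat.py fix in 71286ec1c0 addresses a classic Python pitfall. `a == "X" or "Y"` parses as `(a == "X") or "Y"`, and a non-empty string literal is always truthy, so the condition holds for every architecture. The sketch below reproduces that bug class and shows one idiomatic fix, a tuple membership test. It is illustrative only: the function names and the "LlamaForCausalLM" sample value are hypothetical, and the actual patch in the commit may be written differently.

```python
# Illustrative sketch of the bug class fixed in 71286ec1c0 (local_chat.py).
# `arch` stands in for config.architectures[0]; function names are hypothetical.

DEEPSEEK_ARCHS = ("DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM")


def is_deepseek_buggy(arch: str) -> bool:
    # Parsed as (arch == "DeepseekV2ForCausalLM") or "DeepseekV3ForCausalLM".
    # When the comparison is False, `or` returns the non-empty string,
    # which is truthy, so bool(...) is True for any input.
    return bool(arch == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM")


def is_deepseek_fixed(arch: str) -> bool:
    # Membership test compares arch against each architecture name explicitly.
    return arch in DEEPSEEK_ARCHS


if __name__ == "__main__":
    # "LlamaForCausalLM" is an arbitrary non-DeepSeek sample value.
    for arch in (*DEEPSEEK_ARCHS, "LlamaForCausalLM"):
        print(arch, is_deepseek_buggy(arch), is_deepseek_fixed(arch))
```

Only the last row differs between the two checks: the buggy version reports True for the non-DeepSeek architecture, while the membership test correctly reports False.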
Atream
|
fa03ea48dd
|
Merge branch 'main' into feat-chunk-prefill-flashinfer
|
2025-03-01 11:35:09 +00:00 |
|
Atream
|
f35e8d41d8
|
support chunk prefill, support 139K context for 24G VRAM
|
2025-03-01 11:28:25 +00:00 |
|
liam
|
80e0536fb0
|
Merge branch 'main' of https://github.com/KMSorSMS/ktransformers into main
|
2025-03-01 00:12:21 +08:00 |
|