Atream
80c5cbecdd
add nlohmann
2025-04-01 10:38:45 +08:00
Atream
9360d1e3c8
add submodules
2025-03-31 23:20:29 +08:00
Atream
25cee5810e
add balance-serve, support concurrency
2025-03-31 22:55:32 +08:00
Atream
8d0292aa44
refactor folders
2025-03-31 22:45:37 +08:00
Yuhao Tsui
84164f584c
Update completions.py
2025-03-26 15:39:46 +08:00
Yuhao Tsui
52fa671c10
Merge branch 'kvcache-ai:main' into main
2025-03-26 11:06:00 +08:00
Atream
f142f4dff3
Merge pull request #956 from kvcache-ai/Atream-patch-7
Update README.md
2025-03-22 12:14:48 +08:00
Atream
d4c6c2bb02
Update README.md
2025-03-22 12:14:36 +08:00
Aubrey Li
a12e8ab46e
yaml: fix Marlin AssertionError
Marlin quantized linear only supports GPU devices. When generate_op is changed
to "KLinearMarlin", generate_device needs to be changed to "cuda" accordingly.
Fixes: e5b001d76f ("Update readme; Format code; Add example yaml.")
2025-03-21 23:58:20 +08:00
Aubrey Li
f4d52d1f0c
Restore CPU offloading capability
2025-03-21 10:04:31 +08:00
Jiaqi Liao
05f6cede37
Merge pull request #943 from SkqLiao/main
fix benchmark params for human eval benchmark
2025-03-20 18:49:34 +08:00
SkqLiao
6d4626a5d9
fix params
2025-03-20 18:48:51 +08:00
Atream
ddd35d5be9
Merge pull request #940 from kvcache-ai/Atream-patch-6
Update gate.py
2025-03-20 14:54:20 +08:00
Atream
633af5d235
Update gate.py
2025-03-20 14:54:01 +08:00
SkqLiao
8cc4df980e
use DeepSeek V3 instead of R1 for benchmarking
2025-03-20 11:59:03 +08:00
Jiaqi Liao
32a91c78c1
Merge pull request #935 from SkqLiao/main
Fix benchmarking slow issue on self-hosted actions
2025-03-20 10:14:37 +08:00
SkqLiao
e7d7d2705c
rename CI/CD
2025-03-20 10:11:24 +08:00
SkqLiao
19c824f9d0
change cpu-infer due to actual cpu cores on self-hosted server.
2025-03-20 10:10:52 +08:00
Jiaqi Liao
649489dc67
Merge pull request #931 from SkqLiao/main
Add Human Eval Benchmark Test for CI/CD
2025-03-19 21:35:24 +08:00
SkqLiao
bad334fa5b
fix path
2025-03-19 21:28:58 +08:00
SkqLiao
bc369b256c
add CI/CD for human eval score benchmarking
2025-03-19 21:25:21 +08:00
Atream
8be56a0190
Merge pull request #927 from kvcache-ai/fix-gate-precision
Update gate.py
2025-03-19 16:16:31 +08:00
Atream
b453333f60
Update gate.py
2025-03-19 16:14:54 +08:00
Atream
6ca233cca3
Merge pull request #926 from kvcache-ai/Atream-patch-5
Update gate.py
2025-03-19 12:17:09 +08:00
Atream
44599229cd
Update gate.py
2025-03-19 12:16:48 +08:00
Atream
aa8f985f85
Merge pull request #925 from kvcache-ai/fix-gate-compile
fix-gate-compile
2025-03-19 11:44:41 +08:00
Atream
114995355b
fix-gate-compile
2025-03-19 11:27:18 +08:00
ZiWei Yuan
e788248364
Merge pull request #916 from kvcache-ai/patch_v0.2.3post2
📝 fix typo ktransformer->ktransformers
2025-03-17 17:55:30 +08:00
liam
4748a912e2
📝 fix typo ktransformer->ktransformers
2025-03-17 17:54:00 +08:00
Atream
8b51b0f058
Merge pull request #915 from kvcache-ai/Atream-patch-4
Atream patch 4
2025-03-17 17:05:39 +08:00
Atream
167506b779
Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml
2025-03-17 17:05:01 +08:00
Atream
c9a0c44213
Update DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml
2025-03-17 17:03:52 +08:00
Atream
3aee0fa099
Merge pull request #913 from kvcache-ai/Atream-patch-3
Add files via upload
2025-03-17 17:00:28 +08:00
Atream
094ac8f3a4
Add files via upload
2025-03-17 16:59:57 +08:00
ZiWei Yuan
8a8311cb04
Merge pull request #911 from kvcache-ai/patch_v0.2.3post2
🔧 update multi-gpu-fp8-linear and multi-gpu marlin yaml
2025-03-17 15:09:11 +08:00
liam
19f058ec9e
🔧 update multi-gpu-fp8-linear and multi-gpu marlin yaml
2025-03-17 15:08:12 +08:00
Azure
0e93a09d67
Merge pull request #906 from Azure-Tang/main
[Fix] Fix rocm example yaml
2025-03-16 10:27:59 +08:00
Azure-Tang
85c32fdd10
Fix rocm example yaml
2025-03-15 22:27:02 -04:00
Azure
63604cac59
Merge pull request #904 from Azure-Tang/main
[fix]Fix rocm compilation
2025-03-16 00:36:16 +08:00
Azure-Tang
4a31237346
fix rocm compilation
2025-03-15 12:34:03 -04:00
Atream
c51818c39a
Merge pull request #902 from kvcache-ai/rollback-triton-prefill
rollback-triton-prefill
2025-03-15 23:09:30 +08:00
Atream
3934b9dfc1
rollback-triton-prefill
2025-03-15 14:21:21 +00:00
ZiWei Yuan
bda9cf15e7
Merge pull request #899 from kvcache-ai/develop-0.2.3post2
⚡ fix readme path
2025-03-15 19:20:52 +08:00
liam
ee02a111d7
⚡ fix readme path
2025-03-15 19:20:04 +08:00
ZiWei Yuan
9b76cab1a5
Merge pull request #898 from kvcache-ai/develop-0.2.3post2
Release 0.2.3post2
2025-03-15 18:11:42 +08:00
liam
b5ef7c26dc
🔖 release v0.2.3post2
2025-03-15 18:04:10 +08:00
Jiaqi Liao
dfe09b05dd
Merge pull request #897 from SkqLiao/main
Add Unit Test for Local Chat
2025-03-15 17:42:48 +08:00
SkqLiao
c66ca65778
write to log
2025-03-15 17:10:44 +08:00
SkqLiao
a1891b845d
remove unsupported parameters, add force think
2025-03-15 17:04:42 +08:00
SkqLiao
4e23a4c024
split into two tests
2025-03-15 11:32:43 +08:00