wang jiahao
ac95b6c710
Merge pull request #1015 from kvcache-ai/qiyuxinlin-patch-1
...
Update balance-serve.md
2025-04-02 14:22:30 +08:00
wang jiahao
ee179c2ad0
Update balance-serve.md
2025-04-02 14:22:15 +08:00
wang jiahao
a41d216393
Merge pull request #1013 from kvcache-ai/work-concurrent
...
In v0.2.4, we’ve added the highly requested multi-concurrency support through a major refactor of the whole architecture.
2025-04-02 14:09:10 +08:00
dongjw
4ed9744ebb
update readme
2025-04-02 14:02:57 +08:00
dongjw
b62cefaec9
update readme
2025-04-02 13:11:01 +08:00
dongjw
d41dd23b14
update Dockerfile
2025-04-02 12:10:58 +08:00
dongjw
65798994cb
update Dockerfile
2025-04-02 12:04:09 +08:00
dongjw
56a18ad02c
change tag v0.2.4
2025-04-01 21:07:13 +08:00
Azure-Tang
d98433c2d1
update git action env, add USE_BALANCE_SERVE=1
2025-04-01 12:58:28 +00:00
dongjw
5c7ed7b579
fix top_p = 0 bug
2025-04-01 20:38:33 +08:00
Azure-Tang
aeabd783b0
update git action env, add BALANCE_SERVE=1
2025-04-01 11:21:55 +00:00
Azure-Tang
31677181c3
Fix ktransformers-server flashinfer wrapper position arg issue;
...
Fix db position issue
2025-04-01 07:30:23 +00:00
Azure-Tang
203b853c75
rm KMoEGateDeepSeekV3, fall back to KMoEGate
2025-04-01 07:13:05 +00:00
Azure-Tang
3a5330b215
Merge branch 'main' into work-concurrent
2025-04-01 06:48:19 +00:00
fishingfly
7549ff335a
fix: refine backend error message to include ROCM_HOME
...
Signed-off-by: fishingfly <zhoyuzf@163.com>
2025-04-01 10:50:38 +08:00
Atream
80c5cbecdd
add nlohmann
2025-04-01 10:38:45 +08:00
Atream
9360d1e3c8
add submodules
2025-03-31 23:20:29 +08:00
Atream
25cee5810e
add balance-serve, support concurrency
2025-03-31 22:55:32 +08:00
Atream
8d0292aa44
refactor folders
2025-03-31 22:45:37 +08:00
Yuhao Tsui
84164f584c
Update completions.py
2025-03-26 15:39:46 +08:00
Yuhao Tsui
52fa671c10
Merge branch 'kvcache-ai:main' into main
2025-03-26 11:06:00 +08:00
Atream
f142f4dff3
Merge pull request #956 from kvcache-ai/Atream-patch-7
...
Update README.md
2025-03-22 12:14:48 +08:00
Atream
d4c6c2bb02
Update README.md
2025-03-22 12:14:36 +08:00
Aubrey Li
a12e8ab46e
yaml: fix Marlin AssertionError
...
Marlin quantized linear only supports GPU devices. When generate_op is changed
to "KLinearMarlin", generate_device needs to be changed to "cuda" accordingly.
Fixes: e5b001d76f
("Update readme; Format code; Add example yaml.")
2025-03-21 23:58:20 +08:00
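The rule described in commit a12e8ab46e can be sketched as a small check over one rule's kwargs. This is a minimal sketch, not code from the repository: the helper name `check_marlin_device` is hypothetical, the kwargs are assumed to be already loaded into a plain dict, and treating any device string that starts with "cuda" (for example "cuda:0") as a GPU device is an assumption.

```python
def check_marlin_device(kwargs):
    """Return a list of problems found in one rule's kwargs (hypothetical helper).

    Encodes only the constraint stated in the commit message: if generate_op
    is "KLinearMarlin", generate_device must be a CUDA device.
    """
    problems = []
    op = kwargs.get("generate_op")
    device = str(kwargs.get("generate_device", ""))
    # Assumption: "cuda" and indexed forms such as "cuda:0" both count as GPU.
    if op == "KLinearMarlin" and not device.startswith("cuda"):
        problems.append(
            'generate_op is "KLinearMarlin" but generate_device is '
            f'"{device}"; Marlin quantized linear only supports GPU devices'
        )
    return problems


if __name__ == "__main__":
    # A rule that mixes the Marlin op with a CPU device is reported.
    bad = {"generate_op": "KLinearMarlin", "generate_device": "cpu"}
    for problem in check_marlin_device(bad):
        print(problem)
```

Running a check like this over every rule before loading a model would surface the mismatch as a readable message instead of an AssertionError at run time.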
Aubrey Li
f4d52d1f0c
Restore CPU offloading capability
2025-03-21 10:04:31 +08:00
Jiaqi Liao
05f6cede37
Merge pull request #943 from SkqLiao/main
...
fix benchmark params for human eval benchmark
2025-03-20 18:49:34 +08:00
SkqLiao
6d4626a5d9
fix params
2025-03-20 18:48:51 +08:00
Atream
ddd35d5be9
Merge pull request #940 from kvcache-ai/Atream-patch-6
...
Update gate.py
2025-03-20 14:54:20 +08:00
Atream
633af5d235
Update gate.py
2025-03-20 14:54:01 +08:00
SkqLiao
8cc4df980e
use DeepSeek V3 instead of R1 for benchmarking
2025-03-20 11:59:03 +08:00
Jiaqi Liao
32a91c78c1
Merge pull request #935 from SkqLiao/main
...
Fix benchmarking slow issue on self-hosted actions
2025-03-20 10:14:37 +08:00
SkqLiao
e7d7d2705c
rename CI/CD
2025-03-20 10:11:24 +08:00
SkqLiao
19c824f9d0
change cpu-infer to match the actual number of CPU cores on the self-hosted server.
2025-03-20 10:10:52 +08:00
Jiaqi Liao
649489dc67
Merge pull request #931 from SkqLiao/main
...
Add Human Eval Benchmark Test for CI/CD
2025-03-19 21:35:24 +08:00
SkqLiao
bad334fa5b
fix path
2025-03-19 21:28:58 +08:00
SkqLiao
bc369b256c
add CI/CD for human eval score benchmarking
2025-03-19 21:25:21 +08:00
Atream
8be56a0190
Merge pull request #927 from kvcache-ai/fix-gate-precision
...
Update gate.py
2025-03-19 16:16:31 +08:00
Atream
b453333f60
Update gate.py
2025-03-19 16:14:54 +08:00
Atream
6ca233cca3
Merge pull request #926 from kvcache-ai/Atream-patch-5
...
Update gate.py
2025-03-19 12:17:09 +08:00
Atream
44599229cd
Update gate.py
2025-03-19 12:16:48 +08:00
Atream
aa8f985f85
Merge pull request #925 from kvcache-ai/fix-gate-compile
...
fix-gate-compile
2025-03-19 11:44:41 +08:00
Atream
114995355b
fix-gate-compile
2025-03-19 11:27:18 +08:00
ZiWei Yuan
e788248364
Merge pull request #916 from kvcache-ai/patch_v0.2.3post2
...
📝 fix typo ktransformer->ktransformers
2025-03-17 17:55:30 +08:00
liam
4748a912e2
📝 fix typo ktransformer->ktransformers
2025-03-17 17:54:00 +08:00
Atream
8b51b0f058
Merge pull request #915 from kvcache-ai/Atream-patch-4
...
Atream patch 4
2025-03-17 17:05:39 +08:00
Atream
167506b779
Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml
2025-03-17 17:05:01 +08:00
Atream
c9a0c44213
Update DeepSeek-V3-Chat-multi-gpu-fp8-linear-ggml-experts.yaml
2025-03-17 17:03:52 +08:00
Atream
3aee0fa099
Merge pull request #913 from kvcache-ai/Atream-patch-3
...
Add files via upload
2025-03-17 17:00:28 +08:00
Atream
094ac8f3a4
Add files via upload
2025-03-17 16:59:57 +08:00
ZiWei Yuan
8a8311cb04
Merge pull request #911 from kvcache-ai/patch_v0.2.3post2
...
🔧 update multi-gpu-fp8-linear and multi-gpu marlin yaml
2025-03-17 15:09:11 +08:00