Atream
cf79c93fae
Update README.md
2025-07-11 09:35:12 +08:00
Atream
18690d819f
Update README.md
2025-07-11 09:34:07 +08:00
Atream
b4ac21454b
Create Kimi-K2.md
2025-07-11 09:31:47 +08:00
Atream
890b0f1622
Merge pull request #1410 from kvcache-ai/Atream-patch-1
...
Update __init__.py
2025-07-01 16:43:42 +08:00
Atream
5bd40c33eb
Update __init__.py
2025-07-01 16:43:19 +08:00
aubreyli
f96aab3c85
Merge pull request #1409 from rnwang04/fix_fp16
...
revert using FP16 in XPU
2025-07-01 15:00:41 +08:00
rnwang04
5b5deda420
revert using FP16
2025-07-01 14:24:27 +08:00
ErvinXie
495ae37478
Merge pull request #1407 from kvcache-ai/v0.3.2
...
V0.3.2
2025-07-01 10:26:56 +08:00
ouqingliang
90cff820cf
update kvc disk path config.
2025-06-30 15:09:35 +00:00
ErvinXie
aadf31b35d
Update README.md
2025-06-30 17:55:49 +08:00
ErvinXie
5a73aaf652
Update prefix_cache.md
2025-06-30 15:04:37 +08:00
ErvinXie
a9a72e52c3
Update README.md
2025-06-30 14:56:46 +08:00
ErvinXie
d3fae09252
Merge pull request #1405 from kvcache-ai/prefix-cache
...
Prefix cache
2025-06-30 14:39:26 +08:00
ouqingliang
cc822df65d
add prefix cache documentation
2025-06-28 07:13:33 +00:00
ouqingliang
4d51831316
fix MPSC
2025-06-26 13:11:40 +00:00
ouqingliang
3b4a1c7532
add prefix cache support for kvc2.
2025-06-26 04:57:25 +00:00
ouqingliang
b154441072
add prefix cache to kvc2.
2025-06-26 04:56:43 +00:00
ZiWei Yuan
ee5ee1103b
Merge pull request #1399 from KMSorSMS/main
...
✨ update vendor ZTE name
2025-06-23 21:10:04 +08:00
ZiWei Yuan
be13587fe6
Merge branch 'kvcache-ai:main' into main
2025-06-23 21:09:10 +08:00
liam Yuan
22d0d9ccb2
✨ update vendor ZTE name
2025-06-23 21:07:17 +08:00
ZiWei Yuan
06e45fd7d1
Merge pull request #1398 from KMSorSMS/main
...
✨ update vendor support list
2025-06-23 21:04:19 +08:00
liam Yuan
cb77b52c63
✨ update vendor support list
2025-06-23 21:00:01 +08:00
ZiWei Yuan
90888fee0d
Merge pull request #1372 from JaydenChao101/patch-1
...
Include all packages
2025-06-15 21:08:53 +08:00
Azure
64ec0ec148
Merge pull request #1379 from aubreyli/main
...
Load DS-R1-0528 for module is BaseInjectedModule instance
2025-06-11 16:05:04 +08:00
Aubrey Li
b599111b04
Load DS-R1-0528 for module is BaseInjectedModule instance
2025-06-10 11:31:58 +08:00
Jayden Chao
2637d285bf
Update pyproject.toml
2025-06-07 21:28:49 +08:00
wang jiahao
dcba29b291
Merge pull request #1366 from zhouye/main
...
Fix kv_b_proj shape for unsloth quantized models
2025-06-07 18:14:00 +08:00
Ye Zhou
255c0fcf3b
Fix kv_b_proj shape for unsloth quantized models
2025-06-05 17:33:11 +08:00
Atream
7071970339
Merge pull request #1343 from zhouye/main
...
Mirror #1247 in server mode
2025-06-02 16:07:04 +08:00
Atream
cc367f5814
Merge pull request #1351 from kvcache-ai/load-DeepSeek-0528
...
[Patch] load DeepSeek-R1-0528 and enable CPU GGUF dequant when GPU dequant is not implemented
2025-06-01 19:13:57 +08:00
qiyuxinlin
a6b3243a56
[Patch] load DeepSeek-R1-0528
2025-05-31 14:19:20 +00:00
Atream
ac48a58cca
Merge pull request #1350 from kvcache-ai/Update-WeChatGroup
...
Add files via upload
2025-05-31 13:34:14 +08:00
Atream
44143d972f
Add files via upload
2025-05-31 13:33:36 +08:00
Emmanuel Ferdman
d8bc6402b5
raise exception on device error ( #1342 )
...
* display the unavailable torch device on error
* Raise exception on device error
---------
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-29 15:37:41 +08:00
Ye Zhou
00949d5e8d
Mirror #1247 in server mode
2025-05-29 15:30:40 +08:00
aubreyli
ce75fcd7dd
Merge pull request #1337 from liu-shaojun/docker_xpu
...
Add Dockerfile and usage guide for XPU support
2025-05-28 14:08:46 +08:00
Shaojun Liu
404ad39a04
docs: add Dockerfile.xpu and GPU driver setup instructions
...
- Add Dockerfile.xpu for oneAPI-based container
- Create Docker_xpu.md with usage instructions
- Update xpu.md to include Docker guide
2025-05-28 13:55:35 +08:00
wang jiahao
0c44f2e211
Merge pull request #1331 from rnwang04/qwen3_xpu_support
...
add XPU support for qwen3moe local chat
2025-05-23 10:38:28 +08:00
qiyuxinlin
71a5fc5770
fix local_chat.py chunk_size not taking effect for experts
2025-05-23 02:35:01 +00:00
rnwang04
adc0906967
add XPU support for qwen3moe local chat
2025-05-22 21:01:41 +08:00
Chen Hongtao
25893366b6
Merge pull request #1328 from chenht2022/main
...
Fix NaN bug
2025-05-21 11:46:48 +08:00
chenht2022
66453981ff
Fix NaN bug
2025-05-21 03:39:49 +00:00
Atream
7d79735bd0
Merge pull request #1323 from kvcache-ai/Atream-patch-2
...
Update version
2025-05-19 23:21:56 +08:00
Atream
4f78e37625
Update version
2025-05-19 23:21:23 +08:00
Atream
01311d251d
Merge pull request #1320 from aubreyli/no_cuda_graph_err
...
VLinearMarlin: padding to input.shape[0] to avoid CUDA error
2025-05-18 02:45:05 -06:00
Aubrey Li
d347aeb518
VLinearMarlin: padding to input.shape[0] to avoid CUDA error
...
Fix the following runtime error with the --no-use_cuda_graph option:
Traceback (most recent call last):
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 282, in run_engine
    engine.loop()
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 234, in loop
    self.model_runner.run(self.batch, self.query_manager)
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/balance_serve/inference/model_runner.py", line 220, in run
    self.output.logits[0] = self.output.logits[0][self.input[cuda_graph_idx].minibatch.logits_start]
                            ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-05-18 15:11:37 +08:00
wang jiahao
32f3d7befb
Merge pull request #1307 from kvcache-ai/hyc
...
add xpu parameters to install.sh
2025-05-17 15:25:33 +08:00
Alisehen
5b08d5b07b
fix
2025-05-17 07:22:51 +00:00
aubreyli
551ebc91c7
Merge pull request #1313 from rnwang04/update_ipex_llm_version
...
fix ipex-llm version to 2.3.0rc1
2025-05-16 13:28:12 +08:00
rnwang04
a56aa45186
fix ipex-llm version to 2.3.0rc1
2025-05-16 12:22:08 +08:00