rnwang04
adc0906967
add XPU support for qwen3moe local chat
2025-05-22 21:01:41 +08:00
Chen Hongtao
25893366b6
Merge pull request #1328 from chenht2022/main
...
Fix NaN bug
2025-05-21 11:46:48 +08:00
chenht2022
66453981ff
Fix NaN bug
2025-05-21 03:39:49 +00:00
Atream
7d79735bd0
Merge pull request #1323 from kvcache-ai/Atream-patch-2
...
Update version
2025-05-19 23:21:56 +08:00
Atream
4f78e37625
Update version
2025-05-19 23:21:23 +08:00
Atream
01311d251d
Merge pull request #1320 from aubreyli/no_cuda_graph_err
...
VLinearMarlin: padding to input.shape[0] to avoid CUDA error
2025-05-18 02:45:05 -06:00
Aubrey Li
d347aeb518
VLinearMarlin: padding to input.shape[0] to avoid CUDA error
...
Fix the following runtime error with the --no-use_cuda_graph option:
Traceback (most recent call last):
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 282, in run_engine
    engine.loop()
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 234, in loop
    self.model_runner.run(self.batch, self.query_manager)
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/balance_serve/inference/model_runner.py", line 220, in run
    self.output.logits[0] = self.output.logits[0][self.input[cuda_graph_idx].minibatch.logits_start]
                            ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-05-18 15:11:37 +08:00
wang jiahao
32f3d7befb
Merge pull request #1307 from kvcache-ai/hyc
...
add xpu parameters to install.sh
2025-05-17 15:25:33 +08:00
Alisehen
5b08d5b07b
fix
2025-05-17 07:22:51 +00:00
aubreyli
551ebc91c7
Merge pull request #1313 from rnwang04/update_ipex_llm_version
...
fix ipex-llm version to 2.3.0rc1
2025-05-16 13:28:12 +08:00
rnwang04
a56aa45186
fix ipex-llm version to 2.3.0rc1
2025-05-16 12:22:08 +08:00
aubreyli
a3346c9a41
Merge pull request #1311 from kvcache-ai/wx-csy-patch-1
...
XPU.md: fix typos
2025-05-16 08:28:23 +08:00
Shaoyuan CHEN
5d194c5db0
Fix typos
2025-05-15 22:15:55 +08:00
Alisehen
f3b1e36b6a
bug fix
2025-05-15 10:01:51 +00:00
Atream
7faa776659
Merge pull request #1277 from Coekjan/patch-1
...
Fix typo about `GLIBCXX_3.4.32`
2025-05-15 01:58:00 -06:00
Alisehen
edd9efa49e
fix
2025-05-15 07:28:50 +00:00
Alisehen
055680e26c
add flashinfer to cuda device
2025-05-15 07:03:45 +00:00
Alisehen
f3be33a313
add xpu parameters to install.sh
2025-05-15 06:39:02 +00:00
aubreyli
af9472b518
Merge pull request #1306 from aubreyli/xpu-doc
...
xpu.md: add device discovery tips
2025-05-15 14:18:47 +08:00
Aubrey Li
72f6d93ffd
xpu.md: add device discovery tips
2025-05-15 14:12:26 +08:00
wang jiahao
8caecf37d8
Merge pull request #1305 from kvcache-ai/update-readme
...
fix deduplicate_and_sort cudagraphs
2025-05-15 12:10:20 +08:00
qiyuxinlin
b40f13abeb
fix deduplicate_and_sort cudagraphs
2025-05-15 04:09:34 +00:00
aubreyli
09f0ddc00b
Merge pull request #1303 from rnwang04/fix_typo_and_style
...
fix typo and code style, and update setup.py ValueError message
2025-05-15 10:55:58 +08:00
rnwang04
2f6e14a54b
fix md typo, fix code style, and update setup value error message
2025-05-15 10:14:39 +00:00
Atream
07c5f23da5
Merge pull request #1304 from kvcache-ai/Atream-patch-1
...
Update README.md
2025-05-14 20:29:54 -06:00
Atream
d051a14941
Update README.md
2025-05-15 10:29:43 +08:00
wang jiahao
2d3aaef8b6
Merge pull request #1301 from kvcache-ai/update-readme
...
update readme
2025-05-14 21:15:55 +08:00
qiyuxinlin
d35d61f6a1
update readme
2025-05-14 13:15:18 +00:00
qiyuxinlin
c3d0ac80c6
update readme
2025-05-14 13:13:10 +00:00
wang jiahao
ee524b0f41
Merge pull request #1300 from kvcache-ai/qiyuxinlin-patch-1
...
Update install.sh
2025-05-14 21:09:20 +08:00
wang jiahao
9fe3f35c37
Update install.sh
2025-05-14 21:08:58 +08:00
aubreyli
f7ee993fdc
Merge pull request #1295 from rnwang04/xpu_support
...
Enable ktransformers on Intel GPU with local chat backend
2025-05-14 20:58:35 +08:00
rnwang04
142fb7ce6c
Enable support for Intel XPU devices, add support for DeepSeek V2/V3 first
2025-05-14 19:37:27 +00:00
wang jiahao
333351c7c8
Merge pull request #1298 from kvcache-ai/fix-workspace-buffer
...
update norm cpu kernel
2025-05-14 17:50:40 +08:00
qiyuxinlin
ecc01cda17
update norm cpu kernel
2025-05-14 09:49:35 +00:00
wang jiahao
8974cc9d75
Merge pull request #1297 from kvcache-ai/fix-workspace-buffer
...
update torch MLA kernel
2025-05-14 17:46:55 +08:00
qiyuxinlin
64742bec83
update torch MLA kernel
2025-05-14 09:45:12 +00:00
wang jiahao
4e015ccc65
Merge pull request #1296 from kvcache-ai/fix-workspace-buffer
...
fix flashinfer float_workspace_buffer small
2025-05-14 17:35:27 +08:00
qiyuxinlin
e8e83308a9
fix flashinfer float_workspace_buffer small
2025-05-14 09:33:52 +00:00
wang jiahao
02948bc1b8
Merge pull request #1289 from kvcache-ai/update-default-config
...
update default config
2025-05-13 20:23:25 +08:00
qiyuxinlin
697444905a
update default config
2025-05-13 12:20:21 +00:00
wang jiahao
8456222852
Merge pull request #1276 from kvcache-ai/support_load_safetensor
...
support safetensor load, delete architectures argument
2025-05-12 11:10:26 +08:00
Yip Coekjan
1edc6d9de0
Fix typo about GLIBCXX_3.4.32
2025-05-09 20:34:45 +08:00
qiyuxinlin
c6aa379de2
support safetensor load, delete architectures argument
2025-05-09 10:38:29 +00:00
Atream
30eab48a75
Merge pull request #799 from aubreyli/cpu_offloading
...
Restore CPU offloading capability
2025-05-09 00:38:54 -06:00
Atream
8025def197
Merge pull request #1246 from aubreyli/GenerationMixin
...
modeling_deepseek_v3: fix GenerationMixin warning
2025-05-09 00:35:15 -06:00
Atream
900a7f7c3e
Merge pull request #1271 from kvcache-ai/fix-AMX
...
fix AMX
2025-05-07 05:12:38 -06:00
Atream
b22cded890
fix AMX
2025-05-07 19:12:19 +08:00
Yaochen Han
3f14e311cb
Merge pull request #1247 from aubreyli/_get_logits_warper
...
ktransformers/utils: fix _get_logits_warper error
2025-05-07 15:22:35 +08:00
Aubrey Li
b3a1fcf471
ktransformers/utils: fix _get_logits_warper error
2025-05-01 08:13:09 +08:00