Commit graph

323 commits

Author SHA1 Message Date
Atream
5bd40c33eb
Update __init__.py 2025-07-01 16:43:19 +08:00
rnwang04
5b5deda420 revert using FP16 2025-07-01 14:24:27 +08:00
ouqingliang
90cff820cf update kvc disk path config. 2025-06-30 15:09:35 +00:00
ouqingliang
3b4a1c7532 add prefix cache support for kvc2. 2025-06-26 04:57:25 +00:00
Aubrey Li
b599111b04 Load DS-R1-0528 for module is BaseInjectedModule instance 2025-06-10 11:31:58 +08:00
Ye Zhou
255c0fcf3b Fix kv_b_proj shape for unsloth quantized models 2025-06-05 17:33:11 +08:00
Atream
7071970339
Merge pull request #1343 from zhouye/main
Mirror #1247 in server mode
2025-06-02 16:07:04 +08:00
qiyuxinlin
a6b3243a56 [Patch] load DeepSeek-R1-0528 2025-05-31 14:19:20 +00:00
Emmanuel Ferdman
d8bc6402b5
raise exception on device error (#1342)
* display the unavailable torch device on error

* Raise exception on device error

---------

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-29 15:37:41 +08:00
Ye Zhou
00949d5e8d Mirror #1247 in server mode 2025-05-29 15:30:40 +08:00
qiyuxinlin
71a5fc5770 fix local_chat.py chunk_size not effect experts 2025-05-23 02:35:01 +00:00
rnwang04
adc0906967 add XPU support for qwen3moe local chat 2025-05-22 21:01:41 +08:00
Atream
4f78e37625
Update version 2025-05-19 23:21:23 +08:00
Aubrey Li
d347aeb518 VLinearMarlin: padding to input.shape[0] to avoid CUDA error
Fix the following runtime error with --no-use_cuda_graph option

Traceback (most recent call last):
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 282, in run_engine
    engine.loop()
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 234, in loop
    self.model_runner.run(self.batch, self.query_manager)
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/balance_serve/inference/model_runner.py", line 220, in run
    self.output.logits[0] = self.output.logits[0][self.input[cuda_graph_idx].minibatch.logits_start]
                            ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-05-18 15:11:37 +08:00
wang jiahao
8caecf37d8
Merge pull request #1305 from kvcache-ai/update-readme
fix deduplicate_and_sort cudagraphs
2025-05-15 12:10:20 +08:00
qiyuxinlin
b40f13abeb fix deduplicate_and_sort cudagraphs 2025-05-15 04:09:34 +00:00
rnwang04
2f6e14a54b fix md typo, fix code style, and update setup value error message 2025-05-15 10:14:39 +00:00
rnwang04
142fb7ce6c Enable support for Intel XPU devices, add support for DeepSeek V2/V3 first 2025-05-14 19:37:27 +00:00
qiyuxinlin
ecc01cda17 update norm cpu kernel 2025-05-14 09:49:35 +00:00
qiyuxinlin
64742bec83 update torch MLA kernel 2025-05-14 09:45:12 +00:00
qiyuxinlin
e8e83308a9 fix flashinfer float_workspace_buffer small 2025-05-14 09:33:52 +00:00
qiyuxinlin
697444905a update default config 2025-05-13 12:20:21 +00:00
wang jiahao
8456222852
Merge pull request #1276 from kvcache-ai/support_load_safetensor
support safetensor load, delete architectures argument
2025-05-12 11:10:26 +08:00
qiyuxinlin
c6aa379de2 support safetensor load, delete architectures argument 2025-05-09 10:38:29 +00:00
Atream
30eab48a75
Merge pull request #799 from aubreyli/cpu_offloading
Restore CPU offloading capability
2025-05-09 00:38:54 -06:00
Atream
8025def197
Merge pull request #1246 from aubreyli/GenerationMixin
modeling_deepseek_v3: fix GenerationMixin warning
2025-05-09 00:35:15 -06:00
Aubrey Li
b3a1fcf471 ktransformers/utils: fix _get_logits_warper error 2025-05-01 08:13:09 +08:00
Aubrey Li
def1ec7683 modeling_deepseek_v3: fix GenerationMixin warning
Fix GenerationMixin warning introduced by upgrading transformers to 4.51.3.
2025-05-01 07:48:15 +08:00
Atream
7adb7281f4 fix-cache-lens 2025-04-30 03:37:43 +00:00
qiyuxinlin
48dfbc8f9f change inject yaml 2025-04-29 08:09:39 +00:00
Atream
0f7a3e5fea fix-client 2025-04-29 12:34:20 +08:00
Atream
b0318fc01c fix-hopper-flashinfer 2025-04-29 11:06:34 +08:00
Atream
e8b2bf4f7b
Update version info in __init__.py 2025-04-29 09:58:40 +08:00
qiyuxinlin
27990dc6fb fix load bug 2025-04-28 21:08:13 +00:00
djw
33cbd47086 support qwen3 2025-04-28 18:15:35 +00:00
djw
68c2b2e6e6 support qwen3 2025-04-28 18:02:07 +00:00
djw
0da3792b27 support qwen3 2025-04-28 14:05:24 +00:00
djw
3f9bbf1181 support qwen3, dont speak human language 2025-04-28 08:44:47 +00:00
chenht2022
f3d842a0ca support AMX 2025-04-25 14:47:16 +00:00
qiyuxinlin
7af83f9efb fix load default max_new_tokens 2025-04-25 04:20:12 +00:00
Atream
46493789eb
fix chat template encoding 2025-04-24 12:44:16 +08:00
Alisehen
f7d939313b Merge remote-tracking branch 'origin/main' into check-para 2025-04-23 02:40:14 +00:00
Alisehen
99540ad01f add check parameters 2025-04-23 02:38:43 +00:00
wang jiahao
7e4813e8ad
Merge pull request #1184 from kvcache-ai/update_param
change test
2025-04-22 20:55:11 +08:00
qiyuxinlin
3a044e6b14 change test 2025-04-22 12:50:39 +00:00
Alisehen
c995bdbbfa add check-para 2025-04-22 09:30:08 +00:00
qiyuxinlin
4f9950e30c kill serve lead to kill sched and engine 2025-04-22 09:25:44 +00:00
qiyuxinlin
b17ab8653c update speed test 2025-04-22 07:38:05 +00:00
qiyuxinlin
f5287e908a fix no balance_serve import error 2025-04-22 02:11:18 +00:00
qiyuxinlin
03a65d6bea roll back ktransformers backend, add max_tokens, max_completion_tokens param 2025-04-21 12:55:37 +00:00