Commit graph

335 commits

Author SHA1 Message Date
Jesse
e204a0bb6b Merge 8c8cb207aa into ee2ede0412 2025-08-05 15:24:17 +08:00
qiyuxinlin
9e1560bb82 GLM4 and SmallThinker 2025-07-25 16:56:36 +00:00
djw
17246bf84f support smt and glm4 2025-07-25 15:03:27 +00:00
djw
48bc6185b5 support smt and qlm4 2025-07-25 12:48:51 +00:00
qiyuxinlin
712ad1fa3c smallthinker right 2025-07-25 12:46:14 +00:00
Qiu Chengyu
f8719ee7b9 Add use_silu in MOEConfig in python and hard-determine smallthinker 2025-07-25 11:22:31 +00:00
qiyuxinlin
71c1d4eed7 smallthink run 2025-07-24 15:08:29 +00:00
djw
590fcb41cd support smt and glm4 2025-07-24 12:31:01 +00:00
djw
613f0b7c37 support smt and glm4 2025-07-24 09:39:19 +00:00
djw
b66d96db97 support smt and glm4 2025-07-24 08:40:58 +00:00
wang jiahao
a2e95e467a Update balance_serve.py 2025-07-12 13:14:35 +08:00
Jesse CreateThis
8c8cb207aa Apply magikRUKKOLA's patch from issue #1417 2025-07-06 19:45:06 +00:00
Atream
5bd40c33eb Update __init__.py 2025-07-01 16:43:19 +08:00
rnwang04
5b5deda420 revert using FP16 2025-07-01 14:24:27 +08:00
ouqingliang
90cff820cf update kvc disk path config. 2025-06-30 15:09:35 +00:00
ouqingliang
3b4a1c7532 add prefix cache support for kvc2. 2025-06-26 04:57:25 +00:00
Aubrey Li
b599111b04 Load DS-R1-0528 for module is BaseInjectedModule instance 2025-06-10 11:31:58 +08:00
Ye Zhou
255c0fcf3b Fix kv_b_proj shape for unsloth quantized models 2025-06-05 17:33:11 +08:00
Atream
7071970339 Merge pull request #1343 from zhouye/main
Mirror #1247 in server mode
2025-06-02 16:07:04 +08:00
qiyuxinlin
a6b3243a56 [Patch] lload DeepSeek-R1-0528 2025-05-31 14:19:20 +00:00
Emmanuel Ferdman
d8bc6402b5 raise exception on device error (#1342)
* display the unavailable torch device on error

* Raise exception on device error

---------

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-29 15:37:41 +08:00
Ye Zhou
00949d5e8d Mirror #1247 in server mode 2025-05-29 15:30:40 +08:00
qiyuxinlin
71a5fc5770 fix local_chat.py chunk_size not effect experts 2025-05-23 02:35:01 +00:00
rnwang04
adc0906967 add XPU support for qwen3moe local chat 2025-05-22 21:01:41 +08:00
Atream
4f78e37625 Update version 2025-05-19 23:21:23 +08:00
Aubrey Li
d347aeb518 VLinearMarlin: padding to input.shape[0] to avoid CUDA error
Fix the following runtime error with --no-use_cuda_graph option

Traceback (most recent call last):
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 282, in run_engine
    engine.loop()
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 234, in loop
    self.model_runner.run(self.batch, self.query_manager)
  File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/balance_serve/inference/model_runner.py", line 220, in run
    self.output.logits[0] = self.output.logits[0][self.input[cuda_graph_idx].minibatch.logits_start]
                            ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
2025-05-18 15:11:37 +08:00
wang jiahao
8caecf37d8 Merge pull request #1305 from kvcache-ai/update-readme
fix deduplicate_and_sort cudagraphs
2025-05-15 12:10:20 +08:00
qiyuxinlin
b40f13abeb fix deduplicate_and_sort cudagraphs 2025-05-15 04:09:34 +00:00
rnwang04
2f6e14a54b fix md typo, fix code style, and update setup value error message 2025-05-15 10:14:39 +00:00
rnwang04
142fb7ce6c Enable support for Intel XPU devices, add support for DeepSeek V2/V3 first 2025-05-14 19:37:27 +00:00
qiyuxinlin
ecc01cda17 update norm cpu kernel 2025-05-14 09:49:35 +00:00
qiyuxinlin
64742bec83 update torch MLA kernel 2025-05-14 09:45:12 +00:00
qiyuxinlin
e8e83308a9 fix flashinfer float_workspace_buffer small 2025-05-14 09:33:52 +00:00
qiyuxinlin
697444905a update default config 2025-05-13 12:20:21 +00:00
wang jiahao
8456222852 Merge pull request #1276 from kvcache-ai/support_load_safetensor
support safetensor load, delete architectures argument
2025-05-12 11:10:26 +08:00
qiyuxinlin
c6aa379de2 support safetensor load, delete architectures argument 2025-05-09 10:38:29 +00:00
Atream
30eab48a75 Merge pull request #799 from aubreyli/cpu_offloading
Restore CPU offloading capability
2025-05-09 00:38:54 -06:00
Atream
8025def197 Merge pull request #1246 from aubreyli/GenerationMixin
modeling_deepseek_v3: fix GenerationMixin warning
2025-05-09 00:35:15 -06:00
Aubrey Li
b3a1fcf471 ktransformers/utils: fix _get_logits_warper error 2025-05-01 08:13:09 +08:00
Aubrey Li
def1ec7683 modeling_deepseek_v3: fix GenerationMixin warning
Fix GenerationMixin warning introduced by upgrading transformers to 4.51.3.
2025-05-01 07:48:15 +08:00
Atream
7adb7281f4 fix-cache-lens 2025-04-30 03:37:43 +00:00
qiyuxinlin
48dfbc8f9f change inject yaml 2025-04-29 08:09:39 +00:00
Atream
0f7a3e5fea fix-client 2025-04-29 12:34:20 +08:00
Atream
b0318fc01c fix-hopper-flashinfer 2025-04-29 11:06:34 +08:00
Atream
e8b2bf4f7b Update version info in __init__.py 2025-04-29 09:58:40 +08:00
qiyuxinlin
27990dc6fb fix load bug 2025-04-28 21:08:13 +00:00
djw
33cbd47086 support qwen3 2025-04-28 18:15:35 +00:00
djw
68c2b2e6e6 support qwen3 2025-04-28 18:02:07 +00:00
djw
0da3792b27 support qwen3 2025-04-28 14:05:24 +00:00
djw
3f9bbf1181 support qwen3, dont speak human language 2025-04-28 08:44:47 +00:00