vrr/kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2025-09-16 18:19:42 +00:00

Author	SHA1	Message	Date
Jesse	e204a0bb6b	Merge `8c8cb207aa` into `ee2ede0412`	2025-08-05 15:24:17 +08:00
qiyuxinlin	9e1560bb82	GLM4 and SmallThinker	2025-07-25 16:56:36 +00:00
djw	17246bf84f	support smt and glm4	2025-07-25 15:03:27 +00:00
djw	48bc6185b5	support smt and qlm4	2025-07-25 12:48:51 +00:00
qiyuxinlin	712ad1fa3c	smallthinker right	2025-07-25 12:46:14 +00:00
Qiu Chengyu	f8719ee7b9	Add use_silu in MOEConfig in python and hard-determine smallthinker	2025-07-25 11:22:31 +00:00
qiyuxinlin	71c1d4eed7	smallthink run	2025-07-24 15:08:29 +00:00
djw	590fcb41cd	support smt and glm4	2025-07-24 12:31:01 +00:00
djw	613f0b7c37	support smt and glm4	2025-07-24 09:39:19 +00:00
djw	b66d96db97	support smt and glm4	2025-07-24 08:40:58 +00:00
wang jiahao	a2e95e467a	Update balance_serve.py	2025-07-12 13:14:35 +08:00
Jesse CreateThis	8c8cb207aa	Apply magikRUKKOLA's patch from issue #1417	2025-07-06 19:45:06 +00:00
Atream	5bd40c33eb	Update __init__.py	2025-07-01 16:43:19 +08:00
rnwang04	5b5deda420	revert using FP16	2025-07-01 14:24:27 +08:00
ouqingliang	90cff820cf	update kvc disk path config.	2025-06-30 15:09:35 +00:00
ouqingliang	3b4a1c7532	add prefix cache support for kvc2.	2025-06-26 04:57:25 +00:00
Aubrey Li	b599111b04	Load DS-R1-0528 for module is BaseInjectedModule instance	2025-06-10 11:31:58 +08:00
Ye Zhou	255c0fcf3b	Fix kv_b_proj shape for unsloth quantized models	2025-06-05 17:33:11 +08:00
Atream	7071970339	Merge pull request #1343 from zhouye/main Mirror #1247 in server mode	2025-06-02 16:07:04 +08:00
qiyuxinlin	a6b3243a56	[Patch] lload DeepSeek-R1-0528	2025-05-31 14:19:20 +00:00
Emmanuel Ferdman	d8bc6402b5	raise exception on device error (#1342) * display the unavailable torch device on error * Raise exception on device error --------- Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>	2025-05-29 15:37:41 +08:00
Ye Zhou	00949d5e8d	Mirror #1247 in server mode	2025-05-29 15:30:40 +08:00
qiyuxinlin	71a5fc5770	fix local_chat.py chunk_size not effect experts	2025-05-23 02:35:01 +00:00
rnwang04	adc0906967	add XPU support for qwen3moe local chat	2025-05-22 21:01:41 +08:00
Atream	4f78e37625	Update version	2025-05-19 23:21:23 +08:00
Aubrey Li	d347aeb518	VLinearMarlin: padding to input.shape[0] to avoid CUDA error Fix the following runtime error with --no-use_cuda_graph option Traceback (most recent call last): File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap self.run() File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/multiprocessing/process.py", line 108, in run self._target(self._args, *self._kwargs) File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 282, in run_engine engine.loop() File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 234, in loop self.model_runner.run(self.batch, self.query_manager) File "/home/aubrey/miniforge3/envs/kt/lib/python3.11/site-packages/ktransformers/server/balance_serve/inference/model_runner.py", line 220, in run self.output.logits[0] = self.output.logits[0][self.input[cuda_graph_idx].minibatch.logits_start] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ RuntimeError: CUDA error: an illegal memory access was encountered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.	2025-05-18 15:11:37 +08:00
wang jiahao	8caecf37d8	Merge pull request #1305 from kvcache-ai/update-readme fix deduplicate_and_sort cudagraphs	2025-05-15 12:10:20 +08:00
qiyuxinlin	b40f13abeb	fix deduplicate_and_sort cudagraphs	2025-05-15 04:09:34 +00:00
rnwang04	2f6e14a54b	fix md typo, fix code style, and update setup value error message	2025-05-15 10:14:39 +00:00
rnwang04	142fb7ce6c	Enable support for Intel XPU devices, add support for DeepSeek V2/V3 first	2025-05-14 19:37:27 +00:00
qiyuxinlin	ecc01cda17	update norm cpu kernel	2025-05-14 09:49:35 +00:00
qiyuxinlin	64742bec83	update torch MLA kernel	2025-05-14 09:45:12 +00:00
qiyuxinlin	e8e83308a9	fix flashinfer float_workspace_buffer small	2025-05-14 09:33:52 +00:00
qiyuxinlin	697444905a	update default config	2025-05-13 12:20:21 +00:00
wang jiahao	8456222852	Merge pull request #1276 from kvcache-ai/support_load_safetensor support safetensor load, delete architectures argument	2025-05-12 11:10:26 +08:00
qiyuxinlin	c6aa379de2	support safetensor load, delete architectures argument	2025-05-09 10:38:29 +00:00
Atream	30eab48a75	Merge pull request #799 from aubreyli/cpu_offloading Restore CPU offloading capability	2025-05-09 00:38:54 -06:00
Atream	8025def197	Merge pull request #1246 from aubreyli/GenerationMixin modeling_deepseek_v3: fix GenerationMixin warning	2025-05-09 00:35:15 -06:00
Aubrey Li	b3a1fcf471	ktransformers/utils: fix _get_logits_warper error	2025-05-01 08:13:09 +08:00
Aubrey Li	def1ec7683	modeling_deepseek_v3: fix GenerationMixin warning Fix GenerationMixin warning introduced by upgrading transformers to 4.51.3.	2025-05-01 07:48:15 +08:00
Atream	7adb7281f4	fix-cache-lens	2025-04-30 03:37:43 +00:00
qiyuxinlin	48dfbc8f9f	change inject yaml	2025-04-29 08:09:39 +00:00
Atream	0f7a3e5fea	fix-client	2025-04-29 12:34:20 +08:00
Atream	b0318fc01c	fix-hopper-flashinfer	2025-04-29 11:06:34 +08:00
Atream	e8b2bf4f7b	Update version info in __init__.py	2025-04-29 09:58:40 +08:00
qiyuxinlin	27990dc6fb	fix load bug	2025-04-28 21:08:13 +00:00
djw	33cbd47086	support qwen3	2025-04-28 18:15:35 +00:00
djw	68c2b2e6e6	support qwen3	2025-04-28 18:02:07 +00:00
djw	0da3792b27	support qwen3	2025-04-28 14:05:24 +00:00
djw	3f9bbf1181	support qwen3, dont speak human language	2025-04-28 08:44:47 +00:00

335 commits