vrr/kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2025-09-09 22:05:30 +00:00

Author	SHA1	Message	Date
Atream	1d5d5faef6	Merge pull request #626 from cyhasuka/main Feat: Clear cache during weight loading to prevent OOM on GPUs with <=8GB VRAM	2025-02-27 12:13:10 +08:00
Atream	8db6a4d402	Merge branch 'main' into main	2025-02-27 12:12:32 +08:00
wang jiahao	3c8c580580	Merge pull request #691 from swu-hyk/ollama_api_chat feat:implementation of chat routing for Ollama	2025-02-27 11:17:48 +08:00
Azure	ca93cf7548	Merge pull request #702 from Azure-Tang/update-readme [UPDATE] Update documents.	2025-02-26 23:45:24 +08:00
Azure	c05ebb74b1	Update fp8 doc; Update install.md broken link	2025-02-26 15:43:08 +00:00
Atream	3ebe17eb63	Merge pull request #699 from kvcache-ai/Atream-patch-1 Update DeepseekR1_V3_tutorial.md	2025-02-26 22:04:45 +08:00
Atream	369f4d917d	Update DeepseekR1_V3_tutorial.md	2025-02-26 22:04:29 +08:00
Atream	9650893adc	Merge pull request #697 from kvcache-ai/fix-yaml Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml	2025-02-26 21:54:01 +08:00
Atream	90eb87b3fc	Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml	2025-02-26 21:53:50 +08:00
swu-hyk	ec7e912fee	modify	2025-02-26 19:21:30 +08:00
swu-hyk	68e7df3a25	implementation of chat routing for Ollama	2025-02-26 17:05:00 +08:00
Chen Hongtao	9660b2cc1e	Merge pull request #685 from vproxy-tools/main fix numa cpu distribution	2025-02-26 15:35:19 +08:00
ZiWei Yuan	e7ebb26370	Merge pull request #684 from KMSorSMS/main fix dockerfile in devcontainer and fix expert torch	2025-02-26 15:06:51 +08:00
liam	ffb86c66e3	⚡ fix experts torch	2025-02-26 15:04:40 +08:00
liam	de082f141c	⚡ fix cd error	2025-02-26 14:54:47 +08:00
wkgcass	b2bff17775	fix numa cpu distribution The numa node location would be calculated based on the total number of worker threads. So we should always use the actual number of threads instead of using a min() op.	2025-02-26 14:49:57 +08:00
Azure	99f6e42113	Merge pull request #668 from KMSorSMS/main 📝 update benchmark.md	2025-02-26 00:21:09 +08:00
liam	3ad12751cf	📝 update more detail and fix typo	2025-02-26 00:17:02 +08:00
Azure	31bc990677	Merge pull request #667 from Azure-Tang/update-readme [update] Update doc.	2025-02-26 00:01:46 +08:00
liam	05339ad0ef	📝 update benchmark.md	2025-02-25 23:57:58 +08:00
Azure	bb6920ed72	update doc	2025-02-25 15:46:15 +00:00
ZiWei Yuan	9c71bcb0bb	Merge pull request #665 from KMSorSMS/v0.2.2rc1 ⚡ release v0.2.2rc1	2025-02-25 22:07:19 +08:00
liam	ddf3339339	⚡ release v0.2.2rc1	2025-02-25 22:06:36 +08:00
Azure	8333a4d874	Merge pull request #663 from kvcache-ai/develop-0.2.2 [release] Release 0.2.2rc.	2025-02-25 21:47:36 +08:00
Azure	c6e4e1c3c5	Merge pull request #662 from Azure-Tang/support-fp8 [update] Update readme.	2025-02-25 21:45:19 +08:00
Azure	91c1619296	Merge branch 'develop-0.2.2' into support-fp8 Update README.md	2025-02-25 13:43:26 +00:00
Atream	13974eb264	Update DeepseekR1_V3_tutorial.md	2025-02-25 21:36:52 +08:00
Atream	03f8bc9f79	Update DeepseekR1_V3_tutorial.md add long context	2025-02-25 21:35:31 +08:00
Azure	2c0cce90d0	add fp8 multi gpu yaml example	2025-02-25 13:32:09 +00:00
Atream	d9b2895bd3	Merge branch 'fix-update-flashinfer_wrapper_local_chat' into develop-0.2.2	2025-02-25 12:47:48 +00:00
Atream	477ac28a9c	fix-update-flashinfer_wrapper_local_chat	2025-02-25 12:47:31 +00:00
Azure	7e5962af3d	fix fp8 multi gpu; update FQA	2025-02-25 10:52:29 +00:00
ZiWei Yuan	89b55052b8	Merge pull request #659 from KMSorSMS/develop-0.2.2 📝 add benchmark.md	2025-02-25 17:47:05 +08:00
liam	1b5ac67fca	📝 add benchmark.md	2025-02-25 17:45:17 +08:00
ZiWei Yuan	1aa10e93b3	Merge pull request #658 from KMSorSMS/develop-0.2.2 ⚡ update git ignore add docker dev container	2025-02-25 17:22:34 +08:00
liam	0ca0b99fab	⚡ update git ignore add docker dev container	2025-02-25 17:22:11 +08:00
Azure	5474be5299	Merge branch 'main' into develop-0.2.2	2025-02-25 09:04:22 +00:00
Azure	021822dd01	update FAQ	2025-02-25 09:02:32 +00:00
Atream	b443c7dfa2	Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill Feat absorb for long prefill	2025-02-25 16:53:21 +08:00
Atream	f4c198bd42	support absorb for prefill long context	2025-02-25 08:52:02 +00:00
Azure	050b745a6e	Merge pull request #643 from Azure-Tang/support-fp8 [feat] Support fp8 linear kernel;	2025-02-25 16:22:12 +08:00
Azure	36fbeee341	Update doc	2025-02-25 08:21:18 +00:00
Azure	4dc5518e4d	update fp8 kernel tutorial	2025-02-24 15:37:01 +00:00
Atream	7b2a6690ab	Merge pull request #608 from makllama/fix_musa_ext musa: support bf16	2025-02-24 23:12:54 +08:00
Atream	6f9ea689a9	Merge pull request #645 from makllama/torch2.2 Ensure backward compatibility with PyTorch 2.2	2025-02-24 23:12:33 +08:00
Xiaodong Ye	f88c05a6f1	Ensure backward compatibility with Torch 2.2 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-02-24 21:55:30 +08:00
Azure	ca7366d2db	Merge remote-tracking branch 'upstream/develop-0.2.2' into support-fp8	2025-02-24 11:58:10 +00:00
Azure	581a524f65	Add data loader to read special weights for fp8; Add special weight process script	2025-02-24 11:34:17 +00:00
Atream	e9b1216a9a	Merge branch 'main' into feat-absorb-for-long-prefill	2025-02-24 09:44:17 +00:00
Atream	4b5991e77e	Merge pull request #638 from kvcache-ai/feat-moonlight fix KExpertsMarlin on GPU with out CUDA Graph	2025-02-24 17:32:05 +08:00


369 commits