vrr/kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2025-09-11 07:44:35 +00:00

Author	SHA1	Message	Date
swu-hyk	ec7e912fee	modify	2025-02-26 19:21:30 +08:00
swu-hyk	68e7df3a25	implementation of chat routing for Ollama	2025-02-26 17:05:00 +08:00
Chen Hongtao	9660b2cc1e	Merge pull request #685 from vproxy-tools/main fix numa cpu distribution	2025-02-26 15:35:19 +08:00
ZiWei Yuan	e7ebb26370	Merge pull request #684 from KMSorSMS/main fix dockerfile in devcontainer and fix expert torch	2025-02-26 15:06:51 +08:00
liam	ffb86c66e3	⚡ fix experts torch	2025-02-26 15:04:40 +08:00
liam	de082f141c	⚡ fix cd error	2025-02-26 14:54:47 +08:00
wkgcass	b2bff17775	fix numa cpu distribution. The numa node location would be calculated based on the total number of worker threads. So we should always use the actual number of threads instead of using a min() op.	2025-02-26 14:49:57 +08:00
akemimadoka	8817777e11	Fix RuntimeError on Windows caused by integer overflow in np.prod	2025-02-26 03:50:12 +08:00
Azure	99f6e42113	Merge pull request #668 from KMSorSMS/main 📝 update benchmark.md	2025-02-26 00:21:09 +08:00
liam	3ad12751cf	📝 update more detail and fix typo	2025-02-26 00:17:02 +08:00
Azure	31bc990677	Merge pull request #667 from Azure-Tang/update-readme [update] Update doc.	2025-02-26 00:01:46 +08:00
liam	05339ad0ef	📝 update benchmark.md	2025-02-25 23:57:58 +08:00
Azure	bb6920ed72	update doc	2025-02-25 15:46:15 +00:00
ZiWei Yuan	9c71bcb0bb	Merge pull request #665 from KMSorSMS/v0.2.2rc1 ⚡ release v0.2.2rc1	2025-02-25 22:07:19 +08:00
liam	ddf3339339	⚡ release v0.2.2rc1	2025-02-25 22:06:36 +08:00
Azure	8333a4d874	Merge pull request #663 from kvcache-ai/develop-0.2.2 [release] Release 0.2.2rc.	2025-02-25 21:47:36 +08:00
Azure	c6e4e1c3c5	Merge pull request #662 from Azure-Tang/support-fp8 [update] Update readme.	2025-02-25 21:45:19 +08:00
Azure	91c1619296	Merge branch 'develop-0.2.2' into support-fp8 Update README.md	2025-02-25 13:43:26 +00:00
Atream	13974eb264	Update DeepseekR1_V3_tutorial.md	2025-02-25 21:36:52 +08:00
Atream	03f8bc9f79	Update DeepseekR1_V3_tutorial.md add long context	2025-02-25 21:35:31 +08:00
Azure	2c0cce90d0	add fp8 multi gpu yaml example	2025-02-25 13:32:09 +00:00
Atream	d9b2895bd3	Merge branch 'fix-update-flashinfer_wrapper_local_chat' into develop-0.2.2	2025-02-25 12:47:48 +00:00
Atream	477ac28a9c	fix-update-flashinfer_wrapper_local_chat	2025-02-25 12:47:31 +00:00
Azure	7e5962af3d	fix fp8 multi gpu; update FQA	2025-02-25 10:52:29 +00:00
ZiWei Yuan	89b55052b8	Merge pull request #659 from KMSorSMS/develop-0.2.2 📝 add benchmark.md	2025-02-25 17:47:05 +08:00
liam	1b5ac67fca	📝 add benchmark.md	2025-02-25 17:45:17 +08:00
ZiWei Yuan	1aa10e93b3	Merge pull request #658 from KMSorSMS/develop-0.2.2 ⚡ update git ignore add docker dev container	2025-02-25 17:22:34 +08:00
liam	0ca0b99fab	⚡ update git ignore add docker dev container	2025-02-25 17:22:11 +08:00
Azure	5474be5299	Merge branch 'main' into develop-0.2.2	2025-02-25 09:04:22 +00:00
Azure	021822dd01	update FAQ	2025-02-25 09:02:32 +00:00
Atream	b443c7dfa2	Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill Feat absorb for long prefill	2025-02-25 16:53:21 +08:00
Atream	f4c198bd42	support absorb for prefill long context	2025-02-25 08:52:02 +00:00
Azure	050b745a6e	Merge pull request #643 from Azure-Tang/support-fp8 [feat] Support fp8 linear kernel;	2025-02-25 16:22:12 +08:00
Azure	36fbeee341	Update doc	2025-02-25 08:21:18 +00:00
ceerrep	f639fbc19e	feat: basic api key support	2025-02-25 14:11:39 +08:00
Azure	4dc5518e4d	update fp8 kernel tutorial	2025-02-24 15:37:01 +00:00
Atream	7b2a6690ab	Merge pull request #608 from makllama/fix_musa_ext musa: support bf16	2025-02-24 23:12:54 +08:00
Atream	6f9ea689a9	Merge pull request #645 from makllama/torch2.2 Ensure backward compatibility with PyTorch 2.2	2025-02-24 23:12:33 +08:00
Xiaodong Ye	f88c05a6f1	Ensure backward compatibility with Torch 2.2. Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-02-24 21:55:30 +08:00
lazymio	07eb712a73	Left out	2025-02-24 21:51:14 +08:00
lazymio	91062a834f	Default values	2025-02-24 21:38:01 +08:00
lazymio	76487c4dcb	Revert repetition_penalty as it is not in API spec	2025-02-24 21:30:03 +08:00
lazymio	05ad288453	Also /chat/completions	2025-02-24 21:08:36 +08:00
lazymio	bf36547f98	Also allow repetition_penalty	2025-02-24 21:07:35 +08:00
lazymio	8704c09192	Allow temperature and top_p from requests	2025-02-24 21:01:33 +08:00
Azure	ca7366d2db	Merge remote-tracking branch 'upstream/develop-0.2.2' into support-fp8	2025-02-24 11:58:10 +00:00
Azure	581a524f65	Add data loader to read special weights for fp8; Add special weight process script	2025-02-24 11:34:17 +00:00
Atream	e9b1216a9a	Merge branch 'main' into feat-absorb-for-long-prefill	2025-02-24 09:44:17 +00:00
Atream	4b5991e77e	Merge pull request #638 from kvcache-ai/feat-moonlight fix KExpertsMarlin on GPU with out CUDA Graph	2025-02-24 17:32:05 +08:00
Atream	f327695079	fix KExpertsMarlin on GPU with out CUDA Graph	2025-02-24 09:30:54 +00:00

877 commits