vrr/kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2025-09-07 21:19:51 +00:00

Author	SHA1	Message	Date
Atream	25cee5810e	add balance-serve, support concurrence	2025-03-31 22:55:32 +08:00
Atream	8d0292aa44	refactor folders	2025-03-31 22:45:37 +08:00
Azure-Tang	4a31237346	fix rocm compilation	2025-03-15 12:34:03 -04:00
Azure	117a8d2f2a	fix compilation	2025-03-14 19:49:20 +00:00
Azure-Tang	ed8437413b	merge main; Add torch q8 linear	2025-03-14 05:52:07 -04:00
liam	8eeb6dd432	⚡ update compile option for avx512vpopcntdq	2025-03-06 12:18:04 +08:00
Atream	50c691297f	Merge pull request #622 from akemimadoka/fix-msvc: Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC	2025-02-27 17:42:00 +08:00
wkgcass	b2bff17775	fix numa cpu distribution: the NUMA node location is calculated from the total number of worker threads, so we should always use the actual number of threads instead of a min() op.	2025-02-26 14:49:57 +08:00
Azure	7e5962af3d	fix fp8 multi gpu; update FAQ	2025-02-25 10:52:29 +00:00
Azure	5474be5299	Merge branch 'main' into develop-0.2.2	2025-02-25 09:04:22 +00:00
Atream	7b2a6690ab	Merge pull request #608 from makllama/fix_musa_ext: musa: support bf16	2025-02-24 23:12:54 +08:00
Xiaodong Ye	f88c05a6f1	Ensure backward compatibility with Torch 2.2 (Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>)	2025-02-24 21:55:30 +08:00
Azure	ca7366d2db	Merge remote-tracking branch 'upstream/develop-0.2.2' into support-fp8	2025-02-24 11:58:10 +00:00
Azure	581a524f65	Add data loader to read special weights for fp8; Add special weight process script	2025-02-24 11:34:17 +00:00
akemimadoka	706e69f4fc	Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC	2025-02-24 01:37:50 +08:00
Xiaodong Ye	18b1d18367	musa: support bf16 (Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>)	2025-02-23 10:19:19 +08:00
Azure	7b7c6a657d	Add fp8 linear kernel; Add empty cache to fit in 16G VRAM; By 'wkGCaSS - 知乎 https://zhuanlan.zhihu.com/p/25491611225'	2025-02-22 13:05:08 +00:00
Atream	f7f1059873	fix merge bug, this branch also padding Marlin	2025-02-22 09:00:09 +00:00
Atream	024009675e	Merge branch 'main' into feat-more-context	2025-02-22 06:17:39 +00:00
Atream	5ec33d046d	optimize gguf dequant, save mem, support Q2_K; use marlin for lm_head, lm_head only calc last token for prefill; extend context window to 19K for DeepSeek-V3/R1 within 24GB VRAM	2025-02-22 06:13:01 +00:00
Atream	7e1fe256c8	optimize GPU	2025-02-21 05:06:57 +00:00
Xiaodong Ye	2207f6cd14	feat: Support Moore Threads GPU (Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>)	2025-02-19 18:26:55 +08:00
Atream	c189d55bd1	toy support for experts on GPU, no CUDA Graph	2025-02-15 15:16:00 +00:00
liam	098602b08f	⚡ v0.2 ongoing	2025-02-09 22:41:14 +08:00
liam	3dca28d23b	⚡ fix moe.cpp int overflow problem	2025-02-06 22:39:16 +08:00
chenht2022	14869b55ad	Adapt Windows	2024-10-09 11:08:32 +00:00
Chen Hongtao	b4904537e3	Merge pull request #83 from sayap/task-queue-cond-var: Use cond var to avoid busy loop	2024-10-09 18:57:17 +08:00
Azure	3758afb526	fix bug where some dequant functions don't support multi gpu	2024-09-13 08:34:23 +00:00
Yap Sok Ann	6666d62237	Use cond var to avoid busy loop	2024-09-11 16:10:54 +07:00
Yap Sok Ann	be356c1b8d	Support IQ4_XS dequantize	2024-09-02 09:10:19 +07:00
chenxl	4d1d561d28	[feature] release 0.1.3	2024-08-28 16:11:43 +00:00
BITcyman	7c4cb520bd	[feature] support q2_k & q3_k dequantize on gpu	2024-08-12 12:53:12 +00:00
chenxl	650c368c18	Merge remote-tracking branch 'upstream/main' into develop-0.1.2	2024-08-12 12:31:49 +00:00
Atream	3c675af61a	Update task_queue.h	2024-08-12 20:06:19 +08:00
chenxl	f5f79f5c0e	[ADD] support multi-gpu qlen>1 q5_k	2024-08-12 11:41:26 +00:00
chenxl	782a17e4e6	[feature] add bat for windows, update readme	2024-08-09 09:39:42 +00:00
chenht2022	c1cc7d2cd2	1) Linear and MLP operators support qlen>1; 2) All operators now share a single memory buffer; 3) Refactor CPUInfer submit/sync logic.	2024-08-08 09:04:36 +00:00
chenxl	1d9d397525	fix some compile bugs on linux	2024-08-08 15:34:19 +08:00
Atream	0a2fd52cea	support windows; support q4_0 and q5_0 dequant on cpu; add copyright from pygguf (it was added before but disappeared after a merge); add some TODOs in the code.	2024-08-08 15:34:02 +08:00
chenxl	112cb3c962	[feature] support python 310 and multi instruction	2024-07-31 13:58:17 +00:00
chenxl	18c42e67df	Initial commit	2024-07-27 16:06:58 +08:00

41 commits