Commit graph

41 commits

Author SHA1 Message Date
Atream
25cee5810e add balance-serve, support concurrency 2025-03-31 22:55:32 +08:00
Atream
8d0292aa44 refactor folders 2025-03-31 22:45:37 +08:00
Azure-Tang
4a31237346 fix rocm compilation 2025-03-15 12:34:03 -04:00
Azure
117a8d2f2a fix compilation 2025-03-14 19:49:20 +00:00
Azure-Tang
ed8437413b merge main; Add torch q8 linear 2025-03-14 05:52:07 -04:00
liam
8eeb6dd432 update compile option for avx512vpopcntdq 2025-03-06 12:18:04 +08:00
Atream
50c691297f
Merge pull request #622 from akemimadoka/fix-msvc
Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC
2025-02-27 17:42:00 +08:00
wkgcass
b2bff17775 fix numa cpu distribution
The numa node location would be calculated based on the total number
of worker threads.
So we should always use the actual number of threads instead of using a min() op.
2025-02-26 14:49:57 +08:00
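The reasoning in the commit message above can be illustrated with a minimal sketch. It assumes worker threads are mapped to NUMA nodes in equal contiguous blocks; the function names, the block mapping, and the clamping are hypothetical and are not taken from the repository. The point is only that the divisor used in the mapping has to equal the number of threads actually spawned.

```python
def numa_node_for(thread_id: int, total_threads: int, numa_nodes: int) -> int:
    """Hypothetical mapping: split threads into equal contiguous blocks, one per node."""
    return thread_id * numa_nodes // total_threads


def distribution(spawned_threads: int, divisor_threads: int, numa_nodes: int) -> list[int]:
    """Count how many of the spawned threads land on each NUMA node.

    divisor_threads is the thread count used inside the mapping; if it
    differs from spawned_threads, the distribution becomes skewed.
    """
    counts = [0] * numa_nodes
    for tid in range(spawned_threads):
        node = numa_node_for(tid, divisor_threads, numa_nodes)
        # Clamp so that an out-of-range result is visible as skew, not a crash.
        counts[min(node, numa_nodes - 1)] += 1
    return counts


# Divisor equals the actual thread count: threads are spread evenly.
even = distribution(spawned_threads=8, divisor_threads=8, numa_nodes=2)

# Divisor reduced by a min() against some other limit (4 here, an assumed value):
# most threads pile up on the last node.
skewed = distribution(spawned_threads=8, divisor_threads=min(8, 4), numa_nodes=2)
```

With 8 threads and 2 nodes, the first call places 4 threads on each node, while the second places 2 on node 0 and 6 on node 1.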
Azure
7e5962af3d fix fp8 multi gpu; update FAQ 2025-02-25 10:52:29 +00:00
Azure
5474be5299 Merge branch 'main' into develop-0.2.2 2025-02-25 09:04:22 +00:00
Atream
7b2a6690ab
Merge pull request #608 from makllama/fix_musa_ext
musa: support bf16
2025-02-24 23:12:54 +08:00
Xiaodong Ye
f88c05a6f1 Ensure backward compatibility with Torch 2.2
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-02-24 21:55:30 +08:00
Azure
ca7366d2db Merge remote-tracking branch 'upstream/develop-0.2.2' into support-fp8 2025-02-24 11:58:10 +00:00
Azure
581a524f65 Add data loader to read special weights for fp8; Add special weight process script 2025-02-24 11:34:17 +00:00
akemimadoka
706e69f4fc Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC 2025-02-24 01:37:50 +08:00
Xiaodong Ye
18b1d18367 musa: support bf16
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-02-23 10:19:19 +08:00
Azure
7b7c6a657d Add fp8 linear kernel; Add empty cache to fit in 16G VRAM; By 'wkGCaSS - 知乎 https://zhuanlan.zhihu.com/p/25491611225' 2025-02-22 13:05:08 +00:00
Atream
f7f1059873 fix merge bug; this branch also pads Marlin 2025-02-22 09:00:09 +00:00
Atream
024009675e Merge branch 'main' into feat-more-context 2025-02-22 06:17:39 +00:00
Atream
5ec33d046d optimize gguf dequant, save mem, support Q2_K
use marlin for lm_head, lm_head only calc last token for prefill
extend context window to 19K for DeepSeek-V3/R1 within 24GB VRAM
2025-02-22 06:13:01 +00:00
Atream
7e1fe256c8 optimize GPU 2025-02-21 05:06:57 +00:00
Xiaodong Ye
2207f6cd14 feat: Support Moore Threads GPU
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-02-19 18:26:55 +08:00
Atream
c189d55bd1 toy support for experts on GPU, no CUDA Graph 2025-02-15 15:16:00 +00:00
liam
098602b08f v0.2 ongoing 2025-02-09 22:41:14 +08:00
liam
3dca28d23b fix moe.cpp int overflow problem 2025-02-06 22:39:16 +08:00
chenht2022
14869b55ad Adapt Windows 2024-10-09 11:08:32 +00:00
Chen Hongtao
b4904537e3
Merge pull request #83 from sayap/task-queue-cond-var
Use cond var to avoid busy loop
2024-10-09 18:57:17 +08:00
Azure
3758afb526 fix bug where some dequant functions don't support multi gpu 2024-09-13 08:34:23 +00:00
Yap Sok Ann
6666d62237 Use cond var to avoid busy loop 2024-09-11 16:10:54 +07:00
Yap Sok Ann
be356c1b8d Support IQ4_XS dequantize 2024-09-02 09:10:19 +07:00
chenxl
4d1d561d28 [feature] release 0.1.3 2024-08-28 16:11:43 +00:00
BITcyman
7c4cb520bd [feature] support q2_k & q3_k dequantize on gpu 2024-08-12 12:53:12 +00:00
chenxl
650c368c18 Merge remote-tracking branch 'upstream/main' into develop-0.1.2 2024-08-12 12:31:49 +00:00
Atream
3c675af61a
Update task_queue.h 2024-08-12 20:06:19 +08:00
chenxl
f5f79f5c0e [ADD] support multi-gpu qlen>1 q5_k 2024-08-12 11:41:26 +00:00
chenxl
782a17e4e6 [feature] add bat for windows, update readme 2024-08-09 09:39:42 +00:00
chenht2022
c1cc7d2cd2 1) Linear and MLP operators support qlen>1; 2) All operators now share a single memory buffer; 3) Refactor CPUInfer submit/sync logic. 2024-08-08 09:04:36 +00:00
chenxl
1d9d397525 fix some compile bugs on linux 2024-08-08 15:34:19 +08:00
Atream
0a2fd52cea support windows; support q4_0 and q5_0 dequant on cpu; Add copyright from pygguf (it was added before, but disappeared after merge); Add some TODOs in the code. 2024-08-08 15:34:02 +08:00
chenxl
112cb3c962 [feature] support python 310 and multi instruction 2024-07-31 13:58:17 +00:00
chenxl
18c42e67df Initial commit 2024-07-27 16:06:58 +08:00