Atream
25cee5810e
add balance-serve, support concurrency
2025-03-31 22:55:32 +08:00
Atream
8d0292aa44
refactor folders
2025-03-31 22:45:37 +08:00
Azure-Tang
4a31237346
fix rocm compilation
2025-03-15 12:34:03 -04:00
Azure
117a8d2f2a
fix compilation
2025-03-14 19:49:20 +00:00
Azure-Tang
ed8437413b
merge main; Add torch q8 linear
2025-03-14 05:52:07 -04:00
liam
8eeb6dd432
⚡ update compile option for avx512vpopcntdq
2025-03-06 12:18:04 +08:00
Atream
50c691297f
Merge pull request #622 from akemimadoka/fix-msvc
Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC
2025-02-27 17:42:00 +08:00
wkgcass
b2bff17775
fix numa cpu distribution
The NUMA node location is calculated from the total number
of worker threads,
so we should always use the actual number of threads instead of a min() op.
2025-02-26 14:49:57 +08:00
Azure
7e5962af3d
fix fp8 multi gpu; update FAQ
2025-02-25 10:52:29 +00:00
Azure
5474be5299
Merge branch 'main' into develop-0.2.2
2025-02-25 09:04:22 +00:00
Atream
7b2a6690ab
Merge pull request #608 from makllama/fix_musa_ext
musa: support bf16
2025-02-24 23:12:54 +08:00
Xiaodong Ye
f88c05a6f1
Ensure backward compatibility with Torch 2.2
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-02-24 21:55:30 +08:00
Azure
ca7366d2db
Merge remote-tracking branch 'upstream/develop-0.2.2' into support-fp8
2025-02-24 11:58:10 +00:00
Azure
581a524f65
Add data loader to read special weights for fp8; Add special weight process script
2025-02-24 11:34:17 +00:00
akemimadoka
706e69f4fc
Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC
2025-02-24 01:37:50 +08:00
Xiaodong Ye
18b1d18367
musa: support bf16
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-02-23 10:19:19 +08:00
Azure
7b7c6a657d
Add fp8 linear kernel;
Add empty cache to fit in 16G VRAM; by wkGCaSS (知乎/Zhihu: https://zhuanlan.zhihu.com/p/25491611225)
2025-02-22 13:05:08 +00:00
Atream
f7f1059873
fix merge bug; this branch also pads Marlin
2025-02-22 09:00:09 +00:00
Atream
024009675e
Merge branch 'main' into feat-more-context
2025-02-22 06:17:39 +00:00
Atream
5ec33d046d
optimize gguf dequant, save mem, support Q2_K
use Marlin for lm_head; lm_head only computes the last token during prefill
extend context window to 19K for DeepSeek-V3/R1 within 24GB VRAM
2025-02-22 06:13:01 +00:00
Atream
7e1fe256c8
optimize GPU
2025-02-21 05:06:57 +00:00
Xiaodong Ye
2207f6cd14
feat: Support Moore Threads GPU
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-02-19 18:26:55 +08:00
Atream
c189d55bd1
toy support for experts on GPU, no CUDA Graph
2025-02-15 15:16:00 +00:00
liam
098602b08f
⚡ v0.2 ongoing
2025-02-09 22:41:14 +08:00
liam
3dca28d23b
⚡ fix moe.cpp int overflow problem
2025-02-06 22:39:16 +08:00
chenht2022
14869b55ad
Adapt Windows
2024-10-09 11:08:32 +00:00
Chen Hongtao
b4904537e3
Merge pull request #83 from sayap/task-queue-cond-var
Use cond var to avoid busy loop
2024-10-09 18:57:17 +08:00
Azure
3758afb526
fix bug where some dequant functions didn't support multi gpu
2024-09-13 08:34:23 +00:00
Yap Sok Ann
6666d62237
Use cond var to avoid busy loop
2024-09-11 16:10:54 +07:00
Yap Sok Ann
be356c1b8d
Support IQ4_XS dequantize
2024-09-02 09:10:19 +07:00
chenxl
4d1d561d28
[feature] release 0.1.3
2024-08-28 16:11:43 +00:00
BITcyman
7c4cb520bd
[feature] support q2_k & q3_k dequantize on gpu
2024-08-12 12:53:12 +00:00
chenxl
650c368c18
Merge remote-tracking branch 'upstream/main' into develop-0.1.2
2024-08-12 12:31:49 +00:00
Atream
3c675af61a
Update task_queue.h
2024-08-12 20:06:19 +08:00
chenxl
f5f79f5c0e
[ADD] support multi-gpu qlen>1 q5_k
2024-08-12 11:41:26 +00:00
chenxl
782a17e4e6
[feature] add bat for windows, update readme
2024-08-09 09:39:42 +00:00
chenht2022
c1cc7d2cd2
1) Linear and MLP operators support qlen>1; 2) All operators now share a single memory buffer; 3) Refactor CPUInfer submit/sync logic.
2024-08-08 09:04:36 +00:00
chenxl
1d9d397525
fix some compilation bugs on linux
2024-08-08 15:34:19 +08:00
Atream
0a2fd52cea
support windows; support q4_0 and q5_0 dequant on cpu. Add copyright notice from pygguf (it was added before but disappeared after a merge). Add some TODOs in the code.
2024-08-08 15:34:02 +08:00
chenxl
112cb3c962
[feature] support python 3.10 and multi instruction
2024-07-31 13:58:17 +00:00
chenxl
18c42e67df
Initial commit
2024-07-27 16:06:58 +08:00