Zonghang Li | 45e8b0420c | fix compute buffer estimate: tested on cuda | 2025-06-22 08:10:57 +00:00 |
Zonghang Li | dd589561b4 | improve the computing buffer estimate | 2025-06-19 08:02:43 +00:00 |
DeEMO | d6c8d322cd | fix try_connect | 2025-06-12 12:26:10 +00:00 |
DeEMO | d1b97f798e | support reconnection | 2025-06-12 12:26:09 +00:00 |
Li, Zonghang | 7b0ededd24 | Merge branch 'dev' into feat/auto-exit | 2025-05-20 02:04:14 +08:00 |
Lizonghang | c54a6a0132 | fix context shifting | 2025-05-19 16:58:35 +04:00 |
DeEMO | fdd6694633 | add topo rebuild Signed-off-by: DeEMO <yzzxrx@gmail.com> | 2025-05-19 09:21:53 +00:00 |
DeEMO | 168c14f4e8 | remove unnecessary profile when --lw is specified | 2025-04-17 13:49:09 +00:00 |
leeetao | 45ec52c2cb | Added support for IQ1_M and IQ2_XXS quantization type | 2025-03-07 16:56:16 +00:00 |
leeetao | 2f049b8428 | Added support for Q2K, IQ1s, IQ4NL quantization types | 2025-03-04 15:22:55 +00:00 |
Lizonghang | 550fdcbc4f | add support for Q5_0 | 2025-02-27 21:47:14 +04:00 |
Lizonghang | fa31ca8e35 | add os detect | 2024-12-30 09:13:12 +04:00 |
Lizonghang | d9beb030ee | add EPS in device_compute_delay | 2024-12-29 22:31:45 +04:00 |
Lizonghang | a7ec685eda | add memcpy speed test | 2024-12-29 16:19:08 +04:00 |
Lizonghang | b642d70188 | fix swappable mem in termux | 2024-12-12 15:15:16 +04:00 |
Lizonghang | 8e9ab45458 | fix model bytes counter | 2024-12-10 14:57:48 +04:00 |
Lizonghang | d78fa427e7 | add memory copy speed test | 2024-12-09 10:07:42 +04:00 |
Zonghang Li | df813675d0 | fix flops count and ram/vram speed test | 2024-12-08 10:14:05 +04:00 |
Lizonghang | f1c1d1b929 | add support for Q5_K and fix byte count for Q6_K | 2024-12-06 07:59:45 +04:00 |
Zonghang Li | 7521e532c4 | device_memory_bw: simulate cache-friendly block access and multi-threading | 2024-12-04 15:36:59 +04:00 |
Lizonghang | 68ecabc8c3 | add cpu_read_ram_bw, metal_read_vram_bw, cuda_read_vram_bw | 2024-11-29 19:04:53 +04:00 |
Lizonghang | 45a1e55eec | reduce kv cache from available memory | 2024-11-28 20:21:21 +04:00 |
Lizonghang | 740f7f0b95 | use multithread disk r/w test | 2024-11-27 22:14:17 +04:00 |
Lizonghang | f7507ec20b | fix disk r/w test, add disk access latency, and correct units (GB, GiB) | 2024-11-27 21:36:12 +04:00 |
Zonghang Li | f78c437172 | add device_inp_embd_delay test, device_memory_bw test, device_cuda_memory_bw test, | 2024-11-26 22:28:02 +04:00 |
Lizonghang | a7a95b53fe | add q80xf32 and count_n_params | 2024-11-24 23:11:12 +04:00 |
Lizonghang | 3fe00a16a0 | count model flops for f32xf32, f16xf32, q4kxf32, q6kxf32 | 2024-11-24 13:13:32 +04:00 |
Lizonghang | a5ba34169a | add f32, f16, q4k_f32, q6k_f32 flops test and fix duplicate inp_embd in subgraphs | 2024-11-23 21:36:34 +04:00 |
Zonghang Li | 7ee1423006 | add model_flops | 2024-11-21 20:06:16 +04:00 |
Zonghang Li | 80f6b72e71 | remove device_flops from profiler api | 2024-11-21 08:37:57 +04:00 |
Lizonghang | 477ecf2084 | add llama_model_n_flops | 2024-11-20 19:40:27 +04:00 |
Lizonghang | 10f6f92c7e | add f32, f16, q8, q4k speed test for cuda | 2024-11-10 23:41:13 +04:00 |
Lizonghang | f4260bb346 | add device_flops() for cpu, metal, and cuda | 2024-11-10 23:11:05 +04:00 |
Lizonghang | 5fae6ac36f | add cpu flops test | 2024-11-09 20:53:42 +04:00 |
Lizonghang | 53cb3a6069 | synchronize device info | 2024-11-07 22:02:01 +04:00 |
Lizonghang | ef7fdf70cc | add LLAMA_API llama_profile_device | 2024-11-07 09:30:39 +04:00 |
Lizonghang | 407c71ae52 | add cpu and gpu profile | 2024-11-06 20:42:28 +04:00 |
Lizonghang | 4e1be1065d | add memory speed test | 2024-11-06 10:57:30 +04:00 |
Lizonghang | a7f3d917a1 | add device get name | 2024-11-05 22:04:14 +04:00 |
Lizonghang | 2d447266e9 | add swap capacity test | 2024-11-05 21:42:45 +04:00 |
Lizonghang | 9eed6b14bf | add disk read speed test | 2024-11-05 21:12:02 +04:00 |
Lizonghang | 9cd66f2145 | add profiler | 2024-11-05 20:29:09 +04:00 |
Lizonghang | 766ec7862b | test | 2024-11-05 17:22:24 +04:00 |