Li, Zonghang | 67b10034a7 | update video to compare llama.cpp and prima.cpp | 2025-04-07 18:01:04 +04:00
Lizonghang | 7631ddcdc7 | ignore video | 2025-04-07 17:59:23 +04:00
Lizonghang | fffefb9259 | update README | 2025-04-07 17:57:57 +04:00
Lizonghang | f97a97003b | fix type convert | 2025-04-07 17:57:50 +04:00
Lizonghang | 3b264352e7 | update README | 2025-03-30 23:39:36 +04:00
Lizonghang | 3a6cb1768f | add logo | 2025-03-30 17:21:42 +04:00
Zonghang Li | 63b45a4c26 | add args -k and --force | 2025-03-11 22:09:39 +04:00
Zonghang Li | bcfdace59b | add args -k and --force | 2025-03-11 20:44:36 +04:00
Lizonghang | 9cbdf01645 | fix support for Q5_0 | 2025-02-27 22:25:03 +04:00
Lizonghang | c8e615d69c | fix n_m bound error | 2025-02-27 21:59:04 +04:00
Lizonghang | 550fdcbc4f | add support for Q5_0 | 2025-02-27 21:47:14 +04:00
Lizonghang | 96e68679ce | fix upper bound and set calibration in halda | 2025-02-27 17:00:27 +04:00
Lizonghang | 41f3708999 | fix condition for gpu overload | 2025-02-25 21:31:55 +04:00
leeetao | 224d14eb4c | Merge branch 'tao' into dev | 2025-02-24 16:48:43 +00:00
leeetao | 42da179d66 | Added parameter display for the distilled model of deepseek-qwen | 2025-02-24 13:24:56 +00:00
Lizonghang | e3a0d0007a | add gpu check in set calibration | 2025-02-23 21:56:59 +04:00
leeetao | 7bf1b743fb | Merge branch 'dev' into lt_test. Merge dev branch updates into local branch lt_test. | 2025-02-23 08:35:45 +00:00
leeetao | b4a9932d56 | Added deepseek-r1-qwen vocabulary file | 2025-02-23 08:33:57 +00:00
leeetao | f99e08b9fe | Added inference support for the Deepseek distilled model | 2025-02-23 08:27:37 +00:00
Zonghang Li | f5e874f75f | remove conda path | 2025-02-23 01:38:13 +04:00
Lizonghang | 07a397360b | fix gpu underutilization | 2025-02-19 16:30:18 +04:00
Lizonghang | e219fada4e | disable timer | 2025-02-19 16:24:12 +04:00
Lizonghang | 863393554a | add gpu underutilization calibration step | 2025-02-17 18:54:18 +04:00
Zonghang Li | 8532d030f3 | fix bugs in bo | 2025-02-15 18:10:11 +04:00
Lizonghang | e64f237e04 | fix bugs in available_mem calculation | 2025-02-15 17:43:03 +04:00
Lizonghang | 64c4a47980 | fix bugs and warnings | 2025-02-15 17:33:03 +04:00
Zonghang Li | 630556bc16 | fix default allocation strategy to avoid OOM | 2025-02-15 17:23:19 +04:00
Lizonghang | fdfaaecd5e | disable device profiler in standalone mode | 2025-02-12 17:12:30 +04:00
Lizonghang | c84f9d29fe | use arg prefetch and remove arg unload | 2025-02-12 17:04:41 +04:00
Lizonghang | 708b1d8c89 | disable force fetching | 2025-02-12 16:55:44 +04:00
Lizonghang | ea0e655a8b | disable force fetching | 2025-02-12 16:55:21 +04:00
Lizonghang | b163918b46 | disable prefetch in standalone mode | 2025-02-12 00:17:33 +04:00
Lizonghang | 6a50d494d2 | increase prefetch dense | 2025-02-11 17:25:06 +04:00
Lizonghang | 65ad14140a | do not check loaded tensors due to increased latency | 2025-02-11 17:10:11 +04:00
Lizonghang | 3dd3138207 | ignore tensors already in page cache when prefetching | 2025-02-11 17:00:17 +04:00
Lizonghang | 24974a488c | assume 10% of active pages can be compressed on macOS UMA | 2025-02-11 11:06:33 +04:00
Zonghang Li | 261c88f058 | skip tensors on CUDA in manage_graph_tensors | 2025-02-11 09:49:17 +04:00
Zonghang Li | c4c6a642fc | manage_graph_tensors: fix segment prefetch | 2025-02-08 22:44:38 +04:00
Lizonghang | d2bc5cd502 | add pid as suffix to avoid conflicts with other processes | 2025-02-07 10:29:22 +04:00
Lizonghang | 8e41362af0 | fix prefetch | 2025-02-04 17:38:41 +04:00
Lizonghang | ec73e239c9 | use 80% available mem as a conservative estimate | 2025-02-03 18:10:05 +04:00
Lizonghang | 64089236eb | fix latency estimation in set m1 | 2025-02-03 07:56:02 +04:00
Lizonghang | 83b3d01844 | fix delay estimation on macOS | 2025-02-01 10:37:56 +04:00
Lizonghang | 215151918f | fix gpu mem limit | 2025-01-31 18:52:13 +04:00
Lizonghang | 17cd8ba618 | reserve 300MiB for Metal kernel | 2025-01-31 16:24:44 +04:00
Lizonghang | dd632ee6df | ignore the first 5 evals due to preheat | 2025-01-31 08:53:51 +04:00
Lizonghang | fdecd4b54c | more active pages can be compressed | 2025-01-30 23:17:07 +04:00
Lizonghang | 2bc7a56790 | fix available mem estimation in termux | 2025-01-30 21:23:05 +04:00
Lizonghang | b680cb74fe | set POSIX_MADV_WILLNEED for the next subgraph | 2025-01-30 13:29:34 +04:00
Lizonghang | f9b4c46b74 | ignore the first eval to make time test more accurate | 2025-01-30 11:12:26 +04:00