Commit graph

4115 commits

Author SHA1 Message Date
Li, Zonghang
67b10034a7 update video to compare llama.cpp and prima.cpp 2025-04-07 18:01:04 +04:00
Lizonghang
7631ddcdc7 ignore video 2025-04-07 17:59:23 +04:00
Lizonghang
fffefb9259 update README 2025-04-07 17:57:57 +04:00
Lizonghang
f97a97003b fix type convert 2025-04-07 17:57:50 +04:00
Lizonghang
3b264352e7 update README 2025-03-30 23:39:36 +04:00
Lizonghang
3a6cb1768f add logo 2025-03-30 17:21:42 +04:00
Zonghang Li
63b45a4c26 add args -k and --force 2025-03-11 22:09:39 +04:00
Zonghang Li
bcfdace59b add args -k and --force 2025-03-11 20:44:36 +04:00
Lizonghang
9cbdf01645 fix support for Q5_0 2025-02-27 22:25:03 +04:00
Lizonghang
c8e615d69c fix n_m bound error 2025-02-27 21:59:04 +04:00
Lizonghang
550fdcbc4f add support for Q5_0 2025-02-27 21:47:14 +04:00
Lizonghang
96e68679ce fix upper bound and set calibration in halda 2025-02-27 17:00:27 +04:00
Lizonghang
41f3708999 fix condition for gpu overload 2025-02-25 21:31:55 +04:00
leeetao
224d14eb4c Merge branch 'tao' into dev 2025-02-24 16:48:43 +00:00
leeetao
42da179d66 Added parameter display for the distilled model of deepseek-qwen 2025-02-24 13:24:56 +00:00
Lizonghang
e3a0d0007a add gpu check in set calibration 2025-02-23 21:56:59 +04:00
leeetao
7bf1b743fb Merge branch 'dev' into lt_test: Merge dev branch updates into local branch lt_test. 2025-02-23 08:35:45 +00:00
leeetao
b4a9932d56 Added deepseek-r1-qwen vocabulary file 2025-02-23 08:33:57 +00:00
leeetao
f99e08b9fe Added inference support for the Deepseek distilled model 2025-02-23 08:27:37 +00:00
Zonghang Li
f5e874f75f remove conda path 2025-02-23 01:38:13 +04:00
Lizonghang
07a397360b fix gpu underutilization 2025-02-19 16:30:18 +04:00
Lizonghang
e219fada4e disable timer 2025-02-19 16:24:12 +04:00
Lizonghang
863393554a add gpu underutilization calibration step 2025-02-17 18:54:18 +04:00
Zonghang Li
8532d030f3 fix bugs in bo 2025-02-15 18:10:11 +04:00
Lizonghang
e64f237e04 fix bugs in available_mem calculation 2025-02-15 17:43:03 +04:00
Lizonghang
64c4a47980 fix bugs and warnings 2025-02-15 17:33:03 +04:00
Zonghang Li
630556bc16 fix default allocation strategy to avoid OOM 2025-02-15 17:23:19 +04:00
Lizonghang
fdfaaecd5e disable device profiler in standalone mode 2025-02-12 17:12:30 +04:00
Lizonghang
c84f9d29fe use arg prefetch and remove arg unload 2025-02-12 17:04:41 +04:00
Lizonghang
708b1d8c89 disable force fetching 2025-02-12 16:55:44 +04:00
Lizonghang
ea0e655a8b disable force feteching 2025-02-12 16:55:21 +04:00
Lizonghang
b163918b46 disable prefetch in standalone mode 2025-02-12 00:17:33 +04:00
Lizonghang
6a50d494d2 increase prefetch dense 2025-02-11 17:25:06 +04:00
Lizonghang
65ad14140a do not check loaded tensors due to increased latency 2025-02-11 17:10:11 +04:00
Lizonghang
3dd3138207 ignore tensors already in page cache when prefetching 2025-02-11 17:00:17 +04:00
Lizonghang
24974a488c assume 10% of active pages can be compressed on macOS UMA 2025-02-11 11:06:33 +04:00
Zonghang Li
261c88f058 skip tensors on CUDA in manage_graph_tensors 2025-02-11 09:49:17 +04:00
Zonghang Li
c4c6a642fc manage_graph_tensors: fix segment prefetch 2025-02-08 22:44:38 +04:00
Lizonghang
d2bc5cd502 add pid as suffix to avoid conflicts with other processes 2025-02-07 10:29:22 +04:00
Lizonghang
8e41362af0 fix prefetch 2025-02-04 17:38:41 +04:00
Lizonghang
ec73e239c9 use 80% available mem as a conservative estimate 2025-02-03 18:10:05 +04:00
Lizonghang
64089236eb fix latency estimation in set m1 2025-02-03 07:56:02 +04:00
Lizonghang
83b3d01844 fix delay estimation on macos 2025-02-01 10:37:56 +04:00
Lizonghang
215151918f fix gpu mem limit 2025-01-31 18:52:13 +04:00
Lizonghang
17cd8ba618 reverse 300MiB for Metal kernel 2025-01-31 16:24:44 +04:00
Lizonghang
dd632ee6df ignore the first 5 evals due to preheat 2025-01-31 08:53:51 +04:00
Lizonghang
fdecd4b54c more active pages can be compressed 2025-01-30 23:17:07 +04:00
Lizonghang
2bc7a56790 fix available mem estimation in termux 2025-01-30 21:23:05 +04:00
Lizonghang
b680cb74fe set POSIX_MADV_WILLNEED for the next subgraph 2025-01-30 13:29:34 +04:00
Lizonghang
f9b4c46b74 ignore the first eval to make time test more accurate 2025-01-30 11:12:26 +04:00