prima.cpp

Mirror of https://github.com/Lizonghang/prima.cpp.git, last synced 2025-09-06 04:09:02 +00:00

Author	SHA1	Message	Date
Li, Zonghang	67b10034a7	update video to compare llama.cpp and prima.cpp	2025-04-07 18:01:04 +04:00
Lizonghang	7631ddcdc7	ignore video	2025-04-07 17:59:23 +04:00
Lizonghang	fffefb9259	update README	2025-04-07 17:57:57 +04:00
Lizonghang	f97a97003b	fix type convert	2025-04-07 17:57:50 +04:00
Lizonghang	3b264352e7	update README	2025-03-30 23:39:36 +04:00
Lizonghang	3a6cb1768f	add logo	2025-03-30 17:21:42 +04:00
Zonghang Li	63b45a4c26	add args -k and --force	2025-03-11 22:09:39 +04:00
Zonghang Li	bcfdace59b	add args -k and --force	2025-03-11 20:44:36 +04:00
Lizonghang	9cbdf01645	fix support for Q5_0	2025-02-27 22:25:03 +04:00
Lizonghang	c8e615d69c	fix n_m bound error	2025-02-27 21:59:04 +04:00
Lizonghang	550fdcbc4f	add support for Q5_0	2025-02-27 21:47:14 +04:00
Lizonghang	96e68679ce	fix upper bound and set calibration in halda	2025-02-27 17:00:27 +04:00
Lizonghang	41f3708999	fix condition for gpu overload	2025-02-25 21:31:55 +04:00
leeetao	224d14eb4c	Merge branch 'tao' into dev	2025-02-24 16:48:43 +00:00
leeetao	42da179d66	Added parameter display for the distilled model of deepseek-qwen	2025-02-24 13:24:56 +00:00
Lizonghang	e3a0d0007a	add gpu check in set calibration	2025-02-23 21:56:59 +04:00
leeetao	7bf1b743fb	Merge branch 'dev' into lt_test Merge dev branch updates into local branch lt_test.	2025-02-23 08:35:45 +00:00
leeetao	b4a9932d56	Added deepseek-r1-qwen vocabulary file	2025-02-23 08:33:57 +00:00
leeetao	f99e08b9fe	Added inference support for the Deepseek distilled model	2025-02-23 08:27:37 +00:00
Zonghang Li	f5e874f75f	remove conda path	2025-02-23 01:38:13 +04:00
Lizonghang	07a397360b	fix gpu underutilization	2025-02-19 16:30:18 +04:00
Lizonghang	e219fada4e	disable timer	2025-02-19 16:24:12 +04:00
Lizonghang	863393554a	add gpu underutilization calibration step	2025-02-17 18:54:18 +04:00
Zonghang Li	8532d030f3	fix bugs in bo	2025-02-15 18:10:11 +04:00
Lizonghang	e64f237e04	fix bugs in available_mem calculation	2025-02-15 17:43:03 +04:00
Lizonghang	64c4a47980	fix bugs and warnings	2025-02-15 17:33:03 +04:00
Zonghang Li	630556bc16	fix default allocation strategy to avoid OOM	2025-02-15 17:23:19 +04:00
Lizonghang	fdfaaecd5e	disable device profiler in standalone mode	2025-02-12 17:12:30 +04:00
Lizonghang	c84f9d29fe	use arg prefetch and remove arg unload	2025-02-12 17:04:41 +04:00
Lizonghang	708b1d8c89	disable force fetching	2025-02-12 16:55:44 +04:00
Lizonghang	ea0e655a8b	disable force feteching	2025-02-12 16:55:21 +04:00
Lizonghang	b163918b46	disable prefetch in standalone mode	2025-02-12 00:17:33 +04:00
Lizonghang	6a50d494d2	increase prefetch dense	2025-02-11 17:25:06 +04:00
Lizonghang	65ad14140a	do not check loaded tensors due to increased latency	2025-02-11 17:10:11 +04:00
Lizonghang	3dd3138207	ignore tensors already in page cache when prefetching	2025-02-11 17:00:17 +04:00
Lizonghang	24974a488c	assume 10% of active pages can be compressed on macOS UMA	2025-02-11 11:06:33 +04:00
Zonghang Li	261c88f058	skip tensors on CUDA in manage_graph_tensors	2025-02-11 09:49:17 +04:00
Zonghang Li	c4c6a642fc	manage_graph_tensors: fix segment prefetch	2025-02-08 22:44:38 +04:00
Lizonghang	d2bc5cd502	add pid as suffix to avoid conflicts with other processes	2025-02-07 10:29:22 +04:00
Lizonghang	8e41362af0	fix prefetch	2025-02-04 17:38:41 +04:00
Lizonghang	ec73e239c9	use 80% available mem as a conservative estimate	2025-02-03 18:10:05 +04:00
Lizonghang	64089236eb	fix latency estimation in set m1	2025-02-03 07:56:02 +04:00
Lizonghang	83b3d01844	fix delay estimation on macos	2025-02-01 10:37:56 +04:00
Lizonghang	215151918f	fix gpu mem limit	2025-01-31 18:52:13 +04:00
Lizonghang	17cd8ba618	reverse 300MiB for Metal kernel	2025-01-31 16:24:44 +04:00
Lizonghang	dd632ee6df	ignore the first 5 evals due to preheat	2025-01-31 08:53:51 +04:00
Lizonghang	fdecd4b54c	more active pages can be compressed	2025-01-30 23:17:07 +04:00
Lizonghang	2bc7a56790	fix available mem estimation in termux	2025-01-30 21:23:05 +04:00
Lizonghang	b680cb74fe	set POSIX_MADV_WILLNEED for the next subgraph	2025-01-30 13:29:34 +04:00
Lizonghang	f9b4c46b74	ignore the first eval to make time test more accurate	2025-01-30 11:12:26 +04:00

4115 commits