prima.cpp

mirror of https://github.com/Lizonghang/prima.cpp.git synced 2025-09-07 19:39:03 +00:00

Author	SHA1	Message	Date
Lizonghang	07c4966a80	reduce fio data size to 1gb to speed up profiling	2025-05-14 21:26:01 +04:00
Lizonghang	2fbc0c8da3	fix: reset -ngl to 0 when GPU is not used and reformat code	2025-05-14 13:27:20 +04:00
DeEMO	168c14f4e8	remove unnecessary profile when `--lw` is specified	2025-04-17 13:49:09 +00:00
leeetao	fc1e2d3fc6	Added support for iq1s and iq1m quantization type	2025-04-17 10:27:53 +00:00
Zonghang Li	5a5f103833	fix q6k and q80	2025-04-16 08:55:07 +04:00
Lizonghang	f97a97003b	fix type convert	2025-04-07 17:57:50 +04:00
Zonghang Li	63b45a4c26	add args -k and --force	2025-03-11 22:09:39 +04:00
Zonghang Li	bcfdace59b	add args -k and --force	2025-03-11 20:44:36 +04:00
leeetao	45ec52c2cb	Added support for IQ1_M and IQ2_XXS quantization type	2025-03-07 16:56:16 +00:00
leeetao	230c68b80c	fixed the alignment display	2025-03-07 07:55:23 +00:00
leeetao	6a416534c8	Fixed the alignment display of device performance	2025-03-07 07:46:30 +00:00
leeetao	54c4c1c26e	Fixed the flops test for iq1s and q2k quantization types	2025-03-07 02:47:00 +00:00
leeetao	2f049b8428	Added support for Q2K, IQ1s, IQ4NL quantization types	2025-03-04 15:22:55 +00:00
Lizonghang	9cbdf01645	fix support for Q5_0	2025-02-27 22:25:03 +04:00
Lizonghang	c8e615d69c	fix n_m bound error	2025-02-27 21:59:04 +04:00
Lizonghang	550fdcbc4f	add support for Q5_0	2025-02-27 21:47:14 +04:00
Lizonghang	96e68679ce	fix upper bound and set calibration in halda	2025-02-27 17:00:27 +04:00
Lizonghang	41f3708999	fix condition for gpu overload	2025-02-25 21:31:55 +04:00
leeetao	224d14eb4c	Merge branch 'tao' into dev	2025-02-24 16:48:43 +00:00
leeetao	42da179d66	Added parameter display for the distilled model of deepseek-qwen	2025-02-24 13:24:56 +00:00
Lizonghang	e3a0d0007a	add gpu check in set calibration	2025-02-23 21:56:59 +04:00
Lizonghang	07a397360b	fix gpu underutilization	2025-02-19 16:30:18 +04:00
Lizonghang	863393554a	add gpu underutilization calibration step	2025-02-17 18:54:18 +04:00
Zonghang Li	8532d030f3	fix bugs in bo	2025-02-15 18:10:11 +04:00
Lizonghang	e64f237e04	fix bugs in available_mem calculation	2025-02-15 17:43:03 +04:00
Lizonghang	64c4a47980	fix bugs and warnings	2025-02-15 17:33:03 +04:00
Zonghang Li	630556bc16	fix default allocation strategy to avoid OOM	2025-02-15 17:23:19 +04:00
Lizonghang	fdfaaecd5e	disable device profiler in standalone mode	2025-02-12 17:12:30 +04:00
Lizonghang	c84f9d29fe	use arg prefetch and remove arg unload	2025-02-12 17:04:41 +04:00
Lizonghang	24974a488c	assume 10% of active pages can be compressed on macOS UMA	2025-02-11 11:06:33 +04:00
Lizonghang	d2bc5cd502	add pid as suffix to avoid conflicts with other processes	2025-02-07 10:29:22 +04:00
Lizonghang	ec73e239c9	use 80% available mem as a conservative estimate	2025-02-03 18:10:05 +04:00
Lizonghang	64089236eb	fix latency estimation in set m1	2025-02-03 07:56:02 +04:00
Lizonghang	83b3d01844	fix delay estimation on macos	2025-02-01 10:37:56 +04:00
Lizonghang	dd632ee6df	ignore the first 5 evals due to preheat	2025-01-31 08:53:51 +04:00
Lizonghang	fdecd4b54c	more active pages can be compressed	2025-01-30 23:17:07 +04:00
Lizonghang	2bc7a56790	fix available mem estimation in termux	2025-01-30 21:23:05 +04:00
Lizonghang	cd758247e6	consider active pages compression in macos available memory estimation	2025-01-29 20:33:13 +04:00
Lizonghang	27c996835d	fix undeclared identifier get_page_size	2025-01-29 19:59:02 +04:00
Lizonghang	4b616baed4	fix macos x86_64 available mem estimation	2025-01-29 19:57:06 +04:00
Lizonghang	849b47ccd0	fix auto schedule logic	2025-01-29 13:13:37 +04:00
Lizonghang	e7c6b830e6	fix auto schedule logic	2025-01-29 11:15:45 +04:00
Lizonghang	631daadd92	test	2025-01-28 16:36:47 +04:00
Zonghang Li	36f353e374	check env path before calling fio to ensure we can find it	2025-01-28 13:06:08 +04:00
Lizonghang	2934cf3e8e	reserve 200 mib for internal gpu usage	2025-01-27 22:14:12 +04:00
Lizonghang	1ca9e7974b	device_os returns Linux if in Termux	2025-01-27 11:14:21 +04:00
Lizonghang	1e2b934d69	add bounds n[m]<=0 for devices without GPUs	2025-01-27 11:13:09 +04:00
Lizonghang	ac5d63b09e	add explanation for why the output layer weights should be kept in metal shared memory	2025-01-25 23:51:16 +04:00
Lizonghang	1ca9a43bd1	keep the output layer weights in shared memory by default	2025-01-25 23:31:43 +04:00
Lizonghang	f3dd5776eb	fix kappa and memory bounds, account for look-up table and input/output layer delay	2025-01-25 22:31:40 +04:00

467 commits