prima.cpp

mirror of https://github.com/Lizonghang/prima.cpp.git synced 2025-09-06 06:39:03 +00:00

Author	SHA1	Message	Date
DeEMO	d4618de991	fix: block when free socket	2025-06-12 12:26:10 +00:00
DeEMO	2039e3b0c1	fix: send and recv meta	2025-06-12 12:26:10 +00:00
DeEMO	d6c8d322cd	fix try_connect	2025-06-12 12:26:10 +00:00
DeEMO	d1b97f798e	support reconnection	2025-06-12 12:26:09 +00:00
Lizonghang	27756ee182	fix: enable rolling back set assignment when all devices are assigned to M4 but no feasible solutions	2025-06-04 15:11:29 +04:00
Li, Zonghang	6439090920	reformat code	2025-06-03 23:53:24 +04:00
Li, Zonghang	a01fafd126	Merge branch 'main' into dev	2025-06-03 17:56:47 +04:00
Li, Zonghang	b30f749e5e	fix n_embd cannot be divided by quantized block size	2025-06-03 14:06:31 +04:00
Li, Zonghang	7b0ededd24	Merge branch 'dev' into feat/auto-exit	2025-05-20 02:04:14 +08:00
Lizonghang	c54a6a0132	fix context shifting	2025-05-19 16:58:35 +04:00
DeEMO	8b61cb2fa4	fix: adapt the new topo Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:22:29 +00:00
DeEMO	4b36aef157	fix some bugs Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:22:08 +00:00
DeEMO	cc46aa9828	update rank and n_world Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:22:02 +00:00
DeEMO	fdd6694633	add topo rebuild Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:21:53 +00:00
DeEMO	26bb86c09b	Add tune_layer_allocation Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:21:22 +00:00
Lizonghang	2fbc0c8da3	fix: reset -ngl to 0 when GPU is not used and reformat code	2025-05-14 13:27:20 +04:00
DeEMO	168c14f4e8	remove unnecessary profile when `--lw` is specified	2025-04-17 13:49:09 +00:00
leeetao	fc1e2d3fc6	Added support for iq1s and iq1m quantization type	2025-04-17 10:27:53 +00:00
Lizonghang	f97a97003b	fix type convert	2025-04-07 17:57:50 +04:00
Zonghang Li	63b45a4c26	add args -k and --force	2025-03-11 22:09:39 +04:00
Zonghang Li	bcfdace59b	add args -k and --force	2025-03-11 20:44:36 +04:00
leeetao	45ec52c2cb	Added support for IQ1_M and IQ2_XXS quantization type	2025-03-07 16:56:16 +00:00
leeetao	2f049b8428	Added support for Q2K, IQ1s, IQ4NL quantization types	2025-03-04 15:22:55 +00:00
Lizonghang	9cbdf01645	fix support for Q5_0	2025-02-27 22:25:03 +04:00
Lizonghang	c8e615d69c	fix n_m bound error	2025-02-27 21:59:04 +04:00
Lizonghang	550fdcbc4f	add support for Q5_0	2025-02-27 21:47:14 +04:00
Lizonghang	96e68679ce	fix upper bound and set calibration in halda	2025-02-27 17:00:27 +04:00
Lizonghang	41f3708999	fix condition for gpu overload	2025-02-25 21:31:55 +04:00
leeetao	224d14eb4c	Merge branch 'tao' into dev	2025-02-24 16:48:43 +00:00
leeetao	42da179d66	Added parameter display for the distilled model of deepseek-qwen	2025-02-24 13:24:56 +00:00
Lizonghang	e3a0d0007a	add gpu check in set calibration	2025-02-23 21:56:59 +04:00
Lizonghang	07a397360b	fix gpu underutilization	2025-02-19 16:30:18 +04:00
Lizonghang	863393554a	add gpu underutilization calibration step	2025-02-17 18:54:18 +04:00
Zonghang Li	8532d030f3	fix bugs in bo	2025-02-15 18:10:11 +04:00
Lizonghang	e64f237e04	fix bugs in available_mem calculation	2025-02-15 17:43:03 +04:00
Lizonghang	64c4a47980	fix bugs and warnings	2025-02-15 17:33:03 +04:00
Zonghang Li	630556bc16	fix default allocation strategy to avoid OOM	2025-02-15 17:23:19 +04:00
Lizonghang	fdfaaecd5e	disable device profiler in standalone mode	2025-02-12 17:12:30 +04:00
Lizonghang	c84f9d29fe	use arg prefetch and remove arg unload	2025-02-12 17:04:41 +04:00
Lizonghang	64089236eb	fix latency estimation in set m1	2025-02-03 07:56:02 +04:00
Lizonghang	83b3d01844	fix delay estimation on macos	2025-02-01 10:37:56 +04:00
Lizonghang	849b47ccd0	fix auto schedule logic	2025-01-29 13:13:37 +04:00
Lizonghang	e7c6b830e6	fix auto schedule logic	2025-01-29 11:15:45 +04:00
Lizonghang	631daadd92	test	2025-01-28 16:36:47 +04:00
Zonghang Li	36f353e374	check env path before calling fio to ensure we can find it	2025-01-28 13:06:08 +04:00
Lizonghang	2934cf3e8e	reserve 200 mib for internal gpu usage	2025-01-27 22:14:12 +04:00
Lizonghang	1e2b934d69	add bounds n[m]<=0 for devices without GPUs	2025-01-27 11:13:09 +04:00
Lizonghang	1ca9a43bd1	keep the output layer weights in shared memory by default	2025-01-25 23:31:43 +04:00
Lizonghang	f3dd5776eb	fix kappa and memory bounds, account for look-up table and input/output layer delay	2025-01-25 22:31:40 +04:00
Lizonghang	9e4ba4f06a	fix w init error	2025-01-24 20:26:18 +04:00

300 commits