prima.cpp

Mirror of https://github.com/Lizonghang/prima.cpp.git (last synced 2025-09-06 21:49:02 +00:00)

Author	SHA1	Message	Date
DeEMO	2e8e42a5ad	Add speculative decoding support to the server and command-line interfaces	2025-06-30 09:35:35 +00:00
Zonghang Li	1ea2d61a97	speedup: add arg --keep-out-in-cuda to run the output layer on CUDA	2025-06-28 10:58:18 +04:00
Li, Zonghang	a05022c05a	communication: use barrier instead of manually adding delay	2025-06-26 17:30:47 +04:00
Li, Zonghang	729870fcd7	topo rebuild: add a delay to avoid packet interleaving	2025-06-26 14:47:34 +04:00
Li, Zonghang	50807fd4e1	halda: handle infeasible solution with weak device	2025-06-26 08:56:31 +04:00
Li, Zonghang	16ba3564ce	fix compute_buffer estimate: add context GPU usage	2025-06-24 16:09:59 +04:00
Li, Zonghang	c926088d6a	fix compute buffer estimate: test without highs	2025-06-22 16:27:55 +04:00
Zonghang Li	45e8b0420c	fix compute buffer estimate: tested on cuda	2025-06-22 08:10:57 +00:00
Li, Zonghang	80e5b71b48	fix compute buffer estimate: tested on metal	2025-06-20 13:43:55 +04:00
Zonghang Li	dd589561b4	improve the computing buffer estimate	2025-06-19 08:02:43 +00:00
DeEMO	deeec668b8	fix: n_worker in draft model (cherry picked from commit 921ad2b453b24b715ad5db6a703fb3df65fdcb80)	2025-06-17 13:23:20 +08:00
DeEMO	67c4f70357	fix: add log when serving as a proxy	2025-06-17 12:08:53 +08:00
DeEMO	6ff38b2a0c	add args: data-port and signal-port	2025-06-17 12:00:04 +08:00
Li, Zonghang	fbbc30c950	Merge branch 'speculative' into dev	2025-06-16 13:27:36 +04:00
Li, Zonghang	dfb1feb54e	update README	2025-06-16 12:09:07 +04:00
Li, Zonghang	f38cfc625c	Merge branch 'fix' into dev	2025-06-14 18:56:36 +04:00
Li, Zonghang	b5ccd62135	fix n_gpu_layers allocation errors	2025-06-14 18:55:53 +04:00
DeEMO	d4618de991	fix: block when free socket	2025-06-12 12:26:10 +00:00
DeEMO	2039e3b0c1	fix: send and recv meta	2025-06-12 12:26:10 +00:00
DeEMO	d6c8d322cd	fix try_connect	2025-06-12 12:26:10 +00:00
DeEMO	d1b97f798e	support reconnection	2025-06-12 12:26:09 +00:00
Lizonghang	27756ee182	fix: enable rolling back set assignment when all devices are assigned to M4 but no feasible solutions	2025-06-04 15:11:29 +04:00
Li, Zonghang	6439090920	reformat code	2025-06-03 23:53:24 +04:00
Li, Zonghang	a01fafd126	Merge branch 'main' into dev	2025-06-03 17:56:47 +04:00
Li, Zonghang	b30f749e5e	fix n_embd cannot be divided by quantized block size	2025-06-03 14:06:31 +04:00
Li, Zonghang	7b0ededd24	Merge branch 'dev' into feat/auto-exit	2025-05-20 02:04:14 +08:00
Lizonghang	c54a6a0132	fix context shifting	2025-05-19 16:58:35 +04:00
DeEMO	8b61cb2fa4	fix: adapt the new topo Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:22:29 +00:00
DeEMO	4b36aef157	fix some bugs Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:22:08 +00:00
DeEMO	cc46aa9828	update rank and n_world Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:22:02 +00:00
DeEMO	fdd6694633	add topo rebuild Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:21:53 +00:00
DeEMO	26bb86c09b	Add tune_layer_allocation Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:21:22 +00:00
Lizonghang	2fbc0c8da3	fix: reset -ngl to 0 when GPU is not used and reformat code	2025-05-14 13:27:20 +04:00
DeEMO	168c14f4e8	remove unnecessary profile when `--lw` is specified	2025-04-17 13:49:09 +00:00
leeetao	fc1e2d3fc6	Added support for iq1s and iq1m quantization type	2025-04-17 10:27:53 +00:00
Lizonghang	f97a97003b	fix type convert	2025-04-07 17:57:50 +04:00
Zonghang Li	63b45a4c26	add args -k and --force	2025-03-11 22:09:39 +04:00
Zonghang Li	bcfdace59b	add args -k and --force	2025-03-11 20:44:36 +04:00
leeetao	45ec52c2cb	Added support for IQ1_M and IQ2_XXS quantization type	2025-03-07 16:56:16 +00:00
leeetao	2f049b8428	Added support for Q2K, IQ1s, IQ4NL quantization types	2025-03-04 15:22:55 +00:00
Lizonghang	9cbdf01645	fix support for Q5_0	2025-02-27 22:25:03 +04:00
Lizonghang	c8e615d69c	fix n_m bound error	2025-02-27 21:59:04 +04:00
Lizonghang	550fdcbc4f	add support for Q5_0	2025-02-27 21:47:14 +04:00
Lizonghang	96e68679ce	fix upper bound and set calibration in halda	2025-02-27 17:00:27 +04:00
Lizonghang	41f3708999	fix condition for gpu overload	2025-02-25 21:31:55 +04:00
leeetao	224d14eb4c	Merge branch 'tao' into dev	2025-02-24 16:48:43 +00:00
leeetao	42da179d66	Added parameter display for the distilled model of deepseek-qwen	2025-02-24 13:24:56 +00:00
Lizonghang	e3a0d0007a	add gpu check in set calibration	2025-02-23 21:56:59 +04:00
Lizonghang	07a397360b	fix gpu underutilization	2025-02-19 16:30:18 +04:00
Lizonghang	863393554a	add gpu underutilization calibration step	2025-02-17 18:54:18 +04:00

317 commits