prima.cpp

mirror of https://github.com/Lizonghang/prima.cpp.git synced 2025-09-08 04:29:02 +00:00

Author	SHA1	Message	Date
Li, Zonghang	aacfa8a231	fix compute buffer estimate: reserve 300 MiB VRAM to avoid potential OOM	2025-06-26 20:45:45 +04:00
Li, Zonghang	a05022c05a	communication: use barrier instead of manually adding delay	2025-06-26 17:30:47 +04:00
Li, Zonghang	3f27a25340	topo rebuild: add a delay to avoid packet interleaving	2025-06-26 14:50:58 +04:00
Li, Zonghang	729870fcd7	topo rebuild: add a delay to avoid packet interleaving	2025-06-26 14:47:34 +04:00
Li, Zonghang	72701ae872	fix compute buffer estimate: reserve 200 MiB VRAM to avoid potential OOM	2025-06-24 20:39:49 +04:00
Li, Zonghang	4dde8458cf	fix compute buffer estimate: reserve 100 MiB VRAM to avoid potential OOM	2025-06-24 19:29:10 +04:00
Li, Zonghang	90b1079d78	fix compute_buffer estimate: remove unused memory for CUDA device	2025-06-24 16:37:16 +04:00
Li, Zonghang	16ba3564ce	fix compute_buffer estimate: add context GPU usage	2025-06-24 16:09:59 +04:00
Zonghang Li	45e8b0420c	fix compute buffer estimate: tested on cuda	2025-06-22 08:10:57 +00:00
Li, Zonghang	80e5b71b48	fix compute buffer estimate: tested on metal	2025-06-20 13:43:55 +04:00
Zonghang Li	dd589561b4	improve the computing buffer estimate	2025-06-19 08:02:43 +00:00
DeEMO	6ff38b2a0c	add args: data-port and signal-port	2025-06-17 12:00:04 +08:00
DeEMO	104e3b2356	fix: replace localhost to 127.0.0.1	2025-06-17 11:27:58 +08:00
Li, Zonghang	fbbc30c950	Merge branch 'speculative' into dev	2025-06-16 13:27:36 +04:00
Li, Zonghang	dc875bbef9	fix speculative decoding	2025-06-13 08:18:12 +04:00
DeEMO	d4618de991	fix: block when free socket	2025-06-12 12:26:10 +00:00
DeEMO	2039e3b0c1	fix: send and recv meta	2025-06-12 12:26:10 +00:00
DeEMO	d6c8d322cd	fix try_connect	2025-06-12 12:26:10 +00:00
DeEMO	d1b97f798e	support reconnection	2025-06-12 12:26:09 +00:00
Li, Zonghang	3e6d831930	fix seq_id mismatch between head and worker devices	2025-06-11 17:10:21 +04:00
Li, Zonghang	fb9b1f2b00	reformat llama.cpp	2025-06-09 13:04:22 +04:00
Li, Zonghang	22a6ddef13	fix batch decoding and dynamic batching	2025-06-07 00:53:56 +04:00
Lizonghang	e56be76bdf	assume only a single seq_id per token is needed	2025-06-07 00:42:44 +04:00
Lizonghang	d8aea899d1	fix n_seq_id and seq_id	2025-06-06 23:58:03 +04:00
Lizonghang	a1a2238831	add batch_all.n_seq_id and batch_all.seq_id to sync_meta	2025-06-06 23:36:53 +04:00
Lizonghang	68ecc8509d	add batch_all.logits to sync_meta	2025-06-06 22:58:48 +04:00
Lizonghang	500e066a2f	fix batch decoding and dynamic batching	2025-06-06 16:53:22 +04:00
Li, Zonghang	6439090920	reformat code	2025-06-03 23:53:24 +04:00
Li, Zonghang	a01fafd126	Merge branch 'main' into dev	2025-06-03 17:56:47 +04:00
Li, Zonghang	1b3b6a506f	fix: add warm-up in profiling to prevent init delay	2025-06-03 17:10:09 +04:00
Li, Zonghang	7b0ededd24	Merge branch 'dev' into feat/auto-exit	2025-05-20 02:04:14 +08:00
Lizonghang	421b3deca5	fix llama-cli pos sync	2025-05-19 18:08:27 +04:00
Lizonghang	c54a6a0132	fix context shifting	2025-05-19 16:58:35 +04:00
DeEMO	34eaa8224d	fix: handle socket closure and connection in llama_rebuild_topo Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:22:35 +00:00
DeEMO	8b61cb2fa4	fix: adapt the new topo Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:22:29 +00:00
DeEMO	df16b1876f	refactor: add zmq helper to generate message Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:22:24 +00:00
DeEMO	4b36aef157	fix some bugs Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:22:08 +00:00
DeEMO	cc46aa9828	update rank and n_world Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:22:02 +00:00
DeEMO	fdd6694633	add topo rebuild Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:21:53 +00:00
Lizonghang	2fbc0c8da3	fix: reset -ngl to 0 when GPU is not used and reformat code	2025-05-14 13:27:20 +04:00
DeEMO	168c14f4e8	remove unnecessary profile when `--lw` is specified	2025-04-17 13:49:09 +00:00
leeetao	fc1e2d3fc6	Added support for iq1s and iq1m quantization type	2025-04-17 10:27:53 +00:00
Zonghang Li	63b45a4c26	add args -k and --force	2025-03-11 22:09:39 +04:00
Zonghang Li	bcfdace59b	add args -k and --force	2025-03-11 20:44:36 +04:00
leeetao	45ec52c2cb	Added support for IQ1_M and IQ2_XXS quantization type	2025-03-07 16:56:16 +00:00
leeetao	6a416534c8	Fixed the alignment display of device performance	2025-03-07 07:46:30 +00:00
leeetao	2f049b8428	Added support for Q2K, IQ1s, IQ4NL quantization types	2025-03-04 15:22:55 +00:00
leeetao	e2cda4cfa0	Removed support for GGML_TYPE_Q4_0_4_4, GGML_TYPE_0_4_8, and GGML_TYPE_0_8_8 (GGUF no longer supports these types)	2025-03-01 14:31:38 +00:00
Lizonghang	9cbdf01645	fix support for Q5_0	2025-02-27 22:25:03 +04:00
Lizonghang	550fdcbc4f	add support for Q5_0	2025-02-27 21:47:14 +04:00

266 commits