prima.cpp

Mirror of https://github.com/Lizonghang/prima.cpp.git, synced 2025-09-06 09:39:02 +00:00

Author	SHA1	Message	Date
Zonghang Li	1ea2d61a97	speedup: add arg --keep-out-in-cuda to run the output layer on CUDA	2025-06-28 10:58:18 +04:00
Zonghang Li	45e8b0420c	fix compute buffer estimate: tested on cuda	2025-06-22 08:10:57 +00:00
Li, Zonghang	80e5b71b48	fix compute buffer estimate: tested on metal	2025-06-20 13:43:55 +04:00
Zonghang Li	dd589561b4	improve the computing buffer estimate	2025-06-19 08:02:43 +00:00
DeEMO	6ff38b2a0c	add args: data-port and signal-port	2025-06-17 12:00:04 +08:00
Li, Zonghang	fbbc30c950	Merge branch 'speculative' into dev	2025-06-16 13:27:36 +04:00
Li, Zonghang	dc875bbef9	fix speculative decoding	2025-06-13 08:18:12 +04:00
DeEMO	2039e3b0c1	fix: send and recv meta	2025-06-12 12:26:10 +00:00
DeEMO	d6c8d322cd	fix try_connect	2025-06-12 12:26:10 +00:00
DeEMO	d1b97f798e	support reconnection	2025-06-12 12:26:09 +00:00
Li, Zonghang	3e6d831930	fix seq_id mismatch between head and worker devices	2025-06-11 17:10:21 +04:00
Li, Zonghang	6439090920	reformat code	2025-06-03 23:53:24 +04:00
Li, Zonghang	7b0ededd24	Merge branch 'dev' into feat/auto-exit	2025-05-20 02:04:14 +08:00
Lizonghang	c54a6a0132	fix context shifting	2025-05-19 16:58:35 +04:00
DeEMO	cc46aa9828	update rank and n_world Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:22:02 +00:00
DeEMO	fdd6694633	add topo rebuild Signed-off-by: DeEMO <yzzxrx@gmail.com>	2025-05-19 09:21:53 +00:00
Lizonghang	2fbc0c8da3	fix: reset -ngl to 0 when GPU is not used and reformat code	2025-05-14 13:27:20 +04:00
DeEMO	168c14f4e8	remove unnecessary profile when `--lw` is specified	2025-04-17 13:49:09 +00:00
leeetao	fc1e2d3fc6	Added support for iq1s and iq1m quantization type	2025-04-17 10:27:53 +00:00
Zonghang Li	bcfdace59b	add args -k and --force	2025-03-11 20:44:36 +04:00
leeetao	e2cda4cfa0	Removed support for GGML_TYPE_Q4_0_4_4, GGML_TYPE_0_4_8, and GGML_TYPE_0_8_8 (GGUF no longer supports these types)	2025-03-01 14:31:38 +00:00
leeetao	7bf1b743fb	Merge branch 'dev' into lt_test Merge dev branch updates into local branch lt_test.	2025-02-23 08:35:45 +00:00
leeetao	f99e08b9fe	Added inference support for the Deepseek distilled model	2025-02-23 08:27:37 +00:00
Lizonghang	c84f9d29fe	use arg prefetch and remove arg unload	2025-02-12 17:04:41 +04:00
Lizonghang	1c0087e919	rename arg --keep-inp-out-in-metal to --keep-out-in-metal	2025-01-23 23:17:06 +04:00
Lizonghang	78a544d716	add metal mem limit	2025-01-23 16:08:52 +04:00
Lizonghang	facb4ea736	add option --keep-inp-out-in-metal and fix bugs in unmap	2025-01-22 11:15:19 +04:00
Zonghang Li	46e99218b4	add arg --cuda-mem	2025-01-16 09:15:34 +04:00
Lizonghang	3d75b8576e	add api llama_model_set_n_gpu_layers	2025-01-15 10:48:19 +04:00
Lizonghang	9279a2e3ff	fix error in llama_context_n_gpu_layers	2025-01-15 10:08:41 +04:00
Lizonghang	5d9aadf3d5	use highs to solve the allocation program	2025-01-15 10:04:04 +04:00
Lizonghang	8e9ab45458	fix model bytes counter	2024-12-10 14:57:48 +04:00
Lizonghang	d78fa427e7	add memory copy speed test	2024-12-09 10:07:42 +04:00
Zonghang Li	df813675d0	fix flops count and ram/vram speed test	2024-12-08 10:14:05 +04:00
Lizonghang	cd823546dd	llama_profile_device: add arg n_predict	2024-12-06 16:37:25 +04:00
Lizonghang	6f54a12c7d	add gpu support in llama_model_kvcache_size and llama_model_compute_buf_size	2024-11-29 21:06:32 +04:00
Lizonghang	68ecabc8c3	add cpu_read_ram_bw, metal_read_vram_bw, cuda_read_vram_bw	2024-11-29 19:04:53 +04:00
Lizonghang	0f73d12247	decrease compute buf from available memory	2024-11-29 11:15:54 +04:00
Lizonghang	45a1e55eec	reduce kv cache from available memory	2024-11-28 20:21:21 +04:00
Lizonghang	9a7bbce7ad	fix t_load_us	2024-11-28 15:55:21 +04:00
Lizonghang	9cd22177d0	remove arg test_file	2024-11-27 21:34:45 +04:00
Zonghang Li	f78c437172	add device_inp_embd_delay test, device_memory_bw test, device_cuda_memory_bw test,	2024-11-26 22:28:02 +04:00
Lizonghang	3fe00a16a0	count model flops for f32xf32, f16xf32, q4kxf32, q6kxf32	2024-11-24 13:13:32 +04:00
Zonghang Li	7ee1423006	add model_flops	2024-11-21 20:06:16 +04:00
Lizonghang	477ecf2084	add llama_model_n_flops	2024-11-20 19:40:27 +04:00
Lizonghang	5fae6ac36f	add cpu flops test	2024-11-09 20:53:42 +04:00
Lizonghang	2bd4d03aa8	add automatic layer window size assignment workflow	2024-11-08 18:21:03 +04:00
Lizonghang	53cb3a6069	synchronize device info	2024-11-07 22:02:01 +04:00
Lizonghang	ef7fdf70cc	add LLAMA_API llama_profile_device	2024-11-07 09:30:39 +04:00
Lizonghang	407c71ae52	add cpu and gpu profile	2024-11-06 20:42:28 +04:00

91 commits