mirror of https://github.com/Lizonghang/prima.cpp.git synced 2025-09-05 07:59:03 +00:00

https://new.reddit.com/r/LocalLLaMA/comments/1k013u1/primacpp_speeding_up_70bscale_llm_inference_on/
https://github.com/Lizonghang/prima.cpp

distributed-ai distributed-inference llama-cpp llm-inference on-device-llms

Zonghang Li bcfdace59b add args -k and --force		2025-03-11 20:44:36 +04:00
.devops	Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9641 )	2024-09-30 20:57:12 +02:00
ci	rerank : use [SEP] token instead of [BOS] (#9737 )	2024-10-05 15:55:04 +03:00
cmake	vulkan : cmake integration (#8119 )	2024-07-13 18:12:39 +02:00
common	add args -k and --force	2025-03-11 20:44:36 +04:00
docs	Update building for Android (#9672 )	2024-10-07 09:37:31 -07:00
examples	add automatic layer window size assignment workflow	2024-11-08 18:21:03 +04:00
figures	add illustration for memory allocation of activations	2024-11-29 10:34:21 +04:00
ggml	test	2025-01-28 16:36:47 +04:00
gguf-py	convert : handle tokenizer merges format from transformers 4.45 (#9696 )	2024-10-03 17:22:15 +03:00
grammars	server : match OAI structured output response (#9527 )	2024-09-18 09:50:34 +03:00
include	add args -k and --force	2025-03-11 20:44:36 +04:00
media	README: add graphic for matrix multiplication (#6881 )	2024-04-24 21:29:13 +02:00
models	Added deepseek-r1-qwen vocabulary file	2025-02-23 08:33:57 +00:00
pocs	`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809 )	2024-06-13 00:41:52 +01:00
prompts	llama : add Qwen support (#4281 )	2023-12-01 20:16:31 +02:00
requirements	py : update transfomers version (#9694 )	2024-09-30 18:03:47 +03:00
scripts	sync : llama.cpp	2024-10-06 12:53:28 +03:00
spm-headers	llama : reorganize source code + improve CMake (#8006 )	2024-06-26 18:33:02 +03:00
src	add args -k and --force	2025-03-11 20:44:36 +04:00
tests	ggml : add backend registry / device interfaces to BLAS backend (#9752 )	2024-10-07 21:55:08 +02:00
.clang-tidy	cuda : refactor into multiple files (#6269 )	2024-03-25 13:50:23 +01:00
.dockerignore	ci : fix docker build number and tag name (#9638 )	2024-09-25 17:26:01 +02:00
.ecrc	common : Update stb_image.h to latest version (#9161 )	2024-08-27 08:58:50 +03:00
.editorconfig	cvector: fix CI + correct help message (#8064 )	2024-06-22 18:11:30 +02:00
.flake8	py : logging and flake8 suppression refactoring (#7081 )	2024-05-05 08:07:48 +03:00
.gitignore	common : refactor arg parser (#9308 )	2024-09-07 20:43:51 +02:00
.gitmodules	llama : reorganize source code + improve CMake (#8006 )	2024-06-26 18:33:02 +03:00
.pre-commit-config.yaml	convert.py : add python logging instead of print() (#6511 )	2024-05-03 22:36:41 +03:00
CMakeLists.txt	cmake : add option for common library (#9661 )	2024-09-27 10:42:06 +03:00
CMakePresets.json	CMake fix: host for msvc compiler can only be x86 or x64 (#8624 )	2024-09-06 00:14:12 +02:00
convert_hf_to_gguf.py	convert : refactor rope_freqs generation (#9396 )	2024-10-01 09:31:36 +03:00
convert_hf_to_gguf_update.py	llama : add reranking support (#9510 )	2024-09-28 17:42:03 +03:00
convert_llama_ggml_to_gguf.py	py : fix wrong input type for raw_dtype in ggml to gguf scripts (#8928 )	2024-08-16 13:36:30 +03:00
convert_lora_to_gguf.py	convert : refactor rope_freqs generation (#9396 )	2024-10-01 09:31:36 +03:00
flake.lock	flake.lock: Update (#9753 )	2024-10-07 09:35:42 -07:00
flake.nix	build(nix): Package gguf-py (#5664 )	2024-09-02 14:21:01 +03:00
LICENSE	license : update copyright notice + add AUTHORS (#6405 )	2024-04-09 09:23:19 +03:00
Makefile	remove conda path	2025-02-23 01:38:13 +04:00
mypy.ini	convert : partially revert PR #4818 (#5041 )	2024-01-20 18:14:18 -05:00
Package.swift	ggml-backend : add device and backend reg interfaces (#9707 )	2024-10-03 01:49:47 +02:00
poetry.lock	build(python): Package scripts with pip-0517 compliance	2024-07-04 15:39:13 +00:00
pyproject.toml	build(nix): Package gguf-py (#5664 )	2024-09-02 14:21:01 +03:00
pyrightconfig.json	ci : reduce severity of unused Pyright ignore comments (#9697 )	2024-09-30 14:13:16 -04:00
README.md	init	2024-10-23 09:42:32 +04:00
requirements.txt	init	2024-10-23 14:29:14 +04:00
SECURITY.md	chore: Fix markdown warnings (#6625 )	2024-04-12 10:52:36 +02:00

README.md

prima.cpp

This is a distributed implementation of llama.cpp. Coming soon.