# llama-eval

Simple evaluation tool for llama.cpp with support for multiple datasets.
For a full description, usage examples, and sample results, see:
## Quick start
```sh
# Single server
python3 llama-eval.py \
    --server http://localhost:8033 \
    --model my-model \
    --dataset gsm8k --n_cases 100 \
    --grader-type regex --threads 32
```
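The `--grader-type regex` option suggests answers are checked by pattern matching rather than by a judge model. As a rough sketch (not the tool's actual implementation), a regex grader for GSM8K-style outputs might extract the last number in the model's response and compare it numerically to the reference answer:

```python
import re

def regex_grade(output: str, expected: str) -> bool:
    """Hypothetical regex grader: compare the last number in the
    model output against the expected answer."""
    # Match numbers with optional sign, thousands separators, and decimals
    numbers = re.findall(r"-?\d[\d,]*\.?\d*", output)
    if not numbers:
        return False
    try:
        # Strip separators before the numeric comparison
        return float(numbers[-1].replace(",", "")) == float(expected.replace(",", ""))
    except ValueError:
        return False
```

A grader of this shape is tolerant of chain-of-thought text before the answer, since only the final number is compared.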
```sh
# Multiple servers (comma-separated URLs and thread counts)
python3 llama-eval.py \
    --server http://server1:8033,http://server2:8033 \
    --server-name server1,server2 \
    --threads 16,16 \
    --dataset aime2025 --n_cases 240 \
    --grader-type regex
```
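In the multi-server form, `--server` and `--threads` take parallel comma-separated lists, so each server gets its own worker count. A minimal sketch of how such arguments could be parsed and cases distributed round-robin (the names `parse_servers` and `assign_cases` are illustrative, not the tool's API):

```python
from itertools import cycle

def parse_servers(server_arg: str, threads_arg: str):
    """Pair each comma-separated server URL with its thread count."""
    servers = server_arg.split(",")
    threads = [int(t) for t in threads_arg.split(",")]
    if len(threads) == 1:
        threads = threads * len(servers)  # one count applies to all servers
    if len(servers) != len(threads):
        raise ValueError("number of servers and thread counts must match")
    return list(zip(servers, threads))

def assign_cases(case_ids, servers):
    """Distribute evaluation cases across servers round-robin."""
    assignment = {url: [] for url, _ in servers}
    for case_id, (url, _) in zip(case_ids, cycle(servers)):
        assignment[url].append(case_id)
    return assignment
```

With `--n_cases 240` across two servers at 16 threads each, a scheme like this would send 120 cases to each server, processed 16 at a time.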