Adrien Gallouët
1d7ab2b947
app : add batched-bench, fit-params, quantize & perplexity ( #23459 )
* app : add batched-bench, fit-params, quantize & perplexity
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Add missing main.cpp
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Add EOL
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-05-21 10:29:44 +03:00
Johannes Gäßler
7256fce047
common: fix --fit verbosity with --verbosity 4 ( #23282 )
2026-05-19 21:33:23 +02:00
Georgi Gerganov
cfe9838d26
fit-params : refactor + add option to output estimated memory per device ( #22171 )
* fit-params : add option to output estimated memory per device
* cont : minor
* cont : refactor
* cont : move fit params implementation to libcommon
* cont : header
* cont : headers
* cont : codeowners
2026-04-21 09:54:36 +03:00
Adrien Gallouët
41361c8599
common : move up common_init() and fix Windows UTF-8 logs ( #21176 )
The build info is now only for debug, so we avoid the duplicate
with `--version`.
The UTF-8 setup at the beginning is needed to avoid logging
garbage on Windows.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-31 12:53:41 +02:00
Johannes Gäßler
e9fd8dcab4
llama-fit-params: keep explicit --ctx-size 0 ( #19070 )
2026-01-24 22:13:08 +01:00
Johannes Gäßler
64848deb18
llama-fit-params: free memory target per device ( #18679 )
2026-01-08 10:07:58 +01:00
Johannes Gäßler
a52dc60ba3
llama_fit_params: return enum for fail vs. error ( #18374 )
2025-12-27 09:59:19 +01:00
Aadeshveer Singh
c184284230
fit-params : fix race condition in fit-params output ( #18276 )
2025-12-24 15:57:38 +01:00
Johannes Gäßler
4164596c76
llama-fit-params: QoL impr. for prints/errors ( #18089 )
2025-12-17 00:03:19 +01:00
Johannes Gäßler
b1f3a6e5db
llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization ( #16653 )
* llama: automatically fit args to free memory
llama-fit-params tool
* fix CI
* hints for bug reports, ensure no reallocation
* fix segfault with Vulkan
* add llama-fit-params to CI
* fix CI
* fix CI
* fix CI
* minor adjustments
* fix assignment of 1 dense layer
* fix logger not being reset on model load failure
* remove --n-gpu-layer hint on model load failure
* fix llama-fit-params verbosity
* fix edge case
* fix typo [no ci]
2025-12-15 09:24:59 +01:00