All C++ handling code currently:
- build a comma-separated list from the info_vulkan array
- if GGML_VK_VISIBLE_DEVICES isn't set
- set GGML_VK_VISIBLE_DEVICES to the list
Once set, GGML_VK_VISIBLE_DEVICES affects the whole process. So this
can be done in the same way at the Python level, before all loading
functions.
Caveat: load_model had the default `inputs.vulkan_info = "0"`,
so the default GPU would be "0" only when loading a text model.
* tweak format sting types
This may not be all of them, but it's the ones which warn on OpenBSD
* complete the changes needed to fix the format string specifers
* avoid using inttypes, directly cast to size_t (u64 usually) instead
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
Squashed commit:
[0a6306ca0] draft wip dont use (will be squashed)
[a758a1c9c] wip dont use (will be squashed)
[e1994d3ce] wip dont use
[f59690d68] wip
[77228147d] wip on spec decoding. dont use yet
[2445bca54] wip adding speculative decoding (+1 squashed commits)
Squashed commits:
[50e341bb7] wip adding speculative decoding