Need to include <sstream>; otherwise the build fails with many errors like the ones below:
```
C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,9): error C2297: '<<': not valid as right operand has type 'const char [26]' [C:\koboldcpp\build\music_adapter.vcxproj]
(compiling source file '../otherarch/acestep/music_adapter.cpp')
C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,9): error C2679: binary '<<': no operator found which takes a right-hand operand of type 'std::string' (or there is no acceptable conversion) [C:\koboldcpp\build\music_adapter.vcxproj]
(compiling source file '../otherarch/acestep/music_adapter.cpp')
C:\Program Files (x86)\Microsoft Visual Studio\18\BuildTools\VC\Tools\MSVC\14.50.35717\include\__msvc_int128.hpp(753,46):
could be 'std::_Unsigned128 std::operator <<(const std::_Unsigned128 &,const std::_Base128 &) noexcept' [found using argument-dependent lookup]
C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,9):
'std::_Unsigned128 std::operator <<(const std::_Unsigned128 &,const std::_Base128 &) noexcept': cannot convert argument 2 from 'std::string' to 'const std::_Base128 &'
C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,57):
Reason: cannot convert from 'std::string' to 'const std::_Base128'
C:\koboldcpp\otherarch\acestep\ace-qwen3.cpp(1278,57):
No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
```
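For illustration, a minimal sketch of the kind of code that needs the header, assuming the failing line streams a string literal and a std::string into a string stream (the actual code at ace-qwen3.cpp line 1278 is not shown here, and `describe_token` is a hypothetical helper):

```cpp
#include <sstream>  // defines std::ostringstream and its operator<< overloads
#include <string>

// Hypothetical helper, not taken from ace-qwen3.cpp: builds a message by
// streaming a literal, a std::string and an int into a string stream.
std::string describe_token(const std::string & name, int id) {
    std::ostringstream ss;
    ss << "token " << name << " has id " << id;
    return ss.str();
}
```

Some toolchains pull in <sstream> transitively through other headers, which may be why the missing include only shows up on certain compilers.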
Round image dimensions to the specific multiple required by each
DiT model, which ranges from 32 (certain Wan models) to 1 (Chroma
Radiance), with most requiring multiples of 8 or 16. UNet models
are still rounded to multiples of 64.
sd.cpp currently rounds the sizes internally, but it always rounds
up, so we still need to round on our side to apply image size
restrictions and to trigger VAE tiling correctly.
Also, remove a legacy test that could abort a generation with
unsupported image sizes: it would never trigger, because it was
applied after the image size adjustments.
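A minimal sketch of the rounding described above. The helper name `round_down_to_multiple` and the choice to round down are assumptions made for illustration; the commit only states that sd.cpp itself always rounds up.

```cpp
#include <algorithm>

// Hypothetical helper: snap a dimension down to a multiple of `multiple`,
// never going below one full multiple. Rounding down is an assumption here.
int round_down_to_multiple(int value, int multiple) {
    if (multiple <= 1) {
        return std::max(value, 1);  // e.g. a model with no size constraint
    }
    int rounded = (value / multiple) * multiple;
    return std::max(rounded, multiple);
}
```

With a requested side of 1000, this gives 960 for a multiple of 64, 992 for 32 or 16, and 1000 for 8 or 1.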
* Improve CUDA graph capture
Currently, CUDA graphs are eagerly enabled on the first call to ggml_backend_cuda_graph_compute. If the graph properties keep changing (4+ consecutive updates), the graph is permanently disabled. This is suboptimal because:
- The first call always incurs CUDA graph capture overhead even if the graph is unstable
- Once permanently disabled, CUDA graphs never re-enable even after the graph stabilizes (e.g., switching from prompt processing to decode)
The new approach delays CUDA graph activation until warmup completes: the same cgraph must be called at least twice with matching properties before CUDA graph capture begins. This avoids wasted capture overhead on volatile graphs and allows graphs to become eligible once they stabilize.
This also fixes issues such as https://github.com/ggml-org/llama.cpp/discussions/19708
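The warmup rule can be sketched roughly as follows. This is not the code in ggml-cuda.cu: `GraphProps`, `WarmupTracker` and `should_capture` are hypothetical names, the compared properties are reduced to a pointer and a node count, and "called at least twice" is read here as two matching calls before the first capture.

```cpp
#include <cstddef>

// Hypothetical stand-in for the graph properties compared between calls.
struct GraphProps {
    const void * graph_id;
    std::size_t  n_nodes;
    bool operator==(const GraphProps & o) const {
        return graph_id == o.graph_id && n_nodes == o.n_nodes;
    }
};

// Hypothetical tracker: capture becomes eligible only after the same
// properties were seen on two consecutive earlier calls.
struct WarmupTracker {
    GraphProps last{nullptr, 0};
    int        matching_calls = 0;

    bool should_capture(const GraphProps & props) {
        if (matching_calls > 0 && props == last) {
            ++matching_calls;    // same graph again: warmup progresses
        } else {
            last = props;        // graph changed: restart the warmup
            matching_calls = 1;
        }
        return matching_calls > 2;
    }
};
```

Because a change only resets the counter rather than disabling capture for good, a graph that stabilizes later (for example after prompt processing ends) becomes eligible again.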
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Remove em dashes
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Aman Gupta <amangupta052@gmail.com>