
Diffusion Text Generation

This directory contains implementations for Diffusion LLMs (DLLMs).

Parameters

The diffusion CLI supports various parameters to control the generation process:

Core Diffusion Parameters

  • --diffusion-steps: Number of diffusion steps (default: 256)
  • --diffusion-algorithm: Algorithm for token selection
    • 0: ORIGIN - Tokens are generated in a purely random order, following https://arxiv.org/abs/2107.03006.
    • 1: ENTROPY_BASED - Entropy-based selection
    • 2: MARGIN_BASED - Margin-based selection
    • 3: RANDOM - Random selection
    • 4: CONFIDENCE_BASED - Confidence-based selection (default)
    • More documentation: https://github.com/DreamLM/Dream
  • --diffusion-visual: Enable live visualization during generation
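
The score-based selection modes can be illustrated with a small Python sketch. It is not taken from diffusion-cli.cpp: the function names are hypothetical, and the score definitions (highest probability for confidence, top-1 minus top-2 probability for margin, negative entropy for entropy) are the usual textbook forms, assumed here for illustration. ORIGIN and RANDOM are left out because they do not rank positions by a model-derived score.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def position_score(probs, algorithm):
    """Score one masked position; a higher score means 'unmask sooner'.

    The definitions below are assumptions, not read from the CLI source.
    """
    ranked = sorted(probs, reverse=True)
    if algorithm == "CONFIDENCE_BASED":
        return ranked[0]                      # probability of the best token
    if algorithm == "MARGIN_BASED":
        return ranked[0] - ranked[1]          # gap between the two best tokens
    if algorithm == "ENTROPY_BASED":
        # Negative entropy: a peaked distribution scores higher.
        return sum(p * math.log(p) for p in probs if p > 0.0)
    raise ValueError(f"unsupported algorithm: {algorithm}")

def pick_positions(logits_by_position, algorithm, count):
    """Return the `count` masked positions with the highest scores."""
    scores = {
        pos: position_score(softmax(logits), algorithm)
        for pos, logits in logits_by_position.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:count]

# Three masked positions with toy 3-token vocabularies.
toy_logits = {0: [5.0, 0.0, 0.0], 1: [1.0, 1.0, 1.0], 2: [3.0, 2.9, -5.0]}
chosen = pick_positions(toy_logits, "CONFIDENCE_BASED", 2)
```

In this toy input, position 0 has one dominant token and position 1 is uniform, so every score-based mode ranks position 0 first and position 1 last.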

Scheduling Parameters

Choose one of the following scheduling methods:

Timestep-based scheduling:

  • --diffusion-eps: Epsilon value for timestep scheduling (e.g., 0.001)
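
As a rough illustration of what the epsilon value controls, the sketch below builds a Dream-style timestep schedule in Python. It assumes timesteps fall linearly from 1.0 to eps and that the fraction of still-masked tokens revealed at each step is 1 - s/t for consecutive timesteps t and s; both the formula and the function names are assumptions made for this example, not a description of the CLI internals.

```python
def timestep_schedule(steps, eps):
    """Return steps + 1 timesteps falling linearly from 1.0 to eps."""
    return [1.0 - (1.0 - eps) * i / steps for i in range(steps + 1)]

def transfer_fractions(steps, eps):
    """Fraction of the remaining masked tokens to unmask at each step.

    Assumed rule: 1 - s/t for consecutive timesteps (t, s), with the
    final step unmasking everything that is left.
    """
    ts = timestep_schedule(steps, eps)
    fractions = []
    for i in range(steps):
        t, s = ts[i], ts[i + 1]
        fractions.append(1.0 - s / t if i < steps - 1 else 1.0)
    return fractions

schedule = timestep_schedule(4, 0.001)
fractions = transfer_fractions(4, 0.001)
```

Under these assumptions a small positive eps keeps the last timestep away from zero, so the ratio s/t stays well defined on every step.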

Block-based scheduling:

  • --diffusion-block-length: Block size for block-based scheduling (e.g., 32)
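
Block-based scheduling can be pictured with the following Python sketch. It assumes a LLaDA-style semi-autoregressive layout in which the generated region is split into consecutive blocks and the masked tokens of each block are spread as evenly as possible over the steps given to that block. The helper names and the even-split rule are assumptions for illustration only.

```python
def block_ranges(gen_length, block_length):
    """Split [0, gen_length) into consecutive (start, end) blocks."""
    return [
        (start, min(start + block_length, gen_length))
        for start in range(0, gen_length, block_length)
    ]

def tokens_per_step(masked_count, steps):
    """Spread masked_count tokens over `steps` steps as evenly as possible.

    Earlier steps receive the remainder, so the counts always sum to
    masked_count.
    """
    base, extra = divmod(masked_count, steps)
    return [base + 1 if i < extra else base for i in range(steps)]

blocks = block_ranges(100, 32)
per_step = tokens_per_step(32, 5)
```

With a block size of 32 and 100 positions to fill, this yields three full blocks and one short final block.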

Sampling Parameters

  • --temp: Temperature for sampling (0.0 = greedy/deterministic, higher = more random)
  • --top-k: Top-k filtering for sampling
  • --top-p: Top-p (nucleus) filtering for sampling
  • --seed: Random seed for reproducibility
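
The sampling parameters follow the usual meaning of these terms. The Python sketch below shows one common way they combine: temperature scaling, then top-k truncation, then top-p (nucleus) truncation, with a seeded generator for reproducibility. The ordering of the filters and all function names are assumptions for this example and may differ from what the CLI does.

```python
import math
import random

def filtered_distribution(logits, temp=1.0, top_k=0, top_p=1.0):
    """Return {token_id: probability} after temperature, top-k and top-p."""
    if temp <= 0.0:
        # Greedy: all probability mass on the highest logit.
        best = max(range(len(logits)), key=logits.__getitem__)
        return {best: 1.0}
    scaled = [x / temp for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    ranked = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]               # keep the k most likely tokens
    kept, cumulative = [], 0.0
    for p, i in ranked:                       # keep the smallest prefix whose
        kept.append((p, i))                   # mass reaches top_p
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(p for p, _ in kept)
    return {i: p / norm for p, i in kept}

def sample(logits, seed, **kwargs):
    """Draw one token id; the same seed gives the same draw."""
    dist = filtered_distribution(logits, **kwargs)
    ids = list(dist)
    rng = random.Random(seed)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]

greedy = filtered_distribution([2.0, 1.0, 0.0], temp=0.0)
top2 = filtered_distribution([2.0, 1.0, 0.0], top_k=2)
nucleus = filtered_distribution([2.0, 1.0, 0.0], top_p=0.5)
```

Here a temperature of 0.0 collapses the distribution to the single best token, which matches the greedy/deterministic behavior described above.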

Model Parameters

  • -m: Path to the GGUF model file
  • -p: Input prompt text
  • -ub: Maximum sequence length (ubatch size)
  • -c: Context size
  • -b: Batch size

Examples

Dream architecture:

llama-diffusion-cli -m dream7b.gguf -p "write code to train MNIST in pytorch" -ub 512 --diffusion-eps 0.001 --diffusion-algorithm 3 --diffusion-steps 256 --diffusion-visual

LLaDA architecture:

llama-diffusion-cli -m llada-8b.gguf -p "write code to train MNIST in pytorch" -ub 512 --diffusion-block-length 32 --diffusion-steps 256 --diffusion-visual

RND1 architecture:

llama-diffusion-cli -m RND1-Base-0910.gguf -p "write code to train MNIST in pytorch" -ub 512 --diffusion-algorithm 1 --diffusion-steps 256 --diffusion-visual --temp 0.5 --diffusion-eps 0.001
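
For scripted runs, a command line like those above can be assembled programmatically. The Python helper below is a hypothetical convenience wrapper, not part of the project. It only uses flags listed in this document, and its rule that exactly one scheduling method must be given is the helper's own check, assumed from the "choose one" guidance above.

```python
import shlex

def build_command(model, prompt, *, steps=256, eps=None, block_length=None,
                  algorithm=None, visual=False, ubatch=512,
                  binary="llama-diffusion-cli"):
    """Build an argv list for the diffusion CLI from documented flags."""
    # Helper-side check (an assumption): exactly one scheduling method.
    if (eps is None) == (block_length is None):
        raise ValueError("pass exactly one of eps or block_length")
    argv = [binary, "-m", model, "-p", prompt, "-ub", str(ubatch),
            "--diffusion-steps", str(steps)]
    if eps is not None:
        argv += ["--diffusion-eps", str(eps)]
    else:
        argv += ["--diffusion-block-length", str(block_length)]
    if algorithm is not None:
        argv += ["--diffusion-algorithm", str(algorithm)]
    if visual:
        argv.append("--diffusion-visual")
    return argv

argv = build_command("llada-8b.gguf", "write code to train MNIST in pytorch",
                     block_length=32, visual=True)
printable = shlex.join(argv)  # shell-quoted form, suitable for logging
```

The returned list can be passed to subprocess.run as is, which avoids shell quoting problems with prompts that contain spaces or quotes.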