mirror of
https://github.com/LostRuins/koboldcpp.git
synced 2026-05-06 16:21:49 +00:00
Diffusion Text Generation
This directory contains implementations for Diffusion LLMs (DLLMs).
More Info:
Parameters
The diffusion CLI supports various parameters to control the generation process:
Core Diffusion Parameters
--diffusion-steps: Number of diffusion steps (default: 256)
--diffusion-algorithm: Algorithm for token selection
  0: ORIGIN - Tokens are generated in a purely random order, as described in https://arxiv.org/abs/2107.03006
  1: ENTROPY_BASED - Entropy-based selection
  2: MARGIN_BASED - Margin-based selection
  3: RANDOM - Random selection
  4: CONFIDENCE_BASED - Confidence-based selection (default)
  More documentation: https://github.com/DreamLM/Dream
--diffusion-visual: Enable live visualization during generation
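The entropy-, margin- and confidence-based options all rank masked positions by how certain the model is about them. The sketch below illustrates one common way to compute such scores from a per-position probability distribution. It is an assumption for illustration only, not the code in diffusion-cli.cpp; the function name `selection_score` and the string keys are hypothetical.

```python
import math

def selection_score(probs, algorithm):
    """Score one masked position; a higher score means "unmask sooner".

    `probs` is the model's probability distribution over the vocabulary
    at that position. The scoring rules are illustrative assumptions.
    """
    ranked = sorted(probs, reverse=True)
    if algorithm == "confidence":
        # Probability of the most likely token.
        return ranked[0]
    if algorithm == "margin":
        # Gap between the two most likely tokens.
        return ranked[0] - ranked[1]
    if algorithm == "entropy":
        # Negative entropy: peaked distributions score higher.
        return sum(p * math.log(p) for p in probs if p > 0.0)
    raise ValueError(f"unknown algorithm: {algorithm}")

peaked = [0.90, 0.05, 0.05]
flat = [0.40, 0.35, 0.25]
for name in ("confidence", "margin", "entropy"):
    # Under every rule the peaked position is selected before the flat one.
    assert selection_score(peaked, name) > selection_score(flat, name)
```

The ORIGIN and RANDOM options do not depend on the model's certainty, so they are left out of this sketch.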
Scheduling Parameters
Choose one of the following scheduling methods:
Timestep-based scheduling:
--diffusion-eps: Epsilon value for timestep scheduling (e.g., 0.001)
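As a rough illustration of what an epsilon-terminated timestep schedule can look like, the sketch below spaces timesteps evenly from 1.0 down to eps and derives the fraction of masked tokens revealed at each step. This follows the general shape of the Dream reference sampler as far as we understand it; whether the CLI computes exactly this is an assumption, and both function names are hypothetical.

```python
def timestep_schedule(steps, eps):
    """Return steps + 1 timesteps spaced evenly from 1.0 down to eps."""
    return [1.0 + (eps - 1.0) * i / steps for i in range(steps + 1)]

def unmask_fraction(timesteps, step):
    """Fraction of the still-masked tokens revealed at `step` (assumed rule)."""
    if step == len(timesteps) - 2:
        # The final step reveals everything that is still masked.
        return 1.0
    t, s = timesteps[step], timesteps[step + 1]
    return 1.0 - s / t

timesteps = timestep_schedule(steps=4, eps=0.001)
fractions = [unmask_fraction(timesteps, i) for i in range(4)]
```

Because the schedule stops at eps rather than 0, the ratio s / t never has a zero denominator.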
Block-based scheduling:
--diffusion-block-length: Block size for block-based scheduling (e.g., 32)
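Block-based scheduling can be pictured as splitting the generated region into fixed-size blocks and spending an equal share of the diffusion steps on each block in turn. The sketch below assumes that semi-autoregressive layout, as used by the LLaDA reference sampler; `gen_length` and the function name are hypothetical and do not correspond to CLI flags.

```python
def block_schedule(gen_length, block_length, steps):
    """Split `gen_length` tokens into blocks and divide `steps` among them.

    Returns a list of (start, end, steps_for_block) tuples, with `end`
    exclusive. Assumes both divisions are exact.
    """
    if gen_length % block_length != 0:
        raise ValueError("gen_length must be a multiple of block_length")
    num_blocks = gen_length // block_length
    if steps % num_blocks != 0:
        raise ValueError("steps must be a multiple of the number of blocks")
    steps_per_block = steps // num_blocks
    return [
        (b * block_length, (b + 1) * block_length, steps_per_block)
        for b in range(num_blocks)
    ]

# 128 generated tokens, blocks of 32, 256 steps: 4 blocks of 64 steps each.
schedule = block_schedule(gen_length=128, block_length=32, steps=256)
```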
Sampling Parameters
--temp: Temperature for sampling (0.0 = greedy/deterministic, higher = more random)
--top-k: Top-k filtering for sampling
--top-p: Top-p (nucleus) filtering for sampling
--seed: Random seed for reproducibility
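To make the interaction of these four parameters concrete, here is a minimal sketch of temperature, top-k and top-p sampling over a list of logits. The order in which the filters are applied, and the function name `sample_token`, are assumptions for illustration; the CLI's own sampler may differ.

```python
import math
import random

def sample_token(logits, temp=1.0, top_k=0, top_p=1.0, rng=None):
    """Pick a token index from `logits` (illustrative sketch only)."""
    if temp <= 0.0:
        # Greedy: always return the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    # Candidate indices, most likely first.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # Temperature-scaled softmax over the surviving candidates.
    peak = logits[order[0]]
    weights = [math.exp((logits[i] - peak) / temp) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = 0, 0.0
    for p in probs:
        kept += 1
        cumulative += p
        if cumulative >= top_p:
            break
    return rng.choices(order[:kept], weights=probs[:kept], k=1)[0]

# A fixed seed makes the draw reproducible.
first = sample_token([1.0, 3.0, 2.0], temp=0.8, rng=random.Random(42))
second = sample_token([1.0, 3.0, 2.0], temp=0.8, rng=random.Random(42))
assert first == second
```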
Model Parameters
-m: Path to the GGUF model file
-p: Input prompt text
-ub: Maximum sequence length (ubatch size)
-c: Context size
-b: Batch size
Examples
Dream architecture:
llama-diffusion-cli -m dream7b.gguf -p "write code to train MNIST in pytorch" -ub 512 --diffusion-eps 0.001 --diffusion-algorithm 3 --diffusion-steps 256 --diffusion-visual
LLaDA architecture:
llama-diffusion-cli -m llada-8b.gguf -p "write code to train MNIST in pytorch" -ub 512 --diffusion-block-length 32 --diffusion-steps 256 --diffusion-visual
RND1 architecture:
llama-diffusion-cli -m RND1-Base-0910.gguf -p "write code to train MNIST in pytorch" -ub 512 --diffusion-algorithm 1 --diffusion-steps 256 --diffusion-visual --temp 0.5 --diffusion-eps 0.001
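When scripting runs across several models, it can help to assemble the command line programmatically and enforce the "choose one scheduling method" rule up front. The helper below is a hypothetical sketch: `build_command` is not part of the project, it only uses flags documented above, and it assumes `llama-diffusion-cli` is on the PATH.

```python
import shlex

def build_command(model, prompt, steps=256, algorithm=None,
                  eps=None, block_length=None, ubatch=512, visual=False):
    """Return the argument list for one llama-diffusion-cli run."""
    # Exactly one scheduling method must be selected.
    if (eps is None) == (block_length is None):
        raise ValueError("set exactly one of eps or block_length")
    args = ["llama-diffusion-cli", "-m", model, "-p", prompt, "-ub", str(ubatch)]
    if eps is not None:
        args += ["--diffusion-eps", str(eps)]
    else:
        args += ["--diffusion-block-length", str(block_length)]
    if algorithm is not None:
        args += ["--diffusion-algorithm", str(algorithm)]
    args += ["--diffusion-steps", str(steps)]
    if visual:
        args.append("--diffusion-visual")
    return args

cmd = build_command("llada-8b.gguf", "write code to train MNIST in pytorch",
                    block_length=32, visual=True)
print(shlex.join(cmd))  # shell-quoted form, suitable for copy and paste
```

The returned list can be passed directly to `subprocess.run(cmd)`, which avoids shell quoting issues with the prompt text.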