README.md
In sd2R: Stable Diffusion Image Generation

sd2R

sd2R is an R package that provides a native, GPU-accelerated Stable Diffusion pipeline by wrapping the C++ implementation from stable-diffusion.cpp and using ggmlR as the tensor backend.

sd2R exposes a high-level R interface for text-to-image and image-to-image generation, while all heavy computation (tokenization, encoders, denoiser, sampler, VAE, model loading) is implemented in C++. Supports SD 1.x, SD 2.x, SDXL, and Flux model families. Targets local inference on Linux with Vulkan-enabled AMD GPUs (with automatic CPU fallback via ggml), without relying on external Python or web APIs.

Flux without Python:

R  →  sd2R  →  ggmlR  →  ggml  →  Vulkan  →  GPU

C++ core (src/sd/): tokenizers, text encoders (CLIP, Mistral, Qwen, UMT5), diffusion UNet/MMDiT denoiser, samplers, VAE encoder/decoder, and model loading for .safetensors and .gguf weights.
R layer: user-facing pipeline functions, parameter validation, image helpers, testing, and documentation-friendly API.
Backend: links against ggmlR (headers via LinkingTo) and libggml.a, reusing the same GGML/Vulkan stack that also powers llamaR and other ggmlR-based packages.

Unified sd_generate() — single entry point for all generation modes. Automatically selects the optimal strategy (direct, tiled sampling, or highres fix) based on output resolution and available VRAM (vram_gb parameter in sd_ctx()). Users don't need to think about tiling at all.
CRAN-ready defaults: verbose = FALSE by default — no console output unless explicitly enabled. Cross-platform build system with configure/configure.win generating Makevars from templates.
VRAM-aware auto-routing: queries free GPU memory at runtime and routes to direct generation (fits in VRAM), highres fix (txt2img + upscale + tiled img2img, preferred for coherent large images), or tiled sampling (MultiDiffusion fallback). VAE tiling is also VRAM-aware — enabled automatically only when free memory is insufficient for the given resolution. Set vram_gb in sd_ctx() to override auto-detection.
Multi-GPU data parallelism: sd_generate_multi_gpu() distributes prompts across Vulkan GPUs via callr, one process per GPU, with progress reporting.
Multi-GPU model parallelism: device_layout parameter in sd_ctx() distributes sub-models across multiple Vulkan GPUs within a single process. Presets: "mono" (all on one GPU), "split_encoders" (CLIP/T5 on GPU 1, diffusion + VAE on GPU 0), "split_vae" (CLIP/T5 + VAE on GPU 1, diffusion on GPU 0), "encoders_cpu" (text encoders on CPU). Manual override via diffusion_gpu, clip_gpu, vae_gpu.
Profiling: built-in per-stage timing via sd_profile_start() / sd_profile_stop() / sd_profile_summary(). Tracks model loading, text encoding (with CLIP/T5 breakdown), sampling, and VAE decode/encode stages.
Text-to-image generation supporting Stable Diffusion 1.x, 2.x, SDXL, and Flux models with typical generations taking a few seconds on Vulkan-enabled GPUs.
Image-to-image workflows with noise strength control and reuse of the same denoising pipeline as text-to-image. Requires vae_decode_only = FALSE in context.
Optional upscaling using a dedicated upscaler context managed entirely in C++ and exposed to R through external pointers.
VRAM-aware Tiled VAE for high-resolution images (2K, 4K+) with bounded VRAM usage. vae_mode = "auto" (default) queries free GPU memory before VAE decode and enables tiling only when estimated peak usage exceeds available VRAM (with a 50 MB safety reserve). Falls back to a pixel-area threshold (vae_auto_threshold) when Vulkan memory query is unavailable (CPU backend, no GPU). Supports per-axis relative tile sizing (vae_tile_rel_x, vae_tile_rel_y) for non-square aspect ratios.
Tiled diffusion sampling (MultiDiffusion): at each denoising step the latent is split into overlapping tiles, each denoised independently, and merged with Gaussian weighting. VRAM usage scales with tile size, not output resolution.
Highres Fix: classic two-pass pipeline — generates base image at native model resolution, upscales (bilinear or ESRGAN), then refines with tiled img2img at low denoising strength. Produces coherent high-resolution images (2K, 4K+) with global composition preserved.
Image utilities in R: saving generated images to PNG, converting between internal tensors and R raw vectors, and simple inspection of output tensors.
System introspection via sd_system_info(), reporting GGML/Vulkan capabilities as detected by ggmlR at build time.
Pipeline graph API: sd_pipeline() + sd_node() for composable, sequential multi-step workflows (txt2img → upscale → img2img → save). Pipelines are serializable to JSON via sd_save_pipeline() / sd_load_pipeline().

pipe <- sd_pipeline(
  sd_node("txt2img", prompt = "a cat in space", width = 512, height = 512),
  sd_node("upscale", factor = 2),
  sd_node("img2img", strength = 0.3),
  sd_node("save", path = "output.png")
)

# Save / load as JSON
sd_save_pipeline(pipe, "my_pipeline.json")
pipe <- sd_load_pipeline("my_pipeline.json")

# Run
ctx <- sd_ctx("model.safetensors")
sd_run_pipeline(pipe, ctx, upscaler_ctx = upscaler)

Rcpp bindings: src/sd2R_interface.cpp defines a thin bridge between R and the C API in stable-diffusion.h, returning XPtr objects with custom finalizers for correct lifetime management of sd_ctx_t and upscaler_ctx_t.
Build system: configure / configure.win generate Makevars from .in templates, resolving ggmlR paths, OpenMP, and Vulkan at configure time. Per-target -include r_ggml_compat.h applied only to sd/*.cpp sources to avoid macro conflicts with system headers.
Package metadata: DESCRIPTION declares Rcpp and ggmlR in LinkingTo, and NAMESPACE is generated via roxygen2 with useDynLib and Rcpp imports.
On load: .onLoad() initializes logging and registers constant values that mirror the underlying C++ enums using 0-based indices.

verbose = FALSE by default — no output unless requested.
Per-target compiler flags for cross-platform compatibility (Linux, macOS, Windows).
All C++ warnings fixed (-Winconsistent-missing-override, deprecated codecvt).
Large tokenizer vocabularies (CLIP, Mistral, Qwen, UMT5) downloaded automatically during installation from GitHub Releases, keeping the source tarball small.

# Install ggmlR first (if not already installed)
remotes::install_github("Zabis13/ggmlR")

# Install sd2R
remotes::install_github("Zabis13/sd2R")

During installation, the configure script automatically downloads tokenizer vocabulary files (~128 MB total) from GitHub Releases. This requires curl or wget.

If you don't have internet access during installation, download the vocabulary files manually and place them into src/sd/ before building:

# Download from https://github.com/Zabis13/sd2R/releases/tag/assets
# Files: vocab.hpp, vocab_mistral.hpp, vocab_qwen.hpp, vocab_umt5.hpp

wget https://github.com/Zabis13/sd2R/releases/download/assets/vocab.hpp -P src/sd/
wget https://github.com/Zabis13/sd2R/releases/download/assets/vocab_mistral.hpp -P src/sd/
wget https://github.com/Zabis13/sd2R/releases/download/assets/vocab_qwen.hpp -P src/sd/
wget https://github.com/Zabis13/sd2R/releases/download/assets/vocab_umt5.hpp -P src/sd/

R CMD INSTALL .

R ≥ 4.1.0, C++17 compiler
curl or wget (for downloading vocabulary files during installation)
Optional GPU: libvulkan-dev + glslc (Linux) or Vulkan SDK (Windows)
Platforms: Linux, macOS, Windows (x86-64, ARM64)

CLIP-L + T5-XXL text encoders, VAE. sample_steps = 10.

| Test | AMD RX 9070 (16 GB) | Tesla P100 (16 GB) | 2x Tesla T4 (16 GB) | |---|---|---|---| | 1. 768x768 direct | 44.2 s | 94.0 s | 133.1 s | | 2. 1024x1024 tiled VAE | 163.6 s | 151.4 s | 243.6 s | | 3. 2048x1024 highres fix | 309.7 s | 312.5 s | 492.2 s | | 4. img2img 768x768 direct | 29.6 s | 51.0 s | 73.5 s | | 5. 1024x1024 direct | 163.0 s | 152.2 s | 243.3 s | | 6. Multi-GPU 4 prompts | -- | -- | 284.9 s (4 img) |

CLIP-L + T5-XXL (Q5_K_M) text encoders, VAE. sample_steps = 25.

| Test | AMD RX 9070 (16 GB) | 2x Tesla T4 (16 GB) | |---|---|---| | 768x768 direct | 110.8 s | -- | | 1024x1024 direct | -- | 553.1 s |

| | SD 1.5 | Flux Q4_K_S | |---|---|---| | Diffusion params | ~860 MB | ~6.5 GB | | Text encoders | CLIP ~240 MB | CLIP-L + T5-XXL ~3.9 GB | | Sampling per step (768x768) | ~0.1–0.3 s | ~3.9 s | | Architecture | UNet | MMDiT (57 blocks) |

For a live, runnable demo see the Kaggle notebook: Stable Diffusion in R (ggmlR + Vulkan GPU).

llamaR — LLM inference in R
sd2R — Stable Diffusion in R
ggml — underlying C library

MIT

Any scripts or data that you put into this service are public.

sd2R documentation built on March 30, 2026, 5:08 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

sd2R
Stable Diffusion Image Generation

README.md
In sd2R: Stable Diffusion Image Generation

sd2R

Overview

Architecture

Key Features

Pipeline Example

Implementation Details

CRAN Readiness

Installation

Offline / Manual Installation

System Requirements

Benchmarks

FLUX.1-dev Q4_K_S — 10 steps

FLUX.1-dev Q4_K_S — 25 steps

Model size comparison

Examples

See Also

License

Try the sd2R package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

sd2R Stable Diffusion Image Generation

README.md In sd2R: Stable Diffusion Image Generation

sd2R

Overview

Architecture

Key Features

Pipeline Example

Implementation Details

CRAN Readiness

Installation

Offline / Manual Installation

System Requirements

Benchmarks

FLUX.1-dev Q4_K_S — 10 steps

FLUX.1-dev Q4_K_S — 25 steps

Model size comparison

Examples

See Also

License

Try the sd2R package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

sd2R
Stable Diffusion Image Generation

README.md
In sd2R: Stable Diffusion Image Generation