dot-estimate_vae_vram: Estimate peak VAE VRAM usage in bytes
In sd2R: Stable Diffusion Image Generation

.estimate_vae_vram

R Documentation

Estimate peak VAE VRAM usage in bytes

Description

Analytic upper bound on the peak compute-buffer size of the VAE decoder. The peak occurs in the ResNet block that runs at full pixel resolution (W x H) with the decoder's base channel width. The per-pixel cost is derived from the architecture (base channels x dtype bytes); the only empirical constant is live_tensors, the number of such full-resolution tensors that ggml's graph allocator keeps alive simultaneously.

That value is calibrated against an observed Flux failure: a 2048x1024 decode requested 19238223904 bytes, i.e. 19238223904 / (2048*1024) ~= 9174 B/px, and 9174 / (128 ch * 4 B) ~= 17.9 live full-res tensors. The value is rounded up to 18 so that the estimate errs on the high side and tiling engages before an out-of-memory failure occurs.
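The calibration arithmetic can be reproduced in a few lines of R. This is only a sketch of the calculation described in the text, not code taken from the package; the variable names are illustrative.

```r
# Observed Flux failure: bytes requested for a 2048x1024 decode.
# Kept as a double, since the value exceeds R's 32-bit integer range.
requested_bytes <- 19238223904

pixels          <- 2048 * 1024            # full-resolution pixel count
bytes_per_pixel <- requested_bytes / pixels

# One full-resolution tensor costs base channels x dtype bytes per pixel.
base_channels    <- 128                   # decoder base channel width
dtype_bytes      <- 4                     # 4-byte (32-bit float) elements
per_tensor_bytes <- base_channels * dtype_bytes

live_observed <- bytes_per_pixel / per_tensor_bytes
live_tensors  <- ceiling(live_observed)   # round up for a safe over-estimate

print(c(bytes_per_pixel = bytes_per_pixel,
        live_observed   = live_observed,
        live_tensors    = live_tensors))
```

Rounding up rather than to the nearest value keeps the estimate at or above the observed request.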

Usage

.estimate_vae_vram(width, height, model_type = "sd1", batch = 1L)

Arguments

`width`	Image width in pixels
`height`	Image height in pixels
`model_type`	Model type string ("sd1", "sd2", "sdxl", "flux", etc.)
`batch`	Batch size (default 1)
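
As a rough illustration of how these arguments could combine into a byte estimate, here is a minimal sketch. It is not the package's implementation: the function name estimate_vae_vram_sketch is hypothetical, it takes base_channels and dtype_bytes directly because the mapping from model_type to channel width is not given here, and linear scaling with batch is an assumption.

```r
# Hypothetical stand-in for the estimator described above.
# Assumptions: 128 base channels, 4-byte elements, 18 live tensors,
# and cost that scales linearly with batch size.
estimate_vae_vram_sketch <- function(width, height,
                                     base_channels = 128,
                                     dtype_bytes   = 4,
                                     live_tensors  = 18,
                                     batch         = 1) {
  # Convert to double first: the product overflows 32-bit integers
  # for large images if the inputs are passed as integers (e.g. 2048L).
  pixels <- as.numeric(width) * as.numeric(height)
  pixels * base_channels * dtype_bytes * live_tensors * as.numeric(batch)
}

# Estimate for the calibration case, compared with the observed request.
est      <- estimate_vae_vram_sketch(2048L, 1024L)
observed <- 19238223904
cat(sprintf("estimate: %.0f bytes (%.2f GiB), observed: %.0f bytes\n",
            est, est / 1024^3, observed))
```

A caller could compare such an estimate with the available VRAM and switch to tiled decoding when the estimate exceeds the budget.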