rmeshedgp: Prior sampling from a Meshed Gaussian Process

View source: R/rmgp.r

rmeshedgpR Documentation

Prior sampling from a Meshed Gaussian Process

Description

Generates samples from a (univariate) MGP assuming a cubic directed acyclic graph and axis-parallel domain partitioning.

Usage

rmeshedgp(coords, theta, 
  axis_partition = NULL, block_size = 100, 
  n_threads=1, cache=TRUE, verbose=FALSE, debug=FALSE)

Arguments

coords

matrix of spatial or spatiotemporal coordinates with d=2 or d=3 columns for spatial or spatiotemporal data, respectively.

theta

vector with covariance parameters. If d=2 and theta is a 2-dimensional vector then θ = (φ, σ^2) where φ is the spatial decay and σ^2 is the spatial variance in the exponential covariance model. If d=2 and theta is a 3-dimensional vector then θ = (φ, ν, σ^2) and a Matern model with smoothness ν is used instead. If d=3, theta must be a 4-dimensional vector and θ=(a, φ, b, σ^2) using Gneiting's non-separable spatiotemporal covariance detailed below.

axis_partition

integer vector of length d with the number of intervals along which each axis should be partitioned. The domain will be partitioned into prod(axis_partition) blocks. This argument can be left blank when using block_size.

block_size

integer specifying the (approximate) size of the blocks, i.e. how many spatial or spatiotemporal locations should be included in each block. Note: larger values correspond to an MGP that is closer to a full GP, but require more expensive computations.

n_threads

integer number of OpenMP threads. This is ineffective if meshed was not compiled with OpenMP support.

cache

bool: whether to use cache. Some computational speedup is associated to cache=TRUE if coords are a grid.

verbose

bool: print some messages.

debug

bool: print more messages.

Details

Gaussian processes (GPs) lack in scalability to big datasets due to the assumed unrestricted dependence across the spatial or spatiotemporal domain. Meshed GPs instead use a directed acyclic graph (DAG) with patterns, called mesh, to simplify the dependence structure across the domain. Each DAG node corresponds to a partition of the domain. MGPs can be interpreted as approximating the GP they originate from, or as standalone processes that can be sampled from. This function samples random MGPs and can thus be used to generate big spatial or spatiotemporal data. The only requirement to sample from a MGP compared to a standard GP is the specification of the domain partitioning strategy. Here, either axis_partition or block_size can be used; the default block_size=100 can be used to quickly sample smooth surfaces at millions of locations.

Just like in a standard GP, one needs a covariance function or kernel which can be set as follows. For spatial data (d=2), the length of theta determines which model is used (see above). Letting h = \| s-s' \| where s and s' are locations in the spatial domain, the exponential covariance is defined as:

C(h) = σ^2 \exp \{ - φ h \},

whereas the Matern model is

C(h) = σ^2 \frac{2^{1-ν}}{Γ(ν)} φ^{ν} h^{ν} K_{ν} ( φ h ),

where K_{ν} is the modified Bessel function of the second kind of order ν. For spatiotemporal data (d=3) the covariance function between locations (s, t) and (s', t') with distance h = \| s-s' \| and time lag u = \| t-t' \| is defined as

C(h, u) = σ^2 / (a u + 1) \exp \{ -φ h (a u + 1)^{-b/2} \},

which is a special case of non-separable spacetime covariance as introduced by Gneiting (2002).

Value

data.frame with the (reordered) supplied coordinates in the first d columns, and the MGP sample in the last column, labeled w.

Author(s)

Michele Peruzzi <michele.peruzzi@duke.edu>

References

Gneiting, T (2002) Nonseparable, Stationary Covariance Functions for Space-Time Data. Journal of the American Statistical Association. doi: 10.1198/016214502760047113

Peruzzi, M., Banerjee, S., and Finley, A.O. (2020) Highly Scalable Bayesian Geostatistical Modeling via Meshed Gaussian Processes on Partitioned Domains. Journal of the American Statistical Association, in press. doi: 10.1080/01621459.2020.1833889

Examples

  
library(ggplot2)
library(magrittr)
library(meshed)

# spatial domain (we choose a grid to make a nice image later)
# this generates a dataset of size 6400
xx <- seq(0, 1, length.out=80)
coords <- expand.grid(xx, xx) %>%
  as.matrix()

raster_plot <- function(df){
  ggplot(df, aes(Var1, Var2, fill=w)) +
    geom_raster() +
    scale_fill_viridis_c() +
    theme_minimal() }

# spatial data, exponential covariance
# phi=14, sigma^2=2
simdata <- rmeshedgp(coords, c(14, 2))
raster_plot(simdata)

# spatial data, matern covariance
# phi=14, nu=1, sigma^2=2
simdata <- rmeshedgp(coords, c(14, 1, 2))
raster_plot(simdata)

# spacetime data, gneiting's covariance
# 64000 locations
stcoords <- expand.grid(xx, xx, seq(0, 1, length.out=10))

# it should take less than a couple of seconds
simdata <- rmeshedgp(stcoords, c(1, 14, .5, 2))
# plot data at 7th time period
raster_plot(simdata %>% dplyr::filter(Var3==unique(Var3)[7])) 


meshed documentation built on Sept. 20, 2022, 1:06 a.m.