rmeshedgp | R Documentation |
Generates samples from a (univariate) MGP assuming a cubic directed acyclic graph and axis-parallel domain partitioning.
rmeshedgp(coords, theta, axis_partition = NULL, block_size = 100, n_threads=1, cache=TRUE, verbose=FALSE, debug=FALSE)
coords |
matrix of spatial or spatiotemporal coordinates with d=2 or d=3 columns for spatial or spatiotemporal data, respectively. |
theta |
vector with covariance parameters. If d=2 and |
axis_partition |
integer vector of length d with the number of intervals along which each axis should be partitioned. The domain will be partitioned into |
block_size |
integer specifying the (approximate) size of the blocks, i.e. how many spatial or spatiotemporal locations should be included in each block. Note: larger values correspond to an MGP that is closer to a full GP, but require more expensive computations. |
n_threads |
integer number of OpenMP threads. This is ineffective if |
cache |
bool: whether to use cache. Some computational speedup is associated to |
verbose |
bool: print some messages. |
debug |
bool: print more messages. |
Gaussian processes (GPs) lack in scalability to big datasets due to the assumed unrestricted dependence across the spatial or spatiotemporal domain.
Meshed GPs instead use a directed acyclic graph (DAG) with patterns, called mesh, to simplify the dependence structure across the domain. Each DAG node corresponds to a partition of the domain. MGPs can be interpreted as approximating the GP they originate from, or as standalone processes that can be sampled from. This function samples random MGPs and can thus be used to generate big spatial or spatiotemporal data.
The only requirement to sample from a MGP compared to a standard GP is the specification of the domain partitioning strategy. Here, either axis_partition
or block_size
can be used; the default block_size=100
can be used to quickly sample smooth surfaces at millions of locations.
Just like in a standard GP, one needs a covariance function or kernel which can be set as follows.
For spatial data (d=2), the length of theta
determines which model is used (see above). Letting h = \| s-s' \| where s and s' are locations in the spatial domain, the exponential covariance is defined as:
C(h) = σ^2 \exp \{ - φ h \},
whereas the Matern model is
C(h) = σ^2 \frac{2^{1-ν}}{Γ(ν)} φ^{ν} h^{ν} K_{ν} ( φ h ),
where K_{ν} is the modified Bessel function of the second kind of order ν. For spatiotemporal data (d=3) the covariance function between locations (s, t) and (s', t') with distance h = \| s-s' \| and time lag u = \| t-t' \| is defined as
C(h, u) = σ^2 / (a u + 1) \exp \{ -φ h (a u + 1)^{-b/2} \},
which is a special case of non-separable spacetime covariance as introduced by Gneiting (2002).
data.frame with the (reordered) supplied coordinates in the first d
columns, and the MGP sample in the last column, labeled w
.
Michele Peruzzi <michele.peruzzi@duke.edu>
Gneiting, T (2002) Nonseparable, Stationary Covariance Functions for Space-Time Data. Journal of the American Statistical Association. doi: 10.1198/016214502760047113
Peruzzi, M., Banerjee, S., and Finley, A.O. (2020) Highly Scalable Bayesian Geostatistical Modeling via Meshed Gaussian Processes on Partitioned Domains. Journal of the American Statistical Association, in press. doi: 10.1080/01621459.2020.1833889
library(ggplot2) library(magrittr) library(meshed) # spatial domain (we choose a grid to make a nice image later) # this generates a dataset of size 6400 xx <- seq(0, 1, length.out=80) coords <- expand.grid(xx, xx) %>% as.matrix() raster_plot <- function(df){ ggplot(df, aes(Var1, Var2, fill=w)) + geom_raster() + scale_fill_viridis_c() + theme_minimal() } # spatial data, exponential covariance # phi=14, sigma^2=2 simdata <- rmeshedgp(coords, c(14, 2)) raster_plot(simdata) # spatial data, matern covariance # phi=14, nu=1, sigma^2=2 simdata <- rmeshedgp(coords, c(14, 1, 2)) raster_plot(simdata) # spacetime data, gneiting's covariance # 64000 locations stcoords <- expand.grid(xx, xx, seq(0, 1, length.out=10)) # it should take less than a couple of seconds simdata <- rmeshedgp(stcoords, c(1, 14, .5, 2)) # plot data at 7th time period raster_plot(simdata %>% dplyr::filter(Var3==unique(Var3)[7]))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.