spmeshed | R Documentation |
Fits Bayesian multivariate spatial or spatiotemporal regression models with latent MGPs via Markov chain Monte Carlo.
spmeshed(y, x, coords, k=NULL, family = "gaussian", axis_partition = NULL, block_size = 30, grid_size = NULL, grid_custom = NULL, n_samples = 1000, n_burnin = 100, n_thin = 1, n_threads = 4, verbose = 0, predict_everywhere = FALSE, settings = list(adapting=TRUE, forced_grid=NULL, cache=NULL, ps=TRUE, saving=TRUE, low_mem=FALSE, hmc=4), prior = list(beta=NULL, tausq=NULL, sigmasq = NULL, phi=NULL, a=NULL, nu = NULL, toplim = NULL, btmlim = NULL, set_unif_bounds=NULL), starting = list(beta=NULL, tausq=NULL, theta=NULL, lambda=NULL, v=NULL, a=NULL, nu = NULL, mcmcsd=.05, mcmc_startfrom=0), debug = list(sample_beta=TRUE, sample_tausq=TRUE, sample_theta=TRUE, sample_w=TRUE, sample_lambda=TRUE, verbose=FALSE, debug=FALSE), indpart=FALSE )
y |
matrix of multivariate outcomes with n rows and q columns. Each row of |
x |
matrix of covariates with n rows and p columns. |
coords |
matrix of coordinates with n rows and d=2 or d=3 columns for spatial or spacetime regression, respectively. |
k |
integer k≤q q, number of latent processes to use for the linear model of coregionalization. If unspecified, this is set to q |
family |
a vector with length 1 or q whose elements corresponds to the data types of columns of |
axis_partition |
integer vector of size d: number of intervals each coordinate axis is split into |
block_size |
integer approximate size of the blocks after domain partitioning. Only used if |
grid_size |
integer vector of size d: number of 'knots' of the reference grid along each axis.
This grid is then partitioned using either |
grid_custom |
list with elements |
n_samples |
integer number of MCMC samples at which all the unknowns are stored (including the latent effects). |
n_burnin |
integer number of MCMC samples to discard at the beginning of the chain. |
n_thin |
integer thinning parameter for the MCMC chain. Only the chain of latent effects (w) is thinned to save memory in big data problems. Chains for other unknowns are not thinned and thus will be of length |
n_threads |
integer number of OpenMP threads. This is ineffective if |
verbose |
integer. If |
predict_everywhere |
bool used if settings$forced_grid=T. Should predictions be made at the reference grid locations? If not, predictions will be made only at the supplied NA values of Y. |
settings |
list: |
prior |
list: setup for priors of unknown parameters. |
starting |
list: setup for starting values of unknown parameters. |
debug |
list: setup for debugging things. Some parts of MCMC can be turned off here. |
indpart |
bool defaults to |
This function targets the following model:
y(s) = x(s)^\top β + Λ v(s) + ε(s),
where y(s) is a q-dimensional vector of outcomes at spatial location s, x(s) is a p-dimensional vector of covariates with static coefficients β, Λ is a matrix of factor loadings of size (q, k), v(s) is a k-dimensional vector which collects the realization of independent Gaussian processes v_j \sim spmeshed(0, C_j) for j=1, …, k and where C_j(s, s') is a correlation function. s is a coordinate in space (d=2) or space plus time (d=3). The Meshed GP implemented here associates an axis-parallel tessellation of the domain to a cubic directed acyclic graph (mesh).
coordsdata |
data.frame including the original n coordinates plus the n_g knot coordinates if the model was run on a forced grid. The additional column |
savedata |
Available if |
yhat_mcmc |
list of length |
v_mcmc |
list of length |
w_mcmc |
list of length |
lp_mcmc |
list of length |
beta_mcmc |
array of size |
tausq_mcmc |
matrix of size |
theta_mcmc |
array of size |
lambda_mcmc |
array of size |
paramsd |
Cholesky factorization of the proposal covariance for adaptive MCMC, after adaptation. |
mcmc |
Total number of MCMC iterations performed. |
mcmc_time |
Time in seconds taken for MCMC (not including preprocessing). |
Michele Peruzzi michele.peruzzi@duke.edu
Peruzzi, M., Banerjee, S., and Finley, A.O. (2022) Highly Scalable Bayesian Geostatistical Modeling via Meshed Gaussian Processes on Partitioned Domains. Journal of the American Statistical Association, 117(538):969-982. doi: 10.1080/01621459.2020.1833889
Peruzzi, M., Banerjee, S., Dunson, D.B., and Finley, A.O. (2021) Grid-Parametrize-Split (GriPS) for Improved Scalable Inference in Spatial Big Data Analysis. https://arxiv.org/abs/2101.03579
Peruzzi, M. and Dunson, D.B. (2022) Spatial meshing for general Bayesian multivariate models. https://arxiv.org/abs/2201.10080
# toy example with tiny dataset and short MCMC # on a univariate outcome library(magrittr) library(dplyr) library(ggplot2) library(meshed) set.seed(2021) SS <- 12 n <- SS^2 # total n. locations, including missing ones coords <- expand.grid(xx <- seq(0,1,length.out=SS), xx) %>% as.matrix() # generate data sigmasq <- 2.3 phi <- 6 tausq <- .1 B <- c(-1,.5,1) CC <- sigmasq * exp(-phi * as.matrix(dist(coords))) LC <- t(chol(CC)) w <- LC %*% rnorm(n) p <- length(B) X <- rnorm(n * p) %>% matrix(ncol=p) y_full <- X %*% B + w + tausq^.5 * rnorm(n) set_missing <- rbinom(n, 1, 0.1) simdata <- data.frame(coords, y_full = y_full, w_latent = w) %>% mutate(y_observed = ifelse(set_missing==1, NA, y_full)) # MCMC setup mcmc_keep <- 500 mcmc_burn <- 100 mcmc_thin <- 2 y <- simdata$y_observed ybar <- mean(y, na.rm=TRUE) meshout <- spmeshed(y-ybar, X, coords, axis_partition=c(4,4), n_samples = mcmc_keep, n_burn = mcmc_burn, n_thin = mcmc_thin, prior=list(phi=c(1,15)), verbose = 0, n_threads = 1) # posterior means best_post_mean <- meshout$beta_mcmc %>% apply(1:2, mean) # process means wmesh <- data.frame(w_mgp = meshout$w_mcmc %>% summary_list_mean()) # predictions ymesh <- data.frame(y_mgp = meshout$yhat_mcmc %>% summary_list_mean()) outdf <- meshout$coordsdata %>% cbind(ymesh, wmesh) # plot predictions pred_plot <- outdf %>% ggplot(aes(Var1, Var2, color=y_mgp)) + geom_point() + scale_color_viridis_c() # plot latent process latent_plot <- outdf %>% ggplot(aes(Var1, Var2, color=w_mgp)) + geom_point() + scale_color_viridis_c() # estimation of regression coefficients plot(density(meshout$beta_mcmc[1,1,])) abline(v=B[1], col="red")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.