Description Usage Arguments Details Value Author(s) References Examples
Bayesian linear multivariate spatial regression using SpamTrees.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | spamtree(y, x, coords,
mv_id = rep(1, length(y)),
cell_size = 25,
K = rep(2, ncol(coords)),
start_level = 0,
tree_depth = Inf,
last_not_reference = TRUE,
limited_tree = FALSE,
cherrypick_same_margin = TRUE,
cherrypick_group_locations = TRUE,
mvbias = 0,
mcmc = list(keep = 1000, burn = 0, thin = 1),
num_threads = 4,
verbose = FALSE,
settings = list(adapting = TRUE, mcmcsd = 0.01,
debug = FALSE, printall = FALSE),
prior = list(set_unif_bounds = NULL,
btmlim = NULL, toplim = NULL, vlim = NULL),
starting = list(beta = NULL, tausq = NULL, theta = NULL, w = NULL),
debug = list(sample_beta = TRUE, sample_tausq = TRUE,
sample_theta = TRUE, sample_w = TRUE,
sample_predicts = TRUE)
)
|
y |
vector of outcomes of size n. Correspondingly, |
x |
matrix of covariates with dimension (n, p). |
coords |
matrix of coordinates with dimension (n, 2). |
mv_id |
integer vector of outcome IDs of size n with values in \{1, …, q\}. |
cell_size |
integer number of knots for each node in the treed DAG. Defaults to 25. This is a target number and some nodes may include more or less locations. Here, knots can only be chosen among observed locations. |
K |
integer vector of dimension 2 indicating the number of intervals for axis-parallel recursive partitioning. Each tree level will thus have |
start_level |
integer indicating the root level. Example: |
tree_depth |
integer indicating the number of branching steps in the tree. Defaults to |
last_not_reference |
bool indicating whether to treat the last level of the tree as a reference set. The default value |
limited_tree |
bool determining whether to use a recursive tree. If |
cherrypick_same_margin |
bool used only for multivariate outcomes. This determines how to assign parents to leaf nodes. In a SpamTree, outcome j at a new spatial location is assigned the same parent of its nearest neighbor. If |
cherrypick_group_locations |
bool used in multivariate settings to determine whether the allocation of knots to tree nodes should treat the q outcomes at location s \in D as either (1) a q dimensional vector observed at 1 location, or (2) one observed outcome at each of q locations (i.e. same spatial location but different outcome index). |
mvbias |
parameter used in settings of multivariate misalignment in which one or more outcomes are observed at a number of locations that is much smaller than others. |
mcmc |
list for setting up MCMC. |
num_threads |
integer number of OpenMP threads to use within MCMC. Ineffective if source is compiled without OpenMP support. |
verbose |
level of verbosity. All messages are suppressed if |
settings |
list with additional settings. |
prior |
setup for prior on θ, which currently only allows to specify the support of independent uniform distributions. See examples. (subject to change). |
starting |
list with starting values for all unknowns. Compatibility checks with |
debug |
list with debug settings. Can be used to turn off parts of MCMC. |
This implements the following model (in stacked vector form):
y = X β + w + ε,
where y is a n-dimensional vector of outcomes, X is a matrix of covariates of dimension (n, p) associated to coefficients β, w is a n-dimensional vector storing the realization of a spatial multivariate Gaussian tree w(\cdot) \sim SpamTree_{G}(0, C_{θ}) where G is a treed directed acyclic graph, and where C_{θ}(s, s') is a matrix-valued non-separable cross-covariance function on latent dimensions (see Peruzzi and Dunson (2021), equation 18, and CrossCovarianceAG10
) where θ is a vector of unknown parameters.
SpamTrees Gaussian processes are a scalable alternative to a spatial multivariate GP. Conditional independence across domain locations is assumed to be determined by the treed graph G, whose sparsity enables more efficient computations for the Gibbs sampler computed with this function. The graph architecture can be customized using inputs of the spamtree
function. The example below computes SpamTrees on univariate data. A vignette exists with bivariate misaligned spatial data.
coords |
reordered spatial coordinates |
coordsinfo |
reordered spatial coordinates plus partitioning information. |
mv_id |
reordered outcome IDs. |
w_mcmc |
posterior sample of the spatial random effect. This is a list of length |
yhat_mcmc |
posterior predictive sample. This is a list of length |
beta_mcmc |
array of size |
tausq_mcmc |
matrix with posterior samples of the q nuggets, one for each outcome. |
theta_mcmc |
matrix with posterior samples of the cross-covariance parameters. These include the latent distance between outcomes which may be poorly identifiable. |
mcmc_time |
elapsed clock time for MCMC. |
Michele Peruzzi michele.peruzzi@duke.edu,
David B. Dunson dunson@duke.edu
Peruzzi, M. and Dunson, D. B. (2021) Spatial Multivariate Trees for Big Data Bayesian Regression. https://arxiv.org/abs/2012.00943
Vihola, M. (2012) Robust adaptive Metropolis algorithm with coerced acceptance rate. Statistics and Computing, 22:997-1008. doi: 10.1007/s11222-011-9269-5
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 | # toy example with tiny dataset and short MCMC
# on a univariate outcome
library(magrittr)
library(dplyr)
library(ggplot2)
library(spamtree)
set.seed(2021)
SS <- 15
n <- SS^2 # total n. locations, including missing ones
coords <- data.frame(Var1=runif(n), Var2=runif(n)) %>%
as.matrix()
# generate data
sigmasq <- 2.3
phi <- 6
tausq <- .1
B <- c(-1,.5,1)
CC <- sigmasq * exp(-phi * as.matrix(dist(coords)))
LC <- t(chol(CC))
w <- LC %*% rnorm(n)
p <- length(B)
X <- rnorm(n * p) %>% matrix(ncol=p)
y <- X %*% B + w + tausq^.5 * rnorm(n)
set_missing <- rbinom(n, 1, 0.1)
simdata <- data.frame(coords,
y_full = y,
w_latent = w) %>%
mutate(y_observed = ifelse(set_missing==1, NA, y_full))
# MCMC setup
mcmc_keep <- 300
mcmc_burn <- 300
mcmc_thin <- 1
ybar <- mean(y, na.rm=TRUE)
# fit spamtree with defaults
spamtree_done <- spamtree(y - ybar, X, coords,
mcmc = list(keep=mcmc_keep, burn=mcmc_burn, thin=mcmc_thin),
num_threads = 1)
# predictions
y_out <- spamtree_done$yhat_mcmc %>%
abind::abind(along=3) %>% `[`(,1,) %>%
add(ybar) %>% apply(1, mean)
w_out <- spamtree_done$w_mcmc %>%
abind::abind(along=3) %>% `[`(,1,) %>%
apply(1, mean)
outdf <- spamtree_done$coordsinfo %>%
cbind(data.frame(w_spamtree = w_out,
y_spamtree = y_out)) %>%
left_join(simdata)
# plot predictions
pred_plot <- outdf %>%
ggplot(aes(Var1, Var2, color=y_spamtree)) +
geom_point() +
scale_color_viridis_c()
# plot latent process
latent_plot <- outdf %>%
ggplot(aes(Var1, Var2, color=w_spamtree)) +
geom_point() +
scale_color_viridis_c()
# estimation of regression coefficients
plot(density(spamtree_done$beta_mcmc[1,,1]))
abline(v=B[1], col="red")
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.