knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
The goal of NPVecchia is to provide users with scalable Gaussian Process (GP) covariance estimation without assuming its form. We assume that the GP is de-trended (has zero mean) and that the covariance between points decreases with some kind of distance. It provides the functions to accompany an upcoming paper with Dr. Katzfuss.
TL;DR This method reorders the data and computes a sparse estimate Û of the inverse of the Cholesky of the (reordered) covariance matrix such that
.
ans <- run_npvecchia(data, locations) #output contains ans$u, ans$order
You can install the development version from GitHub with:
# install.packages("devtools") devtools::install_github("katzfuss-group/NPvecchia") # library(NPvecchia)
library(NPVecchia)
It is recommended to use the provided function "orderMaxMinFaster" to order the data in a max-min ordering based on Euclidean distance. Basically, this ordering starts at a point in the center, then adds points one at a time that maximize the minimum distance from all previous points in the ordering. While not required, the decay of the both the regression errors and coefficients (i.e. the prior) was specifically developed based on this ordering. There is also the option to order based on a user defined distance matrix (order_maximin_dist) or based on the sample correlation matrix (order_mm_tapered).
#number of locations, dimensions, samples n_locs <- 50 d <- 2 num_reps <- 5 #random locations and data locs <- matrix(runif(d*n_locs), nc = d) dataa <- matrix(rnorm(num_reps*n_locs), nc = n_locs) #Compute maximin ordering order1 <- orderMaxMinFaster(locs) #Correlation ordering order2 <- order_mm_tapered(locs, dataa) #Reorder data and location by correlation ordering dataa <- dataa[, order2] locs <- locs[order2, ]
The best previous neighbors (ordered from nearest to furthest) are also needed. As our method trims down the number of neighbors, it is advisable to initially compute the full neighbor matrix (or at least a large number of neighbors). find_nn computes the neighbors based on correlation (analogous to order_mm_tapered), and find_nn_dist is similarly analogous to order_maximin_dist.
#Correlation neighbors nearest_neighbors <- find_nn(locs, dataa, m = n_locs) #Euclidean neighbors nearest_neighbors2 <- find_nn_dist(fields::rdist(dataa), n_locs)
The method relies on three hyperparameters related to the decays of the regression coefficients and the marginal variances. One can use Bayesian methods, optimization, etc. to determine hyperparameters.
The easiest is to use ?optim to obtain good values of the hyperparameters and then get the maximum a posteriori (MAP) value for Û.
#initial thetas init_theta = c(1,-1,0) #Optimization, where one can change limits to match ones desired decay thetas_f <- optim(init_theta, minus_loglikeli_c, datum=dataa, NNarray = nearest_neighbors, method="L-BFGS-B", lower=-6, upper=4) #Get MAP uhat <- get_map(thetas_f$par, dataa, nearest_neighbors)
While no Metropolis sampler for the hyperparameters is provided, adaptMCMC::MCMC was used for our experiments and works well.
#A scale matrix similar to this worked well for our application scale_mat <- matrix(c(0.05,-0.04,0,-0.04,0.05,0,0,0,0.01),nc=3) #number of samples nruns <- 100 #Run the sampler samp <- adaptMCMC::MCMC(minus_loglikeli_c, datum = dataa, NNarray = nearest_neighbors, init=init_theta, negativ=FALSE, scale=scale_mat, adapt=TRUE, acc.rate=0.234, n= nruns) #Sample z ~ N(0, (UU')^{-1}) for each theta sampled <- apply(samp$samples,1,function(thetas_temp){ #Get priors from thetas ptt <- thetas_to_priors(thetas_temp, n_locs) #get number of neighbors m <- min(ncol(ptt[[3]]),ncol(nearest_neighbors)) #Get posteriors posts <- get_posts_c(dataa, ptt, nearest_neighbors[, 1:m]) #Get a sample sample1 <- samp_posts(posts, nearest_neighbors[, 1:m], bayesian = T, uhat = F) #return sample1 })
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.