README.md

surfcov

The purpose of the R package surfcov is to enable covariance estimation for random surfaces beyond separability, as proposed in the papers arXiv:1912.12870 and arXiv:2007.12175.

Let X_1, \ldots, X_N be i.i.d. matrices of size K_1 \times K_2 representing discrete measurements (on a grid) of some latent random surfaces on a 2D domain, and C := cov(X_1) be the covariance operator. The covariance is a tensor of size K_1 \times K_2 \times K_1 \times K_2, which becomes problematic to handle for K_1 and K_2 as small as 100. The assumption of separability postulates that
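To put the size claim in numbers, here is a back-of-the-envelope computation in R. It assumes dense storage in double precision (8 bytes per entry) with no use of symmetry; the variable names are illustrative only.

```r
# Memory needed to store the full covariance tensor densely,
# assuming 8-byte doubles (an assumption; symmetry is not exploited).
K1 <- 100
K2 <- 100
n_entries <- (K1 * K2)^2          # K1 x K2 x K1 x K2 entries
gigabytes <- n_entries * 8 / 1e9  # bytes converted to GB
n_entries
gigabytes
```

Since the number of entries grows with the fourth power of the grid size, doubling both K_1 and K_2 multiplies the storage by 16.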

C[i,j,i',j'] = C_1[i,i'] C_2[j,j'],

which reduces the statistical and computational burden, but is often criticized as an oversimplification, since it does not allow any interaction between the two dimensions.
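The separability formula can be made concrete with a small base-R sketch. The factors C1 and C2 below are hypothetical toy matrices chosen for illustration; nothing here uses surfcov itself.

```r
# Small illustrative factors (hypothetical values, not from the package).
K1 <- 3; K2 <- 4
C1 <- 0.5^abs(outer(1:K1, 1:K1, "-"))   # K1 x K1 covariance
C2 <- 0.8^abs(outer(1:K2, 1:K2, "-"))   # K2 x K2 covariance

# outer() yields an array indexed [i, i', j, j'];
# aperm() reorders the dimensions to [i, j, i', j'].
C <- aperm(outer(C1, C2), c(1, 3, 2, 4))

dim(C)
all.equal(C[2, 3, 1, 4], C1[2, 1] * C2[3, 4])

# Number of stored entries: separable factors vs. the full tensor.
c(separable = K1^2 + K2^2, full = (K1 * K2)^2)
```

The last line shows where the savings come from: the two factors need K_1^2 + K_2^2 entries, while the full tensor needs (K_1 K_2)^2.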

This package allows for efficient estimation and subsequent manipulation of two alternative models, which are both strict generalizations of separability:

  1. the separable-plus-banded model: C[i,j,i',j'] = A_1[i,i'] A_2[j,j'] + B[i,j,i',j'], where B[i,j,i',j'] = 0 for |i-i'| > d or |j-j'| > d;
  2. the R-separable model: C[i,j,i',j'] = A_1[i,i']B_1[j,j'] + \ldots + A_R[i,i']B_R[j,j'].
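The structure of the two models can be sketched in base R as follows. The helpers sep() and ar1(), and all numerical values, are hypothetical and only serve the illustration; they are not part of surfcov. For brevity the banded part B is itself built from two banded factors, although the model does not require that.

```r
K1 <- 4; K2 <- 5; d <- 1

# Hypothetical helper: separable tensor indexed [i, j, i', j'].
sep <- function(A, B) aperm(outer(A, B), c(1, 3, 2, 4))
# Hypothetical helper: K x K matrix with entries rho^|i - i'|.
ar1 <- function(K, rho) rho^abs(outer(1:K, 1:K, "-"))

# 1. Separable-plus-banded: B vanishes wherever |i - i'| > d or |j - j'| > d.
band1 <- 1 * (abs(outer(1:K1, 1:K1, "-")) <= d)
band2 <- 1 * (abs(outer(1:K2, 1:K2, "-")) <= d)
B <- sep(ar1(K1, 0.3), ar1(K2, 0.3)) * sep(band1, band2)
C_spb <- sep(ar1(K1, 0.5), ar1(K2, 0.8)) + B

# 2. R-separable with R = 2: a sum of two separable terms.
C_rsep <- sep(ar1(K1, 0.5), ar1(K2, 0.8)) + sep(ar1(K1, 0.2), ar1(K2, 0.6))

B[1, 1, 3, 1]   # outside the band, since |i - i'| = 2 > d
B[1, 1, 2, 2]   # inside the band
```

Setting B to zero in the first model, or R = 1 in the second, recovers the separable model, which is why both are generalizations of separability.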

When the data are stored as an array X of size N \times K_1 \times K_2, one can quickly check whether a given model is potentially useful by running

  1. spb(X) for the separable-plus-banded model; or
  2. scd(X) for the R-separable model.
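To try these calls without real data, one option is to simulate an array of the required shape. The sketch below draws Gaussian surfaces with a separable covariance from hypothetical factors A1 and A2; the calls to spb() and scd() are only attempted when surfcov is installed, and their running time on such data is not something this sketch makes claims about.

```r
set.seed(1)
N <- 50; K1 <- 6; K2 <- 8

# Hypothetical covariance factors for a separable model.
A1 <- 0.5^abs(outer(1:K1, 1:K1, "-"))
A2 <- 0.8^abs(outer(1:K2, 1:K2, "-"))

# chol() returns an upper-triangular R with t(R) %*% R equal to its input,
# so t(R1) %*% Z %*% R2 has covariance A1[i,i'] * A2[j,j'] for white noise Z.
R1 <- chol(A1); R2 <- chol(A2)
X <- array(0, c(N, K1, K2))
for (n in 1:N) {
  Z <- matrix(rnorm(K1 * K2), K1, K2)
  X[n, , ] <- t(R1) %*% Z %*% R2
}
dim(X)

# Only run when surfcov is available; spb() and scd() are the calls named above.
if (requireNamespace("surfcov", quietly = TRUE)) {
  spb_est <- surfcov::spb(X)
  scd_est <- surfcov::scd(X)
}
```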

Installation

You can install the development version from GitHub with:

install.packages("devtools") # only if devtools is not yet installed
library(devtools)
install_github("TMasak/surfcov")

Examples

For the separable-plus-banded model in particular:

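library(surfcov)
# X <- array(runif(100*20*30), c(100,20,30))  # toy data; uncomment if X is not yet defined
spb_est <- spb(X)
spb_est$d  # bandwidth d selected by cross-validation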

This call runs cross-validation (CV) to pick a value of the bandwidth d and then fits the model with the best d found. If the returned d is larger than 0, a separable-plus-banded model fits the data better than a separable model. If we decide to use prediction-based CV instead of fit-based CV, it makes sense to also check the gains in prediction:

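# X <- array(runif(100*20*30), c(100,20,30))  # toy data; uncomment if X is not yet defined
spb_est <- spb(X, predict=TRUE)
spb_est$d   # selected bandwidth
spb_est$cv  # cross-validation results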

The same can be done with the R-separable model by replacing the function spb with scd; see ?spb and ?scd for more detailed descriptions and examples. Note that by default the mean is estimated empirically, unless another estimator of the mean is provided.
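As an illustration of what an empirical mean estimate looks like for data in this array format, the following base-R sketch averages over the sample dimension and centers the data. How surfcov does this internally, and the name of the argument for supplying a different mean, are not shown here; see ?spb and ?scd for that.

```r
set.seed(2)
X <- array(rnorm(100 * 20 * 30), c(100, 20, 30))   # toy data, N x K1 x K2

# Empirical mean surface: average over the first (sample) dimension.
mu_hat <- apply(X, c(2, 3), mean)                  # K1 x K2 matrix

# Center every observation by subtracting the mean surface.
X_centered <- sweep(X, c(2, 3), mu_hat)

dim(mu_hat)
max(abs(apply(X_centered, c(2, 3), mean)))         # numerically zero
```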

The validity of a separable-plus-banded model can also be checked using a bootstrap test, e.g. by

test_spb(X,d=1)

Once one of the models is fitted, the following functions can be useful for subsequent manipulation of the estimated covariance:

To apply the separable-plus-banded model fast, see to_book_format and BXfast. TODO:



TMasak/surfcov documentation built on April 25, 2022, 12:15 a.m.