TMasak/surfcov: Random Surfaces' Covariance Estimation beyond Separability

surfcov

The R package surfcov provides covariance estimation for random surfaces beyond separability, as proposed in the papers arXiv:1912.12870 and arXiv:2007.12175.

Let $X_1, \ldots, X_N$ be i.i.d. matrices of size $K_1 \times K_2$ representing discrete measurements (on a grid) of some latent random surfaces on a 2D domain, and let $C := \operatorname{cov}(X_1)$ be the covariance operator. The covariance is a tensor of size $K_1 \times K_2 \times K_1 \times K_2$, which becomes problematic to handle for $K_1$ and $K_2$ as small as 100 (already $10^8$ entries). The assumption of separability postulates that

$$C[i,j,i',j'] = C_1[i,i'] \, C_2[j,j'],$$

which reduces the statistical and computational burden but is often criticized as an oversimplification, since it does not allow any interaction between the two dimensions.
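
To make the size argument concrete, the following base R sketch counts the entries that have to be stored with and without separability for $K_1 = K_2 = 100$. The variable names are illustrative only and are not part of surfcov; double-precision storage (8 bytes per entry) is assumed.

```r
K1 <- 100; K2 <- 100

full_entries <- K1^2 * K2^2   # full K1 x K2 x K1 x K2 covariance tensor
sep_entries  <- K1^2 + K2^2   # C_1 (K1 x K1) plus C_2 (K2 x K2)

full_gb <- full_entries * 8 / 1e9   # assuming 8 bytes per double
sep_kb  <- sep_entries * 8 / 1e3

c(full_entries = full_entries, sep_entries = sep_entries,
  full_gb = full_gb, sep_kb = sep_kb)
```

Under these assumptions the full tensor needs about 0.8 GB, while the two separable factors together need 160 KB.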

This package allows for efficient estimation and subsequent manipulation of two alternative models, which are both strict generalizations of separability:

the separable-plus-banded model: $C[i,j,i',j'] = A_1[i,i'] A_2[j,j'] + B[i,j,i',j']$, where $B[i,j,i',j'] = 0$ for $|i-i'| > d$ or $|j-j'| > d$;
the $R$-separable model: $C[i,j,i',j'] = A_1[i,i']B_1[j,j'] + \ldots + A_R[i,i']B_R[j,j']$.
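
The structure of the two models can be sketched in base R on a small grid. This is only an illustration of the index conventions above: the helpers `sep_tensor` and `rand_psd` are hypothetical and not part of surfcov, and the banded part `B` below is filled with random numbers to show the zero pattern, so it is not guaranteed to be a valid (positive semi-definite) covariance.

```r
set.seed(1)
K1 <- 5; K2 <- 4; d <- 1

# Hypothetical helper: tensor with entries M1[i,i'] * M2[j,j'], indexed [i, j, i', j'].
# outer() returns an array indexed [i, i', j, j'], so the middle indices are swapped.
sep_tensor <- function(M1, M2) aperm(outer(M1, M2), c(1, 3, 2, 4))

# Hypothetical helper: random symmetric positive semi-definite K x K matrix.
rand_psd <- function(K) crossprod(matrix(rnorm(K * K), K))

A1 <- rand_psd(K1); A2 <- rand_psd(K2)

# Separable-plus-banded: B vanishes whenever |i-i'| > d or |j-j'| > d.
band <- sep_tensor(1 * (abs(outer(1:K1, 1:K1, "-")) <= d),
                   1 * (abs(outer(1:K2, 1:K2, "-")) <= d))
B <- band * array(rnorm(K1 * K2 * K1 * K2), c(K1, K2, K1, K2))
C_spb <- sep_tensor(A1, A2) + B

# R-separable with R = 2: a sum of two separable terms
# (here the second index of each pair acts on the K2 dimension).
C_rsep <- sep_tensor(rand_psd(K1), rand_psd(K2)) +
          sep_tensor(rand_psd(K1), rand_psd(K2))

# Outside the band, the separable-plus-banded tensor reduces to its separable part.
all.equal(C_spb[1, 1, 4, 1], A1[1, 4] * A2[1, 1])
```

Setting `d` so that the band is empty, or using a single term in the sum, recovers the separable model in both cases.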

Installation

You can install the development version from GitHub with:

install.packages("devtools") # only if devtools is not yet installed
library(devtools)
install_github("TMasak/surfcov")

Examples

When the data are stored as an array X of size $N \times K_1 \times K_2$, one can quickly check whether a given model is potentially useful by running

spb(X) for the separable-plus-banded model; or
scd(X) for the $R$-separable model.

In particular,

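library(surfcov)
X <- array(runif(100*20*30), c(100,20,30))  # N = 100 surfaces on a 20 x 30 grid
spb_est <- spb(X)  # picks the bandwidth d by CV, then fits the model
spb_est$d          # selected bandwidth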

This runs cross-validation (CV) to pick a value of the bandwidth $d$ and then fits the model with the best $d$ found. If the returned value is larger than 0, a separable-plus-banded model fits the data better than a separable model. If we decide to use prediction-based CV instead of fit-based CV, it makes sense to check the gains in prediction:

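library(surfcov)
X <- array(runif(100*20*30), c(100,20,30))  # N = 100 surfaces on a 20 x 30 grid
spb_est <- spb(X, predict=TRUE)  # prediction-based CV
spb_est$d                        # selected bandwidth
spb_est$cv                       # CV results, used to check the gains in prediction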

The same can be done with the $R$-separable model by replacing the function spb with scd; see ?spb and ?scd for a more detailed description and examples. Note that by default the mean is always estimated empirically, unless another estimator of the mean is provided.
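
For reference, the empirical mean of an $N \times K_1 \times K_2$ array is the pointwise average over the first dimension. The base R sketch below computes it and centers the data; it illustrates the estimator only and makes no claim about how surfcov computes or stores the mean internally. The names `mu_hat` and `X_centered` are illustrative.

```r
set.seed(1)
X <- array(runif(100 * 20 * 30), c(100, 20, 30))

# Pointwise average over the N surfaces: a K1 x K2 matrix.
mu_hat <- apply(X, c(2, 3), mean)

# Subtract the mean surface from every observation.
X_centered <- sweep(X, c(2, 3), mu_hat)

dim(mu_hat)
```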

The validity of a separable-plus-banded model can also be checked using a bootstrap test, e.g. by

test_spb(X,d=1)

Once one of the models is fitted, the following functions are useful for subsequent manipulation of the estimated covariance:

adi() solves the inverse problem involving a separable-plus-banded covariance
pcg() solves the inverse problem involving an $R$-separable covariance
apply_lhs() quickly applies an $R$-separable covariance
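
The reason an $R$-separable covariance can be applied quickly is that its action on a $K_1 \times K_2$ matrix $X$ reduces to matrix products: $(CX)[i,j] = \sum_{r} (A_r X B_r^\top)[i,j]$, so the full tensor never has to be formed. The base R sketch below checks this identity against a brute-force contraction on a small grid. The function `apply_rsep` is a hypothetical illustration and is not claimed to match the interface or implementation of apply_lhs().

```r
set.seed(1)
K1 <- 5; K2 <- 4; R <- 2

# Random symmetric positive semi-definite factors A_r (K1 x K1) and B_r (K2 x K2).
A <- replicate(R, crossprod(matrix(rnorm(K1 * K1), K1)), simplify = FALSE)
B <- replicate(R, crossprod(matrix(rnorm(K2 * K2), K2)), simplify = FALSE)
X <- matrix(rnorm(K1 * K2), K1, K2)

# Fast application: sum over r of A_r X B_r^T, using only small matrix products.
apply_rsep <- function(A, B, X) {
  Reduce(`+`, Map(function(Ar, Br) Ar %*% X %*% t(Br), A, B))
}

# Brute force: build the full tensor C[i,j,i',j'] and contract over (i', j').
C <- Reduce(`+`, Map(function(Ar, Br) aperm(outer(Ar, Br), c(1, 3, 2, 4)), A, B))
CX_slow <- matrix(matrix(C, K1 * K2, K1 * K2) %*% as.vector(X), K1, K2)

all.equal(apply_rsep(A, B, X), CX_slow)
```

The fast version stores $R(K_1^2 + K_2^2)$ numbers instead of $K_1^2 K_2^2$, which is what makes iterative solvers for the inverse problem feasible on larger grids.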

For fast application of the separable-plus-banded model, see to_book_format and BXfast.

TODO:

a handle for a fast application of the separable-plus-banded model
a handle for the best linear unbiased predictor (BLUP) using the separable-plus-banded or $R$-separable model.

TMasak/surfcov documentation built on April 25, 2022, 12:15 a.m.
