whiten: Whiten Data Matrix

Description

whiten whitens a data matrix X, using the empirical covariance matrix cov(X) as the basis for computing the whitening transformation.

Usage

  whiten(X, center=FALSE, method=c("ZCA", "ZCA-cor", "PCA", "PCA-cor", "Cholesky"))

Arguments

X

Data matrix, with samples in rows and variables in columns.

center

Logical. If TRUE, center columns to mean zero (default: FALSE).

method

Determines the type of whitening transformation (see Details).

Details

The following whitening approaches can be selected:

method="ZCA" and method="ZCA-cov": ZCA whitening, also known as Mahalanobis whitening, ensures that the average covariance between whitened and original variables is maximal.

method="ZCA-cor": Likewise, ZCA-cor whitening leads to whitened variables that are maximally correlated (on average) with the original variables.

method="PCA" and method="PCA-cov": In contrast, PCA whitening leads to maximally compressed whitened variables, as measured by squared covariance.

method="PCA-cor": PCA-cor whitening is similar to PCA whitening but uses squared correlations.

method="Cholesky": computes a whitening matrix by applying Cholesky decomposition. This yields both a lower-triangular whitening matrix with positive diagonal and lower-triangular loadings with positive diagonal (cross-covariance and cross-correlation).

Note that Cholesky whitening depends on the ordering of input variables. In the convention used here the first input variable is linked with the first latent variable only, the second input variable is linked to the first and second latent variable only, and so on, and the last variable is linked to all latent variables.
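
The ordering dependence can be seen in a minimal base-R sketch. It assumes the whitening matrix is the inverse of the lower Cholesky factor L of the covariance matrix (S = L L^T), which reproduces the lower-triangular structure described above but is not necessarily how the package computes it. The names S, L, W, Phi and idx are illustrative only.

```r
# Sketch only: Cholesky whitening built from base R, not the package internals.
X = as.matrix(iris[, 1:4])
S = cov(X)

# chol() returns the upper factor R with t(R) %*% R = S, so L = t(R) is lower triangular
L = t(chol(S))
W = forwardsolve(L, diag(nrow(L)))   # assumed whitening matrix W = L^(-1), lower triangular
Z = X %*% t(W)                       # whitened data, Z = X W^T

# cross-covariance between original and whitened variables: S %*% t(W) equals L
Phi = S %*% t(W)

zapsmall(cov(Z))    # identity matrix up to rounding
zapsmall(Phi)       # lower triangular with positive diagonal

# reversing the column order leads to a different whitening matrix
idx = rev(seq_len(ncol(X)))
L.rev = t(chol(S[idx, idx]))
W.rev = forwardsolve(L.rev, diag(nrow(L.rev)))
Z.rev = X[, idx] %*% t(W.rev)
```

Row i of Phi has non-zero entries only in columns 1 to i, which is the "first input variable is linked with the first latent variable only" pattern; after reordering, a different variable takes that first position.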

ZCA-cor whitening is implicitly employed in computing CAT and CAR scores used for variable selection in classification and regression; see the functions catscore in the sda package and carscore in the care package.

In both PCA and PCA-cor whitening there is a sign ambiguity in the eigenvector matrices. To resolve it, we use eigenvector matrices with a positive diagonal, so that PCA and PCA-cor cross-correlations and cross-covariances have a positive diagonal for the given ordering of the original variables.
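
The sign convention can be illustrated with a short base-R sketch. It assumes PCA whitening of the form W = Lambda^(-1/2) U^T, with U the eigenvectors and Lambda the eigenvalues of the covariance matrix, following Kessy, Lewin, and Strimmer (2018). The column-flipping step is one straightforward way to obtain a positive diagonal and is not taken from the package source.

```r
# Sketch only: PCA whitening with eigenvector signs fixed to a positive diagonal.
X = as.matrix(iris[, 1:4])
S = cov(X)

e = eigen(S, symmetric=TRUE)
U = e$vectors
lambda = e$values

# each eigenvector is defined only up to sign; flip every column whose
# diagonal entry is negative (assumes no diagonal entry of U is exactly zero)
U = sweep(U, 2, sign(diag(U)), "*")

W = diag(1/sqrt(lambda)) %*% t(U)   # assumed form W = Lambda^(-1/2) U^T
Z = X %*% t(W)                      # whitened data, Z = X W^T

zapsmall(cov(Z))                    # identity matrix up to rounding
diag(S %*% t(W))                    # diagonal of the cross-covariance, all positive
```

Under this form the cross-covariance is U Lambda^(1/2), so a positive diagonal of U carries over directly to a positive diagonal of the cross-covariance.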

For details see Kessy, Lewin, and Strimmer (2018).

Canonical correlation analysis (CCA) can also be understood as a special form of whitening (also implemented in this package).

Value

whiten returns the whitened data matrix Z = X W^T.
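
As a rough illustration of the returned quantity, the following base-R sketch builds a ZCA whitening matrix as the inverse matrix square root of cov(X) and forms Z = X W^T by hand. The eigendecomposition route and the variable names are assumptions made for the example, not the package's code.

```r
# Sketch only: ZCA whitening by hand, W = S^(-1/2) via the eigendecomposition of S.
X = as.matrix(iris[, 1:4])
S = cov(X)

e = eigen(S, symmetric=TRUE)
W = e$vectors %*% diag(1/sqrt(e$values)) %*% t(e$vectors)

Z = tcrossprod(X, W)      # Z = X W^T, one whitened sample per row

zapsmall(cov(Z))          # identity matrix up to rounding
isSymmetric(W)            # the ZCA whitening matrix is symmetric
```

Z has the same dimensions as X, with samples in rows and whitened variables in columns.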

Author(s)

Korbinian Strimmer (https://strimmerlab.github.io) with Agnan Kessy and Alex Lewin.

References

Kessy, A., A. Lewin, and K. Strimmer. 2018. Optimal whitening and decorrelation. The American Statistician. 72: 309-314. <DOI:10.1080/00031305.2016.1277159>

See Also

whiteningMatrix, whiteningLoadings, scca.

Examples

# load whitening library
library("whitening")

######

# example data set
# E. Anderson. 1935.  The irises of the Gaspe Peninsula.
# Bull. Am. Iris Soc. 59: 2--5
data("iris")
X = as.matrix(iris[,1:4])
d = ncol(X) # 4
n = nrow(X) # 150
colnames(X) # "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"

# whitened data
Z.ZCAcor = whiten(X, method="ZCA-cor")

# check covariance matrix
zapsmall( cov(Z.ZCAcor) )

whitening documentation built on June 7, 2022, 5:10 p.m.