parallel_analysis: Perform parallel analysis

View source: R/parallel_analysis.R

parallel_analysis    R Documentation

Perform parallel analysis

Description

Parallel analysis based on permutations

Usage

parallel_analysis(X, perm = 999, fun = c("prcomp", "fastSVD", "shrink"))

Arguments

X

Matrix or data frame containing the original data (observations in rows, variables in columns).

perm

Number of permutations used to build the null distribution (default 999).

fun

function to use internally to obtain eigenvalues (see Details)

Details

The function performs parallel analysis, a way to test the number of significant eigenvalues/axes in a PCA. In this implementation, a null distribution of eigenvalues is obtained by randomly permuting observations independently for each of the original variables. To compute p values, each observed eigenvalue is compared to the corresponding eigenvalues from this null distribution.
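The permutation scheme described above can be sketched in base R. This is a minimal illustration, not the package's implementation: perm_parallel_sketch() is a hypothetical helper, eigenvalues are taken from prcomp, and the p value formula (the proportion of null eigenvalues at least as large as the observed one, counting the observed value itself) is an assumption.

```r
# Sketch of permutation-based parallel analysis using base R only.
# perm_parallel_sketch() is a hypothetical helper, not the package function.
perm_parallel_sketch <- function(X, perm = 199) {
  X <- as.matrix(X)
  # Observed eigenvalues: variances of the principal components
  obs <- prcomp(X)$sdev^2
  # Null eigenvalues: one row per permutation, one column per axis
  null <- matrix(NA_real_, nrow = perm, ncol = length(obs))
  for (i in seq_len(perm)) {
    # Shuffle observations independently within each variable (column);
    # this breaks covariation among variables but keeps each variance
    Xp <- apply(X, 2, sample)
    null[i, ] <- prcomp(Xp)$sdev^2
  }
  # Assumed p value: share of eigenvalues (null plus observed) that are
  # at least as large as the observed one, computed axis by axis
  p <- (colSums(sweep(null, 2, obs, FUN = ">=")) + 1) / (perm + 1)
  names(p) <- paste0("PC", seq_along(p))
  p
}

# Small simulated dataset: 60 observations, 5 unstructured variables
set.seed(1)
X_demo <- matrix(rnorm(60 * 5), nrow = 60, ncol = 5)
p_demo <- perm_parallel_sketch(X_demo, perm = 99)
p_demo
```

Because each column is permuted separately, the variance of every variable is preserved while any covariation among variables is destroyed, which is what makes the permuted eigenvalues a reasonable null reference.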

Parallel analysis may be used for dimensionality reduction by retaining only the first block of consecutive significant axes. For example, if the first 3 axes are significant and the fourth is not, one would keep only the first 3 axes, regardless of the significance of the fifth and later axes. Similarly, if the first axis is not significant, this may suggest a lack of clear structure in the data.
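The retention rule can be written as a short helper. This is a sketch under stated assumptions: count_leading_significant() is a hypothetical function (it is not part of the package), the 0.05 threshold is an assumed default, and the p values below are made up for illustration.

```r
# count_leading_significant() is a hypothetical helper for illustration.
# It returns the length of the first block of consecutive significant axes.
count_leading_significant <- function(p, alpha = 0.05) {
  sig <- p < alpha
  # Every axis is significant: keep them all
  if (all(sig)) return(length(p))
  # Otherwise stop just before the first non-significant axis
  which(!sig)[1] - 1
}

# Made-up p values: axes 1 to 3 significant, axis 4 not, axis 5 significant
p_example <- c(0.001, 0.002, 0.010, 0.300, 0.020, 0.700)
count_leading_significant(p_example)
# returns 3: axis 5 is ignored because it follows a non-significant axis
```

A return value of 0 corresponds to the case where the first axis is not significant.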

The function can obtain eigenvalues internally using one of three strategies, selected with the argument fun:

  • "prcomp" - the function prcomp (default)

  • "fastSVD" - an approach based on the function fast.svd (requires the package corpcor)

  • "shrink" - a decomposition of the covariance matrix estimated using linear shrinkage (much slower, requires the package nlshrink; Ledoit & Wolf 2004)

This choice should not make much difference to the final result. However, for consistency, it is a good idea to run parallel analysis with the same function used for the actual PCA (this is why these three options are provided).
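As a rough illustration of why the choice of decomposition should matter little, the sketch below compares eigenvalues from prcomp with those from a singular value decomposition of the column-centred data. It uses base svd() as a stand-in for corpcor::fast.svd, which is an assumption made here to avoid extra dependencies. It does not reproduce the package's internal code.

```r
# Simulated data: 40 observations, 6 variables
set.seed(42)
X_demo <- matrix(rnorm(40 * 6), nrow = 40, ncol = 6)

# Eigenvalues from prcomp: variances of the principal components
eig_prcomp <- prcomp(X_demo)$sdev^2

# Eigenvalues from an SVD of the column-centred data
# (base svd() is used here as a stand-in for corpcor::fast.svd)
Xc <- scale(X_demo, center = TRUE, scale = FALSE)
eig_svd <- svd(Xc)$d^2 / (nrow(Xc) - 1)

# The two sets of eigenvalues agree up to numerical tolerance
all.equal(eig_prcomp, eig_svd)
```

The shrinkage-based option estimates the covariance matrix differently, so its eigenvalues are not expected to match these exactly.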

Value

Vector of class "parallel_analysis" containing the p values for each of the axes of a PCA on the data provided.

Objects of class parallel_analysis have an associated summary() method. Calling summary() on an object created by this function gives a suggestion on the number of significant axes, if any (see Examples).

Citation

The most appropriate citation for this approach to parallel analysis (using permutations) is Buja & Eyuboglu (1992).

References

Ledoit O, Wolf M. 2004. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 88:365-411.

Buja A, Eyuboglu N. 1992. Remarks on parallel analysis. Multivariate Behavioral Research 27(4):509-540.

Examples

library(GeometricMorphometricsMix)

# Simulate a multivariate random normal dataset
# with 100 observations and 50 independent variables
set.seed(666)
X <- MASS::mvrnorm(100, mu = rep(0, 50), Sigma = diag(50))

# Perform parallel analysis
# (fun = "fastSVD" requires the package corpcor)
PA <- parallel_analysis(X, perm = 999, fun = "fastSVD")

# Look at a summary of the results from parallel analysis
# Notice that no axis is significant
# This is correct, as we simulated data with no structure
summary(PA)

# Look at the p values for each axis
print(PA)



fruciano/GeometricMorphometricsMix documentation built on Jan. 31, 2024, 6:24 a.m.