View source: R/parallel_analysis.R
parallel_analysis | R Documentation |
Parallel analysis based on permutations
parallel_analysis(X, perm = 999, fun = c("prcomp", "fastSVD", "shrink"))
X | Matrix or data frame containing the original data (observations in rows, variables in columns). |
perm | Number of permutations. |
fun | Function used internally to obtain eigenvalues (see Details). |
The function performs parallel analysis, a way to test how many eigenvalues/axes of a PCA are significant. In this implementation, a null distribution of eigenvalues is obtained by randomly permuting the observations independently within each of the original variables. To compute p values, each observed eigenvalue is compared to the corresponding eigenvalues from this null distribution.
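The permutation procedure described above can be sketched in base R as follows. This is only an illustrative sketch, not the package's code: the helper name pa_sketch is hypothetical, and the p value convention used here (counting the observed value as one of the permutations) is an assumption that may differ from what parallel_analysis does internally.

```r
# Hypothetical helper illustrating permutation-based parallel analysis.
# Not the implementation used by parallel_analysis().
pa_sketch <- function(X, perm = 99) {
  X <- as.matrix(X)
  # Observed eigenvalues (variances of the principal components)
  obs <- prcomp(X)$sdev^2
  # Null eigenvalues: permute each column independently, then redo the PCA.
  # The result is a matrix with one row per axis, one column per permutation.
  null <- replicate(perm, {
    Xp <- apply(X, 2, sample)
    prcomp(Xp)$sdev^2
  })
  # Assumed convention: proportion of null eigenvalues at least as large as
  # the observed one, with the observed value counted among the permutations
  (rowSums(null >= obs) + 1) / (perm + 1)
}

set.seed(1)
X <- matrix(rnorm(100 * 5), nrow = 100)  # 100 observations, 5 variables
p <- pa_sketch(X, perm = 99)
p  # one p value per axis
```

Permuting each column separately preserves the distribution of every variable while breaking the correlations among variables, which is what makes the permuted eigenvalues a suitable null reference.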
Parallel analysis may be used for dimensionality reduction by retaining only the first block of consecutive significant axes. For example, if the first 3 axes are significant and the fourth is not, one would keep only the first 3 axes (regardless of the significance of the axes from the fifth onwards). If the first axis itself is not significant, this may suggest a lack of clear structure in the data.
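The rule of keeping only the first block of consecutive significant axes can be expressed in a few lines. The function name n_axes_to_keep and the threshold alpha = 0.05 are assumptions made for illustration; they are not part of the package.

```r
# Hypothetical helper: number of leading axes that are all significant.
# 'pvals' is a vector of p values ordered by axis; 'alpha' is an assumed
# significance threshold.
n_axes_to_keep <- function(pvals, alpha = 0.05) {
  not_sig <- which(pvals >= alpha)
  # If every axis is significant keep them all; otherwise stop just before
  # the first non-significant axis
  if (length(not_sig) == 0) length(pvals) else not_sig[1] - 1
}

n_axes_to_keep(c(0.001, 0.002, 0.01, 0.30, 0.02))  # 3: fifth axis is ignored
n_axes_to_keep(c(0.40, 0.01))                      # 0: first axis not significant
```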
The function can obtain eigenvalues internally using one of three strategies (argument fun):
"prcomp" - the function prcomp (default)
"fastSVD" - an approach based on the function fast.svd (requires the package corpcor)
"shrink" - a decomposition of the covariance matrix estimated using linear shrinkage (much slower, requires the package nlshrink; Ledoit & Wolf 2004)
This choice should not make much difference to the final result. However, for consistency, it is a good idea to run parallel analysis with the same function used for the actual PCA (this is why these three options are provided).
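As a rough illustration of why the choice matters little, the eigenvalues from prcomp can be compared with those from a singular value decomposition of the centered data. The sketch below uses the base R function svd as a stand-in for fast.svd (an assumption made so that no extra package is needed); it does not cover the "shrink" option, whose eigenvalues come from a different covariance estimator.

```r
set.seed(1)
X <- matrix(rnorm(30 * 4), nrow = 30)  # 30 observations, 4 variables

# Eigenvalues from prcomp: squared standard deviations of the components
ev_prcomp <- prcomp(X)$sdev^2

# Eigenvalues from an SVD of the column-centered data:
# squared singular values divided by (n - 1)
Xc <- scale(X, center = TRUE, scale = FALSE)
ev_svd <- svd(Xc)$d^2 / (nrow(X) - 1)

# The two sets of eigenvalues agree up to numerical precision
all.equal(ev_prcomp, ev_svd)
```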
Vector of class "parallel_analysis" containing the p values for each of the axes of a PCA on the data provided
The object of class parallel_analysis returned by the function has an associated summary() method. Calling summary() on an object created by this function gives a suggestion on the number of significant axes, if any (see examples).
The most appropriate citation for this approach to parallel analysis (using permutations) is Buja & Eyuboglu (1992).
Ledoit O, Wolf M. 2004. A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis 88:365-411.
Buja A, Eyuboglu N. 1992. Remarks on parallel analysis. Multivariate Behavioral Research 27(4):509-540.
set.seed(666)
X=MASS::mvrnorm(100, mu=rep(0, 50), Sigma=diag(50))
# Simulate a multivariate random normal dataset
# with 100 observations and 50 independent variables
PA=parallel_analysis(X, perm = 999, fun = "fastSVD")
# Perform parallel analysis
summary(PA)
# Look at a summary of the results from parallel analysis
# Notice that no axis is significant
# This is correct, as we had simulated data with no structure
print(PA)
# Look at the p values for each axis