permut_pc_test: permutation test of PCA

View source: R/user_functions.R

permut_pc_test    R Documentation

permutation test of PCA

Description

Compute a nonparametric permutation test for a PCA solution. Two options are possible: hypothesis testing of the total variance accounted for (VAF) for each component, and hypothesis testing of the standardized loadings and communalities.

Usage

permut_pc_test(
  pca,
  pca_data,
  P = 1000,
  ndim = 3,
  statistic = "VAF",
  conf = 0.95,
  adj.method = "BH",
  perm.method = "permV",
  inParallel = F,
  n_cores = 2,
  inParallel_extra = NULL
)

Arguments

pca

Object of class prcomp or princals.

pca_data

Data passed to the prcomp or princals function.

P

Numeric. Number of permutations to run by calling the permuted_pca function. Default=1000

ndim

Numeric. Number of PCs (1 to ndim) to run the analysis on. Default=3

statistic

Character. Determines the statistic to compute. Possible values are the variance accounted for ("VAF") or the standardized loadings ("s.loadings"). Default="VAF"

conf

Numeric. Confidence level for the confidence interval. E.g., 0.95 generates a 95% CI. Default=0.95

adj.method

Character passed to stats::p.adjust() to adjust the p values for multiple comparisons. See ?p.adjust.methods. Default="BH"

perm.method

Character determining the permutation method to use, as in Linting et al., 2011. With "permD" (Buja & Eyuboglu, 1992; Linting et al., 2011), variables are permuted independently and concomitantly (Fig. 2A), as opposed to the "permV" strategy (Linting et al., 2011), where variables are permuted one at a time. If statistic is set to "VAF", perm.method "permD" will be used. Default="permV".

inParallel

Logical. Whether to run the function in parallel using pbapply::pblapply. On Windows machines, parallelization is done through parallel::parLapply. On Unix machines, parallelization is done through parallel::mclapply. See ?pblapply for details. Default = F.

n_cores

Numeric. Number of cores to use. The number of available cores can be obtained with parallel::detectCores(). It is recommended to use fewer cores than are available. Default = 2.

inParallel_extra

Character or character vector with the names of additional objects to pass to the clusters when performing parallelization. This might be needed when the function used to run the original PCA was called with external objects. E.g., princals(..., ndim = ncol(dataFrame)), where dataFrame is a data frame object. In that case, inParallel_extra = "dataFrame".
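As a small illustration of the adj.method argument, the sketch below calls stats::p.adjust() directly on a vector of raw p values. The p values are made up for illustration and are not output from permut_pc_test.

```r
# Hypothetical raw p values, for illustration only
p_raw <- c(0.001, 0.012, 0.030, 0.045, 0.200)

# Benjamini-Hochberg adjustment (the default adj.method = "BH")
p_bh <- p.adjust(p_raw, method = "BH")

# A more conservative alternative listed in p.adjust.methods
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Compare raw and adjusted values side by side
print(data.frame(p_raw, p_bh, p_bonf))
```

Any value listed in p.adjust.methods can be supplied as adj.method.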

Details

Nonparametric permutation tests of the VAF of each component, the loadings, and the communalities have been studied previously (see References). The hypothesis test is defined as:

H(null): the PC metric (either VAF or loading) is indistinguishable from one generated at random
H(alternative): the PC metric (either VAF or loading) is different from random

The null distribution is generated by permuting the values of each variable P times and re-running the PCA on each permuted sample. Confidence intervals of the permuted (null) distribution are calculated using the percentile method. The p values are calculated as p = (q + 1) / (P + 1), where q is the number of times the chosen metric is higher in the permuted distribution than in the original PCA solution and P is the number of permutations.

Note that the lowest p value that can be calculated depends on P. For example, if P is set to 10 (a relatively low value), the smallest attainable p value is 1/11, approximately 0.09 (with q = 0). Accordingly, P should be set high enough to reach the desired floor p value. By default, the number of permutations is set to 1000 (smallest p value approximately equal to 0.001), as this has been shown to be high enough to approximate the null distribution in most cases.
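The procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the syndRomics implementation: the helper vaf() and all variable names are hypothetical, every column is permuted independently, and P is kept small for speed.

```r
# Minimal base-R sketch of a permutation test on the VAF of each PC.
# Not the package's code; vaf() and the variable names are illustrative.
set.seed(1)
P    <- 200   # number of permutations (small, for speed)
ndim <- 3     # number of components to test
conf <- 0.95  # confidence level for the percentile interval

# Proportion of variance accounted for by the first ndim components
vaf <- function(x, ndim) {
  sdev <- prcomp(x, center = TRUE, scale. = TRUE)$sdev
  (sdev^2 / sum(sdev^2))[seq_len(ndim)]
}

observed <- vaf(mtcars, ndim)

# Null distribution: permute each column independently, re-run the PCA
null_vaf <- t(replicate(P, {
  permuted <- as.data.frame(lapply(mtcars, sample))
  vaf(permuted, ndim)
}))

# q = number of permuted values higher than the observed one, per component
q <- colSums(sweep(null_vaf, 2, observed, FUN = ">"))
p_values <- (q + 1) / (P + 1)

# Percentile confidence interval of the null distribution
alpha   <- 1 - conf
null_ci <- apply(null_vaf, 2, quantile, probs = c(alpha / 2, 1 - alpha / 2))

# Smallest p value attainable with this P (reached when q = 0)
p_floor <- 1 / (P + 1)
```

Because p_floor equals 1 / (P + 1), raising P is the only way to resolve smaller p values.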

The permutation test of the loadings described in Buja & Eyuboglu (1992) and Peres-Neto et al. (2003), where all variables are permuted simultaneously and concomitantly, can serve to determine the loading threshold. Linting et al. (2011) designed and tested a strategy where only one variable is permuted at a time, showing strong results in determining the contribution of variables using communalities. This method determines the significant contribution of variables to the PCA solution with higher statistical power and proper type I error, and has therefore been incorporated in the package as the suggested method for loadings and communalities. Following the terminology of Linting et al., the user can specify the permutation strategy for the loadings as one variable at a time (permV, as in Linting et al., 2011) or all variables together (permD, as in Buja & Eyuboglu, 1992; Peres-Neto et al., 2003).
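The difference between the two strategies can be illustrated with two small base-R helpers. This is a sketch only: permute_D() and permute_V() are hypothetical names, not functions exported by the package, and the sketch covers only the data-shuffling step, not the subsequent PCA.

```r
# Illustrative helpers (hypothetical names, not part of syndRomics)

# permD: every variable is permuted independently within the same data set
permute_D <- function(x) {
  as.data.frame(lapply(x, sample))
}

# permV: only variable j is permuted; the other variables keep their order
permute_V <- function(x, j) {
  x[[j]] <- sample(x[[j]])
  x
}

set.seed(1)
d_perm <- permute_D(mtcars)
v_perm <- permute_V(mtcars, "mpg")  # "mpg" is the first column of mtcars

# Under permV, the non-permuted columns are left untouched
unchanged <- identical(v_perm[, -1], mtcars[, -1])

# Both strategies keep the marginal distribution of each variable
same_values <- identical(sort(d_perm$mpg), sort(mtcars$mpg))
```

Under this reading, permV needs a separate set of permuted data sets for each variable, so it runs more PCAs than permD for the same P.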

Value

List of class "syndromics" containing the following objects

methods

Character indicating the method used to obtain the object: "permutation"

statistic

Statistic used in the permutation analysis

perm.methods

Character indicating the permutation method used

ndim

Value specified in the ndim argument

conf

Confidence level used to compute CIs

per_sample

List of P loading matrices for the P permuted samples

adj.method

Character indicating the method used for adjusting p values

pca

Contains the object passed to the pca argument

pca_data

Contains the object passed to the pca_data argument

results

Object containing the results of the analysis

Author(s)

Abel Torres Espin

References

  1. Buja A, Eyuboglu N. Remarks on Parallel Analysis. Multivar Behav Res. 1992 Oct 1;27(4):509–40

  2. Linting M, van Os BJ, Meulman JJ. Statistical Significance of the Contribution of Variables to the PCA solution: An Alternative Permutation Strategy. Psychometrika. 2011 Jul 1;76(3):440–60

Examples

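library(syndRomics)

# PCA on the mtcars data set, centered and scaled
data(mtcars)
pca_mtcars <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Permutation test of the VAF (default statistic) for the first 3 PCs
pca_mtcars_perm <- permut_pc_test(pca = pca_mtcars, pca_data = mtcars, ndim = 3, P = 500)
plot(pca_mtcars_perm, plot_resample = TRUE)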


ucsf-ferguson-lab/syndRomics documentation built on June 26, 2022, 5:36 p.m.