permut_pc_test: permutation test of PCA
In ucsf-ferguson-lab/syndRomics: Component loading interpretation and visualization with syndromic plots

Description

Compute a nonparametric permutation test for a PCA solution. Two options are possible: hypothesis testing of the total variance accounted for (VAF) for each component, and hypothesis testing of the standardized loadings and communalities.

Usage

permut_pc_test(
  pca,
  pca_data,
  P = 1000,
  ndim = 3,
  statistic = "VAF",
  conf = 0.95,
  adj.method = "BH",
  perm.method = "permV",
  inParallel = F,
  n_cores = 2,
  inParallel_extra = NULL
)

Arguments

`pca`	Object of class prcomp, princals.
`pca_data`	Data passed to the prcomp or princals function.
`P`	Numeric. Number of permutations to run calling the permuted_pca function. Default=1000
`ndim`	Numeric. Number of PCs (1 to ndim) to run the analysis on. Default=3
`statistic`	Character. Determines the statistic to compute. Possible values are Variance accounted for ("VAF") or the standardized loadings ("s.loadings"). Default="VAF"
`conf`	Numeric. Confidence level for the confidence interval. E.g., 0.95 generates a 95% CI. Default=0.95
`adj.method`	Character passed to stats::p.adjust() to adjust the p values for multiple comparisons. See ?p.adjust.methods. Default="BH"
`perm.method`	Character. Permutation method to use, following Linting et al., 2011. With "permD" (Buja & Eyuboglu, 1992; Linting et al., 2011), variables are permuted independently and concomitantly (Fig. 2A), as opposed to the "permV" strategy (Linting et al., 2011), where variables are permuted one at a time. If statistic is set to "VAF", perm.method "permD" will be used. Default="permV".
`inParallel`	Logical. Whether to run the function in parallel using pbapply::pblapply. On Windows machines, parallelization is done through parallel::parLapply. On Unix machines, parallelization is done through parallel::mclapply. See ?pblapply for details. Default = F.
`n_cores`	Numeric. Number of cores to use. The number of available cores can be obtained with parallel::detectCores(). It is recommended to use fewer cores than are available. Default = 2.
`inParallel_extra`	Character or character vector with the names of additional objects to pass to the clusters when running in parallel. This might be needed when the function that ran the original PCA was called with external objects. E.g., princals(..., ndim = ncol(dataFrame)), where dataFrame is a data frame object. In that case, inParallel_extra = "dataFrame".
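
As a small illustration of what adj.method controls, the sketch below applies stats::p.adjust() with the "BH" method to a vector of raw p values. The p values are made up for the example and do not come from any real analysis; how the function applies the adjustment internally is not shown here.

```r
# Hypothetical raw p values, e.g. one per tested loading (illustrative only)
p_raw <- c(0.001, 0.012, 0.030, 0.200)

# Benjamini-Hochberg adjustment, the default adj.method ("BH")
p_bh <- p.adjust(p_raw, method = "BH")

# Any value listed in p.adjust.methods is a valid choice for adj.method
valid_methods <- p.adjust.methods

print(p_bh)
```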

Details

Nonparametric permutation for hypothesis testing of the VAF of components, the loadings, or the communalities has been studied previously (see References). The hypothesis test is defined as:

H(null): the PC metric (either VAF or loading) is indistinguishable from one generated at random
H(alternative): the PC metric (either VAF or loading) is different from random

The null distribution is generated by permuting the values of each variable P times and re-running the PCA on each permuted sample. Confidence intervals of the permuted (null) distribution are calculated using the percentile method. The p values are calculated as p = (q + 1) / (P + 1), where q is the number of times the chosen metric is higher in the permuted distribution than in the original PCA solution and P is the number of permutations.

Note that the lowest p value that can be calculated depends on P. For example, if P is set to 10 (a relatively low value), the smallest attainable p value is 1/11, approximately 0.09 (with q = 0). Accordingly, P should be set high enough to reach the desired floor p value. By default, the number of permutations is set to 1000 (smallest p value approximately 0.001), as this has been shown to be high enough for approximating the null distribution in most cases.
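
The procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper vaf() and the object names are hypothetical, all columns are shuffled independently for each permuted sample, and P is kept small only to make the example fast.

```r
# Sketch of a permutation test for the VAF of the first PCs (base R only).
set.seed(1)     # reproducible permutations
P    <- 200     # number of permutations (small, for speed)
ndim <- 3       # number of components to test
conf <- 0.95    # confidence level for the percentile interval

# Hypothetical helper: VAF per component = eigenvalue / sum of eigenvalues
vaf <- function(x) {
  ev <- prcomp(x, center = TRUE, scale. = TRUE)$sdev^2
  ev / sum(ev)
}

observed <- vaf(mtcars)[1:ndim]

# Null distribution: shuffle every column independently, then refit the PCA.
# replicate() returns an ndim x P matrix, so transpose to P x ndim.
null_vaf <- t(replicate(P, {
  permuted <- as.data.frame(lapply(mtcars, sample))
  vaf(permuted)[1:ndim]
}))

# Percentile confidence interval of the null distribution
alpha <- 1 - conf
ci <- apply(null_vaf, 2, quantile, probs = c(alpha / 2, 1 - alpha / 2))

# q: number of permuted values higher than the observed one, per component
q <- colSums(sweep(null_vaf, 2, observed, FUN = ">"))
p_value <- (q + 1) / (P + 1)

# Smallest attainable p value for this P (the case q = 0)
p_floor <- 1 / (P + 1)
```

With P = 10 the same floor would be 1 / (10 + 1), roughly 0.09, which is why a larger P is needed to resolve small p values.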

The permutation test of the loadings as in Buja & Eyuboglu (1992) and Peres-Neto et al. (2003), where all variables are permuted simultaneously, can serve to determine the loading threshold. Linting et al. designed and tested a strategy in which only one variable is permuted at a time, with good results in determining the contribution of variables using communalities (Linting et al., 2011). This method determines the significant contribution of variables to the PCA solution with higher statistical power and proper type I error, and has therefore been incorporated in the package as the suggested method for loadings and communalities. Following the terminology of Linting et al., the user can specify the permutation strategy for the loadings as one variable at a time (permV, as in Linting et al., 2011) or as all variables together (permD, as in Buja & Eyuboglu, 1992; Peres-Neto et al., 2003).
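
The difference between the two strategies can be illustrated on the data alone. The sketch below is an assumption-laden toy, not the package's code: perm_d() and perm_v() are hypothetical helpers that only show which columns get shuffled in each permuted sample.

```r
set.seed(42)
x <- mtcars[, c("mpg", "disp", "hp", "wt")]

# permD-style: every variable is shuffled independently in the same sample
perm_d <- function(data) {
  as.data.frame(lapply(data, sample))
}

# permV-style: only variable j is shuffled; the others keep their row order
perm_v <- function(data, j) {
  data[[j]] <- sample(data[[j]])
  data
}

x_d <- perm_d(x)          # all four columns permuted
x_v <- perm_v(x, "mpg")   # only mpg permuted

# Under permV the relations among the untouched variables are preserved,
# so the null distribution is specific to the permuted variable.
same_cor <- isTRUE(all.equal(cor(x$disp, x$hp), cor(x_v$disp, x_v$hp)))
```

In a permV-style test this single-variable shuffle would be repeated for each variable in turn, refitting the PCA each time and keeping the loadings of the shuffled variable.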

Value

List of class "syndromics" containing the following objects:

methods: Character indicating the method used to obtain the object: "permutation"
statistic: Statistic used in the permutation analysis
perm.methods: Character indicating the permutation method used
ndim: Value specified in the ndim argument
conf: Confidence level used to compute CIs
per_sample: List of P loading matrices for the P permuted samples
adj.method: Character indicating the method used for adjusting p values
pca: Contains the object passed to the pca argument
pca_data: Contains the object passed to the pca_data argument
results: Object containing the results of the analysis

Author(s)

Abel Torres Espin

References

Buja A, Eyuboglu N. Remarks on Parallel Analysis. Multivar Behav Res. 1992 Oct 1;27(4):509–40
Linting M, van Os BJ, Meulman JJ. Statistical Significance of the Contribution of Variables to the PCA solution: An Alternative Permutation Strategy. Psychometrika. 2011 Jul 1;76(3):440–60

Examples

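library(syndRomics)

# PCA on the built-in mtcars data, centered and scaled
data(mtcars)
pca_mtcars <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Permutation test of the VAF (default statistic) for the first 3 PCs
pca_mtcars_perm <- permut_pc_test(pca = pca_mtcars, pca_data = mtcars, ndim = 3, P = 500)
plot(pca_mtcars_perm, plot_resample = TRUE)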

ucsf-ferguson-lab/syndRomics documentation built on June 12, 2025, 3:04 p.m.