Test for association between the observed data and their systematic patterns of variation. The jackstraw enables statistical testing for association between observed variables and latent variables, as captured by principal component analysis (PCA), factor analysis (FA), or other estimates. Similarly, unsupervised clustering methods, such as K-means clustering and partitioning around medoids (PAM), find subpopulations among the observed variables. The jackstraw estimates the statistical significance of cluster membership, including unsupervised evaluation of cell identities in single-cell RNA-seq. P-values and posterior probabilities allow one to rigorously evaluate the strength of cluster membership assignments. See the GitHub repository for the latest developments and further help.
The jackstraw package provides a resampling strategy and testing scheme to estimate the statistical significance of association between the observed data and their latent variables. Depending on the data type and the analysis aim, the latent variables may be estimated by principal component analysis, K-means clustering, and related algorithms. The jackstraw methods learn the over-fitting characteristics inherent in this circular analysis, where the observed data are first used to estimate the latent variables and are then tested against those same estimates.
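The resampling idea can be illustrated with a small NumPy sketch. This is not the package's R code. It follows the general scheme described in Chung and Storey (2015): permute a few rows to create synthetic null features, re-estimate the principal components from the modified matrix, and collect the null statistics so that they carry the same over-fitting as the observed ones. The function names, the defaults for s and B, and the add-one empirical p-value are assumptions made for this illustration.

```python
import numpy as np

def top_pcs(Y, r):
    """Top r principal components (n x r) of an m x n matrix with features in rows."""
    Yc = Y - Y.mean(axis=1, keepdims=True)            # center each feature
    _, _, vt = np.linalg.svd(Yc, full_matrices=False)
    return vt[:r].T

def f_stats(Y, pcs):
    """Per-row F-statistic: intercept + PCs versus intercept only."""
    n, r = pcs.shape
    X = np.column_stack([np.ones(n), pcs])
    beta, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
    rss1 = ((Y.T - X @ beta) ** 2).sum(axis=0)        # full model
    rss0 = ((Y - Y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)  # null model
    return ((rss0 - rss1) / r) / (rss1 / (n - r - 1))

def jackstraw_pca_sketch(Y, r=1, s=None, B=100, seed=0):
    """Illustrative jackstraw p-values for association with the top r PCs."""
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    m, _ = Y.shape
    s = s or max(1, m // 10)                          # assumed default: 10% of rows
    observed = f_stats(Y, top_pcs(Y, r))
    null = np.empty((B, s))
    for b in range(B):
        idx = rng.choice(m, size=s, replace=False)
        Yb = Y.copy()
        for i in idx:                                 # permute each chosen row on its own
            Yb[i] = rng.permutation(Y[i])
        # PCs are re-estimated with the synthetic nulls included in the data
        null[b] = f_stats(Yb[idx], top_pcs(Yb, r))
    null = null.ravel()
    exceed = (null[None, :] >= observed[:, None]).sum(axis=1)
    return (1 + exceed) / (1 + null.size)             # add-one empirical p-value

# Toy data: the first 20 of 200 features follow one latent variable.
rng = np.random.default_rng(1)
latent = rng.normal(size=30)
data = rng.normal(size=(200, 30))
data[:20] += 3.0 * latent
pvalues = jackstraw_pca_sketch(data, r=1, s=20, B=50)
```

Keeping s small relative to m leaves the latent structure largely intact in each resampled matrix, which is why the null statistics remain comparable to the observed ones.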
The jackstraw tests enable us to identify the data features (i.e., variables or observations) that are driving systematic variation, in a completely unsupervised manner. Using jackstraw_pca, we can find statistically significant features with regard to the top r principal components. Alternatively, jackstraw_kmeans can identify the data features that are statistically significant members of the data-dependent clusters. Furthermore, this package includes more general algorithms such as jackstraw_subspace for the dimension reduction techniques and jackstraw_cluster for the clustering algorithms.
Overall, it computes m p-values of association between the m data features and their corresponding latent variables. From the m p-values, pip computes posterior inclusion probabilities, which are useful for feature selection and visualization.
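One common way to turn p-values into posterior probabilities is through the local false discovery rate: estimate the proportion of null features, estimate the density of the p-values, and take one minus their ratio. The NumPy sketch below illustrates that idea only. The function name pip_sketch, the threshold lam, and the histogram density estimate are assumptions made for this example, and the package's pip function may differ in its details.

```python
import numpy as np

def pip_sketch(pvals, lam=0.5, bins=20):
    """Illustrative posterior inclusion probabilities, computed as 1 - local FDR."""
    p = np.asarray(pvals, dtype=float)
    # Storey-type estimate of the null proportion from p-values above lam
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))
    # Crude histogram estimate of the p-value density on [0, 1]
    density, edges = np.histogram(p, bins=bins, range=(0.0, 1.0), density=True)
    which = np.digitize(p, edges[1:-1])               # bin index of each p-value
    lfdr = np.clip(pi0 / np.maximum(density[which], 1e-12), 0.0, 1.0)
    return 1.0 - lfdr

# Toy input: 100 very small p-values mixed with 900 evenly spread ones.
example_p = np.concatenate([np.full(100, 0.001), np.linspace(0.0, 1.0, 900)])
example_pip = pip_sketch(example_p)
```

A histogram is the simplest density estimate that keeps the sketch short. It does not force the probabilities to decrease as p-values grow, so a smoother or monotone estimator would be preferable in real use.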
Neo Christopher Chung firstname.lastname@example.org
Chung and Storey (2015) Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics, 31(4): 545-554 https://academic.oup.com/bioinformatics/article/31/4/545/2748186
Chung (2020) Statistical significance of cluster membership for unsupervised evaluation of cell identities. Bioinformatics, 36(10): 3107-3114 https://academic.oup.com/bioinformatics/article/36/10/3107/5788523
jackstraw_pca jackstraw_subspace jackstraw_kmeans jackstraw_cluster