This function computes reference PAC scores from simulated or permuted data based on an input matrix.
dat | Probe by sample omic data matrix. Data should be filtered and normalized prior to analysis.
max_k | Integer specifying the maximum cluster number to evaluate. Default is
ref_method | How should null data be generated? Options include
B | Number of reference datasets to generate.
reps | Number of subsamples to draw for consensus clustering.
distance | Distance metric for clustering. Supports all methods available in
cluster_alg | Clustering algorithm to implement. Currently supports hierarchical (
hclust_method | Method to use if
p_item | Proportion of items to include in each subsample.
p_feature | Proportion of features to include in each subsample.
wts_item | Optional vector of item weights.
wts_feature | Optional vector of feature weights.
pac_window | Lower and upper bounds for the consensus index sub-interval over which to calculate the PAC. Must be on (0, 1).
logit | Logit transform PAC output? Allows for faster convergence of the null distribution toward normality, which aids in downstream statistical testing.
seed | Optional seed for reproducibility.
parallel | If a parallel backend is loaded and available, should the function use it? Highly advisable if hardware permits.
Suitable reference PAC scores are essential to test the magnitude and
significance of cluster stability. This function generates B simulated
or permuted datasets with similar properties to dat, but with random
sample cluster structure. The expected value of k for these datasets
is therefore 1, and PAC scores for each k form a null distribution
that tends toward normality as B increases.
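For orientation, the PAC of a single consensus matrix can be sketched as the share of sample pairs whose consensus index falls inside pac_window. The helper below is a minimal illustration, not the package's internal code: the name pac_score, the default window of c(0.1, 0.9), and the toy matrix are all assumptions made for this example.

```r
# Hypothetical helper (not part of the package): PAC for one consensus matrix.
# `cons` is a symmetric sample-by-sample matrix of consensus indices in [0, 1].
# The default window c(0.1, 0.9) is an assumption for illustration only.
pac_score <- function(cons, pac_window = c(0.1, 0.9), logit = FALSE) {
  idx <- cons[lower.tri(cons)]        # one entry per unique sample pair
  cdf <- ecdf(idx)                    # empirical CDF of the consensus indices
  pac <- cdf(pac_window[2]) - cdf(pac_window[1])
  if (logit) {
    pac <- qlogis(pac)                # log(pac / (1 - pac)); infinite at 0 or 1
  }
  pac
}

# Toy consensus matrix for four samples: one pair always co-clusters,
# one pair co-clusters half the time, the remaining pairs never do.
cons <- matrix(0, 4, 4)
cons[1, 2] <- cons[2, 1] <- 1
cons[3, 4] <- cons[4, 3] <- 0.5
diag(cons) <- 1

pac_score(cons)                # 1/6: only the 0.5 pair lies inside the window
pac_score(cons, logit = TRUE)  # same score on the logit scale
```

Repeating such a calculation for each k on each of the B null datasets would give the null distribution described above.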
ref_pacs currently supports five methods for generating null datasets
from a given input matrix:
"pc_norm" simulates the principal components by taking random
draws from a normal distribution with variance equal to the true
eigenvalues. Data are subsequently back-transformed to their original
dimensions by cross-multiplication with the true eigenvector matrix.
"pc_unif" simulates the principal components by taking random
draws from a uniform distribution with ranges equal to those of the true
principal components. Data are subsequently back-transformed to their
original dimensions by cross-multiplication with the true eigenvector
matrix.
"cholesky" simulates random Gaussian noise around the nearest
positive-definite approximation to dat's feature-wise covariance
matrix.
"range" selects random values uniformly from each feature's
observed range.
"permute" shuffles each feature's observed values.
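The four simpler generators can be sketched in base R as follows. This is an illustrative reading of the descriptions above, not the package's implementation: the function names null_pc, null_range, and null_permute are hypothetical, and restoring the feature means after back-transformation is an assumption.

```r
# Hypothetical sketches of "pc_norm", "pc_unif", "range" and "permute".
# `dat` is a features-by-samples numeric matrix with at least two samples.

null_pc <- function(dat, type = c("norm", "unif")) {
  type <- match.arg(type)
  pca <- prcomp(t(dat))                  # rows of t(dat) are samples
  n <- nrow(pca$x)
  # Simulate each principal component independently
  sim <- vapply(seq_len(ncol(pca$x)), function(j) {
    if (type == "norm") {
      rnorm(n, mean = 0, sd = pca$sdev[j])   # variance = j-th eigenvalue
    } else {
      runif(n, min = min(pca$x[, j]), max = max(pca$x[, j]))
    }
  }, numeric(n))
  out <- t(sim %*% t(pca$rotation))      # back-transform to features x samples
  out <- out + pca$center                # assumption: restore feature means
  dimnames(out) <- dimnames(dat)
  out
}

null_range <- function(dat) {
  # Uniform draws within each feature's observed range
  out <- t(apply(dat, 1, function(x) runif(length(x), min(x), max(x))))
  dimnames(out) <- dimnames(dat)
  out
}

null_permute <- function(dat) {
  # Shuffle each feature's values independently across samples
  out <- t(apply(dat, 1, sample))
  dimnames(out) <- dimnames(dat)
  out
}

set.seed(1)
dat <- matrix(rnorm(50 * 10), nrow = 50, ncol = 10)  # 50 features, 10 samples
null_norm <- null_pc(dat, "norm")
null_unif <- null_pc(dat, "unif")
null_rng  <- null_range(dat)
null_perm <- null_permute(dat)
```

Each generator returns a matrix with the same dimensions as dat, so any of them could be passed to the same consensus clustering routine as the real data.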
The first two options use the data's true eigenvectors to preserve
feature-wise covariance while scrambling sample-wise covariance.
"pc_norm" tends to generate the most realistic null data, while Monte Carlo
replicates generated via "pc_unif" converge more quickly to a true
k of 1. Both methods are fast and stable when features outnumber
samples. When samples outnumber features, ref_method defaults to
"cholesky", which takes longer to compute, but is better suited for
such cases. "range" and "permute" are included for convenience,
but are not recommended since they do not preserve feature-wise covariance,
which may bias results.
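For the case where samples outnumber features, a "cholesky"-style generator might look like the sketch below. It is an assumption-laden illustration rather than the package's code: the name null_cholesky is hypothetical, nearPD() from the Matrix package stands in for whatever positive-definite approximation the package actually uses, and centering the noise on the observed feature means is an assumption.

```r
library(Matrix)  # provides nearPD()

# Hypothetical sketch of a "cholesky"-style null generator.
# `dat` is a features-by-samples numeric matrix.
null_cholesky <- function(dat) {
  p <- nrow(dat)
  n <- ncol(dat)
  sigma <- cov(t(dat))                       # p x p feature-wise covariance
  sigma_pd <- as.matrix(nearPD(sigma)$mat)   # nearest positive-definite approximation
  r <- chol(sigma_pd)                        # upper triangular: t(r) %*% r == sigma_pd
  z <- matrix(rnorm(p * n), nrow = p, ncol = n)
  out <- t(r) %*% z                          # columns have covariance sigma_pd
  out <- out + rowMeans(dat)                 # assumption: center on feature means
  dimnames(out) <- dimnames(dat)
  out
}

set.seed(1)
wide <- matrix(rnorm(5 * 40), nrow = 5, ncol = 40)  # 5 features, 40 samples
null_chol <- null_cholesky(wide)
dim(null_chol)
```

The covariance estimate and its factorization scale with the number of features, which is consistent with this method taking longer to compute than the principal component approaches.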
Just as reference PAC distributions are the theoretical core of the CCtestr
approach to cluster validation, ref_pacs is the computational core
of the CCtestr package. This function can take some time to execute,
and should ideally be run in parallel, especially with large datasets.
A matrix with B rows and max_k - 1 columns containing
null PAC scores for each cluster number k.