| fixedPCA | R Documentation |
Perform a PCA where the desired number of components is known ahead of time.
fixedPCA(
x,
rank = 50,
value = c("pca", "lowrank"),
subset.row,
preserve.shape = TRUE,
assay.type = "logcounts",
name = NULL,
BSPARAM = bsparam(),
BPPARAM = SerialParam()
)
x |
A SingleCellExperiment object containing a log-expression matrix. |
rank |
Integer scalar specifying the number of components. |
value |
String specifying the type of value to return.
|
subset.row |
A logical, character or integer vector specifying the rows of |
preserve.shape |
Logical scalar indicating whether or not the output SingleCellExperiment should be subsetted to |
assay.type |
A string specifying which assay values to use. |
name |
String containing the name with which to store the results.
Defaults to |
BSPARAM |
A BiocSingularParam object specifying the algorithm to use for PCA. |
BPPARAM |
A BiocParallelParam object to use for parallel processing. |
In theory, there is an optimal number of components for any given application,
but in practice, the criterion for the optimum is difficult to define.
As a result, it is often satisfactory to take an a priori-defined “reasonable” number of PCs for downstream analyses.
A good rule of thumb is to set this to the upper bound on the expected number of subpopulations in the dataset
(see the reasoning in getClusteredPCs).
We can use subset.row to perform the PCA on a subset of genes.
This is typically used to subset to HVGs to reduce computational time and increase the signal-to-noise ratio of downstream analyses.
If preserve.shape=TRUE, the rotation matrix is extrapolated to include loadings for “unselected” genes, i.e., not in subset.row.
This is done by projecting their expression profiles into the low-dimensional space defined by the SVD on the selected genes.
By doing so, we ensure that the output always has the same number of rows as x, so that the low-rank approximation from value="lowrank" can fit into the assays.
Otherwise, if preserve.shape=FALSE, the output is subsetted by any non-NULL value of subset.row.
This is equivalent to the return value after calling the function on x[subset.row,].
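The extrapolation of the rotation matrix can be illustrated with a minimal base R sketch. This is not the package's internal code: the toy matrix, the choice of selected rows and the variable names are all assumptions made for illustration. It projects the centered profile of every gene onto the left singular vectors from an SVD on the selected genes, which recovers the usual loadings for the selected genes and yields extrapolated loadings for the rest.

```r
set.seed(100)

# Hypothetical toy log-expression matrix: 50 genes (rows) by 30 cells (columns).
logexprs <- matrix(rnorm(50 * 30), nrow = 50,
    dimnames = list(paste0("gene", 1:50), NULL))
selected <- 1:20   # stand-in for 'subset.row'
rank <- 5

# Center each gene and transpose so that cells are rows.
centered <- t(logexprs - rowMeans(logexprs))

# SVD on the selected genes only.
s <- svd(centered[, selected], nu = rank, nv = rank)
u <- s$u
d <- s$d[seq_len(rank)]

# Project every gene onto the left singular vectors and rescale by the
# singular values to obtain one row of loadings per gene.
rotation <- sweep(crossprod(centered, u), 2, d, "/")

# For the selected genes, this reproduces the right singular vectors.
all.equal(rotation[selected, ], s$v, check.attributes = FALSE)

# The low-rank approximation now has one row per gene in the full matrix.
lowrank <- tcrossprod(rotation, sweep(u, 2, d, "*"))
dim(lowrank)
```

Because the loadings for unselected genes are derived from the same left singular vectors, their low-rank values are the projection of their centered profiles onto the subspace found from the selected genes.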
A modified x with:
the PC results stored in the reducedDims as a "PCA" entry, if value="pca".
The attributes contain the rotation matrix, the variance explained and the percentage of variance explained.
(Note that the last may not sum to 100% if rank is smaller than the total number of PCs.)
a low-rank approximation stored as a new "lowrank" assay, if value="lowrank".
This is represented as a LowRankMatrix.
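The note on the percentage of variance explained can be checked with a small base R sketch. The toy matrix and variable names below are hypothetical, and the calculation uses a plain svd() call rather than the package's own code: the percentages are computed against the total variance across all components, so keeping only the top few necessarily gives a sum below 100.

```r
set.seed(42)

# Hypothetical toy data: 100 cells (rows) by 20 genes (columns).
mat <- matrix(rnorm(100 * 20), nrow = 100)
centered <- scale(mat, center = TRUE, scale = FALSE)

# Variance explained by every component, from the singular values.
var.all <- svd(centered)$d^2 / (nrow(mat) - 1)
total.var <- sum(var.all)

# Keep only the top 5 components, analogous to a small 'rank'.
rank <- 5
percent.var <- var.all[seq_len(rank)] / total.var * 100

sum(percent.var)   # strictly less than 100 for this full-rank matrix
```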
Aaron Lun
denoisePCA, where the number of PCs is automatically chosen.
getClusteredPCs, another method to choose the number of PCs.
library(scuttle)
library(scran)
sce <- mockSCE()
sce <- logNormCounts(sce)
# Modelling the variance:
var.stats <- modelGeneVar(sce)
hvgs <- getTopHVGs(var.stats, n=1000)
# Defaults to pulling out the top 50 PCs.
set.seed(1000)
sce <- fixedPCA(sce, subset.row=hvgs)
reducedDimNames(sce)
# Get the percentage of variance explained.
attr(reducedDim(sce), "percentVar")
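The remaining arguments can be exercised in the same way. The following is a sketch rather than part of the original examples; it assumes that scran and scuttle are installed, and the name "PCA.10" and the choice of rank=10 are arbitrary.

```r
library(scuttle)
library(scran)

sce <- mockSCE()
sce <- logNormCounts(sce)
hvgs <- getTopHVGs(modelGeneVar(sce), n=1000)

# Store a 10-component PCA under a custom name.
set.seed(1000)
sce <- fixedPCA(sce, rank=10, subset.row=hvgs, name="PCA.10")
dim(reducedDim(sce, "PCA.10"))

# Request a low-rank approximation instead of PC scores.
# With the default preserve.shape=TRUE, the output keeps every row of 'sce'.
set.seed(1000)
sce.lr <- fixedPCA(sce, rank=10, subset.row=hvgs, value="lowrank")
assayNames(sce.lr)

# With preserve.shape=FALSE, the output keeps only the rows in 'subset.row'.
set.seed(1000)
sce.sub <- fixedPCA(sce, rank=10, subset.row=hvgs, value="lowrank",
    preserve.shape=FALSE)
nrow(sce.sub) == length(hvgs)
```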