fixedPCA | R Documentation |
Perform a PCA where the desired number of components is known ahead of time.
fixedPCA(
x,
rank = 50,
value = c("pca", "lowrank"),
subset.row,
preserve.shape = TRUE,
assay.type = "logcounts",
name = NULL,
BSPARAM = bsparam(),
BPPARAM = SerialParam()
)
x |
A SingleCellExperiment object containing a log-expression matrix. |
rank |
Integer scalar specifying the number of components. |
value |
String specifying the type of value to return.
|
subset.row |
A logical, character or integer vector specifying the rows of |
preserve.shape |
Logical scalar indicating whether or not the output SingleCellExperiment should be subsetted to |
assay.type |
A string specifying which assay values to use. |
name |
String containing the name with which to store the results.
Defaults to |
BSPARAM |
A BiocSingularParam object specifying the algorithm to use for PCA. |
BPPARAM |
A BiocParallelParam object to use for parallel processing. |
In theory, there is an optimal number of components for any given application,
but in practice, the criterion for the optimum is difficult to define.
As a result, it is often satisfactory to take an a priori-defined “reasonable” number of PCs for downstream analyses.
A good rule of thumb is to set this to the upper bound on the expected number of subpopulations in the dataset
(see the reasoning in getClusteredPCs).
We can use subset.row to perform the PCA on a subset of genes.
This is typically used to subset to HVGs to reduce computational time and increase the signal-to-noise ratio of downstream analyses.
If preserve.shape=TRUE, the rotation matrix is extrapolated to include loadings for “unselected” genes, i.e., not in subset.row.
This is done by projecting their expression profiles into the low-dimensional space defined by the SVD on the selected genes.
By doing so, we ensure that the output always has the same number of rows as x such that any value="lowrank" can fit into the assays.
Otherwise, if preserve.shape=FALSE, the output is subsetted by any non-NULL value of subset.row.
This is equivalent to the return value after calling the function on x[subset.row,].
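The extrapolation described above can be illustrated with a minimal base R sketch. This is not the package's own implementation; the toy matrix, the `selected` subset and the variable names are all assumptions made for illustration. It takes the SVD of the centred data for the selected genes only, then projects every gene (selected or not) onto the resulting cell-space axes to obtain a rotation matrix with one row per gene.

```r
set.seed(1)
# Toy log-expression matrix: 20 genes (rows) x 30 cells (columns).
mat <- matrix(rnorm(20 * 30), nrow = 20,
              dimnames = list(paste0("gene", 1:20), NULL))
selected <- 1:8    # hypothetical subset of genes, standing in for HVGs
k <- 3             # number of components to keep

# Centre each gene, then transpose so that cells are rows.
centred <- t(mat - rowMeans(mat))

# SVD on the selected genes only.
s <- svd(centred[, selected], nu = k, nv = k)
U <- s$u
D <- s$d[seq_len(k)]

# Project every gene onto the left singular vectors: V = X^T U D^-1.
full.rotation <- sweep(crossprod(centred, U), 2, D, "/")
rownames(full.rotation) <- rownames(mat)

# For the selected genes, this recovers the loadings from the SVD itself.
all.equal(unname(full.rotation[selected, ]), s$v)

# The rotation matrix now has one row per gene in the full matrix.
dim(full.rotation)
```

Because the same formula reproduces the original loadings for the selected genes, the extra rows for unselected genes can be read as their coordinates in the same low-dimensional space.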
A modified x with:
the PC results stored in the reducedDims as a "PCA" entry, if value="pca".
The attributes contain the rotation matrix, the variance explained and the percentage of variance explained.
(Note that the last may not sum to 100% if rank is smaller than the total number of PCs.)
a low-rank approximation stored as a new "lowrank" assay, if value="lowrank".
This is represented as a LowRankMatrix.
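To see why the percentage of variance explained may not sum to 100%, consider the following base R sketch. It uses prcomp on a toy matrix rather than the function documented here, and the matrix and the choice of `k` are assumptions for illustration only. The percentages are computed against the total variance of the data, so discarding the later PCs leaves part of that total unaccounted for.

```r
set.seed(2)
# Toy matrix with cells as rows and genes as columns (the layout prcomp expects).
cells.by.genes <- matrix(rnorm(40 * 12), nrow = 40)
k <- 4  # number of PCs retained, playing the role of the requested rank

pcs <- prcomp(cells.by.genes, center = TRUE, scale. = FALSE)

# Variance explained by each retained PC, and the total variance in the data.
var.explained <- pcs$sdev[seq_len(k)]^2
total.var <- sum(apply(cells.by.genes, 2, var))

percent.var <- var.explained / total.var * 100
sum(percent.var)  # below 100 because PCs beyond the k-th are discarded
```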
Aaron Lun
denoisePCA, where the number of PCs is automatically chosen.
getClusteredPCs, another method to choose the number of PCs.
library(scran)
library(scuttle)
sce <- mockSCE()
sce <- logNormCounts(sce)
# Modelling the variance:
var.stats <- modelGeneVar(sce)
hvgs <- getTopHVGs(var.stats, n=1000)
# Defaults to pulling out the top 50 PCs.
set.seed(1000)
sce <- fixedPCA(sce, subset.row=hvgs)
reducedDimNames(sce)
# Get the percentage of variance explained.
attr(reducedDim(sce), "percentVar")
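The remaining options described above can be exercised in the same way. The following is a sketch that assumes the scran and scuttle packages are installed and reuses the mock data from the example; the choice of rank=10 is arbitrary. It shows preserve.shape=FALSE, where the output is subsetted to the selected rows, and value="lowrank", where the approximation is stored as an assay.

```r
library(scran)
library(scuttle)

sce <- mockSCE()
sce <- logNormCounts(sce)
hvgs <- getTopHVGs(modelGeneVar(sce), n = 1000)

# With preserve.shape=FALSE, only the rows in subset.row are kept.
set.seed(1000)
sub <- fixedPCA(sce, rank = 10, subset.row = hvgs, preserve.shape = FALSE)
nrow(sub) == length(hvgs)

# With value="lowrank", a low-rank approximation is added as an assay.
# preserve.shape=TRUE (the default) keeps one row per gene in the input.
set.seed(1000)
lr <- fixedPCA(sce, rank = 10, value = "lowrank", subset.row = hvgs)
assayNames(lr)
dim(assay(lr, "lowrank"))
```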