scpModelComponentMethods | R Documentation |
Component analysis is a powerful tool for exploring data. The package implements the ANOVA-principal component analysis extended to linear models (APCA+) and derivatives (suggested by Thiel et al. 2017). This framework is based on principal component analysis (PCA) and allows exploring the data captured by each model variable individually.
scpModelComponentMethods
scpComponentAnalysis(
object,
method = NULL,
effects = NULL,
pcaFUN = "auto",
residuals = TRUE,
unmodelled = TRUE,
name,
...
)
scpComponentAggregate(componentList, fcol, fun = colMedians, ...)
scpComponentPlot(
componentList,
comp = 1:2,
pointParams = list(),
maxLevels = NULL
)
scpComponentBiplot(
scoreList,
eigenvectorList,
comp = 1:2,
pointParams = list(),
arrowParams = list(arrow = arrow(length = unit(0.2, "cm"))),
labelParams = list(size = 2, max.overlaps = 10),
textBy = "feature",
top = 10,
maxLevels = NULL
)
object |
An object that inherits from the
|
method |
A |
effects |
A |
pcaFUN |
A |
residuals |
A |
unmodelled |
A |
name |
A |
... |
For |
componentList |
A list of components analysis results. This
is typically the |
fcol |
A |
fun |
A |
comp |
An |
pointParams |
A |
maxLevels |
An |
scoreList |
A list of components analysis results. This
is typically the |
eigenvectorList |
A list of components analysis results. This
is typically the |
arrowParams |
A |
labelParams |
A |
textBy |
A |
top |
An |
An object of class character of length 3.
Given A, an m x n matrix, PCA can be summarized as the following decomposition:
AA^T / (n - 1) = VLV^T
where V is an m x k matrix with orthonormal columns, that is V^TV = I, with k the number of components. V is called the matrix of eigenvectors. L is the k x k diagonal matrix of eigenvalues that contains the variance associated with each component, ordered from highest to lowest variance. The unscaled PC scores are given by S = A^TV.
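The decomposition above can be checked numerically. The following is a minimal base R sketch on simulated data, assuming features are in rows and that each feature is centred beforehand; it is not the package's internal code.

```r
## Toy data: m features (rows) x n samples (columns); all values are simulated
set.seed(1)
m <- 6
n <- 10
A <- matrix(rnorm(m * n), nrow = m)
A <- A - rowMeans(A)  # centre each feature so AA^T / (n - 1) is a covariance matrix

## SVD-based PCA: the left singular vectors play the role of the eigenvectors V
dec <- svd(A)
V <- dec$u                    # m x k matrix of eigenvectors
L <- diag(dec$d^2 / (n - 1))  # k x k diagonal matrix of eigenvalues
S <- t(A) %*% V               # n x k matrix of unscaled PC scores

## Both sides of the decomposition agree up to numerical error
lhs <- A %*% t(A) / (n - 1)
rhs <- V %*% L %*% t(V)
all.equal(lhs, rhs)
```

The eigenvalues come out ordered from highest to lowest because svd() returns its singular values in decreasing order.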
There are 2 available algorithms to perform PCA:
nipals: The non-linear iterative partial least squares (NIPALS) algorithm can handle missing values and approximates classical PCA, although it does not explicitly maximize the variance. This is implemented in nipals::nipals().
svd: The singular value decomposition (SVD) is used to perform an exact PCA, but it cannot handle missing values. This is implemented in base::svd().
Which algorithm to use is controlled by the pcaFUN argument. By default ("auto"), the function automatically uses svd when there are no missing values and nipals when there is at least one missing value.
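The selection rule can be sketched in a few lines of base R. choosePcaAlgorithm() is a hypothetical helper written for illustration only; it is not part of the package and the actual implementation may differ.

```r
## Hypothetical helper, not part of the package API: returns the name of the
## algorithm that the "auto" rule described above would select
choosePcaAlgorithm <- function(A, pcaFUN = "auto") {
    pcaFUN <- match.arg(pcaFUN, c("auto", "nipals", "svd"))
    if (pcaFUN != "auto") return(pcaFUN)  # explicit choice is respected
    if (anyNA(A)) "nipals" else "svd"     # missing values rule out svd
}

complete <- matrix(1:6, nrow = 2)
incomplete <- complete
incomplete[1, 2] <- NA

choosePcaAlgorithm(complete)
choosePcaAlgorithm(incomplete)
```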
scpComponentAnalysis() performs a PCA on the modelling output. Which modelling output the function uses depends on the method argument. There are 3 PCA approaches:
ASCA: performs a PCA on the effect matrix, that is A = \hat{M_f}, where f is one of the effects in the model. This PCA is useful to explore the modelled effects and the relationship between different levels of a factor.
ASCA.E: performs a PCA on the effect matrix, just like ASCA. The scores are then updated by projecting the effect matrix added to the residuals using the eigenvectors, that is scores = (\hat{M_f} + \epsilon)^TV. This PCA is useful to explore the modelled effects while blurring these effects with the unmodelled variability. Note however that for this approach, the scores are no longer guaranteed to be orthogonal and the eigenvalues are no longer meaningful. The percentage of variation should not be interpreted.
APCA (default): performs a PCA on the effect matrix plus the residuals, that is A = \hat{M_f} + \epsilon. This PCA is useful to explore the modelled effects in relation to the unmodelled variability that remains in the residuals.
Available methods are listed in scpModelComponentMethods.
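The difference between the three approaches comes down to which matrix is decomposed and which matrix is projected. The following base R sketch uses a simulated effect matrix and simulated residuals; toyPca() is a hypothetical helper, and the toy residuals are not the output of a fitted model.

```r
set.seed(42)
m <- 8
n <- 12
group <- rep(c(-0.5, 0.5), each = n / 2)       # centred two-level factor
M <- outer(rnorm(m), group)                    # toy m x n effect matrix
E <- matrix(rnorm(m * n, sd = 0.3), nrow = m)  # toy m x n residual matrix
E <- E - rowMeans(E)

## Hypothetical helper: eigenvectors come from `A`, scores come from
## projecting `projected` onto those eigenvectors
toyPca <- function(A, projected = A) {
    V <- svd(A)$u
    list(eigenvectors = V, scores = t(projected) %*% V)
}

asca  <- toyPca(M)                     # ASCA: A = M
ascaE <- toyPca(M, projected = M + E)  # ASCA.E: eigenvectors of M, scores from M + E
apca  <- toyPca(M + E)                 # APCA: A = M + E

## Cross-products between score vectors: the off-diagonal entries vanish for
## ASCA and APCA, but this is not guaranteed for ASCA.E
round(crossprod(apca$scores), 6)
round(crossprod(ascaE$scores), 6)
```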
Note that for all three methods, a PCA on the residual matrix is also performed when residuals = TRUE, that is A = \epsilon = Y - \hat{\beta}X^T. A PCA on the residuals is useful to explore residual effects that are not captured by any effect in the model. Similarly, a PCA on the input data matrix, that is on the data before modelling, is also performed when unmodelled = TRUE, that is A = Y.
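As an illustration of the residual matrix, the sketch below computes Y - \hat{\beta}X^T on simulated data with ordinary least squares. Both the use of ordinary least squares and the row-centring of Y before its PCA are assumptions made for this example; the package's own estimation and preprocessing may differ.

```r
set.seed(3)
m <- 5
n <- 12
X <- cbind(intercept = 1, group = rep(c(0, 1), each = n / 2))  # n x p design matrix
Y <- matrix(rnorm(m * n), nrow = m)                            # m x n data matrix

## Ordinary least squares coefficients, one row per feature (m x p)
beta <- t(solve(crossprod(X), crossprod(X, t(Y))))

## Residual matrix: the part of Y not captured by the model
E <- Y - beta %*% t(X)

## PCA on the residuals and on the (row-centred) unmodelled data
residualPca <- svd(E)
unmodelledPca <- svd(Y - rowMeans(Y))
```

With least squares, the residuals are orthogonal to the design matrix, so E %*% X is zero up to numerical error.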
scpComponentAnalysis() always returns a list with 2 elements. The first element, bySample, is a list where each element contains the PC scores for the desired model variable(s). The second element, byFeature, is a list where each element contains the eigenvectors for the desired model variable(s).
scpAnnotateResults() adds annotations to the component analysis results. The annotations are added to all elements of the list returned by scpComponentAnalysis(). See the associated man page for more information.
scpComponentPlot() takes one of the two elements of the list generated by scpComponentAnalysis() and returns a list of ggplot2 scatter plots. Commonly, the first two components, which bear most of the variance, are explored for visualization, but other components can be explored as well using the comp argument. Each point represents either a sample or a feature, depending on the provided component analysis results (see examples). Change the point aesthetics by providing ggplot arguments in a list (see examples).
scpComponentBiplot() simultaneously explores the PC scores (sample-space) and the eigenvectors (feature-space). Scores are shown as points while eigenvectors are shown as arrows. Point aesthetics and arrow aesthetics can be controlled with the pointParams and the arrowParams arguments, respectively. Moreover, arrows are also labelled and label aesthetics can be controlled using labelParams and textBy. Plotting all eigenvectors as arrows leads to overcrowded plots. You can limit the plotting to the longest arrows (the top 10 by default), where the length of an arrow is its distance from the origin on the two selected PCs.
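The selection of the longest arrows can be sketched in base R as follows. This assumes the arrow length is the Euclidean distance from the origin on the two selected PCs and uses simulated eigenvectors; it is not the package's internal code.

```r
set.seed(7)
## Simulated eigenvectors: 50 features (rows) x 3 components (columns)
eigenvectors <- matrix(
    rnorm(50 * 3), ncol = 3,
    dimnames = list(paste0("feature", 1:50), paste0("PC", 1:3))
)

comp <- 1:2  # components shown on the biplot
top <- 10    # number of arrows to keep

## Length of each arrow on the two selected components
arrowLength <- sqrt(rowSums(eigenvectors[, comp]^2))

## Names of the `top` features with the longest arrows
topFeatures <- names(sort(arrowLength, decreasing = TRUE))[seq_len(top)]
topFeatures
```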
scpComponentAggregate() offers functionality to aggregate the results from multiple features. This can be used to obtain, for example, component analysis results for proteins when modelling at the peptide level. The approach is inspired by scuttle::aggregateAcrossCells() and combines, for each group, multiple values for each component using QFeatures::aggregateFeatures(). By default, values are aggregated using the median, but QFeatures offers other methods as well. The annotations of the component results are automatically aggregated as well. See the aggregateFeatures() man page for more information on available methods and expected behavior.
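The idea of median aggregation can be illustrated with base R alone. The sketch below uses stats::aggregate() as a stand-in for QFeatures::aggregateFeatures(); the data frame and its column names are made up for the example.

```r
## Made-up peptide-level results: two components and a protein grouping column
peptidePCs <- data.frame(
    PC1 = c(1, 3, 5, 2, 4),
    PC2 = c(10, 30, 20, 40, 60),
    protein = c("P1", "P1", "P1", "P2", "P2")
)

## For each protein, combine the peptide values of each component by median
proteinPCs <- aggregate(cbind(PC1, PC2) ~ protein, data = peptidePCs, FUN = median)
proteinPCs
```

Each row of proteinPCs holds one protein, with one median value per component.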
Christophe Vanderaa, Laurent Gatto
Thiel, Michel, Baptiste Féraud, and Bernadette Govaerts. 2017. "ASCA+ and APCA+: Extensions of ASCA and APCA in the Analysis of Unbalanced Multifactorial Designs." Journal of Chemometrics 31 (6): e2895.
ScpModel-Workflow to run a model on SCP data upstream of component analysis.
The nipals::nipals() function and package for detailed information about the algorithm and associated parameters.
The ggplot2::ggplot() functions and associated tutorials to manipulate and save the visualization output.
scpAnnotateResults() to annotate component analysis results.
## Assumes the scp package, which provides the functions and data used below
library("scp")
library("patchwork")
library("ggplot2")
data("leduc_minimal")
leduc_minimal$cell <- rownames(colData(leduc_minimal))
####---- Run component analysis ----####
(pcs <- scpComponentAnalysis(
    leduc_minimal, method = "ASCA", effects = "SampleType",
    pcaFUN = "auto", residuals = FALSE, unmodelled = FALSE
))
####---- Annotate results ----####
## Add cell annotation available from the colData
bySamplePCs <- scpAnnotateResults(
    pcs$bySample, colData(leduc_minimal), by = "cell"
)
## Add peptide annotations available from the rowData
byFeaturePCs <- scpAnnotateResults(
    pcs$byFeature, rowData(leduc_minimal),
    by = "feature", by2 = "Sequence"
)
####---- Plot results ----####
## Plot result in cell-space, i.e. each dot is a cell
scpComponentPlot(
    bySamplePCs,
    pointParams = list( ## ggplot arguments
        aes(colour = SampleType, shape = lcbatch),
        alpha = 0.6
    )
) |>
    wrap_plots(guides = "collect")
## Plot result in peptide-space, i.e. each dot is a peptide
scpComponentPlot(
    byFeaturePCs,
    pointParams = list(colour = "dodgerblue", alpha = 0.6)
) |>
    wrap_plots(guides = "collect")
## Plot both
scpComponentBiplot(
    bySamplePCs, byFeaturePCs,
    pointParams = list(aes(colour = SampleType), alpha = 0.6),
    labelParams = list(max.overlaps = 20),
    textBy = "gene"
) |>
    wrap_plots(guides = "collect")
####---- Aggregate results ----####
## Aggregate to protein-level results
byProteinPCs <- scpComponentAggregate(
    byFeaturePCs, fcol = "Leading.razor.protein.id"
)
## Plot result in protein-space, i.e. each dot is a protein
scpComponentPlot(
    byProteinPCs,
    pointParams = list(colour = "firebrick", alpha = 0.6)
) |>
    wrap_plots(guides = "collect")