Assessment of the number of signatures in the data.
assessNumberSignatures(m, nSigs, decomposition = nmfDecomposition, ..., nReplicates = 1)
plotNumberSignatures(gof)
- m: Mutational spectrum matrix, same as used for 'identifySignatures'.
- nSigs: Vector of integers with the numbers of signatures that should be tested. See the 'nSigs' argument for 'identifySignatures'.
- decomposition: Function to apply for the matrix decomposition. See the 'decomposition' argument for 'identifySignatures'.
- ...: Additional arguments passed to the 'decomposition' function. See the '...' argument for 'identifySignatures'.
- nReplicates: Number of runs used for assessing each value of 'nSigs'. For decomposition methods with random seeding, values greater than 1 should be used.
- gof: Data frame, as returned by 'assessNumberSignatures'.
Compute the decomposition for a given number of signatures, and assess the goodness of the reconstruction between the observed and fitted mutational spectra M and V, respectively. The residual sum of squares (RSS)
RSS = ∑_{i,j} (M_{ij} - V_{ij})^2
and the explained variance
evar = 1 - RSS / ∑_{i,j} V_{ij}^2
are used as summary statistics, which can generally be applied to all decomposition approaches.
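As a rough illustration of the two formulas above, the following base R sketch computes both statistics for a pair of small matrices. The matrices and the variable names 'rss_value' and 'evar_value' are made up for this example; the package itself relies on the 'rss' and 'evar' functions of the 'NMF' package, whose internals may differ.

```r
# Toy observed (M) and fitted (V) spectra; the values are invented for illustration.
M <- matrix(c(4, 2, 1,
              3, 5, 2), nrow = 2, byrow = TRUE)
V <- matrix(c(3.5, 2.5, 1,
              3,   4.5, 2.5), nrow = 2, byrow = TRUE)

# Residual sum of squares: squared differences summed over all entries (i, j).
rss_value <- sum((M - V)^2)

# Explained variance, following the formula as written in this help page.
evar_value <- 1 - rss_value / sum(V^2)

print(c(rss = rss_value, evar = evar_value))
```

A smaller RSS and an explained variance closer to 1 both indicate that the fitted spectra reproduce the observed ones more closely.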
The 'plotNumberSignatures' function visualizes the results of the 'assessNumberSignatures' analysis. Statistics of the individual runs are shown as gray crosses, whereas the mean across the runs is depicted in red.
If a decomposition method uses random seeding, recomputing the decomposition of the same data can yield different results. In that case, evaluating the summary statistics over several runs gives more reliable estimates of the number of signatures. This applies to some NMF algorithms, for example. Methods with a deterministic decomposition, such as standard PCA, do not need this, since repeated computations yield the same decomposition. This behaviour is controlled by the 'nReplicates' parameter, where the default of '1' corresponds to a single run.
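The effect of random seeding can be sketched without the package at all. The example below is an assumption-laden toy, not the implementation used by 'assessNumberSignatures': 'toy_nmf' is a hypothetical helper that runs a basic NMF with multiplicative updates from random starting values, and the column names 'replicate', 'nSigs' and 'rss' are chosen for this illustration only.

```r
# Hypothetical helper: minimal NMF via multiplicative updates with random
# starting values. Returns the fitted matrix V = W %*% H.
toy_nmf <- function(M, k, n_iter = 200, eps = 1e-9) {
  W <- matrix(runif(nrow(M) * k), nrow = nrow(M))
  H <- matrix(runif(k * ncol(M)), nrow = k)
  for (i in seq_len(n_iter)) {
    # eps guards against division by zero in the update ratios
    H <- H * (t(W) %*% M) / (t(W) %*% W %*% H + eps)
    W <- W * (M %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  W %*% H
}

set.seed(1)
M <- matrix(runif(12 * 6), nrow = 12)  # made-up non-negative "spectrum"

n_sigs <- 2:4
n_replicates <- 3

# One row per (number of signatures, replicate) combination.
runs <- expand.grid(replicate = seq_len(n_replicates), nSigs = n_sigs)

# RSS of each run; runs with the same nSigs differ only in their random start.
runs$rss <- vapply(runs$nSigs,
                   function(k) sum((M - toy_nmf(M, k))^2),
                   numeric(1))

# Mean RSS across replicates for each number of signatures.
mean_rss <- aggregate(rss ~ nSigs, data = runs, FUN = mean)
print(runs)
print(mean_rss)
```

Runs with the same number of signatures can differ in their RSS because each starts from different random values, which is why averaging over replicates gives a steadier picture than a single run.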
In practice, these summary statistics should not be trusted blindly, but rather interpreted together with biological knowledge and scientific reasoning. For a discussion of the interpretation of these statistics with special focus on the NMF decomposition, please refer to the references listed below.
- assessNumberSignatures: A data frame with the RSS and explained variance for each run
- plotNumberSignatures: A ggplot object
Hutchins LN, Murphy SM, Singh P and Graber JH (2008): 'Position-dependent motif characterization using non-negative matrix factorization.' Bioinformatics, http://dx.doi.org/10.1093/bioinformatics/btn526
'rss' and 'evar' functions of the 'NMF' package.
data("sca_mm", package = "SomaticSignatures")
nSigs = 2:8
stat = assessNumberSignatures(sca_mm, nSigs, nReplicates = 3)
plotNumberSignatures(stat)