number-signatures: Number of Signatures


Description

Assessment of the number of signatures in the data.

Usage

assessNumberSignatures(m, nSigs, decomposition = nmfDecomposition, ..., nReplicates = 1)

plotNumberSignatures(gof)

Arguments

m

Mutational spectrum matrix, same as used for 'identifySignatures'.

nSigs

Vector of integers with the numbers of signatures that should be tested. See the 'nSigs' argument for 'identifySignatures'.

decomposition

Function to apply for the matrix decomposition. See the 'decomposition' argument for 'identifySignatures'.

...

Additional arguments passed to the 'decomposition' function. See the '...' argument for 'identifySignatures'.

nReplicates

How many runs should be used for assessing a value of 'nSigs'? For decomposition methods with random seeding, values greater than 1 should be used.

gof

Data frame, as returned by 'assessNumberSignatures'.

Details

Compute the decomposition for a given number of signatures, and assess the goodness of the reconstruction between the observed and fitted mutational spectra M and V, respectively. The residual sum of squares (RSS)

RSS = ∑_{i,j} (M_{ij} - V_{ij})^2

and the explained variance

evar = 1 - RSS / ∑_{i,j} V_{ij}^2

are used as summary statistics, which can generally be applied to all decomposition approaches.
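As a rough illustration of the two statistics, the following base R sketch computes them for a pair of small made-up matrices. The helper names 'rss_stat' and 'evar_stat' and the toy values are assumptions for illustration only and are not part of the package; the denominator follows the formula as written above.

```r
# Toy observed (M) and fitted (V) spectra; values are made up for illustration
M <- matrix(c(1, 2, 3, 4), nrow = 2)
V <- matrix(c(1, 2, 2, 4), nrow = 2)

# Residual sum of squares between the observed and fitted matrices
# (hypothetical helper, not a package function)
rss_stat <- function(M, V) sum((M - V)^2)

# Explained variance, using the denominator from the formula above
# (hypothetical helper, not a package function)
evar_stat <- function(M, V) 1 - rss_stat(M, V) / sum(V^2)

rss_value <- rss_stat(M, V)
evar_value <- evar_stat(M, V)
```

A smaller RSS and an explained variance closer to 1 both indicate a closer reconstruction.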

The 'plotNumberSignatures' function visualizes the results of the 'assessNumberSignatures' analysis. Statistics of the individual runs are shown as gray crosses, whereas the mean across the runs is depicted in red.
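The mean across runs can also be computed by hand. The sketch below does so in base R on a made-up results table; the data frame 'gof_toy' and its column names 'nSigs' and 'rss' are hypothetical, and the columns of the data frame actually returned by 'assessNumberSignatures' may be named differently.

```r
# Hypothetical results table: two replicate runs for each of two signature counts
gof_toy <- data.frame(
  nSigs = c(2, 2, 3, 3),
  rss   = c(10, 12, 6, 8)
)

# Mean RSS per number of signatures, analogous to the mean shown in the plot
mean_rss <- aggregate(rss ~ nSigs, data = gof_toy, FUN = mean)
```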

If a decomposition method uses random seeding, recomputing the decomposition of the same data can yield different results. In this case, evaluating the summary statistics over several replicate runs gives more reliable estimates of the number of signatures. This applies to some NMF algorithms, for example. Methods with a deterministic decomposition, such as standard PCA, do not need this, since repeated computations yield the same decomposition. This behaviour is controlled by the 'nReplicates' parameter, where the default of '1' corresponds to a single run.
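The difference between randomly seeded and deterministic methods can be sketched in base R. The example below is not the package's implementation: 'toy_nmf_rss' is a hypothetical, minimal NMF using multiplicative updates with random starting values, 'pca_rss' is a hypothetical uncentered PCA reconstruction via 'prcomp', and the input matrix is random toy data.

```r
# Toy NMF via multiplicative updates with random initialization
# (hypothetical helper, not the decomposition used by the package)
toy_nmf_rss <- function(M, k, n_iter = 50) {
  W <- matrix(runif(nrow(M) * k), nrow = nrow(M))
  H <- matrix(runif(k * ncol(M)), nrow = k)
  eps <- 1e-9  # guards against division by zero
  for (i in seq_len(n_iter)) {
    H <- H * (t(W) %*% M) / (t(W) %*% W %*% H + eps)
    W <- W * (M %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  sum((M - W %*% H)^2)  # RSS of the rank-k reconstruction
}

# Rank-k reconstruction from an uncentered PCA; no random component involved
pca_rss <- function(M, k) {
  p <- prcomp(M, center = FALSE)
  fitted <- p$x[, 1:k, drop = FALSE] %*% t(p$rotation[, 1:k, drop = FALSE])
  sum((M - fitted)^2)
}

set.seed(1)
M <- matrix(runif(6 * 5), nrow = 6)  # made-up non-negative data

# Three replicate runs each: the NMF runs start from different random values,
# whereas the PCA runs repeat the same computation
rss_runs <- replicate(3, toy_nmf_rss(M, k = 2))
pca_runs <- replicate(3, pca_rss(M, k = 2))
```

Comparing 'rss_runs' with 'pca_runs' shows why replicates are only informative for the randomly seeded method: the PCA values are identical across calls.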

In practice, these summary statistics should not be trusted blindly, but rather interpreted together with biological knowledge and scientific reasoning. For a discussion of the interpretation of these statistics with special focus on the NMF decomposition, please refer to the references listed below.

Value

- assessNumberSignatures: A data frame with the RSS and explained variance for each run

- plotNumberSignatures: A ggplot object

References

Hutchins LN, Murphy SM, Singh P and Graber JH (2008): 'Position-dependent motif characterization using non-negative matrix factorization.' Bioinformatics, http://dx.doi.org/10.1093/bioinformatics/btn526

See Also

identifySignatures

rss and evar functions of the NMF package.

Examples

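  library(SomaticSignatures)

  # Load the example data set shipped with the package
  data("sca_mm", package = "SomaticSignatures")

  # Assess 2 to 8 signatures, with 3 replicate runs for each value
  nSigs = 2:8
  stat = assessNumberSignatures(sca_mm, nSigs, nReplicates = 3)

  # Visualize the summary statistics of the runs
  plotNumberSignatures(stat)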

julian-gehring/SomaticSignatures documentation built on May 31, 2020, 5:54 a.m.