View source: R/unexplainedSamples.R
unexplainedSamples | R Documentation |
Given a catalogue of samples and a set of signatures, this function identifies samples that have a significantly higher error than the rest. It is important to have enough samples in the catalogue matrix that are fully explained by the signatures, so that the background error distribution is well defined and the samples with significantly higher error can be identified more clearly.

Two criteria are used to identify the samples that are not fully explained by the signatures. The first criterion considers the relative amount of mutations and is governed by the parameters pvalueMethod and pvalue_threshold. In practice, we calculate a p-value for each sample error or residual (normalised by the total mutations of the sample) to determine whether it is higher than the rest. The parameter pvalueMethod selects the type of error used, though we advise using normErrorSAD, which is the sum of absolute deviations of every channel divided by the number of mutations in the sample. This first criterion can be disabled by setting considerOnlyNmutsThreshold=TRUE.

The second criterion is a minimum number of mutations that should be in the error or residual, which can be set with nmuts_threshold. The parameter nmutsMethod determines the type of error used; here we suggest residualSSD, the sum of signed differences of every channel of the residual.

In this function, the error is the difference between the original catalogue and the catalogue reconstructed from the exposures obtained with a simple signature fit (optimised for KLD). The residual is the difference between the original catalogue and the catalogue reconstructed from the exposures obtained with a constrained fit, where the difference is constrained to be mostly positive, to better highlight the pattern that might be present in the catalogue but not captured by the given signatures. Both criteria must be met for a sample to be considered unexplained, unless considerOnlyNmutsThreshold=TRUE.
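To make the two kinds of metric concrete, the sketch below computes a normalised sum of absolute deviations and a sum of signed differences on invented data. It is only an illustration of the definitions given above, not the package's implementation: the matrices `catalogue` and `reconstructed` and their values are made up, and a single difference matrix is used for both metrics, whereas the function distinguishes the error (simple fit) from the residual (constrained fit).

```r
# Toy catalogue: 4 channels (rows) x 3 samples (columns); all counts are invented.
catalogue <- matrix(
  c(10, 20, 30, 40,
    5, 5, 5, 5,
    100, 200, 300, 400),
  nrow = 4,
  dimnames = list(paste0("ch", 1:4), paste0("s", 1:3))
)

# Hypothetical reconstruction of the catalogue from fitted exposures.
reconstructed <- matrix(
  c(12, 18, 30, 40,
    5, 5, 5, 5,
    100, 200, 100, 200),
  nrow = 4,
  dimnames = dimnames(catalogue)
)

difference <- catalogue - reconstructed

# Sum of absolute deviations over channels, divided by each sample's mutation count.
normSAD <- colSums(abs(difference)) / colSums(catalogue)

# Sum of signed differences over channels, in units of mutations.
SSD <- colSums(difference)

print(round(normSAD, 3))
print(SSD)
```

In this toy case sample s3 stands out on both metrics (a normalised SAD of 0.4 and an SSD of 400 mutations), while s1 has a small absolute deviation whose signed differences cancel to zero.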
unexplainedSamples(
  outfileRoot = NULL,
  catalogues,
  sigs,
  pvalue_threshold = 0.03,
  nmuts_threshold = 300,
  pvalueMethod = "normErrorSAD",
  nmutsMethod = "residualSSD",
  considerOnlyNmutsThreshold = FALSE
)
outfileRoot |
if specified, generate a plot, otherwise no plot is generated |
catalogues |
original catalogues, channels as rows and samples as columns |
sigs |
mutational signatures used for fitting, channels as rows, signatures as columns |
pvalue_threshold |
threshold for statistical significance of the normalised error/residual |
nmuts_threshold |
minimum number of mutations in the error/residual to consider the samples unexplained |
pvalueMethod |
method to be used for the relative criterion. Default is normErrorSAD. Alternatives are normResidualSAD or normResidualSSD |
nmutsMethod |
method to be used for the absolute criterion. Default is residualSSD. Alternatives are errorSAD or residualSAD |
considerOnlyNmutsThreshold |
if TRUE, use only the absolute criterion and disable the relative criterion |
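How the threshold arguments interact can be sketched as follows. The helper `flagUnexplained` is hypothetical and not part of the package; it takes per-sample p-values and mutation counts as given rather than computing them, and the choice of `<` and `>=` at the boundaries is an assumption.

```r
# Hypothetical helper, not part of the package: combines the two criteria.
flagUnexplained <- function(pvalues, nmuts,
                            pvalue_threshold = 0.03,
                            nmuts_threshold = 300,
                            considerOnlyNmutsThreshold = FALSE) {
  # Absolute criterion: enough mutations in the error/residual.
  # Using >= is an assumption about how the boundary is handled.
  passesAbsolute <- nmuts >= nmuts_threshold
  if (considerOnlyNmutsThreshold) {
    return(passesAbsolute)
  }
  # Relative criterion: p-value below the threshold (strict < is also an assumption).
  passesAbsolute & (pvalues < pvalue_threshold)
}

# Invented per-sample values for illustration only.
pvalues <- c(s1 = 0.50, s2 = 0.01, s3 = 0.01)
nmuts <- c(s1 = 500, s2 = 120, s3 = 450)

bothCriteria <- flagUnexplained(pvalues, nmuts)
absoluteOnly <- flagUnexplained(pvalues, nmuts, considerOnlyNmutsThreshold = TRUE)
print(bothCriteria)
print(absoluteOnly)
```

With both criteria active only s3 is flagged; with considerOnlyNmutsThreshold = TRUE, s1 is flagged as well because its mutation count alone exceeds the threshold.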
table of samples with associated error metrics, with the samples that have a significant error and/or residual highlighted
# catalogues and signatures are assumed to be matrices already loaded in the session
resObj <- unexplainedSamples(catalogues = catalogues,
                             sigs = signatures)
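The example relies on existing `catalogues` and `signatures` objects. As a rough sketch of the expected shapes (channels as rows in both matrices), the code below builds toy inputs. The channel names, signature names, exposures, and the normalisation of each signature to sum to 1 are all assumptions made for illustration, and three samples are far too few for a meaningful background error distribution. The final call is left commented out because it requires the package to be loaded.

```r
set.seed(1)
channels <- paste0("channel", 1:6)

# Two invented signatures: channels as rows, signatures as columns.
signatures <- matrix(runif(6 * 2), nrow = 6,
                     dimnames = list(channels, c("SigA", "SigB")))
# Normalise each signature to sum to 1 (a common convention, assumed here).
signatures <- sweep(signatures, 2, colSums(signatures), "/")

# Invented exposures: signatures as rows, samples as columns.
exposures <- matrix(c(800, 200, 100, 900, 500, 500), nrow = 2,
                    dimnames = list(colnames(signatures), paste0("sample", 1:3)))

# Toy catalogues: channels as rows, samples as columns.
catalogues <- round(signatures %*% exposures)

# Both inputs should describe the same channels in the same order.
stopifnot(identical(rownames(catalogues), rownames(signatures)))

# resObj <- unexplainedSamples(catalogues = catalogues, sigs = signatures)
```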