unexplainedSamples: Estimate samples not fully explained by the given signatures

View source: R/unexplainedSamples.R

unexplainedSamples    R Documentation

Estimate samples not fully explained by the given signatures

Description

Given a catalogue of samples and a set of signatures, this function identifies samples that have a significantly higher error than the rest. It is important to have enough samples that are fully explained by the signatures in the catalogue matrix, so that the background error distribution is well defined and the samples with significantly higher error can be identified more clearly.

Two criteria are used to identify the samples that are not fully explained by the signatures. The first criterion considers the relative amount of mutations and is governed by the parameters pvalueMethod and pvalue_threshold. In practice, we calculate a p-value for each sample error or residual (normalised by the total mutations of the sample) to determine whether it is higher than the rest. The parameter pvalueMethod selects the type of error used; we advise using normErrorSAD, which is the sum of absolute deviations of every channel divided by the number of mutations in the sample. This first criterion can be disabled by setting considerOnlyNmutsThreshold=TRUE.

The second criterion is a minimum number of mutations that should be in the error or residual, which can be set with nmuts_threshold. The parameter nmutsMethod determines the type of error used; here we suggest residualSSD, the sum of signed differences of every channel of the residual.

In this function, the error is the difference between the original catalogue and the catalogue reconstructed from the exposures obtained with a simple signature fit (optimised for KLD). The residual is the difference between the original catalogue and the catalogue reconstructed from the exposures obtained with a constrained fit, where the difference is constrained to be mostly positive, to better highlight the pattern that might be present in the catalogue but not captured by the given signatures.

Both criteria must be met to consider a sample unexplained, unless considerOnlyNmutsThreshold=TRUE.
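The two suggested metrics can be illustrated on toy data. The following is a minimal sketch in base R, not the package's internal code: the matrices catalogue, reconstructedSimple and reconstructedConstrained are made-up values standing in for the original catalogue and for the reconstructions from the simple and constrained fits, and the fits themselves are not performed here.

```r
# Toy catalogue: 4 channels (rows) x 3 samples (columns). Values are invented.
catalogue <- matrix(c(10, 20, 30, 40,
                      5, 5, 5, 5,
                      100, 0, 50, 50),
                    nrow = 4,
                    dimnames = list(paste0("ch", 1:4), paste0("s", 1:3)))

# Hypothetical reconstruction from a simple signature fit.
reconstructedSimple <- matrix(c(12, 18, 30, 40,
                                5, 5, 5, 5,
                                60, 10, 40, 40),
                              nrow = 4, dimnames = dimnames(catalogue))

# Hypothetical reconstruction from a constrained fit, chosen so that the
# difference from the catalogue is non-negative in every channel.
reconstructedConstrained <- matrix(c(10, 18, 30, 40,
                                     5, 5, 5, 5,
                                     60, 0, 40, 40),
                                   nrow = 4, dimnames = dimnames(catalogue))

# Error and residual as described above: original minus reconstruction.
errorMatrix <- catalogue - reconstructedSimple
residualMatrix <- catalogue - reconstructedConstrained

# normErrorSAD: sum of absolute deviations per sample,
# divided by the number of mutations in that sample.
normErrorSAD <- colSums(abs(errorMatrix)) / colSums(catalogue)

# residualSSD: sum of signed differences per sample.
residualSSD <- colSums(residualMatrix)

print(normErrorSAD)  # s1 = 0.04, s2 = 0, s3 = 0.35
print(residualSSD)   # s1 = 2,    s2 = 0, s3 = 60
```

In this toy example, sample s3 stands out on both the relative metric and the absolute metric, while s1 and s2 are well reconstructed.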

Usage

unexplainedSamples(
  outfileRoot = NULL,
  catalogues,
  sigs,
  pvalue_threshold = 0.03,
  nmuts_threshold = 300,
  pvalueMethod = "normErrorSAD",
  nmutsMethod = "residualSSD",
  considerOnlyNmutsThreshold = FALSE
)

Arguments

outfileRoot

if specified, generate a plot, otherwise no plot is generated

catalogues

original catalogues, channels as rows and samples as columns

sigs

mutational signatures used for fitting, channels as rows, signatures as columns

pvalue_threshold

threshold for statistical significance of the normalised error/residual

nmuts_threshold

minimum number of mutations in the error/residual to consider the samples unexplained

pvalueMethod

method to be used for the relative criterion. Default is normErrorSAD. Alternatives are normResidualSAD or normResidualSSD

nmutsMethod

method to be used for the absolute criterion. Default is residualSSD. Alternatives are errorSAD or residualSAD

considerOnlyNmutsThreshold

if TRUE, use only the absolute criterion and disable the relative criterion
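How the two thresholds and considerOnlyNmutsThreshold interact can be sketched as follows. This is an illustration only: flagUnexplained is a hypothetical helper, not part of the package, the p-values and mutation counts are invented, and the exact comparisons (strictly below pvalue_threshold, at or above nmuts_threshold) are assumptions that the documentation above does not spell out.

```r
# Hypothetical helper: combine the relative and absolute criteria.
# pvalues: one p-value per sample for the normalised error/residual.
# nmuts:   mutations in the error/residual of each sample.
flagUnexplained <- function(pvalues, nmuts,
                            pvalue_threshold = 0.03,
                            nmuts_threshold = 300,
                            considerOnlyNmutsThreshold = FALSE) {
  # Absolute criterion (assumed inclusive comparison).
  passesAbsolute <- nmuts >= nmuts_threshold
  if (considerOnlyNmutsThreshold) {
    return(passesAbsolute)
  }
  # Relative criterion (assumed strict comparison).
  passesRelative <- pvalues < pvalue_threshold
  passesRelative & passesAbsolute
}

# Invented values for four samples.
pvalues <- c(s1 = 0.50, s2 = 0.01, s3 = 0.01, s4 = 0.20)
nmuts <- c(s1 = 50, s2 = 120, s3 = 800, s4 = 450)

both <- flagUnexplained(pvalues, nmuts)
absoluteOnly <- flagUnexplained(pvalues, nmuts,
                                considerOnlyNmutsThreshold = TRUE)

print(names(which(both)))          # "s3"
print(names(which(absoluteOnly)))  # "s3" "s4"
```

With both criteria active only s3 is flagged. Sample s2 has a low p-value but too few mutations in the residual, and s4 has enough mutations but is flagged only when the relative criterion is disabled.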

Value

table of samples with associated error metrics, with the samples that have a significant error and/or residual highlighted

Examples

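# catalogues: channels as rows, samples as columns
# signatures: channels as rows, signatures as columns
resObj <- unexplainedSamples(catalogues = catalogues,
                             sigs = signatures)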

Nik-Zainal-Group/signature.tools.lib documentation built on April 13, 2025, 5:50 p.m.