decomposeTumorGenomes: Decompose tumor genomes into mutational signatures

Description

'decomposeTumorGenomes()' is the core function of this package. It decomposes tumor genomes into a given set of mutational signatures by computing their contributions (exposures) to the mutational load via quadratic programming. The function takes a set of mutational signatures and the mutation features of one or more tumor genomes and computes weights, i.e., contributions for each of the signatures in each individual genome. Alternatively, the function can determine for each genome only a subset of signatures whose contributions are sufficient to exceed a user-given minimum threshold for the explained variance of the genome's mutation patterns.
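The quadratic-programming step described above can be illustrated with a minimal sketch. It assumes the decomposition is posed as a least-squares fit with non-negative weights that sum to 1; the package's internal formulation may differ. The toy signatures, the genome, and all variable names are hypothetical, and the solver comes from the CRAN package quadprog.

```r
library(quadprog)

# Toy signatures (hypothetical): 3 signatures over 4 mutation features,
# one column per signature, each column sums to 1.
S <- cbind(sigA = c(0.70, 0.10, 0.10, 0.10),
           sigB = c(0.10, 0.60, 0.20, 0.10),
           sigC = c(0.05, 0.05, 0.10, 0.80))

# Toy genome: an exact 70/30 mixture of sigA and sigB.
g <- as.vector(S %*% c(0.7, 0.3, 0.0))

# Minimise ||S w - g||^2 subject to sum(w) == 1 and w >= 0.
# solve.QP() minimises (1/2) w'Dw - d'w subject to t(Amat) w >= bvec,
# where the first 'meq' constraints are treated as equalities.
k    <- ncol(S)
Dmat <- crossprod(S)                  # S'S (positive definite here)
dvec <- as.vector(crossprod(S, g))    # S'g
Amat <- cbind(rep(1, k), diag(k))     # column 1: sum constraint; rest: w >= 0
bvec <- c(1, rep(0, k))
fit  <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)

# Estimated weights (exposures), one per signature
w <- setNames(fit$solution, colnames(S))
round(w, 3)
```

Because the toy genome is an exact mixture of the signatures, the recovered weights should match the mixing proportions up to numerical precision; real genomes will leave a non-zero residual.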

Usage

decomposeTumorGenomes(genomes, signatures, minExplainedVariance=NULL,
    minNumSignatures=2, maxNumSignatures=NULL, greedySearch=FALSE,
    constrainToMaxContribution=FALSE, tolerance=0.1, verbose=FALSE)

Arguments

genomes

(Mandatory) Can be either a vector, a data frame or a matrix (for an individual tumor genome), or a list of one of these object types (for multiple tumors). Each tumor genome must be of the same form as the signatures.

signatures

(Mandatory) A list of vectors, data frames or matrices. Each of the objects represents one mutational signature. Vectors are used for Alexandrov signatures, data frames or matrices for Shiraishi signatures.

minExplainedVariance

(Optional) If NULL (default), exactly maxNumSignatures signatures (see below; default: all) will be used for decomposing each genome. If a numeric value between 0 and 1 is specified for minExplainedVariance, the function will select, for each genome, the smallest number of signatures that is sufficient to explain at least the specified fraction of the variance of the genome's mutation patterns. E.g., if minExplainedVariance=0.99, the smallest subset of signatures that explains at least 99% of the variance is taken. Please note: depending on the number of signatures, this may take a long time because, by default, for each number K of signatures, all possible subsets composed of K signatures will be tested to identify the subset that explains the largest fraction of the variance. If not enough variance is explained, K will be incremented by one. Notes: 1) to speed up the search, the parameters minNumSignatures, maxNumSignatures and greedySearch can be used; 2) for genomes for which none of the possible subsets of signatures explains enough variance, the returned exposure vector will be set to NULL.
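As an illustration of the threshold check, the following sketch computes an explained-variance value for a single genome. It assumes the common definition 1 - RSS/TSS; the formula used internally by the package may differ. The helper name explainedVariance and the numbers are hypothetical.

```r
# Hypothetical helper (not part of the package API): fraction of the variance
# of the observed mutation frequencies explained by the reconstruction.
explainedVariance <- function(observed, predicted) {
    rss <- sum((observed - predicted)^2)        # residual sum of squares
    tss <- sum((observed - mean(observed))^2)   # total sum of squares
    1 - rss / tss
}

# Made-up mutation frequencies of a genome and their reconstruction
# from a candidate subset of signatures.
observed  <- c(0.50, 0.30, 0.15, 0.05)
predicted <- c(0.48, 0.31, 0.16, 0.05)

ev <- explainedVariance(observed, predicted)

# Would this subset satisfy minExplainedVariance=0.99?
ev >= 0.99
```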

minNumSignatures

(Optional) Used only if minExplainedVariance is specified (see above). When searching for the smallest subset of signatures that explains the required variance, at least this number of signatures will be used. This can be used to reduce the search space in a time-consuming search over a large number of signatures. Default: 2.

maxNumSignatures

(Optional) If minExplainedVariance is specified, the search for the smallest subset of signatures that explains the required variance will use at most maxNumSignatures signatures. This can be used to reduce the search space in a time-consuming search over a large number of signatures. If minExplainedVariance is NULL, then exactly maxNumSignatures signatures will be used. The default for maxNumSignatures is NULL (all signatures).

greedySearch

(Optional) Used only if minExplainedVariance has been specified. If greedySearch is TRUE, not all possible combinations of minNumSignatures to maxNumSignatures signatures will be checked. Instead, all possible combinations of exactly minNumSignatures signatures will first be checked to select the best starting set; then, iteratively, the next best signature (the one giving the maximum increase in explained variance) will be added until minExplainedVariance of the variance can be explained (or maxNumSignatures is exceeded). NOTE: this approximate search is highly recommended for large sets of signatures (>15)!
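To give a feeling for the difference in search effort, the sketch below counts the worst-case number of candidate subsets that each strategy would have to fit. The numbers of signatures and the bounds are assumptions chosen for illustration; both searches stop earlier as soon as the requested explained variance is reached.

```r
n    <- 30   # number of available signatures (assumed)
minK <- 2    # minNumSignatures
maxK <- 10   # maxNumSignatures (assumed)

# Exhaustive search: every subset of size minK..maxK may have to be tested.
exhaustive <- sum(choose(n, minK:maxK))

# Greedy search: all subsets of size minK, then, for each signature added,
# one fit per signature that is not yet in the current set.
greedy <- choose(n, minK) + sum(n - (minK:(maxK - 1)))

c(exhaustive = exhaustive, greedy = greedy)
```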

constrainToMaxContribution

(Optional) [Note: this is EXPERIMENTAL and is usually not needed!] If TRUE, the maximum contribution that can be attributed to a signature will be constrained by the variant feature counts (e.g., specific flanking bases) observed in the individual tumor genome. If, for example, 30% of all observed variants have a specific feature and 60% of the variants produced by a mutational process/signature manifest the feature, then the signature can have contributed up to 0.3/0.6 (= 0.5 or 50%) of the observed variants. The lowest possible contribution over all signature features will be taken as the allowed maximum contribution of the signature. This allowed maximum will additionally be increased by the value specified as tolerance (see below). For the illustrated example and tolerance=0.1, a contribution of up to 0.5+0.1 = 0.6 (or 60%) of the signature would be allowed.
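The arithmetic of this example can be reproduced with a few lines of base R. This is only a sketch of the rule as described here, not the package's implementation; the helper maxContribution and the two-feature toy data are hypothetical, and features that a signature never produces are skipped to avoid division by zero.

```r
# Hypothetical helper (not part of the package API): upper bound on the
# contribution of one signature to one genome.
maxContribution <- function(genomeFreq, signatureProb, tolerance = 0.1) {
    informative <- signatureProb > 0     # features the signature can produce
    ratios <- genomeFreq[informative] / signatureProb[informative]
    min(ratios) + tolerance              # lowest ratio, relaxed by tolerance
}

# 30% of the observed variants show 'featureX'; the signature produces
# 'featureX' in 60% of its variants.
genomeFreq    <- c(featureX = 0.30, other = 0.70)
signatureProb <- c(featureX = 0.60, other = 0.40)

bound <- maxContribution(genomeFreq, signatureProb, tolerance = 0.1)
bound
```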

tolerance

(Optional) If constrainToMaxContribution is TRUE, the maximum contribution computed for a signature is increased by this value (see above). If the parameter constrainToMaxContribution is FALSE, the tolerance value is ignored. Default: 0.1.

verbose

(Optional) If TRUE, some information about the processed genome and the number of signatures used will be printed.

Value

A list of signature weight vectors (also called 'exposures'), one for each tumor genome. E.g., the first vector element of the first list object is the weight/contribution of the first signature to the first tumor genome. IMPORTANT: If minExplainedVariance is specified, then the exposures of a genome will NOT be returned if the minimum explained variance is not reached within the requested minimum and maximum numbers of signatures (minNumSignatures and maxNumSignatures)! The corresponding exposure vector will be set to NULL.
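Since some list entries may be NULL, downstream code should check for them before combining the exposures. The sketch below uses a mock return value; the genome names, signature names and weights are invented for illustration, and the exact naming of the real output may differ.

```r
# Mock return value (hypothetical numbers): the second genome did not reach
# the requested minimum explained variance, so its entry is NULL.
exposures <- list(
    tumorA = c(sig1 = 0.6, sig2 = 0.4, sig3 = 0.0),
    tumorB = NULL,
    tumorC = c(sig1 = 0.1, sig2 = 0.2, sig3 = 0.7)
)

# Identify genomes for which no exposure vector was returned
failed <- vapply(exposures, is.null, logical(1))
names(exposures)[failed]

# Keep only the successfully decomposed genomes and stack them into a
# genome-by-signature matrix
decomposed     <- exposures[!failed]
exposureMatrix <- do.call(rbind, decomposed)
exposureMatrix
```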

Author(s)

Rosario M. Piro, Politecnico di Milano
Sandra Krueger, Freie Universitaet Berlin
Maintainer: Rosario M. Piro
E-Mail: <rmpiro@gmail.com> or <rosariomichael.piro@polimi.it>

References

http://rmpiro.net/decompTumor2Sig/
Krueger, Piro (2019) decompTumor2Sig: Identification of mutational signatures active in individual tumors. BMC Bioinformatics 20(Suppl 4):152.

See Also

decompTumor2Sig

Examples


### get Alexandrov signatures from COSMIC
signatures <- readAlexandrovSignatures()

### load reference genome
refGenome <- BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19

### read breast cancer genomes from Nik-Zainal et al (PMID: 22608084) 
gfile <- system.file("extdata",
         "Nik-Zainal_PMID_22608084-VCF-convertedfromMPF.vcf.gz", 
         package="decompTumor2Sig")
genomes <- readGenomesFromVCF(gfile, numBases=3, type="Alexandrov",
         trDir=FALSE, refGenome=refGenome, verbose=FALSE)

### compute exposures
exposures <- decomposeTumorGenomes(genomes, signatures, verbose=FALSE)

### (for further examples on searching subsets, please see the vignette)


rmpiro/decompTumor2Sig documentation built on May 15, 2022, 3:27 a.m.