View source: R/signatureFitMultiStepLib.R
FitMS | R Documentation |
Given a set of mutational catalogues, this function will attempt to fit mutational signatures in a multi-step manner. In the first step, only the common signatures are fitted into the samples. In the following steps, one or more rare signatures are fitted into the samples in addition to the common signatures. Common and rare signatures can be determined automatically by providing the name of an organ, or can be supplied by the user.
FitMS(
catalogues,
organ = NULL,
commonSignatureTier = "T1",
rareSignatureTier = "T2",
commonSignatures = NULL,
rareSignatures = NULL,
method = "KLD",
exposureFilterType = "fixedThreshold",
threshold_percent = 5,
threshold_nmuts = -1,
giniThresholdScaling = 10,
giniThresholdScaling_nmuts = -1,
multiStepMode = "errorReduction",
residualNegativeProp = 0.003,
minResidualMutations = NULL,
minCosSimRareSig = 0.8,
minErrorReductionPerc = 15,
minCosSimIncrease = 0.02,
useBootstrap = FALSE,
nboot = 200,
threshold_p.value = 0.05,
maxRareSigsPerSample = 1,
rareCandidateSelectionCriteria = "MaxCosSim",
nparallel = 1,
randomSeed = NULL,
verbose = FALSE
)
catalogues |
catalogues matrix, samples as columns, channels as rows |
organ |
name of an organ, which automatically sets the commonSignatures and rareSignatures parameters, so that they can be left as NULL. The following organs are available: "Biliary", "Bladder", "Bone_SoftTissue", "Breast", "CNS", "Colorectal", "Esophagus", "Head_neck", "Kidney", "Liver", "Lung", "Lymphoid", "Myeloid", "NET", "Oral_Oropharyngeal", "Ovary", "Pancreas", "Prostate", "Skin", "Stomach", "Uterus". Alternatively, set this to "Other" to use a curated set of common and rare signatures. For SBS the "Other" common signatures set contains: SBS1, SBS2, SBS3, SBS5, SBS8, SBS13, SBS17 and SBS18. |
commonSignatureTier |
is either T1, T2 or T3. The default option is T1. For each organ, T1 indicates to use the common organ-specific signatures, while T2 indicates to use the corresponding reference signatures. In general, T1 should be more appropriate for organs where there are no mixed organ-specific signatures, e.g. GEL-Ovary_common_SBS1+18, while T2 might be more suitable for when such mixed signatures are present, so that each signature can be fitted, e.g. fitting the two signatures SBS1 and SBS18, instead of a single GEL-Ovary_common_SBS1+18. T3 is an intermediate option between T1 and T2, where only the mixed organ signatures are replaced with the corresponding reference signatures. |
rareSignatureTier |
is either T0, T1, T2, T3 or T4. The default option is T2. For each organ, T0 are rare signatures that were observed in the requested organ, including low quality signatures (QC amber and red signatures). T1 are high quality (QC green) rare signatures that were observed in the requested organ. T2-T4 signatures extend the rare signatures set to what has been observed also in other organs. T2 includes all QC green signatures found in other organs, with the additional restriction in the case of SBS that the additional signatures were classified as rare at least twice in Degasperi et al. 2022 Science. T3 includes all QC green signatures (if not SBS, T3=T2). T4 includes all signatures including QC amber and red. In general we advise to use the rare T2 tier. |
commonSignatures |
signatures, signatures as columns, channels as rows. These are the signatures that are assumed to be present in most samples and will be used in the first step. Can be set automatically by specifying the organ parameter |
rareSignatures |
signatures, signatures as columns, channels as rows. These are the signatures that are assumed to be rarely present in a sample, at most maxRareSigsPerSample rare signatures in each sample. Can be set automatically by specifying the organ parameter and the rareSignatureTier parameter |
method |
KLD or NNLS |
exposureFilterType |
use either fixedThreshold or giniScaledThreshold. When using fixedThreshold, exposures will be removed based on a fixed percentage with respect to the total number of mutations (threshold_percent will be used). When using giniScaledThreshold, each signature will use a different threshold calculated as (1-Gini(signature))*giniThresholdScaling |
threshold_percent |
threshold in percentage of total mutations in a sample, only exposures larger than threshold are considered. Set it to -1 to deactivate. |
threshold_nmuts |
threshold in number of mutations in a sample, only exposures larger than threshold are considered. Set it to -1 to deactivate. |
giniThresholdScaling |
scaling factor for the threshold type giniScaledThreshold, which is based on the Gini score of a signature. The threshold is computed as (1-Gini(signature))*giniThresholdScaling, and is interpreted as the percentage of mutations in a sample that the exposure of "signature" must exceed. Set it to -1 to deactivate. |
giniThresholdScaling_nmuts |
scaling factor for the threshold type giniScaledThreshold, which is based on the Gini score of a signature. The threshold is computed as (1-Gini(signature))*giniThresholdScaling_nmuts, and is interpreted as the number of mutations in a sample that the exposure of "signature" must exceed. Set it to -1 to deactivate. |
multiStepMode |
use one of the following: "constrainedFit", "partialNMF", "errorReduction", or "cossimIncrease". |
residualNegativeProp |
maximum proportion of mutations (w.r.t. total mutations in a sample) that can be in the negative part of a residual in the constrained least squares fit, used when multiStepMode=constrainedFit |
minResidualMutations |
minimum number of mutations in a residual when using constrainedFit or partialNMF. Deactivated by default. |
minCosSimRareSig |
minimum cosine similarity between a residual and a rare signature for considering the rare signature as a candidate for a sample when using constrainedFit or partialNMF |
minErrorReductionPerc |
minimum percentage of error reduction for a signature to be considered as candidate when using the errorReduction method. The error is computed as mean absolute deviation |
minCosSimIncrease |
minimum cosine similarity increase for a signature to be considered as candidate when using the cossimIncrease method |
useBootstrap |
set to TRUE to use bootstrap |
nboot |
number of bootstraps to use; more bootstraps give more accurate results |
threshold_p.value |
p-value to determine whether an exposure is above the threshold_percent. In other words, this is the empirical probability that the exposure is lower than the threshold |
maxRareSigsPerSample |
maximum number of rare signatures that should be searched for in each sample. In most situations, leaving this at 1 should be enough. |
rareCandidateSelectionCriteria |
MaxCosSim or MinError. Whenever there is more than one rare signature that passes the multiStepMode criteria, then the best candidate rare signature is automatically selected using the rareCandidateSelectionCriteria. Candidate rare signatures can be manually selected using the function fitMerge. The parameter rareCandidateSelectionCriteria is set to MaxCosSim by default. Error is computed as the mean absolute deviation of channels. |
nparallel |
set to a value greater than 1 to use parallel computation |
randomSeed |
set an integer random seed |
verbose |
use FALSE to suppress messages |
We provide four methods to identify the rare signatures in the samples: "constrainedFit", "partialNMF", "errorReduction", or "cossimIncrease". The methods constrainedFit and partialNMF work in a similar way: they identify a residual in each given sample, as the leftover mutations after fitting the common signatures. They will attempt to produce a mostly positive residual. Each residual is then compared to each rare signature, and a rare signature is considered as a candidate rare signature for a sample if the cosine similarity between the residual and the signature is at least minCosSimRareSig. One can also request that the residual contains at least minResidualMutations mutations. While constrainedFit will use a constrained least squares fit where the negative part of the residual is at most residualNegativeProp (a proportion of the number of mutations in the sample), partialNMF will instead use a few iterations of a KLD based NMF algorithm where the matrix of the signatures contains the common signatures and one additional signature that needs to be estimated (NNLM package).
The methods errorReduction and cossimIncrease also work in a similar way to each other: they will fit the common signatures along with one additional rare signature, testing all rare signatures one at a time, and then determine the difference in error (or cosine similarity) between fitting the common signatures only and fitting them with the additional rare signature. If the error reduction is at least minErrorReductionPerc (or the cosine similarity increase is at least minCosSimIncrease), then the rare signature will be considered as a candidate.
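The errorReduction and cossimIncrease criteria described above can be illustrated with a minimal base R sketch. It does not call FitMS and is not the package's implementation: the toy catalogue, the two reconstructions, and the helper names cosSim and meanAbsDev are hypothetical, and the relative formula used for the error reduction percentage is an assumption. Only the threshold values are the FitMS defaults.

```r
# Toy example with 4 channels; all numbers are made up for illustration.
cosSim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
meanAbsDev <- function(observed, fitted) mean(abs(observed - fitted))

catalogue <- c(40, 10, 30, 20)

# Hypothetical reconstructions (signatures %*% exposures) from two fits.
reconCommonOnly <- c(30, 20, 30, 20)  # common signatures only
reconWithRare   <- c(38, 12, 30, 20)  # common signatures + one rare signature

errCommon <- meanAbsDev(catalogue, reconCommonOnly)
errRare   <- meanAbsDev(catalogue, reconWithRare)

# Assumed definition: relative reduction of the error, in percent.
errorReductionPerc <- 100 * (errCommon - errRare) / errCommon
cosSimIncrease <- cosSim(catalogue, reconWithRare) -
  cosSim(catalogue, reconCommonOnly)

# Default thresholds from the FitMS usage.
minErrorReductionPerc <- 15
minCosSimIncrease <- 0.02

isCandidateErrorReduction <- errorReductionPerc >= minErrorReductionPerc
isCandidateCosSimIncrease <- cosSimIncrease >= minCosSimIncrease
```

In this toy case the rare signature would pass both criteria; in FitMS only the criterion selected through multiStepMode is applied.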
After any of the procedures above, each sample may have multiple candidate rare signatures, so by default one is chosen according to the highest associated cosine similarity, either of the residual to the candidate rare signature (constrainedFit and partialNMF methods), or of the catalogue and the reconstructed sample (errorReduction and cossimIncrease methods). It is then possible to plot all the fits with plotFitMS and even change the choice of the candidate rare signature using the function fitMerge.
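As a rough sketch of how one candidate could be picked among several, the following base R code mimics the two rareCandidateSelectionCriteria options. The table of candidates, its values, and the helper selectCandidate are hypothetical and only illustrate the idea; the internal data structures of FitMS are not shown here.

```r
# Hypothetical per-candidate metrics for one sample (made-up values).
candidates <- data.frame(
  rareSignature = c("rareSigA", "rareSigB", "rareSigC"),
  cosSim = c(0.91, 0.95, 0.88),
  error = c(3.5, 3.9, 5.0),
  stringsAsFactors = FALSE
)

# Illustrative helper: return the name of the best candidate.
selectCandidate <- function(candidates, criteria = "MaxCosSim") {
  if (criteria == "MaxCosSim") {
    best <- which.max(candidates$cosSim)
  } else if (criteria == "MinError") {
    best <- which.min(candidates$error)
  } else {
    stop("criteria must be 'MaxCosSim' or 'MinError'")
  }
  candidates$rareSignature[best]
}

bestByCosSim <- selectCandidate(candidates, "MaxCosSim")
bestByError  <- selectCandidate(candidates, "MinError")
```

The two criteria need not agree, as in this toy table, which is one reason to inspect the alternatives with plotFitMS before accepting the automatic choice.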
A post-fit exposure filter will reduce false positive signature assignments by setting to zero exposure values that are below a certain threshold. We provide two exposureFilterType methods: fixedThreshold and giniScaledThreshold. The fixedThreshold method will set to zero exposures that are below a fixed threshold given as a percentage of the mutations in a sample (parameter threshold_percent), while the method giniScaledThreshold will use a different threshold for each signature, computed as (1-Gini(signature))*giniThresholdScaling, which will also be a percentage of the mutations in a sample.
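The two filters can be compared on a toy sample with the base R sketch below. The signatures and exposures are made up, and the gini helper uses the standard mean-absolute-difference definition of the Gini coefficient, which is an assumption: the exact Gini computation used by the package may differ. Only the threshold formulas and default values are taken from the text above.

```r
# Assumed Gini definition: mean absolute difference over twice the mean.
gini <- function(x) {
  n <- length(x)
  sum(abs(outer(x, x, "-"))) / (2 * n * sum(x))
}

# Hypothetical signatures (channels as rows, signatures as columns).
signatures <- cbind(
  flatSig   = c(0.25, 0.25, 0.25, 0.25),
  peakedSig = c(0.97, 0.01, 0.01, 0.01),
  midSig    = c(0.40, 0.30, 0.20, 0.10)
)

# Hypothetical exposures (number of mutations) for one sample.
exposures <- c(flatSig = 8, peakedSig = 4, midSig = 88)
percExposures <- 100 * exposures / sum(exposures)

# fixedThreshold: one percentage threshold for every signature.
threshold_percent <- 5
fixedFiltered <- exposures
fixedFiltered[percExposures <= threshold_percent] <- 0

# giniScaledThreshold: one percentage threshold per signature.
giniThresholdScaling <- 10
giniThresholds <- (1 - apply(signatures, 2, gini)) * giniThresholdScaling
giniFiltered <- exposures
giniFiltered[percExposures <= giniThresholds] <- 0
```

Under these assumptions the flat signature gets the strictest threshold (10%) and the peaked one the most lenient (about 2.8%), so the two filters zero out different exposures in this toy sample.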
returns the activities/exposures of the signatures in the given sample and other information
A. Degasperi, X. Zou, T. D. Amarante, ..., H. Davies, Genomics England Research Consortium, S. Nik-Zainal. Substitution mutational signatures in whole-genome-sequenced cancers in the UK population. Science, 2022.
library(signature.tools.lib)
# catalogues: matrix with channels as rows and samples as columns
res <- FitMS(catalogues, "Breast")
plotFitMS(res, "results/")