# Andrea Degasperi ad923@cam.ac.uk, Serena Nik-Zainal group, University of Cambridge, UK, 2021
#' Multi-Step Mutational Signatures Fit
#'
#' Given a set of mutational catalogues, this function will attempt fit mutational signature in a multi-step manner. In the first step, only the common signatures are fitted into the samples.
#' In the following steps, one or more rare signatures are fitted into the samples in addition to the common signatures. Common and rare signatures can be determined automatically by providing
#' the name of an organ, or can be supplied by the user.
#'
#' We provide four methods to identify the rare signatures in the samples:
#' "constrainedFit", "partialNMF", "errorReduction", or "cossimIncrease". The methods constrainedFit and partialNMF work in a similar way:
#' they identify a residual in each given sample, as the leftover mutations after fitting the common signatures. They will attempt to produce a mostly positive residual.
#' Each residual is then compared to each rare signature, and a rare signature is considered as a candidate rare signature for a sample if the cosine similarity between the residual
#' and the signature is at least minCosSimRareSig. One can also request that the residual is at least minResidualMutations. While constrainedFit will use a constrained least square fit where
#' the negative part of the residual is at most residualNegativeProp (a proportion of the number of mutations in the sample), partialNMF will instead use a few iterations of a KLD based NMF algorithm
#' where the matrix of the signatures contains the common signature and an additional signatures that needs to be estimated (NNLM package). The methods errorReduction and cossimIncrease work
#' in a similar way: they will fit the common signatures along with one additional rare signature, testing all rare signatures one at a time, and then determine difference in error
#' (or cosine similarity) between fitting the common signatures only and with the additional rare signatures. If the error reduction is at least minErrorReductionPerc (or the cosine similarity
#' increase is at least minCosSimIncrease then the rare signature will be considered as a candidate.
#'
#' After any of ghe procedures above, each sample may have multiple candidate rare signatures, so one is chosen according to the highest associated cosine similarity either of
#' the residual to the candidate rare signature (constrainedFit and partialNMF methods), or of the catalogue and the reconstructed sample (errorReduction and cossimIncrease methods).
#' It is then possible to plot all the fits with plotFitMS and even change the choise of the candidate rare signature using the function fitMerge.
#'
#' A post fit exposure filter will reduce the false positive singature assignments by setting to zero exposure values that
#' are below a certain threshold. We provide two exposureFilterType methods: fixedThreshold and giniScaledThreshold. The
#' fixedThreshold method will set to zero exposures that are below a fixed threshold given as a percentage of the mutations
#' in a sample (parameter threshold_percent), while the method giniScaledThreshold will use a different threshold for each
#' signature, computed as (1-Gini(signature))*giniThresholdScaling, which will also be a percentage of the mutations in a sample.
#'
#' @param catalogues catalogues matrix, samples as columns, channels as rows
#' @param organ #automatically sets the commonSignatures and rareSignatures parameters, which can be left as NULL. The following organs are available:
#' "Biliary", "Bladder", "Bone_SoftTissue", "Breast", "CNS", "Colorectal", "Esophagus", "Head_neck", "Kidney", "Liver", "Lung", "Lymphoid", "Myeloid",
#' "NET", "Oral_Oropharyngeal", "Ovary", "Pancreas", "Prostate", "Skin", "Stomach", "Uterus". Alternatively, set this to "Other" to use a curated set of common and rare signatures. For SBS the "Other" common signatures set contains: SBS1, SBS2, SBS3, SBS5, SBS8, SBS13, SBS17 and SBS18.
#' @param commonSignatureTier is either T1, T2 or T3. The default option is T1. For each organ, T1 indicates to use the common organ-specific signatures, while T2 indicates to use the corresponding reference signatures. In general, T1 should be more appropriate for organs where there are no mixed organ-specific signatures, e.g. GEL-Ovary_common_SBS1+18, while T2 might be more suitable for when such mixed signatures are present, so that each signature can be fitted, e.g. fitting the two signatures SBS1 and SBS18, instead of a single GEL-Ovary_common_SBS1+18. T3 is an intermediate option between T1 and T2, where only the mixed organ signatures are replaced with the corresponding reference signatures.
#' @param rareSignatureTier is either T0, T1, T2, T3 or T4. The default option is T2. For each organ, T0 are rare signatures that were observed in the requested organ, including low quality signatures (QC amber and red signatures). T1 are high quality (QC green) rare signatures that were observed in the requested organ. T2-T4 signatures extend the rare signatures set to what has been observed also in other organs. T2 includes all QC green signatures found in other organs, with the additional restriction in the case of SBS that the additional signatures were classified as rare at least twice in Degasperi et al. 2022 Science. T3 includes all QC green signatures (if not SBS, T3=T2). T4 includes all signatures including QC amber and red. In general we advise to use the rare T2 tier.
#' @param commonSignatures signatures, signatures as columns, channels as rows. These are the signatures that are assumed to be present in most samples and will be used in the first step.
#' Can be set automatically by specifying the organ parameter
#' @param rareSignatures signatures, signatures as columns, channels as rows. These are the signatures that are assumed to be rarely present in a sample, at most maxRareSigsPerSample rare signatures in each sample.
#' Can be set automatically by specifying the organ parameter and the rareSignatureTier parameter
#' @param method KLD or NNLS
#' @param exposureFilterType use either fixedThreshold or giniScaledThreshold. When using fixedThreshold, exposures will be removed based on a fixed percentage with respect to the total number of mutations (threshold_percent will be used). When using giniScaledThreshold each signature will used a different threshold calculated as (1-Gini(signature))*giniThresholdScaling
#' @param threshold_percent threshold in percentage of total mutations in a sample, only exposures larger than threshold are considered. Set it to -1 to deactivate.
#' @param threshold_nmuts threshold in number of mutations in a sample, only exposures larger than threshold are considered.Set it to -1 to deactivate.
#' @param giniThresholdScaling scaling factor for the threshold type giniScaledThreshold, which is based on the Gini score of a signature. The threshold is computed as (1-Gini(signature))*giniThresholdScaling, and will be used as a percentage of mutations in a sample that the exposure of "signature" need to be larger than. Set it to -1 to deactivate.
#' @param giniThresholdScaling_nmuts scaling factor for the threshold type giniScaledThreshold, which is based on the Gini score of a signature. The threshold is computed as (1-Gini(signature))*giniThresholdScaling_nmuts, and will be used as number of mutations in a sample that the exposure of "signature" need to be larger than. Set to -1 to deactivate.
#' @param multiStepMode use one of the following: "constrainedFit", "partialNMF", "errorReduction", or "cossimIncrease".
#' @param residualNegativeProp maximum proportion of mutations (w.r.t. total mutations in a sample) that can be in the negative part of a residual when using the constrained least squares fit
#' when using multiStepMode=constrainedFit
#' @param minResidualMutations minimum number of mutations in a residual when using constrainedFit or partialNMF. Deactivated by default.
#' @param minCosSimRareSig minimum cosine similarity between a residual and a rare signature for considering the rare signature as a candidate for a sample when using constrainedFit or partialNMF
#' @param minErrorReductionPerc minimum percentage of error reduction for a signature to be considered as candidate when using the errorReduction method. The error is computed as mean absolute deviation
#' @param minCosSimIncrease minimum cosine similarity increase for a signature to be considered as candidate when using the cossimIncrease method
#' @param useBootstrap set to TRUE to use bootstrap
#' @param nboot number of bootstraps to use, more bootstraps more accurate results
#' @param threshold_p.value p-value to determine whether an exposure is above the threshold_percent. In other words, this is the empirical probability that the exposure is lower than the threshold
#' @param maxRareSigsPerSample masimum number of rare signatures that should be serched in each sample. In most situations, leaving this at 1 should be enough.
#' @param rareCandidateSelectionCriteria MaxCosSim or MinError. Whenever there is more than one rare signature that passes the multiStepMode criteria, then the best candidate rare signature is automatically selected using the rareCandidateSelectionCriteria. Candidate rare signatures can be manually selected using the function fitMerge. The parameter rareCandidateSelectionCriteria is set to MaxCosSim by default. Error is computed as the mean absolute deviation of channels.
#' @param nparallel to use parallel specify >1
#' @param randomSeed set an integer random seed
#' @param verbose use FALSE to suppress messages
#' @return returns the activities/exposures of the signatures in the given sample and other information
#' @keywords mutational signatures fit
#' @export
#' @references A. Degasperi, X. Zou, T. D. Amarante, ..., H. Davies, Genomics England Research Consortium, S. Nik-Zainal. Substitution mutational signatures in whole-genome-sequenced cancers in the UK population. Science, 2022.
#' @examples
#' res <- FitMS(catalogues,"Breast")
#' plotFitMS(res,"results/")
FitMS <- function(catalogues,
organ = NULL, #automatically sets the common and rare signatures
commonSignatureTier = "T1",
rareSignatureTier = "T2",
commonSignatures = NULL,
rareSignatures = NULL,
method = "KLD",
exposureFilterType = "fixedThreshold", # or "giniScaledThreshold"
threshold_percent = 5,
threshold_nmuts = -1,
giniThresholdScaling = 10,
giniThresholdScaling_nmuts = -1,
multiStepMode = "errorReduction", # or "partialNMF", or "errorReduction", or "cossimIncrease"
residualNegativeProp = 0.003,
minResidualMutations = NULL,
minCosSimRareSig = 0.8,
minErrorReductionPerc = 15,
minCosSimIncrease = 0.02,
useBootstrap = FALSE,
nboot = 200,
threshold_p.value = 0.05,
maxRareSigsPerSample = 1,
rareCandidateSelectionCriteria="MaxCosSim",
nparallel = 1,
randomSeed = NULL,
verbose = FALSE){
# check type of mutations
typeofmuts <- getTypeOfMutationsFromChannels(catalogues)
# check common signatures
if(!is.null(commonSignatures)){
if(verbose) message("[info FitMS] Using user provided commonSignatures.")
}else{
if(is.null(organ)){
message("[error FitMS] No organ was specified and no commonSignatures were given. Nothing to do.")
return(NULL)
}else{
sigsToUse <- getSignaturesForFitting(organ = organ,
typemut = typeofmuts,
commontier = commonSignatureTier,
raretier = rareSignatureTier,
verbose = F)
# some checks
if(is.null(sigsToUse)){
message("[error FitMS] Common/Rare signatures for type of mutations ",typeofmuts," and organ ",organ," not available. As an alternative, you can provide your own commonSignatures.")
return(NULL)
}
if(ncol(sigsToUse$common)==0){
message("[error FitMS] Common signatures for type of mutations ",typeofmuts," and organ ",organ," not available. As an alternative, you can provide your own commonSignatures.")
return(NULL)
}
commonSignatures <- sigsToUse$common
}
}
# check rare signatures
if(!is.null(rareSignatures)){
if(verbose) message("[info FitMS] Using user provided rareSignatures.")
}else{
if(is.null(organ)){
message("[error FitMS] No organ was specified and no rareSignatures were given. Nothing to do.")
return(NULL)
}else{
sigsToUse <- getSignaturesForFitting(organ = organ,
typemut = typeofmuts,
commontier = commonSignatureTier,
raretier = rareSignatureTier,
verbose = F)
# some checks
if(is.null(sigsToUse)){
message("[error FitMS] Common/Rare signatures for type of mutations ",typeofmuts," and organ ",organ," not available. As an alternative, you can provide your own rareSignatures.")
return(NULL)
}
if(ncol(sigsToUse$rare)==0){
message("[error FitMS] Rare signatures for type of mutations ",typeofmuts," and organ ",organ," not available. As an alternative, you can provide your own rareSignatures.")
return(NULL)
}
rareSignatures <- sigsToUse$rare
}
}
# some checks
if((multiStepMode=="partialNMF" | multiStepMode=="constrainedFit") & rareCandidateSelectionCriteria=="MinError"){
message("[warning FitMS] multiStepMode set to ",multiStepMode,": changing rareCandidateSelectionCriteria to MaxCosSim.",
"The rareCandidateSelectionCriteria MinError is only used for multiStepMode errorReduction or cossimIncrease.")
rareCandidateSelectionCriteria <- "MaxCosSim"
}
# ---- now determine which samples are likely to have rare signatures and how many
# after this section, all we need are:
# - whichSamplesMayHaveRareSigs
# - candidateRareSigs
# - candidateRareSigsCosSim
commonSigsOnlyCosSim <- list()
commonSigsOnlyError <- list()
whichSamplesMayHaveRareSigs <- c()
candidateRareSigs <- list()
candidateRareSigsCosSim <- list()
candidateRareSigsError <- list()
# if you set maxRareSigsPerSample to 0, you should get the common sig fits only
if(maxRareSigsPerSample > 0){
for (depth in 1:maxRareSigsPerSample) {
# depth <- 1
for(i in 1:ncol(catalogues)){
# i <- 1
# if this is depth==1 then I am going to test all samples
# if this is depth>1 then I am going to test only the samples that have depth-1 rare signatures
# i.e. I am just going to search for one additional signature
sampleName <- colnames(catalogues)[i]
currentCatalogue <- catalogues[,sampleName,drop=F]
ncandidatesAtPreviousDepth <- 0
candidatesAtPreviousDepth <- NULL
if(depth>1) {
# check if something was found at previous depth for this sample
candidatesAtPreviousDepth <- unlist(sapply(candidateRareSigs[[sampleName]],function(x){
if(length(strsplit(x,split = ":")[[1]])==(depth-1)){
return(x)
}else{
return(NULL)
}
},USE.NAMES = F))
if(!is.null(candidatesAtPreviousDepth)) ncandidatesAtPreviousDepth <- length(candidatesAtPreviousDepth)
}
if((depth==1 | ncandidatesAtPreviousDepth>0) & (ncol(rareSignatures)-depth>=0)){
# OK we are now GO for this sample to search for additional signatures
# we are searching either only once if depth is 1, or as many times as necessary if depth is >1
# also we know we have enough rare signatures to search up to this depth
nloop <- 1
if(depth>1) nloop <- ncandidatesAtPreviousDepth
for(loopi in 1:nloop){
# loopi <- 1
if(depth==1){
currentRareSigs <- NULL
commonSigsToUse <- commonSignatures
rareSigsToUse <- rareSignatures
}else if(depth>1){
currentRareSigs <- strsplit(candidatesAtPreviousDepth[loopi],split = ":")[[1]]
# add the current rare signatures to the common set and remove them from the rare set
commonSigsToUse <- cbind(commonSignatures,rareSignatures[,currentRareSigs,drop=F])
rareSigsToUse <- rareSignatures[,setdiff(colnames(rareSignatures),currentRareSigs),drop=F]
}
# now we search with the various methods
if(multiStepMode=="partialNMF" | multiStepMode=="constrainedFit"){
#
# compute residuals
E <- flexconstr_sigfit_multipleSamples(as.matrix(commonSigsToUse),as.matrix(currentCatalogue),allmut_tolratio = residualNegativeProp)
if(multiStepMode=="constrainedFit"){
R <- currentCatalogue - as.matrix(commonSigsToUse) %*% as.matrix(E)
}else if(multiStepMode=="partialNMF"){
R <- partialNMFresidual(currentCatalogue,commonSigsToUse,E,max.iter = 50)
}
# check for min residual
addSig <- TRUE
if(!is.null(minResidualMutations)) addSig <- sum(R)>=minResidualMutations
# now get which rare signatures are the most similar
positiveR <- R
positiveR[positiveR<0] <- 0
simMatrix <- computeCorrelationOfTwoSetsOfSigs(positiveR,rareSigsToUse)
# there may be more than one signature suitable
simMatrix <- unlist(simMatrix)
pos <- which(simMatrix>=minCosSimRareSig)
if(length(pos)!=0 & addSig){
if(depth==1){
candidateRareSigs[[colnames(catalogues)[i]]] <- colnames(rareSigsToUse)[pos]
candidateRareSigsCosSim[[colnames(catalogues)[i]]] <- simMatrix[pos]
names(candidateRareSigsCosSim[[colnames(catalogues)[i]]]) <- candidateRareSigs[[colnames(catalogues)[i]]]
whichSamplesMayHaveRareSigs <- c(whichSamplesMayHaveRareSigs,i)
}else if(depth>1){
candidateRareSigs[[colnames(catalogues)[i]]] <- c(candidateRareSigs[[colnames(catalogues)[i]]],paste(currentRareSigs,colnames(rareSigsToUse)[pos],sep=":"))
candidateRareSigsCosSim[[colnames(catalogues)[i]]] <- c(candidateRareSigsCosSim[[colnames(catalogues)[i]]],simMatrix[pos])
names(candidateRareSigsCosSim[[colnames(catalogues)[i]]]) <- candidateRareSigs[[colnames(catalogues)[i]]]
}
}
}else if(multiStepMode=="errorReduction" | multiStepMode=="cossimIncrease"){
# get first fit with all
quickFitCommon <- Fit(catalogues = currentCatalogue,
signatures = commonSigsToUse,
nboot = nboot,
exposureFilterType = "fixedThreshold",
threshold_percent = -1,
threshold_nmuts = -1,
threshold_p.value = threshold_p.value,
method = method,
useBootstrap = FALSE,
nparallel = nparallel,
randomSeed = randomSeed)
quickFit <- list()
for(j in 1:ncol(rareSigsToUse)){
# j <- 1
if(verbose) message("[info FitMS] multiStepMode ",multiStepMode,": depth ",depth,", sample ",i," of ",ncol(catalogues)," fitting rare signature ",j," of ",ncol(rareSigsToUse),"")
quickFit[[j]] <- Fit(catalogues = currentCatalogue,
signatures = cbind(commonSigsToUse,rareSigsToUse[,j,drop=F]),
nboot = nboot,
exposureFilterType = "fixedThreshold",
threshold_percent = -1,
threshold_nmuts = -1,
threshold_p.value = threshold_p.value,
method = method,
useBootstrap = FALSE,
nparallel = nparallel,
randomSeed = randomSeed)
}
commonError <- MAD(currentCatalogue,as.matrix(commonSigsToUse) %*% as.matrix(t(quickFitCommon$exposures[,1:(ncol(quickFitCommon$exposures)-1),drop=F])))
commonCosSim <- cos_sim(currentCatalogue,as.matrix(commonSigsToUse) %*% as.matrix(t(quickFitCommon$exposures[,1:(ncol(quickFitCommon$exposures)-1),drop=F])))
if(depth==1){
commonSigsOnlyCosSim[[colnames(catalogues)[i]]] <- commonCosSim
commonSigsOnlyError[[colnames(catalogues)[i]]] <- commonError
}
allError <- sapply(1:length(quickFit), function(j) {
MAD(currentCatalogue,as.matrix(cbind(commonSigsToUse,rareSigsToUse[,j,drop=F])) %*% as.matrix(t(quickFit[[j]]$exposures[,1:(ncol(quickFit[[j]]$exposures)-1),drop=F])))
})
allCosSim <- sapply(1:length(quickFit), function(j) {
cos_sim(currentCatalogue,as.matrix(cbind(commonSigsToUse,rareSigsToUse[,j,drop=F])) %*% as.matrix(t(quickFit[[j]]$exposures[,1:(ncol(quickFit[[j]]$exposures)-1),drop=F])))
})
# error
errorRed <- (commonError-allError)/commonError*100
cossimIncr <- allCosSim-commonCosSim
if(multiStepMode=="errorReduction"){
pos <- which(errorRed>=minErrorReductionPerc)
}else if(multiStepMode=="cossimIncrease"){
pos <- which(cossimIncr>=minCosSimIncrease)
}
if(length(pos)!=0){
if(depth==1){
candidateRareSigs[[colnames(catalogues)[i]]] <- colnames(rareSigsToUse)[pos]
candidateRareSigsCosSim[[colnames(catalogues)[i]]] <- allCosSim[pos]
candidateRareSigsError[[colnames(catalogues)[i]]] <- allError[pos]
names(candidateRareSigsCosSim[[colnames(catalogues)[i]]]) <- candidateRareSigs[[colnames(catalogues)[i]]]
names(candidateRareSigsError[[colnames(catalogues)[i]]]) <- candidateRareSigs[[colnames(catalogues)[i]]]
whichSamplesMayHaveRareSigs <- c(whichSamplesMayHaveRareSigs,i)
}else if(depth>1){
candidateRareSigs[[colnames(catalogues)[i]]] <- c(candidateRareSigs[[colnames(catalogues)[i]]],paste(currentRareSigs,colnames(rareSigsToUse)[pos],sep=":"))
candidateRareSigsCosSim[[colnames(catalogues)[i]]] <- c(candidateRareSigsCosSim[[colnames(catalogues)[i]]],allCosSim[pos])
candidateRareSigsError[[colnames(catalogues)[i]]] <- c(candidateRareSigsError[[colnames(catalogues)[i]]],allError[pos])
names(candidateRareSigsCosSim[[colnames(catalogues)[i]]]) <- candidateRareSigs[[colnames(catalogues)[i]]]
names(candidateRareSigsError[[colnames(catalogues)[i]]]) <- candidateRareSigs[[colnames(catalogues)[i]]]
}
}
}
# --- done with the methods
}
}
}
}
}
# at this point it would be good to have a remove duplicates step
allcomb <- function(x){
tmpx <- strsplit(x,split = ":")[[1]]
sapply(combinat::permn(tmpx),function(y) paste(y,collapse = ":"))
}
tmpcandidateRareSigs <- list()
tmpcandidateRareSigsCosSim <- list()
tmpcandidateRareSigsError <- list()
for (s in names(candidateRareSigs)){
# s <- names(candidateRareSigs)[1]
for (si in 1:length(candidateRareSigs[[s]])){
# si <- 1
if(!any(allcomb(candidateRareSigs[[s]][si]) %in% tmpcandidateRareSigs[[s]])){
tmpcandidateRareSigs[[s]] <- c(tmpcandidateRareSigs[[s]],candidateRareSigs[[s]][si])
tmpcandidateRareSigsCosSim[[s]] <- c(tmpcandidateRareSigsCosSim[[s]],candidateRareSigsCosSim[[s]][si])
tmpcandidateRareSigsError[[s]] <- c(tmpcandidateRareSigsError[[s]],candidateRareSigsError[[s]][si])
}
}
}
candidateRareSigs <- tmpcandidateRareSigs
candidateRareSigsCosSim <- tmpcandidateRareSigsCosSim
candidateRareSigsError <- tmpcandidateRareSigsError
samples <- list()
for(i in 1:ncol(catalogues)){
# i <- 1
sampleName <- colnames(catalogues)[i]
currentCatalogue <- catalogues[,sampleName,drop=F]
samples[[sampleName]] <- list()
samples[[sampleName]]$catalogue <- currentCatalogue
if(verbose) message("[info FitMS] fitting sample ",i," of ",ncol(catalogues),": ",sampleName)
# begin by fitting
samples[[sampleName]]$fitCommonOnly <- Fit(catalogues = currentCatalogue,
signatures = commonSignatures,
nboot = nboot,
exposureFilterType = exposureFilterType,
giniThresholdScaling = giniThresholdScaling,
giniThresholdScaling_nmuts = giniThresholdScaling_nmuts,
threshold_percent = threshold_percent,
threshold_nmuts = threshold_nmuts,
threshold_p.value = threshold_p.value,
method = method,
useBootstrap = useBootstrap,
nparallel = nparallel,
randomSeed = randomSeed)
# now check if I should try to fit some rare signatures as well
if(i %in% whichSamplesMayHaveRareSigs){
samples[[sampleName]]$fitWithRare <- list()
for (j in 1:length(candidateRareSigs[[sampleName]])){
# j <- 1
# consider the case in which >1 signatures are possible and separated by :
currentRareSigName <- strsplit(candidateRareSigs[[sampleName]][j],split = ":")[[1]]
currentRareSig <- rareSignatures[,currentRareSigName,drop=F]
samples[[sampleName]]$fitWithRare[[paste(currentRareSigName,collapse = ":")]] <- Fit(catalogues = currentCatalogue,
signatures = cbind(commonSignatures,currentRareSig),
nboot = nboot,
exposureFilterType = exposureFilterType,
giniThresholdScaling = giniThresholdScaling,
giniThresholdScaling_nmuts = giniThresholdScaling_nmuts,
threshold_percent = threshold_percent,
threshold_nmuts = threshold_nmuts,
threshold_p.value = threshold_p.value,
method = method,
useBootstrap = useBootstrap,
nparallel = nparallel,
randomSeed = randomSeed)
}
}
}
# collect all info
resObj <- list()
# add data and options used
resObj$fitAlgorithm <- "FitMS"
resObj$catalogues <- catalogues
resObj$organ <- organ
resObj$commonSignatures <- commonSignatures
resObj$rareSignatures <- rareSignatures
resObj$method <- method
resObj$exposureFilterType <- exposureFilterType
resObj$multiStepMode <- multiStepMode
resObj$giniThresholdScaling <- giniThresholdScaling
resObj$giniThresholdScaling_nmuts <- giniThresholdScaling_nmuts
if(exposureFilterType=="giniScaledThreshold"){
resObj$giniThresholdsTable <- getGiniThresholds(signatures = cbind(commonSignatures,rareSignatures),
giniThresholdScaling = giniThresholdScaling,
giniThresholdScaling_nmuts = giniThresholdScaling_nmuts)
}
resObj$minErrorReductionPerc <- minErrorReductionPerc
resObj$minCosSimIncrease <-minCosSimIncrease
resObj$threshold_percent <- threshold_percent
resObj$threshold_nmuts <- threshold_nmuts
resObj$residualNegativeProp <- residualNegativeProp
resObj$minResidualMutations <- minResidualMutations
resObj$minCosSimRareSig <- minCosSimRareSig
resObj$useBootstrap <- useBootstrap
resObj$nboot <- nboot
resObj$threshold_p.value <- threshold_p.value
resObj$commonSignatureTier <- commonSignatureTier
resObj$rareSignatureTier <- rareSignatureTier
resObj$maxRareSigsPerSample <- maxRareSigsPerSample
resObj$rareCandidateSelectionCriteria <- rareCandidateSelectionCriteria
if(!is.null(randomSeed)) resObj$randomSeed <- randomSeed
resObj$nparallel <- nparallel
# add all fit results and info
resObj$whichSamplesMayHaveRareSigs <- colnames(catalogues)[whichSamplesMayHaveRareSigs]
resObj$candidateRareSigs <- candidateRareSigs
resObj$candidateRareSigsCosSim <- candidateRareSigsCosSim
resObj$candidateRareSigsError <- candidateRareSigsError
resObj$commonSigsOnlyError <- commonSigsOnlyError
resObj$commonSigsOnlyCosSim <- commonSigsOnlyCosSim
resObj$samples <- samples
# compute fit merge
resObj <- fitMerge(resObj = resObj,
rareCandidateSelectionCriteria = rareCandidateSelectionCriteria)
return(resObj)
}
#' Signature Fit
#'
#' This function provides basic signature fit functionalities.
#' Fit a given set of mutational signatures into mutational catalogues to estimate
#' the activity/exposure of each of the given signatures in the catalogues.
#'
#' This is a standard interface to signature fit functions with/without bootstrap. The object returned by this
#' function can be passed to the plotFit() function for automated plotting of the results.
#'
#' A post fit exposure filter will reduce the false positive singature assignments by setting to zero exposure values that
#' are below a certain threshold. We provide two exposureFilterType methods: fixedThreshold and giniScaledThreshold. The
#' fixedThreshold method will set to zero exposures that are below a fixed threshold given as a percentage of the mutations
#' in a sample (parameter threshold_percent), while the method giniScaledThreshold will use a different threshold for each
#' signature, computed as (1-Gini(signature))*giniThresholdScaling, which will also be a percentage of the mutations in a sample.
#'
#' @param catalogues catalogues matrix, samples as columns, channels as rows
#' @param signatures mutational signatures to be fitted into the sample catalgues, signatures as columns and channels as rows
#' @param method KLD or NNLS
#' @param exposureFilterType use either fixedThreshold or giniScaledThreshold. When using fixedThreshold, exposures will be removed based on a fixed percentage with respect to the total number of mutations (threshold_percent will be used). When using giniScaledThreshold each signature will used a different threshold calculated as (1-Gini(signature))*giniThresholdScaling
#' @param threshold_percent threshold in percentage of total mutations in a sample, only exposures larger than threshold are considered. Set it to -1 to deactivate.
#' @param threshold_nmuts threshold in number of mutations in a sample, only exposures larger than threshold are considered.Set it to -1 to deactivate.
#' @param giniThresholdScaling scaling factor for the threshold type giniScaledThreshold, which is based on the Gini score of a signature. The threshold is computed as (1-Gini(signature))*giniThresholdScaling, and will be used as a percentage of mutations in a sample that the exposure of "signature" need to be larger than. Set it to -1 to deactivate.
#' @param giniThresholdScaling_nmuts scaling factor for the threshold type giniScaledThreshold, which is based on the Gini score of a signature. The threshold is computed as (1-Gini(signature))*giniThresholdScaling_nmuts, and will be used as number of mutations in a sample that the exposure of "signature" need to be larger than. Set to -1 to deactivate.
#' @param useBootstrap set to TRUE to use the signature fit with bootstrap method
#' @param nboot number of bootstraps to use, more bootstraps more accurate results (use only when useBootstrap=TRUE)
#' @param threshold_p.value p-value to determine whether an exposure is above the threshold_percent.
#' In other words, this is the empirical probability that the exposure is lower than the threshold (use only when useBootstrap=TRUE)
#' @param nparallel to use parallel specify >1
#' @param randomSeed set an integer random seed (use only when useBootstrap=TRUE)
#' @param verbose use FALSE to suppress messages
#' @return returns the activities/exposures of the signatures in the given sample and other information
#' @keywords mutational signatures fit
#' @export
#' @examples
#' res <- Fit(catalogues,getOrganSignatures("Breast"))
#' plotFit(res,"results/")
Fit <- function(catalogues,
signatures,
exposureFilterType = "fixedThreshold", # or "giniScaledThreshold"
giniThresholdScaling = 10,
giniThresholdScaling_nmuts = -1,
threshold_percent = 5,
threshold_nmuts = -1,
method = "KLD",
useBootstrap = FALSE,
nboot = 200,
threshold_p.value = 0.05,
nparallel = 1,
randomSeed = NULL,
verbose = FALSE){
# initialise return object
fitRes <- list()
fitRes$fitAlgorithm <- "Fit"
if(useBootstrap){
# compute the full results
resFitBoot <- SignatureFit_withBootstrap(cat = catalogues,
signature_data_matrix = signatures,
nboot = nboot,
exposureFilterType = exposureFilterType,
threshold_percent = threshold_percent,
threshold_nmuts = threshold_nmuts,
giniThresholdScaling = giniThresholdScaling,
giniThresholdScaling_nmuts = giniThresholdScaling_nmuts,
threshold_p.value = threshold_p.value,
method = method,
verbose = verbose,
doRound = FALSE,
nparallel = nparallel,
randomSeed = randomSeed,
showDeprecated = F)
fitRes$exposures <- resFitBoot$E_median_filtered
fitRes$unassigned_muts <- apply(catalogues,2,sum) - apply(fitRes$exposures,2,sum)
fitRes$unassigned_muts_perc <- fitRes$unassigned_muts/apply(catalogues,2,sum)*100
fitRes$bootstrap_exposures <- resFitBoot$boot_list
fitRes$bootstrap_exposures_samples <- resFitBoot$samples_list
fitRes$bootstrap_exposures_pvalues <- resFitBoot$E_p.values
fitRes$nboot <- nboot
fitRes$threshold_p.value <- threshold_p.value
# save also the bootstrap exposure correlations
if(ncol(signatures)>1){
fitRes$bootstrap_exposure_correlations <- list()
for(p in names(fitRes$bootstrap_exposures_samples)){
fitRes$bootstrap_exposure_correlations[[p]] <- suppressWarnings(cor(t(fitRes$bootstrap_exposures_samples[[p]]),method = "spearman"))
}
}
}else{
resFit <- SignatureFit(cat =catalogues,
signature_data_matrix = signatures,
method = method,
doRound = FALSE,
verbose = verbose,
showDeprecated = F)
if(exposureFilterType=="giniScaledThreshold"){
sigInvGini <- 1 - apply(signatures,2,giniCoeff)
giniThresholdPerc <- giniThresholdScaling*sigInvGini
giniThresholdNMUTS <- giniThresholdScaling_nmuts*sigInvGini
# set to zero differently for each signature
for(i in 1:length(giniThresholdPerc)) {
eselection <- rep(FALSE,ncol(resFit))
if(giniThresholdScaling >= 0) eselection <- eselection | resFit[i,]/apply(catalogues,2,sum)*100 <= giniThresholdPerc[i]
if(giniThresholdScaling_nmuts >= 0) eselection <- eselection | resFit[i,] <= giniThresholdNMUTS[i]
resFit[i,eselection] <- 0
}
}else if(exposureFilterType=="fixedThreshold"){
eselection <- matrix(FALSE,ncol = ncol(resFit),nrow = nrow(resFit),dimnames = list(rownames(resFit),colnames(resFit)))
if(threshold_percent >= 0) eselection <- eselection | resFit/matrix(apply(catalogues,2,sum),ncol = ncol(resFit),nrow = nrow(resFit),byrow = T)*100 <= threshold_percent
if(threshold_nmuts >= 0) eselection <- eselection | resFit <= threshold_nmuts
resFit[eselection] <- 0
}
fitRes$exposures <- resFit
fitRes$unassigned_muts <- apply(catalogues,2,sum) - apply(fitRes$exposures,2,sum)
fitRes$unassigned_muts_perc <- fitRes$unassigned_muts/apply(catalogues,2,sum)*100
fitRes$bootstrap_exposures <- NULL
fitRes$bootstrap_exposures_samples <- NULL
fitRes$bootstrap_exposures_pvalues <- NULL
fitRes$nboot <- NULL
fitRes$threshold_p.value <- NULL
}
fitRes$useBootstrap <- useBootstrap
fitRes$exposureFilterType <- exposureFilterType
fitRes$giniThresholdScaling <- giniThresholdScaling
fitRes$giniThresholdScaling_nmuts <- giniThresholdScaling_nmuts
if(exposureFilterType=="giniScaledThreshold") {
fitRes$giniThresholdsTable <- getGiniThresholds(signatures = signatures,
giniThresholdScaling = giniThresholdScaling,
giniThresholdScaling_nmuts = giniThresholdScaling_nmuts)
}
fitRes$threshold_percent <- threshold_percent
fitRes$threshold_nmuts <- threshold_nmuts
fitRes$catalogues <- catalogues
fitRes$signatures <- signatures
fitRes$method <- method
reconstructed <- as.matrix(signatures) %*% fitRes$exposures
cossim_catalogueVSreconstructed <- sapply(1:ncol(reconstructed),function(i) cos_sim(catalogues[,i],reconstructed[,i]))
names(cossim_catalogueVSreconstructed) <- colnames(catalogues)
fitRes$exposures <- t(rbind(fitRes$exposures,unassigned=fitRes$unassigned_muts))
fitRes$reconstructed_catalogues <- reconstructed
fitRes$cossim_catalogueVSreconstructed <- cossim_catalogueVSreconstructed
if(!is.null(randomSeed)) fitRes$randomSeed <- randomSeed
fitRes$nparallel <- nparallel
return(fitRes)
}
#' Selecting and merging signature fit results from Multi-Step signature fit
#'
#' This function is used to select candidate rare signature solutions from a result object obtained
#' using the multi-step signature fit FitMS function.
#'
#' When running FitMS, some samples may have multiple candidate rare signatures or rare signature combinations that
#' fit the sample reasonably well. This function selects the best rare signature for each sample based on
#' highest cosine similarity or lowest error (mean absolute difference of channels) and returns
#' an updated object with a summary exposures composed by each selected fit solution for each sample.
#'
#' The choice of rare signature can be changed using the parameter forceRareSigChoice.
#'
#' @param resObj result object obtained from the FitMS function
#' @param rareCandidateSelectionCriteria MaxCosSim or MinError. Whenever there is more than one rare signature that passes the multiStepMode criteria, then the best candidate rare signature is automatically selected using the rareCandidateSelectionCriteria. Candidate rare signatures can be manually selected using the function fitMerge. The parameter rareCandidateSelectionCriteria is set to MaxCosSim by default. Error is computed as the mean absolute deviation of channels.
#' @param forceRareSigChoice if NULL this function will select the rare signature candidate with the highest associated cosine similarity.
#' If no rare signature is found, then the solution with only the common signatures is selected.
#' To select specific candidates, specify them in the forceRareSigChoice list object, in the form forceRareSigChoice[["sample_name"]] <- "rareSigName".
#' To select the solution with only the common signatures for a sample use forceRareSigChoice[["sample_name"]] <- "common"
#' @return returns the updated resObj object with updated exposures and rareSigChoice objects.
#' If bootstrap was used, bootstraps of selected solutions can be found in the variable bootstrap_exposures_samples
#' @export
fitMerge <- function(resObj,
rareCandidateSelectionCriteria="MaxCosSim",
forceRareSigChoice=NULL){
# some checks
validSelectionCriteria <- c("MaxCosSim","MinError")
if(!resObj$rareCandidateSelectionCriteria %in% validSelectionCriteria){
message("[error fitMerge] rareCandidateSelectionCriteria ",rareCandidateSelectionCriteria," not valid, please select one of the following: ",paste(validSelectionCriteria,collapse = ", "),".")
return(resObj)
}
if(rareCandidateSelectionCriteria != resObj$rareCandidateSelectionCriteria){
message("[info fitMerge] changing rareCandidateSelectionCriteria from ",resObj$rareCandidateSelectionCriteria," to ",rareCandidateSelectionCriteria,", as requested.")
resObj$rareCandidateSelectionCriteria <- rareCandidateSelectionCriteria
}
if((resObj$multiStepMode=="partialNMF" | resObj$multiStepMode=="constrainedFit") & rareCandidateSelectionCriteria=="MinError"){
message("[warning FitMS] multiStepMode set to ",resObj$multiStepMode,": changing rareCandidateSelectionCriteria to MaxCosSim. ",
"The rareCandidateSelectionCriteria MinError is only used for multiStepMode errorReduction or cossimIncrease.")
rareCandidateSelectionCriteria <- "MaxCosSim"
resObj$rareCandidateSelectionCriteria <- rareCandidateSelectionCriteria
}
# build exposure matrix with all signatures in the columns
# will remove the signatures not present at the end
rareSigChoice <- list()
exposures_merge <- matrix(0,ncol = ncol(resObj$commonSignatures)+ncol(resObj$rareSignatures)+1,nrow = ncol(resObj$catalogues),
dimnames = list(colnames(resObj$catalogues),c(colnames(resObj$commonSignatures),colnames(resObj$rareSignatures),"unassigned")))
cossim_merge <- c()
reconstructed_merge <- data.frame(row.names = rownames(resObj$catalogues),
stringsAsFactors = F,
check.names = F)
bootstrap_exposures_samples <- list()
bootstrap_exposure_correlations <- list()
for (i in 1:ncol(resObj$catalogues)){
# i <- 1
currentSample <- colnames(resObj$catalogues)[i]
# check for attempts to force rare signature choice
forceRareSig <- NULL
forceCommon <- FALSE
if(!is.null(forceRareSigChoice[[currentSample]])){
if(forceRareSigChoice[[currentSample]]=="common"){
forceCommon <- TRUE
}else if(!is.null(resObj$samples[[currentSample]]$fitWithRare[[forceRareSigChoice[[currentSample]]]])){
forceRareSig <- forceRareSigChoice[[currentSample]]
}else{
message("[warning fitMerge] Attempt to force rare sig ",forceRareSigChoice[[currentSample]]," for sample ",currentSample," failed because the rare signature is not in the fitWithRare results for this sample.")
}
}
if(currentSample %in% resObj$whichSamplesMayHaveRareSigs & !forceCommon){
# select best candidate rare signature
highestCosSimSig <- resObj$candidateRareSigs[[currentSample]][which.max(resObj$candidateRareSigsCosSim[[currentSample]])]
lowestErrorSig <- resObj$candidateRareSigs[[currentSample]][which.min(resObj$candidateRareSigsError[[currentSample]])]
selectedSig <- highestCosSimSig
if(rareCandidateSelectionCriteria=="MinError") selectedSig <- lowestErrorSig
# check for forced selection
if(!is.null(forceRareSig)) selectedSig <- forceRareSig
rareSigChoice[[currentSample]] <- selectedSig
selectedExp <- t(resObj$samples[[currentSample]]$fitWithRare[[selectedSig]]$exposures)
exposures_merge[currentSample,rownames(selectedExp)] <- selectedExp
# collect bootstraps
if(resObj$useBootstrap) bootstrap_exposures_samples[[currentSample]] <- resObj$samples[[currentSample]]$fitWithRare[[selectedSig]]$bootstrap_exposures_samples[[1]]
if(resObj$useBootstrap & ncol(resObj$samples[[currentSample]]$fitWithRare[[selectedSig]]$signatures)>1){
bootstrap_exposure_correlations[[currentSample]] <- resObj$samples[[currentSample]]$fitWithRare[[selectedSig]]$bootstrap_exposure_correlations[[1]]
}
# collect cos sim
cossim_merge <- c(cossim_merge,resObj$samples[[currentSample]]$fitWithRare[[selectedSig]]$cossim_catalogueVSreconstructed)
# collect reconstructed
reconstructed_merge <- cbind(reconstructed_merge,resObj$samples[[currentSample]]$fitWithRare[[selectedSig]]$reconstructed_catalogues)
}else{
if(forceCommon) rareSigChoice[[currentSample]] <- "commonOnly"
selectedExp <- t(resObj$samples[[currentSample]]$fitCommonOnly$exposures)
exposures_merge[currentSample,rownames(selectedExp)] <- selectedExp
# collect bootstraps
if(resObj$useBootstrap) bootstrap_exposures_samples[[currentSample]] <- resObj$samples[[currentSample]]$fitCommonOnly$bootstrap_exposures_samples[[1]]
if(resObj$useBootstrap & ncol(resObj$samples[[currentSample]]$fitCommonOnly$signatures)>1){
bootstrap_exposure_correlations[[currentSample]] <- resObj$samples[[currentSample]]$fitCommonOnly$bootstrap_exposure_correlations[[1]]
}
# collect cos sim
cossim_merge <- c(cossim_merge,resObj$samples[[currentSample]]$fitCommonOnly$cossim_catalogueVSreconstructed)
# collect reconstructed
reconstructed_merge <- cbind(reconstructed_merge,resObj$samples[[currentSample]]$fitCommonOnly$reconstructed_catalogues)
}
}
resObj$exposures <- exposures_merge[,apply(exposures_merge, 2, sum)>0,drop=F]
resObj$rareSigChoice <- rareSigChoice
if(resObj$useBootstrap){
# save the bootstraps organised by samples
resObj$bootstrap_exposures_samples <- bootstrap_exposures_samples
# also save the bootstraps organised by bootstrap
nboots <- ncol(bootstrap_exposures_samples[[1]])
sample_names <- names(bootstrap_exposures_samples)
sigslist <- c()
for(sn in sample_names) sigslist <- union(sigslist,rownames(bootstrap_exposures_samples[[sn]]))
bootstrap_exposures <- list()
for(i in 1:nboots){
current_subs <- matrix(0,nrow = length(sigslist),ncol = length(sample_names),dimnames = list(sigslist,sample_names))
for(sn in sample_names) {
current_subs[rownames(bootstrap_exposures_samples[[sn]]),sn] <- bootstrap_exposures_samples[[sn]][,i]
}
bootstrap_exposures[[i]] <- current_subs
}
resObj$bootstrap_exposures <- bootstrap_exposures
resObj$bootstrap_exposure_correlations <- bootstrap_exposure_correlations
}else{
resObj$bootstrap_exposures_samples <- NULL
}
resObj$cossim_catalogueVSreconstructed <- cossim_merge
resObj$reconstructed_catalogues <- reconstructed_merge
return(resObj)
}
#' getSignaturesForFitting
#'
#' Given a tissue type/organ, this function returns the common and rare mutational signatures
#' to be used with FitMS, as suggested in Degasperi et al. 2022, Science paper.
#'
#' @param organ organs available are: "Biliary", "Bladder", "Bone_SoftTissue", "Breast", "CNS", "Colorectal", "Esophagus", "Head_neck", "Kidney", "Liver", "Lung", "Lymphoid", "Myeloid", "NET", "Oral_Oropharyngeal", "Ovary", "Pancreas", "Prostate", "Skin", "Stomach", "Uterus"
#' @param typemut only subs and DNV supported at the moment
#' @param commontier is either T1, T2 or T3. The default option is T1. For each organ, T1 indicates to use the common organ-specific signatures, while T2 indicates to use the corresponding reference signatures. In general, T1 should be more appropriate for organs where there are no mixed organ-specific signatures, e.g. GEL-Ovary_common_SBS1+18, while T2 might be more suitable for when such mixed signatures are present, so that each signature can be fitted, e.g. fitting the two signatures SBS1 and SBS18, instead of a single GEL-Ovary_common_SBS1+18. T3 is an intermediate option between T1 and T2, where only the mixed organ signatures are replaced with the corresponding reference signatures.
#' @param raretier is either T0, T1, T2, T3 or T4. The default option is T2. For each organ, T0 are rare signatures that were observed in the requested organ, including low quality signatures (QC amber and red signatures). T1 are high quality (QC green) rare signatures that were observed in the requested organ. T2-T4 signatures extend the rare signatures set to what has been observed also in other organs. T2 includes all QC green signatures found in other organs, with the additional restriction in the case of SBS that the additional signatures were classified as rare at least twice in Degasperi et al. 2022 Science. T3 includes all QC green signatures (if not SBS, T3=T2). T4 includes all signatures including QC amber and red. In general we advise to use the rare T2 tier.
#' @return list of signatures matrix
#' @export
getSignaturesForFitting <- function(organ,typemut="subs",commontier="T1",raretier="T2",verbose = TRUE){
sigs <- NULL
commonChoice <- paste0("common",commontier)
rareChoice <- paste0("rare",raretier)
signames_common <- NULL
signames_rare <- NULL
sigsForFittingTable <- NULL
allOrganAndReferenceSignatures <- NULL
if(typemut=="subs"){
sigsForFittingTable <- sigsForFittingSBSv2.03
allOrganAndReferenceSignatures <- cbind(organSignaturesSBSv2.03,referenceSignaturesSBSv2.03)
}else if(typemut=="DNV"){
sigsForFittingTable <- sigsForFittingDBSv1.01
allOrganAndReferenceSignatures <- cbind(organSignaturesDBSv1.01,referenceSignaturesDBSv1.01)
}else{
message("[error getSignaturesForFitting] Common/Rare signatures for fitting not available for mutation type \"",typemut, "\".")
return(NULL)
}
# OK get the signatures
organsAvail <- rownames(sigsForFittingTable)
if(organ %in% organsAvail){
sigs <- list()
if(commonChoice %in% colnames(sigsForFittingTable)) signames_common <- strsplit(sigsForFittingTable[organ,commonChoice],split = ",")[[1]]
if(commonChoice %in% colnames(sigsForFittingTable)) signames_rare <- strsplit(sigsForFittingTable[organ,rareChoice],split = ",")[[1]]
sigs[["common"]] <- allOrganAndReferenceSignatures[,signames_common,drop=F]
sigs[["rare"]] <- allOrganAndReferenceSignatures[,signames_rare,drop=F]
}else{
message("[error getSignaturesForFitting] Common/Rare signatures for fitting not available for mutation type ",typemut, ", organ ",
organ, ". Organ requested not available. Please select one of: ",paste(organsAvail,collapse = ", "))
return(NULL)
}
if(is.null(sigs)){
message("[error getSignaturesForFitting] Signatures not available for mutation type ",typemut, ", organ ",organ,", common tier ",commontier,", rare tier ",raretier, ".")
return(NULL)
}else{
if(ncol(sigs$common)==0 & verbose) message("[warning getSignaturesForFitting] Common signatures not available for mutation type ",typemut, ", organ ",organ,", common tier ",commontier,".")
if(ncol(sigs$rare)==0 & verbose) message("[warning getSignaturesForFitting] Rare signatures not available for mutation type ",typemut, ", organ ",organ,", rare tier ",raretier,".")
}
return(sigs)
}
flexconstr_sigfit <- function(P,m, mut_tol=0, allmut_tolratio=0.01){
G <- -P
H <- -m
# test to flexibilise the constraints on the residual part
allmut <-sum(m)
H<- -m-mut_tol*m-allmut_tolratio*allmut
# G<-NULL
# H<-NULL
# add positivity requirement for exposures e >=0
if (ncol(P)>1){
coef_nn_e<-diag(1,ncol(P))
rside_nn_e <- rep(0,ncol(P))
# summarising both constraints into matrices G and H
G <- rbind(G,coef_nn_e)
H <- c(H,rside_nn_e)
}else{
G <- c(G,1)
H <- c(H,0)
}
# solving least squares on (Pe-m)^2 with the constraints (-Pe>=-m-mut_tol-allmut_tolratio*) and (e >=0) => Gx>=h
e <- limSolve::lsei(A = P, B= m, G = G,
H = H, E=NULL, F=NULL, Wx = NULL, Wa = NULL, type = 2, tol = sqrt(.Machine$double.eps),
tolrank = NULL, fulloutput = TRUE, verbose = TRUE)
return(e)
}
flexconstr_sigfit_multipleSamples <- function(P,M, mut_tol=0, allmut_tolratio=0.01){
E <- list()
for(i in 1:ncol(M)){
m <- M[,i,drop=F]
e <- flexconstr_sigfit(P,m, mut_tol, allmut_tolratio)
E[[colnames(M)[i]]] <- as.matrix(e$X)
}
E <- do.call(cbind,E)
colnames(E) <- colnames(M)
return(E)
}
partialNMFresidual <- function(catalogues,commonSignatures,E,max.iter = 100){
Rlist <- list()
for (i in 1:ncol(E)){
# i <- 1
res.nnmf <- NNLM::nnmf(as.matrix(catalogues[,i,drop=F]),
init = list(W0 = as.matrix(commonSignatures),H1 = as.matrix(E[,i,drop=F])),
loss = "mkl",
method = "lee",
k = 1,
max.iter = max.iter,check.k = FALSE,show.warning = F,verbose = F)
currentR <- apply(res.nnmf$W[,1,drop=F],2,function(x)x/sum(x))
colnames(currentR) <- colnames(catalogues)[i]
Rlist[[i]] <- currentR
}
return(do.call(cbind,Rlist))
}
#' @export
giniCoeff <- function(x){
meanx <- mean(x)
sumxdiff <- 0
for (i in x) {
for (j in x){
sumxdiff <- sumxdiff + abs(i-j)
}
}
return((sumxdiff)/(2*(length(x)^2)*meanx))
}
#' Get Gini Thresholds
#'
#' Given a matrix of mutational signatures and Gini threshold scaling factors,
#' return the value of the thresholds.
#'
#' @param signatures matrix of mutational signatures, with signatures as columns and channels as rows
#' @param giniThresholdScaling scaling factor for relative threshold (percentage of mutations)
#' @param giniThresholdScaling_nmuts scaling factor for absolute threshold (number of mutations)
#' @export
getGiniThresholds <- function(signatures,
giniThresholdScaling,
giniThresholdScaling_nmuts,
verbose=TRUE){
# check if there is anything to do
nthresholds <- sum(c(giniThresholdScaling,giniThresholdScaling_nmuts)>=0)
if(nthresholds==0){
if(verbose) message("[warning getGiniThresholds] no threshold scaling factor is >= zero (all appear disabled), nothing to return.")
return(NULL)
}
# get 1 - Gini values
invGini <- 1 - apply(signatures,2,giniCoeff)
# setup return obj
giniThresholds <- NULL
if(giniThresholdScaling >= 0){
giniThresholds <- rbind(giniThresholds,t(data.frame(`giniThreshold.percent` = invGini*giniThresholdScaling,
check.rows = F,check.names = F,stringsAsFactors = F)))
}
if(giniThresholdScaling_nmuts >= 0){
giniThresholds <- rbind(giniThresholds,t(data.frame(`giniThreshold.nmuts` = invGini*giniThresholdScaling_nmuts,
check.rows = F,check.names = F,stringsAsFactors = F)))
}
return(t(giniThresholds))
}
MAD <- function(a,b){
mean(abs(as.matrix(a)-as.matrix(b)))
}
NMAD <- function(a,b){
mean(abs(as.matrix(a)-as.matrix(b)))/sum(as.matrix(a))
}
# dataMatrix can be either a catalogue or signature matrix
getTypeOfMutationsFromChannels <- function(dataMatrix){
SBSchannels <- c("A[C>A]A","A[C>A]C","A[C>A]G","A[C>A]T","C[C>A]A","C[C>A]C","C[C>A]G","C[C>A]T","G[C>A]A","G[C>A]C","G[C>A]G","G[C>A]T","T[C>A]A","T[C>A]C","T[C>A]G","T[C>A]T",
"A[C>G]A","A[C>G]C","A[C>G]G","A[C>G]T","C[C>G]A","C[C>G]C","C[C>G]G","C[C>G]T","G[C>G]A","G[C>G]C","G[C>G]G","G[C>G]T","T[C>G]A","T[C>G]C","T[C>G]G","T[C>G]T",
"A[C>T]A","A[C>T]C","A[C>T]G","A[C>T]T","C[C>T]A","C[C>T]C","C[C>T]G","C[C>T]T","G[C>T]A","G[C>T]C","G[C>T]G","G[C>T]T","T[C>T]A","T[C>T]C","T[C>T]G","T[C>T]T",
"A[T>A]A","A[T>A]C","A[T>A]G","A[T>A]T","C[T>A]A","C[T>A]C","C[T>A]G","C[T>A]T","G[T>A]A","G[T>A]C","G[T>A]G","G[T>A]T","T[T>A]A","T[T>A]C","T[T>A]G","T[T>A]T",
"A[T>C]A","A[T>C]C","A[T>C]G","A[T>C]T","C[T>C]A","C[T>C]C","C[T>C]G","C[T>C]T","G[T>C]A","G[T>C]C","G[T>C]G","G[T>C]T","T[T>C]A","T[T>C]C","T[T>C]G","T[T>C]T",
"A[T>G]A","A[T>G]C","A[T>G]G","A[T>G]T","C[T>G]A","C[T>G]C","C[T>G]G","C[T>G]T","G[T>G]A","G[T>G]C","G[T>G]G","G[T>G]T","T[T>G]A","T[T>G]C","T[T>G]G","T[T>G]T")
DBSchannelsZou <- c("AA>CC","AA>CG","AA>CT","AA>GC","AA>GG","AA>GT","AA>TC","AA>TG","AA>TT",
"AC>CA","AC>CG","AC>CT","AC>GA","AC>GG","AC>GT","AC>TA","AC>TG","AC>TT",
"AG>CA","AG>CC","AG>CT","AG>GA","AG>GC","AG>GT","AG>TA","AG>TC","AG>TT",
"AT>CA","AT>CC","AT>CG","AT>GA","AT>GC","AT>TA",
"CA>AC","CA>AG","CA>AT","CA>GC","CA>GG","CA>GT","CA>TC","CA>TG","CA>TT",
"CC>AA","CC>AG","CC>AT","CC>GA","CC>GG","CC>GT","CC>TA","CC>TG","CC>TT",
"CG>AA","CG>AC","CG>AT","CG>GA","CG>GC","CG>TA",
"GA>AC","GA>AG","GA>AT","GA>CC","GA>CG","GA>CT","GA>TC","GA>TG","GA>TT",
"GC>AA","GC>AG","GC>AT","GC>CA","GC>CG","GC>TA",
"TA>AC","TA>AG","TA>AT","TA>CC","TA>CG","TA>GC")
DBSchannelsAlexandrov <- c("AC>CA","AC>CG","AC>CT","AC>GA","AC>GG","AC>GT","AC>TA","AC>TG","AC>TT",
"AT>CA","AT>CC","AT>CG","AT>GA","AT>GC","AT>TA",
"CC>AA","CC>AG","CC>AT","CC>GA","CC>GG","CC>GT","CC>TA","CC>TG","CC>TT",
"CG>AT","CG>GC","CG>GT","CG>TA","CG>TC","CG>TT",
"CT>AA","CT>AC","CT>AG","CT>GA","CT>GC","CT>GG","CT>TA","CT>TC","CT>TG",
"GC>AA","GC>AG","GC>AT","GC>CA","GC>CG","GC>TA",
"TA>AT","TA>CG","TA>CT","TA>GC","TA>GG","TA>GT",
"TC>AA","TC>AG","TC>AT","TC>CA","TC>CG","TC>CT","TC>GA","TC>GG","TC>GT",
"TG>AA","TG>AC","TG>AT","TG>CA","TG>CC","TG>CT","TG>GA","TG>GC","TG>GT",
"TT>AA","TT>AC","TT>AG","TT>CA","TT>CC","TT>CG","TT>GA","TT>GC","TT>GG")
SVchannels <- c('clustered_del_1-10Kb', 'clustered_del_10-100Kb', 'clustered_del_100Kb-1Mb', 'clustered_del_1Mb-10Mb', 'clustered_del_>10Mb',
'clustered_tds_1-10Kb', 'clustered_tds_10-100Kb', 'clustered_tds_100Kb-1Mb', 'clustered_tds_1Mb-10Mb', 'clustered_tds_>10Mb',
'clustered_inv_1-10Kb', 'clustered_inv_10-100Kb', 'clustered_inv_100Kb-1Mb', 'clustered_inv_1Mb-10Mb', 'clustered_inv_>10Mb',
'clustered_trans',
'non-clustered_del_1-10Kb', 'non-clustered_del_10-100Kb', 'non-clustered_del_100Kb-1Mb', 'non-clustered_del_1Mb-10Mb', 'non-clustered_del_>10Mb',
'non-clustered_tds_1-10Kb', 'non-clustered_tds_10-100Kb', 'non-clustered_tds_100Kb-1Mb', 'non-clustered_tds_1Mb-10Mb', 'non-clustered_tds_>10Mb',
'non-clustered_inv_1-10Kb', 'non-clustered_inv_10-100Kb', 'non-clustered_inv_100Kb-1Mb', 'non-clustered_inv_1Mb-10Mb', 'non-clustered_inv_>10Mb',
'non-clustered_trans')
muttype <- "generic"
if(nrow(dataMatrix)==length(SBSchannels)){
if(all(rownames(dataMatrix)==SBSchannels)) muttype <- "subs"
}else if(nrow(dataMatrix)==length(DBSchannelsZou)){
if(all(rownames(dataMatrix)==DBSchannelsZou)) {
muttype <- "DNV"
}else if(all(rownames(dataMatrix)==DBSchannelsAlexandrov)){
muttype <- "DNV"
}
}else if(nrow(dataMatrix)==length(SVchannels)){
if(all(rownames(dataMatrix)==SVchannels)) muttype <- "rearr"
}
return(muttype)
}
# plotSignatures wrapper function, it will determine the type of signatures and plot using the appropriate function
#' Plot Signatures with automated detection of type of mutations
#'
#' This function checks the channels of the input matrix, determines the type of mutations and plots using the
#' most appropriate signature plot.
#'
#' @param signature_data_matrix matrix of signatures, signatures as columns and channels as rows
#' @param output_file set output file, should end with ".jpg", "png" or ".pdf". If output_file==null, output will not be to a file, but will still run the plot functions. The option output_file==null can be used to add this plot to a larger output file.
#' @param plot_sum whether the sum of the channels should be plotted. If plotting signatures this should be FALSE, but if plotting sample catalogues, this can be set to TRUE to display the number of mutations in each sample.
#' @param overall_title set the overall title of the plot
#' @param mar set the option par(mar=mar)
#' @param howManyInOnePage how many signatures or catalogues should be plotted on one page. Multiple pages are plotted if more signatures/catalogues to plot have been requested
#' @param ncolumns how many columns should be used to arrange the signatures/catalogues to plot
#' @export
plotSignatures <- function(signature_data_matrix,
output_file = NULL,
plot_sum = TRUE,
overall_title = "",
add_to_titles = NULL,
mar=NULL,
howManyInOnePage=100,
ncolumns=3){
if(ncol(signature_data_matrix)<3) ncolumns <- ncol(signature_data_matrix)
# identify the type of mutations
typeofmuts <- getTypeOfMutationsFromChannels(signature_data_matrix)
if(typeofmuts=="subs"){
plotSubsSignatures(signature_data_matrix = signature_data_matrix,
output_file = output_file,
plot_sum = plot_sum,
overall_title = overall_title,
add_to_titles = add_to_titles,
mar = mar,
howManyInOnePage = howManyInOnePage,
ncolumns = ncolumns)
}else if(typeofmuts=="rearr"){
plotRearrSignatures(signature_data_matrix = signature_data_matrix,
output_file = output_file,
plot_sum = plot_sum,
overall_title = overall_title,
add_to_titles = add_to_titles,
mar = mar,
howManyInOnePage = howManyInOnePage,
ncolumns = ncolumns)
}else if(typeofmuts=="DNV"){
plotDNVSignatures(signature_data_matrix = signature_data_matrix,
output_file = output_file,
plot_sum = plot_sum,
overall_title = overall_title,
add_to_titles = add_to_titles,
mar = mar,
howManyInOnePage = howManyInOnePage,
ncolumns = ncolumns)
}else{
plotGenericSignatures(signature_data_matrix = signature_data_matrix,
output_file = output_file,
plot_sum = plot_sum,
overall_title = overall_title,
add_to_titles = add_to_titles,
mar = mar,
howManyInOnePage = howManyInOnePage,
ncolumns = ncolumns)
}
}
drawCircle <- function(radius=1,
position=c(0,0),
col=NA, #this is a polygon parameter
border=NULL){ #this is a polygon parameter
displCoor <- position
arcLines <- getArcCoordinates(0,2*pi,radius = radius,nSegmentsForEachRadiant = 20)
polygon(displCoor[1]+c(arcLines$x),
displCoor[2]+c(arcLines$y),
col = col,
border = border)
}
polarToCartesian <- function(angle,
radius=1){
c(radius*sin(angle),radius*cos(angle))
}
getArcCoordinates <- function(fromAngle,
toAngle,
radius=1,
nSegmentsForEachRadiant=100){
if(fromAngle>toAngle) toAngle <- toAngle + 2*pi
angleToTravel <- toAngle - fromAngle
nsegments <- ceiling(angleToTravel*nSegmentsForEachRadiant)
angles <- seq(fromAngle,toAngle,length.out = nsegments)
resPos <- sapply(angles, function(x){
polarToCartesian(x,radius)
})
resList <- list()
resList$x <- resPos[1,]
resList$y <- resPos[2,]
return(resList)
}
#' Plot Matrix
#'
#' This function plots a matrix of values. Data is visualised as circles scaled with respect to the
#' largest value in the matrix, with the actual number shown. If thresholdMark is specified, then a
#' different colour is used for entries that are equal or above the threshold.
#'
#' @param dataMatrix a data matrix or data frame
#' @param output_file if an output file name is given (must be pdf), the matrix will be plotted to file
#' @param thresholdMark threshold for using a different colour to highlight entries above a threshold
#' @param ndigitsafterzero specify how many digits after the zero should be used to show the actual numbers
#' @param cex.numbers scale the text used for the numbers in the matrix
#' @param circlesColBasic colour used for the circles
#' @param circlesColHighlight colour used for the circles that pass the thresholdMark
#' @param sideVector optional vector that should be of the same length as the number of columns of dataMatrix.
#' Leave NULL to omit. If specified, the values will be plotted on the left, leaving a gap from the main plotted
#' matrix and using sideVectorLabel as the label for this column, as well as the colours in sideVectorColours
#' @param sideVectorLabel label to be used when sideVector is specified
#' @param sideVectorColours colours to be used for the sideVector circles. This vector should be one element
#' longer than sideVectorThresholds
#' @param sideVectorThresholds boundaries for using the sideVectorColours according to the values of sideVector.
#' Values need to be increasing. For example, values of sideVector lower than sideVectorThresholds[1] will be
#' coloured sideVectorColours[1], and values greater than sideVectorThresholds[length(sideVectorThresholds)]
#' will be coloured sideVectorColours[length(sideVectorThresholds)+1]
#' @param sideVectorNdigitsafterzero similar to ndigitsafterzero, but only affects sideVector values
#' @export
plotMatrix <- function(dataMatrix,
output_file = NULL,
thresholdMark = NULL,
ndigitsafterzero = 2,
cex.numbers = 0.7,
title = NULL,
circlesColBasic = "#A1CAF1",
circlesColHighlight = "#F6A600",
sideVector = NULL,
sideVectorLabel = NULL,
sideVectorColours = c("#cf0000","#F38400","#8DB600"),
sideVectorThresholds = c(0.8,0.9),
sideVectorNdigitsafterzero = 2){
useSideVector <- FALSE
if(!is.null(sideVector)){
if(!(length(sideVector)==ncol(dataMatrix))){
message("[warning plotMatrix] sideVector length does not match dataMatrix number of columns. Ignoring sideVector and moving on.")
sideVector <- NULL
}else{
useSideVector <- TRUE
sideVector[is.na(sideVector)] <- 0
}
}
maxncharSigs <- max(sapply(rownames(dataMatrix),nchar))
if(useSideVector){
if(!is.null(sideVectorLabel)){
maxncharSigs <- max(maxncharSigs,nchar(sideVectorLabel))
}
}
maxncharSamples <- max(sapply(colnames(dataMatrix),nchar))
mar1 <- 0.6*maxncharSigs+1.2
mar2 <- 0.6*maxncharSamples+1.2
mar3 <- 2
mar4 <- 2
width <- 0.5 + 0.3*(nrow(dataMatrix) + ifelse(useSideVector,2,0)) + 0.125*maxncharSamples
height <- 0.5 + 0.3*ncol(dataMatrix) + 0.125*maxncharSigs
if(!is.null(output_file)) cairo_pdf(filename = output_file,width = width,height = height)
par(mfrow=c(1,1))
par(mar=c(mar1,mar2,mar3,mar4))
plot(1, type="n", xlab="", ylab="", xlim=c(ifelse(useSideVector,-1.5,0.5),nrow(dataMatrix)+0.5), ylim=c(ncol(dataMatrix)+0.5,0.5),
xaxt = 'n', yaxt = 'n',bty = 'n',xaxs="i",yaxs="i")
abline(h=1:ncol(dataMatrix),lty=3,col="lightgrey",lwd=3)
abline(v=1:nrow(dataMatrix),lty=3,col="lightgrey",lwd=3)
axis(1,labels = rownames(dataMatrix),at = 1:nrow(dataMatrix),las=2,lwd = 0,lwd.ticks = 1,cex.axis = 1)
axis(2,labels = colnames(dataMatrix),at = 1:ncol(dataMatrix),las=2,lwd = 0,lwd.ticks = 1,cex.axis = 1)
toPlot <- dataMatrix
for(i in 1:ncol(dataMatrix)) toPlot[,i] <- sprintf(paste0("%.",ndigitsafterzero,"f"),dataMatrix[,i])
toPlot[toPlot=="0" | toPlot=="-0" | toPlot==paste0("0.",paste(rep("0",ndigitsafterzero),collapse = ""))] <- ""
if(all(is.na(dataMatrix))){
circleDim <- dataMatrix
circleDim[,] <- 0
}else{
circleDim <- dataMatrix/max(dataMatrix,na.rm = T)*5
}
for(i in 1:ncol(circleDim)) {
for(j in 1:nrow(circleDim)){
if(!is.na(dataMatrix[j,i])){
usecol <- circlesColBasic
if(!is.null(thresholdMark)) {
if(thresholdMark <= dataMatrix[j,i]) usecol <- circlesColHighlight
}
drawCircle(radius = circleDim[j,i]/10,position = c(j,i),col = usecol,border = NA)
}
}
}
for(i in 1:ncol(toPlot)) text(y = rep(i,nrow(toPlot)), x = 1:nrow(toPlot),labels = toPlot[,i],cex = cex.numbers)
if(useSideVector){
getColour <- function(val,coloursScale,boundariesScale){
found <- FALSE
z <- 1
if(val<boundariesScale[z]) found <- TRUE
while (!found & z<length(boundariesScale)) {
if(val >= boundariesScale[z] & val < boundariesScale[z+1]){
found <- TRUE
}
z <- z + 1
}
if(val>=boundariesScale[length(boundariesScale)]) z <- length(boundariesScale) + 1
return(coloursScale[z])
}
abline(v=-1,lty=3,col="lightgrey",lwd=3)
if(!is.null(sideVectorLabel)) axis(1,labels = sideVectorLabel,at = -1,las=2,lwd = 0,lwd.ticks = 1,cex.axis = 1)
circleDimSideVector <- sideVector/max(sideVector)*5
for(i in 1:length(circleDimSideVector)){
drawCircle(radius = circleDimSideVector[i]/10,
position = c(-1,i),
col = getColour(val = sideVector[i],
coloursScale = sideVectorColours,
boundariesScale = sideVectorThresholds),
border = NA)
}
sideVectorText <- sprintf(paste0("%.",sideVectorNdigitsafterzero,"f"),sideVector)
text(y = 1:length(sideVector), x = -1,labels = sideVectorText,cex = cex.numbers)
}
if(!is.null(title)) title(main = title)
if(!is.null(output_file)) dev.off()
}
#' Plot the results from the Fit or FitMS function
#'
#' Plotting of the results obtained with the Fit or FitMS function. Output adapts based on the options used in during fitting.
#' You can use this plot function instead of using plotFit or plotFitMS. The function plotFitResults will infer whether Fit or FitMS
#' was used and use the appropriate plot function.
#'
#' @param fitObj object obtained from the Fit or FitMS function
#' @param outdir output directory where the results should be saved/plotted
#' @export
#' @examples
#' res <- Fit(catalogues,getOrganSignatures("Breast"))
#' plotFitResults(res,"results/")
plotFitResults <- function(fitObj,
outdir = ""){
if(!is.null(fitObj$fitAlgorithm)){
if(fitObj$fitAlgorithm=="Fit"){
plotFit(fitObj = fitObj,
outdir = outdir)
}else if(fitObj$fitAlgorithm=="FitMS"){
plotFitMS(fitMSobj = fitObj,
outdir = outdir)
}else{
message("[error plotFitResults] unknown fitAlgorithm attribute, expecting Fit or FitMS. Input fitObj not recognised. ")
}
}else{
message("[error plotFitResults] missing fitAlgorithm attribute, fitObj not recognised.")
}
}
#' Plot the results from the Fit function
#'
#' Plotting of the results obtained with the Fit function. Output adapts based on the options used in the Fit function
#'
#' @param fitObj object obtained from the Fit function
#' @param outdir output directory where the results should be saved/plotted
#' @param samplesInSubdir if TRUE, move the sample files to a subdirectory named "samples"
#' @export
#' @examples
#' res <- Fit(catalogues,getOrganSignatures("Breast"))
#' plotFit(res,"results/")
plotFit <- function(fitObj,
outdir = "",
samplesInSubdir = TRUE){
# some checks on outdir
if(is.null(outdir)) {
message("[error plotFit] please specify outdir")
return(NULL)
}
if(outdir != ""){
if(substr(outdir,nchar(outdir),nchar(outdir)) != "/") outdir <- paste0(outdir,"/")
}
# now let's plot
if(outdir != "") dir.create(outdir,showWarnings = F,recursive = T)
#function to draw a legend for the heatmap of the correlation matrix
draw_legend <- function(col,xl,xr,yb,yt,textx){
par(xpd=TRUE)
rect(xl,yb,xr,yt)
rect(
xl,
head(seq(yb,yt,(yt-yb)/length(col)),-1),
xr,
tail(seq(yb,yt,(yt-yb)/length(col)),-1),
col=col,border = NA
)
text(x = textx, y = yt,labels = "1")
text(x = textx, y = (yt-yb)/2,labels = "0")
text(x = textx, y = yb,labels = "-1")
par(xpd=FALSE)
}
# build the info table
infoTable <- NULL
infoTable <- rbind(infoTable,data.frame(parameter="fitAlgorithm",value=fitObj$fitAlgorithm,stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="nparallel",value=fitObj$nparallel,stringsAsFactors = F))
if(!is.null(fitObj$randomSeed)){
infoTable <- rbind(infoTable,data.frame(parameter="randomSeed",value=fitObj$randomSeed,stringsAsFactors = F))
}else{
infoTable <- rbind(infoTable,data.frame(parameter="randomSeed",value="NULL",stringsAsFactors = F))
}
infoTable <- rbind(infoTable,data.frame(parameter="nsamples",value=ncol(fitObj$catalogues),stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="nsignatures",value=ncol(fitObj$signatures),stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="nchannels",value=nrow(fitObj$catalogues),stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="method",value=fitObj$method,stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="exposureFilterType",value=fitObj$exposureFilterType,stringsAsFactors = F))
if(!is.null(fitObj$threshold_percent)){
infoTable <- rbind(infoTable,data.frame(parameter="threshold_percent",value=fitObj$threshold_percent,stringsAsFactors = F))
}else{
infoTable <- rbind(infoTable,data.frame(parameter="threshold_percent",value="NULL",stringsAsFactors = F))
}
if(!is.null(fitObj$threshold_nmuts)){
infoTable <- rbind(infoTable,data.frame(parameter="threshold_nmuts",value=fitObj$threshold_nmuts,stringsAsFactors = F))
}else{
infoTable <- rbind(infoTable,data.frame(parameter="threshold_nmuts",value="NULL",stringsAsFactors = F))
}
if(!is.null(fitObj$giniThresholdScaling)){
infoTable <- rbind(infoTable,data.frame(parameter="giniThresholdScaling",value=fitObj$giniThresholdScaling,stringsAsFactors = F))
}else{
infoTable <- rbind(infoTable,data.frame(parameter="giniThresholdScaling",value="NULL",stringsAsFactors = F))
}
if(!is.null(fitObj$giniThresholdScaling_nmuts)){
infoTable <- rbind(infoTable,data.frame(parameter="giniThresholdScaling_nmuts",value=fitObj$giniThresholdScaling_nmuts,stringsAsFactors = F))
}else{
infoTable <- rbind(infoTable,data.frame(parameter="giniThresholdScaling_nmuts",value="NULL",stringsAsFactors = F))
}
infoTable <- rbind(infoTable,data.frame(parameter="useBootstrap",value=ifelse(fitObj$useBootstrap,"TRUE","FALSE"),stringsAsFactors = F))
if(!is.null(fitObj$nboot)){
infoTable <- rbind(infoTable,data.frame(parameter="nboot",value=fitObj$nboot,stringsAsFactors = F))
}else{
infoTable <- rbind(infoTable,data.frame(parameter="nboot",value="NULL",stringsAsFactors = F))
}
if(!is.null(fitObj$threshold_p.value)){
infoTable <- rbind(infoTable,data.frame(parameter="threshold_p.value",value=fitObj$threshold_p.value,stringsAsFactors = F))
}else{
infoTable <- rbind(infoTable,data.frame(parameter="threshold_p.value",value="NULL",stringsAsFactors = F))
}
# save it
writeTable(t = infoTable,file = paste0(outdir,"infoTable.tsv"),row.names = F)
if(!is.null(fitObj$giniThresholdsTable)){
writeTable(t = fitObj$giniThresholdsTable,file = paste0(outdir,"giniThresholdsTable.tsv"),row.names = T)
}
reconstructed <- fitObj$reconstructed_catalogues
#plot and save exposures
sums_exp <- apply(fitObj$exposures,1,sum)
denominator <- matrix(sums_exp,nrow = nrow(fitObj$exposures),ncol = ncol(fitObj$exposures),byrow = FALSE)
exposuresProp <- (fitObj$exposures/denominator*100)
# case of empty catalogues
exposuresProp[sums_exp==0,] <- 0
# #plot and save exposures
file_table_exp <- paste0(outdir,"exposures.tsv")
# plot to pdf
file_plot_exp <- paste0(outdir,"exposures.pdf")
file_plot_expProp <- paste0(outdir,"exposures_prop.pdf")
plotMatrix(as.data.frame(t(fitObj$exposures)),
output_file = file_plot_exp,
ndigitsafterzero = 0,
sideVector = fitObj$cossim_catalogueVSreconstructed,
sideVectorLabel = "cosine similarity")
plotMatrix(as.data.frame(t(exposuresProp)),
output_file = file_plot_expProp,
ndigitsafterzero = 0,
sideVector = fitObj$cossim_catalogueVSreconstructed,
sideVectorLabel = "cosine similarity")
write.table(fitObj$exposures,file = file_table_exp,
sep = "\t",col.names = TRUE,row.names = TRUE,quote = FALSE)
cossimdf <- as.data.frame(fitObj$cossim_catalogueVSreconstructed)
colnames(cossimdf) <- "cossim_catalogueVSreconstructed"
write.table(cossimdf,file = paste0(outdir,"cossim_catalogueVSreconstructed.tsv"),
sep = "\t",col.names = TRUE,row.names = TRUE,quote = FALSE)
# some aggregate stats
if(nrow(fitObj$exposures)>1){
# distribution of consensus exposures across the samples
simpleBoxPlot(dataToPlot = fitObj$exposures,
outfile = paste0(outdir,"exposuresAcrossSamplesBoxplot.pdf"),
charttitle = "Exposures across samples",
ylab = "mutations",
plotpoints = TRUE)
# proportion of samples with signatures
propSamplesTable <- apply(fitObj$exposures>0,2,sum)/nrow(fitObj$exposures)
simpleBarChart(dataToPlot = propSamplesTable,
charttitle = "Proportion of samples with signature",
ylim = c(0,1),
ylab="proportion",
outfile = paste0(outdir,"proportionOfSamplesWithSignatures.pdf"))
writeTable(data.frame(proportion=propSamplesTable),
file = paste0(outdir,"proportionOfSamplesWithSignatures.tsv"))
}
#provide a series of plots for each sample
if(samplesInSubdir){
outdir <- paste0(outdir,"samples/")
dir.create(outdir,recursive = TRUE,showWarnings = FALSE)
}
howmanyplots <- ncol(fitObj$catalogues)
for(p in 1:howmanyplots){
# p <- 1
currentSample <- colnames(fitObj$catalogues)[p]
fitIsEmpty <- sum(fitObj$exposures[p,])==0
unassigned_mut <- sprintf("%.2f",(fitObj$unassigned_muts_perc[p]))
cos_sim <- sprintf("%.2f",cos_sim(fitObj$catalogues[,p,drop=FALSE],reconstructed[,p,drop=FALSE]))
sigMatrix <- fitObj$catalogues[,p,drop=FALSE]
addToTitle <- c("Catalogue")
if(!fitIsEmpty){
sigMatrix <- cbind(sigMatrix,reconstructed[,p,drop=FALSE])
sigMatrix <- cbind(sigMatrix,fitObj$catalogues[,p,drop=FALSE] - reconstructed[,p,drop=FALSE])
addToTitle <- c(addToTitle,
paste0("Reconstructed (CosSim ",cos_sim,")"),
paste0("Difference (Unassigned ",unassigned_mut,"%)"))
}
colnames(sigMatrix) <- paste0(colnames(sigMatrix)," ",addToTitle)
# save it
writeTable(t = sigMatrix,file = paste0(outdir,"signatureFit_",p,"of",howmanyplots,"_",currentSample,"_pointEstimate.tsv"))
plotSignatures(signature_data_matrix = sigMatrix,add_to_title = NULL,
output_file = paste0(outdir,"signatureFit_",p,"of",howmanyplots,"_",currentSample,"_pointEstimate.pdf"))
if(fitObj$useBootstrap & !fitIsEmpty){
#4 bootstraps
consensusCol <- "#BE0032"
thresholdCol <- "#8DB600"
threshold_nmutsCol <- "#875692"
bootsCol <- "#A1CAF1"
thresholdText_prop <- NULL
thresholdText_nmuts <- NULL
nthresholds <- 0
if(fitObj$exposureFilterType=="fixedThreshold"){
if(fitObj$threshold_percent >= 0){
thresholdText_prop <- paste0("threshold %=",fitObj$threshold_percent,"%")
nthresholds <- nthresholds + 1
}
if(fitObj$threshold_nmuts >= 0){
thresholdText_nmuts <- paste0("threshold nmuts=",fitObj$threshold_nmuts," muts")
nthresholds <- nthresholds + 1
}
}else if(fitObj$exposureFilterType=="giniScaledThreshold") {
if(fitObj$giniThresholdScaling >= 0){
thresholdText_prop <- paste0("threshold %=(1-Gini)*",fitObj$giniThresholdScaling,"%")
nthresholds <- nthresholds + 1
}
if(fitObj$giniThresholdScaling_nmuts >= 0){
thresholdText_nmuts <- paste0("threshold nmuts=(1-Gini)*",fitObj$giniThresholdScaling_nmuts," muts")
nthresholds <- nthresholds + 1
}
}
maxnchar <- max(sapply(colnames(fitObj$signatures),nchar))
mar1 <- 0.6*maxnchar+1.2
mar2 <- 4
mar3 <- 5
mar4 <- 2
width <- max(4.5,0.5*ncol(fitObj$signatures)+1)
height <- 3 + 0.125*maxnchar
cairo_pdf(filename = paste0(outdir,"signatureFit_",p,"of",howmanyplots,"_",currentSample,"_Bootstrap.pdf"),width = width,height = height)
par(mfrow=c(1,1))
par(mar=c(mar1,mar2,mar3,mar4))
boxplot(t(fitObj$bootstrap_exposures_samples[[p]]),las=3,cex.axes=0.9,
ylab="n mutations",col="white",
ylim=c(0,max(fitObj$bootstrap_exposures_samples[[p]])),cex.main = 0.9,border="#848482",
main=paste0("Exposures, of ",rownames(fitObj$exposures)[p],"\n","p-value=",fitObj$threshold_p.value,", n=",fitObj$nboot))
if(fitObj$exposureFilterType=="fixedThreshold"){
if(fitObj$threshold_percent >= 0) abline(h=fitObj$threshold_percent/100*sum(fitObj$catalogues[,p,drop=FALSE]),col=thresholdCol,lwd = 2)
if(fitObj$threshold_nmuts >= 0) abline(h=fitObj$threshold_nmuts,col=threshold_nmutsCol,lwd = 2)
}else if(fitObj$exposureFilterType=="giniScaledThreshold") {
sigInvGini <- 1 - apply(fitObj$signatures,2,giniCoeff)
giniThresholdPerc <- fitObj$giniThresholdScaling*sigInvGini
giniThreshold <- giniThresholdPerc/100*sum(fitObj$catalogues[,p,drop=FALSE])
giniThresholdNMUTS <- fitObj$giniThresholdScaling_nmuts*sigInvGini
if(fitObj$giniThresholdScaling >= 0) for(si in 1:length(giniThreshold)) lines(x = c(si-0.5,si+0.5),y = rep(giniThreshold[si],2),col=thresholdCol,lwd = 2)
if(fitObj$giniThresholdScaling_nmuts >= 0) for(si in 1:length(giniThresholdNMUTS)) lines(x = c(si-0.5,si+0.5),y = rep(giniThresholdNMUTS[si],2),col=threshold_nmutsCol,lwd = 2)
}
points(1:(length(fitObj$exposures[p,])-1),fitObj$exposures[p,1:(ncol(fitObj$exposures)-1)],col=consensusCol,pch = 16)
legend(x="topleft",legend = c("consensus exposures"),col = consensusCol,pch = 16,cex = 0.8,bty = "n",inset = c(0,-0.13),xpd = TRUE)
if(!is.null(thresholdText_prop)) legend(x="topright",legend = thresholdText_prop,col = thresholdCol,lty = 1,cex = 0.8,bty = "n",inset = c(0,-0.13),xpd = TRUE,lwd = 2)
if(!is.null(thresholdText_nmuts)) legend(x="topright",legend = thresholdText_nmuts,col = threshold_nmutsCol,lty = 1,cex = 0.8,bty = "n",inset = c(0,ifelse(nthresholds==1,-0.13,-0.2)),xpd = TRUE,lwd = 2)
dev.off()
if(ncol(fitObj$signatures)>1){
#5 top correlated signatures
res.cor <- fitObj$bootstrap_exposure_correlations[[p]]
# save it
writeTable(t = res.cor,file = paste0(outdir,"signatureFit_",p,"of",howmanyplots,"_",currentSample,"_Bootstrap_Correlation.tsv"))
# plot only upper triangular
res.cor_triangular <- res.cor
res.cor_triangular[row(res.cor)+(ncol(res.cor)-col(res.cor))>=ncol(res.cor)] <- 0
res.cor_triangular_label <- matrix(sprintf("%0.2f",res.cor_triangular),nrow = nrow(res.cor_triangular))
res.cor_triangular_label[row(res.cor)+(ncol(res.cor)-col(res.cor))>=ncol(res.cor)] <- ""
mar1 <- 0.6*maxnchar+1.2
mar2 <- 0.6*maxnchar+1.2
mar3 <- 4
mar4 <- 4
width <- 1 + 0.3*ncol(fitObj$signatures) + 0.125*maxnchar
height <- 1 + 0.3*ncol(fitObj$signatures) + 0.125*maxnchar
cairo_pdf(filename = paste0(outdir,"signatureFit_",p,"of",howmanyplots,"_",currentSample,"_Bootstrap_Correlation.pdf"),width = width,height = height)
par(mfrow=c(1,1))
par(mar=c(mar1,mar2,mar3,mar4))
par(xpd=FALSE)
col<- colorRampPalette(c("blue", "white", "red"))(51)
image(res.cor_triangular,col = col,zlim = c(-1,1), axes=F,main="Exposures Correlation\n(spearman)")
extrabit <- 1/(ncol(fitObj$signatures)-1)/2
abline(h=seq(0-extrabit,1+extrabit,length.out = ncol(fitObj$signatures)+1),col="grey",lty=2)
abline(v=seq(0-extrabit,1+extrabit,length.out = ncol(fitObj$signatures)+1),col="grey",lty=2)
axis(2,at = seq(0,1,length.out = ncol(fitObj$signatures)),labels = colnames(fitObj$signatures),las=1,cex.lab=0.8)
axis(1,at = seq(0,1,length.out = ncol(fitObj$signatures)),labels = colnames(fitObj$signatures),las=2,cex.lab=0.8)
draw_legend(col,1+4*extrabit,1+4.5*extrabit,0,1,1+3*extrabit)
dev.off()
#6 some correlation plots
howmanycorrtoplot <- min(3,ncol(fitObj$signatures)*(ncol(fitObj$signatures)-1)/2)
cairo_pdf(filename = paste0(outdir,"signatureFit_",p,"of",howmanyplots,"_",currentSample,"_Bootstrap_CorrelationExamples.pdf"),width = 2.2*howmanycorrtoplot,height = 2.5)
par(mfrow=c(1,howmanycorrtoplot))
vals <- res.cor_triangular[order(abs(res.cor_triangular),decreasing = TRUE)]
for (j in 1:howmanycorrtoplot){
pos <- which(vals[j]==res.cor_triangular,arr.ind = TRUE)
mainpar <- paste0("Exposures across bootstraps, n=",fitObj$nboot,"\nspearman correlation ",sprintf("%.2f",vals[j]))
plot(fitObj$bootstrap_exposures_samples[[p]][pos[1],],fitObj$bootstrap_exposures_samples[[p]][pos[2],],
xlab = colnames(fitObj$signatures)[pos[1]],
ylab = colnames(fitObj$signatures)[pos[2]],
main=mainpar,col=bootsCol,pch = 16,cex.main = 0.9)
}
dev.off()
}
}
}
}
#' Plot the results from the FitMS function
#'
#' Plotting of the results obtained with the FitMS function. Output adapts based on the options used in the FitMS function
#'
#' @param fitObj object obtained from the FitMS function
#' @param outdir output directory where the results should be saved/plotted
#' @export
#' @examples
#' res <- FitMS(catalogues,"Breast")
#' plotFitMS(res,"results/")
plotFitMS <- function(fitMSobj,
outdir = ""){
# some checks on outdir
if(is.null(outdir)) {
message("[error plotFitMS] please specify outdir")
return(NULL)
}
if(outdir != ""){
if(substr(outdir,nchar(outdir),nchar(outdir)) != "/") outdir <- paste0(outdir,"/")
}
# now let's plot
if(outdir != "") dir.create(outdir,showWarnings = F,recursive = T)
# build the info table
infoTable <- NULL
infoTable <- rbind(infoTable,data.frame(parameter="fitAlgorithm",value=fitMSobj$fitAlgorithm,stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="nparallel",value=fitMSobj$nparallel,stringsAsFactors = F))
if(!is.null(fitMSobj$randomSeed)){
infoTable <- rbind(infoTable,data.frame(parameter="randomSeed",value=fitMSobj$randomSeed,stringsAsFactors = F))
}else{
infoTable <- rbind(infoTable,data.frame(parameter="randomSeed",value="NULL",stringsAsFactors = F))
}
infoTable <- rbind(infoTable,data.frame(parameter="nsamples",value=ncol(fitMSobj$catalogues),stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="ncommonSignatures",value=ncol(fitMSobj$commonSignatures),stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="nrareSignatures",value=ncol(fitMSobj$rareSignatures),stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="nchannels",value=nrow(fitMSobj$catalogues),stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="method",value=fitMSobj$method,stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="multiStepMode",value=fitMSobj$multiStepMode,stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="maxRareSigsPerSample",value=fitMSobj$maxRareSigsPerSample,stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="rareCandidateSelectionCriteria",value=fitMSobj$rareCandidateSelectionCriteria,stringsAsFactors = F))
if(!is.null(fitMSobj$organ)){
infoTable <- rbind(infoTable,data.frame(parameter="organ",value=fitMSobj$organ,stringsAsFactors = F))
}else{
infoTable <- rbind(infoTable,data.frame(parameter="organ",value="NULL",stringsAsFactors = F))
}
infoTable <- rbind(infoTable,data.frame(parameter="commonSignatureTier",value=fitMSobj$commonSignatureTier,stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="rareSignatureTier",value=fitMSobj$rareSignatureTier,stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="minErrorReductionPerc",value=fitMSobj$minErrorReductionPerc,stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="minCosSimIncrease",value=fitMSobj$minCosSimIncrease,stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="residualNegativeProp",value=fitMSobj$residualNegativeProp,stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="minCosSimRareSig",value=fitMSobj$minCosSimRareSig,stringsAsFactors = F))
infoTable <- rbind(infoTable,data.frame(parameter="exposureFilterType",value=fitMSobj$exposureFilterType,stringsAsFactors = F))
if(!is.null(fitMSobj$threshold_percent)){
infoTable <- rbind(infoTable,data.frame(parameter="threshold_percent",value=fitMSobj$threshold_percent,stringsAsFactors = F))
}else{
infoTable <- rbind(infoTable,data.frame(parameter="threshold_percent",value="NULL",stringsAsFactors = F))
}
if(!is.null(fitMSobj$threshold_nmuts)){
infoTable <- rbind(infoTable,data.frame(parameter="threshold_nmuts",value=fitMSobj$threshold_nmuts,stringsAsFactors = F))
}else{
infoTable <- rbind(infoTable,data.frame(parameter="threshold_nmuts",value="NULL",stringsAsFactors = F))
}
if(!is.null(fitMSobj$giniThresholdScaling)){
infoTable <- rbind(infoTable,data.frame(parameter="giniThresholdScaling",value=fitMSobj$giniThresholdScaling,stringsAsFactors = F))
}else{
infoTable <- rbind(infoTable,data.frame(parameter="giniThresholdScaling",value="NULL",stringsAsFactors = F))
}
if(!is.null(fitMSobj$giniThresholdScaling_nmuts)){
infoTable <- rbind(infoTable,data.frame(parameter="giniThresholdScaling_nmuts",value=fitMSobj$giniThresholdScaling_nmuts,stringsAsFactors = F))
}else{
infoTable <- rbind(infoTable,data.frame(parameter="giniThresholdScaling_nmuts",value="NULL",stringsAsFactors = F))
}
infoTable <- rbind(infoTable,data.frame(parameter="useBootstrap",value=ifelse(fitMSobj$useBootstrap,"TRUE","FALSE"),stringsAsFactors = F))
if(!is.null(fitMSobj$nboot)){
infoTable <- rbind(infoTable,data.frame(parameter="nboot",value=fitMSobj$nboot,stringsAsFactors = F))
}else{
infoTable <- rbind(infoTable,data.frame(parameter="nboot",value="NULL",stringsAsFactors = F))
}
if(!is.null(fitMSobj$threshold_p.value)){
infoTable <- rbind(infoTable,data.frame(parameter="threshold_p.value",value=fitMSobj$threshold_p.value,stringsAsFactors = F))
}else{
infoTable <- rbind(infoTable,data.frame(parameter="threshold_p.value",value="NULL",stringsAsFactors = F))
}
# save it
writeTable(t = infoTable,file = paste0(outdir,"infoTable.tsv"),row.names = F)
if(!is.null(fitMSobj$giniThresholdsTable)){
writeTable(t = fitMSobj$giniThresholdsTable,file = paste0(outdir,"giniThresholdsTable.tsv"),row.names = T)
}
# now plot some generic info
if(length(fitMSobj$whichSamplesMayHaveRareSigs)>0){
summaryFits <- NULL
rowi <- 1
for (i in 1:length(fitMSobj$whichSamplesMayHaveRareSigs)){
# i <- 1
currentSample <- fitMSobj$whichSamplesMayHaveRareSigs[i]
if(fitMSobj$multiStepMode=="errorReduction" | fitMSobj$multiStepMode=="cossimIncrease"){
summaryFits <- rbind(summaryFits,data.frame(sample=currentSample,
candidate="commonSigsOnly",
cossim=fitMSobj$commonSigsOnlyCosSim[[currentSample]],
error=fitMSobj$commonSigsOnlyError[[currentSample]],
selected=fitMSobj$rareSigChoice[[currentSample]]=="commonOnly",
stringsAsFactors = F,row.names = rowi))
}
for (j in 1:length(fitMSobj$candidateRareSigs[[currentSample]])){
if(fitMSobj$multiStepMode=="errorReduction" | fitMSobj$multiStepMode=="cossimIncrease"){
summaryFits <- rbind(summaryFits,data.frame(sample=currentSample,
candidate=fitMSobj$candidateRareSigs[[currentSample]][j],
cossim=fitMSobj$candidateRareSigsCosSim[[currentSample]][j],
error=fitMSobj$candidateRareSigsError[[currentSample]][j],
selected=fitMSobj$rareSigChoice[[currentSample]]==fitMSobj$candidateRareSigs[[currentSample]][j],
stringsAsFactors = F,row.names = rowi))
}else{
summaryFits <- rbind(summaryFits,data.frame(sample=currentSample,
candidate=fitMSobj$candidateRareSigs[[currentSample]][j],
cossim=fitMSobj$candidateRareSigsCosSim[[currentSample]][j],
selected=fitMSobj$rareSigChoice[[currentSample]]==fitMSobj$candidateRareSigs[[currentSample]][j],
stringsAsFactors = F,row.names = rowi))
}
rowi <- rowi+1
}
}
write.table(summaryFits,file = paste0(outdir,"candidateRareSigs.tsv"),
sep = "\t",col.names = TRUE,row.names = FALSE,quote = FALSE)
}
#plot and save exposures
sums_exp <- apply(fitMSobj$exposures,1,sum)
denominator <- matrix(sums_exp,nrow = nrow(fitMSobj$exposures),ncol = ncol(fitMSobj$exposures),byrow = FALSE)
exposuresProp <- (fitMSobj$exposures/denominator*100)
# case of empty catalogues
exposuresProp[sums_exp==0,] <- 0
file_plot_exp <- paste0(outdir,"exposures.pdf")
file_plot_expProp <- paste0(outdir,"exposures_prop.pdf")
file_table_exp <- paste0(outdir,"exposures.tsv")
plotMatrix(as.data.frame(t(fitMSobj$exposures)),
output_file = file_plot_exp,
ndigitsafterzero = 0,
sideVector = fitMSobj$cossim_catalogueVSreconstructed,
sideVectorLabel = "cosine similarity")
plotMatrix(as.data.frame(t(exposuresProp)),
output_file = file_plot_expProp,
ndigitsafterzero = 0,
sideVector = fitMSobj$cossim_catalogueVSreconstructed,
sideVectorLabel = "cosine similarity")
write.table(fitMSobj$exposures,file = file_table_exp,
sep = "\t",col.names = TRUE,row.names = TRUE,quote = FALSE)
# some aggregate stats
if(nrow(fitMSobj$exposures)>1){
# distribution of consensus exposures across the samples
simpleBoxPlot(dataToPlot = fitMSobj$exposures,
outfile = paste0(outdir,"exposuresAcrossSamplesBoxplot.pdf"),
charttitle = "Exposures across samples",
ylab = "mutations",
plotpoints = TRUE)
# proportion of samples with signatures
propSamplesTable <- apply(fitMSobj$exposures>0,2,sum)/nrow(fitMSobj$exposures)
simpleBarChart(dataToPlot = propSamplesTable,
charttitle = "Proportion of samples with signature",
ylim = c(0,1),
ylab="proportion",
outfile = paste0(outdir,"proportionOfSamplesWithSignatures.pdf"))
writeTable(data.frame(proportion=propSamplesTable),
file = paste0(outdir,"proportionOfSamplesWithSignatures.tsv"))
}
cossimdf <- as.data.frame(fitMSobj$cossim_catalogueVSreconstructed)
colnames(cossimdf) <- "cossim_catalogueVSreconstructed"
write.table(cossimdf,file = paste0(outdir,"cossim_catalogueVSreconstructed.tsv"),
sep = "\t",col.names = TRUE,row.names = TRUE,quote = FALSE)
# now for each sample plot the chosen solution and the alternative solutions
for (s in colnames(fitMSobj$catalogues)){
# s <- colnames(fitMSobj$catalogues)[1]
if(s %in% fitMSobj$whichSamplesMayHaveRareSigs){
currentOutDir <- paste0(outdir,"selectedSolution_",s,"/")
selectedSolution <- fitMSobj$rareSigChoice[[s]]
otherSolutions <- setdiff(fitMSobj$candidateRareSigs[[s]],selectedSolution)
if(selectedSolution=="commonOnly"){
plotFit(fitMSobj$samples[[s]]$fitCommonOnly,outdir = currentOutDir,samplesInSubdir = F)
}else{
plotFit(fitMSobj$samples[[s]]$fitWithRare[[selectedSolution]],outdir = currentOutDir,samplesInSubdir = F)
}
# plot other candidates
currentOutDir <- paste0(outdir,"otherSolutions_",s,"/commonOnly/")
if(!selectedSolution=="commonOnly") plotFit(fitMSobj$samples[[s]]$fitCommonOnly,outdir = currentOutDir,samplesInSubdir = F)
if(length(otherSolutions)>0){
for (i in 1:length(otherSolutions)){
# i <- 1
currentOutDir <- paste0(outdir,"otherSolutions_",s,"/rareSigs_",i,"/")
plotFit(fitMSobj$samples[[s]]$fitWithRare[[otherSolutions[i]]],outdir = currentOutDir,samplesInSubdir = F)
}
}
}else{
currentOutDir <- paste0(outdir,"selectedSolution_",s,"/")
plotFit(fitMSobj$samples[[s]]$fitCommonOnly,outdir = currentOutDir,samplesInSubdir = F)
}
}
}
vectorToJSON <- function(v,
vectorname,
nindent=0){
indent <- rep("\t",nindent)
cat(indent,"\"",vectorname,"\": ",sep = "")
if(!is.null(names(v))){
cat("{\n",sep = "")
for (i in 1:length(v)) {
cat(indent,"\t\"",names(v)[i],"\": ",sep = "")
if(is.na(v[i])){
cat("null",sep = "")
}else if(typeof(v[i])=="double"){
cat(v[i],sep = "")
}else if (typeof(v[i])=="character"){
cat("\"",v[i],"\"",sep = "")
}
if(i<length(v)) cat(",")
cat("\n")
}
cat(indent,"}",sep = "")
}else{
cat("[",sep = "")
for (i in 1:length(v)) {
if(is.na(v[i])){
cat("null",sep = "")
}else if(typeof(v[i])=="double"){
cat(v[i],sep = "")
}else if (typeof(v[i])=="character"){
cat("\"",v[i],"\"",sep = "")
}
if(i<length(v)) cat(", ")
}
cat("]",sep = "")
}
}
#internal fuction for plotting a table in JSON
tableToJSON <- function(tab,
tablename,
nindent = 0){
indent <- rep("\t",nindent)
cat(indent,"\"",tablename,"\": {\n",sep = "")
for(i in 1:ncol(tab)){
sname <- colnames(tab)[i]
cat(indent,"\t\"",sname,"\": {\n",sep = "")
for(j in 1:nrow(tab)){
rname <- rownames(tab)[j]
val <- tab[j,i]
if(is.na(val)) val <- "null"
cat(indent,"\t\t\"",rname,"\": ",val,sep = "")
if(j<nrow(tab)){
cat(",\n")
}else{
cat("\n")
}
}
cat(indent,"\t}",sep = "")
if(i<ncol(tab)){
cat(",\n")
}else{
cat("\n")
}
}
cat(indent,"}",sep = "")
}
#' Export the results from the Fit function to JSON
#'
#' Exports of the results obtained with the Fit function to JSON. Output adapts based on the options used in the Fit function
#'
#' @param fitObj object obtained from the Fit function
#' @param filename file name where to save the JSON string
#' @param nindent number of tabs of indentation to be used, default=0. This is useful in case one wants to embed the output in a larger JSON document
#' @param disableFileWrite if TRUE the JSON lines are simply printed to screen. This option is used when fitToJSON is called from another function like fitMStoJSON, which itself opens a sink() to write to.
#' @param blockname if specified, the first line will show the given name as the block name, which is useful if the output is embedded in a larger JSON file
#' @export
#' @examples
#' res <- Fit(catalogues,getOrganSignatures("Breast"))
#' fitToJSON(res,"export.json")
fitToJSON <- function(fitObj,
filename = "export.json",
nindent = 0,
disableFileWrite = FALSE,
blockname = NULL){
#plot consensus exposures file with metadata
indent <- rep("\t",nindent)
if(!disableFileWrite) sink(file = filename)
if(is.null(blockname)){
cat(indent,"{\n",sep = "")
}else{
cat(indent,"\"",blockname,"\": {\n",sep = "")
}
cat(indent,"\t\"fitAlgorithm\": \"",fitObj$fitAlgorithm,"\",\n",sep = "")
cat(indent,"\t\"nparallel\": ",fitObj$nparallel,",\n",sep = "")
if(!is.null(fitObj$randomSeed)){
cat(indent,"\t\"randomSeed\": ",fitObj$randomSeed,",\n",sep = "")
}else{
cat(indent,"\t\"randomSeed\": null,\n",sep = "")
}
cat(indent,"\t\"nsamples\": ",nrow(fitObj$exposures),",\n",sep = "")
cat(indent,"\t\"nsignatures\": ",ncol(fitObj$signatures),",\n",sep = "")
cat(indent,"\t\"nchannels\": ",nrow(fitObj$catalogues),",\n",sep = "")
cat(indent,"\t\"method\": \"",fitObj$method,"\",\n",sep = "")
cat(indent,"\t\"exposureFilterType\": \"",fitObj$exposureFilterType,"\",\n",sep = "")
if(!is.null(fitObj$threshold_percent)){
cat(indent,"\t\"threshold_percent\": ",fitObj$threshold_percent,",\n",sep = "")
}else{
cat(indent,"\t\"threshold_percent\": null,\n",sep = "")
}
if(!is.null(fitObj$threshold_nmuts)){
cat(indent,"\t\"threshold_nmuts\": ",fitObj$threshold_nmuts,",\n",sep = "")
}else{
cat(indent,"\t\"threshold_nmuts\": null,\n",sep = "")
}
if(!is.null(fitObj$giniThresholdScaling)){
cat(indent,"\t\"giniThresholdScaling\": ",fitObj$giniThresholdScaling,",\n",sep = "")
}else{
cat(indent,"\t\"giniThresholdScaling\": null,\n",sep = "")
}
if(!is.null(fitObj$giniThresholdScaling_nmuts)){
cat(indent,"\t\"giniThresholdScaling_nmuts\": ",fitObj$giniThresholdScaling_nmuts,",\n",sep = "")
}else{
cat(indent,"\t\"giniThresholdScaling_nmuts\": null,\n",sep = "")
}
cat(indent,"\t\"useBootstrap\": ",ifelse(fitObj$useBootstrap,"true","false"),",\n",sep = "")
if(!is.null(fitObj$nboot)){
cat(indent,"\t\"nboot\": ",fitObj$nboot,",\n",sep = "")
}else{
cat(indent,"\t\"nboot\": null,\n",sep = "")
}
if(!is.null(fitObj$threshold_p.value)){
cat(indent,"\t\"threshold_p.value\": ",fitObj$threshold_p.value,",\n",sep = "")
}else{
cat(indent,"\t\"threshold_p.value\": null,\n",sep = "")
}
tableToJSON(fitObj$catalogues,tablename = "catalogues",nindent = nindent + 1)
cat(",\n")
tableToJSON(fitObj$signatures,tablename = "signatures",nindent = nindent + 1)
cat(",\n")
if(!is.null(fitObj$giniThresholdsTable)){
tableToJSON(fitObj$giniThresholdsTable,tablename = "giniThresholdsTable",nindent = nindent + 1)
cat(",\n")
}else{
cat(indent,"\t\"giniThresholdsTable\": null,\n",sep = "")
}
tableToJSON(t(fitObj$exposures),tablename = "exposures",nindent = nindent + 1)
cat(",\n")
tableToJSON(fitObj$reconstructed_catalogues,tablename = "reconstructed_catalogues",nindent = nindent + 1)
cat(",\n")
vectorToJSON(fitObj$unassigned_muts,"unassigned_muts",nindent = nindent + 1)
cat(",\n")
vectorToJSON(fitObj$unassigned_muts_perc,"unassigned_muts_perc",nindent = nindent + 1)
cat(",\n")
vectorToJSON(fitObj$cossim_catalogueVSreconstructed,"cossim_catalogueVSreconstructed",nindent = nindent + 1)
cat(",\n")
if(all(!is.null(fitObj$bootstrap_exposures_samples))){
cat(indent,"\t\"bootstrap_exposures_samples\": {\n",sep = "")
for (i in 1:length(fitObj$bootstrap_exposures_samples)) {
tableToJSON(fitObj$bootstrap_exposures_samples[[i]],tablename = rownames(fitObj$exposures)[i],nindent = nindent + 2)
if(i<length(fitObj$bootstrap_exposures_samples)) cat(",")
cat("\n")
}
cat(indent,"\t}",sep = "")
}else{
cat(indent,"\t\"bootstrap_exposures_samples\": null",sep = "")
}
cat(",\n")
if(all(!is.null(fitObj$bootstrap_exposure_correlations)) & ncol(fitObj$signatures)>1){
cat(indent,"\t\"bootstrap_exposure_correlations\": {\n",sep = "")
for (i in 1:length(fitObj$bootstrap_exposure_correlations)) {
tableToJSON(fitObj$bootstrap_exposure_correlations[[i]],tablename = names(fitObj$bootstrap_exposure_correlations)[i],nindent = nindent + 2)
if(i<length(fitObj$bootstrap_exposure_correlations)) cat(",")
cat("\n")
}
cat(indent,"\t}",sep = "")
}else{
cat(indent,"\t\"bootstrap_exposure_correlations\": null",sep = "")
}
cat(",\n")
if(all(!is.null(fitObj$bootstrap_exposures_pvalues))){
tableToJSON(fitObj$bootstrap_exposures_pvalues,tablename = "bootstrap_exposures_pvalues",nindent = nindent + 1)
}else{
cat(indent,"\t\"bootstrap_exposures_pvalues\": null",sep = "")
}
cat("\n")
cat(indent,"}",sep = "")
if(!disableFileWrite) {
cat("\n")
sink()
}
}
#' Export the results from the FitMS function to JSON
#'
#' Exports of the results obtained with the FitMS function to JSON.
#'
#' @param fitObj object obtained from the FitMS function
#' @param filename file name where to save the JSON string
#' @param nindent number of tabs of indentation to be used, default=0. This is useful in case one wants to embed the output in a larger JSON document
#' @param disableFileWrite if TRUE the JSON lines are simply printed to screen. This option is used when fitToJSON is called from another function
#' @param blockname if specified, the first line will show the given name as the block name, which is useful if the output is embedded in a larger JSON file
#' @export
#' @examples
#' res <- FitMS(catalogues,"Breast")
#' fitMStoJSON(res,"export.json")
fitMStoJSON <- function(fitObj,
filename = "export.json",
nindent = 0,
disableFileWrite = FALSE,
blockname = NULL){
#plot consensus exposures file with metadata
indent <- rep("\t",nindent)
if(!disableFileWrite) sink(file = filename)
if(is.null(blockname)){
cat(indent,"{\n",sep = "")
}else{
cat(indent,"\"",blockname,"\": {\n",sep = "")
}
cat(indent,"\t\"fitAlgorithm\": \"",fitObj$fitAlgorithm,"\",\n",sep = "")
cat(indent,"\t\"nparallel\": ",fitObj$nparallel,",\n",sep = "")
if(!is.null(fitObj$randomSeed)){
cat(indent,"\t\"randomSeed\": ",fitObj$randomSeed,",\n",sep = "")
}else{
cat(indent,"\t\"randomSeed\": null,\n",sep = "")
}
cat(indent,"\t\"nsamples\": ",ncol(fitObj$catalogues),",\n",sep = "")
cat(indent,"\t\"ncommonSignatures\": ",ncol(fitObj$commonSignatures),",\n",sep = "")
cat(indent,"\t\"nrareSignatures\": ",ncol(fitObj$rareSignatures),",\n",sep = "")
cat(indent,"\t\"nchannels\": ",nrow(fitObj$catalogues),",\n",sep = "")
cat(indent,"\t\"method\": \"",fitObj$method,"\",\n",sep = "")
cat(indent,"\t\"multiStepMode\": \"",fitObj$multiStepMode,"\",\n",sep = "")
cat(indent,"\t\"maxRareSigsPerSample\": ",fitObj$maxRareSigsPerSample,",\n",sep = "")
cat(indent,"\t\"rareCandidateSelectionCriteria\": \"",fitObj$rareCandidateSelectionCriteria,"\",\n",sep = "")
if(!is.null(fitObj$organ)){
cat(indent,"\t\"organ\": \"",fitObj$organ,"\",\n",sep = "")
}else{
cat(indent,"\t\"organ\": null,\n",sep = "")
}
cat(indent,"\t\"commonSignatureTier\": \"",fitObj$commonSignatureTier,"\",\n",sep = "")
cat(indent,"\t\"rareSignatureTier\": \"",fitObj$rareSignatureTier,"\",\n",sep = "")
cat(indent,"\t\"minErrorReductionPerc\": ",fitObj$minErrorReductionPerc,",\n",sep = "")
cat(indent,"\t\"minCosSimIncrease\": ",fitObj$minCosSimIncrease,",\n",sep = "")
cat(indent,"\t\"residualNegativeProp\": ",fitObj$residualNegativeProp,",\n",sep = "")
cat(indent,"\t\"minCosSimRareSig\": ",fitObj$minCosSimRareSig,",\n",sep = "")
cat(indent,"\t\"exposureFilterType\": \"",fitObj$exposureFilterType,"\",\n",sep = "")
if(!is.null(fitObj$threshold_percent)){
cat(indent,"\t\"threshold_percent\": ",fitObj$threshold_percent,",\n",sep = "")
}else{
cat(indent,"\t\"threshold_percent\": null,\n",sep = "")
}
if(!is.null(fitObj$threshold_nmuts)){
cat(indent,"\t\"threshold_nmuts\": ",fitObj$threshold_nmuts,",\n",sep = "")
}else{
cat(indent,"\t\"threshold_nmuts\": null,\n",sep = "")
}
if(!is.null(fitObj$giniThresholdScaling)){
cat(indent,"\t\"giniThresholdScaling\": ",fitObj$giniThresholdScaling,",\n",sep = "")
}else{
cat(indent,"\t\"giniThresholdScaling\": null,\n",sep = "")
}
if(!is.null(fitObj$giniThresholdScaling_nmuts)){
cat(indent,"\t\"giniThresholdScaling_nmuts\": ",fitObj$giniThresholdScaling_nmuts,",\n",sep = "")
}else{
cat(indent,"\t\"giniThresholdScaling_nmuts\": null,\n",sep = "")
}
cat(indent,"\t\"useBootstrap\": ",ifelse(fitObj$useBootstrap,"true","false"),",\n",sep = "")
if(!is.null(fitObj$nboot)){
cat(indent,"\t\"nboot\": ",fitObj$nboot,",\n",sep = "")
}else{
cat(indent,"\t\"nboot\": null,\n",sep = "")
}
if(!is.null(fitObj$threshold_p.value)){
cat(indent,"\t\"threshold_p.value\": ",fitObj$threshold_p.value,",\n",sep = "")
}else{
cat(indent,"\t\"threshold_p.value\": null,\n",sep = "")
}
if(length(fitObj$whichSamplesMayHaveRareSigs)>0){
vectorToJSON(fitObj$whichSamplesMayHaveRareSigs,"whichSamplesMayHaveRareSigs",nindent = nindent + 1)
cat(",\n")
}else{
cat(indent,"\t\"whichSamplesMayHaveRareSigs\": null,\n",sep = "")
}
if(length(fitObj$commonSigsOnlyCosSim)>0){
cat(indent,"\t\"commonSigsOnlyCosSim\": {\n",sep = "")
for (i in 1:length(fitObj$commonSigsOnlyCosSim)){
cat(indent,"\t\t\"",names(fitObj$commonSigsOnlyCosSim)[i],"\": ",fitObj$commonSigsOnlyCosSim[[names(fitObj$commonSigsOnlyCosSim)[i]]],sep = "")
if(i < length(fitObj$commonSigsOnlyCosSim)) cat(",")
cat("\n")
}
cat(indent,"\t},\n",sep = "")
}
if(length(fitObj$commonSigsOnlyError)>0){
cat(indent,"\t\"commonSigsOnlyError\": {\n",sep = "")
for (i in 1:length(fitObj$commonSigsOnlyError)){
cat(indent,"\t\t\"",names(fitObj$commonSigsOnlyError)[i],"\": ",fitObj$commonSigsOnlyError[[names(fitObj$commonSigsOnlyError)[i]]],sep = "")
if(i < length(fitObj$commonSigsOnlyError)) cat(",")
cat("\n")
}
cat(indent,"\t},\n",sep = "")
}
if(length(fitObj$rareSigChoice)>0){
cat(indent,"\t\"rareSigChoice\": {\n",sep = "")
for (i in 1:length(fitObj$rareSigChoice)){
cat(indent,"\t\t\"",names(fitObj$rareSigChoice)[i],"\": \"",fitObj$rareSigChoice[[names(fitObj$rareSigChoice)[i]]],"\"",sep = "")
if(i < length(fitObj$rareSigChoice)) cat(",")
cat("\n")
}
cat(indent,"\t},\n",sep = "")
}else{
cat(indent,"\t\"rareSigChoice\": null,\n",sep = "")
}
if(length(fitObj$candidateRareSigsCosSim)>0){
cat(indent,"\t\"candidateRareSigsCosSim\": {\n",sep = "")
for (i in 1:length(fitObj$candidateRareSigsCosSim)){
sampleName <- names(fitObj$candidateRareSigsCosSim)[i]
candidates <- fitObj$candidateRareSigsCosSim[[sampleName]]
cat(indent,"\t\t\"",sampleName,"\": {\n",sep = "")
for (j in 1:length(candidates)) {
cat(indent,"\t\t\t\"",names(candidates)[j],"\": ",candidates[j],sep = "")
if(j < length(candidates)) cat(",")
cat("\n")
}
cat(indent,"\t\t}",sep = "")
if(i < length(fitObj$rareSigChoice)) cat(",")
cat("\n")
}
cat(indent,"\t},\n",sep = "")
}else{
cat(indent,"\t\"candidateRareSigsCosSim\": null,\n",sep = "")
}
if(length(fitObj$candidateRareSigsError)>0){
cat(indent,"\t\"candidateRareSigsError\": {\n",sep = "")
for (i in 1:length(fitObj$candidateRareSigsError)){
sampleName <- names(fitObj$candidateRareSigsError)[i]
candidates <- fitObj$candidateRareSigsError[[sampleName]]
cat(indent,"\t\t\"",sampleName,"\": {\n",sep = "")
for (j in 1:length(candidates)) {
cat(indent,"\t\t\t\"",names(candidates)[j],"\": ",candidates[j],sep = "")
if(j < length(candidates)) cat(",")
cat("\n")
}
cat(indent,"\t\t}",sep = "")
if(i < length(fitObj$rareSigChoice)) cat(",")
cat("\n")
}
cat(indent,"\t},\n",sep = "")
}else{
cat(indent,"\t\"candidateRareSigsError\": null,\n",sep = "")
}
tableToJSON(fitObj$catalogues,tablename = "catalogues",nindent = nindent + 1)
cat(",\n")
tableToJSON(fitObj$commonSignatures,tablename = "commonSignatures",nindent = nindent + 1)
cat(",\n")
tableToJSON(fitObj$rareSignatures,tablename = "rareSignatures",nindent = nindent + 1)
cat(",\n")
if(!is.null(fitObj$giniThresholdsTable)){
tableToJSON(fitObj$giniThresholdsTable,tablename = "giniThresholdsTable",nindent = nindent + 1)
cat(",\n")
}else{
cat(indent,"\t\"giniThresholdsTable\": null,\n",sep = "")
}
tableToJSON(t(fitObj$exposures),tablename = "exposures",nindent = nindent + 1)
cat(",\n")
tableToJSON(fitObj$reconstructed_catalogues,tablename = "reconstructed_catalogues",nindent = nindent + 1)
cat(",\n")
vectorToJSON(fitObj$cossim_catalogueVSreconstructed,"cossim_catalogueVSreconstructed",nindent = nindent + 1)
cat(",\n")
if(all(!is.null(fitObj$bootstrap_exposures_samples))){
cat(indent,"\t\"bootstrap_exposures_samples\": {\n",sep = "")
for (i in 1:length(fitObj$bootstrap_exposures_samples)) {
tableToJSON(fitObj$bootstrap_exposures_samples[[i]],tablename = names(fitObj$bootstrap_exposures_samples)[i],nindent = nindent + 2)
if(i<length(fitObj$bootstrap_exposures_samples)) cat(",")
cat("\n")
}
cat(indent,"\t}",sep = "")
}else{
cat(indent,"\t\"bootstrap_exposures_samples\": null",sep = "")
}
cat(",\n")
if(all(!is.null(fitObj$bootstrap_exposure_correlations))){
if(length(fitObj$bootstrap_exposure_correlations)>0){
cat(indent,"\t\"bootstrap_exposure_correlations\": {\n",sep = "")
for (i in 1:length(fitObj$bootstrap_exposure_correlations)) {
tableToJSON(fitObj$bootstrap_exposure_correlations[[i]],tablename = names(fitObj$bootstrap_exposure_correlations)[i],nindent = nindent + 2)
if(i<length(fitObj$bootstrap_exposure_correlations)) cat(",")
cat("\n")
}
cat(indent,"\t}",sep = "")
}else{
cat(indent,"\t\"bootstrap_exposure_correlations\": null",sep = "")
}
}else{
cat(indent,"\t\"bootstrap_exposure_correlations\": null",sep = "")
}
cat(",\n")
cat(indent,"\t\"all_fits_samples\": {\n",sep = "")
for (i in 1:length(fitObj$samples)) {
sampleName <- names(fitObj$samples)[i]
contentNames <- names(fitObj$samples[[sampleName]])
cat(indent,"\t\t\"",sampleName,"\": {\n",sep = "")
fitToJSON(fitObj$samples[[sampleName]]$fitCommonOnly,disableFileWrite = T,blockname = "fitCommonOnly",nindent = nindent + 3)
if("fitWithRare" %in% contentNames){
cat(",\n")
rareNames <- names(fitObj$samples[[sampleName]][["fitWithRare"]])
cat(indent,"\t\t\t\"fitWithRare\": {\n",sep = "")
for(j in 1:length(rareNames)){
fitToJSON(fitObj$samples[[sampleName]]$fitWithRare[[rareNames[j]]],disableFileWrite = T,blockname = rareNames[j],nindent = nindent + 4)
if(j<length(rareNames)) cat(",")
cat("\n")
}
cat(indent,"\t\t\t}",sep = "")
}
cat("\n")
cat(indent,"\t\t}",sep = "")
if(i<length(fitObj$samples)) cat(",")
cat("\n")
}
cat(indent,"\t}",sep = "")
cat("\n")
cat(indent,"}\n",sep = "")
if(!disableFileWrite) sink()
}
#' Write the results from the Fit or FitMS function to JSON
#'
#' Writing of the results obtained with the Fit or FitMS function to JSON.
#'
#' @param fitObj object obtained from the Fit or FitMS function
#' @param compress if TRUE then compress the JSON file. This is recommended as a plain JSON text file can be quite large. Set to TRUE as default
#' @param filename output file name of the JSON file. If the parameter compress is TRUE then the ".zip" postfix will be added
#' @export
#' @examples
#' res <- Fit(catalogues,getOrganSignatures("Breast"))
#' writeFitResultsToJSON(res,"results/")
writeFitResultsToJSON <- function(fitObj,
compress = TRUE,
filename = "fitData.json"){
if(!is.null(fitObj$fitAlgorithm)){
if(fitObj$fitAlgorithm=="Fit"){
fitToJSON(fitObj = fitObj,
filename = filename)
}else if(fitObj$fitAlgorithm=="FitMS"){
fitMStoJSON(fitObj = fitObj,
filename = filename)
}else{
message("[error plotFitResults] unknown fitAlgorithm attribute ",fitObj$fitAlgorithm,", expecting Fit or FitMS. Input fitObj not recognised. ")
return(NULL)
}
if(file.exists(filename) & compress){
# zip the file
current_wd <- getwd()
outdir <- dirname(filename)
filebasename <- basename(filename)
setwd(outdir)
zip(zipfile = paste0(filebasename,".zip"),
files = filebasename)
unlink(filebasename)
setwd(current_wd)
}
}else{
message("[error plotFitResults] missing fitAlgorithm attribute, fitObj not recognised.")
return(NULL)
}
}
#' Save fit object to file
#'
#' This function saves a Fit or FitMS object to an R data file using a standard name
#'
#' @param fitObj object obtained from the Fit of FitMS function
#' @param filename file name where to save the fit object
#' @export
#' @examples
#' fitObj <- FitMS(catalogues,organ="Breast")
#' saveFitToFile(fitObj,"fit.rData")
saveFitToFile <- function(fitObj,filename,verbose=T){
if(verbose) message("[info saveFitToFile] saving ",fitObj$fitAlgorithm," object to file ",filename)
save(file = filename,fitObj)
}
#' Load fit object from file
#'
#' This function loads a Fit or FitMS object from an R data file.
#' In order to work, the file must have been generated using the saveFitToFile function,
#' which will save the fit object with a standard name.
#'
#' @param filename file name from which to load the fit object
#' @export
#' @examples
#' fitObj <- loadFitFromFile("fit.rData")
loadFitFromFile <- function(filename,verbose=T){
load(file = filename)
if(verbose) message("[info loadFitToFile] loaded ",fitObj$fitAlgorithm," object from file ",filename)
return(fitObj)
}
simpleBoxPlot <- function(dataToPlot,
outfile,
charttitle="",
ylab="",
pointsize=16,
plotpoints=TRUE){
# group names
groupNames <- colnames(dataToPlot)
# get the max size of the labels
maxnchar <- max(sapply(groupNames,function(x){
strwidth(x,units = "inch",ps = par(ps=pointsize))
}))
# plotting
plotheight <- 4
mar1 <- maxnchar+0.5
mar2 <- 1.5
mar3 <- 1
mar4 <- 0.5
width <- 0.8*length(groupNames)+mar2+mar4
height <- mar3 + mar1 + plotheight
cairo_pdf(filename = outfile,width = width,height = height,pointsize = pointsize)
par(mai=c(mar1,mar2,mar3,mar4),mgp=c(4,1,0))
bxres <- boxplot(dataToPlot,
las=2,
cex.axes=0.9,
lwd = 2,
ylab=ylab,
cex.main = 0.9,
names = groupNames,
col = "#FFFFFF",
border="#848482",
main=charttitle)
if(plotpoints){
for (pi in 1:ncol(dataToPlot)) {
dp <- dataToPlot[,colnames(dataToPlot)[pi]]
pointpos <- runif(length(dp),min = 0.2,max = 0.8)-0.5+pi
points(pointpos,dp,pch=16,col="#0067A599")
}
}
dev.off()
}
simpleBarChart <- function(dataToPlot,
outfile,
ylim = NULL,
ylab="",
charttitle="",
pointsize=16){
# group names
groupNames <- names(dataToPlot)
# get the max size of the labels
maxnchar <- max(sapply(groupNames,function(x){
strwidth(x,units = "inch",ps = par(ps=pointsize))
}))
# plotting
plotheight <- 4
mar1 <- maxnchar+0.5
mar2 <- 1.5
mar3 <- 1
mar4 <- 0.5
width <- 0.8*length(groupNames)+mar2+mar4
height <- mar3 + mar1 + plotheight
cairo_pdf(filename = outfile,width = width,height = height,pointsize = pointsize)
par(mai=c(mar1,mar2,mar3,mar4),mgp=c(4,1,0))
barplot(dataToPlot,
ylim = ylim,
ylab = ylab,
col = "#0067A599",
border = NA,
names.arg = groupNames,
las=2,
main = charttitle)
dev.off()
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.