#' HRDetect Pipeline
#'
#' Run the HRDetect pipeline. This function allows for flexible input
#' specification to the HRDetect pipeline that computes the HRDetect
#' BRCAness probability score as published in Davies et al. 2017.
#' It requires an input data frame "data_matrix", which contains a sample
#' in each row and one of six necessary features in each column.
#' The six features can be computed by the pipeline if the necessary input files are provided.
#' The six features are:
#' 1) proportion of deletions at microhomology (del.mh.prop),
#' 2) number of mutations of substitution signature 3 (SNV3),
#' 3) number of mutations of rearrangemet signature 3 (SV3),
#' 4) number of mutations of rearrangemet signature 5 (SV5),
#' 5) HRD LOH index (hrd),
#' 6) number of mutations of substitution signature 8 (SNV8).
#' For example, if the HRD LOH index has already been calculated, these can be
#' added to the input data_matrix, or if the SNV catalogues have already been calculated,
#' these can be supplied using the SNV_catalogues parameter while setting SNV3 and SNV8 columns
#' as "NA". Also, it is possible to provide different data for different samples. For example,
#' one can provide SNV3 and SNV8 number of mutations for some samples in data_matrix, while setting
#' SNV3 and SNV8 to NA for other samples, and providing either SNV catalogues and/or SNV VCF
#' files for these samples. The function will return the HRDetect BRCAness probability score for
#' all the samples for which enough data are available to calculate all six necessary features.
#' Along with the score, the contribution of each feature to the score will be provided. In addition,
#' an updated data_matrix and other other data that have been calculated during the execution of the pipeline
#' will be returned as well.
#'
#' Single Nucleotide Variations. Columns in data_matrix relative to SNV are SNV3 and SNV8. Values
#' corresponding to number of SNV3 and SNV8 mutations in each sample can be provided in the data frame data_matrix.
#' Alternatively, an SNV_catalogue data frame can be used to provide 96-channel SNV catalogues for the samples
#' (96-channels as rows and samples as columns). The 30 consensus COSMIC signatures will be fitted to
#' the catalogues using a bootstrapping approach (Huang et al. 2017) and estimates for SNV3 and SNV8 will be added to the data_matrix.
#' Alternatively, SNV_catalogues can be constructed providing a list of either SNV VCF files or SNV TAB files.
#'
#' Structural Variants (Rearrangements). Columns in data_matrix relative to SV are SV3 and SV5. Values
#' corresponding to number of SV3 and SV5 rearrangements in each sample can be provided in the data frame data_matrix.
#' Alternatively, an SV_catalogue data frame can be used to provide 32-channel SV catalogues for the samples
#' (32-channels as rows and samples as columns). The 6 Breast Cancer Rearrangement signatures (Nik-Zainal et al. 2016) will be fitted to
#' the catalogues using a bootstrapping approach (Huang et al. 2017) and estimates for SV3 and SV5 will be added to the data_matrix.
#' Alternatively, SV_catalogues can be constructed providing a list of SV BEDPE files.
#'
#' Deletions at Micro-homology (Indels). The column in data_matrix corresponding to the proportion of deletions at micro-homology is del.mh.prop.
#' The proportion of deletions at micro-homology for the samples can be calculated by the pipeline if the user provides Indels VCF files.
#'
#' HRD-LOH index (CNV). The column in data_matrix corresponding to the HRD-LOH index is hrd.
#' The HRD-LOH index for the samples can be calculated by the pipeline if the user provides copy numbers TAB files.
#'
#' @param data_matrix data frame containing a sample for each row and the six necessary features as columns. Columns should be labelled with the following names: del.mh.prop, SNV3, SV3, SV5, hrd, SNV8. Row names of the data frame should correspond to the sample names. If the values of the features need to be computed, set them to NA and provide additional data (e.g. catalogues, VCF/BEDPE/TAB files as specified in this documentation page).
#' @param genome.v genome version to use when constructing the SNV catalogue and classifying indels. Set it to either "hg19" or "hg38".
#' @param SNV_vcf_files list of file names corresponding to SNV VCF files to be used to construct 96-channel substitution catalogues. This should be a named vector, where the names indicate the sample name, so that each file can be matched to the corresponding row in the data_matrix input. The files should only contain SNV and should already be filtered according to the user preference, as all SNV in the file will be used and no filter will be applied.
#' @param SNV_tab_files list of file names corresponding to SNV TAB files to be used to construct 96-channel substitution catalogues. This should be a named vector, where the names indicate the sample name, so that each file can be matched to the corresponding row in the data_matrix input. The files should only contain SNV and should already be filtered according to the user preference, as all SNV in the file will be used and no filter will be applied. The files should contain a header in the first line with the following columns: chr, position, REF, ALT.
#' @param SNV_catalogues data frame containing 96-channel substitution catalogues. A sample for each column and the 96-channels as rows. Row names should have the correct channel names (see for example tests/testthat/test.snv.tab) and the column names should be the sample names so that each catalogue can be matched with the corresponding row in the data_matrix input.
#' @param Indels_vcf_files list of file names corresponding to Indels VCF files to be used to classify Indels and compute the proportion of indels at micro-homology. This should be a named vector, where the names indicate the sample name, so that each file can be matched to the corresponding row in the data_matrix input. The files should only contain indels (no SNV) and should already be filtered according to the user preference, as all indels in the file will be used and no filter will be applied.
#' @param CNV_tab_files list of file names corresponding to CNV TAB files (similar to ASCAT format) to be used to compute the HRD-LOH index. This should be a named vector, where the names indicate the sample name, so that each file can be matched to the corresponding row in the data_matrix input. The files should contain a header in the first line with the following columns: 'seg_no', 'Chromosome', 'chromStart', 'chromEnd', 'total.copy.number.inNormal', 'minor.copy.number.inNormal', 'total.copy.number.inTumour', 'minor.copy.number.inTumour'
#' @param SV_bedpe_files list of file names corresponding to SV (Rearrangements) BEDPE files to be used to construct 32-channel rearrangement catalogues. This should be a named vector, where the names indicate the sample name, so that each file can be matched to the corresponding row in the data_matrix input. The files should contain a rearrangement for each row (two breakpoint positions should be on one row as determined by a pair of mates of paired-end sequencing) and should already be filtered according to the user preference, as all rearrangements in the file will be used and no filter will be applied. The files should contain a header in the first line with the following columns: "chrom1", "start1", "end1", "chrom2", "start2", "end2" and "sample" (sample name). In addition, either two columns indicating the strands of the mates, "strand1" (+ or -) and "strand2" (+ or -), or one column indicating the structural variant class, "svclass": translocation, inversion, deletion, tandem-duplication. The column "svclass" should correspond to (Sanger BRASS convention): inversion (strands +/- or -/+ and mates on the same chromosome), deletion (strands +/+ and mates on the same chromosome), tandem-duplication (strands -/- and mates on the same chromosome), translocation (mates are on different chromosomes)..
#' @param SV_catalogues data frame containing 32-channel substitution catalogues. A sample for each column and the 32-channels as rows. Row names should have the correct channel names (see for example tests/testthat/test.cat) and the column names should be the sample names so that each catalogue can be matched with the corresponding row in the data_matrix input.
#' @param signature_type either "COSMIC" or one of the following organs: "Biliary", "Bladder", "Bone_SoftTissue", "Breast", "Cervix", "CNS", "Colorectal", "Esophagus", "Head_neck", "Kidney", "Liver", "Lung", "Lymphoid", "Ovary", "Pancreas", "Prostate", "Skin", "Stomach", "Uterus"
#' @param bootstrap_scores perform HRDetect score with bootstrap. This requires mutations or catalogues for subs/rearr to compute the bootstrap fit, and indels mutations to bootstrap the indels classification. HRD-LOH can still be provided using the input data_matrix.
#' @param nparallel how many parallel threads to use.
#' @return return a list that contains $data_matrix (updated input data_matrix with additional computed features), $hrdetect_output (data frame with HRDetect BRCAness Probability and contribution of the features), $SNV_catalogues (input SNV_catalogues updated with additional computed substitution catalogues if any), $SV_catalogues (input SV_catalogues updated with additional computed rearrangement catalogues if any)
#' @export
#' @references Davies, H., Glodzik, D., Morganella, S., Yates, L. R., Staaf, J., Zou, X., ... Nik-Zainal, S. (2017). HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nature Medicine, 23(4), 517–525. https://doi.org/10.1038/nm.4292
#' @references Nik-Zainal, S., Davies, H., Staaf, J., Ramakrishna, M., Glodzik, D., Zou, X., ... Stratton, M. R. (2016). Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature, 534(7605), 1–20. https://doi.org/10.1038/nature17676
#' @references Huang, X., Wojtowicz, D., & Przytycka, T. M. (2017). Detecting Presence Of Mutational Signatures In Cancer With Confidence. bioRxiv, (October). https://doi.org/10.1101/132597
#'
#' @export
HRDetect_pipeline <- function(data_matrix=NULL,
genome.v="hg19",
SNV_vcf_files=NULL,
SNV_tab_files=NULL,
SNV_catalogues=NULL,
Indels_vcf_files=NULL,
CNV_tab_files=NULL,
SV_bedpe_files=NULL,
SV_catalogues=NULL,
signature_type="COSMIC",
cosmic_siglist=NULL,
bootstrap_scores=FALSE,
nparallel=1){
#if multiple parallel cores are used, set it here
doMC::registerDoMC(nparallel)
#check that the matrix has correct features (columns)
col_needed <- c("del.mh.prop", "SNV3", "SV3", "SV5", "hrd", "SNV8")
if (!is.null(data_matrix)){
if (!length(intersect(col_needed,colnames(data_matrix)))==length(col_needed)){
stop("[error HRDetect_pipeline] incorrect data_matrix columns specified, you need the following columns: \"del.mh.prop\", \"SNV3\", \"SV3\", \"SV5\", \"hrd\", \"SNV8\"")
}
}else{
#there is no data_matrix, just build it from files
#get the sample names from all the file lists
sample_names <- unique(unlist(sapply(list(SNV_vcf_files,
SNV_tab_files,
SNV_catalogues,
Indels_vcf_files,
CNV_tab_files,
SV_bedpe_files,
SV_catalogues), function(x) names(x))))
if(!is.null(sample_names)){
data_matrix <- matrix(NA,ncol = 6,nrow = length(sample_names),dimnames = list(sample_names,col_needed))
}else{
stop("[error HRDetect_pipeline] no input data provided.")
}
}
samples_list <- rownames(data_matrix)
#check that the signature_type option is valid
#get available organs
available_organs <- intersect(unique(sapply(strsplit(colnames(all_organ_sigs_subs),split="_"),function(x) paste(x[1:(length(x)-1)],collapse = "_"))),
unique(sapply(strsplit(colnames(all_organ_sigs_rearr),split="_"),function(x) paste(x[1:(length(x)-1)],collapse = "_"))))
if(!(signature_type %in% c("COSMIC",available_organs))){
stop(paste0("[error HRDetect_pipeline] invalid signature_type ",signature_type,"."))
}
#use either COSMIC sigs or tissue-specific sigs
if(signature_type=="COSMIC"){
if (is.null(cosmic_siglist)) cosmic_siglist <- 1:30
sigstofit_subs <- cosmic30[,cosmic_siglist,drop=FALSE]
sigstofit_rearr <- RS.Breast560
}else{
sigstofit_subs <- all_organ_sigs_subs[,colnames(all_organ_sigs_subs)[grep(pattern = paste0("^",signature_type),colnames(all_organ_sigs_subs))]]
sigstofit_rearr <- all_organ_sigs_rearr[,colnames(all_organ_sigs_rearr)[grep(pattern = paste0("^",signature_type),colnames(all_organ_sigs_rearr))]]
}
#initialise bootstrap fit variables to NULL
bootstrap_fit_subs <- NULL
bootstrap_fit_rearr <- NULL
#initialise final exposures output table
exposures_subs <- NULL
exposures_rearr <- NULL
#initialise indels classification table
indels_classification_table <- NULL
#initialise hrdetect bootstrap tables
hrdetect_bootstrap_table <- NULL
q_5_50_95 <- NULL
#check whether SNV related columns are NA or incomplete and if so check whether the catalogues
#are available for sig fit, and if not check whether the vcf (or tab) files are available for building catalogues
message("[info HRDetect_pipeline] Single Nucleotide Variations")
SNV_cols <- c("SNV3","SNV8")
need_to_compute_SNVexposures <- any(is.na(data_matrix[,SNV_cols]))
if(need_to_compute_SNVexposures){
message("[info HRDetect_pipeline] Some samples in the input data_matrix do not have the exposures for SNV3 and SNV8, checking if the user supplied SNV catalogues, VCF files or TAB files for those samples.")
#find out which samples that have no exposures have vcf file or catalogue
incomplete_samples_pos <- which(apply(data_matrix[,SNV_cols,drop=FALSE],1,function(x) any(is.na(x))))
incomplete_samples <- rownames(data_matrix)[incomplete_samples_pos]
if (!is.null(SNV_catalogues)){
incomplete_samples_with_catalogueSNV <- intersect(incomplete_samples,colnames(SNV_catalogues))
}else{
#there is no SNV catalogue given, so no incomplete sample has a catalogue
incomplete_samples_with_catalogueSNV <- character(0)
}
if (!is.null(SNV_vcf_files)){
incomplete_samples_with_vcfSNV <- intersect(incomplete_samples,names(SNV_vcf_files))
}else{
#there is no SNV vcf files given, so no incomplete sample has a vcf file
incomplete_samples_with_vcfSNV <- character(0)
}
if (!is.null(SNV_tab_files)){
incomplete_samples_with_tabSNV <- intersect(incomplete_samples,names(SNV_tab_files))
}else{
#there is no SNV tab files given, so no incomplete sample has a tab file
incomplete_samples_with_tabSNV <- character(0)
}
#now, check that if a sample has both catalogue and vcf (or tab) file, there is no need to compute the catalogue
incomplete_samples_with_vcfSNV <- setdiff(incomplete_samples_with_vcfSNV,incomplete_samples_with_catalogueSNV)
incomplete_samples_with_tabSNV <- setdiff(incomplete_samples_with_tabSNV,incomplete_samples_with_catalogueSNV)
#also if a sample has both vcf and tab file, use the vcf file
incomplete_samples_with_tabSNV <- setdiff(incomplete_samples_with_tabSNV,incomplete_samples_with_vcfSNV)
#initialise the SNV catalogues data frame if necessary
mut.order <- c("A[C>A]A","A[C>A]C","A[C>A]G","A[C>A]T","C[C>A]A","C[C>A]C","C[C>A]G","C[C>A]T","G[C>A]A","G[C>A]C","G[C>A]G","G[C>A]T","T[C>A]A","T[C>A]C","T[C>A]G","T[C>A]T","A[C>G]A","A[C>G]C","A[C>G]G","A[C>G]T","C[C>G]A","C[C>G]C","C[C>G]G","C[C>G]T","G[C>G]A","G[C>G]C","G[C>G]G","G[C>G]T","T[C>G]A","T[C>G]C","T[C>G]G","T[C>G]T","A[C>T]A","A[C>T]C","A[C>T]G","A[C>T]T","C[C>T]A","C[C>T]C","C[C>T]G","C[C>T]T","G[C>T]A","G[C>T]C","G[C>T]G","G[C>T]T","T[C>T]A","T[C>T]C","T[C>T]G","T[C>T]T","A[T>A]A","A[T>A]C","A[T>A]G","A[T>A]T","C[T>A]A","C[T>A]C","C[T>A]G","C[T>A]T","G[T>A]A","G[T>A]C","G[T>A]G","G[T>A]T","T[T>A]A","T[T>A]C","T[T>A]G","T[T>A]T","A[T>C]A","A[T>C]C","A[T>C]G","A[T>C]T","C[T>C]A","C[T>C]C","C[T>C]G","C[T>C]T","G[T>C]A","G[T>C]C","G[T>C]G","G[T>C]T","T[T>C]A","T[T>C]C","T[T>C]G","T[T>C]T","A[T>G]A","A[T>G]C","A[T>G]G","A[T>G]T","C[T>G]A","C[T>G]C","C[T>G]G","C[T>G]T","G[T>G]A","G[T>G]C","G[T>G]G","G[T>G]T","T[T>G]A","T[T>G]C","T[T>G]G","T[T>G]T")
if(is.null(SNV_catalogues)){
SNV_catalogues <- data.frame(row.names = mut.order)
}else{
SNV_catalogues <- SNV_catalogues[mut.order,,drop=FALSE]
}
#compute the SNV catalogue of samples with VCF files where necessary
if (length(incomplete_samples_with_vcfSNV)>0){
message("[info HRDetect_pipeline] VCF files will be converted to SNV catalogues for the following samples: ",paste(incomplete_samples_with_vcfSNV,collapse = " "))
cat_list <- foreach::foreach(sample=incomplete_samples_with_vcfSNV) %dopar% {
res <- vcfToSNVcatalogue(SNV_vcf_files[sample],genome.v = genome.v)
colnames(res$catalogue) <- sample
res$catalogue
}
#add new SNV catalogues to the catalogues matrix
for (i in 1:length(cat_list)){
newcat <- cat_list[[i]]
SNV_catalogues <- cbind(SNV_catalogues,newcat)
incomplete_samples_with_catalogueSNV <- c(incomplete_samples_with_catalogueSNV,colnames(newcat))
}
}
#compute the SNV catalogue of samples with TAB files where necessary
if (length(incomplete_samples_with_tabSNV)>0){
message("[info HRDetect_pipeline] TAB files will be converted to SNV catalogues for the following samples: ",paste(incomplete_samples_with_tabSNV,collapse = " "))
cat_list <- foreach::foreach(sample=incomplete_samples_with_tabSNV) %dopar% {
subs <- read.table(file = SNV_tab_files[sample],
sep = "\t",header = TRUE,check.names = FALSE,stringsAsFactors = FALSE)
res <- tabToSNVcatalogue(subs,genome.v = genome.v)
colnames(res$catalogue) <- sample
res$catalogue
}
#add new SNV catalogues to the catalogues matrix
for (i in 1:length(cat_list)){
newcat <- cat_list[[i]]
SNV_catalogues <- cbind(SNV_catalogues,newcat)
incomplete_samples_with_catalogueSNV <- c(incomplete_samples_with_catalogueSNV,colnames(newcat))
}
}
#compute exposures for the samples in the catalogueSNV that do not have exposures yet
#consider only the catalogues needed
SNV_catalogues_toFit <- SNV_catalogues[,incomplete_samples_with_catalogueSNV,drop=FALSE]
#run sigfit with bootstrap if there are samples in the SNV_catalogue
if(length(incomplete_samples_with_catalogueSNV)>0){
message("[info HRDetect_pipeline] Substitution signatures exposures will be estiamated for the following samples: ",paste(incomplete_samples_with_catalogueSNV,collapse = " "))
message("[info HRDetect_pipeline] Running Signature fit with 100 bootstraps. Increase sparsity by removing exposures with 5% threshold of total mutations and 0.05 threshold of p-value, i.e. exposure of a signature in a sample is set to zero if the probability of having less than 5% of total mutations assigned to that signature is greather than 0.05.")
bootstrap_fit_subs <- SignatureFit_withBootstrap(SNV_catalogues_toFit,
sigstofit_subs,
nboot = 100,
threshold_percent = 5,
threshold_p.value = 0.05,
verbose = FALSE,
nparallel = nparallel)
#add the resulting exposures to the data_matrix
exposures_subs <- bootstrap_fit_subs$E_median_filtered
exposures_subs[is.nan(exposures_subs)] <- 0
if(signature_type=="COSMIC"){
if("Signature.3" %in% rownames(exposures_subs)){
data_matrix[colnames(exposures_subs),"SNV3"] <- exposures_subs["Signature.3",]
}else{
data_matrix[colnames(exposures_subs),"SNV3"] <- 0
}
if("Signature.8" %in% rownames(exposures_subs)){
data_matrix[colnames(exposures_subs),"SNV8"] <- exposures_subs["Signature.8",]
}else{
data_matrix[colnames(exposures_subs),"SNV8"] <- 0
}
# data_matrix[colnames(exposures_subs),SNV_cols] <- t(exposures_subs[c("Signature.3","Signature.8"),])
}else{
#convert to reference signatures
exposures_subs <- t(conversion_matrix_subs[colnames(sigstofit_subs),]) %*% exposures_subs
exposures_subs <- exposures_subs[apply(exposures_subs, 1,sum)>0,,drop=FALSE]
#add to data_matrix if present
if("Ref.Sig.3" %in% rownames(exposures_subs)){
data_matrix[colnames(exposures_subs),"SNV3"] <- exposures_subs["Ref.Sig.3",]
}else{
data_matrix[colnames(exposures_subs),"SNV3"] <- 0
}
if("Ref.Sig.8" %in% rownames(exposures_subs)){
data_matrix[colnames(exposures_subs),"SNV8"] <- exposures_subs["Ref.Sig.8",]
}else{
data_matrix[colnames(exposures_subs),"SNV8"] <- 0
}
}
}
}
#check whether SV related columns are NA or incomplete and if so check whether the catalogues
#are available for sig fit, and if not check whether the BEDPE files are available for building catalogues
message("[info HRDetect_pipeline] Structural Variants (Rearrangements)")
SV_cols <- c("SV3","SV5")
need_to_compute_SVexposures <- any(is.na(data_matrix[,SV_cols]))
if(need_to_compute_SVexposures){
message("[info HRDetect_pipeline] Some samples in the input data_matrix do not have the exposures for SV3 and SV5, checking if the user supplied SV catalogues or BEDPE files for those samples.")
#find out which samples that have no exposures have BEDPE file or catalogue
incomplete_samples_pos <- which(apply(data_matrix[,SV_cols,drop=FALSE],1,function(x) any(is.na(x))))
incomplete_samples <- rownames(data_matrix)[incomplete_samples_pos]
if (!is.null(SV_catalogues)){
incomplete_samples_with_catalogueSV <- intersect(incomplete_samples,colnames(SV_catalogues))
}else{
#there is no SV catalogue given, so no incomplete sample has a catalogue
incomplete_samples_with_catalogueSV <- character(0)
}
if (!is.null(SV_bedpe_files)){
incomplete_samples_with_bedpeSV <- intersect(incomplete_samples,names(SV_bedpe_files))
}else{
#there is no SV bedpe files given, so no incomplete sample has a bedpe file
incomplete_samples_with_bedpeSV <- character(0)
}
#now, check that if a sample has both catalogue and BEDPE file, there is no need to compute the catalogue
incomplete_samples_with_bedpeSV <- setdiff(incomplete_samples_with_bedpeSV,incomplete_samples_with_catalogueSV)
#initialise the SV catalogues data frame if necessary
catalogue.labels <- c('clustered_del_1-10Kb', 'clustered_del_10-100Kb', 'clustered_del_100Kb-1Mb', 'clustered_del_1Mb-10Mb', 'clustered_del_>10Mb', 'clustered_tds_1-10Kb', 'clustered_tds_10-100Kb', 'clustered_tds_100Kb-1Mb', 'clustered_tds_1Mb-10Mb', 'clustered_tds_>10Mb', 'clustered_inv_1-10Kb', 'clustered_inv_10-100Kb', 'clustered_inv_100Kb-1Mb', 'clustered_inv_1Mb-10Mb', 'clustered_inv_>10Mb', 'clustered_trans', 'non-clustered_del_1-10Kb', 'non-clustered_del_10-100Kb', 'non-clustered_del_100Kb-1Mb', 'non-clustered_del_1Mb-10Mb', 'non-clustered_del_>10Mb', 'non-clustered_tds_1-10Kb', 'non-clustered_tds_10-100Kb', 'non-clustered_tds_100Kb-1Mb', 'non-clustered_tds_1Mb-10Mb', 'non-clustered_tds_>10Mb', 'non-clustered_inv_1-10Kb', 'non-clustered_inv_10-100Kb', 'non-clustered_inv_100Kb-1Mb', 'non-clustered_inv_1Mb-10Mb', 'non-clustered_inv_>10Mb', 'non-clustered_trans')
if(is.null(SV_catalogues)){
SV_catalogues <- data.frame(row.names = catalogue.labels)
}else{
SV_catalogues <- SV_catalogues[catalogue.labels,,drop=FALSE]
}
#compute the SV catalogue of samples with BEDPE files where necessary
if (length(incomplete_samples_with_bedpeSV)>0){
message("[info HRDetect_pipeline] BEDPE files will be converted to Rearrangement catalogues for the following samples: ",paste(incomplete_samples_with_bedpeSV,collapse = " "))
cat_list <- foreach::foreach(sample=incomplete_samples_with_bedpeSV) %dopar% {
sv_bedpe <- read.table(SV_bedpe_files[sample],sep = "\t",header = TRUE,
stringsAsFactors = FALSE,check.names = FALSE,comment.char = "")
reslist <- bedpeToRearrCatalogue(sv_bedpe)
res <- reslist$rearr_catalogue
colnames(res) <- sample
res
}
#add new SV catalogues to the catalogues matrix
for (i in 1:length(cat_list)){
newcat <- cat_list[[i]]
SV_catalogues <- cbind(SV_catalogues,newcat)
incomplete_samples_with_catalogueSV <- c(incomplete_samples_with_catalogueSV,colnames(newcat))
}
}
#compute exposures for the samples in the catalogueSV that do not have exposures yet
#consider only the catalogues needed
SV_catalogues_toFit <- SV_catalogues[,incomplete_samples_with_catalogueSV,drop=FALSE]
#run sigfit with bootstrap if there are samples in the SV_catalogue
if(length(incomplete_samples_with_catalogueSV)>0){
message("[info HRDetect_pipeline] Rearrangement signatures exposures will be estiamated for the following samples: ",paste(incomplete_samples_with_catalogueSV,collapse = " "))
message("[info HRDetect_pipeline] Running Signature fit with 100 bootstraps. Increase sparsity by removing exposures with 5% threshold of total mutations and 0.05 threshold of p-value, i.e. exposure of a signature in a sample is set to zero if the probability of having less than 5% of total mutations assigned to that signature is greather than 0.05.")
bootstrap_fit_rearr <- SignatureFit_withBootstrap(SV_catalogues_toFit,
sigstofit_rearr,
nboot = 100,
threshold_percent = 5,
threshold_p.value = 0.05,
verbose = FALSE,
nparallel = nparallel)
#add the resulting exposures to the data_matrix
exposures_rearr <- bootstrap_fit_rearr$E_median_filtered
exposures_rearr[is.nan(exposures_rearr)] <- 0
if(signature_type=="COSMIC"){
data_matrix[colnames(exposures_rearr),SV_cols] <- t(exposures_rearr[c("RS3","RS5"),])
}else{
#convert to reference signatures
exposures_rearr <- t(conversion_matrix_rearr[colnames(sigstofit_rearr),]) %*% exposures_rearr
exposures_rearr <- exposures_rearr[apply(exposures_rearr, 1,sum)>0,,drop=FALSE]
#add to data_matrix if present
if("RefSig R3" %in% rownames(exposures_rearr)){
data_matrix[colnames(exposures_rearr),"SV3"] <- exposures_rearr["RefSig R3",]
}else{
data_matrix[colnames(exposures_rearr),"SV3"] <- 0
}
if("RefSig R5" %in% rownames(exposures_rearr)){
data_matrix[colnames(exposures_rearr),"SV5"] <- exposures_rearr["RefSig R5",]
}else{
data_matrix[colnames(exposures_rearr),"SV5"] <- 0
}
if("RefSig R9" %in% rownames(exposures_rearr)){
data_matrix[colnames(exposures_rearr),"SV5"] <- data_matrix[colnames(exposures_rearr),"SV5"] + exposures_rearr["RefSig R9",]
}
}
}
}
#check whether small deletion at micro-homology (MH) related column (del.mh.prop) is NA or incomplete, and if so
#check whether the VCF indel files are available for computing the proportion of indels with MH
message("[info HRDetect_pipeline] Deletions at Micro-homology (Indels)")
MH_cols <- c("del.mh.prop")
need_to_compute_MH <- any(is.na(data_matrix[,MH_cols]))
if(need_to_compute_MH){
message("[info HRDetect_pipeline] Some samples in the input data_matrix do not have the proportion of deletions at micro-homology, checking if the user supplied VCF indels files for those samples.")
#find out which samples that have no del.mh.prop have vcf indel file
incomplete_samples_pos <- which(apply(data_matrix[,MH_cols,drop=FALSE],1,function(x) any(is.na(x))))
incomplete_samples <- rownames(data_matrix)[incomplete_samples_pos]
if (!is.null(Indels_vcf_files)){
incomplete_samples_with_vcfIndels <- intersect(incomplete_samples,names(Indels_vcf_files))
}else{
#there is no Indels vcf files given, so no incomplete sample has a vcf file
incomplete_samples_with_vcfIndels <- character(0)
}
#compute del.mh.prop for the samples with vcf Indels file that do not have del.mh.prop yet
if(length(incomplete_samples_with_vcfIndels)>0){
message("[info HRDetect_pipeline] Proportion of Indels with MH will be computed for the following samples: ",paste(incomplete_samples_with_vcfIndels,collapse = " "))
mh_list <- foreach::foreach(sample=incomplete_samples_with_vcfIndels) %dopar% {
res <- vcfToIndelsClassification(Indels_vcf_files[sample],sample,genome.v = genome.v)
res$count_proportion
}
#combine in one table and add to data_matrix
indels_classification_table <- do.call(rbind,mh_list)
rownames(indels_classification_table) <- indels_classification_table[,"sample"]
data_matrix[rownames(indels_classification_table),MH_cols] <- indels_classification_table[,"del.mh.prop"]
}
}
#check whether HRD-LOH related column (hrd) is NA or incomplete, and if so
#check whether the tab CNV files are available for computing the HRD-LOH index
message("[info HRDetect_pipeline] HRD-LOH index (CNV)")
hrd_cols <- c("hrd")
need_to_compute_hrd <- any(is.na(data_matrix[,hrd_cols]))
if(need_to_compute_hrd){
message("[info HRDetect_pipeline] Some samples in the input data_matrix do not have the HRD-LOH index, checking if the user supplied tab CNV files for those samples.")
#find out which samples that have no HRD-LOH have tab CNV file
incomplete_samples_pos <- which(apply(data_matrix[,hrd_cols,drop=FALSE],1,function(x) any(is.na(x))))
incomplete_samples <- rownames(data_matrix)[incomplete_samples_pos]
if (!is.null(CNV_tab_files)){
incomplete_samples_with_tabCNV <- intersect(incomplete_samples,names(CNV_tab_files))
}else{
#there is no CNV tab files given, so no incomplete sample has a CNV tab file
incomplete_samples_with_tabCNV <- character(0)
}
#compute HRD-LOH for the samples with CNV tab file that do not have hrd yet
if(length(incomplete_samples_with_tabCNV)>0){
message("[info HRDetect_pipeline] HRD-LOH will be computed for the following samples: ",paste(incomplete_samples_with_tabCNV,collapse = " "))
hrd_list <- foreach::foreach(sample=incomplete_samples_with_tabCNV) %dopar% {
ascat.df <- read.table(CNV_tab_files[sample],sep = "\t",header = TRUE,
stringsAsFactors = FALSE,check.names = FALSE)
res <- ascatToHRDLOH(ascat.df,sample)
names(res) <- sample
res
}
#add resulting hrd to the data_matrix
for (i in 1:length(hrd_list)){
res <- hrd_list[[i]]
data_matrix[names(res),hrd_cols] <- res
}
}
}
#now, compute HRDetect score for all complete cases, so first notify of incomplete cases if any
incomplete_cases <- rownames(data_matrix)[!complete.cases(data_matrix)]
if(length(incomplete_cases)>0){
message("[info HRDetect_pipeline] Some samples do not have data necessary to compute the HRDetect score (check output $data_matrix). Will not compute HRDetect score for the following samples: ",paste(incomplete_cases,collapse = " "))
hrdetect_input <- data_matrix[complete.cases(data_matrix),,drop=FALSE]
}else{
hrdetect_input <- data_matrix
}
if(nrow(hrdetect_input)>0){
message("[info HRDetect_pipeline] Computing HRDetect score and feature contributions for the following samples: ",paste(rownames(hrdetect_input),collapse = " "))
hrdetect_output <- applyHRDetectDavies2017(hrdetect_input, attachContributions = TRUE)
#attempt to run HRDetect with bootstrap if requested
if(bootstrap_scores){
message("[info HRDetect_pipeline] HRDetect boostrap scores requested!")
#check whether we have all we need and for which samples
bootstrap_samples <- c()
#bootstrap exposures are required
if(!is.null(bootstrap_fit_subs)) bootstrap_samples <- colnames(bootstrap_fit_subs$E_median_filtered)
if(!is.null(bootstrap_fit_rearr)) bootstrap_samples <- intersect(bootstrap_samples,colnames(bootstrap_fit_rearr$E_median_filtered))
#indels classification tables are required
if(!is.null(indels_classification_table)) bootstrap_samples <- intersect(bootstrap_samples,rownames(indels_classification_table))
#If samples are in the hrdetect_input table then they also have the HRD-LOH score
bootstrap_samples <- intersect(bootstrap_samples,rownames(hrdetect_input))
if(length(bootstrap_samples)>0){
message("[info HRDetect_pipeline] Computing HRDetect boostrap for the following samples: ",paste(bootstrap_samples,collapse = ", "))
#run hrdetect bootstrap scores for the bootstrap_samples
nboots_hr <- 1000
hrdetect_bootstrap_table <- list()
for (j in 1:nboots_hr){
tmp_data_matrix <- data_matrix[bootstrap_samples,,drop=FALSE]
tmp_data_matrix[1:nrow(tmp_data_matrix),1:ncol(tmp_data_matrix)] <- NA
#1) sample del.mh.prop
deletions_muts <- t(indels_classification_table[,c("del.mh.count","del.rep.count","del.none.count")])[,bootstrap_samples,drop=FALSE]
ndeletions <- apply(deletions_muts,2,sum)
samples_deletions_muts <- generateRandMuts(deletions_muts)
samples_ndeletions <- apply(samples_deletions_muts,2,sum)
samples_deletions_prop <- t(samples_deletions_muts/matrix(rep(samples_ndeletions,3),nrow = nrow(samples_deletions_muts),ncol = ncol(samples_deletions_muts),byrow = TRUE))
colnames(samples_deletions_prop) <- c("del.mh.prop","del.rep.prop","del.none.prop")
tmp_data_matrix[bootstrap_samples,"del.mh.prop"] <- samples_deletions_prop[bootstrap_samples,"del.mh.prop"]
#2) sample HRD-LOH index
tmp_data_matrix[bootstrap_samples,"hrd"] <- rpois(length(bootstrap_samples),data_matrix[bootstrap_samples,"hrd"])
#3) sample subs exposures
subs_boots <- bootstrap_fit_subs$boot_list
s1 <- sample(length(subs_boots),size = 1)
current_subs <- t(subs_boots[[s1]])
current_subs[is.na(current_subs)] <- 0
#sparsity correction 5%
sel <- t(apply(current_subs,1,function(x) x<sum(x)*0.05))
current_subs[sel] <- 0
if(signature_type=="COSMIC"){
if("Signature.3" %in% colnames(current_subs)){
tmp_data_matrix[bootstrap_samples,"SNV3"] <- current_subs[bootstrap_samples,"Signature.3"]
}else{
tmp_data_matrix[bootstrap_samples,"SNV3"] <- 0
}
if("Signature.8" %in% colnames(current_subs)){
tmp_data_matrix[bootstrap_samples,"SNV8"] <- current_subs[bootstrap_samples,"Signature.8"]
}else{
tmp_data_matrix[bootstrap_samples,"SNV8"] <- 0
}
}else{
#convert to ref sig
exposures_RefSigs <- t(as.matrix(current_subs) %*% as.matrix(conversion_matrix_subs[colnames(current_subs),]))
exposures_RefSigs <- exposures_RefSigs[apply(exposures_RefSigs, 1,sum)>0,,drop=FALSE]
if("Ref.Sig.3" %in% rownames(exposures_RefSigs)){
tmp_data_matrix[bootstrap_samples,"SNV3"] <- exposures_RefSigs["Ref.Sig.3",bootstrap_samples]
}else{
tmp_data_matrix[bootstrap_samples,"SNV3"] <- 0
}
if("Ref.Sig.8" %in% rownames(exposures_RefSigs)){
tmp_data_matrix[bootstrap_samples,"SNV8"] <- exposures_RefSigs["Ref.Sig.8",bootstrap_samples]
}else{
tmp_data_matrix[bootstrap_samples,"SNV8"] <- 0
}
}
#4) sample rearr exposures
rearr_boots <- bootstrap_fit_rearr$boot_list
s1 <- sample(length(rearr_boots),size = 1)
current_rearr <- t(rearr_boots[[s1]])
current_rearr[is.na(current_rearr)] <- 0
#sparsity correction 5%
sel <- t(apply(current_rearr,1,function(x) x<sum(x)*0.05))
current_rearr[sel] <- 0
if(signature_type=="COSMIC"){
tmp_data_matrix[bootstrap_samples,"SV3"] <- current_rearr[bootstrap_samples,"RS3"]
tmp_data_matrix[bootstrap_samples,"SV5"] <- current_rearr[bootstrap_samples,"RS5"]
}else{
#convert to ref sig
exposures_RefSigs <- t(as.matrix(current_rearr) %*% as.matrix(conversion_matrix_rearr[colnames(current_rearr),]))
exposures_RefSigs <- exposures_RefSigs[apply(exposures_RefSigs, 1,sum)>0,,drop=FALSE]
if("RefSig R3" %in% rownames(exposures_RefSigs)){
tmp_data_matrix[bootstrap_samples,"SV3"] <- exposures_RefSigs["RefSig R3",bootstrap_samples]
}else{
tmp_data_matrix[bootstrap_samples,"SV3"] <- 0
}
if("RefSig R5" %in% rownames(exposures_RefSigs)){
tmp_data_matrix[bootstrap_samples,"SV5"] <- exposures_RefSigs["RefSig R5",bootstrap_samples]
}else{
tmp_data_matrix[bootstrap_samples,"SV5"] <- 0
}
if("RefSig R9" %in% rownames(exposures_RefSigs)){
tmp_data_matrix[bootstrap_samples,"SV5"] <- tmp_data_matrix[bootstrap_samples,"SV5"] + exposures_RefSigs["RefSig R9",bootstrap_samples]
}
}
hrdetect_bootstrap_table[[j]] <- t(applyHRDetectDavies2017(data_matrix = tmp_data_matrix,attachContributions = FALSE))
}
hrdetect_bootstrap_table <- do.call(rbind,hrdetect_bootstrap_table)
rownames(hrdetect_bootstrap_table) <- 1:nrow(hrdetect_bootstrap_table)
q_5_50_95 <- t(apply(hrdetect_bootstrap_table,2,function(x) quantile(x,c(0.05,0.5,0.95))))
message("[info HRDetect_pipeline] HRDetect with bootstrap successful!")
}else{
message("[info HRDetect_pipeline] Not enough data to run HRDetect boostrap scores.")
}
}
message("[info HRDetect_pipeline] HRDetect pipeline completed!")
}else{
message("[info HRDetect_pipeline] Impossible to compute any HRDetect score, as no sample seems to have all the necessary data. Check output $data_matrix to see what has been computed with the data provided.")
hrdetect_output <- NULL
}
#--- return results ---
res <- list()
res$data_matrix <- data_matrix
res$SV_catalogues <- SV_catalogues
res$SNV_catalogues <- SNV_catalogues
res$hrdetect_output <- hrdetect_output
res$exposures_subs <- exposures_subs
res$exposures_rearr <- exposures_rearr
res$bootstrap_fit_subs <- bootstrap_fit_subs
res$bootstrap_fit_rearr <- bootstrap_fit_rearr
res$indels_classification_table <- indels_classification_table
res$hrdetect_bootstrap_table <- hrdetect_bootstrap_table
res$q_5_50_95 <- q_5_50_95
return(res)
}
#' Apply HRDetect (Davies et al. 2017)
#'
#' Apply HRDetect with the coeffcients from the Davies et al. 2017 publication.
#' This function requires a data frame with six features (columns) for each sample (rows),
#' and a list of features indicating which columns of the given matrix correspond to the required features.
#' Required features are:
#' 1) proportion of deletions with microhomology (del.mh.prop),
#' 2) number of mutations of substitution signature 3 (SNV3),
#' 3) number of mutations of rearrangemet signature 3 (SV3),
#' 4) number of mutations of rearrangemet signature 5 (SV5),
#' 5) HRD LOH index (hrd),
#' 6) number of mutations of substitution signature 8 (SNV8).
#' The function will return the HRDetect BRCAness probabilities, along with (optionally) the contributions
#' of each of the six features in each samples. The contributions are the normalised (log transoform and standardise)
#' values of the features multiplied for the corresponding HRDetect logistic model coefficient.
#'
#' @param data_matrix data frame containing a row for each sample and at least the columns specified in the features_names parameter
#' @param features_names list of column names of the matrix data_matrix. These indicate the features to be passed to HRDetect and should correspond to (in the exact order):
#' 1) proportion of deletions with microhomology (del.mh.prop),
#' 2) number of mutations of substitution signature 3 (SNV3),
#' 3) number of mutations of rearrangemet signature 3 (SV3),
#' 4) number of mutations of rearrangemet signature 5 (SV5),
#' 5) HRD LOH index (hrd),
#' 6) number of mutations of substitution signature 8 (SNV8).
#' @param attachContributions set to TRUE if you would like to have the contributions of the individual features to the samples HRDetect BRCAness probabilities.
#' @export
#' @references Davies, H., Glodzik, D., Morganella, S., Yates, L. R., Staaf, J., Zou, X., ... Nik-Zainal, S. (2017). HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nature Medicine, 23(4), 517–525. https://doi.org/10.1038/nm.4292
#' @examples
#' BRCAprob <- applyHRDetectDavies2017(data_matrix,features_names)
applyHRDetectDavies2017 <- function(data_matrix,features_names=c("del.mh.prop","SNV3","SV3","SV5","hrd","SNV8"),attachContributions=FALSE){
#data_matrix: rows are samples and columns are features
#features_names: give the names of the features in data_matrix that correspond, in order, to:
# 1) proportion of deletions with microhomology
# 2) number of mutations of substitution signature 3
# 3) number of mutations of rearrangemet signature 3
# 4) number of mutations of rearrangemet signature 5
# 5) HRD LOH index
# 6) number of mutations of substitution signature 8
features_mean <- c(0.218,
2.096,
1.260,
1.935,
2.195,
4.390)
features_sd <- c(0.090,
3.555,
1.657,
1.483,
0.750,
3.179)
features_weight <- c(2.398,
1.611,
1.153,
0.847,
0.667,
0.091)
intercept <- -3.364
hrdetect_input <- data_matrix[,features_names,drop=FALSE]
#log and scale
hrdetect_input <- log(hrdetect_input + 1)
hrdetect_input <- hrdetect_input - matrix(rep(features_mean,nrow(hrdetect_input)),nrow = nrow(hrdetect_input),byrow = TRUE)
hrdetect_input <- hrdetect_input/matrix(rep(features_sd,nrow(hrdetect_input)),nrow = nrow(hrdetect_input),byrow = TRUE)
#get the score!
linear_part <- rep(intercept,nrow(hrdetect_input)) + apply(hrdetect_input*matrix(rep(features_weight,nrow(hrdetect_input)),nrow = nrow(hrdetect_input),byrow = TRUE),1,sum)
BRCA_prob <- 1/(1+exp(-linear_part))
BRCA_prob <- matrix(BRCA_prob,nrow = length(BRCA_prob),ncol = 1,dimnames = list(names(BRCA_prob),"Probability"))
if (attachContributions){
non_zero <- features_names
contrib <- hrdetect_input[,non_zero,drop=FALSE] * matrix(features_weight,ncol = length(non_zero),nrow = nrow(hrdetect_input),byrow = TRUE)
intercept_matrix <- matrix(intercept,ncol = 1,nrow = nrow(hrdetect_input),dimnames = list(rownames(hrdetect_input),"intercept"))
BRCA_prob <- cbind(intercept_matrix,contrib,BRCA_prob)
}
return(BRCA_prob)
}
#------------------------
#' plot_HRDLOH_HRDetect_Contributions
#'
#' Function for plotting HRDetect results and contributions to HRDetect score.
#'
#' @param file_name name of the output file (jpg)
#' @param HRDLOH_index HRD index score
#' @param hrdetect_output output of HRDetect, containing contributions of each feature
#' @export
plot_HRDLOH_HRDetect_Contributions <- function(file_name,HRDLOH_index,hrdetect_output){
col_needed <- c("del.mh.prop", "SNV3", "SV3", "SV5", "hrd", "SNV8")
#details plot
hrdetect_output <- as.data.frame(hrdetect_output)
reorder <- order(hrdetect_output$Probability,decreasing = TRUE)
nsamples <- nrow(hrdetect_output)
contributions <- hrdetect_output
jpeg(filename = file_name,
width = max(200+80*nsamples,2400),
height = 1200,
res = 200)
par(mfrow=c(1,1))
mat <- matrix(c(1,2,3),ncol = 1)
layout(mat, c(1), c(0.8,0.65,1))
par(mar=c(1,6,3,4))
barplot(height = HRDLOH_index[reorder],
ylim = c(0,max(HRDLOH_index)*1.1),
ylab = "HRD-LOH score",
cex.axis = 1.2,
cex = 1.5,
cex.lab=1.5)
title(main=paste0("HRD-LOH score and BRCA1/BRCA2 deficiency score"), cex.main=2)
abline(a=15,b=0,col="red")
par(mar=c(1,6,1,4))
barplot(height = hrdetect_output$Probability[reorder],
ylim = c(0,1),ylab = "BRCA1/BRCA2\n def. score",
names.arg = "",
cex.axis = 1.2,
cex.lab=1.5,
cex.names = 1.5)
abline(a=0.7,b=0,col="red")
#text(x = -0.05,y = 0.75,labels = "0.7",col = "red")
matrix_to_plot <- t(hrdetect_output[reorder,col_needed])
par(mar=c(8,6,1,4))
barplot(height = matrix_to_plot,
beside = TRUE,
#names.arg = row.names(hrdetect_res),
names.arg = rownames(hrdetect_output)[reorder],
las = 3,
legend.text = c("deletion with MH",
"Substitution Sig. 3",
"Rearrangement Sig. 3",
"Rearrangement Sig. 5",
"HRD-LOH score",
"Substitution Sig. 8"),
args.legend = list(x ='bottom', bty='n', inset=c(0,-0.9),horiz=TRUE,cex=1.5),
ylim = c(min(matrix_to_plot)*1.1,max(matrix_to_plot)*1.1),
col=c("blue","red","black","green","orange","yellow"),
ylab = "BRCA1/BRCA2\n def. contribution",
cex.axis = 1.2,
cex.lab = 1.5,
cex.names = 1)
dev.off()
}
#----------------------
#' plot_HRDetect_overall
#'
#' Overall plot of scores obtained from HRDetect.
#'
#' @param file_name name of the output file (jpg)
#' @param hrdetect_output output of HRDetect, containing contributions of each feature
#' @export
plot_HRDetect_overall <- function(file_name,hrdetect_output){
hrdetect_output <- as.data.frame(hrdetect_output)
#barplot
plot_colours <- rep("grey",nrow(hrdetect_output))
reorder <- order(hrdetect_output$Probability,decreasing = TRUE)
plot_colours <- plot_colours[reorder]
jpeg(filename = file_name,
width = 1500,
height = 900,
res = 200)
par(xpd=FALSE)
bp <- barplot(height = hrdetect_output$Probability[reorder],
names.arg = "",
#las = 2,
main = paste0("HRDetect probability score"),
ylim = c(0,1),
border = 0,
space = 0,
col = plot_colours,ylab = "score")
abline(h=0.7,col="red")
par(xpd=TRUE)
start <- 0
for(i in 1:length(plot_colours)){
rect(start, -0.1, start+1, -0.02,col = plot_colours[start+1],lwd = 0)
start <- start + 1
}
# legend("bottom",c("BRCA1","BRCA2","BRCA1 Meth","BRCA2.note","BRCA1/MUTYH","PALB2"),inset=c(0,-0.2), horiz = TRUE, col=c("red","blue","green","purple","orange","yellow"),bty = "n", cex = 0.7,lwd=5)
par(xpd=FALSE)
dev.off()
}
#----------------------
#' plot_HRDetect_BootstrapScores
#'
#' Overall plot of scores obtained from HRDetect with bootstrap.
#'
#' @param outdir output directory for the plot
#' @param hrdetect_res output of HRDetect_pipeline, ran with bootstrap_scores=TRUE
#' @param main plot title. If not specified: "Distribution of HRDetect scores"
#' @param mar plot margin. If not specified: c(9,4,3,2)
#' @param pwidth plot width in pixels. If not specified: max(1000,400+120*number_of_samples)
#' @param pheight plot height in pixels. If not specified: 1000
#' @param pres plot resolution. If not specified: 200
#' @export
plot_HRDetect_BootstrapScores <- function(outdir,
hrdetect_res,
main="Distribution of HRDetect scores",
mar=c(9,4,3,2),
pwidth=NULL,
pheight=1000,
pres=200){
#boxplot
bootstrap_samples <- rownames(hrdetect_res$q_5_50_95)
samples_labels <- bootstrap_samples
if(is.null(pwidth)) pwidth <- max(1000,400+120*length(bootstrap_samples))
o <- order(hrdetect_res$hrdetect_output[bootstrap_samples,"Probability"],decreasing = TRUE)
jpeg(filename = paste0(outdir,"/HRDetect_bootstrap.jpg"),width = pwidth,height = pheight,res = pres)
par(mar=mar)
boxplot(hrdetect_res$hrdetect_bootstrap_table, xaxt="n", yaxt="n",ylim = c(0,1),border="white")
grid()
lines(x=c(0,ncol(hrdetect_res$hrdetect_bootstrap_table)+1),y=c(0.7,0.7),lty=2,col="red")
boxplot(hrdetect_res$hrdetect_bootstrap_table[,o],ylim = c(0,1),main="",
ylab="HRDetect score",las=3,names=samples_labels[o],add = TRUE)
thisvalues <- hrdetect_res$hrdetect_bootstrap_table
# take the x-axis indices and add a jitter, proportional to the N in each level
for(i in 1:ncol(thisvalues)){
myjitter<-jitter(rep(i, length(thisvalues[,o[i]])), amount=0.2)
points(myjitter, thisvalues[,o[i]], pch=20, col=rgb(0,0,1,.1))
rect(xleft = i+0.15,ybottom = hrdetect_res$q_5_50_95[o[i],"5%"],xright = i+0.22,ytop = hrdetect_res$q_5_50_95[o[i],"95%"],col = "grey")
}
points(hrdetect_res$hrdetect_output[bootstrap_samples,"Probability"][o],pch=23,col="black",bg="white")
title(main = main,line = 1.8)
legend(x="topleft",legend = c("Davies et al. 2017"),pch = 23,col="black",bg="green",cex = 0.8,bty = "n",inset = c(0,-0.145),xpd = TRUE)
legend(x="topright",legend = c("5-95% CI"),col="black",fill="grey",cex = 0.8,bty = "n",inset = c(0,-0.145),xpd = TRUE)
legend(x="topleft",legend = c("classification threshold, Davies et al. 2017"),lty = 2,col="red",cex = 0.8,bty = "n",inset = c(0,-0.095),xpd = TRUE)
legend(x="topright",legend = c(paste0("n=",nrow(hrdetect_res$hrdetect_bootstrap_table))),lty = 2,col="white",cex = 0.8,bty = "n",inset = c(0,-0.095),xpd = TRUE)
dev.off()
}
#' plot_HRDetect_Contributions
#'
#' Function for plotting HRDetect results and contributions to HRDetect score.
#'
#' @param file_name name of the output file (jpg)
#' @param par_title title of the plot (optional)
#' @param hrdetect_output output of HRDetect, containing contributions of each feature
#' @export
plot_HRDetect_Contributions <- function(file_name,
hrdetect_output,
par_title = "HRDetect Score and Contributions"){
col_needed <- c("del.mh.prop", "SNV3", "SV3", "SV5", "hrd", "SNV8")
#details plot
hrdetect_output <- as.data.frame(hrdetect_output)
reorder <- order(hrdetect_output$Probability,decreasing = TRUE)
nsamples <- nrow(hrdetect_output)
contributions <- hrdetect_output
jpeg(filename = file_name,
width = max(200+120*nsamples,2400),
height = 1200,
res = 170)
par(mfrow=c(1,1),oma=c(0,0,1,0))
mat <- matrix(c(1,2),ncol = 1)
layout(mat, c(1), c(0.65,1))
par(mar=c(1,6,1,1))
barplot(height = hrdetect_output$Probability[reorder],
ylim = c(0,1),ylab = "BRCA1/BRCA2\n def. score",
names.arg = "",
cex.axis = 1.2,
cex.lab=1.2,
cex.names = 1.5)
title(main=paste0(par_title), cex.main=1.6)
abline(a=0.7,b=0,col="red")
text(x = 0.75*nrow(hrdetect_output),y = 0.77,labels = "classification threshold",col = "red",cex = 1.5)
matrix_to_plot <- t(hrdetect_output[reorder,col_needed])
par(mar=c(10,6,1,1))
barplot(height = matrix_to_plot,
beside = TRUE,
#names.arg = row.names(hrdetect_res),
names.arg = rownames(hrdetect_output)[reorder],
las = 3,
legend.text = c("deletion with MH",
"Substitution Sig. 3",
"Rearrangement Sig. 3",
"Rearrangement Sig. 5",
"HRD-LOH score",
"Substitution Sig. 8"),
args.legend = list(x ='bottom', bty='n', inset=c(0,-1.0),horiz=TRUE,cex=0.9),
# ylim = c(min(matrix_to_plot)*1.1,max(matrix_to_plot)*1.1),
ylim = c(-3.5,4.5),
col=c("blue","red","black","green","orange","yellow"),
ylab = "BRCA1/BRCA2\n def. contribution",
cex.axis = 1.2,
cex.lab = 1.2,
cex.names = 0.8)
dev.off()
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.