scarHRD is an R package that determines the levels of homologous recombination deficiency (telomeric allelic imbalance, loss of heterozygosity, number of large-scale transitions) from NGS (WES, WGS) data.
The first genomic-scar-based homologous recombination deficiency (HRD) measures were produced using SNP arrays. Since this technology has largely been replaced by next-generation sequencing, it has become important to develop algorithms that derive the same type of genomic scar scores from next-generation sequencing (WXS, WGS) data. To perform this analysis, we introduce the scarHRD R package and show that HRD scores derived with this method from SNP-array and from next-generation sequencing data correlate well.
scarHRD can be installed via devtools from GitHub:
library(devtools)
install_github('sztup/scarHRD', build_vignettes = TRUE)
Please cite the following paper: manuscript submitted.
A typical workflow for determining the genomic scar scores of a tumor sample has the following steps:
1. Call the allele-specific copy number profile from paired normal-tumor BAM files. This step must be executed before running scarHRD. We recommend using Sequenza [@pmid25319062] (http://www.cbs.dtu.dk/biotools/sequenza/) for copy number segmentation; other tools (e.g. ASCAT [@pmid20837533]) may also be used in this step.
2. Determine the scar scores with the scarHRD R package.
The scarHRD input may be a detailed segmentation file from Sequenza:
example1 <- system.file("extdata", "test1.small.seqz.gz", package = "scarHRD")
a <- read.table(example1, header = TRUE)
head(a)
or a simplified file that includes the total and allele-specific copy numbers:
example2 <- system.file("extdata", "test2.txt", package = "scarHRD")
a <- read.table(example2, header = TRUE)
head(a)
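The scores are then computed with scar_score(), either from a Sequenza seqz file or from the simplified format:
library(scarHRD)
scar_score("/examples/test1.small.seqz.gz", reference = "grch38", seqz = TRUE)
scar_score("/examples/test2.txt", reference = "grch38", seqz = FALSE)
reference -- the reference genome used, either grch38 or grch37
seqz -- TRUE when the input is a Sequenza seqz file, FALSE for the simplified format (as in the two calls above)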
The HRD-LOH score was described based on an investigation of SNP-array-based copy number profiles of ovarian cancer [@pmid22933060]. In that paper the authors showed that samples with deficient BRCA1 or BRCA2 have higher HRD-LOH scores than BRCA-intact samples; this measurement may therefore be a reliable tool for estimating a sample's homologous recombination capacity.
A sample's HRD-LOH score is defined as the number of LOH regions longer than 15 Mb that do not cover the whole chromosome.
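The definition above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the scarHRD implementation: the segment table layout (columns chrom, start, end, total_cn, minor_cn) and the function name hrd_loh() are hypothetical, LOH is taken to mean a minor-allele copy number of 0 with a total copy number above 0, and "whole chromosome" is approximated by the span of the segments observed on that chromosome. Adjacent LOH segments are not merged here, although a real implementation may need to do so.

```r
# Hypothetical sketch of the HRD-LOH definition; not the scarHRD code.
# Assumed segment columns: chrom, start, end (bp), total_cn, minor_cn.
hrd_loh <- function(segs, min_size = 15e6) {
  # Assumption: LOH = minor allele lost, region not homozygously deleted
  loh <- segs$minor_cn == 0 & segs$total_cn > 0
  seg_len <- segs$end - segs$start
  # Approximate each chromosome's extent by its observed segments
  chr_min <- ave(segs$start, segs$chrom, FUN = min)
  chr_max <- ave(segs$end, segs$chrom, FUN = max)
  whole_chrom <- segs$start <= chr_min & segs$end >= chr_max
  # Count LOH segments longer than min_size that do not span the chromosome
  sum(loh & seg_len > min_size & !whole_chrom)
}

# Toy segmentation (invented values, for illustration only)
segs <- data.frame(
  chrom    = c("1", "1", "1", "2", "3", "3"),
  start    = c(1e6, 30e6, 60e6, 1e6, 5e6, 12e6),
  end      = c(30e6, 60e6, 200e6, 240e6, 12e6, 150e6),
  total_cn = c(2, 1, 2, 1, 2, 2),
  minor_cn = c(1, 0, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)
hrd_loh(segs)  # returns 1
```

Only the 30 Mb LOH segment on chromosome 1 is counted: the chromosome 2 LOH spans the whole chromosome, and the chromosome 3 LOH is shorter than 15 Mb. The min_size argument makes it easy to try other length cut-offs.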
In the first paper that published the HRD-LOH score (Abkevich et al., 2012), the authors examined the correlation between the HRD-LOH score and HR deficiency for different LOH region length cut-offs. The 15 Mb cut-off, lying approximately in the middle of the examined interval, was selected arbitrarily for further analysis. The authors argued that the rationale for this choice, rather than selecting the cut-off with the lowest p-value, is that the latter is more sensitive to the statistical noise present in the data.

A large-scale transition (LST) is defined as a chromosomal break between adjacent regions of at least 10 Mb, with a distance between them not larger than 3 Mb.
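A similarly simplified sketch of the LST count, using the same hypothetical segment columns, is shown below. Here a "break" is assumed to be any change in total or minor-allele copy number between consecutive segments; chromosome arms, centromeres and any pre-filtering of short segments, which published implementations may take into account, are ignored. lst_count() is a hypothetical name, not part of scarHRD.

```r
# Hypothetical sketch of the LST count; not the scarHRD code.
# Assumed segment columns: chrom, start, end (bp), total_cn, minor_cn.
lst_count <- function(segs, min_size = 10e6, max_gap = 3e6) {
  n <- 0
  for (chr in unique(segs$chrom)) {
    s <- segs[segs$chrom == chr, ]
    s <- s[order(s$start), ]
    for (i in seq_len(nrow(s) - 1)) {
      a <- s[i, ]
      b <- s[i + 1, ]
      # Assumption: a break is any change in total or minor-allele copy number
      state_change <- a$total_cn != b$total_cn || a$minor_cn != b$minor_cn
      # Both flanking regions must be at least min_size long
      big_enough <- (a$end - a$start) >= min_size && (b$end - b$start) >= min_size
      # The two regions must be no more than max_gap apart
      close_enough <- (b$start - a$end) <= max_gap
      if (state_change && big_enough && close_enough) n <- n + 1
    }
  }
  n
}

# Toy segmentation (invented values, for illustration only)
segs <- data.frame(
  chrom    = c("1", "1", "1", "1", "2", "2"),
  start    = c(1e6, 21e6, 51e6, 56e6, 1e6, 45e6),
  end      = c(20e6, 50e6, 55e6, 100e6, 40e6, 90e6),
  total_cn = c(2, 3, 2, 2, 2, 1),
  minor_cn = c(1, 1, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)
lst_count(segs)  # returns 1
```

Only the break near 20 Mb on chromosome 1 is counted; the other chromosome 1 breaks involve a 4 Mb region, and the chromosome 2 regions are 5 Mb apart.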
The number of telomeric allelic imbalances (TAI) is the number of allelic imbalances (AIs) that extend to the telomeric end of a chromosome.
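A sketch of the TAI count under the same hypothetical segment layout follows. Allelic imbalance is assumed to mean that the major and minor allele copy numbers differ, and the telomeric end is approximated by the first or last observed segment of each chromosome. Published definitions add further conditions (for example on the centromere) that this illustration does not model; tai_count() is a hypothetical name.

```r
# Hypothetical sketch of the TAI count; not the scarHRD code.
# Assumed segment columns: chrom, start, end (bp), total_cn, minor_cn.
tai_count <- function(segs) {
  major_cn <- segs$total_cn - segs$minor_cn
  # Assumption: allelic imbalance = major and minor copy numbers differ
  ai <- major_cn != segs$minor_cn
  chr_min <- ave(segs$start, segs$chrom, FUN = min)
  chr_max <- ave(segs$end, segs$chrom, FUN = max)
  # Assumption: telomeric = first or last observed segment of the chromosome
  at_telomere <- segs$start <= chr_min | segs$end >= chr_max
  sum(ai & at_telomere)
}

# Toy segmentation (invented values, for illustration only)
segs <- data.frame(
  chrom    = c("1", "1", "1", "2", "2", "2"),
  start    = c(1e6, 30e6, 100e6, 1e6, 50e6, 90e6),
  end      = c(30e6, 100e6, 200e6, 50e6, 90e6, 240e6),
  total_cn = c(3, 2, 2, 2, 1, 2),
  minor_cn = c(1, 1, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)
tai_count(segs)  # returns 2
```

Both imbalanced segments at the ends of chromosome 1 are counted, while the interstitial imbalance on chromosome 2 is not.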