knitr::opts_chunk$set(echo = TRUE)
devtools::load_all(".")
library(HRDX)
data("sub01.ploidy", package = "HRDX")
data("sub01.segments", package = "HRDX")

\

Sample Data

HRDX comes with a sample data set, the result of running Sequenza VX.X on TCGA subject Y. "sub01.segments" contains information on the segments of the genome, and copynumber status. "sub01.ploidy" contains ploidy estimates. This data can be used for HRD or aneuploid calculation.

head(sub01.segments)
print(sub01.ploidy)

\

HRD Scoring

Preprocessing

The package needs to define the centromere and telomere positions for each chromosome; the overall chromosome size; the length of each segment; the length of the gaps between segments; the p, q, and cross arms; and allelic imbalance. This is all achieved using the function "preprocessHRD". All positions are stored in the reference data within the package; both grch37 and grch38 are supported. The other stuff is calculated from the data.

seq.dat <- sub01.segments[ sub01.segments$chromosome == "chr1",]
preprocessHRD( seq.dat, "grch38" )

\

Quality Control

Part of preprocessing is identifying errors in defining segments. For example, on chromosome 14:

cols <- c(1,2,3,10,11,12)
seq.dat <- sub01.segments[ sub01.segments$chromosome == "chr14", ]
seq.dat[,cols]

notice that the last two segments have a very small gap between them, with identical copynumber status. This suggests that this is really one full segment being incorrectly identified as two. The function combineSegs is called within preprocessHRD to correct these:

seq.dat <- preprocessHRD( seq.dat, "grch38" )
seq.dat[,cols]

After the preprocessing function runs, the last two segments are combined into one.

\ NTAI

is normalized by removing CN segments (why)


HRD Measures

Once the data has been preprocessed, you can calculate HRD measures LST, LOH, TAI, and NTAI. Each metric is retrieved with a corresponding function.

\newline \
LST

LST is the number of large state transition events.

seq.dat <- preprocessHRD( sub01.segments, "grch38" )
getLST( seq.dat )

\newline \
LOH

LOH is the number of loss of heterozygosity events.

getLOH( seq.dat )

\
NTAI

Raw NTAI is the number of nontelomeric allelic imbalance events. NTAI is normalized by removing main copynumber segments.

CN.dat <- getCNt( seq.dat )
getNTAI.raw( seq.dat )
getNTAI.norm( seq.dat, CN.dat)

\
HRD Score

An HRD Score is some linear combination of raw or normalized values LST, LOH, NTAI (or TAI). It is intended to return the HRD score without having to specify all the individual arguments. Optionally, you can rescale the data to fit a range of 0 - 100. For a single subject, an HRD score can be easily be calculated from raw data in just a few steps:

seq.dat <- preprocessHRD( sub01.segments, "grch38" )
CN.dat  <- getCNt( seq.dat )

HRD <- getHRD.Score( seq.dat, CN.dat )
print(HRD)

\
Ploidy Normalization

HRD will increase with ploidy, but the relationship may vary by cancer type. It is common to observe a logarithmic relationship between the two. In this case, ploidy-effects can be accounted for by fitting a linear model of ploidy on the log of the HRD score, and taking the residuals. This correction is implemented in the function ploidyNorm.log.


Aneuploidy

Aneuploidy processing uses different criteria than HRD. Therefore, you should run getAneuploidy on the raw data- dont run preprocessHRD.

ap.chr1 <- getAneuploidy(sub01.segments, sub01.ploidy, 1)

# summary data
ap.chr1[[1]]

# full data
ap.chr1[[2]]
ap.dat <- getAneuploidyGenome(sub01.segments, sub01.ploidy)
ap.dat[[1]]
getAneuploidyScore(ap.dat[[1]])

plotAneuploidy(ap.dat[[2]])


maxwell-lab/HRDex documentation built on May 3, 2020, 9:01 p.m.