Normalization of read depth from whole exome sequencing under the case-control setting

Description

Fits a Poisson log-linear model to normalize read depth data from whole exome sequencing (WES). The model includes terms that remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. When the WES study follows a case-control design, CODEX estimates the exon-wise Poisson latent factors using only the read depths in the control cohort, and then computes the sample-wise latent factor terms for the case samples by regression.
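The two-step case-control idea described above can be illustrated with a small base R sketch. This is not the CODEX implementation: the simulated data, the median baseline, the SVD step, and all variable names (control_idx, case_idx, g_hat, h_case, Yhat_case) are assumptions made for illustration, and the GC content term is left out for brevity.

```r
set.seed(1)
n_exon <- 200
n_samp <- 10
control_idx <- 1:6   # hypothetical control columns
case_idx <- 7:10     # hypothetical case columns

# Simulate toy read depth with a single latent factor (not CODEX output)
g <- rnorm(n_exon, sd = 0.3)      # exon-wise latent factor
h <- rnorm(n_samp, sd = 0.5)      # sample-wise latent factor
base <- runif(n_exon, 50, 200)    # exon-specific baseline depth
Y <- matrix(rpois(n_exon * n_samp, base * exp(outer(g, h))), n_exon, n_samp)

# Step 1: estimate the exon-wise factor from the control samples only
ctrl <- Y[, control_idx]
base_hat <- pmax(apply(ctrl, 1, median), 1)
resid_ctrl <- log(pmax(ctrl, 1) / base_hat)
g_hat <- svd(resid_ctrl, nu = 1, nv = 0)$u[, 1]

# Step 2: for each case sample, hold g_hat fixed and estimate the
# sample-wise factor by Poisson regression with the baseline as offset
h_case <- sapply(case_idx, function(j) {
  fit <- glm(Y[, j] ~ 0 + g_hat + offset(log(base_hat)), family = poisson)
  unname(coef(fit)[1])
})

# Expected (normalized) read depth for the case samples
Yhat_case <- base_hat * exp(outer(g_hat, h_case))
```

Because the exon-wise factor is learned from controls alone, a copy number change carried by a case sample cannot be absorbed into the latent factor in this sketch; it remains as a deviation of the case sample from Yhat_case.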

Usage

normalize2(Y_qc, gc_qc, K, normal_index)

Arguments

Y_qc

Read depth matrix after quality control, as returned by qc.

gc_qc

Vector of GC content for each exon after quality control, as returned by qc.

K

Number of latent Poisson factors. Either a single integer, if the optimal value has already been chosen, or a vector of integers, in which case AIC, BIC, and RSS are computed for each value to guide the choice of the optimal K.

normal_index

Indices of control samples.

Value

Yhat

Normalized read depth matrix

AIC

AIC for model selection

BIC

BIC for model selection

RSS

RSS for model selection

K

Number of latent Poisson factors

Author(s)

Yuchao Jiang yuchaoj@wharton.upenn.edu

See Also

qc, choiceofK

Examples

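library(CODEX)  # provides normalize2 and the qcObjDemo demo data
# Quality-controlled read depth matrix and GC content from the demo object
Y_qc <- qcObjDemo$Y_qc
gc_qc <- qcObjDemo$gc_qc
# Fit K = 1 to 5 latent factors, treating samples 1, 3, ..., 45 as controls
normObj <- normalize2(Y_qc, gc_qc, K = 1:5, normal_index = seq(1, 45, 2))
# Extract the normalized read depth and the model selection criteria
Yhat <- normObj$Yhat
AIC <- normObj$AIC
BIC <- normObj$BIC
RSS <- normObj$RSS
K <- normObj$K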
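When K is given as a vector, the returned criteria can be used to pick one value. The sketch below uses a toy stand-in for the returned list, with invented BIC values, and assumes that the criterion is defined so that larger values are preferred; it may be worth confirming this against the plots from choiceofK before relying on it.

```r
# Toy stand-in for the list returned by normalize2(); BIC values are invented
normObj_toy <- list(K = 1:5, BIC = c(10.2, 14.8, 13.9, 12.1, 11.5))

# Assumption: the preferred K is the one with the largest BIC
optK <- normObj_toy$K[which.max(normObj_toy$BIC)]
print(optK)
```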