Fits a Poisson loglinear model that normalizes the read depth data from whole exome sequencing. Includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. If the WES is designed under casecontrol setting, CODEX estimates the exonwise Poisson latent factor using only the read depths in the control cohort, and then computes the samplewise latent factor terms for the case samples by regression.
1  normalize2(Y_qc, gc_qc, K, normal_index)

Y_qc 
Read depth matrix after quality control procedure returned from

gc_qc 
Vector of GC content for each exon after quality control procedure returned
from 
K 
Number of latent Poisson factors. Can be an integer if optimal solution has been chosen or a vector of integers so that AIC, BIC, and RSS are computed for choice of optimal k. 
normal_index 
Indices of control samples. 
Yhat 
Normalized read depth matrix 
AIC 
AIC for model selection 
BIC 
BIC for model selection 
RSS 
RSS for model selection 
K 
Number of latent Poisson factors 
Yuchao Jiang yuchaoj@wharton.upenn.edu
qc
,
choiceofK
1 2 3 4 5 6 7 8 
Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.
All documentation is copyright its authors; we didn't write any of that.