normalize_scope: Normalization of read depth with latent factors using...

View source: R/normalize_scope.R

normalize_scopeR Documentation

Normalization of read depth with latent factors using Expectation-Maximization algorithm under the case-control setting

Description

Fit a Poisson generalized linear model to normalize the raw read depth data from single-cell DNA sequencing, with latent factors under the case-control setting. Model GC content bias using an expectation-maximization algorithm, which accounts for the different copy number states.

Usage

normalize_scope(Y_qc, gc_qc, K, norm_index, T, ploidyInt,
                beta0, minCountQC = 20)

Arguments

Y_qc

read depth matrix after quality control

gc_qc

vector of GC content for each bin after quality control

K

Number of latent Poisson factors

norm_index

indices of normal/diploid cells

T

a vector of integers indicating number of CNV groups. Use BIC to select optimal number of CNV groups. If T = 1, assume all reads are from normal regions so that EM algorithm is not implemented. Otherwise, we assume there is always a CNV group of heterozygous deletion and a group of null region. The rest groups are representative of different duplication states.

ploidyInt

a vector of initialized ploidy return from initialize_ploidy. Users are also allowed to provide prior-knowledge ploidies as the input and to manually tune a few cells that have poor fitting

beta0

a vector of initialized bin-specific biases returned from CODEX2 without latent factors

minCountQC

the minimum read coverage required for normalization and EM fitting. Defalut is 20

Value

A list with components

Yhat

A list of normalized read depth matrix with EM

alpha.hat

A list of absolute copy number matrix

fGC.hat

A list of EM estimated GC content bias matrix

beta.hat

A list of EM estimated bin-specific bias vector

g.hat

A list of estimated Poisson latent factor

h.hat

A list of estimated Poisson latent factor

AIC

AIC for model selection

BIC

BIC for model selection

RSS

RSS for model selection

K

Number of latent Poisson factors

Author(s)

Rujin Wang rujin@email.unc.edu

Examples

Gini <- get_gini(Y_sim)

# first-pass CODEX2 run with no latent factors
normObj.sim <- normalize_codex2_ns_noK(Y_qc = Y_sim,
                                        gc_qc = ref_sim$gc,
                                        norm_index = which(Gini<=0.12))
Yhat.noK.sim <- normObj.sim$Yhat
beta.hat.noK.sim <- normObj.sim$beta.hat
fGC.hat.noK.sim <- normObj.sim$fGC.hat
N.sim <- normObj.sim$N

# Ploidy initialization
ploidy.sim <- initialize_ploidy(Y = Y_sim, 
                                Yhat = Yhat.noK.sim, 
                                ref = ref_sim)
ploidy.sim

normObj.scope.sim <- normalize_scope(Y_qc = Y_sim, gc_qc = ref_sim$gc,
                                    K = 1, ploidyInt = ploidy.sim,
                                    norm_index = which(Gini<=0.12), T = 1:5,
                                    beta0 = beta.hat.noK.sim)
Yhat.sim <- normObj.scope.sim$Yhat[[which.max(normObj.scope.sim$BIC)]]
fGC.hat.sim <- normObj.scope.sim$fGC.hat[[which.max(normObj.scope.sim$BIC)]]


rujinwang/SCOPE documentation built on Jan. 1, 2023, 5:40 a.m.