View source: R/binomRegMethModel.R
binomRegMethModel  R Documentation 
This function fits a (dispersionadjusted) binomial regression model to regional methylation data, and reports the estimated smooth covariate effects and regional pvalues for the test of DMRs (differentially methylation regions). Over or under dispersion across loci is accounted for in the model by the combination of a multiplicative dispersion parameter (or scale parameter) and a samplespecific random effect.
This method can deal with outcomes, i.e. the number of
methylated reads in a region, that are contaminated by known
false methylation calling rate (p0
) and false nonmethylation
calling rate (1p1
).
The covariate effects are assumed to smoothly vary across
genomic regions. In order to estimate them, the algorithm first
represents the functional parameters by a linear combination of a set of
restricted cubic splines (with dimension n.k
), and a smoothness
penalization term which depends on the smoothing parameters lambdas
is also added to control smoothness. The estimation is performed by an
iterated EM algorithm. Each M step constitutes an outer Newton's iteration to
estimate smoothing parameters lambdas
and an inner PIRLS iteration to
estimate spline coefficients alpha
for the covariate effects.
Currently, the computation in the M step depends the implementation of
gam()
in package mgcv
.
binomRegMethModel( data, n.k, p0 = 0.003, p1 = 0.9, Quasi = TRUE, epsilon = 10^(6), epsilon.lambda = 10^(3), maxStep = 200, binom.link = "logit", method = "REML", covs = NULL, RanEff = TRUE, reml.scale = FALSE, scale = 2, verbose = TRUE )
data 
a data frame with rows as individual CpGs appearing in all the
samples. The first 4 columns should contain the information of 
n.k 
a vector of basis dimensions for the intercept and
individual covariates. 
p0 
the probability of observing a methylated read when the underlying
true status is unmethylated. 
p1 
the probability of observing a methylated read when the underlying
true status is methylated. 
Quasi 
whether a Quasilikelihood estimation approach will be used; in other words, whether a multiplicative dispersion is added in the model or not. 
epsilon 
numeric; stopping criterion for the closeness of estimates of spline coefficients from two consecutive iterations. 
epsilon.lambda 
numeric; stopping criterion for the closeness of
estimates of smoothing parameter 
maxStep 
the algorithm will step if the iteration steps exceed

binom.link 
the link function used in the binomial regression model; the default is the logit link. 
method 
the method used to estimate the smoothing
parameters. The default is the 'REML' method which is generally better than
prediction based criterion 
covs 
a vector of covariate names. The covariates with names in

RanEff 
whether samplelevel random effects are added or not 
reml.scale 
whether a REMLbased scale (dispersion) estimator is used. The default is Fletcherbased estimator. 
scale 
negative values mean scale parameter should be estimated; if a positive value is provided, a fixed scale will be used. 
verbose 
logical indicates if the algorithm should provide progress report information. The default value is TRUE. 
This function return a list
including objects:
est
: estimates of the spline basis coefficients alpha
lambda
: estimates of the smoothing parameters for each
functional parameters
est.pi
: predicted methylation levels for each row in the input
data
ite.points
: estimates of est
, lambda
at each EM
iteration
cov1
: estimated variancecovariance matrix of the basis
coefficients alphas
reg.out
: regional testing output obtained using Fletcherbased
dispersion estimate; an additional 'ID' row would appear if RanEff is TRUE
reg.out.reml.scale
:regional testing output obtained using
REMLbased dispersion estimate;
reg.out.gam
:regional testing output obtained using
(Fletcherbased) dispersion estimate from mgcv package;
phi_fletcher
: Fletcherbased estimate of the (multiplicative)
dispersion parameter;
phi_reml
: REMLbased estimate of the (multiplicative)
dispersion parameter;
phi_gam
: Estimated dispersion parameter reported by mgcv;
SE.out
: a matrix of the estimated pointwise Standard Errors
(SE); number of rows are the number of unique CpG sites in the input data and
the number of columns equal to the total number of covariates fitted in the
model (the first one is the intercept);
SE.out.REML.scale
: a matrix of the estimated pointwise Standard
Errors (SE); the SE calculated from the REMLbased dispersion estimates
uni.pos
: the genomic postions for each row of CpG sites in the
matrix SE.out
;
Beta.out
: a matrix of the estimated covariate effects beta(t),
where t denotes the genomic positions;
ncovs
: number of functional paramters in the model (including
the intercept);
sigma00
: estimated variance for the random effect if RanEff is
TRUE; NA if RanEff is FALSE.
Kaiqiong Zhao
gam
## data(RAdat) RAdat.f < na.omit(RAdat[RAdat$Total_Counts != 0, ]) out < binomRegMethModel( data=RAdat.f, n.k=rep(5,3), p0=0.003307034, p1=0.9, epsilon=10^(6), epsilon.lambda=10^(3), maxStep=200 )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.