Fit a penalized exponential tilt model (ETM) to identify differentially
methylated loci between cases and controls. ETM can detect differences
in means only, in variances only, or in both means and variances.
A penalized exponential tilt model using combined lasso and Laplacian penalties
is applied to high-dimensional DNA methylation data from case-control
association studies. When CpG sites are correlated with each other within
the same gene or the same genetic region, a Laplacian matrix can be incorporated
into the penalty function to encourage grouping effects among linked CpG
sites. The selection probability of each CpG site is computed from
a finite number of resamplings.
x |
Observed DNA methylation beta values for n samples and p CpG sites. It should be an (n x p) design matrix without an intercept. |
y |
The phenotype outcome, coded as 1 for cases and 0 for controls. |
cx |
The covariates, such as age and gender. It should be an (n x m) matrix, where m is the number of covariates. |
alpha |
The penalty mixing parameter, with 0 ≤ α ≤ 1; the default is 0.1. See details. |
maxit |
Maximum number of passes over the data for all regularization values; the default is 10^5. For faster computation, use a smaller value. |
thre |
Convergence threshold for coordinate descent algorithm.
The default value is |
group |
The integer vector describing the size of genes or genetic
regions. The length of |
lambda |
A sequence of regularization tuning parameters can be
specified. Typical usage is to have the program compute its own
|
.
type |
A type of network within each group when |
etm |
The type of exponential tilt model. |
psub |
The proportion of subsamples used for resamplings, and
|
nlam |
The number of |
kb |
The number of burn-out replications before resamplings to properly
adjust a sequence of |
K |
The number of resamplings; the default is 100. |
The exponential tilt model based on a logistic regression is defined as
\log\frac{p(x_i)}{1-p(x_i)} = β_0+h_1(x_i)^{T}β_1+h_2(x_i)^{T}β_2,
where h_1(\cdot) and h_2(\cdot) are pre-specified functions.
For example, h_1(x)=x and h_2(x)=x^2 if etm is normal, and
h_1(x)=-\log(x) and h_2(x)=-\log(1-x) if etm is beta.
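The two choices of h_1 and h_2 can be sketched in base R as follows. This is only an illustration of the transformations stated above; the helper name tilt_features is hypothetical and is not a function of the package, and the way pETM arranges these features internally is not documented here.

```r
# Hypothetical helper (not part of pETM): build the tilt features h_1(x), h_2(x).
tilt_features <- function(x, etm = c("normal", "beta")) {
  etm <- match.arg(etm)
  if (etm == "normal") {
    h1 <- x
    h2 <- x^2
  } else {
    # beta values must lie strictly inside (0, 1) for both logs to be finite
    stopifnot(all(x > 0 & x < 1))
    h1 <- -log(x)
    h2 <- -log(1 - x)
  }
  # n x 2p matrix: the first p columns hold h_1, the last p columns hold h_2
  cbind(h1, h2)
}

set.seed(1)
x <- matrix(runif(6 * 3), 6, 3)   # toy beta values, 6 samples x 3 CpG sites
z <- tilt_features(x, "beta")
dim(z)
```

Each CpG site contributes two columns, so a difference in means and a difference in variances can each receive its own coefficient.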
The penalty function of pETM is defined as
α||β||_1+(1-α)(β^{T}Lβ)/2,
where L is a Laplacian matrix describing a group structure of
CpG sites. This penalty is equivalent to the Lasso penalty if alpha=1.
When group is not defined, L is replaced by an identity matrix. In this
case, pETM performs an elastic-net regularization procedure since the
second term of the penalty simply reduces to the squared l_2 norm of β.
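As a small worked check of the penalty formula, the sketch below evaluates it for a toy coefficient vector. The helper petm_penalty is hypothetical and only mirrors the expression above; it is not the package's internal code.

```r
# Hypothetical helper: alpha * ||beta||_1 + (1 - alpha) * (beta' L beta) / 2
petm_penalty <- function(beta, L, alpha = 0.1) {
  stopifnot(alpha >= 0, alpha <= 1, nrow(L) == length(beta))
  l1 <- sum(abs(beta))                        # lasso part
  quad <- drop(crossprod(beta, L %*% beta))   # Laplacian (quadratic) part
  alpha * l1 + (1 - alpha) * quad / 2
}

beta <- c(0.5, -1, 0, 2)
I4 <- diag(4)                          # identity matrix, as when group is not defined
petm_penalty(beta, I4, alpha = 1)      # pure lasso: equals sum(abs(beta))
petm_penalty(beta, I4, alpha = 0.1)    # elastic-net form: quadratic part is sum(beta^2) / 2
```

With the identity matrix the quadratic term depends on β only through sum(beta^2), which is the elastic-net case described above.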
If group sizes of CpG sites are listed in group, it is assumed
that CpG sites within the same gene are linked with each other as either
a ring or a fully connected network. In this case, the Laplacian matrix
is block diagonal. The ring network assumes that only adjacent CpG sites
within the same gene are linked with each other, while the fully connected
network links every pair of CpG sites within the same gene. For a large
gene, the ring network is recommended for computational speed.
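The block-diagonal structure can be illustrated with a short base R sketch. It makes several assumptions that are not confirmed by this page: it uses the unnormalized Laplacian D - A (the package may use a normalized form), it gives a single-site gene a zero block, and the labels "ring" and "full" as well as both helper names are hypothetical.

```r
# Hypothetical helper: unnormalized Laplacian D - A for one group of m CpG sites.
group_laplacian <- function(m, type = c("ring", "full")) {
  type <- match.arg(type)
  A <- matrix(0, m, m)                 # adjacency matrix within the group
  if (m > 1) {
    if (type == "full") {
      A[, ] <- 1                       # every pair of sites is linked
      diag(A) <- 0
    } else {
      idx <- seq_len(m)
      nxt <- c(idx[-1], idx[1])        # right-hand neighbor, wrapping around
      A[cbind(idx, nxt)] <- 1
      A[cbind(nxt, idx)] <- 1
    }
  }
  diag(rowSums(A), m) - A
}

# Hypothetical helper: place one block per gene along the diagonal.
block_laplacian <- function(group, type = "ring") {
  p <- sum(group)
  L <- matrix(0, p, p)
  end <- cumsum(group)
  start <- end - group + 1
  for (g in seq_along(group)) {
    i <- start[g]:end[g]
    L[i, i] <- group_laplacian(group[g], type)
  }
  L
}

L_ring <- block_laplacian(c(1, 2, 5), type = "ring")
L_full <- block_laplacian(c(1, 2, 5), type = "full")
dim(L_ring)
```

In a ring, each site in a gene of m sites has at most two neighbors, so a block has on the order of m non-zero off-diagonal entries, compared with m(m-1) for the fully connected block. This is one way to see why the ring is cheaper for large genes.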
The selection result is summarized as the selection probability of
individual CpG sites. A proportion psub of the n samples is
randomly selected without replacement K times. For each
subsample of (x, cx, y), pETM is applied to find non-zero
coefficients of CpG sites along nlam lambda values. The selection
probability of each CpG site is then computed as the maximum
proportion of non-zero regression coefficients among the K replications.
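The resampling scheme can be sketched in base R under loud assumptions. The selector below is a stand-in, not pETM: a site counts as selected at a given lambda when its absolute correlation with y exceeds lambda. The sketch also assumes that the maximum is taken over the lambda values, which is one reading of the description above. All function names and the lambda grid are hypothetical.

```r
set.seed(42)
n <- 60; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- rep(0:1, each = n / 2)
x[y == 1, 1] <- x[y == 1, 1] + 2     # make the first site truly differential

# Stand-in selector (NOT pETM): returns an nlam x p matrix of 0/1 indicators.
select_sites <- function(xs, ys, lambda) {
  score <- abs(drop(cor(xs, ys)))
  outer(lambda, score, function(l, s) as.numeric(s > l))
}

selection_prob <- function(x, y, psub = 0.5, K = 100,
                           lambda = seq(0.5, 0.2, length.out = 5)) {
  n <- nrow(x)
  m <- floor(psub * n)
  counts <- matrix(0, length(lambda), ncol(x))
  for (k in seq_len(K)) {
    sub <- sample(n, m)                # subsample without replacement
    counts <- counts + select_sites(x[sub, , drop = FALSE], y[sub], lambda)
  }
  apply(counts / K, 2, max)            # per site: maximum proportion over lambda
}

selprob <- selection_prob(x, y, K = 50)
head(sort(selprob, decreasing = TRUE))
```

Because the stand-in selector is monotone in lambda, the maximum here is always attained at the smallest lambda; a penalized fit need not behave this way.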
selprob |
The selection probabilities of p CpG sites |
topsp |
The selection probabilities of the CpG sites, listed in descending order along with the names of the CpG sites. |
lambda |
The actual sequence of |
valid.K |
The actual number of resamplings used |
Hokeun Sun <hsun@pusan.ac.kr>
H. Sun and S. Wang (2012)
Penalized Logistic Regression for High-dimensional DNA Methylation
Data with Case-Control Studies, Bioinformatics 28(10), 1368–1375
H. Sun and S. Wang (2013)
Network-based Regularization for Matched Case-Control
Analysis of High-dimensional DNA Methylation Data, Statistics in Medicine
32(12), 2127–2139
H. Sun and S. Wang (2016)
Penalized Exponential Tilt Model for Analysis of High-dimensional
DNA Methylation Data, Manuscript
n <- 100
p <- 500
x <- matrix(rnorm(n*p), n, p)
y <- rep(0:1, c(50,50))
# a total of 200 genes each of which consists of 1, 2, or 5 CpG sites
gr <- rep(c(1,2,5), c(50,100,50))
# ordinary penalized logistic regression
g1 <- pETM(x, y, group=gr, K=10)
# penalized exponential tilt model based on Gaussian distribution
g2 <- pETM(x, y, group=gr, etm = "normal", K=10)
# penalized exponential tilt model based on Beta distribution
x2 <- matrix(runif(n*p), n, p)
g3 <- pETM(x2, y, group=gr, etm = "beta", K=10)