episelect: Model selection



Description

Estimate the optimal regularization parameter at EM convergence based on different information criteria.

Usage

episelect(epi.object, criteria = NULL, ebic.gamma = 0.5, loglik_Y = FALSE, ncores = NULL)

Arguments

epi.object

An object with S3 class "epi"

criteria

Model selection criterion. "ebic" and "aic" are available. BIC model selection is obtained by using criteria = "ebic" with ebic.gamma = 0.

ebic.gamma

The tuning parameter for ebic. Setting ebic.gamma = 0 results in bic model selection. The default value is 0.5.

loglik_Y

Model selection based on either log-likelihood of observed data (loglik_Y = TRUE), or the joint log-likelihood of observed and latent variables (loglik_Y = FALSE).

ncores

The number of cores to use for the calculations. Using ncores = NULL automatically detects the number of available cores and runs the computations in parallel.

Details

This function computes the extended Bayesian information criterion (ebic), the Bayesian information criterion (bic), and the Akaike information criterion (aic) at EM convergence, based on either the observed or the joint log-likelihood. The observed log-likelihood can be obtained through

\ell_Y(\widehat{Θ}_λ) = Q(\widehat{Θ}_λ | \widehat{Θ}^{(m)}) - H (\widehat{Θ}_λ | \widehat{Θ}^{(m)}),

where Q can be calculated from the epistasis function and the H function is

H(\widehat{Θ}_λ | \widehat{Θ}^{(m)}_λ) = E_z[\ell_{Z | Y}(\widehat{Θ}_λ) | Y; \widehat{Θ}_λ] = E_z[\log f(z)| Y ;\widehat{Θ}_λ ] - \log p(y).
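As a brief sketch of where the decomposition above comes from, assume (as in the standard EM setup) that Q is the conditional expectation of the joint log-likelihood of Y and Z. The identity follows from factoring the joint density and taking expectations over z given y; at EM convergence the current estimate \widehat{\Theta}^{(m)} coincides with \widehat{\Theta}_\lambda.

```latex
\log p(y; \Theta) = \log p(y, z; \Theta) - \log p(z \mid y; \Theta)

% Take the expectation over z given y under \widehat{\Theta}^{(m)};
% the left-hand side does not depend on z and is unchanged.
\ell_Y(\Theta)
  = E_z\left[ \ell_{Y,Z}(\Theta) \mid Y; \widehat{\Theta}^{(m)} \right]
  - E_z\left[ \ell_{Z \mid Y}(\Theta) \mid Y; \widehat{\Theta}^{(m)} \right]
  = Q(\Theta \mid \widehat{\Theta}^{(m)}) - H(\Theta \mid \widehat{\Theta}^{(m)})
```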

The "ebic" and "aic" model selection criteria can be obtained as follows:

ebic(λ) = -2 \ell(\widehat{Θ}_λ) + ( \log n + 4 γ \log p) df(λ)

aic(λ) = -2 \ell(\widehat{Θ}_λ) + 2 df(λ)

where n is the sample size, p is the number of variables, df refers to the number of non-zero off-diagonal elements of \widehat{Θ}_λ, and γ \in [0, 1]. A typical value for ebic.gamma is 1/2, but it can also be tuned by experience. Fixing ebic.gamma = 0 results in bic model selection.
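The two formulas can be illustrated with a minimal base R sketch. The helper names ebic_score and aic_score are hypothetical and not part of the epistasis package, the log-likelihood and df values are made up for illustration, and n and p are read as the sample size and the number of variables.

```r
# Hypothetical helpers mirroring the ebic and aic formulas above;
# they are NOT functions from the epistasis package.
ebic_score <- function(loglik, df, n, p, gamma = 0.5) {
  -2 * loglik + (log(n) + 4 * gamma * log(p)) * df
}

aic_score <- function(loglik, df) {
  -2 * loglik + 2 * df
}

# Made-up log-likelihoods and degrees of freedom along a path of
# three regularization parameters (illustrative values only).
loglik <- c(-520, -480, -470)
df     <- c(10, 25, 60)
n <- 100  # assumed sample size
p <- 50   # assumed number of variables

ebic <- ebic_score(loglik, df, n, p, gamma = 0.5)
bic  <- ebic_score(loglik, df, n, p, gamma = 0)   # gamma = 0 reduces ebic to bic
aic  <- aic_score(loglik, df)

# Index of the regularization parameter minimizing each criterion
opt <- c(ebic = which.min(ebic), bic = which.min(bic), aic = which.min(aic))
print(opt)
```

With these illustrative numbers, ebic with gamma = 0.5 penalizes each non-zero element more heavily than bic or aic and so picks the sparsest model, while bic and aic pick the second one.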

Value

An object with S3 class "episelect" is returned:

opt.path

The optimal graph selected from the graph path

opt.theta

The optimal precision matrix from the graph path

opt.Sigma

The optimal covariance matrix from the graph path

ebic.scores

Extended BIC scores for regularization parameter selection at EM convergence.

opt.index

The index of optimal regularization parameter.

opt.rho

The selected regularization parameter.

and anything else that is included in the input epi.object.

Author(s)

Pariya Behrouzi and Ernst C. Wit
Maintainer: Pariya Behrouzi <pariya.behrouzi@gmail.com>

References

1. P. Behrouzi and E. C. Wit. Detecting Epistatic Selection with Partially Observed Genotype Data Using Copula Graphical Models. arXiv, 2016.
2. J. G. Ibrahim, H. Zhu and N. Tang. Model selection criteria for missing-data problems using the EM algorithm. Journal of the American Statistical Association, 2012.
3. D. Witten and J. Friedman. New insights and faster computations for the graphical lasso. Journal of Computational and Graphical Statistics, to appear, 2011.
4. J. Friedman, T. Hastie and R. Tibshirani. Sparse inverse covariance estimation with the lasso. Biostatistics, 2007.
5. R. Foygel and M. Drton. Extended Bayesian information criteria for Gaussian graphical models. In Advances in Neural Information Processing Systems, pp. 604-612, 2010.

See Also

epistasis

Examples

## Not run: 
# simulate data
D <- episim(p = 50, n = 100, k = 3, adjacent = 3, alpha = 0.06, beta = 0.06)
plot(D)

# detect epistatic selection path
out <- epistasis(D$data, method = "gibbs", n.rho = 5, ncores = 1)

# different graph selection methods
# ebic based on the joint log-likelihood (loglik_Y = FALSE is the default)
sel.ebic1 <- episelect(out, criteria = "ebic")
plot(sel.ebic1)

# ebic based on the observed log-likelihood
sel.ebic2 <- episelect(out, criteria = "ebic", loglik_Y = TRUE)
plot(sel.ebic2)

# aic
sel.aic <- episelect(out, criteria = "aic")
plot(sel.aic)

# bic, obtained from ebic with ebic.gamma = 0
sel.bic <- episelect(out, criteria = "ebic", ebic.gamma = 0)
plot(sel.bic)

## End(Not run)

epistasis documentation built on May 2, 2019, 5:09 a.m.