Description Usage Arguments Details Value References Examples
Graph structure search and estimation for Gaussian covariance and concentration graph models.
1 2 3 4 5 6 7 8 9 10 11 | searchGGM(data = NULL,
S = NULL, N = NULL,
model = c("covariance", "concentration"),
search = c("step-forw", "step-back", "ga"),
penalty = c("bic", "ebic", "erdos", "power"),
beta = NULL,
start = NULL,
regularize = FALSE, regHyperPar = NULL,
ctrlStep = ctrlSTEP(), ctrlGa = ctrlGA(), ctrlIcf = ctrlICF(),
parallel = FALSE,
verbose = FALSE, ...)
|
data |
A dataframe or matrix, where rows correspond to observations and columns to variables. Categorical variables are not allowed. |
S |
The sample covariance matrix of the data. If |
N |
The number of observations. If |
model |
The type of Gaussian graphical model. Default is |
search |
The type of structure search algorithm. If |
penalty |
The penalty function used to define a criterion for scoring the candidate graph configurations. Default is |
beta |
The hyperparameter of the penalty function. See "Details" and |
start |
A starting matrix for the estimation algorithm. If |
regularize |
A logical argument indicating if Bayesian regularization should be performed. Default to |
regHyperPar |
A list of hyper parameters for Bayesian regularization. Only used when |
ctrlStep |
A list of control parameters used in the stepwise search; see also |
ctrlGa |
A list of control parameters for the genetic algorithm; see also |
ctrlIcf |
A list of control parameters employed in the algorithm for estimation of graphical model parameters; see also |
parallel |
A logical argument indicating if parallel computation should be used for structure search. If TRUE, all the available cores are used. The argument could also be set to a numeric integer value specifying the number of cores to be employed. |
verbose |
A logical argument controlling whether iterations of the structure searching and estimation procedure need to be shown or not. |
... |
Additional internal arguments not to be provided by the user. |
The function performs graph association structure search and maximum penalized likelihood estimation of the optimal Gaussian graphical model given the data provided in input.
A Gaussian covariance graph model is estimated if model = "covariance", while estimation of a Gaussian covariance graph model is performed if model = "concentration". A Gaussian covariance graph model postulates that some variables are marginally independent according to the inferred graph structure. On the other hand, in a Gaussian concentration graph model, variables are conditionally independent given their neighbors in the inferred graph. See also fitGGM.
Search for the optimal graph structure and parameter estimation is carried out by maximization of a Gaussian penalized likelihood, given as follows:
Covariance: argmax_(Sigma, A) L(X | Sigma, A) - P_beta(A) with Sigma in C_G(A)
Concentration: argmax_(Omega, A) L(X | Omega, A) - P_beta(A) with Omega in C_G(A)
where C_G(A) is the collection of sparse positive definite matrices whose zero patterns are given by graph G represented by the adjacency matrix A.
The penalty function P_beta(A) depends on the structure of graph G through the adjacency matrix A and a parameter beta; see penalty on how to specify the penalization term and for further information.
For this type of penalized log-likelihood, graph structure search and parameter estimation is a maximization combinatorial problem. For a given candidate structure (i.e. adjacency matrix), association parameters in the covariance or concentration matrix are estimated using the estimation algorithms implemented in fitGGM. Regarding structure search, this can be carried out either using a greedy forward-stepwise or a greedy backward-stepwise algorithm, by setting search = "step-forw" or search = "step-back" respectively. Alternatively, a stochastic search via genetic algorithm can be used by setting search = "ga". The procedure for the forward stepwise search is described in Fop et al. (2018), and the backward is implemented in a similar way; the genetic algorithm procedure relies on the GA package. All the structure searching methods can be run in parallel on a multi-core machine by setting the argument parallel = TRUE.
An object of class 'fitGGM' containing the optimal estimated marginal or conditional independence Gaussian graphical model.
The output is a list containing:
sigma |
The estimated covariance matrix. |
omega |
The estimated concentration (inverse covariance) matrix. |
graph |
The adjacency matrix corresponding to the optimal marginal or conditional independence graph. |
model |
Estimated model type, whether |
loglikPen |
Value of the maximized penalized log-likelihood. |
loglik |
Value of the maximized log-likelihood. |
nPar |
Number of estimated parameters. |
N |
Number of observations. |
V |
Number of variables, corresponding to the number of nodes in the graph. |
penalty |
The type of penalty on the graph structure. |
search |
The search method used for graph structure search. |
GA |
An object of class |
Fop, M., Murphy, T.B., and Scrucca, L. (2018). Model-based clustering with sparse covariance matrices. Statistics and Computing. To appear.
Scrucca, L. (2017). On some extensions to GA package: Hybrid optimisation, parallelisation and islands evolution. The R Journal, 9(1), 187-206.
Scrucca, L. (2013). GA: A package for genetic algorithms in R. Journal of Statistical Software, 53(4), 1-3.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 | # fit covariance graph model with default forward-stepwise search
data(mtcars)
x <- mtcars[,c(1,3:7)]
mod1 <- searchGGM(x, model = "covariance")
mod1
plot(mod1)
#
# prefer a sparser model
mod2 <- searchGGM(x, model = "covariance", penalty = "ebic")
mod2
plot(mod2)
# fit concentration graph model with backward-stepwise structure search
# with a covariance matrix in input
data(ability.cov)
mod3 <- searchGGM(S = ability.cov$cov, N = ability.cov$n.obs,
model = "concentration", search = "step-back")
mod3
mod3$graph
mod3$omega
plot(mod3)
## Not run:
# generate data from a Markov model
N <- 1000
V <- 20
dat <- matrix(NA, N, V)
dat[,1] <- rnorm(N)
for ( j in 2:V ) dat[,j] <- dat[,j-1] + rnorm(N, sd = 0.5)
mod4 <- searchGGM(data = dat, model = "concentration") # recover the model
plot(mod4, what = "adjacency")
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.