searchGGM: Structure search and estimation for Gaussian graphical models

Description Usage Arguments Details Value References Examples

View source: R/searchGGM.R

Description

Graph structure search and estimation for Gaussian covariance and concentration graph models.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
searchGGM(data = NULL,
          S = NULL, N = NULL,
          model = c("covariance", "concentration"),
          search = c("step-forw", "step-back", "ga"),
          penalty = c("bic", "ebic", "erdos", "power"),
          beta = NULL,
          start = NULL,
          regularize = FALSE, regHyperPar = NULL,
          ctrlStep = ctrlSTEP(), ctrlGa = ctrlGA(), ctrlIcf = ctrlICF(),
          parallel = FALSE,
          verbose = FALSE, ...)

Arguments

data

A dataframe or matrix, where rows correspond to observations and columns to variables. Categorical variables are not allowed.

S

The sample covariance matrix of the data. If S = NULL, the maximum likelihood estimate of the covariance matrix is used in the estimation of the graphical model.

N

The number of observations. If data = NULL and S is provided in input, N must be provided in input as well.

model

The type of Gaussian graphical model. Default is "covariance". See "Details".

search

The type of structure search algorithm. If search = "step-forw", a greedy forward-stepwise search is used to find the optimal graph association structure. If search = "step-back", a greedy backward-stepwise search is implemented. If search = "ga" a stochastic search based on a genetic algorithm is employed. Default is "step-forw".

penalty

The penalty function used to define a criterion for scoring the candidate graph configurations. Default is "bic". See "Details" and penalty.

beta

The hyperparameter of the penalty function. See "Details" and penalty.

start

A starting matrix for the estimation algorithm. If NULL, the starting value is the diagonal sample covariance matrix. Used only when model = "covariance".

regularize

A logical argument indicating if Bayesian regularization should be performed. Default to FALSE. Used only when model = "covariance".

regHyperPar

A list of hyper parameters for Bayesian regularization. Only used when
regularization = TRUE; see also ctrlREG.

ctrlStep

A list of control parameters used in the stepwise search; see also ctrlSTEP.

ctrlGa

A list of control parameters for the genetic algorithm; see also ctrlGA.

ctrlIcf

A list of control parameters employed in the algorithm for estimation of graphical model parameters; see also ctrlICF.

parallel

A logical argument indicating if parallel computation should be used for structure search. If TRUE, all the available cores are used. The argument could also be set to a numeric integer value specifying the number of cores to be employed.

verbose

A logical argument controlling whether iterations of the structure searching and estimation procedure need to be shown or not.

...

Additional internal arguments not to be provided by the user.

Details

The function performs graph association structure search and maximum penalized likelihood estimation of the optimal Gaussian graphical model given the data provided in input.

A Gaussian covariance graph model is estimated if model = "covariance", while estimation of a Gaussian covariance graph model is performed if model = "concentration". A Gaussian covariance graph model postulates that some variables are marginally independent according to the inferred graph structure. On the other hand, in a Gaussian concentration graph model, variables are conditionally independent given their neighbors in the inferred graph. See also fitGGM.

Search for the optimal graph structure and parameter estimation is carried out by maximization of a Gaussian penalized likelihood, given as follows:

Covariance: argmax_(Sigma, A) L(X | Sigma, A) - P_beta(A) with Sigma in C_G(A)

Concentration: argmax_(Omega, A) L(X | Omega, A) - P_beta(A) with Omega in C_G(A)

where C_G(A) is the collection of sparse positive definite matrices whose zero patterns are given by graph G represented by the adjacency matrix A.

The penalty function P_beta(A) depends on the structure of graph G through the adjacency matrix A and a parameter beta; see penalty on how to specify the penalization term and for further information.

For this type of penalized log-likelihood, graph structure search and parameter estimation is a maximization combinatorial problem. For a given candidate structure (i.e. adjacency matrix), association parameters in the covariance or concentration matrix are estimated using the estimation algorithms implemented in fitGGM. Regarding structure search, this can be carried out either using a greedy forward-stepwise or a greedy backward-stepwise algorithm, by setting search = "step-forw" or search = "step-back" respectively. Alternatively, a stochastic search via genetic algorithm can be used by setting search = "ga". The procedure for the forward stepwise search is described in Fop et al. (2018), and the backward is implemented in a similar way; the genetic algorithm procedure relies on the GA package. All the structure searching methods can be run in parallel on a multi-core machine by setting the argument parallel = TRUE.

Value

An object of class 'fitGGM' containing the optimal estimated marginal or conditional independence Gaussian graphical model.

The output is a list containing:

sigma

The estimated covariance matrix.

omega

The estimated concentration (inverse covariance) matrix.

graph

The adjacency matrix corresponding to the optimal marginal or conditional independence graph.

model

Estimated model type, whether "covariance" or "concentration".

loglikPen

Value of the maximized penalized log-likelihood.

loglik

Value of the maximized log-likelihood.

nPar

Number of estimated parameters.

N

Number of observations.

V

Number of variables, corresponding to the number of nodes in the graph.

penalty

The type of penalty on the graph structure.

search

The search method used for graph structure search.

GA

An object of class 'ga-class' with information about the genetic algorithm. Only present when search = "ga". See ga.

References

Fop, M., Murphy, T.B., and Scrucca, L. (2018). Model-based clustering with sparse covariance matrices. Statistics and Computing. To appear.

Scrucca, L. (2017). On some extensions to GA package: Hybrid optimisation, parallelisation and islands evolution. The R Journal, 9(1), 187-206.

Scrucca, L. (2013). GA: A package for genetic algorithms in R. Journal of Statistical Software, 53(4), 1-3.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
# fit covariance graph model with default forward-stepwise search
data(mtcars)
x <- mtcars[,c(1,3:7)]
mod1 <- searchGGM(x, model = "covariance")
mod1
plot(mod1)
#
# prefer a sparser model
mod2 <- searchGGM(x, model = "covariance", penalty = "ebic")
mod2
plot(mod2)


# fit concentration graph model with backward-stepwise structure search
# with a covariance matrix in input
data(ability.cov)
mod3 <- searchGGM(S = ability.cov$cov, N = ability.cov$n.obs,
                  model = "concentration", search = "step-back")
mod3
mod3$graph
mod3$omega
plot(mod3)


# generate data from a Markov model
N <- 1000
V <- 20
dat <- matrix(NA, N, V)
dat[,1] <- rnorm(N)
for ( j in 2:V ) dat[,j] <- dat[,j-1] + rnorm(N, sd = 0.5)
mod4 <- searchGGM(data = dat, model = "concentration")        # recover the model
plot(mod4, what = "adjacency")

michaelfop/mixggm documentation built on Oct. 27, 2018, 12:14 a.m.