# searchGGM: Structure search and estimation for Gaussian graphical models In michaelfop/mixggm: Mixtures of Gaussian Graphical Models

## Description

Graph structure search and estimation for Gaussian covariance and concentration graph models.

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11``` ```searchGGM(data = NULL, S = NULL, N = NULL, model = c("covariance", "concentration"), search = c("step-forw", "step-back", "ga"), penalty = c("bic", "ebic", "erdos", "power"), beta = NULL, start = NULL, regularize = FALSE, regHyperPar = NULL, ctrlStep = ctrlSTEP(), ctrlGa = ctrlGA(), ctrlIcf = ctrlICF(), parallel = FALSE, verbose = FALSE, ...) ```

## Arguments

 `data` A dataframe or matrix, where rows correspond to observations and columns to variables. Categorical variables are not allowed. `S` The sample covariance matrix of the data. If `S = NULL`, the maximum likelihood estimate of the covariance matrix is used in the estimation of the graphical model. `N` The number of observations. If `data = NULL` and `S` is provided in input, `N` must be provided in input as well. `model` The type of Gaussian graphical model. Default is `"covariance"`. See "Details". `search` The type of structure search algorithm. If `search = "step-forw"`, a greedy forward-stepwise search is used to find the optimal graph association structure. If `search = "step-back"`, a greedy backward-stepwise search is implemented. If `search = "ga"` a stochastic search based on a genetic algorithm is employed. Default is `"step-forw"`. `penalty` The penalty function used to define a criterion for scoring the candidate graph configurations. Default is `"bic"`. See "Details" and `penalty`. `beta` The hyperparameter of the penalty function. See "Details" and `penalty`. `start` A starting matrix for the estimation algorithm. If `NULL`, the starting value is the diagonal sample covariance matrix. Used only when `model = "covariance"`. `regularize` A logical argument indicating if Bayesian regularization should be performed. Default to `FALSE`. Used only when `model = "covariance"`. `regHyperPar` A list of hyper parameters for Bayesian regularization. Only used when `regularization = TRUE`; see also `ctrlREG`. `ctrlStep` A list of control parameters used in the stepwise search; see also `ctrlSTEP`. `ctrlGa` A list of control parameters for the genetic algorithm; see also `ctrlGA`. `ctrlIcf` A list of control parameters employed in the algorithm for estimation of graphical model parameters; see also `ctrlICF`. `parallel` A logical argument indicating if parallel computation should be used for structure search. If TRUE, all the available cores are used. The argument could also be set to a numeric integer value specifying the number of cores to be employed. `verbose` A logical argument controlling whether iterations of the structure searching and estimation procedure need to be shown or not. `...` Additional internal arguments not to be provided by the user.

## Details

The function performs graph association structure search and maximum penalized likelihood estimation of the optimal Gaussian graphical model given the data provided in input.

A Gaussian covariance graph model is estimated if `model = "covariance"`, while estimation of a Gaussian covariance graph model is performed if `model = "concentration"`. A Gaussian covariance graph model postulates that some variables are marginally independent according to the inferred graph structure. On the other hand, in a Gaussian concentration graph model, variables are conditionally independent given their neighbors in the inferred graph. See also `fitGGM`.

Search for the optimal graph structure and parameter estimation is carried out by maximization of a Gaussian penalized likelihood, given as follows:

Covariance: argmax_(Sigma, A) L(X | Sigma, A) - P_beta(A) with Sigma in C_G(A)

Concentration: argmax_(Omega, A) L(X | Omega, A) - P_beta(A) with Omega in C_G(A)

where C_G(A) is the collection of sparse positive definite matrices whose zero patterns are given by graph G represented by the adjacency matrix A.

The penalty function P_beta(A) depends on the structure of graph G through the adjacency matrix A and a parameter beta; see `penalty` on how to specify the penalization term and for further information.

For this type of penalized log-likelihood, graph structure search and parameter estimation is a maximization combinatorial problem. For a given candidate structure (i.e. adjacency matrix), association parameters in the covariance or concentration matrix are estimated using the estimation algorithms implemented in `fitGGM`. Regarding structure search, this can be carried out either using a greedy forward-stepwise or a greedy backward-stepwise algorithm, by setting `search = "step-forw"` or `search = "step-back"` respectively. Alternatively, a stochastic search via genetic algorithm can be used by setting `search = "ga"`. The procedure for the forward stepwise search is described in Fop et al. (2018), and the backward is implemented in a similar way; the genetic algorithm procedure relies on the `GA` package. All the structure searching methods can be run in parallel on a multi-core machine by setting the argument `parallel = TRUE`.

## Value

An object of class `'fitGGM'` containing the optimal estimated marginal or conditional independence Gaussian graphical model.

The output is a list containing:

 `sigma` The estimated covariance matrix. `omega` The estimated concentration (inverse covariance) matrix. `graph` The adjacency matrix corresponding to the optimal marginal or conditional independence graph. `model` Estimated model type, whether `"covariance"` or `"concentration"`. `loglikPen` Value of the maximized penalized log-likelihood. `loglik` Value of the maximized log-likelihood. `nPar` Number of estimated parameters. `N` Number of observations. `V` Number of variables, corresponding to the number of nodes in the graph. `penalty` The type of penalty on the graph structure. `search` The search method used for graph structure search. `GA` An object of class `'ga-class'` with information about the genetic algorithm. Only present when `search = "ga"`. See `ga`.

## References

Fop, M., Murphy, T.B., and Scrucca, L. (2018). Model-based clustering with sparse covariance matrices. Statistics and Computing. To appear.

Scrucca, L. (2017). On some extensions to GA package: Hybrid optimisation, parallelisation and islands evolution. The R Journal, 9(1), 187-206.

Scrucca, L. (2013). GA: A package for genetic algorithms in R. Journal of Statistical Software, 53(4), 1-3.

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36``` ```# fit covariance graph model with default forward-stepwise search data(mtcars) x <- mtcars[,c(1,3:7)] mod1 <- searchGGM(x, model = "covariance") mod1 plot(mod1) # # prefer a sparser model mod2 <- searchGGM(x, model = "covariance", penalty = "ebic") mod2 plot(mod2) # fit concentration graph model with backward-stepwise structure search # with a covariance matrix in input data(ability.cov) mod3 <- searchGGM(S = ability.cov\$cov, N = ability.cov\$n.obs, model = "concentration", search = "step-back") mod3 mod3\$graph mod3\$omega plot(mod3) ## Not run: # generate data from a Markov model N <- 1000 V <- 20 dat <- matrix(NA, N, V) dat[,1] <- rnorm(N) for ( j in 2:V ) dat[,j] <- dat[,j-1] + rnorm(N, sd = 0.5) mod4 <- searchGGM(data = dat, model = "concentration") # recover the model plot(mod4, what = "adjacency") ## End(Not run) ```

michaelfop/mixggm documentation built on Jan. 10, 2019, 8:40 p.m.