Graph structure search and estimation for Gaussian covariance and concentration graph models.
searchGGM(data = NULL,
          S = NULL, N = NULL,
          model = c("covariance", "concentration"),
          search = c("step-forw", "step-back", "ga"),
          penalty = c("bic", "ebic", "erdos", "power"),
          beta = NULL,
          start = NULL,
          regularize = FALSE, regHyperPar = NULL,
          ctrlStep = ctrlSTEP(), ctrlGa = ctrlGA(), ctrlIcf = ctrlICF(),
          parallel = FALSE,
          verbose = FALSE, ...)
data: A data frame or matrix, where rows correspond to observations and columns to variables. Categorical variables are not allowed.
S: The sample covariance matrix of the data. If S = NULL, it is computed from data.
N: The number of observations. It must be supplied when the covariance matrix S is given in place of data.
model: The type of Gaussian graphical model, either "covariance" or "concentration". Default is "covariance". See "Details".
search: The type of structure search algorithm: "step-forw" (greedy forward-stepwise), "step-back" (greedy backward-stepwise) or "ga" (genetic algorithm). Default is "step-forw". See "Details".
penalty: The penalty function used to define a criterion for scoring the candidate graph configurations, one of "bic", "ebic", "erdos" or "power". Default is "bic". See "Details" and penalty.
beta: The hyperparameter of the penalty function. See "Details" and penalty.
start: A starting matrix for the estimation algorithm. If NULL (the default), the starting value is set internally.
regularize: A logical argument indicating whether Bayesian regularization should be performed. Default is FALSE.
regHyperPar: A list of hyperparameters for Bayesian regularization. Only used when regularize = TRUE.
ctrlStep: A list of control parameters used in the stepwise search; see also ctrlSTEP.
ctrlGa: A list of control parameters for the genetic algorithm; see also ctrlGA.
ctrlIcf: A list of control parameters employed in the algorithm for estimation of the graphical model parameters; see also ctrlICF.
parallel: A logical argument indicating whether parallel computation should be used for structure search. If TRUE, all available cores are used. It can also be set to an integer specifying the number of cores to use.
verbose: A logical argument controlling whether the iterations of the structure search and estimation procedure are shown.
...: Additional internal arguments, not to be provided by the user.
The function performs graph structure search and maximum penalized likelihood estimation of the optimal Gaussian graphical model for the input data.
A Gaussian covariance graph model is estimated if model = "covariance", while a Gaussian concentration graph model is estimated if model = "concentration". A Gaussian covariance graph model postulates that two variables not joined by an edge in the inferred graph are marginally independent. In a Gaussian concentration graph model, instead, a variable is conditionally independent of all non-adjacent variables given its neighbors in the inferred graph. See also fitGGM.
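To make the distinction concrete, the following base-R sketch (illustrative values only, not output of searchGGM) builds the concentration matrix of a four-variable chain 1 - 2 - 3 - 4 and inverts it to obtain the covariance matrix. The zero in Omega[1, 3] encodes conditional independence of variables 1 and 3 given the others, while the corresponding entry of Sigma is nonzero, so the two variables are not marginally independent. A covariance graph model works the other way round, placing the zeros in Sigma.

```r
# Illustrative values only; not output of searchGGM().
V <- 4
Omega <- diag(2, V)                 # concentration matrix of the chain 1 - 2 - 3 - 4
for (j in 1:(V - 1)) {
  Omega[j, j + 1] <- -0.8           # edge j -- j+1 in the concentration graph
  Omega[j + 1, j] <- -0.8
}
Sigma <- solve(Omega)               # implied covariance matrix

Omega[1, 3]   # 0: variables 1 and 3 are conditionally independent given the others
Sigma[1, 3]   # nonzero: variables 1 and 3 are not marginally independent
round(Sigma, 3)
```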
Search for the optimal graph structure and parameter estimation are carried out by maximizing a Gaussian penalized log-likelihood:
Covariance:
$$\arg\max_{\Sigma,\,A}\; L(X \mid \Sigma, A) - P_\beta(A) \quad \text{subject to} \quad \Sigma \in \mathcal{C}_G(A)$$
Concentration:
$$\arg\max_{\Omega,\,A}\; L(X \mid \Omega, A) - P_\beta(A) \quad \text{subject to} \quad \Omega \in \mathcal{C}_G(A)$$
where $L$ is the Gaussian log-likelihood of the data $X$ and $\mathcal{C}_G(A)$ is the collection of sparse positive definite matrices whose zero pattern is given by the graph $G$ represented by the adjacency matrix $A$.
The penalty function $P_\beta(A)$ depends on the structure of the graph $G$ through the adjacency matrix $A$ and a parameter $\beta$; see penalty for how to specify the penalization term and for further information.
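As a hedged illustration of the pieces of this objective, the base-R sketch below checks membership in C_G(A) and evaluates a penalized Gaussian log-likelihood for a concentration matrix. The helper names in_CG(), loglik_conc(), pen_bic() and pen_ebic() are hypothetical; S is assumed to be the maximum likelihood covariance estimate (divisor N); and the two penalties are common BIC- and EBIC-style forms chosen for illustration, so the exact penalties behind penalty = "bic" and penalty = "ebic" in the package may differ.

```r
# Hypothetical helpers illustrating the objective; not the package's code.
# ASSUMPTIONS: S is the ML covariance estimate (divisor N); the penalties are
# textbook BIC/EBIC-style forms and may differ from penalty = "bic"/"ebic".

# Membership in C_G(A): symmetric positive definite, zero wherever A has no edge.
in_CG <- function(M, A, tol = 1e-8) {
  if (!isSymmetric(M, tol = tol)) return(FALSE)
  ev <- eigen(M, symmetric = TRUE, only.values = TRUE)$values
  off <- (A == 0) & (row(A) != col(A))         # off-diagonal non-edges
  min(ev) > tol && all(abs(M[off]) <= tol)
}

# Gaussian log-likelihood of a concentration matrix Omega, given S and N.
loglik_conc <- function(Omega, S, N) {
  V <- ncol(S)
  logdet <- as.numeric(determinant(Omega, logarithm = TRUE)$modulus)
  -N / 2 * (V * log(2 * pi) - logdet + sum(diag(S %*% Omega)))
}

n_edges <- function(A) sum(A[upper.tri(A)] != 0)

# BIC-style: (log N)/2 per free parameter (V diagonal entries + one per edge).
pen_bic <- function(A, N) (ncol(A) + n_edges(A)) * log(N) / 2

# EBIC-style: adds 2 * beta * log(V) per edge, favouring sparser graphs.
pen_ebic <- function(A, N, beta = 0.5) {
  pen_bic(A, N) + 2 * beta * n_edges(A) * log(ncol(A))
}

# Usage: the unconstrained MLE solve(S) corresponds to the complete graph.
set.seed(1)
N <- 200
X <- matrix(rnorm(N * 3), N, 3)
S <- crossprod(scale(X, scale = FALSE)) / N    # ML covariance estimate
A_full <- matrix(1, 3, 3) - diag(3)            # complete graph on 3 nodes
Omega_hat <- solve(S)
in_CG(Omega_hat, A_full)                       # TRUE
loglik_conc(Omega_hat, S, N) - pen_bic(A_full, N)
```

Only the penalty changes between criteria; the constraint and the log-likelihood term are shared, which is why switching penalty mainly affects how sparse the selected graph is.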
For this type of penalized log-likelihood, graph structure search and parameter estimation form a combinatorial maximization problem. For a given candidate structure (i.e., adjacency matrix), the association parameters in the covariance or concentration matrix are estimated using the estimation algorithms implemented in fitGGM. Structure search can be carried out with either a greedy forward-stepwise or a greedy backward-stepwise algorithm, by setting search = "step-forw" or search = "step-back", respectively. Alternatively, a stochastic search via a genetic algorithm can be used by setting search = "ga". The forward-stepwise search is described in Fop et al. (2018), and the backward-stepwise search is implemented in a similar way; the genetic algorithm relies on the GA package. All the structure search methods can be run in parallel on a multi-core machine by setting parallel = TRUE.
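The following is a minimal sketch of a greedy forward-stepwise search, written in base R for a concentration graph model with a BIC-style score. It is not the procedure of Fop et al. (2018) nor the package's implementation: for each candidate graph the constrained maximum likelihood estimate is computed with a swapped-in regression-based iterative algorithm (Algorithm 17.1 in The Elements of Statistical Learning) rather than the algorithms in fitGGM, and the functions fit_conc_graph(), score_graph() and forward_search() are hypothetical. Starting from the empty graph, the edge giving the largest score increase is added at each step, and the search stops when no single addition improves the score.

```r
# Minimal sketch, base R only. Hypothetical helpers, not the package's code.

# Constrained ML estimate of a concentration matrix whose off-diagonal entries
# are zero wherever A has no edge (regression-based algorithm, ESL Alg. 17.1).
fit_conc_graph <- function(S, A, tol = 1e-8, max_iter = 500) {
  V <- ncol(S)
  W <- S                                     # working covariance estimate
  for (iter in seq_len(max_iter)) {
    W_old <- W
    for (j in seq_len(V)) {
      others <- setdiff(seq_len(V), j)
      nb <- which(A[j, others] != 0)         # neighbours of j, as positions in 'others'
      beta <- numeric(V - 1)
      if (length(nb) > 0) {
        beta[nb] <- solve(W[others[nb], others[nb], drop = FALSE], S[others[nb], j])
      }
      w12 <- as.vector(W[others, others, drop = FALSE] %*% beta)
      W[others, j] <- w12                    # update column and row to keep W symmetric
      W[j, others] <- w12
    }
    if (max(abs(W - W_old)) < tol) break
  }
  solve(W)                                   # estimated concentration matrix
}

# Penalized log-likelihood with an assumed BIC-style penalty.
score_graph <- function(A, S, N) {
  V <- ncol(S)
  Omega <- fit_conc_graph(S, A)
  logdet <- as.numeric(determinant(Omega, logarithm = TRUE)$modulus)
  loglik <- -N / 2 * (V * log(2 * pi) - logdet + sum(diag(S %*% Omega)))
  loglik - (V + sum(A[upper.tri(A)])) * log(N) / 2
}

# Greedy forward-stepwise search starting from the empty graph.
forward_search <- function(S, N) {
  V <- ncol(S)
  A <- matrix(0, V, V)
  best <- score_graph(A, S, N)
  repeat {
    cand <- which(upper.tri(A) & A == 0, arr.ind = TRUE)   # edges not yet in the graph
    if (nrow(cand) == 0) break
    scores <- apply(cand, 1, function(e) {
      A_new <- A
      A_new[e[1], e[2]] <- A_new[e[2], e[1]] <- 1
      score_graph(A_new, S, N)
    })
    if (max(scores) <= best) break           # no single addition improves the score
    e <- cand[which.max(scores), ]
    A[e[1], e[2]] <- A[e[2], e[1]] <- 1
    best <- max(scores)
  }
  list(graph = A, score = best)
}

# Usage: data from a Markov chain, similar to the last example on this page.
set.seed(42)
N <- 500
V <- 5
X <- matrix(0, N, V)
X[, 1] <- rnorm(N)
for (j in 2:V) X[, j] <- X[, j - 1] + rnorm(N, sd = 0.5)
S <- crossprod(scale(X, scale = FALSE)) / N  # ML covariance estimate
res <- forward_search(S, N)
res$graph
```

A backward-stepwise variant would start from the complete graph and remove edges instead. In both directions each step refits one model per candidate edge, and these independent refits are the part of the search that lends itself to parallel evaluation.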
An object of class 'fitGGM' containing the optimal estimated marginal or conditional independence Gaussian graphical model.
The output is a list containing:
sigma: The estimated covariance matrix.
omega: The estimated concentration (inverse covariance) matrix.
graph: The adjacency matrix corresponding to the optimal marginal or conditional independence graph.
model: Estimated model type, either "covariance" or "concentration".
loglikPen: Value of the maximized penalized log-likelihood.
loglik: Value of the maximized log-likelihood.
nPar: Number of estimated parameters.
N: Number of observations.
V: Number of variables, corresponding to the number of nodes in the graph.
penalty: The type of penalty on the graph structure.
search: The method used for graph structure search.
GA: An object of class "ga" (from the GA package) with the results of the genetic algorithm search, relevant when search = "ga".
Fop, M., Murphy, T.B., and Scrucca, L. (2018). Model-based clustering with sparse covariance matrices. Statistics and Computing. To appear.
Scrucca, L. (2017). On some extensions to GA package: Hybrid optimisation, parallelisation and islands evolution. The R Journal, 9(1), 187-206.
Scrucca, L. (2013). GA: A package for genetic algorithms in R. Journal of Statistical Software, 53(4), 1-37.
# fit covariance graph model with default forward-stepwise search
data(mtcars)
x <- mtcars[,c(1,3:7)]
mod1 <- searchGGM(x, model = "covariance")
mod1
plot(mod1)
#
# prefer a sparser model
mod2 <- searchGGM(x, model = "covariance", penalty = "ebic")
mod2
plot(mod2)
# fit concentration graph model with backward-stepwise structure search
# with a covariance matrix in input
data(ability.cov)
mod3 <- searchGGM(S = ability.cov$cov, N = ability.cov$n.obs,
model = "concentration", search = "step-back")
mod3
mod3$graph
mod3$omega
plot(mod3)
## Not run:
# generate data from a Markov model (set a seed for reproducibility)
set.seed(1)
N <- 1000
V <- 20
dat <- matrix(NA, N, V)
dat[,1] <- rnorm(N)
for ( j in 2:V ) dat[,j] <- dat[,j-1] + rnorm(N, sd = 0.5)
mod4 <- searchGGM(data = dat, model = "concentration") # recover the model
plot(mod4, what = "adjacency")
## End(Not run)