ggmncv: GGMncv

View source: R/ggmncv.R

Description

Gaussian graphical modeling with nonconvex regularization. A thorough survey of these penalties, including simulation studies investigating their properties, is provided in Williams (2020).

Usage

ggmncv(
  R,
  n,
  penalty = "atan",
  ic = "bic",
  select = "lambda",
  gamma = NULL,
  lambda = NULL,
  n_lambda = 50,
  lambda_min_ratio = 0.01,
  n_gamma = 50,
  initial = NULL,
  LLA = FALSE,
  unreg = FALSE,
  maxit = 10000,
  thr = 1e-04,
  store = TRUE,
  progress = TRUE,
  ebic_gamma = 0.5,
  penalize_diagonal = TRUE,
  ...
)

Arguments

R

Matrix. A correlation matrix of dimensions p by p.

n

Numeric. The sample size used to compute the information criterion.

penalty

Character string. Which penalty should be used (defaults to "atan")?

ic

Character string. Which information criterion should be used (defaults to "bic")? The options include "aic", "ebic" (ebic_gamma defaults to 0.5), "ric", or any of the generalized information criteria provided in section 5 of Kim et al. (2012). These are "gic_1" (i.e., bic) to "gic_6" (see 'Details').

select

Character string. Which tuning parameter should be selected (defaults to "lambda")? The options include "lambda" (the regularization parameter), "gamma" (governs the 'shape'), and "both".

gamma

Numeric. Hyperparameter for the penalty function. Defaults to 3.7 (scad), 2 (mcp), 0.5 (adapt), and 0.01 for all other penalties. Take care when departing from the default values (see the references in 'Note').

lambda

Numeric vector. Regularization (or tuning) parameters. The default is NULL, which provides default values with select = "lambda" and uses sqrt(log(p)/n) with select = "gamma".

n_lambda

Numeric. The number of \(\lambda\) values to be evaluated. Defaults to 50. This is disregarded if custom values are provided for lambda.

lambda_min_ratio

Numeric. The smallest value for lambda, as a fraction of the upper bound of the regularization/tuning parameter. The default is 0.01, which mimics the R package qgraph. To mimic the R package huge, set lambda_min_ratio = 0.1 and n_lambda = 10.

n_gamma

Numeric. The number of \(\gamma\) values to be evaluated. Defaults to 50. This is disregarded if custom values are provided for gamma.

initial

A matrix (p by p) or a custom function that returns the inverse of the covariance matrix. This is used to compute the penalty derivative. The default is NULL, which results in using the inverse of R (see 'Note').

LLA

Logical. Should the local linear approximation be used (defaults to FALSE)?

unreg

Logical. Should the models be refitted (or unregularized) with maximum likelihood (defaults to FALSE)? Setting to TRUE results in the approach of Foygel and Drton (2010), but with the regularization path obtained from nonconvex regularization, as opposed to the \(\ell_1\)-penalty.

maxit

Numeric. The maximum number of iterations for determining convergence of the LLA algorithm (defaults to 1e4). Note that this can be changed to, say, 2 or 3, which will provide two-step and three-step estimators without a convergence check.

thr

Numeric. Threshold for determining convergence of the LLA algorithm (defaults to 1.0e-4).

store

Logical. Should all of the fitted models be saved (defaults to TRUE)?

progress

Logical. Should a progress bar be included (defaults to TRUE)?

ebic_gamma

Numeric. Value for the additional hyper-parameter for the extended Bayesian information criterion (defaults to 0.5, must be between 0 and 1). Setting ebic_gamma = 0 results in BIC.

penalize_diagonal

Logical. Should the diagonal of the inverse covariance matrix be penalized (defaults to TRUE)?

...

Additional arguments passed to initial when a function is provided and ignored otherwise.
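The interplay of lambda, n_lambda, and lambda_min_ratio can be sketched in base R. This is only an illustration, not the package's internal code: it assumes that the upper bound is the largest absolute off-diagonal correlation and that the grid is log-spaced, neither of which is stated above. The helper lambda_grid and the matrix R_toy are hypothetical.

```r
# Hypothetical helper: a log-spaced grid of regularization parameters.
# Assumption: the upper bound is the largest absolute off-diagonal
# correlation (not confirmed by the argument descriptions above).
lambda_grid <- function(R, n_lambda = 50, lambda_min_ratio = 0.01) {
  lambda_max <- max(abs(R[upper.tri(R)]))
  lambda_min <- lambda_min_ratio * lambda_max
  exp(seq(log(lambda_min), log(lambda_max), length.out = n_lambda))
}

# toy 3 x 3 correlation matrix
R_toy <- matrix(c(1.0, 0.5, 0.2,
                  0.5, 1.0, 0.3,
                  0.2, 0.3, 1.0), nrow = 3)

lambdas <- lambda_grid(R_toy)

# single value documented for select = "gamma": sqrt(log(p) / n)
p <- ncol(R_toy)
n <- 200
lambda_fixed <- sqrt(log(p) / n)
```

A custom vector such as lambdas would be supplied through the lambda argument, in which case n_lambda is disregarded.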

Details

Several of the penalties are (continuous) approximations to the \(\ell_0\) penalty, that is, best subset selection. However, the solution does not require enumerating all possible models, which makes it computationally efficient.
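To illustrate what "approximation to the \(\ell_0\) penalty" means, the following base R sketch evaluates the atan penalty (the default) for a small value of \(\gamma\). The functional form is the commonly stated definition of the atan penalty and is an assumption here, since it is not spelled out on this page; atan_penalty is a hypothetical helper, not a function exported by GGMncv.

```r
# Atan penalty, written from its usual definition (assumed here):
#   p(theta) = lambda * (gamma + 2 / pi) * atan(abs(theta) / gamma)
atan_penalty <- function(theta, lambda = 1, gamma = 0.01) {
  lambda * (gamma + 2 / pi) * atan(abs(theta) / gamma)
}

theta <- c(0, 0.05, 0.5, 2)

# with a small gamma, the penalty is close to lambda * 1(theta != 0)
pen_l0 <- atan_penalty(theta, gamma = 1e-4)
round(pen_l0, 3)

# the lasso penalty, for comparison, grows linearly in |theta|
pen_l1 <- abs(theta)
```

Unlike the lasso, the penalty is nearly the same for a small and a large nonzero coefficient, which is the sense in which it mimics best subset selection.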

L0 Approximations

Additional penalties:

gamma (\mjseqn\gamma):

The gamma argument corresponds to an additional hyperparameter for each penalty. The defaults are set to the recommended values from the respective papers.

LLA

The local linear approximation for nonconvex penalties was described in Fan et al. (2009). This is essentially an iteratively re-weighted (g)lasso. Note that by default LLA = FALSE. This is due to the work of Zou and Li (2008), which suggested that, so long as the starting values are good enough, a one-step estimator is sufficient to obtain an accurate estimate of the conditional dependence structure. In the case of low-dimensional data, the sample-based inverse covariance matrix is used for the starting values. This is expected to work well, assuming that \(n\) is sufficiently larger than \(p\).
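The idea of a one-step estimator can be sketched as follows: the penalty derivative is evaluated at the starting values, giving an element-wise weight matrix for a single weighted (g)lasso fit. The sketch below only computes those weights in base R. It assumes the usual derivative of the atan penalty, and the names atan_deriv, Theta_true, and lambda_mat are hypothetical; this is not the package's internal code.

```r
# Derivative of the atan penalty with respect to |theta| (assumed form):
#   lambda * gamma * (gamma + 2 / pi) / (gamma^2 + theta^2)
atan_deriv <- function(theta, lambda, gamma = 0.01) {
  lambda * gamma * (gamma + 2 / pi) / (gamma^2 + theta^2)
}

# toy precision matrix with no edge between nodes 1 and 3
Theta_true <- matrix(c( 1.0, -0.4,  0.0,
                       -0.4,  1.0, -0.4,
                        0.0, -0.4,  1.0), nrow = 3)
R_toy <- cov2cor(solve(Theta_true))

# starting values: inverse of the correlation matrix (low-dimensional case)
Theta_init <- solve(R_toy)

# element-wise penalty weights for a single weighted-lasso step
lambda_mat <- atan_deriv(Theta_init, lambda = 0.1)
round(lambda_mat, 4)
```

Entries that start near zero receive a much larger weight than entries that start far from zero, so a single weighted step already shrinks the former strongly while leaving the latter nearly unpenalized.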

Generalized Information Criteria

The following are the available GIC:

Note that \(|\textbf{E}|\) denotes the number of edges (nonzero relations) in the graph, \(p\) the number of nodes (columns), and \(n\) the number of observations (rows). Further, each can be understood as a penalty term added to negative 2 times the log-likelihood, that is,

\[
-2 l_n(\hat{\boldsymbol{\Theta}}) = -2 \Big[\frac{n}{2} \textrm{log} \, \textrm{det} \, \hat{\boldsymbol{\Theta}} - \textrm{tr}(\hat{\textbf{S}}\hat{\boldsymbol{\Theta}})\Big]
\]

where \(\hat{\boldsymbol{\Theta}}\) is the estimated precision matrix (e.g., for a given \(\lambda\) and \(\gamma\)) and \(\hat{\textbf{S}}\) is the sample-based covariance matrix.
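As a rough base R illustration, the expression for negative 2 times the log-likelihood can be coded exactly as it is written on this page, with a penalty term added on top. The EBIC form used below, \(|\textbf{E}| \log(n) + 4 |\textbf{E}| \gamma \log(p)\), follows the usual extended BIC for graphical models and is an assumption about what the package computes; neg2_loglik and ebic are hypothetical helpers.

```r
# -2 times the log-likelihood, following the expression as written above
neg2_loglik <- function(Theta, S, n) {
  log_det <- as.numeric(determinant(Theta, logarithm = TRUE)$modulus)
  -2 * ((n / 2) * log_det - sum(diag(S %*% Theta)))
}

# Assumed EBIC form: BIC penalty plus 4 * |E| * ebic_gamma * log(p)
ebic <- function(Theta, S, n, ebic_gamma = 0.5) {
  p <- ncol(Theta)
  E <- sum(Theta[upper.tri(Theta)] != 0)   # number of edges
  neg2_loglik(Theta, S, n) + E * log(n) + 4 * E * ebic_gamma * log(p)
}

S_toy <- matrix(c(1.0, 0.3, 0.0,
                  0.3, 1.0, 0.0,
                  0.0, 0.0, 1.0), nrow = 3)
Theta_hat <- solve(S_toy)   # one edge (between nodes 1 and 2)

bic_value  <- ebic(Theta_hat, S_toy, n = 100, ebic_gamma = 0)
ebic_value <- ebic(Theta_hat, S_toy, n = 100, ebic_gamma = 0.5)
```

With ebic_gamma = 0 the extra term vanishes and the criterion reduces to BIC, which matches the description of the ebic_gamma argument.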

Value

An object of class ggmncv, including:

Note

initial

initial not only affects performance (to some degree) but also computational speed. In high dimensions (defined here as p > n), or when p approaches n, the precision matrix can become quite unstable. As a result, with initial = NULL, the algorithm can take a very (very) long time. If this occurs, provide a matrix for initial (e.g., using lw). Alternatively, the penalty can be changed to penalty = "lasso", if desired.

The R package glassoFast is under the hood of ggmncv (Sustik and Calderhead 2012), which is much faster than glasso when there are many nodes.
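One way to obtain a stable matrix for initial when p > n is to shrink the correlation matrix toward the identity before inverting it. The base R sketch below is a simple stand-in for illustration only: shrunk_inverse and its alpha parameter are hypothetical, and this is not the estimator referred to as lw above.

```r
# Hypothetical helper: shrink the correlation matrix toward the identity
# before inverting, so that the inverse exists even when p > n.
shrunk_inverse <- function(R, alpha = 0.2) {
  p <- ncol(R)
  solve((1 - alpha) * R + alpha * diag(p))
}

set.seed(1)
n <- 10
p <- 20                            # more variables than observations
Y_hd <- matrix(rnorm(n * p), n, p)
R_hd <- cor(Y_hd)                  # rank deficient, hence not safely invertible

Theta_init <- shrunk_inverse(R_hd)

# Theta_init could then be supplied as the starting values, e.g.,
# ggmncv(R_hd, n = n, initial = Theta_init)
```

Because the shrunken matrix has all eigenvalues bounded away from zero, its inverse is well defined, which avoids the unstable starting values described above.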

References

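Fan, J., Feng, Y., & Wu, Y. (2009). Network exploration via the adaptive LASSO and SCAD penalties. The Annals of Applied Statistics.

Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. Advances in Neural Information Processing Systems.

Kim, Y., Kwon, S., & Choi, H. (2012). Consistent model selection criteria on high dimensions. Journal of Machine Learning Research.

Sustik, M. A., & Calderhead, B. (2012). GLASSOFAST: An efficient GLASSO implementation. UTCS Technical Report.

Williams, D. R. (2020). Beyond lasso: A survey of nonconvex regularization in Gaussian graphical models. PsyArXiv.

Zou, H., & Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics.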

Examples

# load the package and the example data
library(GGMncv)
Y <- GGMncv::ptsd

S <- cor(Y)

# fit model
# note: atan default
fit_atan <- ggmncv(S, n = nrow(Y),
                   progress = FALSE)

# plot
plot(get_graph(fit_atan),
     edge_magnify = 10,
     node_names = colnames(Y))

# lasso
fit_l1 <- ggmncv(S, n = nrow(Y),
                 progress = FALSE,
                 penalty = "lasso")

# plot
plot(get_graph(fit_l1),
     edge_magnify = 10,
     node_names = colnames(Y))


# for these data, we might expect all relations to be positive
# and thus the red edges are spurious. The following re-estimates
# the graph, given all edges positive (sign restriction).

# set negatives to zero (sign restriction)
adj_new <- ifelse( fit_atan$P <= 0, 0, 1)

check_zeros <- TRUE

# track the number of tries
iter <- 0

# iterate until all positive
while(check_zeros){
  iter <- iter + 1
  fit_new <- constrained(S, adj = adj_new)
  check_zeros <- any(fit_new$wadj < 0)
  adj_new <- ifelse( fit_new$wadj <= 0, 0, 1)
}

# make graph object
new_graph <- list(P = fit_new$wadj,
                  adj = adj_new)
class(new_graph) <- "graph"

plot(new_graph,
     edge_magnify = 10,
     node_names = colnames(Y))

GGMncv documentation built on Dec. 15, 2021, 9:10 a.m.