Gaussian graphical modeling with nonconvex regularization. A thorough survey of these penalties, including simulation studies investigating their properties, is provided in Williams (2020).
Arguments
R: Matrix. A correlation matrix of dimensions p by p.
n: Numeric. The sample size used to compute the information criterion.
penalty: Character string. Which penalty should be used (defaults to "atan").
ic: Character string. Which information criterion should be used (defaults to …).
select: Character string. Which tuning parameter should be selected (defaults to …).
gamma: Numeric. Hyperparameter for the penalty function. Defaults to 3.7 (…).
lambda: Numeric vector. Regularization (or tuning) parameters. The default is …
n_lambda: Numeric. The number of $\lambda$'s to be evaluated. Defaults to 50. This is disregarded if custom values are provided for lambda.
lambda_min_ratio: Numeric. The smallest value for …
n_gamma: Numeric. The number of $\gamma$'s to be evaluated. Defaults to 50. This is disregarded if custom values are provided in gamma.
initial: A matrix (p by p) or custom function that returns the inverse of the covariance matrix. This is used to compute the penalty derivative. The default is …
LLA: Logical. Should the local linear approximation be used (defaults to FALSE).
unreg: Logical. Should the models be refitted (or unregularized) with maximum likelihood (defaults to …).
maxit: Numeric. The maximum number of iterations for determining convergence of the LLA algorithm (defaults to …).
thr: Numeric. Threshold for determining convergence of the LLA algorithm (defaults to …).
store: Logical. Should all of the fitted models be saved (defaults to …).
progress: Logical. Should a progress bar be included (defaults to …).
ebic_gamma: Numeric. Value for the additional hyperparameter for the extended Bayesian information criterion (defaults to 0.5, must be between 0 and 1). Setting …
penalize_diagonal: Logical. Should the diagonal of the inverse covariance matrix be penalized (defaults to …).
...: Additional arguments passed to …
Details
Several of the penalties are (continuous) approximations to the $\ell_0$ penalty, that is, best subset selection. However, the solution does not require enumerating all possible models, which makes estimation computationally efficient.
L0 Approximations
Atan: penalty = "atan" (Wang and Zhu 2016). This is currently the default.
Seamless $\ell_0$: penalty = "selo" (Dicker, Huang, and Lin 2013).
Exponential: penalty = "exp" (Wang, Fan, and Zhu 2018).
Log: penalty = "log" (Mazumder, Friedman, and Hastie 2011).
Sica: penalty = "sica" (Lv and Fan 2009).
Additional penalties:
SCAD: penalty = "scad" (Fan and Li 2001).
MCP: penalty = "mcp" (Zhang 2010).
Adaptive lasso: penalty = "adapt" (Zou 2006). Defaults to $\gamma = 0.5$. Note that, for consistency with the other penalties, $\gamma \rightarrow 0$ provides more penalization and $\gamma = 1$ results in $\ell_1$ regularization.
Lasso: penalty = "lasso" (Tibshirani 1996).
gamma ($\gamma$):
The gamma argument corresponds to an additional hyperparameter for each penalty. The defaults are set to the recommended values from the respective papers.
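To make the role of $\gamma$ concrete, the following is a minimal base R sketch of the penalty derivatives for two of the penalties listed above, using the textbook forms from Fan and Li (2001) and Zhang (2010). The function names `scad_deriv` and `mcp_deriv` are hypothetical and are not part of the package; the MCP value `gamma = 3` is chosen only for illustration.

```r
# Derivative of the SCAD penalty evaluated at |theta| (textbook form).
# gamma = 3.7 is the value recommended by Fan and Li (2001).
scad_deriv <- function(theta, lambda, gamma = 3.7) {
  theta <- abs(theta)
  lambda * (theta <= lambda) +
    pmax(gamma * lambda - theta, 0) / (gamma - 1) * (theta > lambda)
}

# Derivative of the MCP penalty evaluated at |theta| (textbook form).
# gamma = 3 is an illustrative value, not a package default.
mcp_deriv <- function(theta, lambda, gamma = 3) {
  theta <- abs(theta)
  pmax(lambda - theta / gamma, 0)
}

theta <- c(0, 0.05, 0.2, 1)
scad_vals <- scad_deriv(theta, lambda = 0.1)
mcp_vals  <- mcp_deriv(theta, lambda = 0.1)

# Small values are penalized at the full rate lambda; large values are not
# penalized at all, which is what reduces the bias of the lasso.
print(rbind(theta = theta, scad = scad_vals, mcp = mcp_vals))
```

In both cases $\gamma$ controls how quickly the penalization tapers off as the magnitude of an estimate grows.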
LLA
The local linear approximation for nonconvex penalties was described in Fan, Feng, and Wu (2009). This is essentially an iteratively re-weighted (g)lasso. Note that by default LLA = FALSE. This is due to the work of Zou and Li (2008), which suggested that, so long as the starting values are good enough, a one-step estimator is sufficient to obtain an accurate estimate of the conditional dependence structure. In the case of low-dimensional data, the sample-based inverse covariance matrix is used for the starting values. This is expected to work well, assuming that $n$ is sufficiently larger than $p$.
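The one-step idea can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: the data are simulated, `scad_deriv` is a hypothetical helper implementing the textbook SCAD derivative, and the final graphical lasso step is only indicated in a comment.

```r
set.seed(1)
n <- 200
p <- 5
Y <- matrix(rnorm(n * p), n, p)   # simulated low-dimensional data (n >> p)
R <- cor(Y)

# Starting values: the sample-based inverse correlation matrix
Theta_init <- solve(R)

# Hypothetical helper: SCAD penalty derivative at |theta|
scad_deriv <- function(theta, lambda, gamma = 3.7) {
  theta <- abs(theta)
  lambda * (theta <= lambda) +
    pmax(gamma * lambda - theta, 0) / (gamma - 1) * (theta > lambda)
}

# One-step LLA: the penalty derivative at the starting values gives an
# elementwise penalty matrix for a single weighted graphical lasso fit.
lambda <- 0.1
W <- scad_deriv(Theta_init, lambda)

# W would then be passed as the elementwise regularization matrix to a
# graphical lasso solver (for example, the rho argument of
# glassoFast::glassoFast, assuming that package is installed).
print(round(W, 3))
```

Entries of the starting matrix that are already large receive little or no penalty, while entries near zero receive the full penalty `lambda`.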
Generalized Information Criteria
The following are the available GIC:
GIC_1: $|\mathbf{E}| \cdot \log(n)$ (ic = "gic_1" or ic = "bic")
GIC_2: $|\mathbf{E}| \cdot p^{1/3}$ (ic = "gic_2")
GIC_3: $|\mathbf{E}| \cdot 2 \cdot \log(p)$ (ic = "gic_3" or ic = "ric")
GIC_4: $|\mathbf{E}| \cdot 2 \cdot \log(p) + \log\big(\log(p)\big)$ (ic = "gic_4")
GIC_5: $|\mathbf{E}| \cdot \log(p) + \log\big(\log(n)\big) \cdot \log(p)$ (ic = "gic_5")
GIC_6: $|\mathbf{E}| \cdot \log(n) \cdot \log(p)$ (ic = "gic_6")
Note that $|\mathbf{E}|$ denotes the number of edges (nonzero relations) in the graph, $p$ the number of nodes (columns), and $n$ the number of observations (rows). Further, each can be understood as a penalty term added to negative 2 times the log-likelihood, that is,
$$-2 l_n(\hat{\boldsymbol{\Theta}}) = -2 \cdot \frac{n}{2} \Big[ \log \det \hat{\boldsymbol{\Theta}} - \mathrm{tr}(\hat{\mathbf{S}} \hat{\boldsymbol{\Theta}}) \Big]$$
where $\hat{\boldsymbol{\Theta}}$ is the estimated precision matrix (e.g., for a given $\lambda$ and $\gamma$) and $\hat{\mathbf{S}}$ is the sample-based covariance matrix.
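As a rough illustration of how such a criterion is assembled, the base R sketch below computes negative 2 times the log-likelihood and adds the penalty term for four of the criteria (GIC_1, GIC_2, GIC_3, and GIC_6). The function name `gic` is hypothetical, and the sketch assumes the standard Gaussian log-likelihood with additive constants dropped, in which the factor $n/2$ scales both the log-determinant and the trace term.

```r
# Hypothetical helper: generalized information criterion for a given
# precision matrix Theta, sample-based matrix S, and sample size n.
gic <- function(Theta, S, n, ic = c("gic_1", "gic_2", "gic_3", "gic_6")) {
  ic <- match.arg(ic)
  p <- ncol(Theta)
  # number of edges: nonzero entries in the upper triangle
  E <- sum(Theta[upper.tri(Theta)] != 0)
  # -2 times the log-likelihood (assumed form, constants dropped)
  neg2ll <- -2 * (n / 2) * (log(det(Theta)) - sum(diag(S %*% Theta)))
  pen <- switch(ic,
                gic_1 = E * log(n),            # same as BIC
                gic_2 = E * p^(1 / 3),
                gic_3 = E * 2 * log(p),        # same as RIC
                gic_6 = E * log(n) * log(p))
  neg2ll + pen
}

# Toy example with three nodes and a single edge between nodes 1 and 2
S <- matrix(c(1.0, 0.3, 0,
              0.3, 1.0, 0,
              0,   0,   1), nrow = 3, byrow = TRUE)
Theta <- diag(3)
Theta[1, 2] <- Theta[2, 1] <- -0.3

gic_vals <- sapply(c("gic_1", "gic_2", "gic_3", "gic_6"),
                   function(k) gic(Theta, S, n = 100, ic = k))
print(gic_vals)
```

Because the likelihood term is shared, the criteria differ only in how heavily each edge is penalized; the model with the smallest value would be selected.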
Value
An object of class ggmncv, including:
Theta: Inverse covariance matrix
Sigma: Covariance matrix
P: Weighted adjacency matrix
adj: Adjacency matrix
lambda: Tuning parameter(s)
fit: glasso fitted model (a list)
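The relationship between these components can be sketched in base R. The example below assumes that the weighted adjacency matrix holds partial correlations obtained by standardizing the precision matrix and flipping the sign; that interpretation is an assumption here, and the toy `Theta` is made up for illustration.

```r
# Toy precision matrix with one nonzero off-diagonal relation
Theta <- matrix(c( 2, -1, 0,
                  -1,  2, 0,
                   0,  0, 1), nrow = 3, byrow = TRUE)

# Partial correlations: negative standardized precision matrix
# (assumed to correspond to the weighted adjacency matrix)
P <- -cov2cor(Theta)
diag(P) <- 0

# Adjacency matrix: 1 where there is an edge, 0 otherwise
adj <- ifelse(P == 0, 0, 1)

print(P)
print(adj)
```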
Note
initial
initial not only affects performance (to some degree) but also computational speed. In high dimensions (defined here as p > n), or when p approaches n, the precision matrix can become quite unstable. As a result, with initial = NULL, the algorithm can take a very (very) long time. If this occurs, provide a matrix for initial (e.g., using lw). Alternatively, the penalty can be changed to penalty = "lasso", if desired.
The R package glassoFast is under the hood of ggmncv (Sustik and Calderhead 2012), which is much faster than glasso when there are many nodes.
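One way to obtain a stable starting matrix when p > n is to shrink the correlation matrix toward the identity before inverting it. The base R sketch below uses a fixed, hypothetical shrinkage weight `alpha` on simulated data; it is a simple stand-in for illustration and is not the Ledoit-Wolf estimator.

```r
set.seed(1)
n <- 20
p <- 30
Y <- matrix(rnorm(n * p), n, p)   # simulated high-dimensional data (p > n)
R <- cor(Y)                       # rank deficient, so solve(R) is unreliable

# Linear shrinkage toward the identity; alpha is an illustrative choice
alpha <- 0.2
R_shrunk <- (1 - alpha) * R + alpha * diag(p)

# The shrunken matrix is positive definite and can be inverted safely
Theta_init <- solve(R_shrunk)
min_eig <- min(eigen(R_shrunk, symmetric = TRUE)$values)
print(min_eig)

# Theta_init could then be supplied as the starting matrix, e.g.
# fit <- GGMncv::ggmncv(R, n = n, initial = Theta_init, progress = FALSE)
```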
Examples
library(GGMncv)

# data
Y <- GGMncv::ptsd
S <- cor(Y)

# fit model
# note: atan default
fit_atan <- ggmncv(S, n = nrow(Y),
                   progress = FALSE)

# plot
plot(get_graph(fit_atan),
     edge_magnify = 10,
     node_names = colnames(Y))

# lasso
fit_l1 <- ggmncv(S, n = nrow(Y),
                 progress = FALSE,
                 penalty = "lasso")

# plot
plot(get_graph(fit_l1),
     edge_magnify = 10,
     node_names = colnames(Y))

# for these data, we might expect all relations to be positive
# and thus the red edges are spurious. The following re-estimates
# the graph, given all edges positive (sign restriction).

# set negatives to zero (sign restriction)
adj_new <- ifelse(fit_atan$P <= 0, 0, 1)
check_zeros <- TRUE

# track tries
iter <- 0

# iterate until all positive
while (check_zeros) {
  iter <- iter + 1
  fit_new <- constrained(S, adj = adj_new)
  check_zeros <- any(fit_new$wadj < 0)
  adj_new <- ifelse(fit_new$wadj <= 0, 0, 1)
}

# make graph object
new_graph <- list(P = fit_new$wadj,
                  adj = adj_new)
class(new_graph) <- "graph"

plot(new_graph,
     edge_magnify = 10,
     node_names = colnames(Y))