```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
GGMncv is set up for low-dimensional settings, that is, when the number of observations ($n$) is much greater than the number of nodes ($p$). This is perhaps not typical in the Gaussian graphical modeling literature, and is a direct result of my field (I am the author of GGMncv) most often encountering low-dimensional data [see, for example, @williams2019nonregularized; @williams2020back]. As a result, the defaults are tuned for low-dimensional data!
Of course, GGMncv can readily be used for high-dimensional data. In what follows, I highlight several issues that may arise, along with solutions to overcome them.
By default, GGMncv uses the sample-based (inverse) covariance matrix for the initial values, which is needed to employ nonconvex regularization. When $p > n$, GGMncv will produce an error because the sample covariance matrix is singular and thus cannot be inverted.
For example, notice the error when running the following code:
```r
library(GGMncv)

# p > n
set.seed(2)
main <- gen_net(p = 50, edge_prob = 0.05)

set.seed(2)
y <- MASS::mvrnorm(n = 49, mu = rep(0, 50), Sigma = main$cors)

fit <- ggmncv(cor(y), n = nrow(y))
```
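The error is not specific to GGMncv: with fewer observations than variables, the sample correlation matrix itself is rank deficient. A quick way to verify this with base R (the check below is merely illustrative, not part of GGMncv):

```r
# numerical rank of the sample correlation matrix; with n = 49
# observations and p = 50 variables, the rank is at most n - 1 = 48
qr(cor(y))$rank

# attempting a direct inverse therefore errors
# solve(cor(y))  # Error: system is computationally singular
```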
The solution is to provide a function for the initial matrix. To this end, GGMncv includes the function `ledoit_wolf`, which is a shrinkage estimator [@ledoit2004well]. It is important to note that any function can be used, so long as it returns the inverse correlation matrix.
```r
fit <- ggmncv(cor(y),
              n = nrow(y),
              penalty = "atan",
              progress = FALSE,
              initial = ledoit_wolf,
              Y = y)
```
Notice the `Y = y`, which is used internally to pass additional arguments via `...` to the function provided in `initial`.
The conditional dependence structure can then be plotted with
```r
plot(get_graph(fit), node_size = 5)
```
Here is an example of providing a user-defined function, in this case based on the Schäfer and Strimmer shrinkage estimator from the **corpcor** package.
```r
initial_ggmncv <- function(y, ...) {
  Rinv <- corpcor::invcor.shrink(y, verbose = FALSE)
  return(Rinv)
}

fit2 <- ggmncv(cor(y),
               n = nrow(y),
               penalty = "atan",
               progress = FALSE,
               initial = initial_ggmncv,
               y = y)

plot(get_graph(fit2), node_size = 5)
```
Perhaps it is of interest to compare performance, given that different initial values were used.
```r
# Ledoit and Wolf
score_binary(estimate = fit$adj,
             true = main$adj,
             model_name = "lw")

# Schäfer and Strimmer
score_binary(estimate = fit2$adj,
             true = main$adj,
             model_name = "ss")
```
Perhaps a trickier situation is when the covariance matrix can be inverted, but it is still ill-conditioned. This can occur when $p$ approaches, but does not exceed, $n$. Here performance can be very poor.
```r
# p -> n
main <- gen_net(p = 50, edge_prob = 0.05)

y <- MASS::mvrnorm(n = 60, mu = rep(0, 50), Sigma = main$cors)

fit <- ggmncv(cor(y),
              n = nrow(y),
              penalty = "atan",
              progress = FALSE)

score_binary(estimate = fit$adj, true = main$adj)
```
This is extremely problematic because there was no error, and the performance was terrible (note: 1 - specificity = the false positive rate).
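A simple way to detect this situation before fitting is to inspect the condition number of the correlation matrix, where large values indicate that the inverse will be numerically unstable. The check below uses base R's `kappa()` and is merely illustrative (GGMncv does not perform this check itself):

```r
# condition number of the sample correlation matrix
# (ratio of the largest to the smallest singular value);
# large values signal an unstable, ill-conditioned inverse
kappa(cor(y), exact = TRUE)
```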
One solution is again to provide a function to `initial`.
```r
fit <- ggmncv(cor(y),
              n = nrow(y),
              progress = FALSE,
              penalty = "atan",
              initial = ledoit_wolf,
              Y = y)

score_binary(estimate = fit$adj, true = main$adj)
```
An additional solution is to use $L_1$-regularization, i.e.,
```r
fit_l1 <- ggmncv(cor(y),
                 n = nrow(y),
                 progress = FALSE,
                 penalty = "lasso")

score_binary(estimate = fit_l1$adj, true = main$adj)
```
A quick comparison of KL-divergence (lower is better):
```r
# atan
kl_mvn(true = solve(main$cors),
       estimate = fit$Theta)

# lasso
kl_mvn(true = solve(main$cors),
       estimate = fit_l1$Theta)
```
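For reference, the KL-divergence between two zero-mean multivariate normal distributions has a closed form. The sketch below shows the standard formula written in terms of precision matrices; it is only meant to convey what is being compared, and the internals of `kl_mvn` may differ (e.g., in argument conventions or scaling). The function name `kl_normal` is hypothetical.

```r
# KL( N(0, solve(theta_true)) || N(0, solve(theta_est)) ),
# expressed via the precision matrices theta_true and theta_est
kl_normal <- function(theta_true, theta_est) {
  p <- nrow(theta_true)
  sigma_true <- solve(theta_true)  # true covariance matrix
  0.5 * (sum(diag(theta_est %*% sigma_true)) - p +
           as.numeric(determinant(theta_true)$modulus) -
           as.numeric(determinant(theta_est)$modulus))
}
```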