knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "75%"
)

GGMnonreg: Non-regularized Gaussian Graphical Models

CRAN Version Downloads CircleCI build status zenodo

The goal of GGMnonreg is to estimate non-regularized graphical models. Note that the title is a bit of a misnomer, in that Ising and mixed graphical models are also supported.

Graphical modeling is quite common in fields with wide data, that is, when there are more variables than observations. Accordingly, many regularization-based approaches have been developed for those kinds of data. There are key drawbacks of regularization when the goal is inference, including, but not limited to, the fact that obtaining a valid measure of parameter uncertainty is very (very) difficult.

More recently, graphical modeling has emerged in psychology [@Epskamp2018ggm], where the data is typically long or low-dimensional [p < n; @williams2019nonregularized; @williams_rethinking]. The primary purpose of GGMnonreg is to provide methods specifically for low-dimensional data (e.g., those common to psychopathology networks).

Supported Models

Additional methods

The following are also included

Installation

To install the latest release version (1.1.0) from CRAN use

install.packages("GGMnonreg")    

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("donaldRwilliams/GGMnonreg")

Ising

An Ising model is fitted with the following

library(GGMnonreg)

# make binary
Y <- ifelse(ptsd[,1:5] == 0, 0, 1)

# fit model
fit <- ising_search(Y, IC = "BIC", 
                    progress = FALSE)

fit

Note the same code, more or less, is also used for GGMs and mixed graphical models.

Predictability

It is common to compute predictability, or variance explained, for each node in the network. An advantage of GGMnonreg is that a measure of uncertainty is also provided.

# data
Y <- na.omit(bfi[,1:5])

# fit model
fit <- ggm_inference(Y, boot = FALSE)

# predictability
predictability(fit)

Parameter Uncertainty

Confidence intervals for each relation are obtained with

# data
Y <- na.omit(bfi[,1:5])

# fit model
fit <- ggm_inference(Y, boot = TRUE, 
                     method = "spearman", 
                     B = 100, progress = FALSE)

confint(fit)

These can then be plotted with, say, ggplot2 (left to the user).

Edge Inclusion

When mining data, or performing an automatic search, it is difficult to make inference on the network parameters (e.g., confidence are not easily computed). To summarize data mining, GGMnonreg provides edge inclusion "probabilities" (proportion bootstrap samples for which each relation was detected).

# data
Y <- na.omit(bfi[,1:5])

# fit model
fit <-  eip(Y, method = "spearman", 
            B  = 100, progress = FALSE)

fit

Note in all cases, the provided estimates correspond to the upper-triangular elements of the network.

Expected Network Replicability

GGMnonreg allows for computing expected network replicability (ENR), i.e., the number of effects that will be detected in any number of replications. This is an analytic solution.

The first step is defining a true network

# first make the true network
cors <- cor(GGMnonreg::ptsd)

# inverse
inv <- solve(cors)

# partials
pcors <-  -cov2cor(inv)

# set values to zero
pcors <- ifelse(abs(pcors) < 0.05, 0, pcors)

Then obtain ENR

fit_enr <- enr(net = pcors, n = 500, replications = 2)

fit_enr

Note this is inherently frequentist. As such, over the long run, 45 % of the edges will be replicated on average. Then we can further infer that, in hypothetical replication attempts, more than half of the edges will be replicated only 5 % of the time.

ENR can also be plotted

plot_enr(fit_enr)

Intuition

Here is the basic idea of ENR

# location of edges
index <- which(pcors[upper.tri(diag(20))] != 0)

# convert network into correlation matrix
diag(pcors) <- 1
cors_new <- corpcor::pcor2cor(pcors)

# replicated edges
R <- NA

# increase 1000 to, say, 5,000
for(i in 1:1000){

  # two replications
  Y1 <- MASS::mvrnorm(500, rep(0, 20), cors_new)
  Y2 <- MASS::mvrnorm(500, rep(0, 20), cors_new)

  # estimate network 1
  fit1 <- ggm_inference(Y1, boot = FALSE)

  # estimate network 2
  fit2 <- ggm_inference(Y2, boot = FALSE)

  # number of replicated edges (detected in both networks)
  R[i] <- sum(
    rowSums(
      cbind(fit1$adj[upper.tri(diag(20))][index],
            fit2$adj[upper.tri(diag(20))][index])
    ) == 2)
}

Notice that replication of two networks is being assessed over the long run. In other words, if we draw two random samples, what is the expected replicability.

Compare analytic to simulation

# combine simulation and analytic
cbind.data.frame(
  data.frame(simulation = sapply(seq(0, 0.9, 0.1), function(x) {
    mean(R > round(length(index) * x) )
  })),
  data.frame(analytic = round(fit_enr$cdf, 3))
)

# average replicability (simulation)
mean(R / length(index))

# average replicability (analytic)
fit_enr$ave_pwr

ENR works with any correlation, assuming there is an estimate of the standard error.

Network plot

# data
Y <- ptsd

# estimate graph
fit <- ggm_inference(Y, boot = FALSE)

# get info for plotting
plot(fit, edge_magnify = 5)

Bug Reports, Feature Requests, and Contributing

Bug reports and feature requests can be made by opening an issue on Github. To contribute towards the development of GGMnonreg, you can start a branch with a pull request and we can discuss the proposed changes there.

References



donaldRwilliams/GGMnonreg documentation built on Nov. 13, 2021, 9:57 a.m.