enr: Expected Network Replicability
In donaldRwilliams/GGMnonreg: Non-Regularized Gaussian Graphical Models

Investigate network replicability for any kind of partial correlation, assuming there is an analytic solution for the standard error (e.g., Pearson's or Spearman's).

enr(net, n, alpha = 0.05, replications = 2, type = "pearson")

`net`	True network of dimensions p by p.
`n`	Integer. The sample size, assumed equal across the replication attempts.
`alpha`	The desired significance level (defaults to `0.05`). Note that 1 - alpha corresponds to specificity.
`replications`	Integer. The desired number of replications.
`type`	Character string. The type of correlation coefficient to compute. Options include `"pearson"` (default) and `"spearman"`.
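The remark that 1 - alpha corresponds to specificity can be made concrete with a little arithmetic. The sketch below is only illustrative: the number of true edges (`n_true_edges`) is a made-up value, and it assumes each absent edge is tested at level `alpha`.

```r
alpha <- 0.05
p <- 20                          # number of nodes

# Number of possible edges in an undirected network with p nodes
n_possible <- p * (p - 1) / 2

# Hypothetical count of true (nonzero) edges, chosen for illustration only
n_true_edges <- 60
n_null <- n_possible - n_true_edges

# Each absent edge is correctly excluded with probability 1 - alpha
specificity <- 1 - alpha

# Expected number of false positives among the absent edges
expected_false_pos <- alpha * n_null
```

Lowering `alpha` raises specificity but also lowers the power to detect true edges, which in turn lowers expected replicability.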

A list of class `enr` including the following:

ave_pwr: Average power.
cdf: Cumulative distribution function.
p_s: Power for each edge, or the probability of success for a given trial.
p: Number of nodes.
n_nonzero: Number of edges.
n: Sample size.
replication: Replication attempts.
var_pwr: Variance of power.
type: Type of correlation coefficient.
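To give some intuition for how per-edge success probabilities (`p_s`) relate to a distribution over the number of replicated edges, here is a minimal base R sketch. It assumes edges are treated as independent trials with unequal success probabilities (a Poisson-binomial setup), which is one natural reading of the entries above and not necessarily how `enr` computes its `cdf`. The helper `pb_pmf` and the probabilities are hypothetical.

```r
# Hypothetical helper (not part of GGMnonreg): probability mass function of
# the number of successes among independent trials with unequal probabilities.
pb_pmf <- function(probs) {
  pmf <- 1  # P(0 successes) before any trial is considered
  for (q in probs) {
    # Either the new trial fails (count unchanged) or succeeds (count + 1)
    pmf <- c(pmf * (1 - q), 0) + c(0, pmf * q)
  }
  pmf  # pmf[k + 1] is P(exactly k successes)
}

# Made-up per-edge replication probabilities for three edges
probs <- c(0.9, 0.5, 0.2)

pmf <- pb_pmf(probs)
cdf <- cumsum(pmf)                   # P(at most k edges replicate), k = 0..3

expected_count <- sum(probs)         # mean number of replicated edges
var_count <- sum(probs * (1 - probs))  # variance of that count
```

Dividing `expected_count` by the number of edges gives an expected proportion of replicated edges, the same kind of quantity as an average over edges.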

This method was introduced in Williams (2020).

The basic idea is to determine the replicability of edges in a partial correlation network. This requires defining the true network, which can include edges of various sizes, and then solving for the proportion of edges that are expected to be replicated (e.g., in two, three, or four replication attempts).
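The idea can be sketched in a few lines of base R. This is a rough illustration under stated assumptions, not the package's implementation: the helper `edge_power` is hypothetical, the standard error uses the Fisher z approximation for a Pearson partial correlation with p - 2 conditioning variables, and the edge sizes in `rho` are made up.

```r
# Hypothetical helper (not part of GGMnonreg): approximate two-sided power
# to detect a partial correlation `rho` with `n` observations and `p` nodes.
edge_power <- function(rho, n, p, alpha = 0.05) {
  # Assumed Fisher z standard error; p - 2 variables are conditioned on
  se <- 1 / sqrt(n - (p - 2) - 3)
  z_crit <- qnorm(1 - alpha / 2)
  z_true <- atanh(rho) / se
  pnorm(z_true - z_crit) + pnorm(-z_true - z_crit)
}

# Illustrative edge sizes (made up, not taken from the ptsd data)
rho <- c(0.05, 0.10, 0.20, 0.30)

pwr <- edge_power(rho, n = 500, p = 20)  # power in a single study
p_rep <- pwr^2                           # detected in both of two independent attempts

# Expected proportion of true edges replicated across two attempts
expected_prop <- mean(p_rep)
```

Because an edge must be detected in every attempt to count as replicated, small edges with modest power drag the expected proportion down quickly as the number of attempts grows (`pwr^3`, `pwr^4`, and so on).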

\insertAllCited{}

# load the package (provides enr, ggm_inference, and the ptsd data)
library(GGMnonreg)

# (1) define partial correlation network

# correlations from ptsd symptoms
cors <- cor(GGMnonreg::ptsd)

# inverse
inv <- solve(cors)

# partials
pcors <-  -cov2cor(inv)

# set values to zero
# (this is the partial correlation network)
pcors <- ifelse(abs(pcors) < 0.05, 0, pcors)


# compute ENR in two replication attempts
fit_enr <- enr(net = pcors,
               n = 500,
               replications = 2)


# intuition for the method:
# The above did not require simulation, and here I use simulation
# for the same purpose.

# location of edges
# (where the edges are located in the network)
index <- which(pcors[upper.tri(diag(20))] != 0)

# convert the network into a correlation matrix
# (this is needed to simulate data)
diag(pcors) <- 1
cors_new <- corpcor::pcor2cor(pcors)

# replicated edges
# (store the number of edges that were replicated)
R <- NA

# simulate how many edges replicate in two attempts
# (increase 100 to, say, 5,000)
for(i in 1:100){

  # two replications
  Y1 <- MASS::mvrnorm(500, rep(0, 20), cors_new)
  Y2 <- MASS::mvrnorm(500, rep(0, 20), cors_new)

  # estimate network 1
  fit1 <- ggm_inference(Y1, boot = FALSE)

  # estimate network 2
  fit2 <- ggm_inference(Y2, boot = FALSE)

  # number of replicated edges (detected in both networks)
  R[i] <- sum(
    rowSums(
      cbind(fit1$adj[upper.tri(diag(20))][index],
            fit2$adj[upper.tri(diag(20))][index])
    ) == 2)
}


# combine simulation and analytic
cbind.data.frame(
  data.frame(simulation = sapply(seq(0, 0.9, 0.1), function(x) {
    mean(R > round(length(index) * x) )
  })),
  data.frame(analytic = round(fit_enr$cdf, 3))
)

# now compare simulation to the analytic solution
# average replicability (simulation)
mean(R / length(index))

# average replicability (analytic)
fit_enr$ave_pwr