Network Scores

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#"
)

Introduction

This vignette shows you how to compute network scores using the state-of-the-art psychometric network algorithms in R. The vignette will walk through an example that compares latent network scores to latent variable scores computed by confirmatory factor analysis (CFA) and will explain the similarities and differences between the two.

To get started, a few packages need to be installed (if you don't have them already) and loaded.

# Install 'NetworkToolbox' package
install.packages("NetworkToolbox")
# Install 'lavaan' package
install.packages("lavaan")
# Load packages
library(EGAnet)
library(NetworkToolbox)
library(lavaan)

Estimate Dimensions

To estimate dimensions, we'll use exploratory graph analysis (EGA) [@golino2017ega1; @golino2017ega2; @golino2019ega3]. EGA first estimates a network using either the graphical least absolute shrinkage and selection operator [GLASSO; @friedman2008sparse; @friedman2014glasso] with extended Bayesian information criterion (EBIC) model selection from the R package qgraph [EBICglasso; @epskamp2018tutorial] or the triangulated maximally filtered graph [TMFG; @massara2016network] from the NetworkToolbox package [@NetworkToolbox]. EGA then applies a community detection algorithm, by default the Walktrap algorithm [@walktrap] from the igraph package [@igraph]. Below is the code to estimate EGA using the EBICglasso method and the Louvain community detection algorithm with the NEO PI-3 openness to experience data (n = 802) from the NetworkToolbox package.

# Run EGA
ega <- EGA(neoOpen, model = "glasso", algorithm = "louvain", plot.EGA = FALSE)

EGA estimates there to be 7 dimensions of openness to experience. Using these dimensions, we can now estimate network scores; but first, we'll go into detail about how they are computed.
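For a quick check of the result, the EGA output can be inspected directly. This is a minimal sketch: the number of dimensions is stored in $n.dim (used again later in this vignette), and it assumes the item-community memberships are available in $wc.

# Number of estimated dimensions
ega$n.dim

# Number of items assigned to each dimension (assuming memberships are stored in $wc)
table(ega$wc)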

Network Loadings [@christensen2021equivalency]

Network loadings are roughly equivalent to factor loadings and differ only in the association measure used to compute them. For networks, the relevant measure is node strength: the sum of the connections to a node. Previous simulation work has reported that node strength is generally redundant with CFA factor loadings [@hallquist2019problems]. Importantly, Hallquist and colleagues [-@hallquist2019problems] found that a node's strength represents a combination of dominant and cross-factor loadings. To mitigate this issue, I've developed a function called net.loads, which computes the node strength for each node within each dimension, parsing out the connections that represent dominant and cross-dimension loadings. Below is the code to compute standardized network loadings ($std; unstandardized loadings are available via $unstd).

# Standardized
n.loads <- net.loads(A = ega)$std
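
The unstandardized loadings mentioned above can be obtained from the same function call:

# Unstandardized
n.loads.unstd <- net.loads(A = ega)$unstd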

To provide mathematical notation, let $W$ represent a symmetric $m \times m$ weight matrix, where $m$ is the number of nodes. Node strength and the network loading of node $i$ for factor $f$ are then defined as:

$$S_i = \sum_{j = 1}^m |w_{ij}|,$$

$$L_{if} = \sum_{j \in f} |w_{ij}|,$$

where $|w_{ij}|$ is the absolute weight (e.g., partial correlation) between nodes $i$ and $j$, $S_i$ is the sum of the edge weights connected to node $i$ across all $m$ nodes (i.e., the node strength of node $i$), and $L_{if}$ is the sum of the edge weights connecting node $i$ to the nodes in factor $f$ (i.e., node $i$'s loading for factor $f$), with $f = 1, \ldots, F$, where $F$ is the number of factors in the network. This measure can be standardized using the following formula:

$$z_{L_{if}} = \frac{L_{if}}{\sqrt{\sum L_{.f}}},$$

where the denominator is equal to the square root of the sum of all the weights for the nodes in factor $f$. Notably, the standardized loadings are computed on absolute weights, with the signs being added after the loadings have been computed [following the same procedure as factor loadings; @comrey2013first]. In contrast to factor loadings, network loadings are computed after the number of factors has been extracted from the network's structure. Variables are deterministically assigned to specific factors via a community detection algorithm rather than by the traditional factor analytic standard of their largest loading in the loading matrix. This means that some nodes may not have any connections to nodes in other factors, leading some variables to have loadings of zero for some factors in the network loading matrix.
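
To make these formulas concrete, the (absolute) loadings can be reproduced directly from the EGA output. This is a minimal sketch of the computation described above, not the net.loads implementation itself, and it assumes the network matrix and community memberships are stored in $network and $wc of the EGA object.

# Sketch: compute network loadings from the formulas above
W  <- ega$network    # symmetric weight matrix (partial correlations)
wc <- ega$wc         # community membership for each node
dims <- sort(unique(wc))

# L_if: sum of absolute weights from node i to the nodes in community f
L <- sapply(dims, function(f) rowSums(abs(W[, wc == f, drop = FALSE])))

# z_L_if: divide each column by the square root of its summed loadings
Z <- sweep(L, 2, sqrt(colSums(L)), "/")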

These standardized network loadings summarize the information in the edge weights and so they depend on the type of association represented by the edge weight (e.g., partial correlations, zero-order correlations). Thus, the meaning of these network loadings will change based on the type of correlation used. Specifically, partial correlations represent the unique measurement of a factor whereas zero-order correlations represent the unique and shared contribution of a variable's measurement of a factor.
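
As an illustration of this point, the same node-strength computation from the sketch above could be applied to a zero-order correlation matrix (reusing wc and dims from that sketch); the resulting loadings would then reflect both unique and shared variance. This is only an illustration; net.loads itself operates on the network estimated by EGA.

# Sketch: loadings based on zero-order correlations rather than partial correlations
R <- abs(cor(neoOpen, use = "pairwise.complete.obs"))
diag(R) <- 0    # ignore self-correlations
L.zero <- sapply(dims, function(f) rowSums(R[, wc == f, drop = FALSE]))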

Network Scores [@golino2020dynega]

These network loadings form the foundation for computing network scores. Because the network loadings represent a middle ground between a saturated (EFA) and a simple (CFA) structure, the network scores include only the most important cross-loadings in their computation. This capitalizes on information often lost in typical CFA structures but reduces the cross-loadings of EFA structures to only the most important loadings.

To compute network scores, the following code can be used:

# Network scores
net.scores <- net.scores(data = neoOpen, A = ega)

The net.scores function returns three objects: scores, commCor, and loads. scores contains the network scores for each dimension and an overall score. commCor contains the partial correlations between the dimensions in the network (and with the overall score). Finally, loads returns the standardized network loadings described above. Below we detail the mathematical notation for computing network scores.
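
Before turning to the notation, the standardized scores (used later when comparing with CFA scores) can be inspected directly:

# View the first few standardized network scores
head(net.scores$std.scores)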

First, for each community, we identify the items whose loadings on that community are not equal to zero:

$$z_{tf} = z_{NL_{i \in f}} \neq 0,$$

where $z_{NL_{f}}$ are the standardized network loadings for community $f$, and $z_{tf}$ are the network loadings in community $f$ that are not equal to zero. Next, $z_{tf}$ is divided by the standard deviation of the corresponding items in the data, $X$:

$$wei_{tf} = \frac{z_{tf}}{\sqrt{\frac{\sum_{i=1}^{n} (X_{it} - \bar{X}_t)^2}{n - 1}}},$$

where the denominator, $\sqrt{\frac{\sum_{i=1}^{n} (X_{it} - \bar{X}_t)^2}{n - 1}}$, corresponds to the standard deviation of item $t$ (computed for each item with a non-zero network loading in community $f$), and $wei_{tf}$ is the weight for the non-zero loadings in community $f$. These can be further transformed into relative weights for each non-zero loading:

$$relWei_{tf} = \frac{wei_{t \in f}}{\sum_{t \in f} wei_{t \in f}},$$

where $\sum_{t \in f} wei_{t \in f}$ is the sum of the weights across the items in community $f$, and $relWei_{tf}$ is the relative weight of each non-zero loading in community $f$. We then take these relative weights and multiply them by the corresponding items in the data, $X_{t \in f}$, to obtain the community (e.g., factor) score:

$$\hat{\theta_f} = \sum_{t \in f} X_{t \in f} \times relWei_{t \in f},$$

where $\hat{\theta_f}$ is the network score for community $f$.
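
As a worked illustration of these steps, the score for a single community can be computed by hand from the standardized loadings obtained earlier. This is a sketch of the formulas above, not the net.scores implementation, and it assumes the loading matrix from net.loads carries the item names as row names matching the columns of neoOpen.

# Sketch: network score for community 1, following the formulas above
f       <- 1
items_f <- rownames(n.loads)[n.loads[, f] != 0]      # items with non-zero loadings on f
z_tf    <- n.loads[items_f, f]                       # their standardized loadings
wei_tf  <- z_tf / apply(neoOpen[, items_f], 2, sd)   # divide by the item standard deviations
relWei  <- wei_tf / sum(wei_tf)                      # relative weights
theta_f <- as.matrix(neoOpen[, items_f]) %*% relWei  # weighted composite = network score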

Comparison to CFA Scores

It's important to note that CFA scores are typically computed using a simple structure (items only load on one factor) and regression techniques. Network scores, however, are computed using a complex structure and are a weighted composite rather than a latent factor.

# Latent variable scores
# Estimate CFA (capture.output is used to suppress lavaan's verbose output)
output <- capture.output(cfa.net <- CFA(ega, estimator = "WLSMV", data = neoOpen, plot.CFA = FALSE))

# Compute latent variable scores
lv.scores <- lavPredict(cfa.net$fit)

# Initialize correlations vector
cors <- numeric(ega$n.dim)

# Compute correlations
for(i in 1:ega$n.dim){
  cors[i] <- cor(net.scores$std.scores[,i], lv.scores[,i], method = "spearman")
}

# Create matrix for table
cors.mat <- as.matrix(round(cors,3))
colnames(cors.mat) <- "Correlations"
row.names(cors.mat) <- paste("Factor", 1:ega$n.dim)
# Print table
knitr::kable(cors.mat)

As shown in the table, the network scores strongly correlate with the latent variable scores. Because Spearman's correlation was used, it is the rank ordering of the scores that is being compared. These large correlations suggest considerable redundancy between the network scores and the latent variable scores.

\newpage

References

\begingroup \setlength{\parindent}{-0.5in} \setlength{\leftskip}{0.5in}

\endgroup


