knitr::opts_chunk$set( collapse = TRUE, comment = "#" )

This vignette shows you how to use the `UVA` function in the *EGAnet* package [@EGAnet]. The contents of this vignette are taken directly from @christensen2020unique.

Following @christensen2019psychometric, `UVA` provides two approaches for reducing redundancy in data: removing all but one redundant variable or creating latent variables from redundant variables. For the former approach, researchers select one variable from the set of variables determined to be redundant and remove the others from the dataset. As a general heuristic, researchers can compute corrected item-test correlations for the variables in the redundant response set. The variable with the largest correlation is likely to be the one that best captures the overall essence of the redundant variables [@devellis2017scale; @mcdonald1999test]. Other rules of thumb for this approach are to select variables that have the most variance [@devellis2017scale] and variables that are more general (e.g., "I often express my opinions" is better than "I often express my opinions in meetings" because it does not imply a specific context).

For the latter approach, redundant variables can be combined into a reflective latent variable, and latent scores can be estimated to replace the redundant variables. Following recent suggestions, when ordinal variables have fewer than six categories, the Weighted Least Squares Mean- and Variance-adjusted (WLSMV) estimator is used; otherwise, when all variables have six or more categories, Maximum Likelihood with Robust standard errors (MLR) is used [@rhemtulla2012can]. We strongly recommend the latent variable approach because it minimizes measurement error and retains all possible information available in the data.
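The corrected item-test heuristic described above can be sketched in base R. This is a minimal illustration on simulated data, not the computation `UVA` performs internally; the data, the item names, and the helper `corrected_item_total()` are hypothetical.

```r
# Simulate a hypothetical redundant set: four items driven by one common cause
set.seed(1)
n <- 500
common <- rnorm(n)
redundant_set <- data.frame(
  item_a = common + rnorm(n, sd = 0.5),
  item_b = common + rnorm(n, sd = 0.8),
  item_c = common + rnorm(n, sd = 1.0),
  item_d = common + rnorm(n, sd = 1.5)
)

# Corrected item-test correlation: correlate each item with the sum of the
# *other* items in the set, so an item does not inflate its own correlation
corrected_item_total <- function(x) {
  x <- as.matrix(x)
  vapply(
    seq_len(ncol(x)),
    function(j) cor(x[, j], rowSums(x[, -j, drop = FALSE])),
    numeric(1)
  )
}

item_total_r <- setNames(corrected_item_total(redundant_set), colnames(redundant_set))
item_sd <- vapply(redundant_set, sd, numeric(1))

# Side-by-side view of the two rules of thumb (correlation and variability)
round(cbind(item_total_r, item_sd), 2)

# Candidate to keep under the "largest corrected correlation" heuristic
names(which.max(item_total_r))
```

The two rules of thumb can disagree: in this simulation the noisiest item has the largest standard deviation but the weakest relation to the rest of the set, which is why the correlation is usually consulted first.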

Before digging into UVA, the data should be loaded from the *psychTools* package [@psychTools] in R.

    # Download latest EGAnet package
    devtools::install_github("hfgolino/EGAnet", dependencies = c("Imports", "Suggests"))

    # Load packages
    library(psychTools)
    library(EGAnet)

    # Set seed for reproducibility
    set.seed(6724)

    # Load SAPA data
    # Select Five Factor Model personality items only
    idx <- na.omit(match(gsub("-", "", unlist(spi.keys[1:5])), colnames(spi)))
    items <- spi[, idx]

    # Obtain item descriptions for UVA
    key.ind <- match(colnames(items), as.character(spi.dictionary$item_id))
    key <- as.character(spi.dictionary$item[key.ind])

The code above installs the latest *EGAnet* package, loads *EGAnet* and *psychTools*, sets a seed for random number generation, and obtains the 70 SAPA items that correspond to the five-factor model of personality as well as their respective item descriptions. The item descriptions are optional but provide convenient processing when deciding which items are redundant (see Figure 2).
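The item-selection and description-lookup steps rely on `gsub()` and `match()`. The toy example below mirrors that logic on a small, entirely hypothetical set of keys, data, and dictionary entries, so the mapping can be inspected without loading the SAPA data.

```r
# Hypothetical scoring keys: a leading "-" marks reverse-keyed items
keys <- list(
  Agree = c("q_1", "-q_2"),
  Extra = c("q_3", "-q_5")
)

# Hypothetical data set; note that q_5 is not among the columns
dat <- data.frame(q_1 = 1:3, q_2 = 3:1, q_3 = c(2, 2, 1), q_4 = c(1, 3, 2))

# Hypothetical item dictionary
dictionary <- data.frame(
  item_id = c("q_1", "q_2", "q_3", "q_4"),
  item = c("Trust others", "Distrust others", "Talk a lot", "Like art"),
  stringsAsFactors = FALSE
)

# Strip the reverse-key marker, then find the column position of each item;
# na.omit() drops keyed items that are absent from the data (here, q_5)
item_names <- gsub("-", "", unlist(keys))
idx <- na.omit(match(item_names, colnames(dat)))
items <- dat[, idx]

# Look up descriptions in the same order as the selected columns
key <- dictionary$item[match(colnames(items), dictionary$item_id)]
key
```

Because `key` is built from `colnames(items)`, each description stays aligned with its column even when the dictionary is ordered differently from the data.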

Moving forward with the application of `UVA`, we start by evaluating the dimensional structure of the SAPA inventory *without* reducing redundancy. The following code can be run:

    # EGA (with redundancy)
    ega.wr <- EGA(items, algorithm = "louvain", plot.EGA = FALSE)
    plot(ega.wr, plot.args = list(node.size = 8, edge.alpha = 0.2))

    # Initial EGA
    knitr::include_graphics("./Figures/Fig6-1.png", dpi = 75)

Without performing UVA, EGA estimates that there are seven factors. Notably, EGA identified a couple of small factors: Factor 2 and Factor 7 (see Figure 1). Judging from the item descriptions, these two factors likely represent minor factors of redundant variables: Factor 2 ("Believe that people are basically moral," "Believe that others have good intentions," "Trust people to mainly tell the truth," "Trust what people say," and "Feel that most people can't be trusted") and Factor 7 ("Enjoy being thought of as a normal mainstream person," "Rebel against authority," "Believe that laws should be strictly enforced," and "Try to follow the rules"). The divergence from the traditional five-factor structure is likely due to these (and other) redundancies.[^3]

[^3]: For a comparison, we estimated dimensions using parallel analysis with polychoric correlations and principal component analysis (PCA) and principal axis factoring (PAF). These methods identified 13 and 14 dimensions, respectively.

To handle the redundancy in the scale, we can now use the `UVA` function:

    # Perform unique variable analysis (latent variable)
    sapa.ra <- UVA(
      data = items, method = "wTO", type = "adapt",
      key = key, reduce = TRUE,
      reduce.method = "latent", adhoc = TRUE
    )

There are a few arguments worth noting. First, `method` sets the association method. By default, the weighted topological overlap method (`"wTO"`) is applied. Second, `type` sets the significance type. By default, adaptive alpha is used (`"adapt"`). The `key` argument accepts item descriptions that map to the variables in the `data` argument. The `reduce` argument, which defaults to `TRUE`, controls whether the reduction process occurs. The `reduce.method` argument controls whether the reduction creates latent variables from redundant items (`"latent"`) or removes all but one of the redundant items (`"remove"`). `reduce.method` defaults to `"latent"` (to continue with the `"remove"` tutorial, skip to the next section). Finally, `adhoc` performs an ad hoc redundancy check using the weighted topological overlap method with a threshold. This check determines whether redundancies might still exist in the data.
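To give a feel for what a weighted topological overlap measures, here is a base R sketch. It assumes the signed, weighted form of the statistic commonly given in the wTO literature, wTO_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - |a_ij|), with k_i the sum of absolute weights of node i. It is not taken from the *EGAnet* source, and the toy network, the helper name, and the 0.25 cutoff are all hypothetical.

```r
# Weighted topological overlap for a signed, weighted association matrix
# (assumed formula; see the lead-in for the caveats)
weighted_topological_overlap <- function(A) {
  A <- as.matrix(A)
  diag(A) <- 0                     # ignore self-associations
  shared <- A %*% A                # sum over u of a_iu * a_uj (u = i, j contribute 0)
  k <- rowSums(abs(A))             # node strength
  min_k <- outer(k, k, pmin)       # min(k_i, k_j) for every pair
  wto <- (shared + A) / (min_k + 1 - abs(A))
  diag(wto) <- 0
  wto
}

# Hypothetical network: V1, V2, and V3 are tightly connected; V4 is peripheral
A <- matrix(
  c(0.0, 0.6, 0.5, 0.0,
    0.6, 0.0, 0.5, 0.0,
    0.5, 0.5, 0.0, 0.1,
    0.0, 0.0, 0.1, 0.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("V", 1:4), paste0("V", 1:4))
)

wto <- weighted_topological_overlap(A)
round(wto, 2)

# Flag pairs whose overlap exceeds a hypothetical cutoff
threshold <- 0.25
flagged <- which(wto > threshold & upper.tri(wto), arr.ind = TRUE)
flagged
```

Pairs that are strongly connected to each other *and* to the same neighbors receive high overlap, which is the sense in which the statistic targets redundancy rather than mere association.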

Next, we'll walk through the reduction process. After running the code above, the R console will output a target variable with a list of potential redundant variables (Figure 2) and an associated "redundancy chain" plot (see Figure 3).

    # R console example
    knitr::include_graphics("./Figures/Figure7.png", dpi = 700)

In Figure 2, the potential redundant variables are listed below the target variable. Some of these variables were *directly* identified as redundant with the target variable. Others were *indirectly* redundant: they were redundant with one (or more) of the directly redundant variables but were not themselves identified as redundant with the target variable. In this way, there is a so-called "redundancy chain." Figure 3 provides a more intuitive depiction of this notion.

    # Redundancy chain example
    knitr::include_graphics("./Figures/Figure8.png", dpi = 1300)

In the redundancy chain plot, each node represents a variable, with label and color denoting the target variable ("Target" and blue, respectively) and potential redundancies (corresponding numbers and red, respectively). The connections between the nodes represent regularized partial correlations, with the thickness of an edge denoting its magnitude. The presence of an edge indicates that two variables were identified as redundant; the plot is not an actual network of associations. The interpretation of this plot would be that the target variable was identified as redundant with potential redundancy variables 1, 2, 3, and 4. Potential redundancy variable 5 was not redundant with the target variable, but it was redundant with potential redundancy variable 4 (hence the "chain" of redundancy). When consulting the redundancy chain plot, researchers should pay particular attention to *cliques*, or fully connected sets of nodes. In Figure 3, there are two 3-cliques (or triangles) with the target variable (i.e., Target -- 1 -- 2 and Target -- 1 -- 3).

In a typical psychometric network, these triangles contribute to a measure known as the *clustering coefficient* or the extent to which a node's neighbors are connected to each other. Based on this statistical definition, the clustering coefficient has recently been considered as a measure of redundancy in psychological networks [@costantini2019stability; @dinic2019centrality]. In this same sense, these triangles suggest that these variables are likely redundant. Therefore, triangles in these redundancy chain plots can be used as a heuristic to identify redundancies.
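The triangle heuristic can be checked by hand with a small adjacency matrix. The sketch below rebuilds the chain described for Figure 3 from the text alone; the edges 1 -- 2 and 1 -- 3 are inferred from the two reported triangles, so the edge list is an assumption rather than a copy of the figure.

```r
# Edge list reconstructed from the description of the redundancy chain
nodes <- c("Target", "1", "2", "3", "4", "5")
edges <- rbind(
  c("Target", "1"), c("Target", "2"), c("Target", "3"), c("Target", "4"),
  c("1", "2"), c("1", "3"),   # inferred from the two triangles
  c("4", "5")                 # the indirect ("chain") redundancy
)

# Symmetric, unweighted adjacency matrix
adj <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Triangles through each node: closed walks of length 3, each counted twice
triangles <- diag(adj %*% adj %*% adj) / 2

# Local clustering coefficient: triangles over possible neighbor pairs
degree <- rowSums(adj)
clustering <- ifelse(degree < 2, NA, triangles / choose(degree, 2))
round(cbind(degree, triangles, clustering), 2)

# List the triangles that include the target variable
neighbors <- names(which(adj["Target", ] == 1))
pairs <- t(combn(neighbors, 2))
target_triangles <- pairs[adj[pairs] == 1, , drop = FALSE]
target_triangles
```

Variable 4 closes no triangle with the target, and variable 5 is not even adjacent to it, so both stand out as weaker candidates than variables 1, 2, and 3.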

In our example, we selected these variables as redundant by inputting their numbers into the R console with commas separating them (i.e., `1, 2, 3`). After pressing `ENTER`, a new latent variable is created from these variables and a prompt appears to label it with a new name (e.g., `'Original ideation'`). Finally, a message will appear confirming the creation of a latent variable and removal of the redundant variables from the dataset.

    # Latent keying example
    knitr::include_graphics("./Figures/Figure9.png", dpi = 900)

For the second target variable (Figure 4), "Trust what people say," we combined it with all the other possible redundant items (i.e., `1, 2, 3, 4`). Notably, one item was reverse keyed, "Feel that most people can't be trusted," and it was negatively correlated with the latent variable. Because an item was negatively correlated with the latent variable, a secondary prompt appears asking whether to reverse code the latent variable so that its label points in the desired direction. Reviewing the correlations of the variables with the latent variable, we can see that the latent variable is already positively keyed; therefore, we entered `n` and labeled the component. If, however, the signs of the correlations were the inverse, then `y` could be entered, which would reverse the latent variable toward a positively keyed orientation. The function will proceed through the rest of the redundant variables until all have been handled (see Appendix for our handling).
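The keying decision can be illustrated with simulated data. In the sketch below, the first principal component from `prcomp()` is only a stand-in for the latent scores that `UVA` estimates, and the item names and data are hypothetical.

```r
# Simulate five hypothetical trust items; the last one is reverse keyed
set.seed(2)
n <- 400
trust <- rnorm(n)
trust_items <- data.frame(
  moral           = trust + rnorm(n),
  good_intentions = trust + rnorm(n),
  tell_truth      = trust + rnorm(n),
  trust_say       = trust + rnorm(n),
  cant_be_trusted = -trust + rnorm(n)
)

# Stand-in for the latent score: first principal component (its sign is arbitrary)
score <- prcomp(trust_items, scale. = TRUE)$x[, 1]
item_cors <- cor(trust_items, score)[, 1]

# Mirror the y/n prompt: flip the score if most items correlate negatively with it
if (sum(item_cors < 0) > sum(item_cors > 0)) {
  score <- -score
  item_cors <- -item_cors
}

round(item_cors, 2)
```

After the check, only the reverse-keyed item correlates negatively with the score, which corresponds to the situation in which `n` is entered at the prompt.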

After completing the UVA, an optional ad hoc check of redundant variables can be performed using `adhoc = TRUE`. `UVA` performs this check by default and tests whether any redundancies remain using the weighted topological overlap method (`method = "wTO"`) and a threshold (`type = "threshold"`). For our example, there were no longer any redundant variables. Our UVA reduced the dataset from 70 items down to 25 *personality components*, that is, items or sets of items that share a common cause [@christensen2019psychometric]. These components largely correspond to the 27 components identified by Condon [-@condon2018sapa], suggesting that our approach was effective.

    # EGA (with redundant variables combined)
    ega <- EGA(sapa.ra$reduced$data, algorithm = "louvain", plot.EGA = FALSE)
    plot(ega, plot.args = list(
      vsize = 8, edge.alpha = 0.2, label.size = 4,
      legend.names = c("Conscientiousness", "Neuroticism", "Extraversion",
                       "Openness to Experience", "Agreeableness")
    ))

    # Re-estimate EGA
    knitr::include_graphics("./Figures/Fig10-1.png", dpi = 75)

With these components, we then re-estimated the dimensionality of the SAPA inventory using EGA. This time, five factors resembling the five-factor model were estimated (Figure 5). These five factors also correspond to the expected factor structure of the SAPA inventory, corroborating the effectiveness of the UVA. In sum, our example demonstrates that redundancy can lead to minor factors, which may bias dimensionality estimates towards overfactoring (as shown in Figure 1). When this redundancy is handled, the dimensionality estimates can be expected to be more accurate and in line with theoretical expectations (as shown in Figure 5).[^4] Similar results were achieved using the remove all but one variable approach (see next section).

[^4]: For a comparison, we estimated dimensions using parallel analysis with polychoric correlations and principal component analysis (PCA) and principal axis factoring (PAF). These methods identified 5 and 6 dimensions, respectively.

In going through UVA with the remove all but one variable option (see code below), we selected the same variables as redundant as shown in the Appendix, but rather than creating a latent variable, we removed all but one variable.

    # Perform unique variable analysis (removing all but one variable)
    sapa.rm <- UVA(
      data = items, method = "wTO", type = "adapt",
      key = key, reduce = TRUE,
      reduce.method = "remove", adhoc = TRUE
    )

The presentation of the UVA interface is mostly the same, with one minor detail changed (Figure 6).

    # R console example
    knitr::include_graphics("./Figures/Figure11.png", dpi = 600)

After selecting which variables are redundant, a variable is selected to be *kept*. To make this decision, the corrected item-test (or redundant set) correlations, means, standard deviations, and ranges of the variables are provided in R's plot window (Figure 7).

    # R console example
    knitr::include_graphics("./Figures/Figure12.png", dpi = 600)

The row names of the table reprint the redundancy options. For the entire analysis, we selected the variable with the largest item-test correlation (i.e., "Item-Total r" in Figure 7) and, when these correlations were equal, the variable with the largest standard deviation. After UVA was finished and the ad hoc check confirmed there were no more redundancies, we re-estimated the dimensionality of the dataset (Figure 8).
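The selection rule we applied (largest item-test correlation, with ties broken by the largest standard deviation) amounts to a two-key sort. The values and option labels below are hypothetical and only illustrate the rule; they are not taken from Figure 7.

```r
# Hypothetical summary table for one redundant set
redundant_stats <- data.frame(
  option       = c("target", "1", "2", "3"),
  item_total_r = c(0.62, 0.71, 0.71, 0.55),
  sd           = c(1.31, 1.18, 1.42, 1.50),
  stringsAsFactors = FALSE
)

# Sort by item-total correlation (descending), then by SD (descending) for ties
ranked <- redundant_stats[
  order(-redundant_stats$item_total_r, -redundant_stats$sd),
]

# The first row is the variable to keep; the rest would be removed
keep <- ranked$option[1]
keep
```

Here options 1 and 2 tie on the correlation, so the larger standard deviation decides in favor of option 2.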

    # EGA (with redundant variables removed)
    ega.rm <- EGA(sapa.rm$reduced$data, algorithm = "louvain", plot.EGA = FALSE)
    plot(ega.rm, plot.args = list(
      vsize = 8, edge.alpha = 0.2, label.size = 4, layout.exp = 0.5,
      legend.names = c("Conscientiousness", "Neuroticism", "Extraversion",
                       "Openness to Experience", "Agreeableness")
    ))

    # Removed dimensionality
    knitr::include_graphics("./Figures/Fig13-1.png", dpi = 75)

Consistent with the results presented in the manuscript, five factors roughly resembling the five-factor model were found. The placement of all items is appropriate for their dimensions as well. Similarly, parallel analysis identified five and six dimensions for principal component analysis and principal axis factoring, respectively. In all, the results largely align with one another, demonstrating that removing variables can be an effective approach to reducing redundancy in data.

\newpage

\begingroup \setlength{\parindent}{-0.5in} \setlength{\leftskip}{0.5in}

\endgroup

\newpage

    # Read in .csv
    merged <- read.csv("./merged.items.csv")
    colnames(merged)[1] <- "Latent Variable"

    # Make table
    knitr::kable(merged)
