knitr::opts_chunk$set(
  cache = FALSE,
  collapse = FALSE,
  warning = FALSE,
  message = FALSE,
  tidy = FALSE,
  fig.align='center',
  comment = "#>",
  fig.path = "man/figures/README-",
  R.options = list(width = 200)
)

netlit: Augment a literature review with network analysis statistics

CRAN status


Understanding the gaps and connections across existing theories and findings is a perennial challenge in scientific research. Systematically reviewing scholarship is especially challenging for researchers who may lack domain expertise, including junior scholars or those exploring new substantive territory. Conversely, senior scholars may rely on longstanding assumptions and social networks that exclude new research. In both cases, ad hoc literature reviews hinder accumulation of knowledge. Scholars are rarely systematic in selecting relevant prior work or then identifying patterns across their sample. To encourage systematic, replicable, and transparent methods for assessing literature, we propose an accessible network-based framework for reviewing scholarship. In our method, we consider a literature as a network of recurring concepts (nodes) and theorized relationships among them (edges). Network statistics and visualization allow researchers to see patterns and offer reproducible characterizations of assertions about the major themes in existing literature.

netlit provides functions to generate network statistics from a literature review. Specifically, it processes a dataset where each row is a proposed relationship ("edge") between two concepts or variables ("nodes"). The aim is to offer easy tools to begin using the power of network analysis in R for literature reviews. Using netlit simply requires researchers to enter relationships they observe in prior studies into a simple spreadsheet.

To install netlit from CRAN, run the following:

install.packages("netlit")

Basic Usage

The review() function takes in a dataframe, data, that includes from and to columns (a directed graph structure).

In the example below, we use example data from this project on redistricting. These data are a set of related concepts (from and to) in the redistricting literature and citations for these relationships (cites and cites_empirical). See the main netlit vignette for more details on this example.

library(netlit)

data("literature")

head(literature)

netlit offers four functions: make_edgelist(), make_nodelist(), augment_nodelist(), and review().

review() is the primary function (and probably the only one you need). The others are helper functions that perform the individual steps that review() does all at once. review() takes in a dataframe with at least two columns representing linked concepts (e.g., a cause and an effect) and returns data augmented with network statistics. Users must either specify "from" nodes and "to" nodes with the from and to arguments or include columns named from and to in the supplied data object.

review() returns a list of three objects:

  1. an augmented edgelist (a list of relationships with edge_betweenness calculated),
  2. an augmented nodelist (a list of concepts with degree and betweenness calculated), and
  3. a graph object suitable for use in other igraph functions or other network visualization packages.

Including node attributes

Users may wish to include edge attributes (e.g., information about the relationship between the two concepts) or node attributes (information about each concept). We show how to do so below. But first, consider the basic use of review():

lit <- review(literature, from = "from", to = "to")

lit

head(lit$edgelist)

head(lit$nodelist)

Edge and node attributes can be added using the edge_attributes and node_attributes arguments.

edge_attributes is a vector that identifies columns in the supplied data frame that the user would like to retain. (To retain all variables from literature, use edge_attributes = names(literature).)

node_attributes is a separate dataframe that contains attributes for each node in the primary data set. node_attributes must be a dataframe with a column node with values matching the to or from columns of the data argument.

The example node_attributes data include one column type indicating a type for each each node/variable/concept.

data("node_attributes")

head(node_attributes)

lit <- review(literature,
              edge_attributes = c("cites", "cites_empirical"),
              node_attributes = node_attributes)

lit

head(lit$edgelist)

head(lit$nodelist)

Mapping literature networks

Below is a plot of redistricting literature network from the main netlit vignette using the graph object returned by the netlit::review() function as the input to network graphing functions from packages like ggnetwork. The nodelist and edgelist objects also provide required inputs for other network visualization packages, e.g. ggraph or visNetwork (vignettes on how to make similar plots in ggraph and visNetwork will be posted shortly).

Nodes represent theoretical concepts, shaded by total degree centrality. Arrows connect concepts theorized as directional relationships in works, colored by number of works. Solid edges indicate empirically studied connections; dashed are relationships that have been theorized but not studied empirically.

knitr::include_graphics("man/figures/ggraph-1.png")


judgelord/netlit documentation built on Jan. 3, 2023, 2:42 p.m.