knitr::opts_chunk$set(echo = TRUE, cache = TRUE, fig.width=10, fig.height=7, out.width = "100%", split = TRUE, fig.align = 'center', #fig.path='../man/figures/', fig.retina = 1, warning=FALSE, message=FALSE) library(knitr) library(kableExtra) # Table formatting and pipe (%>%) # format kable for document type kable <- function(...){ if (knitr::is_latex_output()){ head(..., 25) %>% knitr::kable(booktabs = TRUE, format = 'latex') %>% kable_styling(latex_options = c("striped", "scale_down", "HOLD_position")) } else { knitr::kable(...) %>% kable_styling() %>% scroll_box(height = "200px") } }
Understanding the gaps and connections across existing theories and findings is a perennial challenge in scientific research. Systematically reviewing scholarship is especially challenging for researchers who may lack domain expertise, including junior scholars or those exploring new substantive territory. Conversely, senior scholars may rely on longstanding assumptions and social networks that exclude new research. In both cases, ad hoc literature reviews hinder accumulation of knowledge. Scholars are rarely systematic in selecting relevant prior work or then identifying patterns across their sample. To encourage systematic, replicable, and transparent methods for assessing literature, we propose an accessible network-based framework for reviewing scholarship. In our method, we consider a literature as a network of recurring concepts (nodes) and theorized relationships among them (edges). Network statistics and visualization allow researchers to see patterns and offer reproducible characterizations of assertions about the major themes in existing literature.
netlit
provides functions to generate network statistics from a literature review. Specifically, it processes a dataset where each row is a proposed relationship ("edge") between two concepts or variables ("nodes").
The aim is to offer easy tools to begin using the power of network analysis in R for literature reviews. Using netlit
simply requires researchers to enter relationships they observe in prior studies into a simple spreadsheet.
netlit
R PackageThe netlit
package provides functions to generate network statistics from a literature review. Specifically, netlit
provides a wrapper for igraph
functions to facilitate using network analysis in literature reviews.
Install this package with
devtools::install_github("judgelord/netlit")
To install netlit
from CRAN, run the following:
install.packages("netlit")
The review()
function takes in a dataframe, data
, that includes from
and to
columns (a directed graph structure).
In the example below, we use example data from this project on redistricting. These data are a set of related concepts (from
and to
) in the redistricting literature and citations for these relationships (cites
and cites_empirical
). See vignette("netlit")
for more details on this example.
library(netlit) # Load replication version of main data and metadata on citations load(here::here("replication_data", "literature.rda")) literature %>% kable()
netlit
offers four main functions: make_edgelist()
, make_nodelist()
, augment_nodelist()
, and review()
.
review()
is the primary function. The others are helper functions that perform the individual steps that review()
does all at once. review()
takes in a dataframe with at least two columns representing linked concepts (e.g., a cause and an effect) and returns data augmented with network statistics. Users must either specify "from" nodes and "to" nodes with the from
and to
arguments or include columns named from
and to
in the supplied data
object.
review()
returns a list of three objects:
edgelist
(a list of relationships with edge_betweenness
calculated), nodelist
(a list of concepts with degree
and betweenness
calculated), and graph
object suitable for use in other igraph
functions or other network visualization packages. Users may wish to include edge attributes (e.g., information about the relationship between the two concepts) or node attributes (information about each concept). We show how to do so below. But first, consider the basic use of review()
:
lit <- review(literature, from = "from", to = "to") lit edges <- lit$edgelist edges %>% kable() nodes <- lit$nodelist nodes %>% kable()
Edge and node attributes can be added using the edge_attributes
and node_attributes
arguments. edge_attributes
is a vector that identifies columns in the supplied data frame that the user would like to retain. node_attributes
is a separate dataframe that contains attributes for each node in the primary data set. The example node_attributes
data include one column type
indicating a type for each each node/variable/concept.
# Load replication version of node type coding load(here::here("replication_data/node_attributes.rda")) node_attributes %>% kable() lit <- review(literature, edge_attributes = c("cites", "cites_empirical"), node_attributes = node_attributes) lit
Tip: to retain all variables from literature
, use edge_attributes = names(literature)
.
library(tidyverse) library(magrittr) library(ggraph)
clean <- . %>% str_replace_all("([a-z| |-]{8}) ","\\1\n") %>% str_replace_all(" ([a-z| |-]{9})", "\n\\1") %>% str_to_title() %>% str_replace("\nOf\n", "\nOf ") %>% str_replace("\nFellow ", " Fellow\n") %>% str_replace("\nState\n", " State\n") %>% str_replace("\nDistrict\n", " District\n") %>% str_replace("\nWith\n", " With\n") literature$from %<>% clean() literature$to %<>% clean() node_attributes$node %<>% clean()
We separated multiple cites to a theorized relationship with semicolons. Let's count the total number of citations and the number of citations to empirical work by splitting out each cite and measuring the length of that vector.
# count cites literature %<>% group_by(to, from) %>% mutate(cite_weight = str_split(cites, ";")[[1]] %>% length(), cite_weight_empirical = str_split(cites_empirical, ";",)[[1]] %>% length(), cite_weight_empirical = ifelse(is.na(cites_empirical), 0, cite_weight_empirical)) %>% ungroup() # subsets literature %<>% mutate(communities_node = str_c(to, from) %>% str_detect("Commun"), confound = case_when( from == "Preserve\nCommunities\nOf Interest" & to == "Rolloff" ~ T, from == "Voter\nInformation\nAbout Their\nDistrict" & to == "Rolloff" ~ T, from == "Preserve\nCommunities\nOf Interest" & to == "Voter\nInformation\nAbout Their\nDistrict" ~ T, T ~ F), empirical = ifelse(!is.na(cites_empirical), "Empirical work", "No empirical work"))
Now we use review()
on this expanded edgelist, including all variables in the literature
data with edge_attributes = names(literature)
.
# now with all node and edge attributes lit <- review(literature, edge_attributes = names(literature), node_attributes = node_attributes ) edges <- lit$edgelist edges %>% kable() nodes <- lit$nodelist nodes %>% kable()
igraph
object# define igraph object as g g <- lit$graph g
What does it mean?
D
means directed N
means named graph W
means weighted graph name (v/c)
means name is a node attribute and it's a character cite_weight (e/n)
means cite_weight is an edge attribute and it's numeric ggraph
We can also plot using the package ggraph
package to plot the igraph
object.
This package allows us to plot self-ties, but it is slightly more difficult to use ggplot features (e.g. colors and legend labels) compared to ggnetwork
.
set.seed(5) library(ggraph) p <- ggraph(g, layout = 'fr') + geom_node_point( aes(color = degree_total %>% as.factor() ), size = 6, alpha = .7 ) + geom_edge_arc2( start_cap = circle(3,'mm'), end_cap = circle(6, 'mm'), aes( color = cite_weight , linetype = empirical ), curvature = 0, arrow = arrow(length = unit(2, 'mm'), type = "open") ) + geom_edge_loop( start_cap = circle(5, 'mm'), end_cap = circle(2, 'mm'), aes( color = cite_weight , linetype = empirical ), n = 300, strength = .6, arrow = arrow(length = unit(2, 'mm'), type = "open") ) + geom_node_text( aes(label = name), size = 2.3) + ggplot2::theme_void() + theme(legend.position="bottom") + labs(edge_color = "Number of\nPublications", color = "Total Degree\nCentrality", edge_linetype = "") + scale_edge_colour_viridis( discrete = FALSE, option = "plasma", begin = 0, end = .9, direction = -1, guide = "legend", aesthetics = "edge_colour") + scale_color_viridis_d(option = "mako", begin = 1, end = .5) p ggsave(filename = "docs/netlit-replication_files/figure-html/figure2.pdf", width=10, height=7)
p + facet_wrap("communities_node") ggsave(filename = "docs/netlit-replication_files/figure-html/figure3a.pdf", width=20, height=7) p + facet_wrap("confound") ggsave(filename = "docs/netlit-replication_files/figure-html/figure3b.pdf", width=20, height=7)
Edge Betweenness
ggraph(g, layout = 'fr') + geom_node_point(size = 10, alpha = .1) + theme_void() + theme(legend.position="bottom" ) + scale_color_viridis_c(begin = .5, end = 1, direction = -1, option = "cividis") + scale_edge_color_viridis(begin = 0.2, end = .9, direction = -1, guide = "legend", option = "cividis") + geom_edge_arc2( start_cap = circle(3, 'mm'), end_cap = circle(5, 'mm'), aes( color = edge_betweenness, linetype = empirical ), curvature = .1, arrow = arrow(length = unit(2, 'mm'), type = "closed")) + geom_edge_loop(aes(color = edge_betweenness)) + geom_node_text(aes(label = name), size = 2.3) + labs(edge_color = "Edge Betweenness", color = "Node Betweenness", edge_linetype = "")
Node Betweenness
p <- ggraph(g, layout = 'fr') + geom_node_point( aes(color = betweenness), size = 6, alpha = .7 ) + geom_edge_arc2( start_cap = circle(3, 'mm'), end_cap = circle(6, 'mm'), aes( color = cite_weight, linetype = empirical ), curvature = 0, arrow = arrow(length = unit(2, 'mm'), type = "open") ) + geom_edge_loop( start_cap = circle(5, 'mm'), end_cap = circle(2, 'mm'), aes( color = cite_weight, linetype = empirical ), n = 300, strength = .6, arrow = arrow(length = unit(2, 'mm'), type = "open") ) + geom_node_text(aes(label = name), size = 2.3) + theme_void() + theme(legend.position="bottom") + labs(edge_color = "Number of\nPublications", color = "Betweeneness", edge_linetype = "") + scale_edge_color_viridis(option = "plasma", begin = 0, end = .9, direction = -1, guide = "legend") + scale_color_gradient2() p ggraph(g, layout = 'fr') + geom_node_point(aes(color = betweenness), size = 10, alpha = 1) + theme_void() + theme(legend.position="bottom") + scale_color_viridis_c(begin = .5, end = 1, direction = -1, option = "cividis") + scale_edge_color_viridis(begin = 0.2, end = .9, direction = -1, option = "cividis", guide = "legend") + geom_edge_arc2( start_cap = circle(3, 'mm'), end_cap = circle(5, 'mm'), aes( color = edge_betweenness, linetype = empirical ), curvature = .1, arrow = arrow(length = unit(2, 'mm'), type = "closed")) + geom_edge_loop(aes(color = edge_betweenness)) + labs(edge_color = "Edge Betweenness", color = "Node Betweenness", edge_linetype = "") + geom_node_text(aes(label = name), size = 2.3)
ggraph(g, layout = 'fr') + geom_node_point(aes(color = degree_total), size = 10, alpha = 1) + theme_void() + theme(legend.position="bottom" ) + scale_color_gradient2() + scale_edge_color_viridis(begin = 0.2, end = .9, direction = -1, option = "cividis", guide = "legend") + geom_edge_arc2( start_cap = circle(3, 'mm'), end_cap = circle(5, 'mm'), aes( color = edge_betweenness, linetype = empirical ), curvature = .1, arrow = arrow(length = unit(2, 'mm'), type = "closed")) + geom_edge_loop(aes(color = edge_betweenness)) + labs(edge_color = "Edge Betweenness", color = "Total Degree", edge_linetype = "") + geom_node_text(aes(label = name), size = 2.3)
Articles were chosen according to specific selection criteria. We first identified articles published since 2010 that either 1) were published in one of eight high-ranking journals or 2) gained at least 50 citations according to Google Scholar. We then chose articles that contained four possible key terms in the title or abstract.
# Journal articles in example data data("literature_metadata") literature_metadata %>% kable() # count publications per journal pub_table <- literature_metadata %>% filter(str_detect(paste(literature$cites, collapse = "|"), Author)) %>% count(Publication, name = "Articles") %>% mutate(Publication = case_when( Publication == "AJPS" ~ "American Journal of Political Science", Publication == "APSR" ~ "American Political Science Review", Publication == "BJPS" ~ "British Journal of Political Science", Publication == "JOP" ~ "The Journal of Politics", Publication == "NCL Review" ~ "North Carolina Law Review", Publication == "QJPS" ~ "Quarterly Journal of Political Science", TRUE ~ Publication )) pub_table %>% kable()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.