    library(xaringanthemer)
    style_mono_accent(
      base_color = "#1c5253",
      header_font_google = google_font("Josefin Sans"),
      text_font_google = google_font("Montserrat", "300", "300i"),
      code_font_google = google_font("Fira Mono")
    )
    knitr::opts_chunk$set(cache = TRUE)
class: middle
Let's see it from our TidyTuesday Twitter community `r emo::ji("bird")`: from informative to witty to artsy
.left-column[
Common Slave Routes: @drob
]
.right-column[
]
.left-column[
UN Voting: @jdavison
]
.right-column[
]
.left-column[
Simpson's guesting: @jmcastagnetto
]
.right-column[
]
.left-column[
Cocktail ingredients: @andreaNOdell
]
.right-column[
]
.left-column[
Art?: @cedscherer
]
.right-column[
]
Network analysis tries to uncover insights that often stay hidden when records are analyzed only as independent elements of a collection.
It lets us leverage the structure of the collection and the emergent properties that arise from the interactions between its elements.
class: inverse, center, middle
`r emo::ji("+1")`
.pull-left[
A graph^ (or network) is a representation of how things are connected.
In this example, each node is a country, and each edge represents the existence of trade between two countries.
Sometimes edges have weights, which measure the 'strength' of the relationship between nodes. Here, the thickness of a line represents a country's share of exports.
]
.pull-right[
United States perspective Total - All-Groups, for Export and Buyer 2016 https://wits.worldbank.org/CountryNetwork.aspx?lang=en ]
.footnote[
^ Symbolically, you might also see the notation G(V, E) for a graph containing vertices V and edges E `r emo::ji("smile")`
]
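Here is a minimal sketch of the G(V, E) idea, assuming {tidygraph} is installed. The countries `A`, `B`, `C` and their weights are made up for illustration; they are not the World Bank figures from the chart.

```r
library(tidygraph)

# V: the vertices (hypothetical countries)
nodes <- data.frame(name = c("A", "B", "C"))

# E: the edges, with a made-up weight standing in for share of exports
edges <- data.frame(
  from   = c("A", "A", "B"),
  to     = c("B", "C", "C"),
  weight = c(0.5, 0.3, 0.2)
)

# G(V, E): trade is treated as undirected in this toy example
trade <- tbl_graph(nodes = nodes, edges = edges, directed = FALSE)
trade
```

Printing `trade` shows the node table and the edge table side by side, which is how {tidygraph} stores a graph.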
.pull-left[
Centrality is a concept that deals with how important a vertex is in the network.
In this paper, the authors have a section that tries to identify congested locations based on each location's betweenness centrality.
]
.pull-right[
Multiplex networks in metropolitan areas: generic features and local effects
https://doi.org/10.1098/rsif.2015.0651
]
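To make betweenness concrete, here is a small sketch on a hypothetical network (not the paper's data): two triangles joined by a single link. It assumes {tidygraph} is installed and uses its `centrality_betweenness()` helper.

```r
library(tidygraph)

# Hypothetical network: triangles a-b-c and d-e-f joined by the link c - d
edges <- data.frame(
  from = c("a", "a", "b", "c", "d", "d", "e"),
  to   = c("b", "c", "c", "d", "e", "f", "f")
)

bridge <- as_tbl_graph(edges, directed = FALSE) %>%
  # betweenness: how many shortest paths pass through each node
  mutate(betweenness = centrality_betweenness()) %>%
  as_tibble()

bridge
```

Every shortest path between the two triangles has to cross `c` and `d`, so those two nodes get the highest scores: they are the "congested locations" of this toy network.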
.pull-left[
Community detection, or finding clusters, is about identifying subgroups within the whole graph. Think of it as an analog of k-means clustering or hierarchical clustering.

Useful for
- Analyzing fragmentation and polarization
  - "How to get away with breaking terrorist cells?"
- Understanding flow of information
  - "How I met your 'Parler?'"
- Summarization: from individual members to clusters
  - "How to train your ML with reduced dimensions?"

The visualization from this paper shows strong partisanship in the realm of TikTok duets.
]
.pull-right[
Dancing to the Partisan Beat: A First Analysis of Political Communication on TikTok
https://doi.org/10.1145/3394231.3397916
]
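A hedged sketch of community detection on a made-up graph, assuming {tidygraph} is installed. It uses the walktrap algorithm via `group_walktrap()`, which is just one of several grouping functions in {tidygraph} and not necessarily the method used in the paper.

```r
library(tidygraph)

# Hypothetical network: two tight triangles joined by one weak link (c - d)
edges <- data.frame(
  from = c("a", "a", "b", "c", "d", "d", "e"),
  to   = c("b", "c", "c", "d", "e", "f", "f")
)

communities <- as_tbl_graph(edges, directed = FALSE) %>%
  # assign each node to a community id
  mutate(community = group_walktrap()) %>%
  as_tibble()

communities
```

On this toy graph the two triangles come out as separate communities, which is the graph version of finding two clusters.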
class: inverse, center, middle
Time to get our hands dirty!
    # nothing is tidy without
    library(tidyverse)
    # manipulating graph data structure tidily
    library(tidygraph)
    # visualizing graphs (or graphing graphs :) )
    library(ggraph)
    # this was the OG when I was in school ~5 years ago
    # tidygraph abstracts most of the APIs here
    # library(igraph)
Note: We'll be skipping APIs and {rtweet}, since Layla covered these last week.
Toy Example:
Who are the top mentionees and mentioners in the #TidyTuesday community?
What are some of the 'cliques' in this network, and who are its members?
Steps
.pull-left[
We'll start off by loading the Twitter dataset in {dataedu}.

    library(dataedu)
    tt_tweets <- dataedu::tt_tweets

]
.pull-right[
Here are the variables in this table:

    names(tt_tweets) %>%
      paste(collapse = ", ") %>%
      str_wrap(35) %>%
      cat()

]
Finding mentions
    regex_pattern <- "@([A-Za-z]+[A-Za-z0-9_]+)(?![A-Za-z0-9_]*\\.)"

    mentions_step1 <- tt_tweets %>%
      # Use regular expression to identify all the usernames in a tweet
      mutate(all_mentions = str_extract_all(text, regex_pattern)) %>%
      unnest(all_mentions) %>%
      # hey, a friend from text analysis in ch. 11!! :)
      mutate(all_mentions = str_trim(all_mentions)) %>%
      select(sender = screen_name, all_mentions)

    print(utils::head(mentions_step1, 2))
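To see what the pattern picks up, here is a small sketch that runs it on made-up tweet texts (the handles appear elsewhere in these slides; the sentences are invented). It assumes {stringr} is installed.

```r
library(stringr)

regex_pattern <- "@([A-Za-z]+[A-Za-z0-9_]+)(?![A-Za-z0-9_]*\\.)"

# Made-up tweet texts for illustration
toy_text <- c(
  "Great plot by @drob and @CedScherer!",
  "Email me at someone@example.com",
  "Thanks @drob."
)

found <- str_extract_all(toy_text, regex_pattern)
found
```

The negative lookahead `(?!...\\.)` is what skips the email-like `@example.com`. As a side effect, a handle that is immediately followed by a period (the third text) is skipped too, so this pattern trades a few missed mentions for fewer false positives.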
.pull-left[
`r emo::ji("bulb")`
An edgelist representation lists all the edges in a graph.
Each record represents a connection from one node to another.
`mentions` is an edgelist table connecting a user A to another user B if A mentions B.

    mentions <- mentions_step1 %>%
      # remove the leading `@` from the mentioned user's name
      mutate(all_mentions = str_sub(all_mentions, start = 2)) %>%
      # rename all_mentions to receiver
      select(sender, receiver = all_mentions)

]
.pull-right[

    print(mentions)

]
class: center, middle
`r emo::ji("bulb")`
Segue Concept: (Un)Directedness
A graph can be directed or undirected.
As the name suggests, directed graphs are strict in the sense that a connection from node X to Y does not necessarily imply that a connection exists from node Y to X. Further, if connections exist both from X to Y and from Y to X, their weights need not be the same.
Cheeky Example: In graphs of "relationships", if feelings are requited, you get an undirected graph; unfortunately, if they are unrequited, you get a directed graph.
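One way to check directedness is to look at the adjacency matrix: an undirected graph has a symmetric one. A base R sketch on three hypothetical users `x`, `y`, `z`:

```r
# Hypothetical connections: x -> y and y -> x (requited), x -> z (unrequited)
nodes <- c("x", "y", "z")
adj <- matrix(0, nrow = 3, ncol = 3, dimnames = list(nodes, nodes))
adj["x", "y"] <- 1
adj["y", "x"] <- 1
adj["x", "z"] <- 1  # z never points back to x

# An undirected graph would give a symmetric matrix
symmetric <- isSymmetric(adj)
symmetric

# Which connections are not reciprocated?
unrequited <- which(adj == 1 & t(adj) == 0, arr.ind = TRUE)
unrequited
```

Here `symmetric` is `FALSE` because of the one-way `x -> z` connection, so this toy graph has to be treated as directed.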
Is the `mentions` table directed or undirected?
.pull-left[
By using {tidygraph}, we can convert the `mentions` dataframe into a graph.
`r emo::ji("bulb")`
Notice how this is represented as an adjacency list. It is another representation of a graph!
For a given node, it lists all of the nodes that are connected to it. For example:
- @dgwinfred mentioned @CedScherer
- @davidmasp mentioned @thomasp
]
.pull-right[

    g <- as_tbl_graph(mentions)
    str(g)

]
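The link between an edgelist and an adjacency list can be sketched in base R. The `toy_mentions` table below is made up; it only mimics the `sender`/`receiver` shape of `mentions`.

```r
# Made-up edgelist in the same shape as `mentions`
toy_mentions <- data.frame(
  sender   = c("a", "a", "b", "c"),
  receiver = c("b", "c", "c", "a"),
  stringsAsFactors = FALSE
)

# Adjacency list: for each sender, the users they mention
adj_list <- split(toy_mentions$receiver, toy_mentions$sender)
adj_list
```

Each element of `adj_list` is named after a sender and holds that sender's receivers, which is the same information as the edgelist, just grouped by node.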
{tidygraph} APIs to get graph statistics
(Out) Degree Centrality: who are the top mentioners?
.pull-left[

    degree_stat <- g %>%
      mutate(degree = centrality_degree(mode = "out")) %>%
      # pull out the "nodes" attribute of g
      activate(nodes) %>%
      as_tibble() %>%
      top_n(10, degree) %>%
      dplyr::arrange(desc(degree))

]
.pull-right[

    print(degree_stat)

]
{tidygraph} APIs to get graph statistics
(In) Degree Centrality: who are the top mentionees?
.pull-left[

    degree_stat <- g %>%
      mutate(degree = centrality_degree(mode = "in")) %>%
      # pull out the "nodes" attribute of g
      activate(nodes) %>%
      as_tibble() %>%
      top_n(10, degree) %>%
      dplyr::arrange(desc(degree))

]
.pull-right[

    print(degree_stat)

]
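Degree centrality boils down to counting rows of the edgelist. A base R sketch on a made-up `toy_mentions` table, assuming every row becomes exactly one edge (so repeated mentions count more than once):

```r
# Made-up edgelist in the same shape as `mentions`
toy_mentions <- data.frame(
  sender   = c("a", "a", "b", "c"),
  receiver = c("b", "c", "c", "a"),
  stringsAsFactors = FALSE
)

# Out-degree: how many times each user appears as a sender
out_degree <- table(toy_mentions$sender)

# In-degree: how many times each user appears as a receiver
in_degree <- table(toy_mentions$receiver)

sort(out_degree, decreasing = TRUE)
sort(in_degree, decreasing = TRUE)
```

In this toy table, `a` is the top mentioner and `c` is the top mentionee.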
.pull-left[
Here, we use the APIs from {ggraph}. It takes a graph object as its input. We can then add a layout (which organizes the nodes based on different algorithms). `geom_node_point` and `geom_edge_link` specify the network features that are mapped to the aesthetics.

    p <-
      # we chose the kk layout as it created a graph which was easy to interpret,
      # but others are available; see ?ggraph
      ggraph(g, layout = "kk") +
      # this adds the points to the graph
      geom_node_point() +
      # this adds the links, or the edges;
      # alpha = .2 makes the lines partially transparent
      geom_edge_link(alpha = .2) +
      # this last line of code adds a ggplot2
      # theme suitable for network graphs
      theme_graph()

]
.pull-right[

    print(p)

]
Let's use the node's in-degree centrality to remove users that are not mentioned that often. Let's also visualize that by mapping the size and color aesthetic of the node to this centrality measure.
.pull-left[

    p <- g %>%
      mutate(degree = centrality_degree(mode = "in")) %>%
      # pull out the "nodes" attribute of g
      activate(nodes) %>%
      # keep only users mentioned more than 5 times
      filter(degree > 5) %>%
      # we chose the kk layout as it created a graph which was easy to interpret,
      # but others are available; see ?ggraph
      ggraph(layout = "kk") +
      geom_edge_link(alpha = .2) +
      geom_node_point(aes(size = degree, colour = degree)) +
      theme_graph()

]
.pull-right[
print(p)
]
We're not insinuating cliques in a "Mean Girls" way. A clique has a precise meaning in graphs: a subgraph in which all the nodes are connected with each other.
Note: AFAIK, the clique functions treat directed graphs as undirected by default.
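The definition can be checked by hand. Below is a base R sketch on a made-up undirected graph; `connect()` and `is_clique()` are hypothetical helpers written for this example, not {igraph} functions.

```r
# Made-up undirected graph with edges a-b, a-c, b-c, c-d
nodes <- c("a", "b", "c", "d")
adj <- matrix(FALSE, 4, 4, dimnames = list(nodes, nodes))

# hypothetical helper: add an undirected edge between x and y
connect <- function(adj, x, y) {
  adj[x, y] <- TRUE
  adj[y, x] <- TRUE
  adj
}
adj <- connect(adj, "a", "b")
adj <- connect(adj, "a", "c")
adj <- connect(adj, "b", "c")
adj <- connect(adj, "c", "d")

# hypothetical helper: TRUE if every pair of members is connected
is_clique <- function(adj, members) {
  sub <- adj[members, members, drop = FALSE]
  all(sub[upper.tri(sub)])
}

abc <- is_clique(adj, c("a", "b", "c"))
bcd <- is_clique(adj, c("b", "c", "d"))
c(abc = abc, bcd = bcd)
```

`a`, `b`, `c` form a clique because all three pairs are connected; `b`, `c`, `d` do not, since `b` and `d` have no edge between them.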
    largest_clique_size <- igraph::clique_num(g)
    # keep only the cliques that are as large as the largest one
    cliques <- igraph::cliques(g, min = largest_clique_size)

    plots <- list()
    # plot up to three of the largest cliques (fewer if there are fewer)
    for (i in seq_len(min(3, length(cliques)))) {
      clique_members <- names(cliques[[i]])
      plots[[i]] <- g %>%
        mutate(degree = centrality_degree(mode = "in")) %>%
        activate(nodes) %>%
        # flag the nodes that belong to this clique
        mutate(is_in = name %in% clique_members) %>%
        filter(degree > 3) %>%
        ggraph(layout = "kk") +
        geom_edge_link(alpha = .1) +
        geom_node_point(aes(size = degree, colour = is_in)) +
        # theme_graph() +
        theme(legend.position = "none") +
        labs(caption = str_wrap(paste(clique_members, collapse = " "), 30))
    }

    print(patchwork::wrap_plots(plots, nrow = 1))