```r
library(knitr)
options(width = 102)
knitr::opts_chunk$set(message = FALSE, warning = FALSE)

library(ggplot2)
theme_set(theme_bw())
```
Here we'll examine an example application of the widyr package, particularly the `pairwise_cor` and `pairwise_dist` functions. We'll use the data on United Nations General Assembly voting from the `unvotes` package:
```r
library(dplyr)
library(unvotes)

un_votes
```
This dataset has one row for each country for each roll call vote. We're interested in finding pairs of countries that tended to vote similarly.
Notice that the `vote` column is a factor, with levels (in order) "yes", "abstain", and "no". Converting it with `as.numeric()` therefore turns these into 1, 2, and 3 respectively.
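Because the level order is "yes", "abstain", "no", `as.numeric()` returns the underlying level codes, which places abstaining between a yes and a no vote. A minimal base R sketch on a hypothetical vector of votes (not taken from `un_votes`) shows the mapping:

```r
# Hypothetical votes, using the same level order as un_votes$vote
vote <- factor(c("yes", "no", "abstain", "yes"),
               levels = c("yes", "abstain", "no"))

levels(vote)

# as.numeric() on a factor returns the level codes: yes = 1, abstain = 2, no = 3
codes <- as.numeric(vote)
codes
```

This ordinal coding is what makes a correlation between two countries' vote vectors meaningful.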
We can then measure country-to-country agreement by computing, for each pair of countries, the correlation of their votes across all roll calls, using the `pairwise_cor` function:
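```r
library(widyr)

# One correlation per pair of countries (item), computed across roll calls (rcid);
# each pair uses only the votes where both countries have a value
cors <- un_votes %>%
  mutate(vote = as.numeric(vote)) %>%
  pairwise_cor(country, rcid, vote, use = "pairwise.complete.obs", sort = TRUE)

cors
```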
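Conceptually, this amounts to spreading the long table into a country-by-roll-call matrix, correlating the rows, and returning the result as a long table of pairs. The base R sketch below mimics that on hypothetical toy data (countries "A", "B", "C" are placeholders); it illustrates the idea and is not widyr's actual implementation.

```r
# Hypothetical long-format data: one row per (country, rcid),
# with votes already coded 1 = yes, 2 = abstain, 3 = no
votes <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  rcid    = rep(1:4, times = 3),
  vote    = c(1, 1, 3, 2,   # A
              1, 1, 3, 3,   # B
              3, 3, 1, 1),  # C
  stringsAsFactors = FALSE
)

# Wide matrix: rows = countries, columns = roll calls (missing votes become NA)
wide <- tapply(votes$vote, list(votes$country, votes$rcid), mean)

# cor() correlates columns, so transpose to put countries in columns
cor_mat <- cor(t(wide), use = "pairwise.complete.obs")

# Back to a long table of (item1, item2, correlation), dropping self-pairs
pairs <- as.data.frame(as.table(cor_mat), stringsAsFactors = FALSE,
                       responseName = "correlation")
names(pairs)[1:2] <- c("item1", "item2")
pairs <- pairs[pairs$item1 != pairs$item2, ]

# Sort from most to least correlated, like sort = TRUE
pairs <- pairs[order(-pairs$correlation), ]
pairs
```

Here A and B vote almost identically (correlation about 0.90), while B and C vote exactly opposite (correlation -1).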
We could, for example, find the countries that the US is most and least in agreement with:
```r
US_cors <- cors %>%
  filter(item1 == "United States of America")

# Most in agreement (cors is already sorted by descending correlation)
US_cors

# Least in agreement
US_cors %>%
  arrange(correlation)
```
These correlations are especially informative when visualized on a world map.
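```r
library(maps)
library(fuzzyjoin)
library(countrycode)
library(ggplot2)

# World polygons, matched to ISO codes in maps::iso3166 by treating each
# mapname as a regular expression against the polygon's region name
world_data <- map_data("world") %>%
  regex_full_join(iso3166, by = c("region" = "mapname")) %>%
  filter(region != "Antarctica")
```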
```r
US_cors %>%
  # Convert country names to two-letter ISO codes to match world_data$a2
  mutate(a2 = countrycode(item2, "country.name", "iso2c")) %>%
  full_join(world_data, by = "a2") %>%
  ggplot(aes(long, lat, group = group, fill = correlation)) +
  # linewidth requires ggplot2 >= 3.4.0; use size = .1 on older versions
  geom_polygon(color = "gray", linewidth = .1) +
  scale_fill_gradient2() +
  coord_quickmap() +
  theme_void() +
  labs(title = "Correlation of each country's UN votes with the United States",
       subtitle = "Blue indicates agreement, red indicates disagreement",
       fill = "Correlation w/ US")
```
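The map depends on attaching each correlation to map polygons through a shared two-letter ISO code (`a2`), with `full_join()` keeping unmatched rows from both sides. A minimal base R sketch of those full-join semantics, using `merge(all = TRUE)` as a stand-in and hypothetical coordinates and correlations:

```r
# Hypothetical map vertices keyed by ISO2 code (a real map has thousands)
map_pts <- data.frame(
  a2    = c("CA", "CA", "MX", "MX", "GL"),
  long  = c(-100, -90, -102, -98, -40),
  lat   = c(55, 60, 23, 20, 72),
  group = c(1, 1, 2, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical correlations with the US for a few countries
us_cor <- data.frame(
  a2          = c("CA", "MX", "FR"),
  correlation = c(0.6, 0.2, 0.4),
  stringsAsFactors = FALSE
)

# Full outer join on a2: GL keeps its vertex but gets an NA correlation,
# while FR keeps its correlation but gets NA coordinates
joined <- merge(us_cor, map_pts, by = "a2", all = TRUE)
joined
```

In the real plot, regions with no matching vote data are filled with the scale's `na.value` color, and correlation rows with no matching polygon have missing coordinates, so `geom_polygon()` drops them (the setup chunk suppresses the resulting warning).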
Another useful kind of visualization is a network plot, which can be created with Thomas Lin Pedersen's ggraph package. To keep the network readable, we keep only pairs of countries whose correlation exceeds a threshold (here 0.6).
```r
library(ggraph)
library(igraph)

# Keep only strongly correlated pairs
cors_filtered <- cors %>%
  filter(correlation > .6)

# Vertex table: each country that appears in a kept pair, with its continent
continents <- tibble(country = unique(un_votes$country)) %>%
  filter(country %in% cors_filtered$item1 |
           country %in% cors_filtered$item2) %>%
  mutate(continent = countrycode(country, "country.name", "continent"))

# Fix the random seed so the graph layout is reproducible
set.seed(2017)

cors_filtered %>%
  graph_from_data_frame(vertices = continents) %>%
  ggraph() +
  geom_edge_link(aes(edge_alpha = correlation)) +
  geom_node_point(aes(color = continent), size = 3) +
  geom_node_text(aes(label = name), check_overlap = TRUE, vjust = 1, hjust = 1) +
  theme_void() +
  labs(title = "Network of countries with correlated United Nations votes")
```
Choosing the threshold for filtering correlations (or other measures of similarity) typically requires some trial and error. Setting too high a threshold will make a graph too sparse, while too low a threshold will make a graph too crowded.
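One way to make that trial and error more systematic is to tabulate, for several candidate thresholds, how many edges and nodes the filtered graph would keep. A sketch in base R with a hypothetical set of pairwise correlations (in practice you would start from `cors`):

```r
# Hypothetical pairwise correlations, one row per pair
pairs <- data.frame(
  item1       = c("A", "A", "A", "B", "B", "C"),
  item2       = c("B", "C", "D", "C", "D", "D"),
  correlation = c(0.95, 0.72, 0.40, 0.65, 0.30, 0.85),
  stringsAsFactors = FALSE
)

thresholds <- c(0.3, 0.5, 0.7, 0.9)

# For each threshold, count the edges kept and the distinct nodes they touch
summary_tbl <- data.frame(
  threshold = thresholds,
  edges = sapply(thresholds, function(t) sum(pairs$correlation > t)),
  nodes = sapply(thresholds, function(t) {
    kept <- pairs[pairs$correlation > t, ]
    length(unique(c(kept$item1, kept$item2)))
  })
)
summary_tbl
```

A sharp drop in nodes between two thresholds (here from 4 at 0.7 to 2 at 0.9) signals that the higher value would leave the graph too sparse.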