In dgrtwo/rpanama: The Panama Papers Dataset

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-",
  fig.retina = 2,
  message = FALSE,
  width = 120,
  warning = FALSE
)

The Panama Papers in R

This provides the datasets from the ICIJ Offshore Leaks Database from the Panama Papers. This dataset was constructed by the International Consortium of Investigative Journalists and available here under the Creative Commons Attribution-ShareAlike license. Besides that data, this package is available under GPL-3.

Future versions of this package may soon include other data from the Panama Papers besides the offshore leaks dataset.

Installation

Install using devtools:

devtools::install_github("dgrtwo/rpanama")

Offshore Leaks

There are five main datasets that this package provides. First is information on the Entities, Intermediaries, Officers, and Addresses from the offshore leaks dataset (see here for the definitions of each):

library(rpanama)
library(dplyr)

Entities
Intermediaries
Officers
Addresses

These each make up nodes in a network. The all_edges dataset contains connections between these, including which entities have which offshore accounts.

all_edges

all_edges %>%
  count(rel_type, sort = TRUE)

Each of the five above datasets comes directly from the CSVs released by the ICIJ here.

Example

We could find which countries have the most offshore entities (note that there are multiple countries split by ;):

library(tidyr)
library(stringr)

most_common_countries <- Entities %>%
  select(node_id, countries) %>%
  unnest(country = str_split(countries, ";")) %>%
  count(country, sort = TRUE)

most_common_countries

We may also find inter-country intermediary relationships were most common. We could do this by joining the Intermediaries data with the all_edges data and then the Entities data:

entity_countries <- Entities %>%
  unnest(entity_country = str_split(countries, ";"),
         entity_code = str_split(country_codes, ";")) %>%
  select(node_id, entity_country, entity_code)

connections <- Intermediaries %>%
  unnest(intermediary_country = str_split(countries, ";"),
         intermediary_code = str_split(country_codes, ";")) %>%
  inner_join(all_edges, by = c(node_id = "node_1")) %>%
  inner_join(entity_countries, by = c(node_2 = "node_id")) %>%
  filter(rel_type == "intermediary of",
         intermediary_country != entity_country) %>%
  count(intermediary_country, entity_country, intermediary_code, entity_code) %>%
  ungroup() %>%
  arrange(desc(n))

connections

We could even put this on a map:

library(rworldmap)
library(geosphere)

country_centers <- getMap()@data %>%
  tbl_df() %>%
  select(code = ISO_A3, longitude = LON, latitude = LAT)

great_circles <- connections %>%
  filter(n > 100) %>%
  mutate(connection = row_number()) %>%
  select(intermediary_code, entity_code, n, connection) %>%
  gather(type, code, contains("code")) %>%
  inner_join(country_centers, by = "code") %>%
  group_by(connection, n) %>%
  arrange(desc(type)) %>%
  filter(n() == 2) %>%
  do(as.data.frame(gcIntermediate(c(.$longitude[1], .$latitude[1]),
                                  c(.$longitude[2], .$latitude[2]))))

library(ggplot2)
library(ggthemes)
library(ggalt)

world_map <- filter(map_data("world"), region != "Antarctica")

ggplot() +
  geom_map(data = world_map, map = world_map,
           aes(x = long, y = lat, map_id = region),
           color = "#b2b2b2", size = 0.15, fill = NA) +
  geom_path(data=great_circles, color = "#a50026", 
            aes(lon, lat, group = connection, alpha = n),
            arrow = arrow(length = unit(0.1, "inches"))) +
  scale_color_gradient2(low = "white", high = "#a50026", trans = "log") +
  coord_proj() +
  labs(title = "Most common offshore entity relationships") +
  theme_map(base_family="Arial") +
  #scale_size_continuous(trans = "log", range = c(.1, 1)) +
  theme(legend.position = "none", 
        plot.title=element_text(hjust = 0.5, size = 14))