In rOpenGov/rqog: Download data from the Quality of Government Institute data

knitr::opts_chunk$set(echo = TRUE,message = FALSE, warning = FALSE, cache = FALSE,
                      out.width = "100%")

Compiled at `r Sys.time()`.


Quotation from the Quality of Government Institute website:

The QoG Institute was founded in 2004 by Professor Bo Rothstein and Professor Sören Holmberg. It is an independent research institute within the Department of Political Science at the University of Gothenburg. We conduct and promote research on the causes, consequences and nature of Good Governance and the Quality of Government (QoG) - that is, trustworthy, reliable, impartial, uncorrupted and competent government institutions.

The main objective of our research is to address the theoretical and empirical problem of how political institutions of high quality can be created and maintained. A second objective is to study the effects of Quality of Government on a number of policy areas, such as health, the environment, social policy, and poverty. We approach these problems from a variety of different theoretical and methodological angles.

The Quality of Government Institute provides data in five different datasets, each in cross-sectional and longitudinal versions:

The rqog package provides access to the Basic, Standard and OECD datasets through the function read_qog(). The Standard data contains all the indicators of the Basic data (367 variables) plus roughly 1,600 additional indicators. Both the Basic and Standard datasets cover 194 countries. The OECD dataset has 1,020 indicators from 35 countries. By default, rqog uses the longitudinal datasets, which contain time series of varying length for most indicators and countries.
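Because the longitudinal data comes in country-year rows and the series start and end in different years, it can help to check each country's coverage for an indicator before plotting. A minimal base-R sketch, assuming a data frame with the cname and year columns used in the examples below; the column some_indicator and all values are made up for illustration and are not QoG data:

```r
# Toy country-year panel (made-up values, not QoG data);
# "some_indicator" is a hypothetical column name
panel <- data.frame(
  cname = rep(c("A", "B"), each = 4),
  year = rep(2000:2003, times = 2),
  some_indicator = c(NA, 1.2, 1.5, 1.7, 3.1, 3.0, NA, NA)
)

# Keep only the rows where the indicator is observed
obs <- panel[!is.na(panel$some_indicator), ]

# First year, last year and number of observed years per country
first_year <- tapply(obs$year, obs$cname, min)
last_year <- tapply(obs$year, obs$cname, max)
n_years <- tapply(obs$year, obs$cname, length)

coverage <- data.frame(
  cname = names(first_year),
  first_year = as.vector(first_year),
  last_year = as.vector(last_year),
  n_years = as.vector(n_years)
)
print(coverage)
```

With the real data, replace panel with the output of read_qog() and some_indicator with any indicator code.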

The Quality of Government Institute provides codebooks for all datasets:

Consult the codebooks for descriptions of the data and indicators.

Installation

library(devtools)
install_github("ropengov/rqog")
library(rqog)

Examples

Download data and plot numeric indicators

Basic data

The Basic data contains a selection of the most commonly used indicators: 344 indicators from 211 countries. Below is an example of how to extract a population indicator (wdi_pop1564) and the Polity democracy score (p_polity2) for the BRIC countries from 1990 to 2010 and plot them.

library(rqog)
library(dplyr)
library(ggplot2)
library(tidyr)
# Download a local copy of the file
basic <- read_qog(which_data="basic", data_type = "time-series")
# Subset the data
dat.l <- basic %>% 
  # filter years and countries
  filter(year %in% 1990:2010,
         cname %in% c("Russia","China","India","Brazil")) %>% 
  # select variables
  select(cname,year,p_polity2,wdi_pop1564) %>% 
  # gather to long format
  gather(., var, value, 3:4) %>% 
  # remove NA values
  filter(!is.na(value))

# Plot
ggplot(dat.l, aes(x=year,y=value,color=cname)) + 
  geom_point() + geom_line() +
  # label each indicator-country line at its latest year
  geom_text(data = dat.l %>% 
              group_by(var, cname) %>% 
              filter(year == max(year)),
            aes(x=year,y=value,label=cname),
            hjust=1,vjust=-1,size=3,alpha=.8) +
  facet_wrap(~var, scales="free") +
  theme_minimal() +  
  theme(legend.position = "none") +
  labs(title = "Plotting QoG basic data",
       caption = "Data: QoG Basic data")
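The gather() step turns the two indicator columns into var/value pairs, and the final filter() drops missing values. For readers without tidyr, a minimal base-R sketch of the same reshaping (a substitute for, not a copy of, the pipeline above), using a made-up data frame with the column names from the example; country names are placeholders and values are illustrative only:

```r
# Made-up wide data with the column layout used above
wide <- data.frame(
  cname = c("A", "A", "B"),
  year = c(1990, 1991, 1990),
  p_polity2 = c(8, NA, 9),
  wdi_pop1564 = c(60.1, 60.5, NA)
)

indicators <- c("p_polity2", "wdi_pop1564")

# Stack the indicator columns: one row per country-year-indicator
long <- do.call(rbind, lapply(indicators, function(v) {
  data.frame(cname = wide$cname, year = wide$year,
             var = v, value = wide[[v]])
}))

# Drop missing values, as filter(!is.na(value)) does
long <- long[!is.na(long$value), ]
rownames(long) <- NULL
print(long)
```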

Standard data

The Standard data includes 2190 indicators from 211 countries. Below is an example of how to extract the BTI Economic Performance index (bti_ep) and the GINI index (World Bank estimate, wdi_gini) for the BRIC countries from 1990 to 2020 and plot them.

library(rqog)
# Download a local copy of the file
standard <- read_qog("standard", "time-series")
# Subset the data
dat.l <- standard %>% 
  # filter years and countries
  filter(year %in% 1990:2020,
         cname %in% c("Russia","China","India","Brazil")) %>% 
  # select variables
  select(cname,year,bti_ep,wdi_gini) %>% 
  # gather to long format
  gather(., var, value, 3:4) %>% 
  # remove NA values
  filter(!is.na(value))

# Plot the data
ggplot(dat.l, aes(x=year,y=value,color=cname)) + 
  geom_point() + geom_line() +
  # label each indicator-country line at its latest year
  geom_text(data = dat.l %>% 
              group_by(var, cname) %>% 
              filter(year == max(year)),
            aes(x=year,y=value,label=cname),
            hjust=1,vjust=-1,size=3,alpha=.8) +
  facet_wrap(~var, scales="free") +
  theme_minimal() +  
  theme(legend.position = "none") +
  labs(title = "Plotting QoG Standard data",
       caption = "Data: QoG Standard data")

OECD data

The OECD data includes 1006 variables but covers a smaller group of 36 wealthier countries. The example below uses four indicators:

Total expenditure on health (oecd_pphlthxp_t1c)
Income inequality: GINI index, World Bank estimate (wdi_gini)
Gross National Income per capita (oecd_natinccap_t1)
Adjusted general government debt-to-GDP, excluding unfunded pension liability (oecd_govdebt_t1)

We include all countries and all years available in the data.

library(rqog)
# Download a local copy of the file
oecd <- read_qog("oecd", "time-series")
# Subset the data
dat.l <- oecd %>% 
  # select variables
  select(cname,year,oecd_pphlthxp_t1c,wdi_gini,oecd_natinccap_t1,oecd_govdebt_t1) %>% 
  # gather to long format
  gather(., var, value, 3:6) %>% 
  # remove NA values
  filter(!is.na(value))

# Plot the data
ggplot(dat.l, aes(x=year,y=value,color=cname)) + 
  geom_point() + geom_line() +
  geom_text(data = dat.l %>% 
              group_by(var,cname) %>% 
              filter(year == max(year)),
          aes(x=year,y=value,label=cname),
          hjust=1,vjust=-1,size=3,alpha=.8) +
  facet_wrap(~var, scales="free") +
  theme_minimal() +  
  theme(legend.position = "none") +
  labs(title = "Plotting QoG OECD data",
       caption = "Data: QoG OECD data")
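The geom_text() layer above labels each line at its last point, which means keeping, within every indicator-country series, the row with the latest year. Grouping by country alone would leave some series unlabelled whenever a country's indicators end in different years. A base-R sketch of that selection on made-up long-format data (indicator names x and y are placeholders):

```r
# Made-up long data: indicator x country series of different lengths
long <- data.frame(
  var = c("x", "x", "y", "y", "y"),
  cname = c("A", "A", "A", "A", "B"),
  year = c(2000, 2005, 2000, 2003, 2001),
  value = c(1, 2, 3, 4, 5)
)

# Key identifying each series (indicator x country)
series <- interaction(long$var, long$cname, drop = TRUE)

# Latest year within each series, repeated for every row of that series
last_year <- ave(long$year, series, FUN = max)

# Rows to label: the last observation of each series
to_label <- long[long$year == last_year, ]
print(to_label)

# For comparison: grouping by country only loses the label of series y/A
by_country <- long[long$year == ave(long$year, long$cname, FUN = max), ]
print(by_country)
```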

Work with metadata and factor indicators

The package ships with six metadata data frames for each year from 2016 to 2022; for 2022 they are meta_basic_cs_2022, meta_basic_ts_2022, meta_std_cs_2022, meta_std_ts_2022, meta_oecd_cs_2022 and meta_oecd_ts_2022. The data frames are generated from the original SPSS versions of the data using the tidymetadata::create_metadata() function.
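The 2022 names follow the pattern meta_<data>_<type>_<year>, with std for the Standard data and cs/ts for cross-sectional/time-series. A small sketch of a hypothetical helper, meta_object_name() (not part of rqog), that maps the read_qog() arguments used in this vignette onto that pattern; it assumes the other years use the same naming as 2022:

```r
# Hypothetical helper, not an rqog function: builds the name of the
# metadata object matching read_qog()'s which_data / data_type values
meta_object_name <- function(which_data, data_type, year) {
  # [[ ]] stops with an error for values outside these mappings
  data_part <- c(basic = "basic", standard = "std", oecd = "oecd")[[which_data]]
  type_part <- c("cross-sectional" = "cs", "time-series" = "ts")[[data_type]]
  sprintf("meta_%s_%s_%d", data_part, type_part, as.integer(year))
}

meta_object_name("standard", "time-series", 2022)
```

With rqog attached, get() on the returned name should then fetch the data frame, provided an object of that name exists for the requested year.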

Browsing metadata

You can browse the metadata by applying grepl() to the name column. Let's find indicators whose name contains the term "corruption", in either lower or upper case.

library(rqog)
meta_basic_ts_2022[grepl("Corruption", meta_basic_ts_2022$name, ignore.case = TRUE),]
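Because factor indicators have one metadata row per value label, a name search can return the same indicator several times. A base-R sketch on a made-up metadata frame with code, name, value and label columns (mirroring those used in this vignette) that keeps each matching indicator once; the codes and names are placeholders:

```r
# Made-up metadata: one row per indicator value label
meta <- data.frame(
  code = c("ind_a", "ind_a", "ind_b", "ind_c"),
  name = c("Corruption Index", "Corruption Index",
           "Control of corruption", "Population"),
  value = c(0, 1, NA, NA),
  label = c("0. No", "1. Yes", NA, NA)
)

# Case-insensitive match on the name column
hits <- meta[grepl("corruption", meta$name, ignore.case = TRUE), ]

# Keep one row per matching indicator
hits_unique <- hits[!duplicated(hits$code), c("code", "name")]
print(hits_unique)
```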

Assigning labels to values with metadata

The data rqog imports into R is in .csv format, without the value and variable labels that ship with the SPSS and Stata formats. This is the preferred format for working in R, especially with numeric indicators. However, many QoG indicators are factors, meaning that they take discrete values, each with a corresponding label. You can use the metadata to assign labels to the values of such indicators. Let's take ccp_cc as an example and first print the value and label columns of its metadata.

meta_basic_ts_2022 %>% filter(code == "ccp_cc") %>% select(value,label)

We already have the Basic data in R in an object called basic. Let's look at the frequency of each value:

basic %>% count(ccp_cc)

Now, using the metadata, we assign each value its corresponding label:

# metadata rows for ccp_cc
ccp_cc_meta <- meta_basic_ts_2022[meta_basic_ts_2022$code == "ccp_cc",]
basic %>% 
  count(ccp_cc) %>% 
  mutate(ccp_cc_lab = ccp_cc_meta$label[match(ccp_cc, ccp_cc_meta$value)])
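The mutate() call looks up each value's position in the metadata with match() and takes the label at that position; values that do not appear in the metadata get NA. Since the same lookup is repeated for ht_region and ht_colonial further down, it can be wrapped in a small helper. A sketch, where label_values() is a hypothetical name and the metadata frame is made up but uses the code/value/label columns shown above:

```r
# Hypothetical helper: map raw values of one indicator to their labels
label_values <- function(x, meta, indicator) {
  m <- meta[meta$code == indicator, ]
  # position of each value in the metadata; NA when not found
  m$label[match(x, m$value)]
}

# Made-up metadata and data
meta <- data.frame(
  code = c("ind_a", "ind_a", "ind_a"),
  value = c(0, 1, 2),
  label = c("0. No", "1. Partly", "2. Yes")
)
x <- c(2, 0, 0, 5, NA)

res <- label_values(x, meta, "ind_a")
print(res)
```

In the pipeline above this would read mutate(ccp_cc_lab = label_values(ccp_cc, meta_basic_ts_2022, "ccp_cc")).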

Next, let's find the factor variables with the most distinct values in the cross-sectional metadata:

meta_basic_cs_2022 %>% 
  filter(class =="factor") %>% 
  group_by(code) %>% 
  summarise(n_of_values = n()) %>% 
  arrange(desc(n_of_values))
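The same ranking can be sketched in base R with table(), counting the metadata rows (that is, value labels) per factor indicator. The metadata frame below is made up; only its code and class columns mirror the ones used above:

```r
# Made-up metadata: f1 and f2 are factors, n1 is numeric
meta <- data.frame(
  code = c("f1", "f1", "f2", "f2", "f2", "n1"),
  class = c("factor", "factor", "factor", "factor", "factor", "numeric")
)

# Number of value labels per factor indicator, most values first
n_of_values <- sort(table(meta$code[meta$class == "factor"]), decreasing = TRUE)
print(n_of_values)
```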

Let's take two of these factors, ht_region and ht_colonial, and summarise colonial origin by region. First, print the metadata entry for each:

meta_basic_cs_2022 %>% 
  filter(code %in% c("ht_region","ht_colonial")) %>% 
  distinct(code, .keep_all = TRUE)

# lets download the cross-sectional data first
basic_cs <- read_qog(which_data = "basic", data_type = "cross-sectional")

plot_d <- basic_cs %>% 
  # group by region
  group_by(ht_region) %>% 
  # count the countries of each colonial origin within each region
  count(ht_colonial) %>%
  ungroup() %>% 
  # attach value labels from the cross-sectional metadata
  mutate(ht_region_lab = meta_basic_cs_2022[meta_basic_cs_2022$code == "ht_region",]$label[match(ht_region,meta_basic_cs_2022[meta_basic_cs_2022$code == "ht_region",]$value)],
         ht_colonial_lab = meta_basic_cs_2022[meta_basic_cs_2022$code == "ht_colonial",]$label[match(ht_colonial,meta_basic_cs_2022[meta_basic_cs_2022$code == "ht_colonial",]$value)]) %>% 
  na.omit()
head(plot_d)
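An alternative to matching labels by hand is base R's factor() with levels and labels taken from the metadata, after which table() gives the region-by-colonial-origin counts directly. A sketch with made-up values and labels (the real ones would come from meta_basic_cs_2022); it is a base-R substitute, not the vignette's own approach:

```r
# Made-up value labels for two factor indicators
region_meta <- data.frame(value = c(1, 2),
                          label = c("1. Region one", "2. Region two"))
colonial_meta <- data.frame(value = c(0, 1),
                            label = c("0. Never", "1. Colonized (example)"))

# Made-up cross-sectional data: one row per country
cs <- data.frame(region = c(1, 1, 2, 2, 2), colonial = c(0, 1, 1, 1, NA))

# Attach labels; values absent from the metadata become NA
cs$region_lab <- factor(cs$region, levels = region_meta$value,
                        labels = region_meta$label)
cs$colonial_lab <- factor(cs$colonial, levels = colonial_meta$value,
                          labels = colonial_meta$label)

# Countries per region and colonial origin; table() skips NA values
counts <- as.data.frame(table(region = cs$region_lab,
                              colonial = cs$colonial_lab),
                        responseName = "n")
print(counts)
```

Unlike count(), table() also reports combinations with zero countries.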

# let's abbreviate the option '0. Never colonized by a Western overseas colonial power' to '0. Never'
plot_d$ht_colonial_lab[plot_d$ht_colonial_lab == "0. Never colonized by a Western overseas colonial power"] <- '0. Never'

Then we can create a simple bar plot

# indicator names from the metadata
ind_name <- unique(meta_basic_cs_2022[meta_basic_cs_2022$code == "ht_colonial",]$name)
group_name <- unique(meta_basic_cs_2022[meta_basic_cs_2022$code == "ht_region",]$name)

ggplot(plot_d, aes(x=ht_colonial_lab,y=n)) + 
  geom_col() + 
  facet_wrap(~ht_region_lab, scales = "free") + 
  theme_minimal() + theme(axis.text.x = element_text(angle = 90, size = 7)) +
  labs(title = paste0(ind_name," by ",group_name ), 
       caption = "Data: Quality of Government institute", x = NULL, y = "number of countries") +
  coord_flip()