knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(virgo)
library(dplyr)
library(tidyr)
library(gapminder)

This example is inspired by the Many Models example in R for Data Science by Grolemund and Wickham.

To begin we create a drop down menu with each continent using input_select() and then we create a line plot for each country's life expectancy over time. Here the selection modifies the opacity of each line, so we can compare countries within a continent to each other:

selection <- select_bind(
  continent = input_select(
    name = "Select Continent:",
    choices = c(NA, levels(gapminder$continent)))
)

lex <- gapminder %>% 
  vega(enc(x = year, y = lifeExp, group = country)) %>% 
  mark_line(
    enc(opacity = encode_if(selection, 0.5, 0.1))
  )
lex 

We see that for most countries life expectancy is generally increasing, however several countries from Africa and Asia have dips in their life expectancies.

Next for each country we can estimate a simple linear model to summarise the relationship of how life expectancy has changed over time. We will use $R^2$ to assess the quality of the fit and link that back to our raw data

# we could do this with broom but this seems
# ok for now
gapminder_augmented <- gapminder %>% 
    group_nest(continent, country) %>% 
    mutate(
      model = lapply(data, function(x) lm(lifeExp ~ year, data = x)),
      r2 = vapply(model, function(x) summary(x)$r.squared, numeric(1)),
    ) %>% 
  unnest(data) %>% 
  select(-model)

gapminder_augmented

Now we can create a plot driven selection, that will display the $R^2$ values within each continent:

select_r2 <- select_interval("y")

tick_plot <- gapminder_augmented %>% 
    vega(enc(x = continent, y = r2)) %>% 
    mark_tick(
      enc(opacity = encode_if(select_r2, 1, 0.1))
    )
tick_plot

New we can overlay the real data alongside the model $R^2$ values to identify which countries have poor fits. First, we modify the color scale to match the provided gapminder::country_colors and then produce a line plot.

palette <- country_colors[sort(names(country_colors))]

country_fits <- gapminder_augmented %>% 
  vega(enc(x = year, y = lifeExp, color = country)) %>% 
  mark_line(selection = select_r2) %>% 
  scale_color(range = palette ,guide = FALSE)

country_fits

Next we combine them together to see which countries have poor fits.

hconcat(tick_plot, country_fits)

The countries with $R^2$ below 0.2 have strong non-linear trends in life expectancies hovering over the lines identifies them to be countries affected by genocide (Rwanda) or the HIV/AIDS epidemic (Botswana, Lesotho, Swaziland, Zambia, Zimbabwe).



vegawidget/virgo documentation built on May 3, 2021, 7:32 a.m.