knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(virgo) library(dplyr) library(tidyr) library(gapminder)
This example is inspired by the Many Models example in R for Data Science by Grolemund and Wickham.
To begin we create a drop down menu with each continent
using input_select()
and then we create a line plot for each country's life expectancy over time. Here the selection modifies
the opacity of each line, so we can compare countries within a continent to each other:
selection <- select_bind( continent = input_select( name = "Select Continent:", choices = c(NA, levels(gapminder$continent))) ) lex <- gapminder %>% vega(enc(x = year, y = lifeExp, group = country)) %>% mark_line( enc(opacity = encode_if(selection, 0.5, 0.1)) ) lex
We see that for most countries life expectancy is generally increasing, however several countries from Africa and Asia have dips in their life expectancies.
Next for each country we can estimate a simple linear model to summarise the relationship of how life expectancy has changed over time. We will use $R^2$ to assess the quality of the fit and link that back to our raw data
# we could do this with broom but this seems # ok for now gapminder_augmented <- gapminder %>% group_nest(continent, country) %>% mutate( model = lapply(data, function(x) lm(lifeExp ~ year, data = x)), r2 = vapply(model, function(x) summary(x)$r.squared, numeric(1)), ) %>% unnest(data) %>% select(-model) gapminder_augmented
Now we can create a plot driven selection, that will display the $R^2$ values within each continent:
select_r2 <- select_interval("y") tick_plot <- gapminder_augmented %>% vega(enc(x = continent, y = r2)) %>% mark_tick( enc(opacity = encode_if(select_r2, 1, 0.1)) ) tick_plot
New we can overlay the real data alongside the model $R^2$ values to
identify which countries have poor fits. First, we modify the color scale
to match the provided gapminder::country_colors
and then produce a line plot.
palette <- country_colors[sort(names(country_colors))] country_fits <- gapminder_augmented %>% vega(enc(x = year, y = lifeExp, color = country)) %>% mark_line(selection = select_r2) %>% scale_color(range = palette ,guide = FALSE) country_fits
Next we combine them together to see which countries have poor fits.
hconcat(tick_plot, country_fits)
The countries with $R^2$ below 0.2 have strong non-linear trends in life expectancies hovering over the lines identifies them to be countries affected by genocide (Rwanda) or the HIV/AIDS epidemic (Botswana, Lesotho, Swaziland, Zambia, Zimbabwe).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.