This package provides a single dataset of censal and intercensal population estimates for the United States by year of age and sex, for every year from 1900 to 2019.
uscenpops is a data package for use in teaching.
You can install the beta version of uscenpops from GitHub with:
devtools::install_github("kjhealy/uscenpops")
drat
While using install_github() works just fine, it would be nicer to be able to just type install.packages("uscenpops") or update.packages(oldPkgs = "uscenpops") in the ordinary way. We can do this using Dirk Eddelbuettel's drat package. Drat provides a convenient way to make R aware of package repositories other than CRAN.
First, install drat:
if (!require("drat")) {
  install.packages("drat")
  library("drat")
}
Then use drat to tell R about the repository where uscenpops is hosted:
drat::addRepo("kjhealy")
You can now install uscenpops:
install.packages("uscenpops")
To ensure that the uscenpops repository is always available, you can add the following line to your .Rprofile or Rprofile.site file:
drat::addRepo("kjhealy")
With that in place you'll be able to run install.packages("uscenpops") or update.packages(oldPkgs = "uscenpops") and have everything work as you'd expect.
Note that the drat repository only contains data packages that are not on CRAN, so you will never be in danger of grabbing the wrong version of any other package.
The package works best with the tidyverse libraries.
library(tidyverse)
Load the data:
library(uscenpops)
uscenpops
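As a quick illustration of working with data in this shape, here is a minimal sketch that sums the male and female counts within each year and computes the female share. It uses a small made-up data frame (`toy`, with hypothetical counts) as a stand-in, so it runs without the package installed; the column names `year`, `age`, `male`, and `female` are assumed from the pyramid example in this README. It also uses base R only, so no extra packages are needed.

```r
# Toy stand-in for uscenpops: hypothetical counts, assumed column names
toy <- data.frame(
  year   = c(1900, 1900, 1901, 1901),
  age    = c(0, 1, 0, 1),
  male   = c(10, 20, 30, 40),
  female = c(15, 25, 35, 45)
)

# Sum the male and female counts across ages within each year
totals <- aggregate(cbind(male, female) ~ year, data = toy, FUN = sum)

# Female share of each year's total, in percent
totals$pct_female <- 100 * totals$female / (totals$male + totals$female)

totals
```

With the package loaded, replacing `toy` with `uscenpops` should give the same summary for the real estimates, provided the column names match.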
library(dplyr)
library(tidyr)  # provides pivot_longer()
library(ggplot2)

pop_pyr <- uscenpops %>%
  select(year, age, male, female) %>%
  pivot_longer(male:female, names_to = "group", values_to = "count") %>%
  group_by(year, group) %>%
  mutate(total = sum(count),
         pct = (count/total)*100,
         base = 0)

pop_pyr
## Axis labels
mbreaks <- c("1M", "2M", "3M")

## colors
pop_colors <- c("#E69F00", "#0072B2")

## In-plot year labels
dat_text <- data.frame(
  label = c(seq(1900, 2015, 5), 2019),
  year = c(seq(1900, 2015, 5), 2019),
  age = rep(95, 25),
  count = rep(-2.75e6, 25)
)

pop_pyr$count[pop_pyr$group == "male"] <- -pop_pyr$count[pop_pyr$group == "male"]

p <- pop_pyr %>%
  filter(year %in% c(seq(1900, 2015, 5), 2019)) %>%
  ggplot(mapping = aes(x = age, ymin = base, ymax = count, fill = group))

p + geom_ribbon(alpha = 0.9, color = "black", size = 0.1) +
  geom_label(data = dat_text,
             mapping = aes(x = age, y = count, label = label),
             inherit.aes = FALSE,
             vjust = "inward", hjust = "inward",
             fontface = "bold", color = "gray40", fill = "gray95") +
  scale_y_continuous(labels = c(rev(mbreaks), "0", mbreaks),
                     breaks = seq(-3e6, 3e6, 1e6),
                     limits = c(-3e6, 3e6)) +
  scale_x_continuous(breaks = seq(10, 100, 10)) +
  scale_fill_manual(values = pop_colors, labels = c("Females", "Males")) +
  guides(fill = guide_legend(reverse = TRUE)) +
  labs(x = "Age", y = "Population in Millions",
       title = "Age Distribution of the U.S. Population, 1900-2019",
       subtitle = "Age is top-coded at 75 until 1939, at 85 until 1979, and at 100 since then",
       caption = "Kieran Healy / kieranhealy.org / Data: US Census Bureau.",
       fill = "") +
  theme(legend.position = "bottom",
        plot.title = element_text(size = rel(2), face = "bold"),
        strip.background = element_blank(),
        strip.text.x = element_blank()) +
  coord_flip() +
  facet_wrap(~ year, ncol = 5)
The data are sourced from the US Census Bureau, from the residential estimates available in various formats and spans at https://www2.census.gov/programs-surveys/popest/tables/. In any year where multiple months were available, the July estimate was used.
citation("uscenpops")