inst/script/make-admissions.R

library(tidyverse)
library(rvest)

url <- "https://en.wikipedia.org/wiki/Simpson%27s_paradox#Berkeley_gender_bias_case"
h <- read_html(url)

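# Take the second table on the page (expected to be the department-level
# admissions table) and reshape it to one row per major and gender.
admissions <- h %>% 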
  html_nodes("table") %>% 
  .[[2]] %>% 
  html_table(fill=TRUE)  %>%
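  # Merge the two header rows into names of the form "<gender>_<measure>",
  # then drop the first data row, which held the second header row.
  setNames(tolower(c("major", paste(names(.)[-1], .[1,-1], sep="_")))) %>%
  slice(2:n()) %>%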
  gather(key, value, -major) %>%
  separate(key, c("gender", "key"), "_")  %>%
  spread(key, value) %>% 
  arrange(gender) %>%
  # funs() is deprecated in dplyr; pass the function directly instead.
  mutate_at(c("admitted","applicants"), parse_number) %>%
  data.frame(., stringsAsFactors = FALSE)

save(admissions, file="data/admissions.rda", compress="xz")

dslabs documentation built on May 29, 2024, 6:29 a.m.