I explicitly use this package to teach data cleaning, so I have refactored my old cleaning code into several scripts, which I also include as compiled Markdown reports. Caveat: these are realistic cleaning scripts, not the highly polished ones people write with 20/20 hindsight :) I wouldn't necessarily clean the data the same way again (and I would download more recent data!), but at this point there is great value in reproducing the data I've been using for ~5 years.

Cleaning history

## 2015-12-30 I'm using the dev version of almost everything but don't
## want to clutter the README with session_info(),
## so here are the important versions
##  purrr      * 0.1.0.9000 2015-12-29 Github (hadley/purrr@13ba73a)
##  tidyr      * 0.3.1.9000 2015-12-31 Github (hadley/tidyr@f85cdf4)
suppressPackageStartupMessages(library(dplyr))
library(stringr)
suppressPackageStartupMessages(library(purrr))
library(tidyr)
library(knitr)

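# List the files in this directory and split each name into
# script number, slug, and extension on "_" or "."
fls <- list.files()
x <- data_frame(fls) %>%
  separate(fls, c("script", "slug", "ext"), "[_\\.]", remove = FALSE)
# Keep numbered scripts and only their R, Markdown, and TSV files
x <- x %>%
  filter(script %>% str_detect("^[0-9]+"),
         ext %>% str_detect("^(R|r|md|tsv)$")) %>%
  select(-slug)
# One row per script number, with its files nested in a list-column
y <- x %>%
  group_by(script) %>%
  nest()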

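# Turn a vector of file names into comma-separated Markdown links
collapse_md_links <- function(x) {
  if (length(x) == 0) return("")  # avoid emitting an empty "[]()" link
  paste0("[", x, "](", x, ")") %>%
    paste(collapse = ", ")
}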
# Build one table row per script: links to its R script, notebook, and TSV
jfun <- function(z) {
  data_frame(r_script = z$fls[z$ext %in% c("R", "r")] %>% collapse_md_links(),
             notebook = z$fls[z$ext == "md"] %>% collapse_md_links(),
             tsv = z$fls[z$ext == "tsv"] %>% collapse_md_links())
}

y$data %>% 
  map_df(jfun) %>% 
  kable()
devtools::session_info()
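
To make the file-name convention concrete, here is a minimal base-R sketch of the split-and-link step used above. The file names are hypothetical, chosen only to follow the `NN_slug.ext` pattern the code expects, and `md_links` is an illustrative stand-in for `collapse_md_links`, not part of the package.

```r
# Hypothetical file names, for illustration only; not taken from the repository.
fls <- c("01_example-step.R", "01_example-step.md", "02_another-step.R")

# Split each name on "_" or ".", the same pattern passed to separate().
parts <- do.call(rbind, strsplit(fls, "[_\\.]"))
colnames(parts) <- c("script", "slug", "ext")

# Illustrative helper: comma-separated Markdown links for a vector of names.
md_links <- function(f) paste(paste0("[", f, "](", f, ")"), collapse = ", ")

# Group the R scripts by their numeric prefix and build the links.
is_r <- parts[, "ext"] == "R"
r_links <- tapply(fls[is_r], parts[is_r, "script"], md_links)
print(r_links)
```

Because the split pattern only matches `_` and `.`, a hyphenated slug such as `example-step` stays in one piece, which is why the three-column split works for these names.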


YTLogos/gapminder documentation built on May 20, 2019, 1:47 p.m.