I explicitly use this package to teach data cleaning, so I have refactored my old cleaning code into several scripts. I also include the scripts as compiled Markdown reports. Caveat: these are realistic cleaning scripts! Not the highly polished ones people write with 20/20 hindsight :) I wouldn't necessarily clean the data the same way again (and I would download more recent data!), but at this point there is great value in reproducing the data I've been using for ~5 years.

Cleaning history

# packages used for the file inventory below
library(tidyverse)
library(stringr)
library(knitr)
library(here)

# enumerate the files in data-raw/ and split each filename into its
# numeric script prefix, descriptive slug, and file extension
x <- tibble(fls = list.files(here("data-raw"))) %>%
  mutate(fls_basename = basename(fls)) %>%
  separate(fls_basename, c("script", "slug", "ext"), "[_\\.]")

# keep only the numbered files whose extension looks like R, md, or tsv
x <- x %>%
  filter(
    script %>% str_detect("^[0-9]+"),
    ext %>% str_detect("R|r|md|tsv")
  ) %>%
  select(-slug)

# nest by script number, so each row holds all the files for one cleaning step
y <- x %>%
  group_by(script) %>%
  nest()
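
To make the splitting and nesting concrete, here is a quick sketch with made-up filenames (illustrative only, they need not match anything in data-raw/): separate() breaks each name on underscores and dots, and nest() then gathers the files that share a numeric prefix.

# toy filenames, for illustration only
toy <- tibble(fls = c("01_merge.R", "01_merge.md", "01_merge.tsv")) %>%
  separate(fls, c("script", "slug", "ext"), "[_\\.]", remove = FALSE)
toy
# script is "01", slug is "merge", ext is "R", "md", "tsv"

toy %>%
  group_by(script) %>%
  nest()
# one row (script "01") with the three matching files in a list-column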

# wrap each filename in a Markdown link and collapse multiple filenames
# into a single comma-separated string
collapse_md_links <- function(x) {
  x %>%
    {
      paste0("[", ., "](", ., ")")
    } %>%
    paste(collapse = ", ")
}

# for one script's worth of files, build a one-row tibble of links:
# the .R script, its rendered .md notebook, and any .tsv output
jfun <- function(z) {
  tibble(
    r_script = z$fls[z$ext == "R"] %>% collapse_md_links(),
    notebook = z$fls[z$ext == "md"] %>% collapse_md_links(),
    tsv = z$fls[z$ext == "tsv"] %>% collapse_md_links()
  )
}
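
As a quick sanity check on the helpers above, with made-up filenames (again, not necessarily real files in data-raw/):

collapse_md_links("01_merge.R")
# "[01_merge.R](01_merge.R)"
collapse_md_links(c("02_fix.md", "02_fix-extra.md"))
# "[02_fix.md](02_fix.md), [02_fix-extra.md](02_fix-extra.md)"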

# apply jfun to each nested group and render the combined result as a table
y$data %>%
  map_df(jfun) %>%
  kable()

# record the package versions used to build this report
devtools::session_info()

