I explicitly use this package to teach data cleaning, so I have refactored my old cleaning code into several scripts. I also include them as compiled Markdown reports. Caveat: these are realistic cleaning scripts! Not the highly polished ones people write with 20/20 hindsight :) I wouldn't necessarily clean the data the same way again (and I would download more recent data!), but at this point there is great value in reproducing the data I've been using for ~5 years.
Cleaning history
gdata package. It was kind of painful, due to encoding and other issues. See the scripts in this state in v0.1.0.

readxl. This was much less painful. Present day.

    library(tidyverse)
    library(stringr)
    library(knitr)
    library(here)

    # List the files in data-raw and split each name into script, slug, and extension
    x <- tibble(fls = list.files(here("data-raw"))) %>%
      mutate(fls_basename = basename(fls)) %>%
      separate(fls_basename, c("script", "slug", "ext"), "[_\\.]")

    # Keep only the numbered scripts and their R, Markdown, and TSV files
    x <- x %>%
      filter(
        script %>% str_detect("^[0-9]+"),
        ext %>% str_detect("R|r|md|tsv")
      ) %>%
      select(-slug)

    # One nested data frame per script number
    y <- x %>%
      group_by(script) %>%
      nest()

    # Turn a vector of file names into comma-separated Markdown links
    collapse_md_links <- function(x) {
      x %>%
        { paste0("[", ., "](", ., ")") } %>%
        paste(collapse = ", ")
    }

    # Build one table row per script: R script, notebook, and TSV output
    jfun <- function(z) {
      tibble(
        r_script = z$fls[z$ext == "R"] %>% collapse_md_links(),
        notebook = z$fls[z$ext == "md"] %>% collapse_md_links(),
        tsv = z$fls[z$ext == "tsv"] %>% collapse_md_links()
      )
    }

    y$data %>%
      map_df(jfun) %>%
      kable()
devtools::session_info()