I explicitly use this package to teach data cleaning, so I have refactored my old cleaning code into several scripts, which I also include as compiled Markdown reports. Caveat: these are realistic cleaning scripts! Not the highly polished ones people write with 20/20 hindsight :) I wouldn't necessarily clean the data the same way again (and I would download more recent data!), but at this point there is great value in reproducing the data I've been using for ~5 years.
Cleaning history
* An earlier pass used the gdata package. It was kind of painful, due to encoding and other issues. See the scripts in this state in v0.1.0.
* A later pass switched to readxl. This was much less painful. Present day.

    ## 2015-12-30 I'm using dev version of almost everything but don't
    ## want to clutter the README with session_info()
    ## here the important versions
    ## purrr * 0.1.0.9000 2015-12-29 Github (hadley/purrr@13ba73a)
    ## tidyr * 0.3.1.9000 2015-12-31 Github (hadley/tidyr@f85cdf4)
    suppressPackageStartupMessages(library(dplyr))
    library(stringr)
    suppressPackageStartupMessages(library(purrr))
    library(tidyr)
    library(knitr)

    # split each file name into script number, slug, and extension
    fls <- list.files()
    x <- data_frame(fls) %>%
      separate(fls, c("script", "slug", "ext"), "[_\\.]", remove = FALSE)
    # keep only the numbered scripts and their R, md, and tsv files
    x <- x %>%
      filter(script %>% str_detect("^[0-9]+"),
             ext %>% str_detect("R|r|md|tsv")) %>%
      select(-slug)
    # one nested data frame of files per script number
    y <- x %>%
      group_by(script) %>%
      nest()
    # turn file names into comma-separated Markdown links
    collapse_md_links <- function(x) {
      x %>% {
        paste0("[", ., "](", ., ")")
      } %>%
        paste(collapse = ", ")
    }
    # build one table row per script: R script, notebook, tsv
    jfun <- function(z) {
      data_frame(r_script = z$fls[z$ext == "R"] %>% collapse_md_links(),
                 notebook = z$fls[z$ext == "md"] %>% collapse_md_links(),
                 tsv = z$fls[z$ext == "tsv"] %>% collapse_md_links())
    }
    y$data %>%
      map_df(jfun) %>%
      kable()
devtools::session_info()