knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%", results = "markup" ) library(dirtyr, warn.conflicts = FALSE, verbose = FALSE)
The goal of dirtyr is to provide additional tools for working with "dirty" data. It is true that on second thought the name "dirtyr" doesn't sound as appropriate as possibly "cleanr", but that has already been taken. In fact, I should probably change this because you may not want to put "dirtyr" into a search engine.
This package is in a very developmental state. Use with caution.
You can install the most recent development version of dirtyr from GitHub with:
devtools::install_github("jmbarbone/dirtyr")
Not every system will provide easily accessible dates.
Systems may allow for ambiguous dates to be recorded alongside exact dates.
unknown_date()
creates a date object out of the ambiguity and provides either the earliest or latest possible dates.
x <- c("UN Feb 2000", "1958", "17 UNK 1992") unknown_date(x, format = "dmy") unknown_date(x, format = "dmy", possible = "latest")
We can also split dates apart when we need to:
x <- c("2010-01-12", "2020-09-30", "1999-12-31") split_date(as.Date(x))
Or even parse a column inside a data frame;
xx <- data.frame( x1 = 1:3, x2 = runif(3), date1 = as.Date(c("1950-10-05", "2020-04-29", "1992-12-17")), x3 = letters[1:3], date2 = as.Date(c("2010-01-12", "2020-09-30", "1999-12-31"))) parse_date(xx, c("date1", "date2"))
x <- 2:5 names(x) <- letters[x] new_index <- c(1, 3, 5, 7) names(new_index) <- letters[new_index] reindex(x, new_index = new_index) reindex(x, new_index = new_index, keep_empty = TRUE)
to_boolean()
presents a simplified method of recoding vectors as logical values.
This function is flexible enough to take strict requirements for the true/false values or a simple operation.
x <- c("Y", "Y", "N", "N", "N/A", "Y - not sure", "YN", "YY") to_boolean(x, "Y", names= TRUE) y <- c(1, 2, 3, 4) to_boolean(y, 1, 2) to_boolean(y, "<= 1", 2) tibble::tibble(x = x, y = rep(y, 2)) %>% dplyr::mutate(xb = to_boolean(x, c("Y", "Y - not sure"), na = "N/A"), yb = to_boolean(y, "<= 1", ">= 3"))
## coming soon...
Easier conversions of non-numeric data to numeric (not just double). Here numeric is a value that can either be an integer or a double.
x <- c(as.character(1:5), NA, letters[1:2], "1e4") ## Check if values may be numeric maybe_numeric(x, names = TRUE) maybe_integer(x, names = TRUE) maybe_integer(c(x, 1.2), names = TRUE) ## Throws a warning as.numeric(x) ## Doesn't throw a warning (res <- to_numeric(x)) ## correctly identifies as integers class(res) ## control for turning off x %>% to_numeric(int = FALSE) %>% class()
This has special behaviors for factors where the character label is used rather than the numeric level.
f <- factor(c(seq(0, 1, .2), "No", "Na_character_")) ## Check if possibly numeric maybe_numeric(f, names = TRUE) maybe_integer(f, names = TRUE) ## Uses factor level as.numeric(f) ## uses text to_numeric(f)
This flips a vector (just a wrapper for rev()
) or a data.frame or matrix.
This does not perform any sorting on the data frame, just a flip.
iris %>% head() iris %>% head() %>% reverse()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.