knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  results = "markup"
)
library(dirtyr, warn.conflicts = FALSE, verbose = FALSE)

dirtyr

The goal of dirtyr is to provide additional tools for working with "dirty" data. It is true that on second thought the name "dirtyr" doesn't sound as appropriate as possibly "cleanr", but that has already been taken. In fact, I should probably change this because you may not want to put "dirtyr" into a search engine.

This package is in a very developmental state. Use with caution.

Installation

You can install the most recent development version of dirtyr from GitHub with:

devtools::install_github("jmbarbone/dirtyr")

Dates

Not every system will provide easily accessible dates. Systems may allow for ambiguous dates to be recorded alongside exact dates. unknown_date() creates a date object out of the ambiguity and provides either the earliest or latest possible dates.

x <- c("UN Feb 2000", "1958", "17 UNK 1992")

unknown_date(x, format = "dmy")

unknown_date(x, format = "dmy", possible = "latest")

We can also split dates apart when we need to:

x <- c("2010-01-12", "2020-09-30", "1999-12-31")
split_date(as.Date(x))

Or even parse a column inside a data frame;

xx <- data.frame(
  x1 = 1:3,
  x2 = runif(3),
  date1 = as.Date(c("1950-10-05", "2020-04-29", "1992-12-17")),
  x3 = letters[1:3],
  date2 = as.Date(c("2010-01-12", "2020-09-30", "1999-12-31")))

parse_date(xx, c("date1", "date2"))

Reindex

x <- 2:5
names(x) <- letters[x]
new_index <- c(1, 3, 5, 7)
names(new_index) <- letters[new_index]

reindex(x, new_index = new_index)

reindex(x, new_index = new_index, keep_empty = TRUE)

To boolean

to_boolean() presents a simplified method of recoding vectors as logical values. This function is flexible enough to take strict requirements for the true/false values or a simple operation.

x <- c("Y", "Y", "N", "N", "N/A", "Y - not sure", "YN", "YY")
to_boolean(x, "Y", names= TRUE)

y <- c(1, 2, 3, 4)
to_boolean(y, 1, 2)

to_boolean(y, "<= 1", 2)

tibble::tibble(x = x,
               y = rep(y, 2)) %>% 
  dplyr::mutate(xb = to_boolean(x, c("Y", "Y - not sure"), na = "N/A"),
                yb = to_boolean(y, "<= 1", ">= 3"))

QC

## coming soon...

Smaller things

Text to numeric

Easier conversions of non-numeric data to numeric (not just double). Here numeric is a value that can either be an integer or a double.

x <- c(as.character(1:5), NA, letters[1:2], "1e4")

## Check if values may be numeric
maybe_numeric(x, names = TRUE)

maybe_integer(x, names = TRUE)

maybe_integer(c(x, 1.2), names = TRUE)

## Throws a warning
as.numeric(x)

## Doesn't throw a warning
(res <- to_numeric(x))

## correctly identifies as integers
class(res)

## control for turning off
x %>% 
  to_numeric(int = FALSE) %>% 
  class()

This has special behaviors for factors where the character label is used rather than the numeric level.

f <- factor(c(seq(0, 1, .2), "No", "Na_character_"))

## Check if possibly numeric
maybe_numeric(f, names = TRUE)

maybe_integer(f, names = TRUE)

## Uses factor level
as.numeric(f)

## uses text
to_numeric(f)                    

Reversing data.frames

This flips a vector (just a wrapper for rev()) or a data.frame or matrix. This does not perform any sorting on the data frame, just a flip.

iris %>% 
  head()

iris %>% 
  head() %>% 
  reverse()


jmbarbone/dirtyr documentation built on Sept. 23, 2020, 4:05 a.m.