datefixR

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(datefixR)

This vignette describes the functionality of datefixR in more detail than the README. DatefixR is a lightweight package consisting of a two user-accessible functions, fix_date_char() and fix_date_df(), which converts dates written in different formats into R's built-in Date class. The former is designed to modify character vectors whilst the latter is intended for modifying columns of a data frame (or tibble).

Practically, this package is most useful when handling date data which has been supplied via text boxes (instead of a date-specific input with a consistent date format). However, this package may also be useful to validate the format of date data (see date and month imputation).

Usage

Date standardization

Firstly, we will demonstrate date standardization without imputation. We consider a data frame with two columns of dates in differing formats with no missing data.

```{R, echo = TRUE} bad.dates <- data.frame( id = seq(5), some.dates = c( "02/05/92", "01-04-2020", "1996/05/01", "2020-05-01", "02-04-96" ), some.more.dates = c( "01 03 2015", "02/05/00", "01/05/1990", "03-Dec-2012", "02 April 2020" ) ) knitr::kable(bad.dates)

`fix_date_df()` requires two arguments, `df`, a data frame (or tibble) object
and `col.names`, a character vector containing the names of columns to be
standardized. By default, the first column of the data frame is assumed to
contain row IDs. These IDs are used if a warning or error is raised to assist
with locating the source of the error. The ID column can also be manually
provided via the `id` argument. 

The output from this function is a data frame object with the selected date
columns now belonging to the `Date` class. 

```{R}
fixed.dates <- fix_date_df(
  bad.dates,
  c("some.dates", "some.more.dates")
)
knitr::kable(fixed.dates)

datefixR can handle many different formats including -, /, or white space separation, year-first or day-first, and month supplied as a number, an abbreviation or full length name.

fix_date_char() is similar to fix_date_df() but only applies to a single character object.

fix_date_char("01 02 2014")

Localization

datefixR` supports dates being provided in English, Français (French), Deutsche (German), español (Spanish), or português (Portuguese) by recognizing both the names of the months in these languages and formatting customs. Expected languages do not need to be specified and can be provided just like any other date to be standardized. This can be seen by calling

fix_date_char("7 de septiembre de 2014")

Functions in datefixR assume day-first instead of month-first when day, month, and year are all given (unless year is given first). However this behavior can be modified by passing format = "mdy" to function calls.

fix_date_char("01 02 2014", format = "mdy")

Date and month imputation

By default, datefixR imputes missing months as July, and missing days of the month as the first day. As such, "1992" converts to

fix_date_char("1992")

The argument for defaulting to July is 1-2 July is halfway through the year (on a non leap year). Therefore assuming the year supplied is indeed correct, you are only a maximum of 6 months off from the true date. However, this behavior can be changed by supplying the day.impute and month.impute arguments with an integer corresponding to the desired day and month. For example, day.impute = 1 and month.impute = 1 results in the first day of January being imputed instead.

fix_date_char("1992", day.impute = 1, month.impute = 1)

The imputation mechanism can also be modified to impute NA if a month or day is missing by setting day.impute or month.impute to NA. This will also result in a warning being raised.

fix_date_char("1992", month.impute = NA)

Finally, imputation can be prevented by setting day.impute or month.impute to NULL. This will result in an error being raised if the day or month are missing respectively.

```{R, eval = FALSE} fix_date_char("1992", month.impute = NULL)

ERROR

`day.impute` and `month.impute` can also be passed to `fix_date_df()` for
similar functionality. 

```{R}
example.df <- data.frame(
  id = seq(1, 3),
  some.dates = c("2014", "April 1990", "Mar 19")
)
fix_date_df(example.df, "some.dates", day.impute = 1, month.impute = 1)

Citation

If you use this package in your research, please consider citing datefixR. An up-to-date citation can be obtained by running

citation("datefixR")


Try the datefixR package in your browser

Any scripts or data that you put into this service are public.

datefixR documentation built on Sept. 22, 2022, 5:06 p.m.