fix_dates: Clean up messy date columns




Cleans up a dataframe object whose date columns were entered via a free-text box (possibly by different users) and are therefore in a non-standardized format. Supports numerous separators, including /, -, or space. Supports all-numeric, abbreviated, or long-hand month notation. Where the day of the month has not been supplied, the first day of the month is imputed by default. Either DMY or YMD is assumed by default; the US system of MDY is supported via the format argument.
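As a rough illustration of the notations described above, the sketch below passes a column mixing separators and month styles to fix_dates(). The data frame mixed.dates and its values are invented for this example, and it assumes the function is loaded from the datefixR package. No output is shown because it has not been verified here.

```r
library(datefixR) # assumed source package for fix_dates()

# Hypothetical data: slash, dash, and space separators; numeric,
# abbreviated, and long-hand months; one entry with no day supplied
mixed.dates <- data.frame(
  id = seq(5),
  entered = c(
    "02/05/92",
    "01-04-2020",
    "1 May 1996",
    "12 September 2001",
    "jan 2020"
  )
)

# The second argument names the column(s) to clean
cleaned <- fix_dates(mixed.dates, "entered")
```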


Usage

fix_dates(
  df,
  col.names,
  day.impute = 1,
  month.impute = 7,
  id = NULL,
  format = "dmy"
)



Arguments

df: A dataframe or tibble object with messy date column(s).


col.names: Character vector of the names of the columns containing messy date data.


day.impute: Integer. Day of the month to be imputed if not available. Defaults to 1. If day.impute = NA, then NA will be imputed for the date instead and a warning will be raised. If day.impute = NULL, then instead of imputing the day of the month, the function will fail.


month.impute: Integer. Month to be imputed if not available. Defaults to 7 (July). If month.impute = NA, then NA will be imputed for the date instead and a warning will be raised. If month.impute = NULL, then instead of imputing the month, the function will fail.


id: Name of the column containing row IDs. By default, the first column is assumed.


format: Character. The format in which a date is most likely to be given. Either "dmy" (default) or "mdy". If the year appears to have been given first, then YMD is assumed for that observation (the format argument is not used for these observations).
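The arguments above can be combined in a single call. The following is a minimal sketch only: the data frame us.dates and its values are hypothetical, and it assumes fix_dates() is loaded from the datefixR package. Results are not shown because they have not been verified here.

```r
library(datefixR) # assumed source package for fix_dates()

# Hypothetical data: one full month-first date, one entry missing the day,
# and one entry giving only the year
us.dates <- data.frame(
  id = seq(3),
  enrolled = c("12/25/2019", "05/1990", "2015")
)

fixed.us <- fix_dates(
  us.dates,
  "enrolled",
  day.impute = 15,   # use the 15th when the day is missing
  month.impute = 1,  # use January when the month is missing
  id = "id",         # column holding the row IDs
  format = "mdy"     # read ambiguous entries as month-day-year
)
```

Setting day.impute or month.impute to NA or NULL instead changes what happens to incomplete entries, as described in the argument list.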


Value

A dataframe or tibble object, matching the type of df. The selected columns are of type Date.
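One way to check the returned object is to inspect its container type and the class of a cleaned column. The sketch below uses a made-up data frame, raw.visits, and assumes fix_dates() comes from the datefixR package.

```r
library(datefixR) # assumed source package for fix_dates()

# Hypothetical input: a plain data.frame with one messy date column
raw.visits <- data.frame(
  id = seq(2),
  visit = c("01-04-2020", "1996/05/01")
)

clean.visits <- fix_dates(raw.visits, "visit")

# Inspect the container type and the class of the cleaned column
is.data.frame(clean.visits)
class(clean.visits$visit)
```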

See Also

fix_date: Similar to fix_dates(), except that it can only be applied to character objects.


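Examples

library(datefixR)

bad.dates <- data.frame(
  id = seq(5),
  # Illustrative values (assumed): mixed separators, day-first and year-first
  some.dates = c("02/05/92", "01-04-2020", "1996/05/01", "2020-05-01", "02-04-96"),
  # Illustrative values (assumed, except "jan 2020"): day or month missing
  some.more.dates = c("2015", "02/05/00", "05/1990", "2012-08", "jan 2020")
)
fixed.df <- fix_dates(bad.dates, c("some.dates", "some.more.dates"))
# ->
fixed.df <- fix_date_df(bad.dates, c("some.dates", "some.more.dates"))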

