clean_dates: Clean date variables within a dataset based on a dictionary...

View source: R/clean_dates.R

clean_dates    R Documentation

Clean date variables within a dataset based on a dictionary of value-replacement pairs

Description

Applies a dictionary of value-replacement pairs and a conversion function (parse_dates by default) to clean and standardize the values of date variables. For this approach to work, the date columns of the original dataset should generally be imported as type "text" or "character", so that invalid values are not automatically coerced to missing values on import.
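The import advice above can be illustrated with base R alone. This is a minimal sketch and not part of the package: the inline CSV, the id values, and the column name date_onset are made up for illustration. It shows that reading date columns as character keeps the raw strings available for later correction, whereas an eager conversion silently turns anything unparseable into NA.

```r
# Hypothetical raw data: one valid ISO date, one date in another format,
# and one free-text entry.
csv_text <- "id,date_onset\nM1,2021-03-04\nM2,04/03/2021\nM3,unknown\n"

# Import every column as character so that no value is coerced on import.
dat <- read.csv(text = csv_text, colClasses = "character")
str(dat)

# Converting eagerly with a single format discards the raw values that
# fail to parse, so they can no longer be inspected or corrected.
eager <- as.Date(dat$date_onset, format = "%Y-%m-%d")
is.na(eager)

# The character column still holds the original strings.
dat$date_onset[is.na(eager)]
```

With the raw strings preserved, problematic entries remain visible and can be matched against a cleaning dictionary.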

Usage

clean_dates(x, vars, vars_id, dict_clean = NULL, fn = parse_dates, na = ".na")

Arguments

x

A data frame with one or more date columns to clean

vars

Names of date columns within x to clean

vars_id

Vector of one or more ID columns within x on which corrections should be conditional.

dict_clean

Optional dictionary of value-replacement pairs (e.g. produced by a prior run of check_dates). Must include columns "variable", "value", "replacement", and all columns specified by vars_id.

fn

Function to parse raw date values. Defaults to parse_dates.

na

Keyword to use within column "replacement" for values that should be converted to NA. Defaults to ".na". The keyword is used to distinguish between "replacement" values that are missing because they have yet to be manually verified, and values that have been verified and really should be converted to NA.
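As a rough illustration of the dict_clean and na arguments described above, the sketch below builds a small dictionary by hand in base R. The id values, raw values, and the format of the replacement dates are assumptions made for this example, not output from check_dates. It shows the required columns and how the ".na" keyword separates entries that are still unverified (replacement is NA) from entries verified as truly missing.

```r
# Hand-built dictionary (hypothetical values). The "id" column stands in
# for whatever columns are passed to vars_id.
dict <- data.frame(
  id          = c("M2", "M3", "M4"),
  variable    = c("date_onset", "date_onset", "date_admit"),
  value       = c("04/03/2021", "unknown", "31/02/2021"),
  replacement = c("2021-03-04", ".na", NA),
  stringsAsFactors = FALSE
)

# Check that the columns required by dict_clean are present.
required <- c("variable", "value", "replacement", "id")
all(required %in% names(dict))

# Classify each entry by its replacement value.
na_keyword <- ".na"
status <- ifelse(
  is.na(dict$replacement),
  "not yet verified",
  ifelse(dict$replacement == na_keyword,
         "verified: convert to NA",
         "verified: replace value")
)
data.frame(dict, status)
```

Keeping an explicit keyword for verified-missing values means that a blank replacement can always be read as "still needs review".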

Value

The original data frame x, but with cleaned versions of the date variables specified in argument vars.

Examples

# load example dataset and cleaning dictionary
data(ll1)
data(clean_dates1)

# clean dates using only date coercion function
clean_dates(
  ll1,
  vars = c("date_onset", "date_admit", "date_exit"),
  vars_id = "id"
)

# clean dates using dictionary and coercion function
clean_dates(
  ll1,
  vars = c("date_onset", "date_admit", "date_exit"),
  vars_id = "id",
  dict_clean = clean_dates1
)


epicentre-msf/dbc documentation built on Oct. 24, 2023, 9:25 p.m.