check_dates | R Documentation |
The resulting cleaning dictionary can be manually reviewed to fill in
appropriate replacement values for each non-valid date value, or a
missing-value keyword indicating that the value should be converted to NA,
and then used with function clean_dates.
Similar to check_numeric, values are considered 'non-valid' if they
cannot be coerced using a given function. The default date-coercing function
is parse_dates, which can handle a wide variety of date formats, but the
user could alternatively specify a simpler function like as.Date. The
user may also specify additional expressions that would indicate a non-valid
date value. For example, the expression date_admit > Sys.Date() could be
used to check for admission dates in the future.
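The two kinds of check described above (coercion failure and an additional expression) can be illustrated with a minimal base R sketch. It uses as.Date with a fixed format in place of parse_dates, and the vector raw is made up for illustration; it is not a description of how check_dates is implemented internally.

```r
# Hypothetical raw date values, as they might appear in a line list
raw <- c("2021-01-05", "05/01/2021", "unknown", "2999-12-31", NA)

# Coerce with a simple function; values that cannot be parsed become NA
parsed <- as.Date(raw, format = "%Y-%m-%d")

# Non-valid by coercion: the value was present but could not be parsed
non_valid <- !is.na(raw) & is.na(parsed)

# Non-valid by an additional expression: the parsed date is in the future
in_future <- !is.na(parsed) & parsed > Sys.Date()

raw[non_valid]
raw[in_future]
```

Values that are missing to begin with are not flagged in this sketch, since there is nothing to correct.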
check_dates(
  x,
  vars,
  vars_id,
  queries = list(),
  dict_clean = NULL,
  fn = parse_dates,
  na = ".na",
  populate_na = FALSE
)
x: A data frame with one or more columns to check
vars: Names of date columns within x
vars_id: Vector of one or more ID columns within x
queries: Optional list of expressions to check for non-valid dates. May
include the placeholder .x to refer to any of the date columns, e.g.
  list(
    date_admit > date_exit,  # admission later than exit
    .x > Sys.Date()          # any date in future
  )
dict_clean: Optional dictionary of value-replacement pairs (e.g.
produced by a prior run)
fn: Function to parse raw date values. Defaults to parse_dates
na: Keyword to use within column "replacement" for values that should
be converted to NA
populate_na: Logical indicating whether to pre-populate column
"replacement" with values specified by the keyword given in na
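To make the queries argument more concrete, the following base R sketch shows one way that expressions like those above could be evaluated against a data frame, with .x bound to each date column in turn. The data frame dat, the cutoff date, and the evaluation approach are all assumptions made for illustration; the package may evaluate queries differently.

```r
# Hypothetical data with already-parsed date columns
dat <- data.frame(
  id = c("A1", "A2", "A3"),
  date_admit = as.Date(c("2021-01-03", "2021-02-10", "2021-03-01")),
  date_exit  = as.Date(c("2021-01-09", "2021-02-01", "2999-01-01")),
  stringsAsFactors = FALSE
)
vars <- c("date_admit", "date_exit")

# A query that names specific columns is evaluated once within the data
q_cols <- quote(date_admit > date_exit)
hit_cols <- eval(q_cols, dat)

# A query using .x is evaluated once per date column, binding .x each time
q_any <- quote(.x > as.Date("2022-01-01"))
hit_any <- sapply(vars, function(v) {
  eval(q_any, c(list(.x = dat[[v]]), as.list(dat)))
})

dat$id[hit_cols]              # rows with admission later than exit
dat$id[rowSums(hit_any) > 0]  # rows where any date is after the cutoff
```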
Data frame representing a dictionary of non-valid values, to be used in a future data cleaning step (after specifying the corresponding replacement values). Columns include:
columns specified in vars_id
variable: column name of date variable within x
value: raw date value
date: parsed date value
replacement: correct value that should replace a given non-valid value
query: which query was triggered by the given raw date value (if any)
Note that, unlike functions check_numeric and check_categorical,
which only return rows corresponding to non-valid values, this function
returns all date values corresponding to any observation (i.e. row) with at
least one non-valid date value. This is to provide context for the non-valid
value and aid in making the appropriate correction.
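The "context" behaviour described in this note can be sketched in base R as follows. The data frame ll, the use of as.Date instead of parse_dates, and the way the replacement column is filled in are assumptions for illustration only; the actual output of check_dates may differ in column order and detail.

```r
# Hypothetical line list with raw (character) dates
ll <- data.frame(
  id = c("P1", "P2", "P3"),
  date_onset = c("2021-01-02", "2021-13-01", "2021-03-04"),
  date_admit = c("2021-01-05", "2021-02-03", "2021-03-06"),
  stringsAsFactors = FALSE
)
vars <- c("date_onset", "date_admit")

# Reshape to one row per ID and date variable
long <- data.frame(
  id = rep(ll$id, times = length(vars)),
  variable = rep(vars, each = nrow(ll)),
  value = unlist(ll[vars], use.names = FALSE),
  stringsAsFactors = FALSE
)
long$date <- as.Date(long$value, format = "%Y-%m-%d")

# IDs with at least one value that was present but failed to parse
bad_ids <- unique(long$id[!is.na(long$value) & is.na(long$date)])

# Keep every date value for those IDs, so the reviewer sees the context
dict <- long[long$id %in% bad_ids, ]
dict$replacement <- NA_character_

# During manual review, a value could be corrected or marked with the
# missing-value keyword (here the default ".na")
dict$replacement[is.na(dict$date)] <- ".na"
dict
```

Here the valid admission date for P2 is kept alongside the unparseable onset date, which is the context a reviewer would need to decide on a correction.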
# load example dataset
data(ll1)

# basic output
check_dates(
  ll1,
  vars = c("date_onset", "date_admit", "date_exit"),
  vars_id = "id"
)

# add additional queries to evaluate
check_dates(
  ll1,
  vars = c("date_onset", "date_admit", "date_exit"),
  vars_id = "id",
  queries = list(
    date_onset > date_admit,
    date_admit > date_exit,
    .x > as.Date("2021-01-01")
  )
)