| impute_date | R Documentation |
This function imputes missing **month** and/or **day** components in partial date strings where the **year** is known. It assumes input dates are provided in either the *dmy* format (day-month-year) **or** the *ymd* format (year-month-day) and does not process datetime values or strings containing time components or non-date characters.
impute_date(
data_frame,
column_name,
date_format = "ymd",
separator = "-",
year = "UNKN",
month = "UNK",
day = "UN",
min_max = "min",
suffix = "_DT"
)
| data_frame | A data frame. |
| column_name | Name of the column that holds the dates to be imputed. |
| date_format | Order of the date components; "ymd" by default. Choose "ymd" (year, then month, then day) or "dmy" (day, then month, then year). |
| separator | Character that separates the date components; "-" by default. For example, "2024-10-21" uses the "-" separator. |
| year | Token that marks an unknown year; "UNKN" by default. |
| month | Token that marks an unknown month; "UNK" by default. |
| day | Token that marks an unknown day; "UN" by default. |
| min_max | Imputation direction; "min" by default. "min" imputes the earliest possible date, "max" imputes the latest possible date. |
| suffix | Suffix appended to the source variable name to form the name of the new imputed date column; "_DT" by default. |
If the **year** is missing or explicitly marked as unknown (e.g., '"UNKN"'), the function returns 'NA'. With the default 'min_max = "min"', a missing **month** is imputed as **January (01)** and a missing **day** is imputed as the **first day of the month (01)**. With 'min_max = "max"', the latest possible date is imputed instead.
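The rules above can be sketched in base R for a single "ymd" string. This is an illustration only, not the package's implementation: `impute_one` is a hypothetical helper, and keeping a known day when only the month is imputed is an assumption.

```r
# Hypothetical helper (not part of the package): impute one "ymd" string.
impute_one <- function(x, min_max = "min", separator = "-",
                       year = "UNKN", month = "UNK", day = "UN") {
  parts <- strsplit(x, separator, fixed = TRUE)[[1]]
  length(parts) <- 3  # pad with NA so "2025" or "2025-11" count as partial
  y <- parts[1]
  m <- parts[2]
  d <- parts[3]

  # Unknown year: nothing can be imputed
  if (is.na(y) || y == year) return(as.Date(NA))

  # Unknown month: January for "min", December for "max"
  if (is.na(m) || m == month) m <- if (min_max == "min") "01" else "12"

  # Unknown day: first day for "min", last day of the month for "max"
  if (is.na(d) || d == day) {
    first <- as.Date(sprintf("%s-%s-01", y, m))
    if (min_max == "min") return(first)
    # Last day = first day of the following month minus one day
    next_month <- seq(first, by = "month", length.out = 2)[2]
    return(next_month - 1)
  }

  # Assumption: a known day is kept even when the month was imputed
  as.Date(sprintf("%s-%s-%s", y, m, d))
}

impute_one("2025-11-UN")                     # earliest date in November 2025
impute_one("2025-UNK-UN", min_max = "max")   # latest date in 2025
impute_one("UNKN-11-23")                     # NA, because the year is unknown
```

Computing the last day as "first day of the next month minus one" avoids a hard-coded table of month lengths and handles leap years.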
Any datetime strings (e.g., '"NA-01-2025T11:10:00"') must be preprocessed to remove the time component before applying this function (e.g., convert to '"NA-01-2025"').
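One possible way to do this preprocessing in base R is shown below. It assumes the time component starts with "T" followed by hours and minutes (ISO 8601 style); the variable names are illustrative.

```r
# Example input: one datetime string and one plain date string
dates <- c("NA-01-2025T11:10:00", "15-01-2025")

# Drop everything from "T<hh>:<mm>" to the end of the string.
# Requiring digits after "T" avoids cutting text such as a month name
# that merely contains the letter "T".
dates_clean <- sub("T[0-9]{2}:[0-9]{2}.*$", "", dates)

print(dates_clean)
```

Strings without a time component pass through unchanged, so the same call can be applied to a whole column.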
In addition to imputing the date, the function creates an accompanying **flag variable** named as: '"<source_variable>_<suffix>F"'. This flag variable indicates the type of imputation performed:
'NA' — No imputation was performed (the original date was complete).
'"D"' — The **day** component was imputed.
'"M"' — The **month** component was imputed.
'"D, M"' — Both **month** and **day** components were imputed.
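The flag values listed above could be derived as in the following sketch. `flag_one` is a hypothetical helper, not the package's code, and it assumes "ymd" input with all three components present and a known year.

```r
# Hypothetical helper (not part of the package): derive the flag for one string.
flag_one <- function(x, separator = "-", month = "UNK", day = "UN") {
  parts <- strsplit(x, separator, fixed = TRUE)[[1]]
  month_imputed <- parts[2] == month
  day_imputed <- parts[3] == day

  if (day_imputed && month_imputed) return("D, M")
  if (day_imputed) return("D")
  if (month_imputed) return("M")
  NA_character_  # complete date, nothing imputed
}

dates <- c("2025-11-23", "2025-11-UN", "2025-UNK-23", "2025-UNK-UN")
flags <- vapply(dates, flag_one, character(1), USE.NAMES = FALSE)
print(flags)
```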
A data frame identical to the input, with an additional column holding the imputed dates and the accompanying flag variable. The imputed column name is constructed by appending the value of 'suffix' (by default "_DT") to the source variable name.
Lukasz Andrzejewski
impute_date(data_frame = data.frame(K = c('2025 11 UN', '2025 UNK 23')),
column_name = "K", separator = " ")