impute_date_ymd: Impute Missing Components in Partial Date Strings

View source: R/impute_date.R

impute_date_ymd    R Documentation

Impute Missing Components in Partial Date Strings

Description

This function imputes missing **month** and/or **day** components in partial date strings where the **year** is known. It assumes input dates are provided in the *ymd* format (year-month-day) and does not process datetime values or strings containing time components or non-date characters.

Usage

impute_date_ymd(
  data_frame,
  column_name,
  separator = "-",
  year = "UNKN",
  month = "UNK",
  day = "UN",
  min_max = "min",
  suffix = "_DT"
)

Arguments

data_frame

data frame

column_name

name of the column that holds the partial dates to be imputed

separator

separator between the year, month, and day components; by default "-", for example "2024-10-21" uses the "-" separator

year

string that marks an unknown year; by default "UNKN"

month

string that marks an unknown month; by default "UNK"

day

string that marks an unknown day; by default "UN"

min_max

controls the imputation direction; by default "min". "min" imputes the earliest possible date; "max" imputes the latest possible date.

suffix

suffix appended to the source variable name to form the name of the new imputed date column; by default "_DT"
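
As a rough illustration of the two imputation directions, the base R sketch below resolves a known year and an optional month to the earliest or latest calendar date. The helper 'resolve_bound' is hypothetical and is not part of datetoiso. Using December for an unknown month in the "max" direction is an assumption drawn from "latest possible date", not something this page confirms.

```r
# Hypothetical helper, not part of datetoiso: resolve a year and an optional
# month to the earliest ("min") or latest ("max") possible calendar date.
resolve_bound <- function(year, month = NA, min_max = "min") {
  min_max <- match.arg(min_max, c("min", "max"))
  if (min_max == "min") {
    # Earliest date: January when the month is unknown, always day 01.
    m <- if (is.na(month)) 1L else as.integer(month)
    return(as.Date(sprintf("%04d-%02d-01", as.integer(year), m)))
  }
  # Latest date: December when the month is unknown (assumption).
  m <- if (is.na(month)) 12L else as.integer(month)
  first <- as.Date(sprintf("%04d-%02d-01", as.integer(year), m))
  # Last day of the month = the day before the first day of the next month.
  next_first <- seq(first, by = "month", length.out = 2)[2]
  next_first - 1
}

resolve_bound(2024, 2, "min")         # 2024-02-01
resolve_bound(2024, 2, "max")         # 2024-02-29 (leap year)
resolve_bound(2024, min_max = "max")  # 2024-12-31
```

Computing the last day from the start of the following month avoids hard-coding month lengths and handles leap years.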

Details

If the **year** is missing or explicitly marked as unknown (e.g., '"UNKN"'), the function returns 'NA'. With the default 'min_max = "min"', a missing **month** is imputed as **January (01)** and a missing **day** as the **first day of the month (01)**. With 'min_max = "max"', the latest possible date is imputed instead.
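
The rules for the default "min" direction can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the datetoiso implementation: the function name 'impute_min_sketch' is hypothetical, and treating a shortened string such as "2024-10" as having a missing day is an assumption.

```r
# Hypothetical sketch, not the datetoiso implementation: apply the "min"
# rules described above to partial year-month-day strings.
impute_min_sketch <- function(x, separator = "-", year = "UNKN",
                              month = "UNK", day = "UN") {
  parts <- strsplit(x, separator, fixed = TRUE)
  out <- vapply(parts, function(p) {
    # Pad to three components so "2024" or "2024-10" count as missing parts.
    length(p) <- 3
    # Unknown or absent year: no date can be imputed.
    if (is.na(p[1]) || p[1] == year) return(NA_character_)
    # Unknown month becomes January, unknown day becomes the first.
    if (is.na(p[2]) || p[2] == month) p[2] <- "01"
    if (is.na(p[3]) || p[3] == day) p[3] <- "01"
    paste(p, collapse = "-")
  }, character(1))
  as.Date(out, format = "%Y-%m-%d")
}

x <- c("2024-10-21", "2024-10-UN", "2024-UNK-UN", "UNKN-UNK-UN")
impute_min_sketch(x)
# "2024-10-21" "2024-10-01" "2024-01-01" NA
```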

Any datetime strings (e.g., '"2025-01-NAT11:10:00"') must be preprocessed to remove the time component before applying this function (e.g., convert to '"2025-01-NA"').
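
One possible way to do this preprocessing in base R is shown below. The helper 'strip_time' is hypothetical, and the pattern assumes the time part starts with "T" followed by a two-digit hour and a colon; other layouts would need a different pattern.

```r
# Hypothetical helper, not part of datetoiso: drop an ISO 8601 style time
# component (everything from "Thh:" onward) before imputing the date.
strip_time <- function(x) {
  # Requiring two digits and a colon after "T" avoids cutting at a "T"
  # that belongs to the date part itself (assumption about the input).
  sub("T[0-9]{2}:.*$", "", x)
}

strip_time(c("2025-01-NAT11:10:00", "2025-01-15"))
# "2025-01-NA" "2025-01-15"
```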

In addition to imputing the date, the function creates an accompanying **flag variable** named as: '"<source_variable>_<suffix>F"'. This flag variable indicates the type of imputation performed:

  • 'NA': No imputation was performed (the original date was complete, or the year was missing).

  • '"D"': Only the **day** component was imputed.

  • '"M"': Only the **month** component was imputed.

  • '"D, M"': Both the **month** and **day** components were imputed.
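
The flag values listed above could be derived along these lines. This is a sketch only: 'flag_sketch' is a hypothetical name, and the package may compute the flag differently.

```r
# Hypothetical sketch, not the datetoiso implementation: derive the
# imputation flag from which components of a partial date are unknown.
flag_sketch <- function(x, separator = "-", year = "UNKN",
                        month = "UNK", day = "UN") {
  parts <- strsplit(x, separator, fixed = TRUE)
  vapply(parts, function(p) {
    # Pad to three components so absent parts are treated as missing.
    length(p) <- 3
    # Missing year: nothing is imputed, so no flag is set.
    if (is.na(p[1]) || p[1] == year) return(NA_character_)
    month_missing <- is.na(p[2]) || p[2] == month
    day_missing <- is.na(p[3]) || p[3] == day
    if (month_missing && day_missing) {
      "D, M"
    } else if (month_missing) {
      "M"
    } else if (day_missing) {
      "D"
    } else {
      NA_character_  # complete date, no imputation
    }
  }, character(1))
}

flag_sketch(c("2024-10-21", "2024-10-UN", "2024-UNK-15",
              "2024-UNK-UN", "UNKN-UNK-UN"))
# NA "D" "M" "D, M" NA
```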

Value

A data frame identical to the input, with an additional column holding the imputed dates and the accompanying flag variable described in Details. The imputed column name is constructed by appending 'suffix' (by default "_DT") to the source variable name.

Author(s)

Lukasz Andrzejewski


datetoiso documentation built on Dec. 7, 2025, 9:06 a.m.