check_hospitalizations: Validate Hospitalization Data

check_hospitalizations R Documentation

Description

This function checks the structure and content of hospitalization data (a data.frame or data.table) for use in pre2dup workflows. It validates the required columns, data types, date consistency, and chronological order (admission before discharge). If all checks pass and return_data = TRUE, it returns a cleaned data.table with the required columns and types.

Usage

check_hospitalizations(
  dt,
  hosp_person_id = NULL,
  hosp_admission = NULL,
  hosp_discharge = NULL,
  date_range = NULL,
  print_all = FALSE,
  return_data = FALSE
)

Arguments

dt

data.frame or data.table containing hospitalization records.

hosp_person_id

Character. Column name for the person identifier.

hosp_admission

Character. Column name for hospital admission date.

hosp_discharge

Character. Column name for hospital discharge date.

date_range

Character vector of length 2. Date range for hospitalizations (e.g., c("1995-01-01", "2025-12-31")). Default is NULL (no date range check).

print_all

Logical. If TRUE, all row numbers that caused warnings are printed; if FALSE, only the first 5 problematic rows are printed.

return_data

Logical. If TRUE and no errors are detected, returns a data.table with the validated columns and proper types. If FALSE, only a message is printed.

Details

The following checks are performed:

  • Required columns exist and are named correctly

  • Person identifiers are valid (numeric or non-numeric, no missing values)

  • Admission and discharge dates are present and convertible to dates

  • Admission date is strictly before discharge date

  • All dates are within the specified range (if given)

  • Overlapping hospitalizations are combined

If any errors are found, the function stops execution and prints all error messages.
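
The checks listed above can be illustrated with a minimal base-R sketch. This is not the package's implementation: the column names (id, start, end), the helper merge_stays, and the rule that a stay beginning on the previous discharge date counts as overlapping are all assumptions made for illustration.

```r
# Illustrative sketch only; names and the overlap rule are hypothetical.
hosp <- data.frame(
  id    = c(1, 1, 1, 2),
  start = c("2023-01-01", "2023-01-10", "2023-03-01", "2023-01-05"),
  end   = c("2023-01-15", "2023-01-20", "2023-03-05", "2023-01-08"),
  stringsAsFactors = FALSE
)

# Convert to Date; values that do not match the format become NA.
hosp$start <- as.Date(hosp$start, format = "%Y-%m-%d")
hosp$end   <- as.Date(hosp$end, format = "%Y-%m-%d")

# Collect every problem first, then stop once with all messages.
errors <- character(0)
if (anyNA(hosp$id)) {
  errors <- c(errors, "missing person identifiers")
}
if (anyNA(hosp$start) || anyNA(hosp$end)) {
  errors <- c(errors, "missing or unparseable dates")
}
bad_order <- which(hosp$start >= hosp$end)
if (length(bad_order) > 0) {
  errors <- c(errors, paste("admission not before discharge in rows:",
                            paste(bad_order, collapse = ", ")))
}
if (length(errors) > 0) stop(paste(errors, collapse = "\n"))

# Combine overlapping stays for one person.
merge_stays <- function(d) {
  d <- d[order(d$start), ]
  # Latest discharge seen before each row (-Inf for the first row).
  prev_max_end <- c(-Inf, cummax(as.numeric(d$end))[-nrow(d)])
  # A new episode begins when admission is later than all earlier discharges.
  episode <- cumsum(as.numeric(d$start) > prev_max_end)
  data.frame(
    id    = d$id[1],
    start = as.Date(as.vector(tapply(as.numeric(d$start), episode, min)),
                    origin = "1970-01-01"),
    end   = as.Date(as.vector(tapply(as.numeric(d$end), episode, max)),
                    origin = "1970-01-01")
  )
}

merged <- do.call(rbind, lapply(split(hosp, hosp$id), merge_stays))
rownames(merged) <- NULL
merged
```

In this sketch the first two stays of person 1 overlap and are collapsed into a single stay, so merged has three rows instead of four.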

Value

If return_data = TRUE and no errors are detected, returns a data.table containing only the validated columns, with dates converted to integers and overlapping hospitalizations combined. If return_data = FALSE, only a message is printed. If errors are detected, the function stops and prints the error messages.
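
The documentation does not state which integer encoding the returned dates use. Assuming it follows base R's Date representation (days since 1970-01-01), the conversion and its inverse would look like the sketch below; the variable names are hypothetical.

```r
# Assumption: integer dates count days since 1970-01-01, as base R's Date does.
admission <- as.Date("2023-01-01")
admission_int <- as.integer(admission)
admission_int  # 19358

# Convert the integer back to a Date for display.
as.Date(admission_int, origin = "1970-01-01")
```

If the package uses a different origin, only the origin argument in the last call would need to change.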

Examples

PID <- c(1, 1, 2, 2)
Entry <- c("2023-01-01", "2023-02-01", "2023-01-01", "2023-02-01")
Leave <- c("2023-01-15", "2023-02-15", "2023-01-10", "2023-02-10")
hospital_data <- data.frame(PID, Entry, Leave)

hospitalizations <- check_hospitalizations(
  hospital_data,
  hosp_person_id = "PID",
  hosp_admission = "Entry",
  hosp_discharge = "Leave",
  return_data = TRUE
)
hospitalizations


piavat/PRE2DUP-R documentation built on June 11, 2025, 11:42 a.m.