check_purchases: Validate Drug Purchase Data


check_purchases    R Documentation

Validate Drug Purchase Data

Description

This function checks the structure and content of drug purchase data (data.frame or data.table) for use in pre2dup workflows. It helps users detect problems in advance, such as missing or invalid records, incorrect formats, or dates outside a specified range. If all checks pass and return_data = TRUE, the function returns a validated data.table with the required columns and proper types.

Usage

check_purchases(
  dt,
  pre_person_id = NULL,
  pre_atc = NULL,
  pre_package_id = NULL,
  pre_date = NULL,
  pre_ratio = NULL,
  pre_ddd = NULL,
  date_range = NULL,
  print_all = FALSE,
  return_data = FALSE
)

Arguments

dt

data.frame or data.table containing drug purchase records.

pre_person_id

Character. Column name for the person identifier.

pre_atc

Character. Column name for the ATC code.

pre_package_id

Character. Column name for the package identifier (e.g., Vnr in Nordic data).

pre_date

Character. Column name for the drug purchase date.

pre_ratio

Character. Column name for the amount of drug purchased: for whole packages, number of packages; for partial supplies, the proportion of a package (e.g., 0.5 for 14 tablets from a 28-tablet package).

pre_ddd

Character. Column name for defined daily dose (DDD) of the purchase.

date_range

Character vector of length 2. Date range for purchase dates (e.g., c("1995-01-01", "2018-12-31")). Default is NULL (no date range check).

print_all

Logical. If TRUE, all row numbers that caused warnings are printed; if FALSE, only the first 5 problematic rows are printed.

return_data

Logical. If TRUE and no errors are detected, returns a data.table with the validated columns and proper types. If FALSE, only a message is printed.
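As an illustration of the value expected in the pre_ratio column, the sketch below derives the ratio from the number of tablets dispensed and the package size. The names tablets_dispensed and package_size are hypothetical and are not part of the package; real data may already store the ratio directly.

```r
# Hypothetical inputs: tablets dispensed per purchase and the package size
tablets_dispensed <- c(14, 28, 56)
package_size <- 28

# Ratio for the pre_ratio column: the share (or multiple) of one package
ratios <- tablets_dispensed / package_size
ratios
```

Here 14 tablets from a 28-tablet package give 0.5, one full package gives 1, and two full packages give 2.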

Details

The following checks are performed:

  • Existence and naming of required columns

  • No missing or duplicated records

  • Each package ID maps to exactly one ATC code

  • Validity of person identifiers (numeric or non-numeric, no missing values)

  • Validity of ATC codes (no missing or invalid values)

  • Validity of package IDs and purchase ratio (numeric, no missing values)

  • DDD values: missing allowed, but not zero or negative

  • All purchase dates must be present, convertible, and within the specified range (if given)

  • Sufficient DDD coverage per ATC (with user confirmation if below threshold)

If any errors are found, the function stops execution and prints all error messages.
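Some of the checks listed above can be approximated in base R to inspect data before calling the function. This is a minimal sketch under stated assumptions, not the package's implementation: the toy data and all object names are illustrative, the package check is read as "one ATC code per package ID", and no coverage threshold is applied because the documentation does not state one.

```r
# Toy data with deliberate problems (all names are illustrative)
toy <- data.frame(
  ID     = c(1, 1, 2, 2),
  ATC    = c("N06AX11", "N06AX11", "N05AH03", "N06AX11"),
  vnr    = c(48580, 48580, 145698, 145698),
  dates  = c("1998-07-04", "1998-07-04", "2000-01-12", "2020-03-01"),
  ratios = c(1, 1, 1, 2),
  ddds   = c(15, 15, 0, NA),
  stringsAsFactors = FALSE
)

# Duplicated records: rows that repeat an earlier row exactly
dup_rows <- which(duplicated(toy))

# Package IDs linked to more than one ATC code
atc_per_pkg <- tapply(toy$ATC, toy$vnr, function(x) length(unique(x)))
multi_atc_pkgs <- names(atc_per_pkg)[atc_per_pkg > 1]

# DDD values: missing is tolerated, zero or negative is not
bad_ddd_rows <- which(!is.na(toy$ddds) & toy$ddds <= 0)

# Dates: must be convertible and fall inside the given range
date_range <- as.Date(c("1995-01-01", "2018-12-31"))
parsed <- as.Date(toy$dates, format = "%Y-%m-%d")
bad_date_rows <- which(
  is.na(parsed) | parsed < date_range[1] | parsed > date_range[2]
)

# Share of purchases with a non-missing DDD, per ATC code
ddd_coverage <- tapply(!is.na(toy$ddds), toy$ATC, mean)
```

In this toy data, row 2 duplicates row 1, package 145698 appears under two ATC codes, row 3 has a zero DDD, and row 4 has a date after the end of the range.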

Value

If return_data = TRUE and no errors are detected, returns a data.table containing only the validated columns, with converted types. If return_data = FALSE (the default), only a message is printed. If errors are detected, the function stops and prints the error messages.

Examples

# Two persons with three purchases each
ID <- c(rep(100001, 3), rep(100002, 3))
ATC <- c(rep("N06AX11", 3), rep("N05AH03", 3))
vnr <- c(rep(48580, 3), rep(145698, 3))
dates <- as.Date(c(
  "1998-07-04", "1998-07-27", "1998-08-28",
  "2000-01-12", "2000-02-05", "2000-02-24"
))
# Packages purchased (0.5 = half a package) and the DDDs of each purchase
ratios <- c(0.5, 2, 2, 1, 0.5, 2)
ddds <- c(7.5, 30, 30, 28, 14, 56)
purchases <- data.frame(ID, ATC, vnr, dates, ratios, ddds)

check_purchases(
  dt = purchases,
  pre_person_id = "ID",
  pre_atc = "ATC",
  pre_package_id = "vnr",
  pre_date = "dates",
  pre_ratio = "ratios",
  pre_ddd = "ddds",
  date_range = c("1995-01-01", "2018-12-31"),
  print_all = TRUE,
  return_data = TRUE
)


piavat/PRE2DUP-R documentation built on June 11, 2025, 11:42 a.m.