View source: R/check_purchases.R
check_purchases | R Documentation |
This function checks the structure and content of drug purchase data (data.frame or data.table) for use in pre2dup workflows.
It helps users detect errors in advance, such as missing or invalid records, incorrect formats, or dates outside a specified range.
If all checks pass, the function can return a validated data.table with the required columns and proper types.
check_purchases(
  dt,
  pre_person_id = NULL,
  pre_atc = NULL,
  pre_package_id = NULL,
  pre_date = NULL,
  pre_ratio = NULL,
  pre_ddd = NULL,
  date_range = NULL,
  print_all = FALSE,
  return_data = FALSE
)
dt |
data.frame or data.table containing drug purchase records. |
pre_person_id |
Character. Column name for the person identifier. |
pre_atc |
Character. Column name for the ATC code. |
pre_package_id |
Character. Column name for the package identifier (e.g., Vnr in Nordic data). |
pre_date |
Character. Column name for the drug purchase date. |
pre_ratio |
Character. Column name for the amount of drug purchased: for whole packages, number of packages; for partial supplies, the proportion of a package (e.g., 0.5 for 14 tablets from a 28-tablet package). |
pre_ddd |
Character. Column name for defined daily dose (DDD) of the purchase. |
date_range |
Character vector of length 2. Date range for purchase dates (e.g., c("1995-01-01", "2018-12-31")). Default is NULL (no date range check). |
print_all |
Logical. If TRUE, all row numbers that caused warnings are printed; if FALSE, only the first 5 problematic rows are printed. |
return_data |
Logical. If TRUE and no errors are detected, returns a data.table with the validated columns and proper types. If FALSE, only a message is printed. |
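As an illustration of the pre_ratio argument described above, the following minimal sketch derives a purchase ratio from the number of tablets dispensed and the package size. The column names tablets_dispensed and package_size are hypothetical and used only for this example; real data may already contain the ratio, or store the quantities differently.

```r
# Hypothetical dispensing records; these column names are illustrative only
# and are not required or produced by check_purchases().
dispensed <- data.frame(
  tablets_dispensed = c(14, 28, 56),
  package_size      = c(28, 28, 28)
)

# Ratio = amount dispensed relative to one full package:
# a partial supply gives a value below 1, several packages a value above 1.
dispensed$ratios <- dispensed$tablets_dispensed / dispensed$package_size
dispensed$ratios
```

The resulting column could then be passed by name through pre_ratio.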
The following checks are performed:
Existence and naming of required columns
No missing or duplicated records
Each package has a unique ATC code
Validity of person identifiers (numeric or non-numeric, no missing values)
Validity of ATC codes (no missing or invalid values)
Validity of package IDs and purchase ratio (numeric, no missing values)
DDD values: missing allowed, but not zero or negative
All purchase dates must be present, convertible, and within the specified range (if given)
Sufficient DDD coverage per ATC (with user confirmation if below threshold)
If any errors are found, the function stops execution and prints all error messages.
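To make a few of the checks listed above more concrete, here is a rough base R sketch of how similar conditions could be tested by hand. It is not the package's implementation: sketch_checks is a hypothetical helper, and the column names (ATC, vnr, dates, ddds) are assumed to match the example data on this page.

```r
# Hypothetical helper, NOT part of the package: collects a message for each
# problem found instead of stopping at the first one.
sketch_checks <- function(df, date_range = NULL) {
  problems <- character(0)

  # Fully duplicated records
  if (anyDuplicated(df) > 0) {
    problems <- c(problems, "duplicated records")
  }

  # Each package ID should map to exactly one ATC code
  atc_per_package <- tapply(df$ATC, df$vnr, function(x) length(unique(x)))
  if (any(atc_per_package > 1)) {
    problems <- c(problems, "package with more than one ATC code")
  }

  # DDD may be missing, but must not be zero or negative
  if (any(df$ddds <= 0, na.rm = TRUE)) {
    problems <- c(problems, "zero or negative DDD")
  }

  # Dates must be present and, if a range is given, fall inside it
  d <- as.Date(df$dates)
  if (anyNA(d)) {
    problems <- c(problems, "missing purchase date")
  }
  if (!is.null(date_range)) {
    rng <- as.Date(date_range)
    if (any(d < rng[1] | d > rng[2], na.rm = TRUE)) {
      problems <- c(problems, "purchase date outside range")
    }
  }

  problems
}

# Small deliberately flawed data set: package 48580 has two ATC codes,
# one DDD is zero, and one date lies after the allowed range.
bad <- data.frame(
  ID    = c(1, 1, 2),
  ATC   = c("N06AX11", "N05AH03", "N05AH03"),
  vnr   = c(48580, 48580, 145698),
  dates = as.Date(c("1998-07-04", "1998-07-27", "2020-01-01")),
  ddds  = c(7.5, 0, NA)
)
found <- sketch_checks(bad, date_range = c("1995-01-01", "2018-12-31"))
found
```

Collecting all messages before reporting mirrors the documented behaviour of printing every error message, although the actual function may organise its checks differently.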
If return_data = TRUE, returns a data.table containing only the validated columns, with converted types.
If errors are detected, the function stops and prints error messages.
ID <- c(rep(100001, 3), rep(100002, 3))
ATC <- c(rep("N06AX11", 3), rep("N05AH03", 3))
vnr <- c(rep(48580, 3), rep(145698, 3))
dates <- as.Date(c("1998-07-04","1998-07-27","1998-08-28", "2000-01-12", "2000-02-05","2000-02-24"))
ratios <- c(0.5, 2, 2, 1, 0.5, 2)
ddds <- c(7.5, 30, 30, 28, 14, 56)
purchases <- data.frame(ID, ATC, vnr, dates, ratios, ddds)
check_purchases(
  dt = purchases,
  pre_person_id = "ID",
  pre_atc = "ATC",
  pre_package_id = "vnr",
  pre_date = "dates",
  pre_ratio = "ratios",
  pre_ddd = "ddds",
  date_range = c("1995-01-01", "2018-12-31"),
  print_all = TRUE,
  return_data = TRUE
)