#' Estimate Drug Use Periods from Drug Purchase Data
#'
#' Estimates drug use periods based on individual drug purchase data. Optionally, hospitalization data can be incorporated. The estimation uses package-specific and Anatomical Therapeutic Chemical (ATC) Classification code -level parameters. This function supports estimation for individuals with varied purchase patterns, including stockpiling behavior.
#'
#' @param pre_data data.frame or data.table containing drug purchases.
#' @param pre_person_id character, name of the column containing person id.
#' @param pre_atc character, name of the column containing ATC code.
#' @param pre_package_id character, name of the column containing package id.
#' @param pre_date character, name of the column containing purchase date.
#' @param pre_ratio character, name of the column containing ratio of packages purchased (e.g., number of packages).
#' @param pre_ddd character, name of the column containing defined daily doses (DDD) of the purchase.
#' @param package_parameters data.frame or data.table containing package parameters.
#' @param pack_atc character, name of the column containing ATC code.
#' @param pack_id character, name of the column containing package id.
#' @param pack_ddd_low character, name of the column containing lower limit of daily DDD.
#' @param pack_ddd_usual character, name of the column containing usual daily DDD.
#' @param pack_dur_min character, name of the column containing minimum duration of the package.
#' @param pack_dur_usual character, name of the column containing usual duration of the package.
#' @param pack_dur_max character, name of the column containing maximum duration of the package.
#' @param atc_parameters data.frame or data.table containing ATC parameters.
#' @param atc_class character, name of the column containing ATC class.
#' @param atc_ddd_low character, name of the column containing lower limit of daily DDD for the ATC class.
#' @param atc_ddd_usual character, name of the column containing usual daily DDD for the ATC class.
#' @param atc_dur_min character, name of the column containing minimum duration for the ATC class.
#' @param atc_dur_max character, name of the column containing maximum duration for the ATC class.
#' @param hosp_data data.frame or data.table containing hospitalizations.
#' @param hosp_person_id character, name of the column containing person id.
#' @param hosp_admission character, name of the column containing admission date.
#' @param hosp_discharge character, name of the column containing discharge date.
#' @param date_range character, vector of two dates, start and end of the purchase data.
#' @param global_gap_max numeric, maximum gap between purchases, default 300..
#' @param global_min numeric, minimum duration of a purchase, default 5.
#' @param global_max numeric, maximum duration of a purchase, default 300.
#' @param global_max_single numeric, maximum duration of a single purchase, default 150.
#' @param global_ddd_high numeric, maximum daily DDD for a purchase per day for any ATC, default 10.
#' @param global_hosp_max numeric, maximum number of hospital days to be considered when estimating the exposure duration, default 30.
#' @param days_covered numeric, maximum number of days to be added to the exposure duration to cover the gap between purchases, default 5.
#' @param weight_past numeric, weight for the past purchase in sliding average calculation, default 1.
#' @param weight_current numeric, weight for the current purchase in sliding average calculation, default 4.
#' @param weight_next numeric, weight for the next purchase in sliding average calculation, default 1.
#' @param weight_first_last numeric, weight for the first and last purchase in sliding average calculation, default 5.
#' @param calculate_pack_dur_usual TRUE or FALSE, re-calculate usual duration of the package based on the purchase frequency in data, default FALSE.
#' @param post_process_perc numeric, percentage of the data to be used in post-processing, default 1.
#'
#' @return a list of two elements. Main element is \code{periods}: a data.table with one row per drug use period, including person, ATC, period start/end dates, duration, number of purchases, and total DDD. If \code{calculate_pack_dur_usual = TRUE}, an additional element \code{pack_info} contains updated package parameter information.
#'
#' @details
#'
#' Before starting to estimate the drug use periods, the function validates the input data and arguments by checking for missing values and unacceptable duplicates.
#' It will stop execution if such issues are detected, with the following exceptions:
#'
#' \itemize{
#' \item Up to 10\% of missing DDD values per ATC class in the drug purchase data is allowed.
#' \item Up to 10\% of missing package parameter records per ATC class is allowed.
#' }
#' If either threshold is exceeded, the function prompts the user to decide whether to continue.
#' If the user agrees, ATC classes with insufficient data are excluded, and the function proceeds with the remaining data.
#'
#' There are five available methods for estimating the duration of each purchase, presented in the order of preference:
#' \itemize{
#' \item Main method: Based on purchased daily doses (DDDs), temporal average of daily DDDs, and individual purchase patterns.
#' \item Package DDD method: Based on purchased DDDs and the usual daily DDD for the specific package.
#' \item Package duration method: Based on the usual duration of the package, considering the proportion of the package purchased.
#' \item ATC-level DDD method: Based on purchased DDDs and usual daily DDDs at the ATC level.
#' \item Minimum ATC duration method: Based on the minimum duration defined for the ATC group.
#' }
#'
#' Periods that are close in time can be joined in a post-processing step controlled by \code{post_process_perc}. Post processing percentage reduces by 0.1 at each estimation round to prevent very long calculation times for large datasets.
#'
#' In addition to estimating drug use periods, the function can also calculate common package durations from the purchase data.
#' These calculated durations can be used to verify and adjust the usual duration parameters of packages.
#' After making corrections, re-run the function to recalculate drug use periods using the updated package parameters.
#'
#' @import data.table
#'
#' @seealso
#' Each data type has their own check functions. \code{pre2dup} runs the checks internally, but checking the validity before running the program is recommended for faster and easier error detection and handling.
#'
#' \code{\link{check_purchases}}, \code{\link{check_hospitalizations}}, \code{\link{check_package_parameters}}, \code{\link{check_atc_parameters}}
#'
#' @examples
#' period_data <-pre2dup(pre_data = purchases_example, pre_person_id = "id",
#' pre_atc = "ATC", pre_package_id = "vnr", pre_date = "purchase_date",
#' pre_ratio = "n_packages", pre_ddd = "amount",
#' package_parameters = package_parameters_example,
#' pack_atc = "ATC", pack_id = "vnr", pack_ddd_low = "lower_ddd",
#' pack_ddd_usual ="usual_ddd", pack_dur_min = "minimum_dur",
#' pack_dur_usual = "usual_dur", pack_dur_max = "maximum_dur",
#' atc_parameters = ATC_parameters, atc_class = "partial_atc",
#' atc_ddd_low = "lower_ddd_atc", atc_ddd_usual = "usual_ddd_atc",
#' atc_dur_min = "minimum_dur_atc", atc_dur_max = "maximum_dur_atc",
#' hosp_data = hospitalizations_example, hosp_person_id = "id",
#' hosp_admission = "hospital_start", hosp_discharge = "hospital_end",
#' date_range = c("2025-01-01", "2025-12-31"),
#' global_gap_max = 300, global_min = 5, global_max = 300,
#' global_max_single = 150, global_ddd_high = 10,
#' global_hosp_max = 30, weight_past = 1, weight_current = 4,
#' weight_next = 1, weight_first_last = 5,
#' calculate_pack_dur_usual = TRUE,
#' days_covered = 5,
#' post_process_perc = 1)
#'
#'period_data$periods
#'
#' @export
pre2dup <- function(pre_data,
pre_person_id,
pre_atc,
pre_package_id,
pre_date,
pre_ratio,
pre_ddd,
package_parameters,
pack_atc,
pack_id,
pack_ddd_low,
pack_ddd_usual,
pack_dur_min,
pack_dur_usual,
pack_dur_max,
atc_parameters,
atc_class,
atc_ddd_low,
atc_ddd_usual,
atc_dur_min,
atc_dur_max,
hosp_data = NULL,
hosp_person_id = NULL,
hosp_admission = NULL,
hosp_discharge = NULL,
date_range = NULL,
global_gap_max = 300,
global_min = 5,
global_max = 300,
global_max_single = 150,
global_ddd_high = 10,
global_hosp_max = 30,
days_covered = 5,
weight_past = 1,
weight_current = 4,
weight_next = 1,
weight_first_last = 5,
calculate_pack_dur_usual = FALSE,
post_process_perc = 1) {
# Dataset checks ----
## Check global arguments --------------------------------------------------
message("Step 1/6: Checking parameters and datasets...")
.check_positive <- function(numvar) {
if (!is.numeric(numvar) || numvar < 0) {
err_mess <- "Input non numeric or below 0 in argument"
stop(paste(err_mess, sQuote(substitute(numvar))), call. = FALSE)
}
}
.check_positive(global_gap_max)
.check_positive(global_min)
.check_positive(global_max)
.check_positive(global_max_single)
.check_positive(global_ddd_high)
.check_positive(weight_past)
.check_positive(weight_current)
.check_positive(weight_next)
.check_positive(weight_first_last)
.check_positive(days_covered)
.check_positive(post_process_perc)
if (!is.null(hosp_data))
.check_positive(global_hosp_max)
if (global_max <= global_min)
stop(
"Expected Global maximum (global_max) longer than global minimum (global_min).",
call. = FALSE
)
if (global_max > global_gap_max)
stop(
"Expected Maximum distance (global_gap_max) to be longer than or equal to global maximum (global_max).",
call. = FALSE
)
if (global_max_single > global_max)
stop(
"Expected Global maximum of single purchase (global_max_single) shorter than or equal to Global maximum (global_max).",
call. = FALSE
)
if (!is.logical(calculate_pack_dur_usual) ||
is.na(calculate_pack_dur_usual)) {
stop("Argument 'calculate_pack_dur_usual' must be either TRUE or FALSE.",
call. = FALSE)
}
## Check and modify datasets -----------------------------------------
pre_data <- check_purchases(
pre_data,
pre_person_id = pre_person_id,
pre_atc = pre_atc,
pre_package_id = pre_package_id,
pre_date = pre_date,
pre_ratio = pre_ratio,
pre_ddd = pre_ddd,
date_range = date_range,
return_data = TRUE
)
if (nrow(pre_data) == 0)
stop("No records left after deleting ATCs without sufficient level of DDD records.",
call. = FALSE)
# After checks, rename columns for easier internal use (data was handled in check)
oldnames <- c(pre_person_id, pre_atc, pre_package_id, pre_ratio, pre_ddd)
newnames <- c("pid_pre", "atc_pre", "packid_pre", "ratio_pre", "ddd_pre")
setnames(pre_data, oldnames, newnames)
# Package parameters
if (!is.null(package_parameters) &
isTRUE(calculate_pack_dur_usual))
package_parameters_orig <- package_parameters
package_parameters <- check_package_parameters(
dt = package_parameters,
pack_atc = pack_atc,
pack_id = pack_id,
pack_ddd_low = pack_ddd_low,
pack_ddd_usual = pack_ddd_usual,
pack_dur_min = pack_dur_min,
pack_dur_usual = pack_dur_usual,
pack_dur_max = pack_dur_max,
return_data = TRUE
)
# After copy and checks, rename columns for easier internal use
oldnames <- c(
pack_atc,
pack_id,
pack_ddd_low,
pack_ddd_usual,
pack_dur_min,
pack_dur_usual,
pack_dur_max
)
newnames <- c(
"atc_code",
"pack_no",
"ddd_low_pack",
"ddd_usual_pack",
"dur_min_pack",
"dur_usual_pack",
"dur_max_pack"
)
setnames(package_parameters, oldnames, newnames)
atc_parameters <- check_atc_parameters(
dt = atc_parameters,
atc_class = atc_class,
atc_ddd_low = atc_ddd_low,
atc_ddd_usual = atc_ddd_usual,
atc_dur_min = atc_dur_min,
atc_dur_max = atc_dur_max,
return_data = TRUE
)
# After checks, rename columns for easier internal use
oldnames <- c(atc_class,
atc_ddd_low,
atc_ddd_usual,
atc_dur_min,
atc_dur_max)
newnames <- c("part_atc",
"ddd_low_atc",
"ddd_usual_atc",
"dur_min_atc",
"dur_max_atc")
setnames(atc_parameters, oldnames, newnames)
if (!is.null(hosp_data)) {
# Validate hospital data
hosp_data <- check_hospitalizations(
hosp_data,
hosp_person_id = hosp_person_id,
hosp_admission = hosp_admission,
hosp_discharge = hosp_discharge,
return_data = TRUE
)
}
## Find multiple ATCs combining purchases and parameters ----
pre_pairs <- pre_data[, .(packid = packid_pre, atc = atc_pre)]
pack_pairs <- package_parameters[, .(packid = pack_no, atc = atc_code)]
pre_pack_pairs <- rbind(pre_pairs, pack_pairs)
pre_pack_pairs <- unique(pre_pack_pairs, by = names(pre_pack_pairs))
find_multiple_atcs(atc_col = pre_pack_pairs$atc, package_col = pre_pack_pairs$packid)
# Prepare data ------------------------------------------------------------
# Combine same packages of the day ----
pre_data[is.na(ddd_pre), ddd_pre := 0]
# Group by person, ATC, date and package
pre_data[, bl_pack := as.integer(factor(paste(
pid_pre, atc_pre, date_pre, packid_pre, sep = "_"
)))]
pre_data[, let(ddd_pre = sum(ddd_pre),
ratio_pre = sum(ratio_pre)), by = list(bl_pack)]
# Aggregate
pre_data <- unique(pre_data, by = c("bl_pack"))
# Merge datasets ----------------------------------------------------------
# Find the closest matching (partial) in ATC parameters and prescriptions and
# add it to prescriptions for merging
.find_closest_atc <- function(atc, opt_atcs) {
while (nchar(atc) > 0) {
if (atc %in% opt_atcs) {
return(atc)
}
atc <- substr(atc, 1, nchar(atc) - 1)
}
return(NA)
}
opt_atcs <- atc_parameters$part_atc
pre_data[, atc_short := sapply(atc_pre, .find_closest_atc, opt_atcs)]
if (pre_data[is.na(atc_short), .N] > 0)
stop(
"Every ATC class in drug purhcases should exist at least with one character level in ATC parameters. Please check the ATC parameters and try again.",
call. = FALSE
)
## Merge files -------------------------------------------------------
pre_data <- merge(
pre_data,
package_parameters[, .(pack_no,
ddd_low_pack,
ddd_usual_pack,
dur_min_pack,
dur_usual_pack,
dur_max_pack)],
by.x = "packid_pre",
by.y = "pack_no",
all.x = TRUE
)
pre_data <- merge(
pre_data,
atc_parameters,
by.x = "atc_short",
by.y = "part_atc",
all.x = TRUE
)
# Check how many purchases have DDDs recorded or vnr-info (using usual DDD) ----
package_info_covr <- check_coverage(pre_data$atc_pre,
pre_data$dur_usual_pack,
limit = limit,
opti = "missing")
if (!is.null(package_info_covr)) {
emessage <- paste0(
"Coverage of package parameter information in less than approved (",
limit,
"%) in following ATC(s): ",
package_info_covr,
"."
)
message(emessage)
continue <- readline(prompt = atc_question)
if (tolower(continue) != "y") {
stop("Process interrupted by user.", call. = FALSE)
} else {
atc_drops <- extract_atc_codes(package_info_covr)
pre_data <- pre_data[atc_pre %notin% atc_drops, ]
message(
"Excluded ATC(s) with insufficient package parameter information: ",
paste(atc_drops, collapse = ", "),
". Process continues with the rest of the data."
)
}
}
if (nrow(pre_data) == 0)
stop(
"No records left after deleting ATCs without sufficient package parameter information.",
call. = FALSE
)
# pid_atc_date is unit for persons' purchases for the same ATC per day
pre_data[, pid_atc_date := as.integer(factor(paste(pid_pre, atc_pre, date_pre, sep = "_")))]
message("Step 2/6: Calculating purchase durations...")
### Set DDD and duration limits ----
# If any of package based DDD lower limit is missing,
# take the ATC based DDD lower limit for common limit
pre_data[, min_ddd := ifelse(all(!is.na(ddd_low_pack)), min(ddd_low_pack), ddd_low_atc), by = pid_atc_date]
# Minimum duration can be calculated based on either package minimum duration (preferred)
# or ATC minimum duration (always available). If several packages per day,
# then take the maximum of minimum durations.
pre_data[, min_dur := ifelse(!is.na(dur_min_pack),
ratio_pre * dur_min_pack,
ratio_pre * dur_min_atc)]
pre_data[, min_dur := max(min_dur), by = pid_atc_date]
# Because the durations were multiplied, they can be
# shorter than global minimum or longer than
# global maximum, reject:
pre_data[, min_dur := pmax(pmin(min_dur, global_max), global_min)]
# Maximum duration is the maximum of maximum durations.
# First calculate the maximum duration for each package:
# Use package duration, if available,
# otherwise use ATC duration.
# Either will be multiplied with the ratio.
pre_data[, max_dur := ifelse(!is.na(dur_max_pack),
ratio_pre * dur_max_pack,
ratio_pre * dur_max_atc)]
# Take the maximum of all days packages
pre_data[, max_dur := max(max_dur), by = pid_atc_date]
# Because the durations were multiplied, they can be
# shorter than global minimum or longer than
# global maximum, limit:
pre_data[, max_dur := pmin(pmax(max_dur, global_min), global_max)]
# Count package based ERFLs --------------------------------------------------
.count_erfl <- function(rat,
usu_dur,
ddd,
usu_ddd_p,
usu_ddd_a,
min_d_a) {
# Package's daily DDD based ERFLs are calculated based on the following rules:
# usual duration of all packages are available
# purchases DDD is available ( = greater than 0) for all rows
if (all(!is.na(usu_ddd_p) & all(ddd > 0))) {
method <- "erfl_ddd_pack"
value <- max(rat) / sum(rat) * sum(ddd / usu_ddd_p)
# If DDDs or usual DDD of all packages are not available
# use packages' usual durations
} else if (all(!is.na(usu_dur))) {
method <- "erfl_dur_pack"
value <- max(rat) / sum(rat) * sum(rat * usu_dur)
# If DDDs are available but package parameters are not available
# calculate using usual ATC DDDs
} else if (all(!is.na(usu_ddd_a) & all(ddd > 0))) {
method <- "erfl_ddd_atc"
value <- sum(ddd) / unique(usu_ddd_a)
} else if (all(!is.na(min_d_a))) {
# Return 'min_d_a' if other conditions are not met
method <- "erfl_min_atc"
value <- unique(min_d_a)
}
return(list(method = method, value = value))
}
pre_data[, c("erfl_met", "erfl_pack_based") := .count_erfl(
rat = ratio_pre,
usu_dur = dur_usual_pack,
ddd = ddd_pre,
usu_ddd_p = ddd_usual_pack,
usu_ddd_a = ddd_usual_atc,
min_d_a = dur_min_atc
), by = pid_atc_date]
### Aggregate to person - ATC - date ----
pre_data[, ddd_pre := sum(ddd_pre), by = pid_atc_date]
# Count number of different packages per day,
# if several, then will be omitted from mode calculation
pre_data[, n_vnr := .N, by = pid_atc_date]
pre_data <- unique(pre_data, by = "pid_atc_date")
# ### Count time between purchases and initialize block ----
setorder(pre_data, pid_pre, atc_pre, date_pre) # Sort by person, ATC and date
# Init group by pid and ATC
pre_data[, bl_init := as.integer(factor(paste(pid_pre, atc_pre, sep = "_")))]
pre_data[, btw := date_pre - shift(date_pre, type = "lag", fill = date_pre[1L]), by = bl_init]
# Define periods start points
setorder(pre_data, bl_init, pid_pre, atc_pre)
pre_data[, let(
pid_lag = shift(pid_pre, type = "lag"),
atc_lag = shift(atc_pre, type = "lag")
), by = bl_init]
pre_data[, startp := pid_lag != pid_pre | atc_lag != atc_pre]
pre_data[is.na(startp), startp := TRUE]
# Set blocks based on global gap
pre_data[, startp := ifelse(btw > global_gap_max, TRUE, startp)]
pre_data[, block := cumsum(startp)]
pre_data[, block_i := seq_len(.N), by = block]
pre_data[, block_length := .N, by = block]
# ERFL based on sliding avergage ------------------------------------------
### Count sliding average and personal purchase pattern ----
.count_sliding_average <- function(ddd,
time,
w_past,
w_present,
w_future,
w_limit,
max_refill) {
if (length(ddd) < 2) {
return(NA)
} else {
# Estimate the time for the last purchase
time_est <- min(ddd[length(ddd)] / (ddd[length(ddd) - 1] / time[length(ddd)]), max_refill)
sliding_average <- numeric(length(ddd))
for (i in 1:length(ddd)) {
if (i == 1) {
# Handle the first element
sliding_average[i] <- sum(w_limit * ddd[i], w_future * ddd[i + 1], na.rm = TRUE) /
sum(w_limit * time[i + 1], w_future * time[i + 2], na.rm = TRUE)
} else if (i > 1 && i < length(ddd)) {
# Handle the middle elements
use_time <- ifelse(i + 2 > length(ddd), time_est, time[i + 2])
sliding_average[i] <- sum(w_past * ddd[i - 1], w_present * ddd[i], w_future * ddd[i + 1], na.rm = TRUE) /
sum(w_past * time[i],
w_present * time[i + 1],
w_future * use_time,
na.rm = TRUE)
} else {
# Handle the last element
sliding_average[i] <- sum(w_limit * ddd[i], ddd[i - 1], na.rm = TRUE) /
sum(w_limit * time_est, time[i], na.rm = TRUE)
}
}
sliding_average
}
}
# Most operations are for purchases with at least two purchases
idx <- pre_data$block_length > 2
# Calculate the sliding average for the purchases
pre_data[idx, sliding_average := .count_sliding_average(
ddd = ddd_pre,
time = btw,
w_past = weight_past,
w_present = weight_current,
w_future = weight_next,
w_limit = weight_first_last,
max_refill = global_max
), by = block]
### Relative standard deviation ----
# Limit the sliding average to 99
pre_data[, sliding_average := pmin(sliding_average, 99)]
.f_rsd <- function(x) {
pmin(0.5, pmax(0.2, sqrt(sum(((x - mean(x))^2) / (length(x))
)) / mean(x)))
}
pre_data[idx , rsd := .f_rsd(sliding_average), by = block]
## Erfl_ddd ----
pre_data[, sliding_average := pmin(pmax(sliding_average, min_ddd), global_ddd_high)]
pre_data[, erfl_ddd := ddd_pre * (1 + rsd) / sliding_average]
# # Choose ERFL and limit to min and max ---------------------------------------
pre_data[, erfl := ifelse(pre_ddd > 0 &
!is.na(erfl_ddd) & erfl_ddd > 0 ,
erfl_ddd,
erfl_pack_based)]
# Save ERFL methods
pre_data[, erfl_method := ifelse(pre_ddd > 0 &
!is.na(erfl_ddd) &
erfl_ddd > 0,
"erfl_ddd",
erfl_met)]
# Number/proportion of packages shall not impact the ERFL based on ATC minimum
# others will be limited to min and max
pre_data[, erfl := ifelse(erfl_method == "erfl_min_atc", erfl, pmin(pmax(min_dur, erfl), max_dur))]
# If ERFL doesn't reach the next purcase, restart period ----------------
pre_data[shift(erfl, type = "lag") < btw &
!(startp), startp := T]
# # Check if extended ERFL reaches the next purchase ---------------------------
### Count hospital days between purchases ----
pre_data[, purc_end := date_pre + erfl]
setorder(pre_data, pid_pre, atc_pre, date_pre)
pre_data[, next_purchase := shift(date_pre, type = "lead"), by = list(pid_pre, atc_pre)]
if (!is.null(hosp_data)) {
hospital_times <- calc_hosp_days(
pre_id = pre_data$pid_pre,
pre_atc = pre_data$atc_pre,
curr_purc_start = pre_data$date_pre,
exp_end = pre_data$purc_end,
next_start = pre_data$next_purchase,
hosp_id = hosp_data$pid_hosp,
hosp_in = hosp_data$admission_date,
hosp_out = hosp_data$discharge_date
)
# Hospital days between purchases
pre_data$hosp_days_any <- hospital_times$hosp_days_all
# Hospital days that started during exposure
pre_data$hosp_days <- hospital_times$hosp_days_exp
} else {
# If no hospital data, set to 0
pre_data[, hosp_days := 0]
pre_data[, hosp_days_any := 0]
}
### Extend over gap if hospitalizations and/or allowing few extra days covers the gap ----
pre_data[, erfl_extended := erfl + pmin(hosp_days, global_hosp_max) + days_covered, by = block]
pre_data[startp == TRUE &
btw > 0 &
btw <= global_gap_max &
shift(erfl_extended, type = "lag") >= btw, startp := FALSE]
# Stockpiling -------------------------------------------------------------
message("Step 3/6: Stockpiling assessment...")
# Shift previous and next temporal averages
pre_data[, let(
pre_ta = shift(sliding_average, type = "lag"),
pas_ta = shift(sliding_average, type = "lead")
), by = block]
# Find dropping temporal average doses
pre_data[idx, drop := fifelse(
!is.na(pre_ta) & !is.na(pas_ta) &
pre_ta > sliding_average &
(
pas_ta > sliding_average |
(sliding_average > pas_ta & block_i == (block_length - 1))
),
TRUE,
FALSE
), by = block]
# Detect possible stockpiling
pre_data[idx, pos_stock := fifelse(drop == TRUE &
shift(startp, type = "lead") == TRUE, TRUE, FALSE), by = block]
# Combine purchased total DDDs and times between purchases
pre_data[idx, let(
sp_ddd = shift(ddd_pre, type = "lag") + ddd_pre,
sp_btw = btw + shift(btw, type = "lead")
), by = block]
# Calculate sliding average for stockpiling purchases
pre_data[idx, sp_erfl := sp_ddd * (1 + rsd) / sliding_average]
# Maximum of stockpiling ERFLs is the summed maximum of the connected purchases
pre_data[idx, sp_max := shift(max_dur, type = "lag") + max_dur]
pre_data[idx, sp_erfl := pmin(sp_erfl, sp_max), by = block]
# Count hospital days between purchases
pre_data[idx, sp_hosp_days := shift(hosp_days, type = "lag") + hosp_days
, by = block]
pre_data[idx, sp_erfl_extended := sp_erfl +
pmin(sp_hosp_days, global_hosp_max) + days_covered]
# Final decision on stockpiling
pre_data[idx, stock := fifelse(pos_stock == TRUE &
round(sp_erfl_extended) >= sp_btw,
TRUE,
FALSE)]
# Continue period if stockpiling
pre_data[shift(stock, type = "lag"), startp := FALSE, by = block]
# Create periods ----------------------------------------------------------
pre_data[, period := cumsum(startp)]
pre_data[, period_length := .N, by = period]
# Index of last purchase in period
lidx <- pre_data[, .I[.N], by = .(period)]$V1
# Re-calculate package durations ------------------------------------------
if (calculate_pack_dur_usual) {
message("Step 4/6: Calculating common package durations in data...")
# Pick valid purchases for calculation of times between purchases
# At least six purchases in period
pre_data[, valid := ifelse(period_length > 5, TRUE, FALSE)]
# hospital days
pre_data[hosp_days_any > 0, valid := FALSE]
# If last, time till next purchase is not valid
pre_data[lidx, valid := FALSE]
# Only one kind of packages per ATC
pre_data[n_vnr > 1, valid := FALSE]
# Only whole packages
pre_data[ratio_pre %% 1 != 0, valid := FALSE]
# At least five users for package in valid data
pre_data[valid == TRUE, n_users := length(unique(pid_pre)), by = packid_pre]
pre_data[n_users < 5, valid := FALSE]
# Make temporary data with valid purchases
# Lift the duration
pre_data[, btw_next := shift(btw, type = "lead"), by = period]
tmp <- pre_data[valid == TRUE, .(packid_pre, btw_next, ratio_pre)]
# Calculate the mode of the refill lengths
if (nrow(tmp) > 0) {
modes <- mode_calculation(
pack_id = tmp$packid_pre,
pre_ratio = tmp$ratio_pre,
dur = tmp$btw_next
)
# If package parameters have already a column named common_duration,
# rename it to avoid overwriting. Data might be data.frame or tibble,
# Use base R for renaming
if ("common_duration" %in% names(package_parameters_orig)) {
message(
"Column name 'common_duration' already exists in package parameters. Renaming to 'common_duration.x'."
)
names(package_parameters_orig)[grep("common_duration", names(package_parameters_orig))] <- paste0(names(package_parameters_orig)[grep("common_duration", names(package_parameters_orig))], ".x")
}
# Report the packages in modes that are not in package parameters
param_packs <- package_parameters_orig[[pack_id]]
missing_packs <- modes[!pack_id %in% param_packs]
n <- nrow(missing_packs)
if (n > 0) {
id_text <- paste0(missing_packs$pack_id,
" (",
missing_packs$common_duration,
")")
id_list <- paste(id_text, collapse = ", ")
message(
paste0(
"Common duration was calculated for ",
n,
" package(s) not listed in package parameters; package ID (common duration in days): ",
id_list,
"."
)
)
modes <- modes[pack_id %in% param_packs]
rm(missing_packs, id_text, id_list)
}
# Merge modes to package parameters
package_parameters_new <- merge(
package_parameters_orig,
modes[, .(pack_id, common_duration)],
by.x = pack_id,
by.y = "pack_id",
all = TRUE
)
# Don't let the new values be smaller or greater than the original minimum and maximum
package_parameters_new[["common_duration"]] <- with(
package_parameters_new,
ifelse(
!is.na(common_duration) & common_duration < get(pack_dur_min),
get(pack_dur_min),
common_duration
)
)
package_parameters_new[["common_duration"]] <- with(
package_parameters_new,
ifelse(
!is.na(common_duration) & common_duration > get(pack_dur_max),
get(pack_dur_max),
common_duration
)
)
} else {
package_parameters_new <- NULL
}
}
if (calculate_pack_dur_usual == T &&
is.null(package_parameters_new))
message("Refill lengths couldn't be re-estimated, probably due to too small data size.")
if (isFALSE(calculate_pack_dur_usual))
message(
"Step 4/6: Common package duration calculation was not selected in the parameters; skipping this step."
)
#####################
# Prepare exposure periods ------------------------------------------------
message("Step 5/6: Preparing drug use periods...")
pre_data[, let(
dup_start = min(date_pre),
dup_last_purchase = max(date_pre),
dup_n_purchases = .N
), by = period]
# Calculate end dates of drug use periods ----
pre_data[lidx, duration := ifelse(dup_n_purchases > 2 & !is.na(rsd) &
ddd_pre > 0, (ddd_pre *
(1 + rsd) / (sliding_average * (
1 + exp(-dup_n_purchases)
))), erfl)]
# ### Limit between minimum and maximum ----
pre_data[lidx, duration := pmin(pmax(duration, min_dur), max_dur)]
pre_data[, duration := ifelse(dup_n_purchases == 1,
pmin(duration, global_max_single),
duration)]
pre_data[lidx, last_ends := dup_last_purchase + duration]
# Calculate hospital days for hospitalizatios started during exposure
if (!is.null(hosp_data)) {
pre_data[lidx, let(
hosp_days_end = calc_hosp_days(
pre_id = pid_pre,
pre_atc = atc_pre,
curr_purc_start = dup_last_purchase,
exp_end = last_ends,
next_start = NA_integer_,
hosp_id = hosp_data$pid_hosp,
hosp_in = hosp_data$admission_date,
hosp_out = hosp_data$discharge_date
)$hosp_days_exp
)]
} else
pre_data[, hosp_days_end := 0]
# Add hospital days to exposure time, limited by allowed maximum
pre_data[lidx, last_dur_hosp := duration + pmin(hosp_days_end, global_hosp_max)]
pre_data[lidx, dup_end := dup_last_purchase + last_dur_hosp]
message("Step 6/6: Post-processing drug use periods...")
# Combine periods if gap is small or end exceeds the next period start
combinable <- TRUE
while (combinable) {
message(paste("Current post processing percentage:", post_process_perc))
pre_data[lidx, prev_start := shift(dup_start, type = "lag"), by = .(pid_pre, atc_pre)]
pre_data[lidx, prev_end := shift(dup_end, type = "lag"), by = .(pid_pre, atc_pre)]
pre_data[lidx, combine_to_past := ifelse((dup_start - prev_end) <= post_process_perc / 100 * (prev_end -
prev_start),
1,
0)]
pre_data[, combine_to_past := ifelse(is.na(combine_to_past), 0, combine_to_past)]
pre_data[, combine_to_past := max(combine_to_past), by = period]
# To avoid excess looping, reduce the percentage by 0.1 each time
post_process_perc <- post_process_perc - 0.1
if (nrow(pre_data[combine_to_past == 1]) == 0) {
combinable <- FALSE
} else {
pre_data[combine_to_past == 1 , let(period = NA_integer_, dup_start = NA_integer_)]
pre_data[, period := nafill(period, type = "locf"), by = .(pid_pre, atc_pre)]
pre_data[, dup_start := nafill(dup_start, type = "locf"), by = .(pid_pre, atc_pre)]
lidx <- pre_data[, .I[.N], by = .(period)]$V1
pre_data[, combine_to_past := NULL]
}
}
# Calculate hospital days for the period
# Collect all hospital days between purchases from other than last purchase
pre_data[!lidx, dup_hosps := hosp_days_any]
# From last purchase, use the hospital days that started during exposure
pre_data[lidx, dup_hosps := pmin(hosp_days_end, global_hosp_max)]
pre_data[, dup_hospital_days := sum(dup_hosps), by = period]
pre_data[, let(
dup_n_purchases = .N,
dup_last_purchase = max(date_pre),
dup_total_DDD = sum(ddd_pre)
), by = period]
# Collect periods
pre2dupdata <- pre_data[lidx, .(
period,
pid_pre,
atc_pre,
dup_start,
dup_end,
dup_hospital_days,
dup_n_purchases,
dup_last_purchase,
dup_total_DDD
)]
pre2dupdata[, dup_days := round(dup_end - dup_start)]
pre2dupdata[, dup_temporal_average_DDDs := round(dup_total_DDD / dup_days, 3)]
# Period number is messed, make new.
pre2dupdata[, period := 1:.N]
cols <- c("dup_start", "dup_last_purchase", "dup_end")
pre2dupdata[, (cols) := lapply(.SD, as.Date, origin = "1970-01-01"), .SDcols = cols]
npids <- length(unique(pre2dupdata$pid_pre))
setcolorder(
pre2dupdata,
c(
"period",
"pid_pre",
"atc_pre",
"dup_start",
"dup_end",
"dup_days",
"dup_hospital_days",
"dup_n_purchases",
"dup_last_purchase",
"dup_total_DDD",
"dup_temporal_average_DDDs"
)
)
setnames(pre2dupdata,
c("pid_pre", "atc_pre"),
c(pre_person_id, pre_atc))
# Inform user
message(
"Drug use periods calculated. ",
nrow(pre2dupdata),
" periods created for ",
npids,
" persons."
)
# Return results ----------------------------------------------------------
if (calculate_pack_dur_usual) {
return(list(periods = pre2dupdata[], pack_info = package_parameters_new[]))
} else {
return(list(periods = pre2dupdata[]))
}
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.