In ready4-dev/map2aqol: Transform Youth Outcomes to Health Utility Predictions

Example datasets

To illustrate the input data requirements and the practical application of youthu functions, we adapt a replication dataset from the youthu's sister package youthvars to create synthetic (fake) single group and matched group datasets. For our purposes we are going to use those records where a study participants had a baseline PHQ9 score of under 20.

data("replication_popl_tb", package = "youthvars")
seed_ds_tb <- replication_popl_tb %>% youthvars::transform_raw_ds_for_analysis() %>% dplyr::filter(fkClientID %in% (replication_popl_tb %>% dplyr::filter(round=="Baseline" & PHQ9<20) %>% dplyr::pull(fkClientID)))

We need to further adapt this dataset. To to so, we first make a list object ds_smry_ls that summarises the desired features of these datasets, such as:

the names of the predictors for utility to be included in the dataset;
the name of variables for unique unit identifier, data collection round, data collection date, costs and health utility measure;
(for matched dataset only) the name of the variable for comparison group and the allowable values for that variable;
the allowed values of the data collection round variable, the dates during which baseline data measures are collected and distribution parameters for the duration between baseline and follow-up timepoints;
population (not sample) parameters to generate gamma distributed costs for periods prior to study entry and between baseline and follow-up measurements for the whole single group dataset or the control arm of a matched dataset.

In the below example, our fake baseline data collection times are randomly sampled from a sequence of 180 consecutive days from the last year, the time spent in the study for each participant is sampled from a truncated normal distribution, bounded between 160 and 220 days with a mean of 180 days, and mean costs in the period prior to study entry were \$400, while the costs of resources consumed by participants over the lifetime of the study have a mean of \$1500. The names of the predictor variables we will be using are specified as PHQ9 and SOFAS.

ds_smry_ls <- list(bl_start_date_dtm = Sys.Date() - lubridate::days(300),
                   bl_end_date_dtm = Sys.Date() - lubridate::days(120),
                   cmprsn_var_nm_1L_chr = "study_arm_chr",
                   cmprsn_groups_chr = c("Intervention","Control"),
                   costs_mean_dbl = c(400,1500),
                   costs_sd_dbl = c(100,220),
                   costs_var_nm_1L_chr = "costs_dbl",
                   date_var_nm_1L_chr = "date_psx",
                   duration_args_ls = list(a = 160, b = 220, mean = 180, sd = 7),
                   duration_fn = truncnorm::rtruncnorm,
                   id_var_nm_1L_chr = "fkClientID",
                   predr_var_nms = c("PHQ9", "SOFAS"),
                   round_var_nm_1L_chr = "round",
                   round_lvls_chr = c("Baseline","Follow-up"), 
                   utl_var_nm_1L_chr = "AQoL6D_HU")

We pass both the ds_smry_ls summary list and the seed_ds_tb seed dataset to the make_sngl_grp_ds{.R} function to create an example single group dataset.

sngl_grp_ds_tb <- make_sngl_grp_ds(seed_ds_tb,ds_smry_ls = ds_smry_ls)

We can split a subset of our single group dataset into two propensity score matched groups - one each for intervention and control - using the make_matched_ds{.R} function. We also use this function to build in our assumptions about group differences in changes in PHQ9, SOFAS and costs measured at follow-up. The below example assumes population (not sample) mean change differences between Intervention and Control arms of -2 for PHQ9, -2 for SOFAS and +\$300 for costs.

matched_ds_tb <- make_matched_ds(sngl_grp_ds_tb, 
                                 cmprsn_smry_tb = tibble::tibble(var_nms_chr = c(ds_smry_ls$predr_var_nms, ds_smry_ls$costs_var_nm_1L_chr),
                                 fns_ls = list(stats::rnorm,stats::rnorm,stats::rnorm),
                                 abs_mean_diff_dbl = c(2,2,300),
                                 diff_sd_dbl = c(2,2,200),
                                 multiplier_dbl = c(-1,-1,1),
                                 min_dbl = c(0,0,0),
                                 max_dbl = c(27,100,Inf),
                                 integer_lgl = c(T,T,F)), 
                                 ds_smry_ls = ds_smry_ls)