Note: This vignette uses fake data - it is for illustrative purposes only and should not be used to inform decision making. The specific package includes ready4 framework model modules that form part of the ready4 youth mental health economic model. Currently, these modules are not optimised to be used directly, but are instead intended for use in other model modules. For example, the TTU package includes modules that extend specific modules to help implement utility mapping studies. However, to illustrate the main features of specific modules this vignette demonstrates how specific modules could be used independently. In practice, workflow illustrated in this article would probably need to be performed iteratively in order to identify the optimal model types, predictors and covariates and to update default values to ensure model convergence.

library(ready4)
library(scorz)
library(specific)
# Remove this chunk once it clear that these libraries are not required for vignette rendering
# library(ready4show)
# library(ready4use)
# library(scorz)
# library(betareg)
# library(youthvars)

Set consent policy

By default, modules in the specific package will request your consent before writing files to your machine. This is the safest option. However, as there are many files that need to be written locally for this program to execute, you can overwrite this default by supplying the value "Y" to methods with a consent_1L_chr argument.

consent_1L_chr <- "" # Default value - asks for consent prior to writing each file.
consent_1L_chr <- "Y" # Gives consent to write files without additional requests.

Import data

We start by ingesting our data. As this example uses EQ-5D data, we import a ScorzEuroQol5 ready4 framework module (created using the steps described in this vignette from the scorz pacakge) into a SpecificConverter Module and then apply the metamorphose method to convert it into a SpecificModel module.

X <- SpecificConverter(a_ScorzProfile = ready4use::Ready4useRepos(gh_repo_1L_chr = "ready4-dev/scorz", 
                                                                  gh_tag_1L_chr = "Documentation_0.0") %>%
                         ingest(fls_to_ingest_chr = "ymh_ScorzEuroQol5",  metadata_1L_lgl = F)) %>% 
  metamorphose() 
class(X)

Inspect data

ds_tb <- X %>% procureSlot("a_YouthvarsProfile@a_Ready4useDyad@ds_tb") 

The dataset we are using has a total of r nrow(ds_tb) records at two timepoints on r length(ds_tb$uid %>% unique()) study participants. The first six records are reproduced below.

X %>%
  procureSlot("a_YouthvarsProfile@a_Ready4useDyad") %>%
  exhibit(display_1L_chr = "head", scroll_box_args_ls = list(width = "100%"))

To source dataset of X is contained in the a_YouthvarsProfile slot and is a YouthvarsSeries module. For more information about methods that can be used to explore this dataset, read this vignette from the youthvars package.

Specify parameters

In preparation for exploring our dataset, we need to declare a set of model parameters in a b_SpecificParameters slot of X. This can be done in one step, or in sequential steps. In this example, we will proceed sequentially.

Dependent variable

The dependent variable (total EQ-5D utility score) has already been specified when we imported the data from the ScorzEuroQol5 module.

procureSlot(X, "b_SpecificParameters@depnt_var_nm_1L_chr")

We can now add details of the allowable range of dependent variable values.

X <- renewSlot(X, "b_SpecificParameters@depnt_var_min_max_dbl", c(-1,1))

Candidate predictors

We can now specify the names of candidate predictor variables.

X <- renewSlot(X, "b_SpecificParameters@candidate_predrs_chr", c("K10_int","Psych_well_int")) 

We next add meta-data about each candidate predictor variable in the form of a specific_predictors object.

X <- renewSlot(X, "b_SpecificParameters@predictors_lup", class_chr = "integer", class_fn_chr = c("youthvars::youthvars_k10_aus","as.integer"), covariate_lgl = F, increment_dbl = 1,
               long_name_chr = c("Kessler Psychological Distress - 10 Item Total Score", "Overall Wellbeing Measure (Winefield et al. 2012)"), max_val_dbl = c(50,90), min_val_dbl = c(10,18), mdl_scaling_dbl = 0.01,
               short_name_chr = c("K10_int","Psych_well_int"))

The specific_predictors object that we have added to X can be inspected using the exhibitSlot method.

exhibitSlot(X, "b_SpecificParameters@predictors_lup", scroll_box_args_ls = list(width = "100%"))

Covariates

We also specify the covariates that we aim to explore in conjunction with each candidate predictor.

X <- renewSlot(X, "b_SpecificParameters@candidate_covars_chr", c("d_sex_birth_s", "d_age",  "d_sexual_ori_s", "d_studying_working"))

Descriptive variables

We also specify variables that we will use for generating descriptive statistics about the dataset.

X <- renewSlot(X,"b_SpecificParameters@descv_var_nms_chr", c("d_age","Gender","d_relation_s", "d_sexual_ori_s", "Region", "d_studying_working")) 

Temporal variables

The name of the dataset variable for data collection timepoint and all of its unique values were imported when converting the ScorzEuroQol5 module.

procureSlot(X,"a_YouthvarsProfile@timepoint_var_nm_1L_chr")
procureSlot(X,"a_YouthvarsProfile@timepoint_vals_chr")

However, we also need to specify the name of the variable that contains the datestamp for each dataset record.

X <- renewSlot(X, "b_SpecificParameters@msrmnt_date_var_nm_1L_chr", "data_collection_dtm")

Candidate models

X was created with a default set of candidate models, stored as a specific_models sub-module, which can be inspected using the exhibitSlot method.

exhibitSlot(X, "b_SpecificParameters@candidate_mdls_lup", scroll_box_args_ls = list(width = "100%"))

We can choose to select just a subset of these to explore using the renewSlot method. As this is an illustrative example, we have restricted the models we will explore to just four types, passing the relevant row numbers to the slice_indcs_int argument.

X <- renewSlot(X, "b_SpecificParameters@candidate_mdls_lup", slice_indcs_int = c(1L,5L,7L,8L))

Other parameters

Depending on the type of analysis we plan on undertaking, we can also specify parameters such as the number of folds to use in cross validation, the maximum number of model runs to allow and a seed to ensure reproducibility of results. In this case we are going to use the default values generated when we first created X.

procureSlot(X, "b_SpecificParameters@folds_1L_int")
procureSlot(X, "b_SpecificParameters@max_mdl_runs_1L_int")
procureSlot(X, "b_SpecificParameters@seed_1L_int")

Model testing

Before we start to use the data stored in X to undertake modelling, we must first validate that it contains all necessary (and internally consistent) data by using the ratify method. The call to ratify will update any variable names that are likely to cause problems when generating reports (e.g. through inclusion of characters like "_" in the variable name that can cause problems when rendering LaTeX documents).

X <- ratify(X)

Set-up workspace

We add details of the directory to which we will write all output. In this example we create a temporary directory (tempdir()), but in practice this would be an existing directory on your local machine.

X <- renewSlot(X, "paths_chr", tempdir())

It can be useful to save fake data (useful for demonstrating the generalisability and replicability of an analysis) and real data (required for write-up and reproducibility) is distinctly labelled directories. By default, X is created with a flag to save all output in a sub-directory "Real". As we are using fake data, we can override this value.

X <- renewSlot(X, "b_SpecificParameters@fake_1L_lgl", T)

We can now write a number of sub-directories to our specified output directory.

X <- author(X, what_1L_chr = "workspace", consent_1L_chr = consent_1L_chr)

Descriptives

The first set of outputs we write to our output directories is a set of descriptive tables and plots.

X <- author(X, consent_1L_chr = consent_1L_chr, digits_1L_int = 3L,  what_1L_chr = "descriptives")

Model comparisons

The investigate method can now be used to compare the candidate models we have specified earlier. In so doing it will transform X into a SpecificPredictors object.

X <- investigate(X, consent_1L_chr = consent_1L_chr, depnt_var_max_val_1L_dbl = 0.99, session_ls = sessionInfo())
class(X)

The investigate method will write each model to be tested to a new sub-directory of our output directory.

plt_nms_chr <- paste0("Plot ",1:5) # Make fn
# Also written to that sub-directory are a number of plots for each model, that can be inspected using the `depict` method.
depict(X, mdl_indcs_int = 2)

The investigate method also outputs a table summarising the performance of each of the candidate models.

exhibit(X, what_1L_chr = "mdl_cmprsn", type_1L_chr = "results") 

We can now identify the highest performing model in each category of candidate model based on the testing R^2^ statistic.

# Check below if fix needed for NA_ returns (one option within ctg) has been implemented
procure(X, what_1L_chr = "prefd_mdls") 

We can override these automated selections and instead incorporate other considerations (possibly based on judgments informed by visual inspection of the plots and the desirability of constraining predictions to a maximum value of one). We do this in the following command, specifying new preferred model types, in descending order of preference.

X <- renew(X, new_val_xx = c("BET_LGT", "OLS_CLL"), type_1L_chr = "results", what_1L_chr = "prefd_mdls")

Use most preferred model to compare all candidate predictors

We can now compare all of our candidate predictors (with and without candidate covariates) using the most preferred model type.

X <- investigate(X, consent_1L_chr = consent_1L_chr)
class(X)

Now, we compare the performance of single predictor models of our preferred model type (in our case, a r X@c_SpecificResults@a_SpecificShareable@shareable_outp_ls$mdl_smry_ls$mdl_types_lup %>% ready4::get_from_lup_obj(target_var_nm_1L_chr = "long_name_chr",match_value_xx = X@c_SpecificResults@a_SpecificShareable@shareable_outp_ls$mdl_smry_ls$prefd_mdl_types_chr[1], match_var_nm_1L_chr = "short_name_chr", evaluate_1L_lgl = F)) for each candidate predictor. The last call to the investigate{.R} saved the tested models along with model plots in a sub-directory of our output directory. These results are also viewable as a table.

exhibit(X, scroll_box_args_ls = list(width = "100%"), type_1L_chr = "results", what_1L_chr = "predr_cmprsn")

The most recent call to the investigate{.R} method also saved single predictor R model objects (one for each candidate predictors) along with the two plots for each model in a sub-directory of our output directory. The performance of each single predictor model can also be summarised in a table.

exhibit(X, type_1L_chr = "results", what_1L_chr = "fxd_sngl_cmprsn")

Updated versions of each of the models in the previous step (this time with covariates added) are saved to a new subdirectory of the output directory and we can summarise the performance of each of the updated models, along with all signficant model terms, in a table.

exhibit(X, scroll_box_args_ls = list(width = "100%"), type_1L_chr = "results", what_1L_chr = "fxd_full_cmprsn")

We can now identify which, if any, of the candidate covariates we previously specified are significant predictors in any of the models.

procure(X, type_1L_chr = "results", what_1L_chr = "signt_covars")

We can override the covariates to select, potentially because we want to select only covariates that are significant for all or most of the models. However, in the below example we have opted not to do so and continue to use r ifelse(is.na(X@c_SpecificResults@a_SpecificShareable@shareable_outp_ls$mdl_smry_ls$signt_covars_chr),"no covariates",paste0(X@c_SpecificResults@a_SpecificShareable@shareable_outp_ls$mdl_smry_ls$signt_covars_chr, collapse = "and ")) as selected by the algorithm in the previous step.

# X <- renew(X, new_val_xx = c("COVARIATE OF YOUR CHOICE", "ANOTHER COVARIATE"), type_1L_chr = "results", what_1L_chr = "prefd_covars")

Test preferred model with preferred covariates for each candidate predictor

We now conclude our model testing by rerunning the previous step, except confining our covariates to those we prefer.

X <- investigate(X, consent_1L_chr = consent_1L_chr)
class(X)

The previous call to the write_mdls_with_covars_cmprsn{.R} function saves the tested models along with two plots for each model in the "E_Predrs_W_Covars_Sngl_Mdl_Cmprsn" sub-directory of "Output".

Apply preferred model types and predictors to longitudinal data

The next main step is to use the preferred model types and covariates identified from the preceding analysis of cross-sectional data in longitudinal analysis.

Longitudinal mixed modelling

Prior to undertaking longitudinal mixed modelling, we need to check the appropriateness of the default values for modelling parameters that are stored in X. These include the number of model iterations, and any custom control parameters and priors (by default, empty lists).

procureSlot(X, "b_SpecificParameters@iters_1L_int")

In many cases there will be no need to specify any custom control parameters or priors and using the defaults may speed up execution.

procureSlot(X, "b_SpecificParameters@control_ls")
procureSlot(X,"b_SpecificParameters@prior_ls")

However, in this example using the default control parameters would result in warning messages suggesting a change to the adapt_delta control value (default = 0.8). Modifying the adapt_delta control parameter value can address this issue.

X <- renewSlot(X, "b_SpecificParameters@control_ls", new_val_xx = list(adapt_delta = 0.99))
X <- investigate(X, consent_1L_chr = consent_1L_chr)
class(X)

The last call to investigate{.R} function wrote the models it tests to a sub-directory of the output directory along with plots for each model.

Create shareable outputs

The model objects created by the preceding analysis are not suitable for sharing as they contain duplicates of the source dataset. To create model objects that can be shared (where dataset copies are replaced with fake data) use the authorData method.

X <- authorData(X, consent_1L_chr = consent_1L_chr)

Purge dataset copies

For the purposes of efficient computation, multiple objects containing copies of the source dataset were saved to our output directory during the analysis process. We therefore need to delete all of these copies by supplying "purge_write" to the type_1L_chr argument of the author method.

X <- author(X, consent_1L_chr = consent_1L_chr, type_1L_chr = "purge_write")
Z <- ready4use::Ready4useRepos(gh_repo_1L_chr = "ready4-dev/specific", # Replace with details of your repo.
                               gh_tag_1L_chr = "Documentation_0.0") # You must have write permissions.
share(X, fl_nm_1L_chr = "eq5d_ttu_SpecificMixed", repos_Ready4useRepos = Z)

A copy of the module X is available for download as the file eq5d_ttu_SpecificMixed.RDS from the "Documentation_0.0" release of the specific package.



ready4-dev/specific documentation built on Oct. 13, 2023, 7:54 a.m.