# hurdle: Full Bayesian Models to handle missingness in Economic... In missingHE: Missing Outcome Data in Health Economic Evaluation

## Description

Full Bayesian cost-effectiveness models to handle missing data in the outcomes using Hurdle models under a variatey of alternative parametric distributions for the effect and cost variables. Alternative assumptions about the mechanisms of the structural values are implemented using a hurdle approach. The analysis is performed using the `BUGS` language, which is implemented in the software `JAGS` using the function `jags`. The output is stored in an object of class 'missingHE'.

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22``` ```hurdle( data, model.eff, model.cost, model.se = se ~ 1, model.sc = sc ~ 1, se = 1, sc = 0, dist_e, dist_c, type, prob = c(0.025, 0.975), n.chains = 2, n.iter = 20000, n.burnin = floor(n.iter/2), inits = NULL, n.thin = 1, ppc = FALSE, save_model = FALSE, prior = "default", ... ) ```

## Arguments

 `data` A data frame in which to find the variables supplied in `model.eff`, `model.cost` (model formulas for effects and costs) and `model.se`, `model.sc` (model formulas for the structural effect and cost models). Among these, effectiveness, cost and treatment indicator (only two arms) variables must always be provided and named 'e', 'c' and 't', respectively. `model.eff` A formula expression in conventional `R` linear modelling syntax. The response must be a health economic effectiveness outcome ('e') whose name must correspond to that used in `data`. Any covariates in the model must be provided on the right-hand side of the formula. If there are no covariates, `1` should be specified on the right hand side of the formula. By default, covariates are placed on the "location" parameter of the distribution through a linear model. Random effects can also be specified for each model parameter. See details for how these can be specified. `model.cost` A formula expression in conventional `R` linear modelling syntax. The response must be a health economic cost outcome ('c') whose name must correspond to that used in `data`. Any covariates in the model must be provided on the right-hand side of the formula. If there are no covariates, `1` should be specified on the right hand side of the formula. By default, covariates are placed on the "location" parameter of the distribution through a linear model. A joint bivariate distribution for effects and costs can be specified by including 'e' on the right-hand side of the formula for the costs model. Random effects can also be specified for each model parameter. See details for how these can be specified. `model.se` A formula expression in conventional `R` linear modelling syntax. The response must be indicated with the term 'se'(structural effects). Any covariates in the model must be provided on the right-hand side of the formula. If there are no covariates, `1` should be specified on the right hand side of the formula. By default, covariates are placed on the "probability" parameter for the structural effects through a logistic-linear model. Random effects can also be specified for each model parameter. See details for how these can be specified. `model.sc` A formula expression in conventional `R` linear modelling syntax. The response must be indicated with the term 'sc'(structural costs). Any covariates in the model must be provided on the right-hand side of the formula. If there are no covariates, `1` should be specified on the right hand side of the formula. By default, covariates are placed on the "probability" parameter for the structural costs through a logistic-linear model. Random effects can also be specified for each model parameter. See details for how these can be specified. `se` Structural value to be found in the effect variables defined in `data`. If set to `NULL`, no structural value is chosen and a standard model for the effects is run. `sc` Structural value to be found in the cost variables defined in `data`. If set to `NULL`, no structural value is chosen and a standard model for the costs is run. `dist_e` Distribution assumed for the effects. Current available chocies are: Normal ('norm'), Beta ('beta'), Gamma ('gamma'), Exponential ('exp'), Weibull ('weibull'), Logistic ('logis'), Poisson ('pois'), Negative Binomial ('nbinom') or Bernoulli ('bern'). `dist_c` Distribution assumed for the costs. Current available chocies are: Normal ('norm'), Gamma ('gamma') or LogNormal ('lnorm'). `type` Type of structural value mechanism assumed. Choices are Structural Completely At Random (SCAR), and Structural At Random (SAR). `prob` A numeric vector of probabilities within the range (0,1), representing the upper and lower CI sample quantiles to be calculated and returned for the imputed values. `n.chains` Number of chains. `n.iter` Number of iterations. `n.burnin` Number of warmup iterations. `inits` A list with elements equal to the number of chains selected; each element of the list is itself a list of starting values for the `JAGS` model, or a function creating (possibly random) initial values. If `inits` is `NULL`, `JAGS` will generate initial values for all the model parameters. `n.thin` Thinning interval. `ppc` Logical. If `ppc` is `TRUE`, the estimates of the parameters that can be used to generate replications from the model are saved. `save_model` Logical. If `save_model` is `TRUE` a `txt` file containing the model code is printed in the current working directory. `prior` A list containing the hyperprior values provided by the user. Each element of this list must be a vector of length two containing the user-provided hyperprior values and must be named with the name of the corresponding parameter. For example, the hyperprior values for the standard deviation parameter for the effects can be provided using the list `prior = list('sigma.prior.e' = c(0, 100))`. For more information about how to provide prior hypervalues for different types of parameters and models see details. If `prior` is set to 'default', the default values will be used. `...` Additional arguments that can be provided by the user. Examples are `d_e` and `d_c`, which should correspond to two binary indicator vectors with length equal to the number of rows of `data`. By default these variables are constructed within the function based on the observed data but it is possible for the user to directly provide them as a means to explore some Structural Not At Random (SNAR) mechanism assumptions about one or both outcomes. Individuals whose corresponding indicator value is set to `1` or `0` will be respectively associated with the structural or non-structural component in the model. Other optional arguments are `center = TRUE`, which centers all the covariates in the model or the additional arguments that can be provided to the function `bcea` to summarise the health economic evaluation results.

## Details

Depending on the distributions specified for the outcome variables in the arguments `dist_e` and `dist_c` and the type of structural value mechanism specified in the argument `type`, different hurdle models are built and run in the background by the function `hurdle`. These are mixture models defined by two components: the first one is a mass distribution at the spike, while the second is a parametric model applied to the natural range of the relevant variable. Usually, a logistic regression is used to estimate the probability of incurring a "structural" value (e.g. 0 for the costs, or 1 for the effects); this is then used to weigh the mean of the "non-structural" values estimated in the second component. A simple example can be used to show how hurdle models are specified. Consider a data set comprising a response variable y and a set of centered covariate X_j.Specifically, for each subject in the trial i = 1, ..., n we define an indicator variable d_i taking value `1` if the i-th individual is associated with a structural value and `0` otherwise. This is modelled as:

d_i ~ Bernoulli(π_i)

logit(π_i) = γ_0 + ∑γ_j X_j

where

• π_i is the individual probability of a structural value in y.

• γ_0 represents the marginal probability of a structural value in y on the logit scale.

• γ_j represents the impact on the probability of a structural value in y of the centered covariates X_j.

When γ_j = 0, the model assumes a 'SCAR' mechanism, while when γ_j != 0 the mechanism is 'SAR'. For the parameters indexing the structural value model, the default prior distributions assumed are the following:

• γ_0 ~ Logisitic(0, 1)

• γ_j ~ Normal(0, 0.01)

When user-defined hyperprior values are supplied via the argument `prior` in the function `hurdle`, the elements of this list (see Arguments) must be vectors of length `2` containing the user-provided hyperprior values and must take specific names according to the parameters they are associated with. Specifically, the names accepted by missingHE are the following:

• location parameters α_0, β_0: "mean.prior.e"(effects) and/or "mean.prior.c"(costs)

• auxiliary parameters σ: "sigma.prior.e"(effects) and/or "sigma.prior.c"(costs)

• covariate parameters α_j, β_j: "alpha.prior"(effects) and/or "beta.prior"(costs)

• marginal probability of structural values γ_0: "p.prior.e"(effects) and/or "p.prior.c"(costs)

• covariate parameters in the model of the structural values γ_j (if covariate data provided): "gamma.prior.e"(effects) and/or "gamma.prior.c"(costs)

For simplicity, here we have assumed that the set of covariates X_j used in the models for the effects/costs and in the model of the structural effect/cost values is the same. However, it is possible to specify different sets of covariates for each model using the arguments in the function `hurdle` (see Arguments).

For each model, random effects can also be specified for each parameter by adding the term + (x | z) to each model formula, where x is the fixed regression coefficient for which also the random effects are desired and z is the clustering variable across which the random effects are specified (must be the name of a factor variable in the dataset). Multiple random effects can be specified using the notation + (x1 + x2 | site) for each covariate that was included in the fixed effects formula. Random intercepts are included by default in the models if a random effects are specified but they can be removed by adding the term 0 within the random effects formula, e.g. + (0 + x | z).

## Value

An object of the class 'missingHE' containing the following elements

data_set

A list containing the original data set provided in `data` (see Arguments), the number of observed and missing individuals , the total number of individuals by treatment arm and the indicator vectors for the structural values

model_output

A list containing the output of a `JAGS` model generated from the functions `jags`, and the posterior samples for the main parameters of the model and the imputed values

cea

A list containing the output of the economic evaluation performed using the function `bcea`

type

A character variable that indicate which type of structural value mechanism has been used to run the model, either `SCAR` or `SAR` (see details)

Andrea Gabrio

## References

Ntzoufras I. (2009). Bayesian Modelling Using WinBUGS, John Wiley and Sons.

Daniels, MJ. Hogan, JW. (2008). Missing Data in Longitudinal Studies: strategies for Bayesian modelling and sensitivity analysis, CRC/Chapman Hall.

Baio, G.(2012). Bayesian Methods in Health Economics. CRC/Chapman Hall, London.

Gelman, A. Carlin, JB., Stern, HS. Rubin, DB.(2003). Bayesian Data Analysis, 2nd edition, CRC Press.

Plummer, M. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. (2003).

`jags`, `bcea`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62``` ```# Quck example to run using subset of MenSS dataset MenSS.subset <- MenSS[50:100, ] # Run the model using the hurdle function assuming a SCAR mechanism # Use only 100 iterations to run a quick check model.hurdle <- hurdle(data = MenSS.subset, model.eff = e ~ 1,model.cost = c ~ 1, model.se = se ~ 1, model.sc = sc ~ 1, se = 1, sc = 0, dist_e = "norm", dist_c = "norm", type = "SCAR", n.chains = 2, n.iter = 100, ppc = FALSE) # Print the results of the JAGS model print(model.hurdle) # # Use dic information criterion to assess model fit pic.dic <- pic(model.hurdle, criterion = "dic", module = "total") pic.dic # # Extract regression coefficient estimates coef(model.hurdle) # # Assess model convergence using graphical tools # Produce histograms of the posterior samples for the mean effects diag.hist <- diagnostic(model.hurdle, type = "histogram", param = "mu.e") # # Compare observed effect data with imputations from the model # using plots (posteiror means and credible intervals) p1 <- plot(model.hurdle, class = "scatter", outcome = "effects") # # Summarise the CEA information from the model summary(model.hurdle) # Further examples which take longer to run model.hurdle <- hurdle(data = MenSS, model.eff = e ~ u.0,model.cost = c ~ e, model.se = se ~ u.0, model.sc = sc ~ 1, se = 1, sc = 0, dist_e = "norm", dist_c = "norm", type = "SAR", n.chains = 2, n.iter = 500, ppc = FALSE) # # Print results for all imputed values print(model.hurdle, value.mis = TRUE) # Use looic to assess model fit pic.looic<-pic(model.hurdle, criterion = "looic", module = "total") pic.looic # Show density plots for all parameters diag.hist <- diagnostic(model.hurdle, type = "denplot", param = "all") # Plots of imputations for all data p1 <- plot(model.hurdle, class = "scatter", outcome = "all") # Summarise the CEA results summary(model.hurdle) # # ```