# Introduction to missingHE

In missingHE: Missing Outcome Data in Health Economic Evaluation

BCEA::ceac.plot(NN.sel$cea)

For more information on how to interpret and customise these plots, see the BCEA package.

## Pattern mixture models

The second type of missingness model available in missingHE is the pattern mixture model, which can be fitted using the function pattern. It requires the specification of two models.

• The models are denoted by the terms model.eff and model.cost, and refer to the effectiveness and cost models in a similar way to what was shown for selection in terms of related arguments, including the type of distributions and missingness assumptions. If the model is fitted under MAR, the arguments Delta_e and Delta_c must be set to $0$. These are the priors on the sensitivity parameters, which can be used to specify a MNAR assumption and should therefore be removed under MAR.

• In contrast to selection models, however, the models for $e$ and $c$ in pattern are fitted within each missingness pattern in the dataset. Patterns are defined only by whether each outcome is observed or missing for an individual, for a maximum of $4$ patterns. Parameters that cannot be identified from the data (because the corresponding outcome is missing in that pattern) are identified using modelling restrictions. Two types of restrictions are available in missingHE: "CC" and "AC", which can be set using the argument restriction. The first identifies all unidentified parameters by setting them equal to the parameters estimated from the complete cases, while the second uses those estimated from the available cases (patterns where the other outcome is missing). Type help(pattern) for more information on the assumptions of the model.

• Although not directly accessible from pattern, the function implicitly fits a model for the probability of being associated with each missingness pattern in the data. This model is estimated using multinomial distributions and weakly-informative priors on all pattern probabilities.
Posterior estimates for these parameters are then used to compute the weighted mean effects and costs across the patterns.

NN.pat <- pattern(data = MenSS, model.eff = e ~ u.0, model.cost = c ~ e, type = "MAR", restriction = "CC", n.iter = 1000, Delta_e = 0, Delta_c = 0, dist_e = "norm", dist_c = "norm", ppc = TRUE)

The model above assumes normal distributions for both outcomes under a MAR assumption and uses complete case restrictions to identify the parameters in each pattern. We can inspect the results by typing

coef(NN.pat, random = FALSE)

which shows the presence of only two patterns in the dataset, since estimates from only two patterns are displayed for each model. Note that the estimates are exactly the same between the patterns, suggesting that one of the two is the complete case pattern and the other is formed by completely missing individuals (for whom estimates are set equal to those from the complete cases by setting restriction = "CC" inside pattern). Aggregated mean estimates over the patterns can be retrieved using print or, together with summary CEA results, using the summary command

summary(NN.pat)

Standard graphical economic outputs based on the model results can again be obtained using functions from the BCEA package.

## Hurdle models

The last type of model that can be fitted in missingHE is the hurdle model, implemented via the function hurdle. It requires the specification of four models.

• The first two are the models for $e$ and $c$ and are very similar to those used in selection. However, hurdle models are not, technically speaking, missingness models, in that they do not allow a choice among specific missingness assumptions. They consist of two-part regressions designed to handle the presence of structural values in the data.
The presence or absence of structural values in the effects and costs data can be specified in the function using the arguments se and sc. Each must be set to NULL if structural values are absent from the corresponding outcome, and to the actual structural value if they are present.

• The last two models are fitted to the indicator variables associated with the presence or absence of a structural value for each individual in the data, denoted by the terms model.se and model.sc. These models estimate the probability of having a structural effect or cost value using logistic regressions, in a similar fashion to the models model.me and model.mc in selection. Once these probabilities are estimated, the overall mean effects and costs in each arm are obtained as a weighted average of the mean of the non-structural component (obtained from the models of $e$ and $c$) and the structural value, with weights given by the corresponding probabilities of having a structural value.

• Due to the construction of hurdle models, the argument type takes different values from the standard MAR/MNAR assumptions, because the assumptions of the model relate to the probability of having a structural value rather than a missing value. missingHE allows a choice between Structural Completely At Random (SCAR) and Structural At Random (SAR) assumptions, the difference being the absence or presence of covariates in the model for the structural probabilities.

Within a Bayesian approach, hurdle models can be extended to impute missing values without the need for any ad hoc imputation steps. We refer to help(hurdle) for more details on the assumptions behind hurdle models.
NN.hur <- hurdle(data = MenSS, model.eff = e ~ u.0, model.cost = c ~ e, model.se = se ~ 1, model.sc = sc ~ age, type = "SAR", se = 1, sc = 0, n.iter = 1000, dist_e = "norm", dist_c = "norm", ppc = TRUE)

The fitted model allows for the presence of structural ones in $e$ (se = 1) and zeros in $c$ (sc = 0) under a SAR assumption, using age as a predictor of the probability of having a structural zero in the costs (model.sc = sc ~ age), while the model for the structural ones in the effects includes only an intercept (model.se = se ~ 1). We can extract the results from the regressions of $e$ and $c$ by typing

coef(NN.hur, random = FALSE)

If interest is in the estimates for the parameters indexing the models of se and sc, the entire posterior distributions for these (as well as those of any other parameter of the model) can be extracted by accessing the elements stored inside the model_output list, available by typing NN.hur$model_output.

Finally, economic results can be summarised as

summary(NN.hur)


and further exploration of the results can be done using the package BCEA.
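
The weighted average that a hurdle model uses to obtain the overall means can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code run inside hurdle: hurdle_mean is a hypothetical helper, and p_draws and mu_ns_draws are made-up stand-ins for posterior draws of the probability of a structural one in $e$ and of the mean of $e$ among the non-structural values.

```r
# Hypothetical helper (not part of missingHE): overall mean under a hurdle model,
# mixing the structural value with the mean of the non-structural component.
hurdle_mean <- function(p, structural_value, mu_nonstructural) {
  p * structural_value + (1 - p) * mu_nonstructural
}

set.seed(123)
n_draws <- 1000
# Made-up posterior draws, for illustration only (assumed values)
p_draws <- rbeta(n_draws, 20, 80)          # probability of a structural one in e
mu_ns_draws <- rnorm(n_draws, 0.85, 0.01)  # mean of e among non-structural values

# One overall mean per posterior draw
mu_e_draws <- hurdle_mean(p_draws, structural_value = 1,
                          mu_nonstructural = mu_ns_draws)

# Posterior summary of the overall mean effect
c(mean = mean(mu_e_draws), quantile(mu_e_draws, c(0.025, 0.975)))
```

Applying the average draw by draw, rather than to posterior means, carries the uncertainty in both components through to the overall mean.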

## Model assessment

Before even looking at the results of the models fitted using selection, pattern or hurdle, it is recommended to check for possible issues in the convergence of the MCMC algorithm which, if present, may hinder the validity of the inferences. This is standard practice when fitting models based on iterative simulation methods, such as MCMC, where a larger number of iterations may be required to ensure the stability of the results.

missingHE provides different types of convergence diagnostics for each type of model via the function diagnostic. For example, consider the selection model that we fitted before and saved into the object NN.sel. We can examine posterior density plots for the mean effects by treatment arm by typing

diagnostic(NN.sel, type = "denplot", param = "mu.e", theme = NULL)


The plots above do not indicate any potential issue in terms of failed convergence since estimates from both chains seem to overlap quite well (i.e. a single posterior distribution for each parameter seems to be reached).

1. The argument type is used to select the desired type of diagnostic measure (see help(diagnostic) for a list of all types available). For example, we can look at trace plots for the mean costs estimated from the pattern mixture model by typing

diagnostic(NN.pat, type = "traceplot", param = "mu.c", theme = NULL)

1. The argument param denotes the parameter for which the diagnostics should be shown. The list of parameters that can be selected varies depending on the type of model fitted (e.g. selection or hurdle) and the assumptions made (e.g. MAR or MNAR). Type help(diagnostic) for the full list of parameters available for each type of model and assumptions. It is also possible to set param = "all" to display the diagnostic results of all parameters in the model together. For example, we can look at the autocorrelation plots for the posterior distribution of the probability of having a structural zero cost in the hurdle model by typing

diagnostic(NN.hur, type = "acf", param = "p.c", theme = "base")

1. The argument theme selects the type of background theme to be used for plotting, chosen among a pre-defined set of themes whose names can be seen by typing help(diagnostic).
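
As a numerical counterpart to checking visually whether the chains overlap, one common summary is the potential scale reduction factor (Gelman-Rubin statistic), which compares the variability between and within chains. The base R sketch below is an illustration only: rhat is a hypothetical helper written for this example, the chains are simulated rather than taken from a fitted model, and we do not claim that this is how diagnostic computes any of its outputs.

```r
# Hypothetical helper: potential scale reduction factor for one parameter.
# `chains` is a matrix with one column per chain and one row per iteration.
rhat <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))    # average within-chain variance
  B <- n * var(colMeans(chains))      # between-chain variance
  var_hat <- (n - 1) / n * W + B / n  # pooled estimate of the posterior variance
  sqrt(var_hat / W)
}

set.seed(1)
# Simulated draws standing in for two MCMC chains of the same parameter
good <- cbind(rnorm(1000, 0.9, 0.02), rnorm(1000, 0.9, 0.02))
bad  <- cbind(rnorm(1000, 0.9, 0.02), rnorm(1000, 1.1, 0.02))

rhat(good)  # close to 1 when the chains agree
rhat(bad)   # well above 1 when the chains sit in different regions
```

Values close to 1 are consistent with the chains having reached the same distribution, while values well above 1 suggest that more iterations may be needed.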

## Checking imputations

It is possible to look at how missing outcome values are imputed by each type of model using the generic function plot that, when applied to an object generated by missingHE functions, such as the model stored in NN.sel, produces the following output

plot(NN.sel, class = "scatter", outcome = "all")


The four plots show the observed values (black dots) and the posterior distribution of the imputed values (red dots and lines) by type of outcome (effects top, costs bottom) and treatment group (control left, intervention right).

1. The argument class specifies what type of graphical output should be displayed, either a scatter plot (scatter - default option) or a histogram (histogram). For example, we can show the histogram of the imputations produced by the pattern mixture model by typing

plot(NN.pat, class = "histogram", outcome = "all")

1. The argument outcome specifies for which outcome the results should be shown. Available choices include either all variables in both groups (all - default), only the effects or costs variables (effects and costs), only the variables in a specific group (arm1 and arm2) or a combination of these. For example, we can look at the distributions of the imputed costs in the control group for the hurdle model by typing

plot(NN.hur, class = "scatter", outcome = "costs_arm1")


## Model comparison and fit

We can check the fit of the models to the observed data by looking at posterior predictive checks (PPC) and compare the fit of alternative model specifications via predictive information criteria (PIC). Both measures are useful for checking whether the results from the model align with the information from the observed data and for choosing the best model among those considered.

### PPC

The idea behind PPCs is to use the estimated parameters from the model to generate replications of the data, which can then be compared with the observed values to detect possible inconsistencies between the two. The main objective is to see whether the model is able to capture the aspects of the data that are of interest (e.g. mean, skewness, proportions of structural values), which would suggest a good fit of the model.

You can implement different types of checks in missingHE using the function ppc. For example, we can look at replicates of the histograms of the data for the effects in the control group based on the results of the selection model by typing

ppc(NN.sel, type = "histogram", outcome = "effects_arm1", ndisplay = 8)

1. The argument type selects the type of check to display. Different types of plots can be drawn using specific names. See help(ppc) for the full list of choices. The argument ndisplay indicates the number of replications that should be displayed for the comparison. For example, we can compare the observed and replicated kernel densities for the effects in the control group based on the results from the pattern mixture model by typing

ppc(NN.pat, type = "dens", outcome = "effects_arm1", ndisplay = 8)

1. The argument outcome chooses the type of variables for which results should be displayed. Available options include: all for both effects and costs in each treatment group, effects and costs for the corresponding outcomes in each group, and a combination of these. See help(ppc) for the list of all options. For example, we can look at overlaid densities of the observed and replicated data for all variables based on the results from the hurdle model by typing

ppc(NN.hur, type = "dens_overlay", outcome = "all", ndisplay = 25)
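
The logic of a posterior predictive check, generating replicated data from posterior draws and comparing a summary of the replications with the same summary of the observed data, can be sketched in base R as follows. Everything here is an assumption made for illustration: the "observed" effects are simulated, the posterior draws are approximated rather than obtained from a fitted missingHE model, and the standard deviation is held fixed for simplicity.

```r
set.seed(42)
# Made-up "observed" effects and approximate posterior draws for a normal model
e_obs <- rnorm(80, mean = 0.88, sd = 0.05)
n_rep <- 500
mu_draws <- rnorm(n_rep, mean(e_obs), sd(e_obs) / sqrt(length(e_obs)))
sigma_draws <- rep(sd(e_obs), n_rep)  # simplification: sd fixed at its estimate

# One replicated dataset per posterior draw, stored row-wise
e_rep <- t(sapply(seq_len(n_rep), function(s) {
  rnorm(length(e_obs), mu_draws[s], sigma_draws[s])
}))

# Compare a test statistic (here the mean) between replications and data
t_obs <- mean(e_obs)
t_rep <- rowMeans(e_rep)

# Posterior predictive p-value: values near 0 or 1 flag a possible misfit
ppp <- mean(t_rep >= t_obs)
ppp
```

The same comparison can be repeated with other statistics, such as the skewness or the proportion of structural values, depending on which aspect of the data the model is expected to capture.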


### PIC

PICs compare the fit to the observed data from alternative model specifications in terms of a measure based on the log-likelihood of the model (deviance) and a penalty term for model complexity (effective number of parameters). The key message is that models associated with lower PIC values have a better fit to the observed data compared with models associated with higher PIC values. It is very important to remember that, when dealing with partially-observed data, the fit of the model can only be assessed based on the observed values. Thus, comparison by means of PICs is always partial since the fit to the unobserved values can never be checked. This is why it is generally not recommended to compare models fitted under MNAR assumptions, as the comparison may be completely meaningless.

Three main types of PICs can be selected in missingHE via the pic function. Choices include: the Deviance Information Criterion (DIC), the Widely Applicable Information Criterion (WAIC), and the Leave-One-Out Information Criterion (LOOIC). Among these, the latter two are typically preferred as they are calculated on the full posterior distribution of the model and do not suffer from some potential drawbacks (e.g. sensitivity to reparameterisation of the model) that may instead affect the DIC. Type help(pic) for more details about these measures. For example, we can compare the fit to the observed data from the three models fitted using WAIC by typing

pic_sel <- pic(NN.sel, criterion = "waic", module = "both")
pic_pat <- pic(NN.pat, criterion = "waic", module = "both")
pic_hur <- pic(NN.hur, criterion = "waic", module = "both")

#print results
c(pic_sel$waic, pic_pat$waic, pic_hur$waic)

The results indicate a much better fit of the hurdle model compared to the others, with a WAIC estimate which is negative. This is reasonable since hurdle models can capture the structural values which are instead ignored by selection or pattern mixture models. However, hurdle models do not allow the exploration of MNAR assumptions and therefore their results are entirely based on MAR.

The argument criterion specifies the type of PIC to use for the assessment, while module indicates for which parts of the model the measure should be evaluated. Choices are: total (default), which should be used for comparing models having the same structure; both, which uses both the models for $e$ and $c$ but not the auxiliary models (e.g. those for me and mc in selection); e or c, which use only the model for the effects or costs.
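
To make the two ingredients of a PIC concrete (a log-likelihood-based measure of fit and a penalty for the effective number of parameters), the base R sketch below computes WAIC on the deviance scale from a matrix of pointwise log-likelihood values. It is an illustration under stated assumptions and not the implementation used by pic: waic_from_loglik and pointwise_loglik are hypothetical helpers, and the data and posterior draws are simulated.

```r
# Hypothetical helper: WAIC on the deviance scale from an S x N matrix of
# pointwise log-likelihood values (S posterior draws, N observed data points).
waic_from_loglik <- function(loglik) {
  lppd <- sum(log(colMeans(exp(loglik))))  # log pointwise predictive density
  p_waic <- sum(apply(loglik, 2, var))     # effective number of parameters
  c(waic = -2 * (lppd - p_waic), p_waic = p_waic)
}

set.seed(7)
y <- rnorm(50, mean = 1, sd = 1)
# Made-up posterior draws of the mean for a well-centred and a badly-centred model
mu_good <- rnorm(400, mean(y), 1 / sqrt(50))
mu_bad  <- rnorm(400, mean(y) + 2, 1 / sqrt(50))

# Hypothetical helper: S x N matrix of normal log-densities, one row per draw
pointwise_loglik <- function(mu) {
  t(sapply(mu, function(m) dnorm(y, m, 1, log = TRUE)))
}

waic_good <- waic_from_loglik(pointwise_loglik(mu_good))
waic_bad  <- waic_from_loglik(pointwise_loglik(mu_bad))

# The lower value points to the better fit to the observed data
waic_good["waic"] < waic_bad["waic"]
```

Only the observed data points enter the log-likelihood matrix, which is why a comparison of this kind says nothing about the fit to the unobserved values.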

missingHE documentation built on July 1, 2020, 5:50 p.m.