View source: R/disbayes_hier.R
disbayes_hier | R Documentation |
A variant of disbayes in which data from different areas can be related in a hierarchical model and, optionally, the effect of gender can be treated as additive with the effect of area. This is much more computationally intensive than the basic model in disbayes. Time trends are not supported in this function.
disbayes_hier(
  data,
  group,
  gender = NULL,
  inc_num = NULL,
  inc_denom = NULL,
  inc_prob = NULL,
  inc_lower = NULL,
  inc_upper = NULL,
  prev_num = NULL,
  prev_denom = NULL,
  prev_prob = NULL,
  prev_lower = NULL,
  prev_upper = NULL,
  mort_num = NULL,
  mort_denom = NULL,
  mort_prob = NULL,
  mort_lower = NULL,
  mort_upper = NULL,
  rem_num = NULL,
  rem_denom = NULL,
  rem_prob = NULL,
  rem_lower = NULL,
  rem_upper = NULL,
  age = "age",
  cf_init = 0.01,
  eqage = 30,
  eqagehi = NULL,
  cf_model = "default",
  inc_model = "smooth",
  rem_model = "const",
  prev_zero = FALSE,
  sprior = c(1, 1, 1),
  hp_fixed = NULL,
  nfold_int_guess = 5,
  nfold_int_upper = 100,
  nfold_slope_guess = 5,
  nfold_slope_upper = 100,
  mean_int_prior = c(0, 10),
  mean_slope_prior = c(5, 5),
  gender_int_priorsd = 0.82,
  gender_slope_priorsd = 0.82,
  inc_prior = c(1.1, 0.1),
  rem_prior = c(1.1, 1),
  method = "opt",
  draws = 1000,
  iter = 10000,
  stan_control = NULL,
  ...
)
data |
Data frame containing some of the variables below. The variables below are provided as character strings naming columns in this data frame. For each disease measure available, one of the following three combinations of variables must be specified: (1) numerator and denominator, (2) estimate and denominator, or (3) estimate with lower and upper credible limits. Mortality must be supplied, together with at least one of incidence and prevalence. If remission is assumed to be possible, then remission data should also be supplied (see below). Estimates refer to the probability of having some event within a year, rather than rates. Rates per year $r$ can be converted to probabilities $p$ as $p = 1 - \exp(-r)$, assuming the rate is constant within the year. For estimates based on registry data assumed to cover the whole population, the denominator will be the population size. |
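The rate-to-probability conversion described above can be sketched in base R as follows. The helper names `rate_to_prob` and `prob_to_rate` are hypothetical and are not part of the package; the sketch relies on the same assumption of a constant rate within the year.

```r
# Hypothetical helpers (not part of disbayes): convert between annual rates
# and annual probabilities, assuming the rate is constant within the year.
rate_to_prob <- function(r) {
  stopifnot(all(r >= 0))
  -expm1(-r)            # numerically stable form of 1 - exp(-r)
}

prob_to_rate <- function(p) {
  stopifnot(all(p >= 0 & p < 1))
  -log1p(-p)            # numerically stable form of -log(1 - p)
}

rates <- c(0.001, 0.01, 0.1)   # illustrative rates per person-year
probs <- rate_to_prob(rates)
print(probs)
```

For small rates the probability is close to the rate itself, so the conversion matters most for common events.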
group |
Variable in the data representing the area (or other grouping factor). |
gender |
If |
inc_num |
Numerator for the incidence data, assumed to represent the
observed number of new cases within a year among a population of size
|
inc_denom |
Denominator for the incidence data. The function Note that to include extra uncertainty beyond that implied by a published interval, the numerator and denominator could be multiplied by a constant, for example, multiplying both the numerator and denominator by 0.5 would give the data source half its original weight. |
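As a rough illustration of the down-weighting idea above, the sketch below halves a numerator and denominator and compares the width of an approximate 95% interval before and after. The `downweight` and `ci_width` helpers are hypothetical, and the interval uses a Beta posterior under a uniform prior purely for illustration; it is not claimed to be how the package represents uncertainty. The weight is chosen so that the counts stay whole numbers, since it is not stated here whether non-integer counts are accepted.

```r
# Hypothetical helper: scale a numerator and denominator by a common weight.
downweight <- function(num, denom, weight = 0.5) {
  stopifnot(weight > 0, weight <= 1)
  data.frame(num = num * weight, denom = denom * weight)
}

# Illustrative only: width of a 95% interval from a Beta(num + 1, denom - num + 1)
# distribution, i.e. a binomial likelihood with a uniform prior.
ci_width <- function(num, denom) {
  diff(qbeta(c(0.025, 0.975), num + 1, denom - num + 1))
}

orig <- data.frame(num = 40, denom = 1000)          # made-up data
half <- downweight(orig$num, orig$denom, weight = 0.5)

# The point estimate is unchanged, but the interval is wider.
est_orig <- orig$num / orig$denom
est_half <- half$num / half$denom
width_orig <- ci_width(orig$num, orig$denom)
width_half <- ci_width(half$num, half$denom)
```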
inc_prob |
Estimate of the incidence probability |
inc_lower |
Lower credible limit for the incidence estimate |
inc_upper |
Upper credible limit for the incidence estimate |
prev_num |
Numerator for the estimate of prevalence, i.e. number of people currently with a disease. |
prev_denom |
Denominator for the estimate of prevalence (e.g. the size of the survey used to obtain the prevalence estimate) |
prev_prob |
Estimate of the prevalence probability |
prev_lower |
Lower credible limit for the prevalence estimate |
prev_upper |
Upper credible limit for the prevalence estimate |
mort_num |
Numerator for the estimate of the mortality probability, i.e. the number of deaths attributed to the disease under study within a year |
mort_denom |
Denominator for the estimate of the mortality probability (e.g. the population size, if the estimates were obtained from a comprehensive register) |
mort_prob |
Estimate of the mortality probability |
mort_lower |
Lower credible limit for the mortality estimate |
mort_upper |
Upper credible limit for the mortality estimate |
rem_num |
Numerator for the estimate of the remission probability, i.e. the number of people observed to recover from the disease within a year. Remission data should be supplied if remission is permitted in the model, either as a numerator and denominator or as an estimate with lower and upper credible limits. Conversely, if no remission data are supplied, then remission is assumed to be impossible. These "data" may represent a prior judgement rather than observation: lower denominators or wider credible limits represent weaker prior information. |
rem_denom |
Denominator for the estimate of the remission probability |
rem_prob |
Estimate of the remission probability |
rem_lower |
Lower credible limit for the remission estimate |
rem_upper |
Upper credible limit for the remission estimate |
age |
Variable in the data indicating the year of age. This must start at age zero, but can end at any age. |
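Before fitting, it can be useful to confirm that the age variable meets the requirement above. The following is a minimal sketch of such a check; `check_age` is a hypothetical helper, the column names `area` and `age` are made up for the toy data, and the requirement that ages run in consecutive single years is an assumption read from "year of age" rather than something the package is known to enforce in this way.

```r
# Hypothetical helper: for each group, check that ages start at zero and
# increase in consecutive single years (the second condition is an assumption).
check_age <- function(data, age = "age", group = "area") {
  by_group <- split(data[[age]], data[[group]])
  vapply(by_group, function(a) {
    a <- sort(unique(a))
    length(a) > 0 && a[1] == 0 && all(diff(a) == 1)
  }, logical(1))
}

# Toy data: area "A" starts at age 0, area "B" wrongly starts at age 1.
toy <- data.frame(
  area = rep(c("A", "B"), each = 4),
  age  = c(0:3, 1:4)
)
age_ok <- check_age(toy)
print(age_ok)
```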
cf_init |
Initial guess at a typical case fatality value, for an average age. |
eqage |
Case fatalities (and incidence and remission rates) are assumed to be equal for all ages below this age, inclusive, when using the smoothed model. |
eqagehi |
Case fatalities (and incidence and remission rates) are assumed to be equal for all ages above this age, inclusive, when using the smoothed model. |
cf_model |
The following alternative models for case fatality are supported:
In all models, case fatality is a smooth function of age. |
inc_model |
Model for how incidence varies with age.
|
rem_model |
Model for how remission varies with age. Currently
supported models are |
prev_zero |
If |
sprior |
Rates of the exponential prior distributions used to penalise the coefficients of the spline model. The default of 1 should adapt appropriately to the data, but higher values give stronger smoothing, and lower values give weaker smoothing, if required. This can be a named vector with names This can also be an unnamed vector of three elements, where the first refers to the spline model for incidence, the second to case fatality, and the third to remission. If one of the rates (e.g. remission) is not being modelled with a spline, any number can be supplied here and it is ignored. |
hp_fixed |
A list with one named element for each hyperparameter to be fixed. The value should be either
If the element is either The hyperparameters that can be fixed are
For example, to fix the case fatality smoothness to 1.2, fix the incidence
smoothness to its posterior mode, and estimate all the other hyperparameters,
specify |
nfold_int_guess |
Prior guess at the ratio of case fatality between a high risk (97.5% quantile) and low risk (2.5% quantile) area. |
nfold_int_upper |
Prior upper 95% credible limit for the ratio in average case fatality between a high risk (97.5% quantile) and low risk (2.5% quantile) area. |
nfold_slope_guess, nfold_slope_upper |
These two arguments define the prior distribution for the variance in the random linear effects of age on log case fatality. They define a prior guess and upper 95% credible limit for the ratio of case fatality slopes between a high trend (97.5% quantile) and low trend (2.5% quantile) area. (Note that the model is not exactly linear, since departures from linearity are defined through a spline model. See the Jackson et al. paper for details.) |
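To get a feel for what an "n-fold" ratio between a high (97.5% quantile) and low (2.5% quantile) area means, the sketch below converts such a ratio to a random-effect standard deviation and back. It assumes, for illustration only, that the area effects are normally distributed on the log scale; how the package itself maps these arguments to a prior is not described here, and the helper names are hypothetical.

```r
# Assumption: area effects on the log scale are Normal(mu, sigma^2).
# The 97.5% and 2.5% quantiles are then mu +/- z * sigma with z = qnorm(0.975),
# so the ratio between a high and a low area is exp(2 * z * sigma).
nfold_to_sd <- function(nfold) {
  stopifnot(all(nfold >= 1))
  log(nfold) / (2 * qnorm(0.975))
}

sd_to_nfold <- function(sigma) {
  exp(2 * qnorm(0.975) * sigma)
}

sd_guess <- nfold_to_sd(5)     # SD implied by a 5-fold ratio (roughly 0.41)
sd_upper <- nfold_to_sd(100)   # SD implied by a 100-fold ratio (roughly 1.17)
```

Under this reading, the default guess of 5 and upper limit of 100 correspond to a fairly wide range of plausible between-area variation.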
mean_int_prior |
Vector of two elements giving the prior mean and standard deviation respectively for the mean random intercept for log case fatality. |
mean_slope_prior |
Vector of two elements giving the prior mean and standard deviation respectively for the mean random slope for log case fatality. |
gender_int_priorsd |
Prior standard deviation for the additive effect of gender on log case fatality |
gender_slope_priorsd |
Prior standard deviation for the additive effect of gender on the linear age slope of log case fatality |
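One way to interpret the default prior standard deviation of 0.82 is to look at the range of case fatality ratios it implies. The sketch below assumes a normal prior centred on zero for the log-scale gender effect; the zero mean is an assumption made for illustration, since only the standard deviation is specified here.

```r
# Assumption: the gender effect on log case fatality has a Normal(0, sd^2) prior.
prior_sd <- 0.82
z <- qnorm(0.975)

# Central 95% prior interval for the ratio of case fatality between genders.
ratio_lower <- exp(-z * prior_sd)
ratio_upper <- exp(z * prior_sd)

print(round(c(lower = ratio_lower, upper = ratio_upper), 2))
```

Under this assumption the interval runs from roughly one fifth to roughly five, so the default allows for up to about a five-fold difference in either direction.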
inc_prior |
Vector of two elements giving the Gamma shape and rate parameters of the
prior for the incidence rate. Only used if |
rem_prior |
Vector of two elements giving the Gamma shape and rate parameters of the
prior for the remission rate, used in both |
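The default Gamma priors for the incidence and remission rates can be inspected with base R, since `qgamma` uses the same shape and rate parameterisation. This sketch only summarises the two default priors shown in the usage above; the helper name `gamma_summary` is hypothetical.

```r
# Summarise a Gamma(shape, rate) prior by its mean and central 95% interval.
gamma_summary <- function(prior) {
  shape <- prior[1]
  rate <- prior[2]
  q <- qgamma(c(0.025, 0.975), shape = shape, rate = rate)
  c(mean = shape / rate, lower = q[1], upper = q[2])
}

inc_summary <- gamma_summary(c(1.1, 0.1))  # default inc_prior
rem_summary <- gamma_summary(c(1.1, 1))    # default rem_prior

print(rbind(inc_prior = inc_summary, rem_prior = rem_summary))
```

Both defaults have a shape close to 1, so they resemble exponential distributions, with the incidence prior being ten times more spread out than the remission prior.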
method |
String indicating the inference method, defaulting to
If If If the optimisation fails to converge (non-zero return code), try increasing the
number of iterations from the default 1000, e.g. If there is an error message that mentions If |
draws |
Number of draws from the normal approximation to the posterior
when using |
iter |
Number of iterations for MCMC sampling, or maximum number of iterations for optimization. |
stan_control |
( |
... |
Further arguments passed to |
A list including the following components:
call
: Function call that was used.
fit
: An object containing posterior samples from the fitted model, in the stanfit format returned by the stan function in the rstan package.
method
: Optimisation method that was chosen.
nage
: Number of years of age in the data.
narea
: Number of areas (or other grouping variable that defines the hierarchical model).
ng
: Number of genders (or other categorical variable whose effect is treated as additive with the area effect).
groups
: Names of the areas (or other grouping variable), taken from the factor levels in the original data.
genders
: Names of the genders (or other categorical variable), taken from the factor levels in the original data.
dat
: A list containing the input data in the form of numerators and denominators.
stan_data
: Full list of data supplied to Stan.
stan_inits
: Full list of parameter initial values supplied to Stan.
trend
: Whether a time trend was modelled.
hp_fixed
: Values of any hyperparameters that are fixed during the main model fit.
Jackson C, Zapata-Diomedi B, Woodcock J. "Bayesian multistate modelling of incomplete chronic disease burden data" https://arxiv.org/abs/2111.14100