View source: R/bage_mod-functions.R
| set_datamod_miscount | R Documentation |
Specify a data model for the outcome in a Poisson model, where the outcome is subject to undercount and overcount.
set_datamod_miscount(mod, prob, rate)
mod |
An object of class |
prob |
The prior for the probability
that a person or event in the target
population will be correctly enumerated.
A data frame with a variable
called |
rate |
The prior for the overcoverage rate.
A data frame with a variable
called |
The miscount data model is essentially a combination of the undercount and overcount data models. It assumes that the reported outcome is the sum of two quantities:
Units from target population, undercounted: People or events belonging to the target population, where each unit's inclusion probability is less than 1.
Overcount: People or events that do not belong to the target population, or that are counted more than once.
If, for instance, a census enumerates 91 people from a true population of 100, but also mistakenly enumerates a further 6 people, then
the true value for the outcome variable is 100,
the value for the undercounted target population is 91,
the value for the overcount is 6, and
the observed value for the outcome variable is 91 + 6 = 97.
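The arithmetic in this example can be checked with a few lines of base R. This is only an illustrative sketch: the variable names are made up for the example and are not objects provided by bage.

```r
## Illustrative names only; none of these objects come from bage
y_true <- 100    # true value for the outcome variable
u <- 91          # members of the target population who were enumerated
v <- 6           # mistaken enumerations (the overcount)
y_obs <- u + v   # observed value for the outcome variable

y_obs            # 97
u / y_true       # share of the target population enumerated: 0.91
y_obs - y_true   # net error: -3
```

The net error of -3 is smaller in magnitude than either gross error (9 people missed, 6 people wrongly included).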
A revised version of mod.
prob argument
The prob argument specifies a prior
distribution for the probability
that a person or event in the target
population is included in the
reported outcome. prob is a
data frame with a variable called "mean",
a variable called "disp", and, optionally,
one or more 'by' variables.
For instance, a prob of
data.frame(sex = c("Female", "Male"),
           mean = c(0.95, 0.92),
           disp = c(0.02, 0.015))
implies that the expected value for the inclusion probability is 0.95 for females and 0.92 for males, with a slightly higher dispersion for females (0.02) than for males (0.015).
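To get a feel for what such a prior implies, the "mean" and "disp" values can be converted into Beta shape parameters. The sketch below uses base R only and assumes the parameterisation given in the model formulas for this data model, Beta(mean / disp, (1 - mean) / disp). The added column names are illustrative and are not part of bage.

```r
## Example 'prob' data frame from the text
prob <- data.frame(sex = c("Female", "Male"),
                   mean = c(0.95, 0.92),
                   disp = c(0.02, 0.015))

## Beta shape parameters implied by
## pi ~ Beta(mean / disp, (1 - mean) / disp)
prob$shape1 <- prob$mean / prob$disp
prob$shape2 <- (1 - prob$mean) / prob$disp

## The Beta mean, shape1 / (shape1 + shape2), recovers 'mean'
prob$check_mean <- prob$shape1 / (prob$shape1 + prob$shape2)

## Central 95% prior intervals for the inclusion probability
prob$lower <- qbeta(0.025, prob$shape1, prob$shape2)
prob$upper <- qbeta(0.975, prob$shape1, prob$shape2)

prob
```

Inspecting the intervals before fitting is a cheap way to check that the chosen "disp" values express the intended amount of prior uncertainty.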
rate argument
The rate argument specifies a prior
distribution for the overcoverage
rate. rate is a
data frame with a variable called "mean",
a variable called "disp", and, optionally,
one or more 'by' variables.
For instance, a rate of
data.frame(mean = 0.03, disp = 0.1)
implies that the expected value for the overcoverage rate is 0.03, with a dispersion of 0.1. Since no 'by' variables are included, the same mean and dispersion values are applied to all cells.
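The same kind of check works for the rate prior. The sketch below uses base R only and assumes the Gamma parameterisation given in the model formulas for this data model, with shape 1 / disp and rate 1 / (disp * mean). The object names are illustrative and are not part of bage.

```r
## Example 'rate' data frame from the text
rate <- data.frame(mean = 0.03, disp = 0.1)

## Gamma parameters implied by
## kappa ~ Gamma(1 / disp, 1 / (disp * mean))
shape <- 1 / rate$disp                    # 10
rate_par <- 1 / (rate$disp * rate$mean)   # about 333.3

## Prior mean and standard deviation of the overcoverage rate
prior_mean <- shape / rate_par            # equals 'mean', 0.03
prior_sd <- sqrt(shape) / rate_par        # equals mean * sqrt(disp)

## Central 95% prior interval for the overcoverage rate
qgamma(c(0.025, 0.975), shape = shape, rate = rate_par)
```

Under this parameterisation the prior standard deviation is mean * sqrt(disp), so "disp" acts as a squared coefficient of variation.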
The model for the observed outcome is
y_i^{\text{obs}} = u_i + v_i
u_i \sim \text{Binomial}(y_i^{\text{true}}, \pi_{g[i]})
v_i \sim \text{Poisson}(\kappa_{h[i]} \gamma_i w_i)
\pi_g \sim \text{Beta}(m_g^{(\pi)} / d_g^{(\pi)}, (1-m_g^{(\pi)}) / d_g^{(\pi)})
\kappa_h \sim \text{Gamma}(1/d_h^{(\kappa)}, 1/(d_h^{(\kappa)} m_h^{(\kappa)}))
where
y_i^{\text{obs}} is the observed outcome for cell i;
y_i^{\text{true}} is the true outcome for cell i;
\gamma_i is the rate for cell i;
w_i is exposure for cell i;
\pi_{g[i]} is the probability that a member of the
target population in cell i is correctly enumerated in that cell;
\kappa_{h[i]} is the overcoverage rate for cell i;
m_g^{(\pi)} is the expected value for \pi_g
(specified via prob);
d_g^{(\pi)} is the dispersion for \pi_g (specified via prob);
m_h^{(\kappa)} is the expected value for \kappa_h
(specified via rate); and
d_h^{(\kappa)} is the dispersion for \kappa_h (specified via rate).
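The generative model above can be simulated directly in base R, which may help in seeing how the pieces fit together. This is a sketch under stated assumptions, not bage's internal code: the cell layout, the values for exposure and rates, and all object names are invented for illustration, and the true outcome is assumed to be drawn from a Poisson distribution with mean \gamma_i w_i.

```r
set.seed(0)

n <- 6                         # number of cells (hypothetical)
g <- rep(1:2, times = 3)       # g[i]: which pi applies to cell i
h <- rep(1L, n)                # h[i]: a single kappa shared by all cells

## Prior settings, as would be supplied via 'prob' and 'rate'
m_pi <- c(0.95, 0.92)
d_pi <- c(0.02, 0.015)
m_kappa <- 0.03
d_kappa <- 0.1

## Draw inclusion probabilities and the overcoverage rate from their priors
pi_g <- rbeta(2, m_pi / d_pi, (1 - m_pi) / d_pi)
kappa_h <- rgamma(1, shape = 1 / d_kappa, rate = 1 / (d_kappa * m_kappa))

## Hypothetical exposure and rates; true outcome assumed Poisson
w <- rep(10000, n)
gamma_i <- rep(0.01, n)
y_true <- rpois(n, gamma_i * w)

## Undercounted target population, overcount, and observed outcome
u <- rbinom(n, size = y_true, prob = pi_g[g])
v <- rpois(n, kappa_h[h] * gamma_i * w)
y_obs <- u + v

data.frame(y_true, u, v, y_obs)
```

Because u can never exceed y_true while v is non-negative, the observed outcome can fall on either side of the true outcome.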
mod_pois() Specify a Poisson model
augment() Original data plus estimated values,
including estimates of true value for
the outcome variable
components() Estimated values for
model parameters, including inclusion
probabilities and overcount rates
set_datamod_undercount() An undercount-only
data model
set_datamod_overcount() An overcount-only
data model
datamods All data models implemented in bage
confidential Confidentialization
procedures modeled in bage
Mathematical Details vignette
library(bage)

## specify 'prob' and 'rate'
prob <- data.frame(sex = c("Female", "Male"),
                   mean = c(0.95, 0.97),
                   disp = c(0.05, 0.05))
rate <- data.frame(mean = 0.03, disp = 0.15)
## specify model
mod <- mod_pois(divorces ~ age * sex + time,
                data = nzl_divorces,
                exposure = population) |>
  set_datamod_miscount(prob = prob, rate = rate)
mod
## fit model
mod <- mod |>
  fit()
mod
## original data, plus imputed values for outcome
mod |>
  augment()
## parameter estimates
library(dplyr)
mod |>
  components() |>
  filter(term == "datamod")
## the data have in fact been confidentialized,
## so we account for that, in addition
## to accounting for undercoverage and
## overcoverage
mod <- mod |>
  set_confidential_rr3() |>
  fit()
mod