View source: R/bage_mod-functions.R
| set_datamod_noise | R Documentation |
Specify a data model in which
observed outcome = true outcome + error,
where the error has a symmetric distribution with mean 0.
If the true outcome has a normal distribution, then the error has a normal distribution. If the true outcome has a Poisson distribution, then the error has a symmetric Skellam distribution.
set_datamod_noise(mod, sd)
mod |
An object of class |
sd |
Standard deviation of measurement errors. A single number, or a data frame with 'by' variables. |
The model assumes that the outcome variable
is unbiased. If there is in fact evidence
of biases, then this evidence should be
used to create a de-biased version of the
outcome variable in data, and this de-biased
version should be used by mod_norm() or
mod_pois().
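As a minimal sketch of what creating a de-biased outcome might look like, the base R code below subtracts an assumed additive bias from a reported value before any model is specified. The data frame dat, the column names, and the bias of 50 units for region "B" are all hypothetical, chosen only for illustration; in practice the adjustment would come from whatever evidence about bias is available.

```r
## hypothetical data: reported values are assumed to be
## overstated by 50 units in region "B"
dat <- data.frame(region = c("A", "B", "A", "B"),
                  value = c(1000, 1250, 1100, 1350))

## assumed additive bias for each row (illustrative only)
bias <- ifelse(dat$region == "B", 50, 0)

## de-biased outcome, to be used in place of 'value'
## when specifying the model
dat$value_debiased <- dat$value - bias
dat
```

The de-biased column, rather than the original one, would then appear on the left-hand side of the model formula.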
If set_datamod_noise() is used with a Poisson
model, then the dispersion term for
the Poisson rates must be set to zero.
This can be done using set_disp(),
though set_datamod_noise() will also
do so.
A revised version of mod.
The Skellam distribution is confined to integers, but can take positive and negative values.
If
X_1 \sim \text{Poisson}(\mu_1)
X_2 \sim \text{Poisson}(\mu_2)
are independent, then
Y = X_1 - X_2
has a \text{Skellam}(\mu_1, \mu_2) distribution.
If \mu_1 = \mu_2, then the distribution
is symmetric.
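The construction above can be checked by simulation. The following is a minimal sketch in base R, not code taken from bage. The values of n and mu are arbitrary choices for illustration, and the two Poisson draws are generated independently.

```r
## simulate a symmetric Skellam variable as the difference
## of two independent Poisson draws (illustrative values)
set.seed(0)
n <- 100000
mu <- 8
x1 <- rpois(n, lambda = mu)
x2 <- rpois(n, lambda = mu)
y <- x1 - x2

## y is integer-valued and takes positive and negative values
range(y)

## the mean is approximately 0 and the variance
## is approximately mu + mu
mean(y)
var(y)
```

Because the two rates are equal, the simulated values are spread evenly around zero.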
sd argument
sd can be a single number, in which
case the same standard deviation
is used for all cells.
sd can also be a data frame with a
variable called "sd" and
one or more columns with 'by' variables.
For instance, a sd of
data.frame(sex = c("Female", "Male"),
sd = c(330, 240))
implies that measurement errors have standard deviation 330 for females and 240 for males.
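To make the role of the 'by' variables concrete, the sketch below looks up a standard deviation for each cell in a toy dataset using base R. It illustrates what the data frame implies and is not a description of how bage performs the matching internally. The cells data frame and its year column are hypothetical.

```r
## toy data with one row per cell (hypothetical values)
cells <- data.frame(sex = c("Female", "Male", "Female", "Male"),
                    year = c(2020, 2020, 2021, 2021))

## 'sd' data frame with the 'by' variable sex
sd_sex <- data.frame(sex = c("Female", "Male"),
                     sd = c(330, 240))

## look up the standard deviation implied for each cell
cells$sd <- sd_sex$sd[match(cells$sex, sd_sex$sex)]
cells
```

Every cell with the same value of sex receives the same standard deviation, regardless of year.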
The model for the observed outcome is
y_i^{\text{obs}} = y_i^{\text{true}} + \epsilon_i
with
\epsilon_i \sim \text{N}(0, s_{g[i]}^2)
if y_i^{\text{true}} has a normal distribution, and
\epsilon_i \sim \text{Skellam}(0.5 s_{g[i]}^2, 0.5 s_{g[i]}^2)
if y_i^{\text{true}} has a Poisson distribution, where
y_i^{\text{obs}} is the observed outcome for cell i;
y_i^{\text{true}} is the true outcome for cell i;
\epsilon_i is the measurement error for cell i; and
s_{g[i]} is the standard deviation of
the measurement error for cell i.
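The two error models can be simulated directly. The sketch below uses base R only and is not bage code. The standard deviation s and the distributions chosen for the true outcome are assumptions made for illustration. It shows that a Skellam error with both parameters equal to 0.5 s^2 has variance 0.5 s^2 + 0.5 s^2 = s^2, so its standard deviation matches that of the normal error.

```r
set.seed(0)
n <- 100000
s <- 20   ## assumed standard deviation of measurement error

## normal case: true outcome plus N(0, s^2) error
y_true_norm <- rnorm(n, mean = 100, sd = 5)
eps_norm <- rnorm(n, mean = 0, sd = s)
y_obs_norm <- y_true_norm + eps_norm

## Poisson case: true outcome plus Skellam(0.5 s^2, 0.5 s^2) error,
## generated as the difference of two independent Poisson draws
y_true_pois <- rpois(n, lambda = 1000)
eps_pois <- rpois(n, lambda = 0.5 * s^2) - rpois(n, lambda = 0.5 * s^2)
y_obs_pois <- y_true_pois + eps_pois

## both errors have standard deviation close to s
sd(eps_norm)
sd(eps_pois)
```

In the Poisson case the observed outcome remains an integer, since both the true outcome and the error are integers.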
mod_norm() Specify a normal model
mod_pois() Specify a Poisson model
augment() Original data plus estimated values,
including estimates of true value for outcome
datamods Data models implemented in bage
Mathematical Details vignette
## Normal model ------------------------------
## prepare outcome variable
library(dplyr, warn.conflicts = FALSE)
spend <- nld_expenditure |>
  mutate(log_spend = log(value + 1))
## specify model
mod <- mod_norm(log_spend ~ age * diag + year,
                data = spend,
                weights = 1) |>
  set_datamod_noise(sd = 0.1)
## fit model
mod <- mod |>
  fit()
mod
## create new aggregated diagnostic
## group variable
library(dplyr, warn.conflicts = FALSE)
spend <- spend |>
  mutate(diag_ag = case_when(
    diag == "Neoplasms" ~ diag,
    diag == "Not allocated" ~ diag,
    TRUE ~ "Other"
  ))
## assume size of measurement errors
## varies across these aggregated groups
sd_diag <- data.frame(diag_ag = c("Neoplasms",
                                  "Not allocated",
                                  "Other"),
                      sd = c(0.05, 0.2, 0.1))
## fit model that uses diagnostic-specific
## standard deviations
mod <- mod_norm(log_spend ~ age * diag + year,
                data = spend,
                weights = 1) |>
  set_datamod_noise(sd = sd_diag)
## Poisson model -----------------------------
mod <- mod_pois(deaths ~ month,
                data = usa_deaths,
                exposure = 1) |>
  set_datamod_noise(sd = 200)