sim_apply | R Documentation |
sim_apply() applies a function that produces quantities of interest to each set of simulated coefficients produced by sim(); these calculated quantities form the posterior sampling distribution for the quantities of interest. Computation can be parallelized through the cl argument.
sim_apply(sim, FUN, verbose = TRUE, cl = NULL, ...)
sim | a |
FUN | a function to be applied to each set of simulated coefficients. See Details. |
verbose | |
cl | a cluster object created by |
... | optional arguments passed to |
sim_apply() applies a function, FUN, to each set of simulated coefficients, similar to apply(). This function should return a numeric vector containing one or more estimated quantities. This should be a named vector to more easily keep track of the meaning of each estimated quantity. Care should be taken to ensure that the returned vector is the same length each time FUN is called. NAs are allowed in the output but should be avoided if possible.
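The requirements on FUN's return value can be illustrated with a small base R sketch. This is a conceptual illustration only, not clarify's internal code: it assumes the simulated coefficients are drawn from a multivariate normal distribution centered at the estimates, and it uses the built-in mtcars data and a hypothetical FUN that returns two named quantities.

```r
# Conceptual sketch; NOT clarify's implementation.
# Assumption: coefficients are simulated from N(coef(fit), vcov(fit)).
set.seed(1)
fit <- lm(mpg ~ wt + hp, data = mtcars)
b <- coef(fit)
V <- vcov(fit)
n_sim <- 200

# Draw n_sim coefficient vectors using the Cholesky factor of V
Z <- matrix(rnorm(n_sim * length(b)), nrow = n_sim)
sim_coefs <- sweep(Z %*% chol(V), 2, b, "+")
colnames(sim_coefs) <- names(b)

# Hypothetical FUN: always returns a named numeric vector of length 2
FUN <- function(coefs) {
  c(diff  = unname(coefs["wt"] - coefs["hp"]),
    ratio = unname(coefs["wt"] / coefs["hp"]))
}

# One row per simulation, one column per estimated quantity
est <- t(apply(sim_coefs, 1, FUN))
dim(est)
colnames(est)
```

Because FUN returns the same two names on every call, the results stack cleanly into a matrix; a FUN whose output length varied between calls could not be arranged this way.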
The arguments to FUN can be specified in a few ways. If FUN has an argument called coefs, a simulated set of coefficients will be passed to this argument, and FUN should compute and return a quantity based on the coefficients (e.g., the difference between two coefficients if one wants to test whether two coefficients are equal). If FUN has an argument called fit, a model fit object of the same type as the one originally supplied to sim() (e.g., an lm or glm object) will be passed to this argument, where the coefficients of the fit object have been replaced by the simulated coefficients generated by sim(), and FUN should compute and return a quantity based on the model fit (e.g., a computation based on the output of predict()). If neither coefs nor fit is the name of an argument to FUN, the model fit object with replaced coefficients will be supplied to the first argument of FUN.
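The third case, where FUN has neither a coefs nor a fit argument, might look like the following sketch. The mtcars model and the argument name m are illustrative choices, not part of the package.

```r
library(clarify)

fit <- lm(mpg ~ wt + hp, data = mtcars)

set.seed(42)
s <- sim(fit, n = 100)

# `m` is neither `coefs` nor `fit`, so the model fit with
# simulated coefficients is passed to it as the first argument
first_arg_fun <- function(m) {
  c(pred = unname(predict(m, newdata = mtcars[1, ])))
}

est <- sim_apply(s, first_arg_fun, verbose = FALSE)
summary(est)
```

Naming the argument fit would give the same result here; relying on the first-argument rule is mainly convenient when passing an existing function that already takes a model as its first argument.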
When custom coefficients are supplied to sim(), i.e., when the coefs argument to sim() is not left at its default value, FUN must accept a coefs argument and a warning will be thrown if it accepts a fit argument. This is because sim_apply() does not know how to reconstruct the original fit object with the new coefficients inserted. The quantities computed by sim_apply() must therefore be computed directly from the coefficients.
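A minimal sketch of this situation follows. It assumes that a named numeric vector can be passed to the coefs argument of sim() alongside the model fit and that the names carry through to the simulated coefficients; the vector here is simply a copy of the fitted coefficients, used only to trigger the custom-coefficient path.

```r
library(clarify)

fit <- lm(mpg ~ wt + hp, data = mtcars)

# Illustrative custom coefficients (assumption: a named numeric
# vector is accepted by the `coefs` argument of sim())
custom_coefs <- coef(fit)

set.seed(99)
s <- sim(fit, n = 100, coefs = custom_coefs)

# FUN works from `coefs` alone; no fit object is reconstructed
sim_fun <- function(coefs) {
  c(`wt - hp` = unname(coefs["wt"] - coefs["hp"]))
}

est <- sim_apply(s, sim_fun, verbose = FALSE)
summary(est)
```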
If FUN is not supplied at all, the simulated values of the coefficients will be returned in the output with a warning. Set FUN to NULL or verbose to FALSE to suppress this warning.
sim_apply() with multiply imputed data

When using misim() and sim_apply() with multiply imputed data, the coefficients are supplied to the model fit corresponding to the imputation identifier associated with each set of coefficients, which means if FUN uses a dataset extracted from a model (e.g., using insight::get_data()), it will do so from the model fit in the corresponding imputation.
The original estimates (see Value below) are computed as the mean of the estimates across the imputations using the original coefficients averaged across imputations. That is, first, the coefficients estimated in the models in the imputed datasets are combined to form a single set of pooled coefficients; then, for each imputation, the quantities of interest are computed using the pooled coefficients; finally, the mean of the resulting estimates across the imputations is taken as the "original" estimates. Note that this procedure is only valid for quantities with symmetric sampling distributions, which excludes quantities like risk ratios and odds ratios, but includes log risk ratios and log odds ratios. The desired quantities can be transformed from their log versions using transform().
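The three-step procedure for the original estimates can be sketched in base R as follows. This is an illustration of the description above, not clarify's internal code: the "imputed" datasets are hypothetical stand-ins made by adding noise to mtcars, and the quantity of interest (an average prediction) is chosen only for the example.

```r
set.seed(10)

# Hypothetical stand-ins for imputed datasets: mtcars with noise in `hp`
imps <- lapply(1:3, function(i) {
  transform(mtcars, hp = hp + rnorm(nrow(mtcars), sd = 5))
})
fits <- lapply(imps, function(d) lm(mpg ~ wt + hp, data = d))

# Step 1: pool the coefficients by averaging across imputations
pooled <- rowMeans(sapply(fits, coef))

# Step 2: in each imputation, compute the quantity with the pooled
# coefficients and that imputation's own data
FUN <- function(coefs, fit) {
  X <- model.matrix(fit)
  c(mean_pred = mean(X %*% coefs))
}
per_imp <- sapply(fits, function(f) FUN(pooled, f))

# Step 3: average the per-imputation estimates
original <- mean(per_imp)
original
```

Averaging in step 3 is what makes symmetry matter: the mean of log ratios is a sensible center, whereas the mean of raw ratios is pulled toward large values.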
A clarify_est object, which is a matrix with a column for each estimated quantity and a row for each simulation. The original estimates (FUN applied to the original coefficients or model fit object) are stored in the attribute "original". The "sim_hash" attribute contains the simulation hash produced by sim().
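As a brief sketch of inspecting the returned object, the attributes named above can be read with attr(). The model and the quantity computed are illustrative only.

```r
library(clarify)

fit <- lm(mpg ~ wt + hp, data = mtcars)

set.seed(7)
s <- sim(fit, n = 50)

sim_fun <- function(coefs) {
  c(wt_minus_hp = unname(coefs["wt"] - coefs["hp"]))
}
est <- sim_apply(s, sim_fun, verbose = FALSE)

# One row per simulation, one column per quantity
nrow(est)
ncol(est)

# FUN applied to the original (unsimulated) coefficients
attr(est, "original")

# Hash linking the estimates to the sim() call that produced them
attr(est, "sim_hash")
```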
sim() for generating the simulated coefficients
summary.clarify_est() for computing p-values and confidence intervals for the estimated quantities
plot.clarify_est() for plotting estimated quantities and their simulated posterior sampling distribution
data("lalonde", package = "MatchIt")
fit <- lm(re78 ~ treat + age + race + nodegree + re74,
          data = lalonde)
coef(fit)
set.seed(123)
s <- sim(fit, n = 500)
# Function to compare predicted values for two units
# using `fit` argument
sim_fun <- function(fit) {
  pred1 <- unname(predict(fit, newdata = lalonde[1, ]))
  pred2 <- unname(predict(fit, newdata = lalonde[2, ]))
  c(pred1 = pred1, pred2 = pred2)
}
est <- sim_apply(s, sim_fun, verbose = FALSE)
# Add difference between predicted values as
# additional quantity
est <- transform(est, `diff 1-2` = pred1 - pred2)
# Examine estimates and confidence intervals
summary(est)
# Function to compare coefficients using `coefs`
# argument
sim_fun <- function(coefs) {
  setNames(coefs["racewhite"] - coefs["racehispan"],
           "wh - his")
}
est <- sim_apply(s, sim_fun, verbose = FALSE)
# Examine estimates and confidence intervals
summary(est)
# Another way to do the above:
est <- sim_apply(s, FUN = NULL)
est <- transform(est,
                 `wh - his` = `racewhite` - `racehispan`)
summary(est, parm = "wh - his")