View source: R/posterior_predict.R
The posterior predictive distribution is the distribution of the outcome implied by the model after using the observed data to update our beliefs about the unknown parameters in the model. Simulating data from the posterior predictive distribution using the observed predictors is useful for checking the fit of the model. Drawing from the posterior predictive distribution at interesting values of the predictors also lets us visualize how a manipulation of a predictor affects (a function of) the outcome(s). With new observations of predictor variables we can use the posterior predictive distribution to generate predicted outcomes.
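The idea can be illustrated without a fitted model. The following base R sketch assumes a simple linear regression and uses hypothetical stand-ins for the posterior draws of the intercept, slope, and residual standard deviation (simulated here, not produced by rstap). Each row of `yrep` is one dataset simulated from a single draw of the parameters.

```r
set.seed(123)

# Hypothetical observed predictor (not from the rstap example data)
x <- c(-1, -0.5, 0, 0.5, 1)
n_draws <- 200

# Stand-ins for posterior draws; in practice these come from the fitted model
alpha <- rnorm(n_draws, mean = 1, sd = 0.1)
beta  <- rnorm(n_draws, mean = 2, sd = 0.1)
sigma <- abs(rnorm(n_draws, mean = 0.5, sd = 0.05))

# One row per posterior draw, one column per observation
yrep <- t(sapply(seq_len(n_draws), function(s) {
  rnorm(length(x), mean = alpha[s] + beta[s] * x, sd = sigma[s])
}))

dim(yrep)
```

Comparing the columns of such a matrix with the observed outcomes is the basis of a posterior predictive check.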
object |
A fitted model object returned by one of the
rstap modeling functions. See |
newsubjdata |
Optionally, a data frame of the subject-specific data
in which to look for variables with which to predict.
If omitted, the original datasets are used. If |
newdistdata |
If newsubjdata is provided, a data frame of the subject-distance data must also be given for models with a spatial component. |
newtimedata |
If newsubjdata is provided, a data frame of the subject-time data must also be given for models with a temporal component. |
draws |
An integer indicating the number of draws to return. The default and maximum number of draws is the size of the posterior sample. |
subject_ID |
Name of the column to join on between subject_data and bef_data. |
group_ID |
name of column to join on between |
re.form |
If |
fun |
An optional function to apply to the results. |
seed |
An optional |
offset |
A vector of offsets. Only required if |
... |
Optional arguments to pass to pp_args. |
A draws by nrow(newsubjdata) matrix of simulations from the posterior predictive distribution. Each row of the matrix is a vector of predictions generated using a single draw of the model parameters from the posterior distribution. The returned matrix will also have class "ppd" to indicate it contains draws from the posterior predictive distribution.
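Because each row is a draw and each column is an observation, column-wise summaries give per-observation predictions. The base R sketch below uses a hypothetical stand-in matrix (`ppd`, filled with simulated values rather than output from a fitted model) to show one way such a matrix might be summarized.

```r
set.seed(1)

# Stand-in for a draws-by-observations matrix of predictions (hypothetical values)
ppd <- matrix(rnorm(400 * 3, mean = 10, sd = 2), nrow = 400, ncol = 3)

# Posterior predictive mean for each observation (one value per column)
pred_mean <- colMeans(ppd)

# 90% predictive interval for each observation (one row per observation)
pred_int <- t(apply(ppd, 2, quantile, probs = c(0.05, 0.95)))

cbind(mean = pred_mean, pred_int)
```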
For binomial models with a number of trials greater than one (i.e., not Bernoulli models), if newsubjdata is specified then it must include all variables needed for computing the number of binomial trials to use for the predictions. For example, if the left-hand side of the model formula is cbind(successes, failures) then both successes and failures must be in newsubjdata. The particular values of successes and failures in newsubjdata do not matter so long as their sum is the desired number of trials. If the left-hand side of the model formula were cbind(successes, trials - successes) then both trials and successes would need to be in newsubjdata, probably with successes set to 0 and trials specifying the number of trials.
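The point about the number of trials can be sketched in base R. The example below assumes a formula with left-hand side cbind(successes, failures) and uses hypothetical stand-in draws of the success probability (simulated with `rbeta`, not taken from a fitted model); only the sum of the two columns determines the number of trials.

```r
set.seed(42)

# Hypothetical new data: only successes + failures matters,
# so 0 + 20 requests 20 trials for each row
newdata <- data.frame(successes = c(0, 0), failures = c(20, 20))
trials <- newdata$successes + newdata$failures

# Stand-in posterior draws of the success probability for each row
n_draws <- 100
p <- cbind(rbeta(n_draws, 8, 12), rbeta(n_draws, 15, 5))

# Draws-by-rows matrix of predicted success counts
yrep <- sapply(seq_len(nrow(newdata)), function(j) {
  rbinom(n_draws, size = trials[j], prob = p[, j])
})

range(yrep)
```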
Examples of posterior predictive checking can also be found in the rstanarm vignettes and demos.
predictive_error and predictive_interval.
if (!exists("example_model")) example(example_model)
yrep <- posterior_predict(example_model)
table(yrep)
# When using new data, all pertinent data must be supplied to the function, including subject_ID.
# The distance and time datasets below are the same ones used in the original function,
# which associates the same spatio-temporal exposure with this subject's new fixed covariates.
newdata <- data.frame(subj_ID = 1, measure_ID = 1, centered_income = 0, sex = 0, centered_age = 0)
# Combine row conditions with `&`; the third positional argument of subset() is `select`, not a second filter.
pps <- posterior_predict(example_model, newsubjdata = newdata,
                         newdistdata = subset(distdata, subj_ID == 1 & measure_ID == 1),
                         newtimedata = subset(timedata, subj_ID == 1 & measure_ID == 1),
                         subject_ID = "subj_ID", group_ID = "measure_ID")