posterior_predict.stapreg: Draw from posterior predictive distribution
In rstap: Spatial Temporal Aggregated Predictor Models via 'stan'


The posterior predictive distribution is the distribution of the outcome implied by the model after using the observed data to update our beliefs about the unknown parameters in the model. Simulating data from the posterior predictive distribution using the observed predictors is useful for checking the fit of the model. Drawing from the posterior predictive distribution at interesting values of the predictors also lets us visualize how a manipulation of a predictor affects (a function of) the outcome(s). With new observations of predictor variables we can use the posterior predictive distribution to generate predicted outcomes.
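The idea can be sketched in base R without rstap. The example below is only an illustration under stated assumptions: the "posterior draws" `alpha`, `beta` and `sigma` are made-up numbers standing in for draws from a fitted linear model, and `x_new` is a hypothetical set of predictor values. For each parameter draw, one outcome vector is simulated, giving one row per draw.

```r
set.seed(1)  # reproducibility of this illustration only

n_draws <- 500
# Made-up stand-ins for posterior draws of an intercept, slope and residual sd
# (assumption: not output from an rstap model).
alpha <- rnorm(n_draws, mean = 1, sd = 0.1)
beta  <- rnorm(n_draws, mean = 2, sd = 0.1)
sigma <- abs(rnorm(n_draws, mean = 1, sd = 0.05))

# Hypothetical predictor values at which to predict.
x_new <- c(-1, 0, 1)

# For each parameter draw s, simulate one outcome per predictor value.
# sapply() returns a length(x_new) x n_draws matrix, so transpose it
# to get one row per draw and one column per observation.
yrep_demo <- t(sapply(seq_len(n_draws), function(s) {
  rnorm(length(x_new), mean = alpha[s] + beta[s] * x_new, sd = sigma[s])
}))

dim(yrep_demo)  # number of draws by number of new observations
```

Comparing the columns of such a matrix shows how the predicted outcome shifts as the predictor is changed.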

## S3 method for class 'stapreg'
posterior_predict(object, newsubjdata = NULL,
  newdistdata = NULL, newtimedata = NULL, draws = NULL,
  subject_ID = NULL, group_ID = NULL, re.form = NULL, fun = NULL,
  seed = NULL, offset = NULL, ...)

`object`	A fitted model object returned by one of the rstap modeling functions. See `stapreg-objects`.
`newsubjdata`	Optionally, a data frame of the subject-specific data in which to look for variables with which to predict. If omitted, the original datasets are used. If `newsubjdata` is provided and any variables were transformed (e.g. rescaled) in the data used to fit the model, then these variables must also be transformed in `newsubjdata`. This only applies if variables were transformed before passing the data to one of the modeling functions and not if transformations were specified inside the model formula. Also see the Note section below for a note about using the `newsubjdata` argument with binomial models.
`newdistdata`	If `newsubjdata` is provided, a data frame of the subject-distance data must also be given for models with a spatial component.
`newtimedata`	If `newsubjdata` is provided, a data frame of the subject-time data must also be given for models with a temporal component.
`draws`	An integer indicating the number of draws to return. The default and maximum number of draws is the size of the posterior sample.
`subject_ID`	Name of the column to join on between `subject_data` and `bef_data`.
`group_ID`	Name of the column to join on between `subject_data` and `bef_data` that uniquely identifies the correlated groups (e.g. visits, schools). Currently only one group (e.g. a measurement ID) can be accounted for in a spatial temporal setting.
`re.form`	If `object` contains group-level parameters, a formula indicating which group-level parameters to condition on when making predictions. `re.form` is specified in the same form as for `predict.merMod`. The default, `NULL`, indicates that all estimated group-level parameters are conditioned on. To refrain from conditioning on any group-level parameters, specify `NA` or `~0`. The `newsubjdata` argument may include new levels of the grouping factors that were specified when the model was estimated, in which case the resulting posterior predictions marginalize over the relevant variables.
`fun`	An optional function to apply to the results. `fun` is found by a call to `match.fun` and so can be specified as a function object, a string naming a function, etc.
`seed`	An optional `seed` to use.
`offset`	A vector of offsets. Only required if `newsubjdata` is specified and an `offset` argument was specified when fitting the model.
`...`	Optional arguments to pass to `pp_args`.
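Because `fun` is resolved with `match.fun`, a function object and a string naming that function should behave the same way. A minimal base-R sketch of that lookup, independent of rstap (the matrix `yrep_demo` is a made-up stand-in for predictive draws, not output from a fitted model):

```r
# Hypothetical stand-in for a small matrix of predictive draws
# (2 draws by 3 observations).
yrep_demo <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# match.fun() accepts either a function object or a string naming one.
f_from_object <- match.fun(exp)
f_from_string <- match.fun("exp")

# Both resolve to the same function, so the transformed draws agree.
res_object <- f_from_object(yrep_demo)
res_string <- f_from_string(yrep_demo)
identical(res_object, res_string)
```

Under that reading, `fun = exp` and `fun = "exp"` would be interchangeable in a call to `posterior_predict`, for example to back-transform predictions from a model fit on a log outcome.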

A `draws` by `nrow(newsubjdata)` matrix of simulations from the posterior predictive distribution. Each row of the matrix is a vector of predictions generated using a single draw of the model parameters from the posterior distribution. The returned matrix will also have class `"ppd"` to indicate it contains draws from the posterior predictive distribution.
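Since rows are draws and columns are observations, per-observation summaries are column-wise operations. The sketch below uses a simulated matrix, `yrep_demo`, as a stand-in for the returned value; the Poisson draws and the sizes `n_draws` and `n_obs` are assumptions chosen purely for illustration.

```r
set.seed(123)  # reproducibility of this illustration only

n_draws <- 1000  # assumed number of posterior draws
n_obs   <- 4     # assumed number of rows in the prediction data

# Simulated stand-in: one row per draw, one column per observation.
yrep_demo <- matrix(rpois(n_draws * n_obs, lambda = 3),
                    nrow = n_draws, ncol = n_obs)

# Predictive mean for each observation.
pred_mean <- colMeans(yrep_demo)

# Central 90% predictive interval for each observation.
# apply() returns a 2 x n_obs matrix, so transpose to one row per observation.
pred_int <- t(apply(yrep_demo, 2, quantile, probs = c(0.05, 0.95)))

cbind(mean = pred_mean, pred_int)
```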

For binomial models with a number of trials greater than one (i.e., not Bernoulli models), if `newsubjdata` is specified then it must include all variables needed for computing the number of binomial trials to use for the predictions. For example, if the left-hand side of the model formula is `cbind(successes, failures)` then both `successes` and `failures` must be in `newsubjdata`. The particular values of `successes` and `failures` in `newsubjdata` do not matter so long as their sum is the desired number of trials. If the left-hand side of the model formula were `cbind(successes, trials - successes)` then both `trials` and `successes` would need to be in `newsubjdata`, probably with `successes` set to 0 and `trials` specifying the number of trials.
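As a concrete illustration of the first case, a prediction data frame for a model with `cbind(successes, failures)` on the left-hand side might be built as follows. This is a sketch only: the column names `subj_ID` and `centered_age` and the value of `n_trials` are assumptions, and the data frame is not passed to any model here.

```r
# Desired number of binomial trials for every prediction (assumed value).
n_trials <- 10

# Hypothetical prediction data; the covariate names are illustrative only.
new_binom <- data.frame(
  subj_ID      = 1:3,
  centered_age = c(-1, 0, 1),
  successes    = 0,         # the split between the two columns is arbitrary
  failures     = n_trials   # only successes + failures matters
)

# Check that each row implies the desired number of trials.
with(new_binom, successes + failures)
```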

Examples of posterior predictive checking can also be found in the rstanarm vignettes and demos.

predictive_error and predictive_interval.

if (!exists("example_model")) example(example_model)
yrep <- posterior_predict(example_model)
table(yrep)

 
# If using new data, all pertinent data must be passed to the function, including subject_ID.
# The same distance and time datasets used in the original fit are reused below,
# which associates the same spatio-temporal exposure with this subject's new fixed covariates.
newdata <- data.frame(subj_ID = 1, measure_ID = 1, centered_income = 0, sex = 0, centered_age = 0)
# Note: both conditions must be combined with `&`; a third positional argument
# to subset() would be treated as the `select` argument instead.
pps <- posterior_predict(example_model, newsubjdata = newdata,
                         newdistdata = subset(distdata, subj_ID == 1 & measure_ID == 1),
                         newtimedata = subset(timedata, subj_ID == 1 & measure_ID == 1),
                         subject_ID = "subj_ID", group_ID = "measure_ID")