Plots of Posterior Predictive Checks
Description
This may be used to plot, or save plots of, samples in an object of
class iterquad.ppc. A variety of plots is provided.
Usage
Arguments
x 
This required argument is an object of class iterquad.ppc.
Style 
This optional argument specifies one of several styles of plots, and
defaults to 
Data 
This optional argument accepts the data set used when updating the
model. Data is required only with certain plot styles, including

Rows 
This optional argument is for a vector of row numbers that
specify the records associated by row in the object of class
iterquad.ppc.
PDF 
This logical argument indicates whether or not the user wants Laplace's Demon to save the plots as a .pdf file. 
... 
Additional arguments are unused. 
Details
This function can be used to produce a variety of posterior predictive
plots, and the style of plot is selected with the Style argument. Below
are some notes on the styles of plots.
Covariates requires Data to be specified, and also requires that the
covariates are named X or x. A plot is produced for each covariate
column vector against yhat, and is appropriate when y is not
categorical.
Covariates, Categorical DV requires Data to be specified, and also
requires that the covariates are named X or x. A plot is produced for
each covariate column vector against yhat, and is appropriate when y
is categorical.
Density plots show the kernel density of the posterior predictive
distribution for each selected row of y (all are selected by
default). A vertical red line indicates the position of the observed
y along the x-axis. When the vertical red line is close to the middle
of a normal posterior predictive distribution, there is little
discrepancy between y and the posterior predictive distribution. When
the vertical red line is in the tail of the distribution, or outside
of the kernel density altogether, there is a large discrepancy
between y and the posterior predictive distribution. Large
discrepancies may be considered outliers, and moreover suggest that
an improvement in model fit should be considered.
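As a rough illustration of how one Density panel can be read, the following base-R sketch draws the kernel density of simulated replicates for a single row and marks the observed value with a red line. The names yhat_i, y_i, and tail_prob are hypothetical and the draws are simulated; nothing here is taken from the package source.

```r
set.seed(4)
S <- 1000                               # number of replicates (assumed)
yhat_i <- rnorm(S, mean = 5, sd = 1)    # simulated replicates for one row
y_i <- 8                                # observed value, placed in the tail

dens <- density(yhat_i)                 # kernel density of the replicates
plot(dens, main = "Posterior predictive density, one row", xlab = "yhat")
abline(v = y_i, col = "red")            # observed y as a vertical red line

# Share of replicates at least as large as y.
# Values near 0 or 1 mean that y sits in a tail of the distribution.
tail_prob <- mean(yhat_i >= y_i)
```

A tail share near 0.5 corresponds to the red line sitting near the middle of the density.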
DW plots the distributions of the Durbin-Watson (DW) test statistics
(Durbin and Watson, 1950), both observed (d.obs as a transparent,
black density) and replicated (d.rep as a transparent, red density).
The distribution of d.obs is estimated from the model, and d.rep is
simulated from normal residuals without autocorrelation, where the
number of simulations is the same as the observed number. This DW
test may be applied to the residuals of univariate time-series models
(or otherwise ordered residuals) to detect first-order
autocorrelation. Autocorrelated residuals are not independent. The DW
test is applicable only when the residuals are normally-distributed,
higher-order autocorrelation is not present, and y is not used also
as a lagged predictor. The DW test statistic, d[obs], occurs in the
interval (0,4), where 0 is perfect positive autocorrelation, 2 is no
autocorrelation, and 4 is perfect negative autocorrelation. The
following summary is reported on the plot: the mean of d[obs] (and
its 95% probability interval), the probability that d[obs] > d[rep],
and whether or not autocorrelation is found. Positive autocorrelation
is reported when the observed process is greater than the replicated
process in fewer than 2.5% of the samples, and negative
autocorrelation is reported when the observed process is greater than
the replicated process in more than 97.5% of the samples.
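The comparison of d.obs with d.rep can be sketched in base R as follows. This is a minimal sketch on simulated data, not the package's implementation: dw_stat and dw_verdict are hypothetical helpers, y and yhat are simulated stand-ins for the observed data and its replicates, and the 2.5% and 97.5% figures are read here as cutoffs on the probability that d.obs exceeds d.rep.

```r
# Durbin-Watson statistic for one vector of ordered residuals.
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

# Assumed reading of the reporting rule: cutoffs on p(d.obs > d.rep).
dw_verdict <- function(p) {
  if (p < 0.025) "positive autocorrelation"
  else if (p > 0.975) "negative autocorrelation"
  else "no autocorrelation found"
}

set.seed(1)
N <- 100   # records (assumed)
S <- 500   # posterior predictive samples (assumed)

# Simulated stand-ins: y is observed, yhat holds one replicate per column.
y    <- as.numeric(arima.sim(list(ar = 0.7), n = N))
yhat <- matrix(rnorm(N * S), N, S)

# d.obs: DW statistic of the residuals under each sample.
d.obs <- apply(y - yhat, 2, dw_stat)
# d.rep: DW statistic of normal residuals without autocorrelation.
d.rep <- apply(matrix(rnorm(N * S), N, S), 2, dw_stat)

p <- mean(d.obs > d.rep)          # probability that d.obs > d.rep
summary_line <- c(mean = mean(d.obs), quantile(d.obs, c(0.025, 0.975)))
verdict <- dw_verdict(p)
```

Because values of d near 0 indicate positive autocorrelation, a small probability that d.obs exceeds d.rep points toward positive autocorrelation.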
DW, Multivariate, C requires Data to be specified, and also requires
that variable Y exist in the data set with exactly that name. These
plots compare each column-wise vector of residuals with a univariate
Durbin-Watson test, as in DW above. This plot is appropriate when Y
is multivariate, not categorical, and residuals are desired to be
tested column-wise for first-order autocorrelation.
ECDF (Empirical Cumulative Distribution Function) plots compare the
ECDF of y with three ECDFs of yhat based on the 2.5%, 50% (median),
and 97.5% quantiles of its distribution. The ECDF(y) is defined as
the proportion of values less than or equal to y. This plot is
appropriate when y is univariate and at least ordinal.
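The ECDF comparison can be approximated in base R with ecdf() and row-wise quantiles. This is a sketch on simulated data; y, yhat, and q are hypothetical stand-ins, and the colors are arbitrary choices rather than the package's.

```r
set.seed(2)
N <- 50    # records (assumed)
S <- 200   # replicates per record (assumed)

y <- rnorm(N)
# One row of replicates per record, centered on that record's y.
yhat <- matrix(rnorm(N * S, mean = rep(y, S), sd = 0.5), N, S)

# 2.5%, 50%, and 97.5% quantiles of the replicates for each record (3 x N).
q <- apply(yhat, 1, quantile, probs = c(0.025, 0.5, 0.975))

plot(ecdf(y), main = "ECDF", do.points = FALSE, verticals = TRUE)
lines(ecdf(q[1, ]), col = "red",  do.points = FALSE, verticals = TRUE)
lines(ecdf(q[2, ]), col = "gray", do.points = FALSE, verticals = TRUE)
lines(ecdf(q[3, ]), col = "red",  do.points = FALSE, verticals = TRUE)

# ECDF(y) at a point is the proportion of values less than or equal to it.
F_y <- ecdf(y)
```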
Fitted plots compare y with the probability interval of its
replicate, and provide loess smoothing. This plot is appropriate when
y is univariate and not categorical.
Fitted, Multivariate, C requires Data to be specified, and also
requires that variable Y exist in the data set with exactly that
name. These plots compare each column-wise vector of y in Y with its
replicates and provide loess smoothing. This plot is appropriate when
Y is multivariate, not categorical, and desired to be seen
column-wise.
Fitted, Multivariate, R requires Data to be specified, and also
requires that variable Y exist in the data set with exactly that
name. These plots compare each row-wise vector of y in Y with its
replicates and provide loess smoothing. This plot is appropriate when
Y is multivariate, not categorical, and desired to be seen row-wise.
Jarque-Bera plots the distributions of the Jarque-Bera (JB) test
statistics (Jarque and Bera, 1980), both observed (JB.obs as a
transparent black density) and replicated (JB.rep as a transparent
red density). The distribution of JB.obs is estimated from the model,
and JB.rep is simulated from normal residuals, where the number of
simulations is the same as the observed number. This Jarque-Bera test
may be applied to the residuals of univariate models to test for
normality. The Jarque-Bera test does not test normality per se, but
whether or not the distribution has kurtosis and skewness that match
a normal distribution, and is therefore a test of the moments of a
normal distribution. The following summary is reported on the plot:
the mean of JB[obs] (and its 95% probability interval), the
probability that JB[obs] > JB[rep], and whether or not normality is
indicated. Non-normality is reported when the observed process is
greater than the replicated process in either fewer than 2.5% or more
than 97.5% of the samples.
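The Jarque-Bera statistic combines sample skewness and kurtosis, and the observed-versus-replicated comparison can be sketched in base R. This is an illustrative sketch only: jb_stat is a hypothetical helper using the textbook form of the statistic, and y and yhat are simulated stand-ins, so the package's own computation may differ in detail.

```r
# Textbook Jarque-Bera statistic for one vector of residuals:
# JB = (n / 6) * (skewness^2 + (kurtosis - 3)^2 / 4)
jb_stat <- function(e) {
  n    <- length(e)
  z    <- e - mean(e)
  m2   <- mean(z^2)               # second central moment
  skew <- mean(z^3) / m2^1.5
  kurt <- mean(z^4) / m2^2
  (n / 6) * (skew^2 + (kurt - 3)^2 / 4)
}

set.seed(5)
N <- 200   # records (assumed)
S <- 500   # samples (assumed)

y    <- rexp(N)                                  # skewed simulated data
yhat <- matrix(rnorm(N * S, mean = 1), N, S)     # simulated replicates

JB.obs <- apply(y - yhat, 2, jb_stat)                     # from residuals
JB.rep <- apply(matrix(rnorm(N * S), N, S), 2, jb_stat)   # normal residuals

p <- mean(JB.obs > JB.rep)   # probability that JB.obs > JB.rep
```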
Jarque-Bera, Multivariate, C requires Data to be specified, and also
requires that variable Y exist in the data set with exactly that
name. These plots compare each column-wise vector of residuals with a
univariate Jarque-Bera test, as in Jarque-Bera above. This plot is
appropriate when Y is multivariate, not categorical, and residuals
are desired to be tested column-wise for normality.
Mardia plots the distributions of the skewness (K3) and kurtosis (K4)
test statistics (Mardia, 1970), both observed (K3.obs and K4.obs as
transparent black density) and replicated (K3.rep and K4.rep as
transparent red density). The distributions of K3.obs and K4.obs are
estimated from the model, and both K3.rep and K4.rep are simulated
from multivariate normal residuals, where the number of simulations
is the same as the observed number. Mardia's test may be applied to
the residuals of multivariate models to test for multivariate
normality. Mardia's test does not test for multivariate normality per
se, but whether or not the distribution has kurtosis and skewness
that match a multivariate normal distribution, and is therefore a
test of the moments of a multivariate normal distribution. The
following summary is reported on the plots: the means of K3[obs] and
K4[obs] (and the associated 95% probability intervals), the
probabilities that K3[obs] > K3[rep] and K4[obs] > K4[rep], and
whether or not multivariate normality is indicated. Non-normality is
reported when the observed process is greater than the replicated
process in either fewer than 2.5% or more than 97.5% of the samples.
Mardia requires Data to be specified, and also requires that variable
Y exist in the data set with exactly that name. Y must be an N x P
matrix of N records and P variables. Source code was modified from
the deprecated package QRMlib.
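For orientation, Mardia's multivariate skewness and kurtosis can be computed for an N x P residual matrix in a few lines of base R. This sketch follows the standard definitions in Mardia (1970); the helper mardia_stats is hypothetical, the residual matrix E is simulated, and mapping the two measures to the labels K3 and K4 is an assumption about the plot rather than something confirmed from the package source.

```r
# Mardia's multivariate skewness and kurtosis for an N x P matrix.
mardia_stats <- function(E) {
  E    <- as.matrix(E)
  n    <- nrow(E)
  Z    <- sweep(E, 2, colMeans(E))      # center each column
  Sinv <- solve(crossprod(Z) / n)       # inverse of the ML covariance
  D    <- Z %*% Sinv %*% t(Z)           # n x n Mahalanobis cross-products
  c(K3 = mean(D^3),                     # skewness: average of all d_ij^3
    K4 = mean(diag(D)^2))               # kurtosis: average of d_ii^2
}

set.seed(6)
N <- 2000   # records (assumed)
P <- 2      # variables (assumed)

# Simulated multivariate normal residuals with identity covariance.
E <- matrix(rnorm(N * P), N, P)
k <- mardia_stats(E)
```

Under multivariate normality the skewness measure is close to 0 and the kurtosis measure is close to P(P + 2), which is 8 when P = 2.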
Predictive Quantiles plots compare y with the predictive quantile
(PQ) of its replicate. This may be useful in looking for patterns
with outliers. Instances outside of the gray lines are considered
outliers.
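One way to picture a predictive quantile is as the share of replicates that are at least as large as the observed value. The base-R sketch below uses that reading on simulated data; the definition of PQ, the placement of the gray lines at 0.025 and 0.975, and the names y, yhat, PQ, and outlier are all assumptions made for illustration.

```r
set.seed(3)
N <- 40     # records (assumed)
S <- 1000   # replicates per record (assumed)

y    <- rnorm(N)
yhat <- matrix(rnorm(N * S), N, S)   # one row of replicates per record
y[1] <- 10                           # an artificial outlier

# Assumed definition: PQ[i] is the share of replicates with yhat[i, ] >= y[i].
PQ <- rowMeans(yhat >= y)

plot(y, PQ, ylim = c(0, 1), xlab = "y", ylab = "PQ",
     main = "Predictive Quantiles")
abline(h = c(0.025, 0.975), col = "gray")   # assumed outlier bounds

# Records falling outside the gray lines.
outlier <- PQ < 0.025 | PQ > 0.975
```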
Residual Density plots the residual density of the median of the
samples. A vertical red line occurs at zero. This plot may be useful
for inspecting a distributional assumption of residual variance. This
plot is appropriate when y is univariate and continuous.
Residual Density, Multivariate C requires Data to be specified, and
also requires that variable Y exist in the data set with exactly that
name. These are column-wise plots of residual density, given the
median of the samples. These plots may be useful for inspecting a
distributional assumption of residual variance. This plot is
appropriate when Y is multivariate, continuous, and densities are
desired to be seen column-wise.
Residual Density, Multivariate R requires Data to be specified, and
also requires that variable Y exist in the data set with exactly that
name. These are row-wise plots of residual density, given the median
of the samples. These plots may be useful for inspecting a
distributional assumption of residual variance. This plot is
appropriate when Y is multivariate, continuous, and densities are
desired to be seen row-wise.
Residuals plots compare y with its residuals. The probability
interval is plotted as a line. This plot is appropriate when y is
univariate.
Residuals, Multivariate, C requires Data to be specified, and also
requires that variable Y exist in the data set with exactly that
name. These are plots of each column-wise vector of residuals. The
probability interval is plotted as a line. This plot is appropriate
when Y is multivariate, not categorical, and the residuals are
desired to be seen column-wise.
Residuals, Multivariate, R requires Data to be specified, and also
requires that variable Y exist in the data set with exactly that
name. These are plots of each row-wise vector of residuals. The
probability interval is plotted as a line. This plot is appropriate
when Y is multivariate, not categorical, and the residuals are
desired to be seen row-wise.
Space-Time by Space requires Data to be specified, and also requires
that the following variables exist in the data set with exactly these
names: latitude, longitude, S, and T. These space-time plots compare
the S x T matrix Y with the S x T matrix Yrep, producing one
time-series plot per point s in space, for a total of S plots.
Therefore, these are time-series plots for each point s in space
across T time-periods. See Time-Series plots below.
Space-Time by Time requires Data to be specified, and also requires
that the following variables exist in the data set with exactly these
names: latitude, longitude, S, and T. These space-time plots compare
the S x T matrix Y with the S x T matrix Yrep, producing one spatial
plot per time-period, for a total of T plots. See Spatial plots
below.
Spatial requires Data to be specified, and also requires that the
following variables exist in the data set with exactly these names:
latitude and longitude. This spatial plot shows yrep plotted
according to its coordinates, and is color-coded so that higher
values of yrep become more red, and lower values become more yellow.
Spatial Uncertainty requires Data to be specified, and also requires
that the following variables exist in the data set with exactly these
names: latitude and longitude. This spatial plot shows the
probability interval of yrep plotted according to its coordinates,
and is color-coded so that wider probability intervals become more
red, and narrower intervals become more yellow.
Time-Series plots compare y with its replicate, including the median
and probability interval quantiles. This plot is appropriate when y
is univariate and ordered by time.
Time-Series, Multivariate, C requires Data to be specified, and also
requires that variable Y exist in the data set with exactly that
name. These plots compare each column-wise time-series in Y with its
replicate, including the median and probability interval quantiles.
This plot is appropriate when Y is multivariate and each time-series
is indexed by column in Y.
Time-Series, Multivariate, R requires Data to be specified, and also
requires that variable Y exist in the data set with exactly that
name. These plots compare each row-wise time-series in Y with its
replicate, including the median and probability interval quantiles.
This plot is appropriate when Y is multivariate and each time-series
is indexed by row in Y, such as is typically true in panel models.
Author(s)
Statisticat, LLC. software@bayesianinference.com
References
Durbin, J., and Watson, G.S. (1950). "Testing for Serial Correlation in Least Squares Regression, I". Biometrika, 37, p. 409–428.
Jarque, C.M., and Bera, A.K. (1980). "Efficient Tests for Normality, Homoscedasticity and Serial Independence of Regression Residuals". Economics Letters, 6(3), p. 255–259.
Mardia, K.V. (1970). "Measures of Multivariate Skewness and Kurtosis with Applications". Biometrika, 57(3), p. 519–530.
See Also
IterativeQuadrature and predict.iterquad.
Examples
### See the IterativeQuadrature function for an example.
