plotenvelope | R Documentation |
Produces diagnostic plots of a fitted model y
, and
adds global envelopes constructed by simulating new residuals, to see how
departures from expected trends compare to what might be expected if the
fitted model were correct. Global envelopes are constructed using the
GET
package (Myllymäki et al 2017) for simultaneous control of error rates over the
whole plot, by simulating new responses from the fitted model then recomputing residuals
(which can be computationally intensive), or alternatively, by simulating residuals directly from the (multivariate) normal distribution
(faster, but not always a smart move). Options for diagnostic plots to construct are a residual vs fits,
a normal quantile plot, or scale-location plot, along the lines of plot.lm
.
plotenvelope(
y,
which = 1:2,
sim.method = "refit",
n.sim = 199,
conf.level = 0.95,
type = "st",
overlay = TRUE,
transform = NULL,
main = c("Residuals vs Fitted Values", "Normal Quantile Plot", "Scale-Location Plot"),
xlab = c("Fitted values", "Theoretical Quantiles", "Fitted Values"),
ylab = c("Residuals", "Residuals", expression(sqrt("|Residuals|"))),
col = NULL,
line.col = if (add.smooth) c("slateblue4", "olivedrab", "slateblue4") else
rep("olivedrab", 3),
envelope.col = adjustcolor(line.col, 0.1),
add.smooth = TRUE,
plot.it = TRUE,
resFunction = NULL,
predFunction = NULL,
fitMin = if (inherits(y, "glm") | inherits(y, "manyglm")) -6 else -Inf,
...
)
y |
is any object that responds to |
which |
a vector specifying the diagnostic plots to construct:
These are the first three options in |
sim.method |
How should residuals be simulated? The default for most objects is |
n.sim |
the number of simulated sets of residuals to be generated, to which the observed residuals will be compared. The default is 199 datasets. |
conf.level |
the confidence level to use in constructing the envelope. |
type |
the type of global envelope to construct, see
|
overlay |
logical. For multivariate data, determines whether residuals from different responses are overlaid on a single plot (default), or plotted separately. |
transform |
a character vector pointing to a function that should be applied to both
axes of the normal quantile plot. The most common use is to set |
main |
the plot title (if a plot is produced). A vector of three titles, one for each plot. If only one value is given that will be used for all plots. |
xlab |
|
ylab |
|
col |
color of points |
line.col |
a vector of length three containing the colors of the lines on the three diagnostic plots. Defaults to "slateblue4" for smoothers and to "olivedrab" otherwise. Because it's cool. |
envelope.col |
color of the global envelope around the expected trend. All data points should always stay within this envelope
(and will for a proportion |
add.smooth |
logical defaults to |
plot.it |
logical. Should the result be plotted? If not, a list of analysis outputs is returned, see Value. |
resFunction |
the function used to compute residuals for all diagnostic plots. Defaults
to |
predFunction |
the function used to compute predicted values in residual vs fits or
scale-location plots. Defaults to |
fitMin |
the minimum fitted value to use in plots, any fitted value less than this will be truncated to
|
... |
further arguments sent through to |
A challenge when interpreting diagnostic plots is understanding the extent to which
deviations from the expected pattern could be due to random noise (sampling variation)
rather than actual assumption violations. This function is intended to assess this,
by simulating multiple realizations of residuals (and fitted values) in situations where
assumptions are satisfied, and plotting a global simulation envelope around these at level conf.level
.
This function can take any fitted model, and construct any of three diagnostic plots, as determined by which
:
Residual vs fits plot (optionally, with a smoother)
Normal quantile plot
Scale-Location plot (optionally, with smoother)
and see if the trend is behaving as expected if the model were true. As long as
the fitted model responds to residuals
and predict
(and when sim.method="refit"
, simulate
and update
) then a simulation envelope
will be constructed for each plot.
Simulation envelopes are global, constructed using the GET-package
, meaning that
(for example) a 95% global envelope on a quantile plot should contain all residuals for 95% of datasets
that satisfy model assumptions. So if any data points lie outside the
quantile plot's envelope we have evidence that assumptions of the fitted model are not satisfied.
The GET-package
was originally constructed to place envelopes around functions, motivated by
the problem of diagnostic testing of spatial processes (Myllymäki et al 2017), but it can equally
well be applied here, by treating the set of residuals (ordered according to the x-axis) as point-wise evaluations of a function.
For residual vs fits and scale-location plots, if do.smooth=TRUE
, global envelopes are constructed for
the smoother, not for the data, hence we are looking to see if the smoother
is wholly contained within the envelope. The smoother is constructed using gam
from the mgcv
package with maximum likelihood estimation (method="ML"
).
Note that the global envelopes only tell you if there is evidence of violation of model assumptions – they do not tell you whether the violations are large enough to worry about. For example, in linear models, violations of normality are usually much less important than violations of linearity or equal variance. And in all cases, modest violations tend to only have modest effects on the validity of inferences, so you need to think about how big observed violations are rather than just thinking about whether or not anything is outside its simulation envelope.
The method used to simulate data for the global envelopes is controlled by sim.method
.
The default method (sim.method="refit"
) uses a parametric bootstrap approach: simulate
new responses from the fitted model, refit the model and recompute residuals and fitted values.
This directly assesses whether trends in observed trends depart from what would be expected if the fitted model
were correct, without any further assumptions. For complex models or large datasets this would however be super-slow.
A fast, more approximate alternative (sim.method="norm"
) is to simulate new (multivariate) normal residuals
repeatedly and use these to assess whether trends in the observed data depart from what would be expected
for independent (multivariate) normal residuals. If residuals are expected to be standard
normal, a more refined check is to simulate from the standard normal using (sim.method="stand.norm"
).
This might for example be useful when diagnosing a model fitted using the mvabund
package (Wang et al. 2012),
since this calculates Dunn-Smyth residuals (Dunn & Smyth 1996), which are approximately standard normal when assumptions are satisfied.
If y
is a glm
with non-Gaussian family then residuals will not be normal and "refit"
is the
only appropriate option, hence other choices will be overridden with a warning reporting that this
has been done.
Note that for Multivariate Linear Models (mlm
), cresiduals
and cpredict
are used to construct residuals and fitted values (respectively) from the full conditional models
(that is, models constructed by regressing each response against all other responses
together with predictors). This is done because full conditionals are diagnostic of joint
distributions, so any violation of multivariate normality is expressed as a violation of
linear model assumptions on full conditionals. Results for all responses are overlaid on a single plot,
future versions of this function may have an overlay option to separately plot each response.
The simulated data and subsequent analysis are also used to obtain a P-value
for the test that model assumptions are correct, for each plot. This tests if sample residuals or their smoothers
are unusually far from the values expected of them if model assumptions were satisfied. For details see
global_envelope_test
.
Up to three diagnostic plots with simulation envelopes are returned, and additionally a list of three objects used in plotting, for plots 1-3 respectively. Each is a list with five components:
x
X-values used for the envelope. In plots 1 and 3 this is the fitted values, or if add.smooth=TRUE
, this is 500 equally spaced points covering
the range of fitted values. For plot 2, this is sorted normal quantiles corresponding to observed data.
y
The Y-values used for the envelope. In plots 1 and 3 this is the residuals, or with add.smooth=TRUE
, this is the values of the smoother corresponding to x
. For plot 2,
this is the sorted residuals.
lo
The lower bound on the global envelope, for each value of x
.
hi
The upper bound on the global envelope, for each value of x
.
p.value
A P-value for the test that observed smoother or data are consistent with what
would be expected if the fitted model were correct, computed in global_envelope_test
.
David Warton <david.warton@unsw.edu.au>
Dunn, P. K., & Smyth, G. K. (1996), Randomized quantile residuals. J. Comp. Graphical Stat. 5, 236-244. doi:10.1080/10618600.1996.10474708
Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017), Global envelope tests for spatial processes. J. R. Stat. Soc. B, 79: 381-404. doi:10.1111/rssb.12172
Wang, Y. I., Naumann, U., Wright, S. T., & Warton, D. I. (2012), mvabund
- an R
package for model-based analysis of multivariate abundance data. Methods in Ecology and Evolution, 3, 471-474. doi:10.1111/j.2041-210X.2012.00190.x
Warton DI (2022) Eco-Stats - Data Analysis in Ecology, from t-tests to multivariate abundances. Springer, ISBN 978-3-030-88442-0
cpredict
, cresiduals
, qqenvelope
# fit a Poisson regression to random data:
y = rpois(50,lambda=1)
x = 1:50
rpois_glm = glm(y~x,family=poisson())
plotenvelope(rpois_glm,n.sim=59)
# Fit a multivariate linear model to the iris dataset:
data(iris)
Y = with(iris, cbind(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width))
iris_mlm=lm(Y~Species,data=iris)
# check normality assumption:
plotenvelope(iris_mlm,n.sim=59,which=2)
# A few more plots, with envelopes around data not smoothers:
## Not run: plotenvelope(iris_mlm, which=1:3, add.smooth=FALSE)
# Note minor violation on the scale/location plot.
## End(Not run)
# Repeat but with smoothers and with separate plots for each response and
# a multiple testing adjustment to sim envelopes:
## Not run: plotenvelope(iris_mlm, which=1:3, overlay=FALSE, conf.level=1-0.05/4)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.