
View source: R/plots.R

plotResiduals    R Documentation

Generic res ~ pred scatter plot with spline or quantile regression on top

Description

The function creates a generic residual plot with either spline or quantile regression to highlight patterns in the residuals. Outliers are highlighted in red by default (but see Details).

Usage

plotResiduals(simulationOutput, form = NULL, quantreg = NULL,
  rank = TRUE, asFactor = NULL, smoothScatter = NULL,
  quantiles = c(0.25, 0.5, 0.75), absoluteDeviation = FALSE, ...)

Arguments

simulationOutput

An object, usually a DHARMa object, from which residual values can be extracted. Alternatively, a vector with residuals or a fitted model can be provided, which will then be transformed into a DHARMa object.

form

Optional predictor against which the residuals should be plotted. Default is to use predicted(simulationOutput).

quantreg

Whether to perform a quantile regression based on testQuantiles or a smooth spline around the mean. Default NULL chooses T for nObs < 2000, and F otherwise.

rank

If T, the values provided in form will be rank transformed. This will usually make patterns easier to spot visually, especially if the distribution of the predictor is skewed. If form is a factor, this has no effect.

asFactor

Whether a numeric predictor provided in form should be treated as a factor. By default, this is done if the predictor has fewer than 10 unique values, as long as enough data points are available to draw a boxplot.

smoothScatter

If T, a smooth scatter plot will be drawn instead of a normal scatter plot. This makes sense when the number of residuals is very large. Default NULL chooses T for nObs > 10000, and F otherwise.

quantiles

For a quantile regression, which quantiles should be plotted. Default is 0.25, 0.5, 0.75.

absoluteDeviation

If T, switch from displaying normal quantile residuals to absolute deviation from the mean expectation of 0.5 (calculated as 2 * abs(res - 0.5)). The purpose of this is to test explicitly for heteroskedasticity; see Details.

...

Additional arguments to plot / boxplot.
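The NULL defaults above depend only on the number of observations and, for asFactor, on the number of unique predictor values. Below is a minimal base-R sketch of that logic, assuming the thresholds stated above. The helper name resolveDefaults is hypothetical and not part of DHARMa, and the "enough data points for a boxplot" condition for asFactor is not reproduced.

```r
# Hypothetical helper (not part of DHARMa): resolves the NULL defaults
# using the thresholds documented in the Arguments section.
resolveDefaults <- function(nObs, form = NULL,
                            quantreg = NULL, smoothScatter = NULL,
                            asFactor = NULL) {
  # quantile regression for smaller data sets, smooth spline otherwise
  if (is.null(quantreg)) quantreg <- nObs < 2000
  # smooth scatter plot only for very large data sets
  if (is.null(smoothScatter)) smoothScatter <- nObs > 10000
  # numeric predictor with few unique values -> treat as factor
  # (assumption: the extra boxplot-size check is omitted here)
  if (is.null(asFactor)) {
    asFactor <- !is.null(form) && is.numeric(form) && length(unique(form)) < 10
  }
  list(quantreg = quantreg, smoothScatter = smoothScatter, asFactor = asFactor)
}

set.seed(1)
d1 <- resolveDefaults(nObs = 200, form = rep(1:3, length.out = 200))
d2 <- resolveDefaults(nObs = 20000, form = runif(20000))
d3 <- resolveDefaults(nObs = 200, quantreg = FALSE)  # explicit value wins
str(d1)
str(d2)
```

Explicitly supplied values are passed through unchanged, which mirrors the idea that NULL only means "choose automatically".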

Details

The function plots residuals against a predictor. By default this is the fitted value extracted from the DHARMa object, but any other predictor can be supplied via form.
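With rank = T, a skewed predictor is replaced by its ranks, so the points spread evenly along the x-axis. The following base-R sketch illustrates the idea. It assumes that ranks are rescaled to (0, 1] and uses runif() as a stand-in for DHARMa residuals. DHARMa's exact scaling may differ.

```r
set.seed(1)
pred <- rlnorm(200, meanlog = 0, sdlog = 1.5)  # strongly right-skewed predictor
res  <- runif(200)                             # stand-in for uniform scaled residuals

# rank-transform the predictor and rescale to (0, 1]
# (assumption: DHARMa's exact scaling may differ)
predRank <- rank(pred, ties.method = "average") / length(pred)

summary(pred)      # most values bunched near 0, long right tail
summary(predRank)  # evenly spread between 1/200 and 1

plot(predRank, res, xlab = "predictor (rank transformed)", ylab = "residual")
abline(h = c(0.25, 0.5, 0.75), lty = 2)  # expected quantile lines
```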

Outliers are highlighted in red by default (for the definition and interpretation of outliers, see testOutliers). The highlighting color can be changed via options(DHARMaSignalColor = ...), e.g. options(DHARMaSignalColor = "blue"). See getOption("DHARMaSignalColor") for the current setting.
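Below is a short sketch of changing the signal color temporarily and restoring it afterwards, using only base R's options()/getOption(). "blue" is just an example choice. Whether the option is already set depends on whether DHARMa has been loaded.

```r
# current setting (may be NULL if DHARMa is not loaded)
previous <- getOption("DHARMaSignalColor")
print(previous)

# switch the highlighting color; options() returns the old value(s)
oldOpts <- options(DHARMaSignalColor = "blue")
current <- getOption("DHARMaSignalColor")
print(current)

# plotResiduals(simulationOutput)  # would now highlight in blue

options(oldOpts)  # restore the previous setting
```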

To provide a visual aid for detecting deviations from uniformity in the y-direction, the plot function calculates an (optional) quantile regression of the residuals, by default for the 0.25, 0.5 and 0.75 quantiles. Since the residuals should be uniformly distributed for a correctly specified model, the theoretical expectations for these regressions are straight lines at 0.25, 0.5 and 0.75, shown as dashed black lines on the plot. However, even for a perfect model, the fitted quantile regressions will deviate somewhat from these lines by chance, especially if the sample size is small. The function therefore tests whether the deviation of the fitted quantile regression from the expectation is significant, using testQuantiles. If so, the significant quantile regression is highlighted in red (by default) and a warning is displayed in the plot.
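Even perfectly uniform residuals have empirical quantiles that scatter around 0.25, 0.5 and 0.75, and more so in small samples. The base-R sketch below shows this. It ignores the predictor, so it illustrates only the chance variation. It is not the quantile regression or the test performed by testQuantiles. The helper name quartileSpread is hypothetical.

```r
set.seed(42)
# Hypothetical helper: standard deviation of the empirical 25%, 50%, 75%
# quantiles of uniform residuals across repeated samples of size n
quartileSpread <- function(n, reps = 500) {
  q <- replicate(reps, quantile(runif(n), probs = c(0.25, 0.5, 0.75)))
  apply(q, 1, sd)  # one standard deviation per quantile (rows of q)
}

spreadSmall <- quartileSpread(n = 50)
spreadLarge <- quartileSpread(n = 5000)
round(rbind(n50 = spreadSmall, n5000 = spreadLarge), 3)
```

The spread shrinks roughly with the square root of the sample size. This is why a visible wiggle in a small data set is not by itself evidence of misfit.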

Overdispersion typically manifests itself as Q1 (0.25) deviating towards 0 and Q3 (0.75) deviating towards 1. Heteroskedasticity manifests itself as non-parallel quantile lines. To diagnose heteroskedasticity and overdispersion, it can be helpful to additionally plot the absolute deviation of the residuals from the mean expectation of 0.5, using the option absoluteDeviation = T. In this case, we would again expect Q1-Q3 quantile lines at 0.25, 0.5, 0.75, but greater dispersion (also locally in the case of heteroskedasticity) always manifests itself in deviations towards 1.
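The base-R sketch below shows why overdispersion pushes Q1 towards 0 and Q3 towards 1, and why absolute deviations then shift towards 1. It is a simplified stand-in for DHARMa's simulation-based residuals. The assumed model is N(0, 1), while the data actually come from N(0, 2). Residuals are the model's CDF evaluated at the data (probability integral transform). This continuous setup was chosen for clarity and is not how DHARMa computes residuals for count models.

```r
set.seed(7)
y   <- rnorm(10000, mean = 0, sd = 2)  # data more dispersed than the model assumes
res <- pnorm(y, mean = 0, sd = 1)      # scaled residuals under the (wrong) N(0, 1) model

qRes <- quantile(res, c(0.25, 0.5, 0.75))
qRes  # Q1 well below 0.25, Q3 well above 0.75, median near 0.5

# absolute deviation from 0.5, as used with absoluteDeviation = TRUE
absDev <- 2 * abs(res - 0.5)
qAbs <- quantile(absDev, c(0.25, 0.5, 0.75))
qAbs  # all three quantiles shifted above 0.25, 0.5, 0.75
```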

The quantile regression can take some time to calculate, especially for larger data sets. For this reason, quantreg = F can be set to generate a smooth spline instead. This is the default for n > 2000.
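With quantreg = F, a smooth spline is drawn instead of quantile regressions. The following is a minimal base-R sketch using stats::smooth.spline. The fixed df = 4 is an arbitrary illustrative choice, not necessarily what DHARMa uses internally.

```r
set.seed(3)
n    <- 1000
pred <- runif(n)  # stand-in for the (rank-transformed) predictor
res  <- runif(n)  # residuals of a correctly specified model: uniform, no trend

# smoothing spline of the residuals against the predictor
fit      <- smooth.spline(pred, res, df = 4)
smoothed <- predict(fit, x = seq(0, 1, length.out = 101))

# for a well-specified model the spline should stay close to 0.5
range(smoothed$y)

plot(pred, res, pch = 16, cex = 0.4, col = "grey")
lines(smoothed, col = "blue", lwd = 2)
abline(h = 0.5, lty = 2)
```

A spline fit is much cheaper than several quantile regressions, but it only tracks the mean. Changes in spread, as with heteroskedasticity, are therefore easier to see with absoluteDeviation = T.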

If form is a factor, a boxplot is drawn instead of a scatter plot. The residuals within each factor level should be uniformly distributed, so each box should extend from 0.25 to 0.75, with the median line at 0.5. To test whether deviations from these expectations are significant, KS tests per group and a Levene test for homogeneity of variances are performed. See testCategorical for details.
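The per-group checks described above can be sketched in base R. This is an illustration, not testCategorical itself. It runs stats::ks.test against the uniform distribution within each group. It also computes a Levene-type test (the Brown-Forsythe variant, based on absolute deviations from group medians) as a one-way ANOVA. The choice of Levene variant and the absence of any multiple-testing correction are assumptions of this sketch.

```r
set.seed(11)
group <- factor(rep(letters[1:5], each = 40))
res   <- runif(200)  # residuals of a correctly specified model

# KS test per group: are the residuals within each group uniform on [0, 1]?
ksP <- sapply(split(res, group), function(r) ks.test(r, "punif")$p.value)
round(ksP, 3)

# Levene-type test (Brown-Forsythe variant): one-way ANOVA on the absolute
# deviations of each residual from its group median
absDevFromMedian <- abs(res - ave(res, group, FUN = median))
leveneP <- anova(lm(absDevFromMedian ~ group))[["Pr(>F)"]][1]
round(leveneP, 3)

boxplot(res ~ group, ylim = c(0, 1))
abline(h = c(0.25, 0.5, 0.75), lty = 2)
```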

Value

If quantile tests are performed, the function returns them invisibly.

Note

With the default smoothScatter = NULL, the scatter plot is replaced by graphics::smoothScatter if nObs > 10000.

The color used for highlighting outliers and quantile lines/splines with significant tests can be changed via options(DHARMaSignalColor = ...), e.g. options(DHARMaSignalColor = "blue"). See getOption("DHARMaSignalColor") for the current setting. This is useful for a color-blind-friendly display, since red and black are difficult for some people to distinguish.

See Also

plotQQunif, testQuantiles, testOutliers

Examples

library(DHARMa)

testData <- createData(sampleSize = 200, family = poisson(), 
                       randomEffectVariance = 1, numGroups = 10)
fittedModel <- glm(observedResponse ~ Environment1, 
                   family = "poisson", data = testData)
simulationOutput <- simulateResiduals(fittedModel = fittedModel)

######### main plotting function #############

# for all functions, quantreg = T will be more
# informative, but slower

plot(simulationOutput, quantreg = FALSE)

#############  Distribution  ######################

plotQQunif(simulationOutput = simulationOutput, 
           testDispersion = FALSE,
           testUniformity = FALSE,
           testOutliers = FALSE)

hist(simulationOutput)

#############  residual plots  ###############

# rank transformation, using a simulationOutput
plotResiduals(simulationOutput, rank = TRUE, quantreg = FALSE)

# smooth scatter plot - usually used for large datasets, default for n > 10000
plotResiduals(simulationOutput, rank = TRUE, quantreg = FALSE, smoothScatter = TRUE)

# residuals vs. a predictor supplied explicitly via form
plotResiduals(simulationOutput, form = testData$Environment1, 
              quantreg = FALSE)

# if form is a factor, or if asFactor = TRUE, a boxplot is produced
plotResiduals(simulationOutput, form = testData$group)

# to diagnose overdispersion and heteroskedasticity it can be useful to 
# display residuals as absolute deviation from the expected mean 0.5
plotResiduals(simulationOutput, absoluteDeviation = TRUE, quantreg = FALSE)

# All these options can also be provided to the main plotting function

# If you want to plot summaries per group, use
simulationOutput = recalculateResiduals(simulationOutput, group = testData$group)
plot(simulationOutput, quantreg = FALSE) 
# this shows one residual per group (i.e. per random-effect level)



DHARMa documentation built on Oct. 18, 2024, 5:09 p.m.