check.resid: Residual Diagnostics
In misty: Miscellaneous Functions 'T. Yanagida'

check.resid

R Documentation

Residual Diagnostics

Description

This function performs residual diagnostics for linear models estimated with the lm() function and for multilevel and linear mixed-effects models estimated with the lmer() function from the lme4 package. The diagnostics are meant to detect nonlinearity (partial residual or component-plus-residual plots), nonconstant error variance (predicted values vs. residuals plot), and non-normality of residuals (Q-Q plot and histogram with density plot).

Usage

check.resid(model, type = c("linear", "homo", "normal"),
            resid = c("unstand", "stand", "student"), plot = TRUE,
            point.shape = 21, point.fill = "gray80", point.size = 1,
            line1 = TRUE, line2 = TRUE, linetype1 = "solid",
            linetype2 = "dashed", linewidth1 = 1, linewidth2 = 1,
            line.col1 = "#0072B2", line.col2 = "#D55E00", bar.width = NULL,
            bar.n = 30, bar.col = "black", bar.fill = "gray95",
            strip.text.size = 11, label.size = 10, axis.text.size = 10,
            xlim = NULL, ylim = NULL, xbreaks = ggplot2::waiver(),
            ybreaks = ggplot2::waiver(), check = TRUE)

Arguments

`model`	a fitted model of class `"lm"`, `"lmerMod"`, or `"lmerModLmerTest"`.
`type`	a character string specifying the type of the plot, i.e., `"linear"` for partial (component-plus-residual) plots, `"homo"` (default) for a predicted values vs. residuals plot, and `"normal"` for a Q-Q plot and a histogram with a density plot. Note that partial residual or component-plus-residual plots are not available for models with interaction terms.
`resid`	a character string specifying the type of residual used for the partial (component-plus-residual) plots or the Q-Q plot and histogram, i.e., `"unstand"` for unstandardized residuals, `"stand"` for standardized residuals, and `"student"` for studentized residuals. By default, studentized residuals are used for the predicted values vs. residuals plot and unstandardized residuals are used for the Q-Q plot and histogram. Note that studentized residuals are not available for multilevel and linear mixed-effects models when requesting Q-Q plots and histograms.
`plot`	logical: if `TRUE` (default), a plot is drawn.
`point.shape`	a numeric value for specifying the argument `shape` in the `geom_point` function.
`point.fill`	a character string or numeric value for specifying the argument `fill` in the `geom_point` function.
`point.size`	a numeric value for specifying the argument `size` in the `geom_point` function.
`line1`	logical: if `TRUE` (default), a regression line is drawn in the partial (component-plus-residual) plots, a horizontal line is drawn in the predicted values vs. residuals plot, and a t-distribution or normal distribution curve is drawn in the histogram.
`line2`	logical: if `TRUE` (default), a loess smooth line is drawn in the partial (component-plus-residual) plots, loess smooth lines are drawn in the predicted values vs. residuals plot, and a density curve is drawn in the histogram.
`linetype1`	a character string or numeric value for specifying the argument `linetype` in the `geom_smooth`, `geom_hline`, or `stat_function` function.
`linetype2`	a character string or numeric value for specifying the argument `linetype` in the `geom_smooth` or `geom_density` function.
`linewidth1`	a numeric value for specifying the argument `linewidth` in the `geom_smooth`, `geom_hline`, or `stat_function` function.
`linewidth2`	a numeric value for specifying the argument `linewidth` in the `geom_smooth` or `geom_density` function.
`line.col1`	a character string or numeric value for specifying the argument `color` in the `geom_smooth`, `geom_hline`, or `stat_function` function.
`line.col2`	a character string or numeric value for specifying the argument `color` in the `geom_smooth` or `geom_density` function.
`bar.width`	a numeric value for specifying the argument `binwidth` in the `geom_bar` function.
`bar.n`	a numeric value for specifying the argument `bins` in the `geom_bar` function.
`bar.col`	a character string or numeric value for specifying the argument `color` in the `geom_bar` function.
`bar.fill`	a character string or numeric value for specifying the argument `fill` in the `geom_bar` function.
`strip.text.size`	a numeric value for specifying the argument `size` in the `element_text` function of the `strip.text` argument within the `theme` function.
`label.size`	a numeric value for specifying the argument `size` in the `element_text` function of the `axis.title` argument within the `theme` function.
`axis.text.size`	a numeric value for specifying the argument `size` in the `element_text` function of the `axis.text` argument within the `theme` function.
`xlim`	a numeric vector with two elements for specifying the argument `limits` in the `scale_x_continuous` function.
`ylim`	a numeric vector with two elements for specifying the argument `limits` in the `scale_y_continuous` function.
`xbreaks`	a numeric vector for specifying the argument `breaks` in the `scale_x_continuous` function.
`ybreaks`	a numeric vector for specifying the argument `breaks` in the `scale_y_continuous` function.
`check`	logical: if `TRUE` (default), argument specification is checked.

Details

Nonlinearity

The violation of the assumption of linearity implies that the model cannot accurately capture the systematic pattern of the relationship between the outcome and predictor variables. In other words, the specified regression surface does not accurately represent the relationship between the conditional mean values of Y and the Xs. That means the average error E(\varepsilon) is not 0 at every point on the regression surface (Fox, 2016).

In multiple regression, plotting the outcome variable Y against each predictor variable X can be misleading because it does not reflect the partial relationship between Y and X (i.e., statistically controlling for the other Xs), but rather the marginal relationship between Y and X (i.e., ignoring the other Xs). Partial residual plots or component-plus-residual plots should be used to detect nonlinearity in multiple regression. The partial residual for the jth predictor variable is defined as

e_i^{(j)} = b_jX_{ij} + e_i

The linear component of the partial relationship between Y and X_j is added back to the least-squares residuals, which may include an unmodeled nonlinear component. Then, the partial residual e_i^{(j)} is plotted against the predictor variable X_j. Nonlinearity may become apparent when a non-parametric regression smoother is applied.

By default, the function plots each predictor against its partial residuals and adds the linear regression line and the loess smooth line to each partial residual plot.
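The definition above can be reproduced by hand with base R. The following is a minimal sketch, not the code used inside check.resid(): it assumes the airquality data from the Examples section, picks Wind as the predictor of interest, and uses loess.smooth() as the non-parametric smoother. The object names (dat, pres.wind, slope) are chosen for illustration only.

```r
# Keep complete cases so that residuals and predictor values line up
dat <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])
mod <- lm(Ozone ~ Solar.R + Wind + Temp, data = dat)

e <- residuals(mod)   # least-squares residuals e_i
b <- coef(mod)        # estimated coefficients b_j

# Partial residuals for Wind: e_i^{(j)} = b_j * X_ij + e_i
pres.wind <- b[["Wind"]] * dat$Wind + e

# Regressing the partial residuals on Wind recovers the coefficient b_j,
# because the residuals are orthogonal to every predictor in the model
slope <- coef(lm(pres.wind ~ dat$Wind))[[2]]

# Partial residual plot with a regression line and a loess smooth line
plot(dat$Wind, pres.wind, xlab = "Wind", ylab = "Partial residual")
abline(lm(pres.wind ~ dat$Wind), col = "#0072B2")
lines(loess.smooth(dat$Wind, pres.wind), col = "#D55E00", lty = "dashed")
```

A smooth line that departs systematically from the straight line hints at a nonlinear partial relationship for that predictor.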

Nonconstant Error Variance

The violation of the assumption of constant error variance, often referred to as heteroscedasticity, implies that the variance of the outcome variable around the regression surface is not the same at every point on the regression surface (Fox, 2016).

Plotting residuals against the outcome variable Y instead of the predicted values \hat{Y} is not recommended because Y = \hat{Y} + e. Consequently, the linear correlation between the outcome variable Y and the residuals e is \sqrt{1 - R^2}, where R is the multiple correlation coefficient, so the plot shows a linear trend even when the model is correctly specified. In contrast, a plot of the residuals against the predicted values \hat{Y} is much easier to examine for evidence of nonconstant error variance because the correlation between \hat{Y} and e is 0. Note that the least-squares residuals generally have unequal variances Var(e_i) = \sigma^2 (1 - h_i), where h_i is the leverage of observation i, even if the errors have constant variance \sigma^2. The studentized residuals e^*_i, however, have constant variance under the assumptions of the regression model. Residuals are studentized by dividing them by \hat{\sigma}_{(-i)} \sqrt{1 - h_i}, where \hat{\sigma}^2_{(-i)} is the estimate of \sigma^2 obtained after deleting the ith observation and h_i is the leverage of observation i (Meuleman et al., 2015).
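The value \sqrt{1 - R^2} follows in a few steps from the fact that least-squares residuals are uncorrelated with the predicted values. The short derivation below is a sketch in terms of sample quantities for a model with an intercept. The symbols s_Y and s_e (sample standard deviations of Y and e) and RSS and TSS (residual and total sums of squares) are introduced here for illustration and are not part of the original notation.

```latex
\begin{aligned}
\operatorname{Cov}(Y, e)
  &= \operatorname{Cov}(\hat{Y} + e,\, e)
   = \operatorname{Cov}(\hat{Y}, e) + \operatorname{Var}(e)
   = \operatorname{Var}(e), \\
r_{Ye}
  &= \frac{\operatorname{Var}(e)}{s_Y \, s_e}
   = \frac{s_e}{s_Y}
   = \sqrt{\frac{\mathrm{RSS}}{\mathrm{TSS}}}
   = \sqrt{1 - R^2}.
\end{aligned}
```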

By default, the function plots the predicted values against the studentized residuals. It also draws a horizontal line at 0, a loess smooth line for all residuals, and separate loess smooth lines for positive and negative residuals.
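The relationships described in this section can be checked numerically with base R. This is a minimal sketch under the assumption that the airquality model from the Examples section is used. It computes studentized residuals by dividing each residual by the leave-one-out residual standard deviation times \sqrt{1 - h_i} and does not reproduce the internals of check.resid(). The object names are illustrative.

```r
# Keep complete cases so that Y, fitted values, and residuals line up
dat <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])
mod <- lm(Ozone ~ Solar.R + Wind + Temp, data = dat)

e    <- residuals(mod)         # least-squares residuals
yhat <- fitted(mod)            # predicted values
h    <- hatvalues(mod)         # leverages h_i
r2   <- summary(mod)$r.squared # squared multiple correlation

# Correlation of the residuals with Y versus with the predicted values
cor.y.e    <- cor(dat$Ozone, e)  # equals sqrt(1 - R^2)
cor.yhat.e <- cor(yhat, e)       # zero up to rounding error

# Studentized residuals: residual divided by the leave-one-out
# residual standard deviation times sqrt(1 - h_i)
s.del  <- influence(mod)$sigma
e.stud <- e / (s.del * sqrt(1 - h))

# Predicted values vs. studentized residuals with a reference line at 0
plot(yhat, e.stud, xlab = "Predicted values", ylab = "Studentized residuals")
abline(h = 0, col = "#0072B2")
lines(loess.smooth(yhat, e.stud), col = "#D55E00", lty = "dashed")
```

The hand-computed values in e.stud should agree with the result of rstudent(mod).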

Non-normality of Residuals

Statistical inference under the violation of the assumption of normally distributed errors is approximately valid in all but small samples. However, the efficiency of least squares is not robust because the least-squares estimator is the most efficient unbiased estimator only when the errors are normally distributed. For instance, when error distributions have heavy tails, the least-squares estimator becomes much less efficient than robust estimators. In addition, heavy-tailed error distributions give rise to outliers, and highly skewed error distributions compromise the interpretation of conditional means because the mean is not an accurate measure of central tendency in a highly skewed distribution. Moreover, a multimodal error distribution suggests the omission of one or more discrete explanatory variables that naturally divide the data into groups (Fox, 2016).

By default, the function draws a Q-Q plot of the unstandardized residuals and a histogram of the unstandardized residuals with a density plot. Note that studentized residuals follow a t-distribution with n - k - 2 degrees of freedom, where n is the sample size and k is the number of predictors. However, the normal and t-distribution are nearly identical unless the sample size is small. Moreover, even if the model is correct, the studentized residuals are not an independent random sample from t_{n - k - 2}. Residuals are correlated with each other depending on the configuration of the predictor values. The correlation is generally negligible unless the sample size is small.
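To see how close the t-distribution with n - k - 2 degrees of freedom is to the normal distribution in a moderately sized sample, the reference quantiles can be compared directly. The following base R sketch assumes the airquality model from the Examples section and draws a Q-Q plot of the studentized residuals against t quantiles. It is an illustration only, and the plot produced by check.resid() may be constructed differently.

```r
# Keep complete cases so that n matches the data used in the fit
dat <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])
mod <- lm(Ozone ~ Solar.R + Wind + Temp, data = dat)

n    <- nobs(mod)              # sample size
k    <- length(coef(mod)) - 1  # number of predictors (intercept excluded)
df.t <- n - k - 2              # degrees of freedom for studentized residuals

# Reference quantiles under the t-distribution and the normal distribution
p <- c(0.025, 0.975)
quant <- rbind(t = qt(p, df = df.t), normal = qnorm(p))
print(quant)

# Q-Q plot of the studentized residuals against t quantiles
e.stud <- rstudent(mod)
theo   <- qt(ppoints(n), df = df.t)
plot(theo, sort(e.stud), xlab = "Theoretical t quantiles",
     ylab = "Studentized residuals")
abline(0, 1, col = "#0072B2")
```

Points that fall away from the reference line in the tails indicate heavier or lighter tails than the reference distribution, and a curved pattern indicates skewness.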

Value

Returns an object of class misty.object, which is a list with the following entries:

`call`	function call
`type`	type of analysis
`model`	model specified in `model`
`args`	specification of function arguments
`plotdat`	data frame used for the plot
`plot`	ggplot2 object for plotting the residuals

Note

This function uses modified copies of the partial() and calc_ranef() functions in the remef package by Sven Hohenstein and Reinhold Kliegl (2025) when requesting partial residual plots for linear mixed-effects models.

Author(s)

Takuya Yanagida takuya.yanagida@univie.ac.at

References

Fox, J. (2016). Applied regression analysis and generalized linear models (3rd ed.). Sage Publications, Inc.

Hohenstein, S., & Kliegl, R. (2025). remef: Remove Partial Effects. R package version 1.0.7, https://github.com/hohenstein/remef

Meuleman, B., Loosveldt, G., & Emonds, V. (2015). Regression analysis: Assumptions and diagnostics. In H. Best & C. Wolf (Eds.), The SAGE handbook of regression analysis and causal inference (pp. 83-110). Sage.

Examples

#----------------------------------------------------------------------------
# Linear Model

# Estimate linear model
mod.lm <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)

# Example 1a: Partial (component-plus-residual) plots
check.resid(mod.lm, type = "linear")

# Example 1b: Predicted values vs. residuals plot
check.resid(mod.lm, type = "homo")

# Example 1c: Q-Q plot and histogram with density plot
check.resid(mod.lm, type = "normal")

# Example 1d: Extract data and ggplot2 object
object <- check.resid(mod.lm, type = "linear", plot = FALSE)

# Data frame
object$plotdat

# ggplot object
object$plot

## Not run: 
#----------------------------------------------------------------------------
# Multilevel and Linear Mixed-Effects Model

# Load lme4 package, which provides lmer() and the sleepstudy data
library(lme4)

# Estimate two-level mixed-effects model
mod.lmer <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Example 2a: Partial (component-plus-residual) plots
check.resid(mod.lmer, type = "linear")

# Example 2b: Predicted values vs. residuals plot
check.resid(mod.lmer, type = "homo")

# Example 2c: Q-Q plot and histogram with density plot
check.resid(mod.lmer, type = "normal")

## End(Not run)

misty documentation built on Aug. 18, 2025, 5:16 p.m.
