| plot.gkwreg | R Documentation |
Produces a comprehensive set of diagnostic plots for assessing the adequacy
of a fitted Generalized Kumaraswamy (GKw) regression model (objects of class
"gkwreg"). The function offers flexible plot selection, multiple
residual types, and support for both base R graphics and ggplot2 with
extensive customization options. Designed for thorough model evaluation
including residual analysis, influence diagnostics, and goodness-of-fit
assessment.
## S3 method for class 'gkwreg'
plot(
x,
which = 1:6,
type = c("quantile", "pearson", "deviance"),
family = NULL,
caption = NULL,
main = "",
sub.caption = "",
ask = NULL,
use_ggplot = FALSE,
arrange_plots = FALSE,
nsim = 100,
level = 0.9,
sample_size = NULL,
theme_fn = NULL,
save_diagnostics = FALSE,
...
)
x |
An object of class |
which |
Integer vector specifying which diagnostic plots to produce.
If a subset of the plots is required, specify a subset of the numbers 1:6.
Defaults to
|
type |
Character string indicating the type of residuals to be used for
plotting. Defaults to
|
family |
Character string specifying the distribution family assumptions
to use when calculating residuals and other diagnostics. If |
caption |
Titles for the diagnostic plots. Can be specified in three ways:
Default captions are:
|
main |
Character string to be prepended to individual plot captions
(from the |
sub.caption |
Character string used as a common subtitle positioned
above all plots (especially when multiple plots are arranged). If |
ask |
Logical. If |
use_ggplot |
Logical. If |
arrange_plots |
Logical. Only relevant if |
nsim |
Integer. Number of simulations used to generate the confidence
envelope in the half-normal plot ( |
level |
Numeric. The confidence level (between 0 and 1) for the simulated
envelope in the half-normal plot ( |
sample_size |
Integer or |
theme_fn |
A function. Only relevant if |
save_diagnostics |
Logical. If |
... |
Additional graphical parameters passed to the underlying plotting
functions. For base R graphics, these are standard |
Diagnostic plots are essential for evaluating the assumptions and adequacy of
fitted regression models. This function provides a comprehensive suite of
standard diagnostic tools adapted specifically for gkwreg objects,
which model bounded responses in the (0,1) interval.
The choice of residual type (type) is important and depends on the
diagnostic goal:
Quantile Residuals (type = "quantile"):
Recommended as default for bounded response models. These residuals
are constructed to be approximately N(0,1) under a correctly specified model,
making standard diagnostic tools (QQ-plots, hypothesis tests) directly
applicable. They are particularly effective for detecting model
misspecification in the distributional family or systematic bias.
Pearson Residuals (type = "pearson"):
Standardized residuals that account for the mean-variance relationship.
Useful for assessing whether the assumed variance function is appropriate.
If plots show patterns or non-constant spread, this suggests the variance
model may be misspecified.
Deviance Residuals (type = "deviance"):
Based on the contribution of each observation to the model deviance.
Often have more symmetric distributions than Pearson residuals and are
useful for identifying observations that fit poorly according to the
likelihood criterion.
Plot 1 - Residuals vs. Observation Indices:
Purpose: Detect temporal patterns or autocorrelation
What to look for: Random scatter around zero. Any systematic patterns (trends, cycles, clusters) suggest autocorrelation or omitted time-varying predictors.
Action: If patterns are detected, consider adding time-related predictors or modeling autocorrelation structure.
Plot 2 - Cook's Distance:
Purpose: Identify influential observations affecting coefficient estimates
What to look for: Points exceeding the 4/n reference line have high influence. These observations, if removed, would substantially change model estimates.
Action: Investigate high-influence points for data entry errors, outliers, or legitimately unusual cases. Consider sensitivity analysis.
Plot 3 - Leverage vs. Fitted Values:
Purpose: Identify observations with unusual predictor combinations
What to look for: Points exceeding the 2p/n reference line have high leverage. These are unusual in predictor space but may or may not be influential.
Action: High leverage points deserve scrutiny but are only problematic if they also have large residuals (check Plots 1, 4).
Plot 4 - Residuals vs. Linear Predictor:
Purpose: Detect non-linearity and heteroscedasticity
What to look for: Random scatter around zero with constant spread. Curved patterns suggest non-linear relationships. Funnel shapes indicate heteroscedasticity (non-constant variance).
Action: For non-linearity, add polynomial terms or use splines. For heteroscedasticity, consider alternative link functions or variance models.
Plot 5 - Half-Normal Plot with Envelope:
Purpose: Assess overall distributional adequacy
What to look for: Points should follow the reference line and stay within the simulated envelope. Systematic deviations indicate distributional misspecification. Isolated points outside the envelope suggest outliers.
Action: If many points fall outside the envelope, try a different distributional family or check for outliers and data quality issues.
Plot 6 - Predicted vs. Observed:
Purpose: Overall model fit and prediction accuracy
What to look for: Points should cluster around the 45-degree line. Systematic deviations above or below indicate over- or under-prediction. Large scatter indicates poor predictive performance.
Action: Poor fit suggests missing predictors, incorrect functional form, or inappropriate distributional family.
The new named list interface for caption allows elegant partial
customization:
# OLD WAY (still supported): Must repeat all 6 titles plot(model, caption = c( "Residuals vs. Observation Indices", "Cook's Distance Plot", "MY CUSTOM TITLE FOR PLOT 3", # Only want to change this "Residuals vs. Linear Predictor", "Half-Normal Plot of Residuals", "Predicted vs. Observed Values" )) # NEW WAY: Specify only what changes plot(model, caption = list( "3" = "MY CUSTOM TITLE FOR PLOT 3" )) # Plots 1,2,4,5,6 automatically use defaults # Customize multiple plots plot(model, caption = list( "1" = "Time Series of Residuals", "5" = "Distributional Assessment" ))
The vector interface remains fully supported for backward compatibility.
Several arguments default to NULL, triggering intelligent automatic behavior:
sub.caption = NULL: Automatically generates subtitle from model call
ask = NULL: Automatically prompts only when needed (multiple plots
on interactive device)
theme_fn = NULL: Automatically uses theme_minimal when
use_ggplot = TRUE
You can override these by explicitly setting values:
plot(model, sub.caption = "") # Disable subtitle plot(model, ask = FALSE) # Never prompt plot(model, theme_fn = theme_classic) # Custom theme
For large datasets (n > 10,000):
Use sample_size to work with a random subset (e.g.,
sample_size = 2000)
Reduce nsim for half-normal plot (e.g., nsim = 50)
Use base R graphics (use_ggplot = FALSE) for faster rendering
Skip computationally intensive plots: which = c(1,2,4,6)
(excludes half-normal plot)
Base R Graphics (use_ggplot = FALSE):
Faster rendering, especially for large datasets
No external dependencies beyond base R
Traditional R look and feel
Interactive ask prompting supported
Customize via ... parameters (standard par() settings)
ggplot2 Graphics (use_ggplot = TRUE):
Modern, publication-quality aesthetics
Consistent theming via theme_fn
Grid arrangement support via arrange_plots
Requires ggplot2 package (and optionally gridExtra or
ggpubr for arrangements)
No interactive ask prompting (ggplot limitation)
Invisibly returns either:
The original fitted model object x (if
save_diagnostics = FALSE, the default). This allows piping or
chaining operations.
A list containing diagnostic measures (if save_diagnostics = TRUE),
including:
data: Data frame with observation indices, observed values,
fitted values, residuals, Cook's distance, leverage, and linear predictors
model_info: List with model metadata (n, p, thresholds,
family, type, etc.)
half_normal: Data frame with half-normal plot data and
envelope (if which includes 5)
The function is primarily called for its side effect of generating diagnostic plots. The invisible return allows:
# Silent plotting plot(model) # Or capture for further use diag <- plot(model, save_diagnostics = TRUE) head(diag$data)
Lopes, J. E.
Dunn, P. K., & Smyth, G. K. (1996). Randomized Quantile Residuals. Journal of Computational and Graphical Statistics, 5(3), 236-244. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/10618600.1996.10474708")}
Cook, R. D. (1977). Detection of Influential Observation in Linear Regression. Technometrics, 19(1), 15-18. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/00401706.1977.10489493")}
Atkinson, A. C. (1985). Plots, Transformations and Regression. Oxford University Press.
gkwreg for fitting Generalized Kumaraswamy regression models
residuals.gkwreg for extracting different types of residuals
fitted.gkwreg for extracting fitted values
summary.gkwreg for model summaries
plot.lm for analogous diagnostics in linear models
ggplot for ggplot2 graphics system
grid.arrange for arranging ggplot2 plots
# EXAMPLE 1: Basic Usage with Default Settings
# Simulate data
library(gkwdist)
set.seed(123)
n <- 200
x1 <- runif(n, -2, 2)
x2 <- rnorm(n)
# True model parameters
alpha_true <- exp(0.7 + 0.3 * x1)
beta_true <- exp(1.2 - 0.2 * x2)
# Generate response
y <- rkw(n, alpha = alpha_true, beta = beta_true)
df <- data.frame(y = y, x1 = x1, x2 = x2)
# Fit model
model <- gkwreg(y ~ x1 | x2, data = df, family = "kw")
# Generate all diagnostic plots with defaults
par(mfrow = c(3, 2))
plot(model, ask = FALSE)
# EXAMPLE 2: Selective Plots with Custom Residual Type
# Focus on key diagnostic plots only
par(mfrow = c(3, 1))
plot(model,
which = c(2, 4, 5), # Cook's distance, Resid vs LinPred, Half-normal
type = "pearson"
) # Use Pearson residuals
# Check for influential points (plot 2) and non-linearity (plot 4)
par(mfrow = c(2, 1))
plot(model,
which = c(2, 4),
type = "deviance"
)
# EXAMPLE 3: Caption Customization - New Named List Interface
# Customize only specific plot titles (RECOMMENDED NEW WAY)
par(mfrow = c(3, 1))
plot(model,
which = c(1, 4, 6),
caption = list(
"1" = "Time Pattern Check",
"4" = "Linearity Assessment",
"6" = "Predictive Accuracy"
)
)
# Customize subtitle and main title
par(mfrow = c(2, 1))
plot(model,
which = c(1, 5),
main = "Model Diagnostics",
sub.caption = "Kumaraswamy Regression - Training Data",
caption = list("5" = "Normality Check with 95% Envelope")
)
# Suppress subtitle entirely
par(mfrow = c(3, 2))
plot(model, sub.caption = "")
# EXAMPLE 4: Backward Compatible Caption (Vector Interface)
# OLD WAY - still fully supported
par(mfrow = c(3, 2))
plot(model,
which = 1:6,
caption = c(
"Residual Pattern Analysis",
"Influence Diagnostics",
"Leverage Assessment",
"Linearity Check",
"Distributional Fit",
"Prediction Quality"
)
)
# EXAMPLE 5: ggplot2 Graphics with Theming
# Modern publication-quality plots
plot(model,
use_ggplot = TRUE,
arrange_plots = TRUE
)
# With custom theme
plot(model,
use_ggplot = TRUE,
theme_fn = ggplot2::theme_bw,
arrange_plots = TRUE
)
# With classic theme and custom colors (via ...)
plot(model,
use_ggplot = TRUE,
theme_fn = ggplot2::theme_classic,
arrange_plots = TRUE
)
# EXAMPLE 6: Arranged Multi-Panel ggplot2 Display
# Requires gridExtra or ggpubr package
plot(model,
which = 1:4,
use_ggplot = TRUE,
arrange_plots = TRUE, # Arrange in grid
theme_fn = ggplot2::theme_minimal
)
# Focus plots in 2x2 grid
plot(model,
which = c(2, 3, 4, 6),
use_ggplot = TRUE,
arrange_plots = TRUE,
caption = list(
"2" = "Influential Cases",
"3" = "High Leverage Points"
)
)
# EXAMPLE 7: Half-Normal Plot Customization
# Higher precision envelope (more simulations)
par(mfrow = c(1, 2))
plot(model,
which = 5,
nsim = 500, # More accurate envelope
level = 0.95
) # 95% confidence level
# Quick envelope for large datasets
plot(model,
which = 5,
nsim = 500, # Faster computation
level = 0.90
)
# EXAMPLE 8: Different Residual Types Comparison
# Compare different residual types
par(mfrow = c(2, 2))
plot(model, which = 4, type = "quantile", main = "Quantile")
plot(model, which = 4, type = "pearson", main = "Pearson")
plot(model, which = 4, type = "deviance", main = "Deviance")
par(mfrow = c(1, 1))
# Quantile residuals for half-normal plot (recommended)
plot(model, which = 5, type = "quantile")
# EXAMPLE 9: Family Comparison Diagnostics
# Compare diagnostics under different distributional assumptions
# Helps assess if alternative family would fit better
par(mfrow = c(2, 2))
plot(model,
which = c(5, 6),
family = "kw", # Original family
main = "Kumaraswamy"
)
plot(model,
which = c(5, 6),
family = "beta", # Alternative family
main = "Beta"
)
par(mfrow = c(1, 1))
# EXAMPLE 10: Large Dataset - Performance Optimization
# Simulate large dataset
set.seed(456)
n_large <- 50000
x1_large <- runif(n_large, -2, 2)
x2_large <- rnorm(n_large)
alpha_large <- exp(0.5 + 0.2 * x1_large)
beta_large <- exp(1.0 - 0.1 * x2_large)
y_large <- rkw(n_large, alpha = alpha_large, beta = beta_large)
df_large <- data.frame(y = y_large, x1 = x1_large, x2 = x2_large)
model_large <- gkwreg(y ~ x1 | x2, data = df_large, family = "kw")
# Optimized plotting for large dataset
par(mfrow = c(2, 2), mar = c(3, 3, 2, 2))
plot(model_large,
which = c(1, 2, 4, 6), # Skip computationally intensive plot 5
sample_size = 2000, # Use random sample of 2000 observations
ask = FALSE
) # Don't prompt
# If half-normal plot needed, reduce simulations
par(mfrow = c(1, 1))
plot(model_large,
which = 5,
sample_size = 1000, # Smaller sample
nsim = 50
) # Fewer simulations
# EXAMPLE 11: Saving Diagnostic Data for Custom Analysis
# Extract diagnostic measures without plotting
par(mfrow = c(1, 1))
diag_data <- plot(model_large,
which = 1:6,
save_diagnostics = TRUE
)
# Examine structure
str(diag_data)
# Access diagnostic measures
head(diag_data$data) # Residuals, Cook's distance, leverage, etc.
# Identify influential observations
influential <- which(diag_data$data$cook_dist > diag_data$model_info$cook_threshold)
cat("Influential observations:", head(influential), "\n")
# High leverage points
high_lev <- which(diag_data$data$leverage > diag_data$model_info$leverage_threshold)
cat("High leverage points:", head(high_lev), "\n")
# Custom diagnostic plot using saved data
plot(diag_data$data$fitted, diag_data$data$resid,
xlab = "Fitted Values", ylab = "Residuals",
main = "Custom Diagnostic Plot",
col = ifelse(diag_data$data$cook_dist >
diag_data$model_info$cook_threshold, "red", "black"),
pch = 16
)
abline(h = 0, col = "gray", lty = 2)
legend("topright", legend = "Influential", col = "red", pch = 16)
# EXAMPLE 12: Interactive Plotting Control
# ask = TRUE Force prompting between plots (useful for presentations)
# Disable prompting (batch processing)
par(mfrow = c(3, 2))
plot(model,
which = 1:6,
ask = FALSE
) # Never prompts
# EXAMPLE 13: Base R Graphics Customization via ...
# Customize point appearance
par(mfrow = c(2, 2))
plot(model,
which = c(1, 4, 6),
pch = 16, # Filled circles
col = "steelblue", # Blue points
cex = 0.8
) # Smaller points
# Multiple customizations
plot(model,
which = 2,
pch = 21, # Circles with border
col = "black", # Border color
bg = "lightblue", # Fill color
cex = 1.2, # Larger points
lwd = 2
) # Thicker lines
# EXAMPLE 14: Comparing Models
# Fit competing models
model_kw <- gkwreg(y ~ x1 | x2, data = df, family = "kw")
model_beta <- gkwreg(y ~ x1 | x2, data = df, family = "beta")
# Compare diagnostics side-by-side
par(mfrow = c(2, 2))
# Kumaraswamy model
plot(model_kw, which = 5, main = "Kumaraswamy - Half-Normal")
plot(model_kw, which = 6, main = "Kumaraswamy - Pred vs Obs")
# Beta model
plot(model_beta, which = 5, main = "Beta - Half-Normal")
plot(model_beta, which = 6, main = "Beta - Pred vs Obs")
par(mfrow = c(1, 1))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.