View source: R/plot_influence.R
plot_influence | R Documentation |
Function offers four measures of influence: internal and external studentized residuals, difference in fits (DFFITS), and Cook's distance. Both a data frame and a plot of each observation's influence on the OLS estimate are returned to help identify data points that may be outliers and/or have high leverage.
plot_influence(
  df = NULL,
  formula_obj = NULL,
  id_col = NULL,
  influence_meas = "cook",
  label_threshold = 3,
  label_color = "red",
  title = NULL,
  subtitle = NULL,
  x_title = "Observation ID",
  y_title = "Influence Value",
  rot_y_tic_label = FALSE,
  x_limits = NULL,
  x_major_breaks = waiver(),
  y_limits = NULL,
  y_major_breaks = waiver(),
  y_minor_breaks = waiver(),
  axis_text_size = 11,
  pts_color = "black",
  pts_fill = "white",
  pts_shape = 21,
  pts_stroke = 1,
  pts_alpha = 1,
  pts_size = 1,
  show_major_grids = TRUE,
  show_minor_grids = TRUE
)
df |
A data frame with columns for observed response and predictors |
formula_obj |
A formula object following the rules of |
id_col |
An optional argument that names the column from data frame 'df' providing
each observation with a unique identification value. If this argument is |
influence_meas |
A string that defines the type of influence measure to apply. Acceptable values include "internal", "external", "dffits", and "cook" for internal/external studentized residuals, difference in fits, and Cook's distance respectively. |
label_threshold |
A numeric that sets the measurement threshold beyond which observations will be labeled with their id. |
label_color |
A string that sets the label/point color for observations whose absolute measurement is greater than the 'label_threshold'. |
title |
A string that sets the plot title. |
subtitle |
A string that sets the plot subtitle. |
x_title |
A string that sets the observed response x axis title. If |
y_title |
A string that sets the fitted response y axis title. If |
rot_y_tic_label |
A logical which if |
x_limits |
A numeric 2 element vector that sets the minimum and maximum for the x axis. |
x_major_breaks |
A numeric vector or function that defines the exact major tic locations along the x axis. |
y_limits |
A numeric 2 element vector that sets the minimum and maximum for the y axis.
Use |
y_major_breaks |
A numeric vector or function that defines the exact major tic locations along the y axis. |
y_minor_breaks |
A numeric vector or function that defines the exact minor tic locations along the y axis. |
axis_text_size |
A numeric that sets the font size along the axes. Default is 11. |
pts_color |
A string that sets the color of the points. |
pts_fill |
A string that sets the fill color of the points. |
pts_shape |
A numeric integer that sets the shape of the points. Typical values are 21 “circle”, 22 “square”, 23 “diamond”, 24 “up triangle”, 25 “down triangle”. |
pts_stroke |
A numeric that sets the drawing width for a point shape. |
pts_alpha |
A numeric value that sets the alpha level of |
pts_size |
A numeric value that sets the size of the points. |
show_major_grids |
A logical that controls the appearance of major grids. |
show_minor_grids |
A logical that controls the appearance of minor grids. |
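The four 'influence_meas' options correspond to quantities that base R's stats package can compute from a fitted lm object. The following is a minimal sketch under that assumption, using the built-in mtcars data set; it is not necessarily how plot_influence computes them internally. The data frame layout and the threshold value are hypothetical and only illustrate the role of 'label_threshold'.

```r
# Fit an ordinary least squares model on a built-in data set (illustrative only).
fit <- stats::lm(mpg ~ wt + hp, data = mtcars)

# One column per influence measure, keyed by observation id.
influence_df <- data.frame(
  id       = rownames(mtcars),
  internal = unname(stats::rstandard(fit)),      # internally studentized residuals
  external = unname(stats::rstudent(fit)),       # externally studentized residuals
  dffits   = unname(stats::dffits(fit)),         # difference in fits
  cook     = unname(stats::cooks.distance(fit))  # Cook's distance
)

# Flag observations whose absolute measure exceeds a threshold
# (hypothetical value, mirroring the role of 'label_threshold').
threshold <- 2
flagged <- influence_df[abs(influence_df$external) > threshold, c("id", "external")]
print(flagged)
```

Because the measures are on different scales (Cook's distance is non-negative and often well below 1, while studentized residuals are roughly t-distributed), a threshold that suits one measure may not suit another.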
The function returns a named list containing a data frame of influence measures for each observation (“influence_df”) and a scatter plot of observations versus influence measure (“influence_plot”).
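A scatter plot of this kind could be assembled with ggplot2 roughly as follows. This is a sketch only, assuming Cook's distance on the built-in mtcars data; the object names, the threshold value, and the styling choices are hypothetical stand-ins for arguments such as 'label_threshold', 'label_color', 'pts_shape', and 'pts_fill', and do not reproduce the package's own plotting code.

```r
library(ggplot2)

# Fit a model and collect one influence value per observation (illustrative data).
fit <- stats::lm(mpg ~ wt + hp, data = mtcars)
plot_df <- data.frame(
  id    = seq_len(nrow(mtcars)),
  value = unname(stats::cooks.distance(fit))
)

# Hypothetical threshold; observations beyond it are highlighted and labeled.
threshold <- 0.1
plot_df$flagged <- abs(plot_df$value) > threshold

influence_plot <- ggplot(plot_df, aes(x = id, y = value)) +
  # Shape 21 is a circle with separate outline (color) and interior (fill).
  geom_point(aes(color = flagged), shape = 21, fill = "white", stroke = 1, size = 2) +
  # Label only the flagged observations with their id.
  geom_text(data = plot_df[plot_df$flagged, ], aes(label = id),
            color = "red", vjust = -1) +
  scale_color_manual(values = c(`FALSE` = "black", `TRUE` = "red"), guide = "none") +
  labs(x = "Observation ID", y = "Influence Value")
```

Printing influence_plot in an interactive session draws the figure.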
library(wooldridge)
library(ggplot2)
library(data.table)
library(RplotterPkg)
library(RregressPkg)
rdchem_dt <- data.table::as.data.table(wooldridge::rdchem) |>
  _[, .(rdintens, sales, profmarg)]
formula_obj <- rdintens ~ sales + profmarg
rdchem_influence_lst <- RregressPkg::plot_influence(
  df = rdchem_dt,
  formula_obj = formula_obj,
  influence_meas = "cook",
  label_threshold = 3.0,
  title = "Cook's Distance for Data Point Influence",
  subtitle = "Source: Wooldridge::rdchem",
  rot_y_tic_label = TRUE
)
a_plot <- rdchem_influence_lst$plot