plot_influence: Function performs measures of influence to the submitted OLS...

View source: R/plot_influence.R

plot_influence    R Documentation

Computes measures of influence for a submitted OLS model

Description

Function offers four measures of influence: internal and external studentized residuals, difference in fits (DFFITS), and Cook's distance. Both a data frame and a plot of each observation's influence on the OLS estimate are returned to help identify data points that may be outliers and/or have high leverage.

Usage

plot_influence(
  df = NULL,
  formula_obj = NULL,
  id_col = NULL,
  influence_meas = "cook",
  label_threshold = 3,
  label_color = "red",
  title = NULL,
  subtitle = NULL,
  x_title = "Observation ID",
  y_title = "Influence Value",
  rot_y_tic_label = FALSE,
  x_limits = NULL,
  x_major_breaks = waiver(),
  y_limits = NULL,
  y_major_breaks = waiver(),
  y_minor_breaks = waiver(),
  axis_text_size = 11,
  pts_color = "black",
  pts_fill = "white",
  pts_shape = 21,
  pts_stroke = 1,
  pts_alpha = 1,
  pts_size = 1,
  show_major_grids = TRUE,
  show_minor_grids = TRUE
)

Arguments

df

A data frame with columns for observed response and predictors

formula_obj

A formula object following the rules of stats::lm() construction. For example: y ~ log(a) + b + I(b^2).
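As a minimal sketch of how such a formula behaves, the snippet below fits the example formula with stats::lm() on simulated data. The data frame 'toy_df' and its columns are made up purely for illustration and are not part of the package.

```r
# Simulated data, for illustration only (not from the package).
set.seed(1)
toy_df <- data.frame(a = runif(50, 1, 10), b = rnorm(50))
toy_df$y <- 2 + 0.5 * log(toy_df$a) + toy_df$b - 0.3 * toy_df$b^2 +
  rnorm(50, sd = 0.1)

# log() is applied directly; I() protects b^2 so that ^ means arithmetic
# squaring rather than a formula interaction operator.
formula_obj <- y ~ log(a) + b + I(b^2)
fit <- stats::lm(formula_obj, data = toy_df)

# One coefficient per term, plus the intercept.
names(stats::coef(fit))
```

A formula built this way can be passed unchanged as the 'formula_obj' argument.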

id_col

An optional argument that names the column from data frame 'df' providing each observation with a unique identification value. If this argument is NULL then data frame row numbers are used for identification. Unless you have fewer than 30 observations, it is best to stay with row numbers and modify the 'x_limits' and 'x_major_breaks' arguments.

influence_meas

A string that defines the type of influence measure to apply. Acceptable values include "internal", "external", "dffits", and "cook" for internal/external studentized residuals, difference in fits, and Cook's distance respectively.
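For orientation, the four measures can be computed directly with base R's stats package. The sketch below is not the package's implementation, which may differ; it uses the built-in mtcars data as a stand-in, and the column names simply mirror the accepted 'influence_meas' values.

```r
# Fit an OLS model on a built-in data set (mtcars is only a stand-in).
fit <- stats::lm(mpg ~ wt + hp, data = mtcars)

# One column per influence measure, keyed by row number.
influence_df <- data.frame(
  id       = seq_len(nrow(mtcars)),
  internal = unname(stats::rstandard(fit)),      # internally studentized residuals
  external = unname(stats::rstudent(fit)),       # externally studentized residuals
  dffits   = unname(stats::dffits(fit)),         # difference in fits
  cook     = unname(stats::cooks.distance(fit))  # Cook's distance
)

head(influence_df)
```

Cook's distance is always non-negative, while the other three measures can take either sign.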

label_threshold

A numeric that sets the measurement threshold beyond which observations will be labeled with their id.

label_color

A string that sets the label/point color for observations whose absolute measurement is greater than the 'label_threshold'.
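The labeling rule described for 'label_threshold' and 'label_color' can be sketched in a few lines of base R. This is an illustration under assumptions, not the package's code: mtcars stands in for real data, and the threshold of 2 is chosen only for the example.

```r
# Externally studentized residuals from a stand-in model.
fit <- stats::lm(mpg ~ wt + hp, data = mtcars)
values <- unname(stats::rstudent(fit))

label_threshold <- 2   # illustrative value, not the function default
label_color <- "red"
pts_color <- "black"

# An observation is flagged when its absolute measurement exceeds the threshold.
flagged <- abs(values) > label_threshold

# Flagged observations get the label color; all others keep the point color.
point_color <- ifelse(flagged, label_color, pts_color)

# Row numbers and values of the observations that would be labeled.
flagged_df <- data.frame(id = which(flagged), value = values[flagged])
flagged_df
```

Because the comparison uses the absolute value, large negative residuals are flagged as well as large positive ones.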

title

A string that sets the plot title.

subtitle

A string that sets the plot subtitle.

x_title

A string that sets the x axis title. If NULL then the x axis title does not appear. The default is "Observation ID".

y_title

A string that sets the y axis title. If NULL then the y axis title does not appear. The default is "Influence Value".

rot_y_tic_label

A logical which if TRUE rotates the y tic labels 90 degrees for enhanced readability.

x_limits

A numeric 2 element vector that sets the minimum and maximum for the x axis.

x_major_breaks

A numeric vector or function that defines the exact major tic locations along the x axis.

y_limits

A numeric 2 element vector that sets the minimum and maximum for the y axis. Use NA to refer to the existing minimum and maximum.

y_major_breaks

A numeric vector or function that defines the exact major tic locations along the y axis.

y_minor_breaks

A numeric vector or function that defines the exact minor tic locations along the y axis.

axis_text_size

A numeric that sets the font size along the axes. Default is 11.

pts_color

A string that sets the color of the points.

pts_fill

A string that sets the fill color of the points.

pts_shape

A numeric integer that sets the shape of the points. Typical values are 21 “circle”, 22 “square”, 23 “diamond”, 24 “up triangle”, 25 “down triangle”.

pts_stroke

A numeric that sets the drawing width for a point shape.

pts_alpha

A numeric value that sets the alpha level of pts_color.

pts_size

A numeric value that sets the size of the points.

show_major_grids

A logical that controls the appearance of major grids.

show_minor_grids

A logical that controls the appearance of minor grids.

Value

Function returns a named list with two elements: “influence_df”, a data frame of the influence measure for each observation, and “influence_plot”, a scatter plot of observations versus influence measure.

Examples

library(wooldridge)
library(ggplot2)
library(data.table)
library(RplotterPkg)
library(RregressPkg)

rdchem_dt <- data.table::as.data.table(wooldridge::rdchem) |>
  _[, .(rdintens, sales, profmarg)]

formula_obj <- rdintens ~ sales + profmarg
rdchem_influence_lst <- RregressPkg::plot_influence(
  df = rdchem_dt,
  formula_obj = formula_obj,
  influence_meas = "cook",
  label_threshold = 3.0,
  title = "Cook's Distance for Data Point Influence",
  subtitle = "Source: Wooldridge::rdchem",
  rot_y_tic_label = TRUE
)
a_plot <- rdchem_influence_lst$influence_plot


deandevl/RregressPkg documentation built on Feb. 5, 2025, 12:11 p.m.