eda_lm: Regression plot (with optional LOESS fit)

View source: R/eda_lm.R

eda_lmR Documentation

Regression plot (with optional LOESS fit)

Description

eda_lm generates a scatter plot with a fitted regression line. A loess line can also be added to the plot for model comparison. The axes are scaled such that their respective standard deviations match axes unit length.

Usage

eda_lm(
  dat,
  x,
  y,
  xlab = NULL,
  ylab = NULL,
  px = 1,
  py = 1,
  tukey = FALSE,
  show.par = TRUE,
  reg = TRUE,
  poly = 1,
  robust = FALSE,
  w = NULL,
  sd = TRUE,
  mean.l = TRUE,
  asp = TRUE,
  grey = 0.6,
  pch = 21,
  p.col = "grey50",
  p.fill = "grey80",
  size = 0.8,
  alpha = 0.8,
  q = FALSE,
  q.val = c(0.16, 0.84),
  q.type = 5,
  loe = FALSE,
  lm.col = rgb(1, 0.5, 0.5, 0.8),
  loe.col = rgb(0.3, 0.3, 1, 1),
  stats = FALSE,
  stat.size = 0.8,
  loess.d = list(family = "symmetric", span = 0.7, degree = 1),
  rlm.d = list(psi = "psi.bisquare"),
  ...
)

Arguments

dat

Data frame.

x

Column assigned to the x axis.

y

Column assigned to the y axis.

xlab

X label for output plot.

ylab

Y label for output plot.

px

Power transformation to apply to the x-variable.

py

Power transformation to apply to the y-variable.

tukey

Boolean determining if a Tukey transformation should be adopted (FALSE adopts a Box-Cox transformation).

show.par

Boolean determining if power transformation should be displayed in the plot.

reg

Boolean indicating whether a least squares regression line should be plotted.

poly

Polynomial order.

robust

Boolean indicating if robust regression should be used.

w

Weight to pass to regression model.

sd

Boolean determining if standard deviation lines should be plotted.

mean.l

Boolean determining if the x and y mean lines should be added to the plot.

asp

Boolean determining if the plot aspect ratio should equal the ratio of the x and y standard deviations. A value of FALSE defaults to the base plot's default aspect ratio. A value of TRUE uses the aspect ratio sd(x)/sd(y).

grey

Grey level to apply to plot elements (0 to 1 with 1 = black).

pch

Point symbol type.

p.col

Color for point symbol.

p.fill

Point fill color passed to bg (Only used for pch ranging from 21-25).

size

Point size (0-1).

alpha

Point transparency (0 = transparent, 1 = opaque). Only applicable if rgb() is not used to define point colors.

q

Boolean determining if grey quantile boxes should be plotted.

q.val

F-values to use to define the quantile box parameters. Defaults to mid 68 are used to generate the box.

q.type

Quantile type. Defaults to 5 (Cleveland's f-quantile definition).

loe

Boolean indicating if a loess curve should be fitted.

lm.col

Regression line color.

loe.col

LOESS curve color.

stats

Boolean indicating if regression summary statistics should be displayed.

stat.size

Text size of stats output in plot.

loess.d

A list of arguments passed to the loess.smooth function. A robust loess is used by default.

rlm.d

A list of arguments passed to the MASS::rlm function.

...

Not used.

Details

The function will plot a regression line and, if requested, a loess fit. The function adopts the least squares fitting technique by default. It defaults to a first order polynomial fit. The polynomial order can be specified via the poly argument.

The plot displays the +/- 1 standard deviations as dashed lines. In theory, if both x and y values follow a perfectly Normal distribution, roughly 68 percent of the points should fall in between these lines.

The true 68 percent of values can be displayed as grey rectangles by setting q=TRUE. It uses the quantile function to compute the upper and lower bounds defining the inner 68 percent of values. If the data follow a Normal distribution, the grey rectangle edges should coincide with the +/- 1SD dashed lines.
If you wish to show the interquartile ranges (IQR) instead of the inner 68 percent of values, simply set q.val = c(0.25,0.75).

The plot has the option to re-express the values via the px and py arguments. But note that if the re-expression produces NaN values (such as if a negative value is logged) those points will be removed from the plot. This will result in fewer observations being plotted. If observations are removed as result of a re-expression a warning message will be displayed in the console.
There-expression powers are shown in the upper right side of the plot. To suppress the display of the re-expressions set show.par = FALSE.

If the robust argument is set to TRUE, MASS's built-in robust fitting model, rlm is used to fit the regression line to the data. rlm arguments can be passed as a list via the rlm.d argument.

Value

Returns residuals, intercept and coefficient(s) from an OLS fit if reg = TRUE. Returns NULL otherwise.

  • residuals: Regression model residuals

  • a: Intercept

  • b: Polynomial coefficient(s)

See Also

plot and loess.smooth functions

Examples


# Add a regular (OLS) regression model and loess smooth to the data
eda_lm(mtcars, wt, mpg, loe = TRUE)

# Add the inner 68% quantile to compare the true 68% of data to the SD
eda_lm(mtcars, wt, mpg, loe = TRUE, q = TRUE)

# Show the IQR box
eda_lm(mtcars, wt, mpg, loe = TRUE, q = TRUE, sd = FALSE, q.val = c(0.25,0.75))

# Fit an OLS to the Income for Female vs Male
df2 <- read.csv("https://mgimond.github.io/ES218/Data/Income_education.csv")
eda_lm(df2, x=B20004013, y = B20004007, xlab = "Female", ylab = "Male",
            loe = TRUE)

# Add the inner 68% quantile to compare the true 68% of data to the SD
eda_lm(df2, x = B20004013, y = B20004007, xlab = "Female", ylab = "Male",
            q = TRUE)

# Apply a transformation to x and y axes: x -> 1/3 and y -> log
eda_lm(df2, x = B20004013, y = B20004007, xlab = "Female", ylab = "Male",
            px = 1/3, py = 0, q = TRUE, loe = TRUE)

# Fit a second order polynomial
eda_lm(mtcars, hp, mpg, poly = 2)

# Fit a robust regression model
eda_lm(mtcars, hp, mpg, robust = TRUE)

mgimond/tukeyedar documentation built on Sept. 24, 2024, 4:46 p.m.