find_transformations: Transformations for simple linear regression



Description

This function takes a simple linear regression model and finds the transformations of x and y that result in the highest R^2.

Usage

find_transformations(M,powers=seq(from=-3,to=3,by=.25),threshold=0.02,...)

Arguments

M

A simple linear regression model fitted with lm

powers

A sequence of powers to try for x and y. By default this ranges from -3 to 3 in steps of 0.25. When 0 is among the powers, the logarithm is used in place of the zeroth power.

threshold

Report all models whose R^2 is within threshold of the highest R^2 found.

...

Additional arguments to plot such as pch and cex.

Details

The relationship between y and x may not be linear. However, some transformation of y may have a linear relationship with some transformation of x. This function considers simple linear regression with x and y raised to powers between -3 and 3 (in 0.25 increments) by default. The function outputs a list of the top models as gauged by R^2 (all models within 0.02 of the highest R^2). Note: there is no guarantee that these "best" transformations are actually good, since a large R^2 can be produced by outliers created during transformations. A plot of the transformation is also provided.
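The search described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual implementation: the helper power_transform and the simulated x and y are hypothetical, and the plot is omitted.

```r
# Hypothetical helper (not part of regclass): raise v to power p,
# substituting log(v) when p is 0
power_transform <- function(v, p) if (p == 0) log(v) else v^p

# Simulated, strictly positive data (an assumption made for illustration)
set.seed(1)
x <- runif(200, min = 1, max = 10)
y <- exp(1 + 0.5 * log(x) + rnorm(200, sd = 0.1))

# Every combination of a power for x and a power for y
powers <- seq(from = -3, to = 3, by = 0.25)
grid <- expand.grid(x.power = powers, y.power = powers)

# Fit one simple linear regression per combination and record its R^2
grid$R2 <- mapply(function(px, py) {
  fit <- lm(power_transform(y, py) ~ power_transform(x, px))
  summary(fit)$r.squared
}, grid$x.power, grid$y.power)

# Keep every combination whose R^2 is within `threshold` of the best one
threshold <- 0.02
top <- grid[grid$R2 >= max(grid$R2) - threshold, ]
top <- top[order(-top$R2), ]
head(top)
```

Several rows usually survive the threshold, which is why the function reports a list of candidates rather than a single winner.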

It is exceedingly rare that the "best" transformation is raising x and y to the power 1 (i.e., the original variables). Transformations are typically used only when there are issues in the residual plots, highly skewed variables, or physical/logical justifications.

Note: if a variable has 0s or negative numbers, only integer transformations are considered.
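A plausible reason for this restriction (an inference, not something the documentation states) is that R's arithmetic returns non-finite values for such inputs. The vector v below is made up for illustration:

```r
v <- c(-2, 0, 3)          # illustrative values: one negative, one zero

v^2                       # integer power: defined for every element
v^0.5                     # fractional power: NaN for the negative element
suppressWarnings(log(v))  # logarithm: NaN for the negative element, -Inf for the zero
```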

Author(s)

Adam Petrie

References

Introduction to Regression and Modeling

Examples

  library(regclass)

  # Straightforward example
  data(BULLDOZER)
  M <- lm(SalePrice ~ YearMade, data = BULLDOZER)
  find_transformations(M, pch = 20, cex = 0.3)

  # Results are very misleading: the selected models have a high R^2 because of outliers
  data(MOVIE)
  M <- lm(Total ~ Weekend, data = MOVIE)
  find_transformations(M, powers = seq(-2, 2, by = 0.5), threshold = 0.05)

regclass documentation built on March 26, 2020, 8:02 p.m.