SLM: Simple linear regression model and multicollinearity

SLM R Documentation

Simple linear regression model and multicollinearity

Description

The function analyzes the presence of worrying near multicollinearity in the Simple Linear Model (SLM).

Usage

SLM(X, dummy = FALSE)

Arguments

X

A numeric design matrix that should contain two columns: the intercept and one independent variable.

dummy

A logical value that indicates whether the design matrix X contains a dummy variable. By default dummy=FALSE.

Details

The presence of worrying near multicollinearity in the SLM has been systematically ignored by some existing statistical software. However, worrying non-essential multicollinearity can arise in the SLM. In this case, the linear relation is caused by the second variable of X having very little variability. For this reason, the function calculates the coefficient of variation when the variable is quantitative and the proportion of ones when it is not.
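
As a rough illustration of the two quantities mentioned above, the following base R sketch computes a coefficient of variation and a proportion of ones by hand. The helper names cv_sketch and prop_ones_sketch are hypothetical and are not part of multiColl. The sketch assumes the sample standard deviation returned by sd(), which may differ from the denominator the package uses.

```r
# Hypothetical helpers for illustration only; not part of multiColl.
cv_sketch <- function(x) {
  # Coefficient of variation: standard deviation relative to |mean|.
  # Assumption: sample standard deviation (n - 1 denominator).
  sd(x) / abs(mean(x))
}

prop_ones_sketch <- function(d) {
  # Share of observations equal to 1 in a 0/1 dummy variable.
  mean(d == 1)
}

x_low  <- c(99, 100, 101, 100, 100)  # little variability around a large mean
x_high <- c(1, 50, 10, 35, 22)       # much more variability
d      <- c(1, 1, 1, 1, 0)           # dummy variable with four ones out of five

cv_low  <- cv_sketch(x_low)
cv_high <- cv_sketch(x_high)
prop    <- prop_ones_sketch(d)
```

A small coefficient of variation means the variable is nearly constant, which is the situation the paragraph above describes as a second variable with very little variability.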

Value

If dummy=TRUE:

Prop

Proportion of ones in the dummy variable.

CN

Condition Number of X.

If dummy=FALSE:

CV

Coefficient of variation of the second variable in X.

VIF

Variance Inflation Factor.

CN

Condition Number of X.

ki

Stewart's index of X.
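
For readers unfamiliar with the condition number, the sketch below shows one common way to compute it in base R: scale each column of X to unit length and take the ratio of the largest to the smallest singular value. The helper cn_sketch is hypothetical, and the unit-length scaling is an assumption of this sketch; it is not confirmed here that SLM or CN use exactly this computation.

```r
# Hypothetical helper for illustration only; not the package's CN().
cn_sketch <- function(X) {
  # Scale each column to unit Euclidean length (assumed scaling).
  Xs <- apply(X, 2, function(col) col / sqrt(sum(col^2)))
  # Singular values of the scaled matrix.
  sv <- svd(Xs)$d
  max(sv) / min(sv)
}

intercept <- rep(1, 5)
# Second column nearly constant: almost proportional to the intercept.
X_low  <- cbind(intercept, x2 = c(99, 100, 101, 100, 100))
# Second column centered at zero: orthogonal to the intercept.
X_high <- cbind(intercept, x2 = c(-2, -1, 0, 1, 2))

cn_low  <- cn_sketch(X_low)
cn_high <- cn_sketch(X_high)
```

In this sketch the nearly constant column yields a much larger value than the column that is orthogonal to the intercept, for which the ratio is 1.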

Note

The VIF only detects near essential multicollinearity, so it is not appropriate for detecting multicollinearity in the SLM. Indeed, in this case the VIF is always equal to 1.
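
To see why, recall the usual definition VIF = 1/(1 - R^2), where R^2 comes from the auxiliary regression of a regressor on the remaining ones. In the SLM the only remaining column is the intercept, so that R^2 is zero. A minimal base R check follows; the simulated data are an assumption of this sketch and are not taken from the package.

```r
set.seed(1)                          # simulated data, for illustration only
x2 <- rnorm(25, mean = 100, sd = 1)

# Auxiliary regression of the single regressor on the intercept alone.
aux <- lm(x2 ~ 1)
r2  <- summary(aux)$r.squared        # an intercept-only fit explains no variation

# Usual VIF formula applied to the auxiliary R^2.
vif <- 1 / (1 - r2)
```

Whatever values x2 takes, the auxiliary fit only estimates its mean, so the result does not depend on how little x2 varies.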

Author(s)

R. Salmerón (romansg@ugr.es) and C. García (cbgarcia@ugr.es).

References

R. Salmerón, C. B. García and J. García (2018). Variance Inflation Factor and Condition Number in multiple linear regression. Journal of Statistical Computation and Simulation, 88 (12), 2365-2384.

L. R. Klein and A. S. Goldberger (1964). An economic model of the United States, 1929-1952. North Holland Publishing Company, Amsterdam.

H. Theil (1971). Principles of Econometrics. John Wiley & Sons, New York.

See Also

PROPs, CV, CN, ki.

Examples

# Load the package that provides SLM and the example data sets
library(multiColl)

# Henri Theil's textile consumption data (modified)
data(theil)
head(theil)
cte = array(1,length(theil[,2]))
theil.X = cbind(cte,theil[,-(1:2)])
SLM(theil.X, TRUE)

# Klein and Goldberger data on consumption and wage income
data(KG)
head(KG)
cte = array(1,length(KG[,1]))
KG.X = cbind(cte,KG[,-1])
SLM(KG.X)

# Random regressor with high variability (integers sampled from 1 to 50)
x1 = array(1,25)
x2 = sample(1:50,25)
x = cbind(x1,x2)
head(x)
SLM(x)

# Random regressor with very little variability around a large mean
x1 = array(1,25)
x2 = rnorm(25,100,1)
x = cbind(x1,x2)
head(x)
SLM(x)

# Random dummy regressor: 25 values drawn from a pool of 25 ones and 25 zeros
x1 = array(1,25)
x2 = sample(rep(c(1,0), each = 25), 25)
x = cbind(x1,x2)
head(x)
SLM(x, TRUE)

multiColl documentation built on July 21, 2022, 9:06 a.m.