condiNumber: Print matrix condition numbers column-by-column



Description

This function prints the condition number of a matrix while adding its columns one by one, i.e. the condition number of the sub-matrix formed by the first column, by the first two columns, and so on. This is useful for detecting multicollinearity and other numerical problems. It is a generic function with a default method and a method for maxLik objects.
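The column-by-column idea can be illustrated with a minimal base-R sketch. `cond_by_column()` is a hypothetical name, not part of maxLik, and the sketch leaves out the `norm`, `printLevel` and `digits` handling of the real function; it only applies `kappa()` to the sub-matrix formed by the first i columns, for each i.

```r
## Hypothetical helper (not the maxLik implementation): condition number of
## the sub-matrix built from the first i columns, for i = 1, ..., ncol(x).
cond_by_column <- function(x, exact = FALSE) {
  x <- as.matrix(x)
  cn <- vapply(seq_len(ncol(x)),
               function(i) kappa(x[, seq_len(i), drop = FALSE], exact = exact),
               numeric(1))
  names(cn) <- colnames(x)   # stays NULL if x has no column names
  cn
}

## Usage on a nearly collinear design, built as in the Examples section
set.seed(0)
x1 <- runif(100)
x2 <- runif(100)
x3 <- x1 + x2 + 0.000001 * runif(100)
x4 <- runif(100)
cond_by_column(cbind(x1, x2, x3, x4), exact = TRUE)
```

A single column always has condition number 1, and the value then jumps by several orders of magnitude once x3 enters.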

Usage

condiNumber(x, ...)
## Default S3 method:
condiNumber(x, exact = FALSE, norm = FALSE,
   printLevel=print.level, print.level=1, digits = getOption( "digits" ), ... )
## S3 method for class 'maxLik'
condiNumber(x, ...)

Arguments

x

numeric matrix whose condition numbers are to be printed

exact

logical; whether the condition numbers should be computed exactly or approximated (see kappa)

norm

logical, whether the columns should be normalised to have unit norm

printLevel

numeric; a positive value prints the condition numbers as they are computed. Useful for interactive work.

print.level

same as ‘printLevel’, for backward compatibility

digits

minimal number of significant digits to print (only relevant if printLevel is larger than zero).

...

Further arguments to condiNumber.default are currently ignored; further arguments to condiNumber.maxLik are passed to condiNumber.default.
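The effect of the `exact` and `norm` arguments can be illustrated with base R's `kappa()` alone. This is a sketch on made-up random data: `normalise_columns()` is a hypothetical helper, and reading "unit norm" as the Euclidean norm is an assumption, since the text above does not say which norm is used. With `exact = TRUE`, `kappa()` returns the ratio of the largest to the smallest singular value; the default is a cheaper approximation (via a QR decomposition) that need not coincide with it.

```r
## 1. `exact`: exact 2-norm condition number vs. the default approximation
set.seed(1)
X <- matrix(runif(300), ncol = 3)          # illustrative 100 x 3 matrix
d <- svd(X)$d                              # singular values of X
exact_kappa  <- kappa(X, exact = TRUE)
approx_kappa <- kappa(X)                   # default: exact = FALSE
c(exact = exact_kappa, sv_ratio = max(d) / min(d), approx = approx_kappa)

## 2. `norm`: rescaling one column can inflate the condition number although
##    no new collinearity is introduced; unit-norm columns remove this effect.
## Assumption: "unit norm" is taken to mean unit Euclidean (L2) norm.
normalise_columns <- function(x) {
  x <- as.matrix(x)
  sweep(x, 2, sqrt(colSums(x^2)), "/")    # divide each column by its norm
}
Xs <- X
Xs[, 3] <- Xs[, 3] * 1e6                   # same column, different units
kappa(Xs, exact = TRUE)                    # large, caused by scale alone
kappa(normalise_columns(Xs), exact = TRUE) # same as for normalised X
```

Normalising makes the result independent of the units in which each column is measured, so a large value then points to collinearity rather than to scaling.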

Details

Statistical models often fail because of high correlation between the explanatory variables in the linear index (multicollinearity), or because the evaluated maximum of a non-linear model is virtually flat. In both cases, the (near) singularity of the related matrices may help to understand the problem.
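Why near singularity shows up as a huge condition number can be seen from the 2-norm definition (the quantity `kappa(x, exact = TRUE)` computes) for an n × k matrix X with n ≥ k. The bound below is written for the data of the Examples section; u denotes the vector of 100 draws from `runif(100)` used to build x3, a symbol introduced only for this illustration.

```latex
\kappa(X) = \frac{\sigma_{\max}(X)}{\sigma_{\min}(X)},
\qquad
\sigma_{\min}(X) = \min_{\|v\|_2 = 1} \|Xv\|_2 .

% Examples section: X = [x_1, x_2, x_3, x_4], x_3 = x_1 + x_2 + 10^{-6} u
% Take v = (1, 1, -1, 0)^\top / \sqrt{3}, so that \|v\|_2 = 1:
\|Xv\|_2 = \frac{\|x_1 + x_2 - x_3\|_2}{\sqrt{3}}
         = \frac{10^{-6}\,\|u\|_2}{\sqrt{3}}
\quad\Longrightarrow\quad
\kappa(X) \;\ge\; \frac{\sqrt{3}\,\sigma_{\max}(X)}{10^{-6}\,\|u\|_2} .
```

Since every entry of u lies in [0, 1), ‖u‖₂ < 10, so κ(X) exceeds roughly 1.7 × 10⁵ · σ_max(X): an almost exact linear relation among columns forces a tiny smallest singular value.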

condiNumber inspects the matrices column by column and indicates which variables cause a jump in the condition number (i.e. bring the matrix close to singularity). If the column name does not immediately reveal the problem, one may run an OLS regression of this column on all the previous columns. The previous columns that explain almost all of the variation in the current one will have very high t-values.
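The auxiliary regressions described above can be automated along these lines. `aux_r2()` is a hypothetical helper, not part of maxLik: it regresses each column on all preceding columns without an intercept (mirroring `lm(x3 ~ -1 + x1 + x2)` in the Examples) and returns the R^2 of each fit, so values extremely close to 1 point to the offending column. Without an intercept, `summary.lm()` reports the uncentered R^2.

```r
## Hypothetical helper: R^2 of regressing column j on columns 1..(j-1),
## without intercept; NA for the first column.
aux_r2 <- function(x) {
  x <- as.matrix(x)
  r2 <- rep(NA_real_, ncol(x))
  for (j in seq_len(ncol(x))[-1]) {
    prev <- x[, seq_len(j - 1), drop = FALSE]   # all previous columns
    fit <- lm(x[, j] ~ prev - 1)
    r2[j] <- summary(fit)$r.squared
  }
  names(r2) <- colnames(x)
  r2
}

## Usage on the nearly collinear design from the Examples section
set.seed(0)
x1 <- runif(100)
x2 <- runif(100)
x3 <- x1 + x2 + 0.000001 * runif(100)
x4 <- runif(100)
aux_r2(cbind(x1, x2, x3, x4))
```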

Value

An invisible vector of condition numbers by column. If the start values for maxLik are named, the condition numbers are named accordingly.
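Because the return value is invisible, it can be stored and inspected afterwards. One way to flag the column at which the condition number jumps is sketched below; `first_jump()` and its threshold `ratio = 1000` are hypothetical choices, not part of maxLik. The sketch runs on the values printed in the Example output below; with maxLik loaded, the vector would instead come from `cn <- condiNumber(model.matrix(m), printLevel = 0)`.

```r
## Hypothetical helper: name of the first column at which the condition
## number grows by more than `ratio` relative to the previous column.
first_jump <- function(cn, ratio = 1000) {
  growth <- cn[-1] / cn[-length(cn)]   # step-to-step growth factors
  hit <- which(growth > ratio)
  if (length(hit) == 0) return(NA_character_)
  names(cn)[hit[1] + 1]
}

## Values as printed in the Example output of this page
cn <- c(x1 = 1, x2 = 3.413135, x3 = 14095268, x4 = 11680350)
first_jump(cn)   # "x3"
```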

Author(s)

Ott Toomet

References

Greene, W. (2012): Econometric Analysis, 7th edition, p. 130.

See Also

kappa

Examples

   library(maxLik)
   set.seed(0)
   ## generate a simple nearly multicollinear dataset
   x1 <- runif(100)
   x2 <- runif(100)
   x3 <- x1 + x2 + 0.000001*runif(100) # this is virtually equal to x1 + x2
   x4 <- runif(100)
   y <- x1 + x2 + x3 + x4 + rnorm(100)
   m <- lm(y ~ -1 + x1 + x2 + x3 + x4)
   print(summary(m)) # note the outlandish estimates and standard errors
                     # while R^2 is 0.88. This suggests multicollinearity
   condiNumber(model.matrix(m))   # note the value 'explodes' at x3
   ## we may test the results further:
   print(summary(lm(x3 ~ -1 + x1 + x2)))
   # Note the extremely high t-values and R^2: x3 is (almost) completely
   # explained by x1 and x2

Example output

Loading required package: miscTools

Please cite the 'maxLik' package as:
Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.

If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
https://r-forge.r-project.org/projects/maxlik/

Call:
lm(formula = y ~ -1 + x1 + x2 + x3 + x4)

Residuals:
     Min       1Q   Median       3Q      Max 
-3.01496 -0.70762 -0.02821  0.60782  2.39831 

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
x1 -1.374e+05  3.762e+05  -0.365    0.716
x2 -1.374e+05  3.762e+05  -0.365    0.716
x3  1.374e+05  3.762e+05   0.365    0.716
x4  4.862e-01  3.204e-01   1.518    0.132

Residual standard error: 1.044 on 96 degrees of freedom
Multiple R-squared:  0.8808,	Adjusted R-squared:  0.8759 
F-statistic: 177.4 on 4 and 96 DF,  p-value: < 2.2e-16

x1 	 1 
x2 	 3.413135 
x3 	 14095268 
x4 	 11680350 

Call:
lm(formula = x3 ~ -1 + x1 + x2)

Residuals:
       Min         1Q     Median         3Q        Max 
-5.579e-07 -1.886e-07 -7.440e-09  2.539e-07  6.849e-07 

Coefficients:
    Estimate Std. Error  t value Pr(>|t|)    
x1 1.000e+00  8.418e-08 11879172   <2e-16 ***
x2 1.000e+00  8.480e-08 11792743   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.014e-07 on 98 degrees of freedom
Multiple R-squared:      1,	Adjusted R-squared:      1 
F-statistic: 6.722e+14 on 2 and 98 DF,  p-value: < 2.2e-16

maxLik documentation built on Nov. 25, 2020, 3 a.m.