vitals: Quickly calculate the rank, condition, positive definiteness,...


View source: R/vitals.R


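This function quickly calculates a few important diagnostics, or 'vital signs', of a model matrix: its rank, its condition number, the positive definiteness of its covariance matrix, and its variance inflation factors.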

The first is the column rank of the matrix. If the rank differs from the number of columns, full.rank will display FALSE.
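As a rough illustration of the rank check, the sketch below uses base R's qr() on a model matrix built from the iris data. It is not taken from the vitals source; the names r, full_rank, and X_dup are made up for this example.

```r
# Model matrix from the built-in iris data, with the intercept column dropped
X <- model.matrix(Sepal.Width ~ ., iris)[, -1]

# Column rank from a QR decomposition
r <- qr(X)$rank
full_rank <- r == ncol(X)

# Appending an exact copy of a column makes the matrix rank deficient
X_dup <- cbind(X, copy = X[, 1])
r_dup <- qr(X_dup)$rank
full_rank_dup <- r_dup == ncol(X_dup)
```

Here full_rank compares the numerical rank with the number of columns, which is the comparison the full.rank flag is described as reporting.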

The second is the ratio of the maximum to the minimum singular value of the model matrix, which gives the condition number, C. A large condition number indicates that the model is ill-conditioned: a linear regression will be extremely sensitive to even the smallest errors and will give untrustworthy results. In the worst case, C is infinite, the model matrix is singular, and shrinkage methods must be used to obtain a solution at all. Alternatively, the condition number can be expressed as the ratio of the minimum to the maximum singular value of the model matrix, Cinv. Here, larger is better.
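A minimal sketch of both ratios, assuming they are computed from the singular values returned by base R's svd(). The variable names are illustrative and are not the names used inside vitals. The last line applies the common rule of thumb that roughly log10(C) decimal digits of accuracy can be lost.

```r
# Model matrix from the built-in iris data, with the intercept column dropped
X <- model.matrix(Sepal.Width ~ ., iris)[, -1]

# Singular values of the model matrix
d <- svd(X)$d

# Condition number: largest over smallest singular value (smaller is better)
C <- max(d) / min(d)

# Reciprocal form: smallest over largest singular value (larger is better)
Cinv <- min(d) / max(d)

# Rule of thumb: approximate number of decimal digits that may be lost
digits_lost <- log10(C)
```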

A rough estimate of how many accurate digits the estimated y values will have can be calculated manually via the formula mean(D) - log10(C), where mean(D) is the average number of decimal digits in the entries of the vector y.

The third 'vital sign' is a check for positive definiteness of the covariance matrix, cov(Matrix). If the covariance matrix has any zero or negative eigenvalues, it fails the positive definiteness check.
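The eigenvalue check can be sketched as follows. The relative tolerance tol is an assumption of this example, added because eigenvalues that are zero in exact arithmetic usually come out as tiny positive or negative numbers in floating point. How vitals itself performs the check is not shown here.

```r
# Model matrix from the built-in iris data, with the intercept column dropped
X <- model.matrix(Sepal.Width ~ ., iris)[, -1]

# Eigenvalues of the (symmetric) covariance matrix
ev <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values

# Assumed relative tolerance for treating an eigenvalue as zero
tol <- 1e-8
is_pd <- all(ev > tol * max(ev))

# A duplicated column gives a singular covariance matrix, which fails the check
X_dup <- cbind(X, copy = X[, 1])
ev_dup <- eigen(cov(X_dup), symmetric = TRUE, only.values = TRUE)$values
is_pd_dup <- all(ev_dup > tol * max(ev_dup))
```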

Also calculated are the variance inflation factors (VIFs). A VIF indicates the factor by which the variance of a variable's coefficient estimate is inflated due to correlation with the other variables; the standard error is inflated by the square root of the VIF. A common threshold for a bad VIF is 5. Other commonly used thresholds are 3, 6, 8, and 10. However, these should be interpreted with some degree of caution. If the model has a sufficiently low condition number and passes all of the other checks, a decent fit may still be obtained (O’Brien, 2007).
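For the ordinary linear (gaussian) case with numeric predictors, the VIFs equal the diagonal of the inverse of the predictors' correlation matrix. The sketch below uses that identity and cross-checks one value against the textbook definition 1 / (1 - R^2). It is only an illustration: it does not cover the other GLM families and is not necessarily how vitals computes its VIFs.

```r
# Numeric predictors only; this sketch does not handle factor columns
X <- as.matrix(iris[, c("Sepal.Length", "Petal.Length", "Petal.Width")])

# VIFs as the diagonal of the inverse correlation matrix
vif <- diag(solve(cor(X)))

# Cross-check for Sepal.Length: regress it on the other predictors
r2 <- summary(lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris))$r.squared
vif_check <- 1 / (1 - r2)

# Factor by which each standard error is inflated
se_inflation <- sqrt(vif)
```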

A sufficiently bad condition number, a lack of positive definiteness, or a lack of full rank can, in the worst case, mean that the model matrix is non-invertible and OLS or GLMs will not work, so shrinkage methods will have to be used to fit the model. Fortunately, this package provides a great number of regularized regression models. Short of that, sufficiently bad values can leave the model "ill-posed", meaning that one of the following three conditions of a well-posed mathematical problem is violated:

1. A solution exists
2. A unique solution exists
3. The output of a function changes continuously with the input(s)
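To illustrate the worst case described above, the sketch below builds a rank-deficient matrix by duplicating a column, for which the OLS normal equations have no unique solution, and then obtains a solution with a plain ridge penalty. Ridge regression is used here only as a generic example of shrinkage and is not one of this package's models. The penalty lambda = 1 is an arbitrary, hypothetical value.

```r
# Rank-deficient design: the last column is an exact copy of the first
X <- model.matrix(Sepal.Width ~ ., iris)[, -1]
X_dup <- cbind(X, copy = X[, 1])
y <- iris$Sepal.Width

# Center predictors and response so that no intercept is needed
Xc <- scale(X_dup, center = TRUE, scale = FALSE)
yc <- y - mean(y)

XtX <- crossprod(Xc)
Xty <- crossprod(Xc, yc)

# The normal equations are singular, so solve() is expected to fail here;
# tryCatch() returns NULL instead of stopping if it does
ols <- tryCatch(solve(XtX, Xty), error = function(e) NULL)

# Adding a ridge penalty makes the system invertible
lambda <- 1  # hypothetical penalty, chosen only for illustration
ridge <- solve(XtX + lambda * diag(ncol(Xc)), Xty)
```

Because the two duplicated columns are indistinguishable, the ridge solution assigns them equal coefficients, whereas unpenalized least squares cannot settle on any single split between them.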


vitals(formula = NULL, data = NULL, matrix = NULL, y = NULL,
  family = "gaussian")



formula: a model formula

data: a data frame

matrix: if not using formula, you can provide a model matrix. The intercept column will be removed if detected.

y: the vector of y values if providing the model matrix

family: the glm family being used (required for the VIFs calculation). One of "gaussian" (the default), "binomial", "Gamma", "inverse.gaussian", "poisson", "quasi", "quasibinomial", or "quasipoisson".


a list


O’Brien, R. M. (2007). A Caution Regarding Rules of Thumb for Variance Inflation Factors. Quality & Quantity, 41: 673.


vitals(matrix = model.matrix(Sepal.Width ~ ., iris)[,-1], y = iris$Sepal.Width)
vitals(Sepal.Width ~ ., iris)

abnormally-distributed/Bayezilla documentation built on Oct. 31, 2019, 1:57 a.m.