# varinf: Variance Inflation Factors In npreg: Nonparametric Regression via Smoothing Splines

 varinf R Documentation

## Variance Inflation Factors

### Description

Computes variance inflation factors for terms of a smooth model.

### Usage

varinf(object, newdata = NULL)


### Arguments

| Argument | Description |
|---|---|
| object | an object of class "sm" output by the sm function or an object of class "gsm" output by the gsm function. |
| newdata | the data used for variance inflation calculation (if NULL, the training data are used). |

### Details

Let κ_j^2 denote the VIF for the j-th model term.

Values of κ_j^2 close to 1 indicate no multicollinearity issues for the j-th term. Larger values of κ_j^2 indicate that the j-th effect function η_j (defined in the Note) is more collinear with the other terms.

Thresholds of κ_j^2 > 5 or κ_j^2 > 10 are commonly recommended for flagging problematic multicollinearity.

To understand these thresholds, note that

κ_j^2 = \frac{1}{1 - R_j^2}

where R_j^2 is the R-squared for the linear model predicting η_j from the remaining model terms. Thus κ_j^2 > 5 corresponds to R_j^2 > 0.8, and κ_j^2 > 10 corresponds to R_j^2 > 0.9.
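The relation between κ_j^2 and R_j^2 can be checked numerically. The following is a minimal sketch in base R; the helper names vif_from_r2 and r2_from_vif are made up for illustration and are not part of npreg.

```r
# VIF as a function of R-squared: kappa^2 = 1 / (1 - R^2)
# (vif_from_r2 and r2_from_vif are hypothetical helpers, not npreg functions)
vif_from_r2 <- function(r2) 1 / (1 - r2)

# Invert the relation to see which R-squared each VIF threshold implies
r2_from_vif <- function(vif) 1 - 1 / vif

r2_from_vif(c(5, 10))   # 0.8 0.9
vif_from_r2(c(0, 0.5))  # 1 2
```

A term whose R_j^2 is 0 has a VIF of exactly 1, and the VIF grows without bound as R_j^2 approaches 1.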

### Value

a named vector containing the variance inflation factors for each effect function (in object$terms).

### Note

Suppose that the function η can be written as

η = η_0 + η_1 + η_2 + ... + η_p

where η_0 is a constant (intercept) term, and η_j denotes the j-th effect function, which is assumed to have mean zero. Note that η_j could be a main or interaction effect function for all j = 1, ..., p.

Define the p \times p matrix C with entries

C_{jk} = \cos(η_j, η_k)

where the cosine is defined with respect to the training data, i.e.,

\cos(η_j, η_k) = \frac{\sum_{i=1}^n η_j(x_i) η_k(x_i)}{\sqrt{\sum_{i=1}^n η_j^2(x_i)} \sqrt{\sum_{i=1}^n η_k^2(x_i)}}

The variance inflation factors are the diagonal elements of C^{-1}, i.e.,

κ_j^2 = C^{jj}

where κ_j^2 is the VIF for the j-th term, and C^{jj} denotes the j-th diagonal element of the matrix C^{-1}.
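The construction above can be sketched in base R. This is an illustration only, not the package's internal code: the columns e1, e2, e3 are simulated stand-ins for the effect functions evaluated at the training points, and they are centered to match the mean-zero assumption.

```r
set.seed(1)
n <- 200

# Hypothetical stand-ins for the evaluated effect functions eta_j(x_i);
# these columns are simulated, not extracted from a fitted sm/gsm object.
e1 <- rnorm(n)
e2 <- rnorm(n)
e3 <- e1 + 0.5 * rnorm(n)   # deliberately collinear with e1
eta <- scale(cbind(e1, e2, e3), center = TRUE, scale = FALSE)

# Cosine matrix: C[j, k] = sum(eta_j * eta_k) / (||eta_j|| * ||eta_k||)
G <- crossprod(eta)
nrm <- sqrt(diag(G))
C <- G / outer(nrm, nrm)

# VIFs are the diagonal elements of the inverse of C
vif <- diag(solve(C))
vif

# Cross-check the first term against 1 / (1 - R_1^2) from a linear model
r2 <- summary(lm(eta[, 1] ~ eta[, 2] + eta[, 3]))$r.squared
all.equal(unname(vif[1]), 1 / (1 - r2))
```

Because the columns are centered, the cosine between two columns equals their sample correlation, so the diagonal of C^{-1} agrees with the R-squared form of the VIF given in the Details.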

### Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

### References

Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer. doi: 10.1007/978-1-4614-5369-7

Helwig, N. E. (2020). Multiple and Generalized Nonparametric Regression. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. doi: 10.4135/9781526421036885885

### See Also

See summary.sm for more thorough summaries of smooth models.

See summary.gsm for more thorough summaries of generalized smooth models.

### Examples

##########   EXAMPLE 1   ##########
### 4 continuous predictors
### no multicollinearity

# load package
library(npreg)

# generate data
set.seed(1)
n <- 100
fun <- function(x){
  sin(pi*x[,1]) + sin(2*pi*x[,2]) + sin(3*pi*x[,3]) + sin(4*pi*x[,4])
}
data <- as.data.frame(replicate(4, runif(n)))
colnames(data) <- c("x1v", "x2v", "x3v", "x4v")
fx <- fun(data)
y <- fx + rnorm(n)

# fit model
mod <- sm(y ~ x1v + x2v + x3v + x4v, data = data, tprk = FALSE)

# check vif
varinf(mod)

##########   EXAMPLE 2   ##########
### 4 continuous predictors
### multicollinearity

# generate data
set.seed(1)
n <- 100
fun <- function(x){
  sin(pi*x[,1]) + sin(2*pi*x[,2]) + sin(3*pi*x[,3]) + sin(3*pi*x[,4])
}
data <- as.data.frame(replicate(3, runif(n)))
# 4th predictor copies the 3rd except for its first value,
# so x3v and x4v are nearly identical
data <- cbind(data, c(data[1,2], data[2:n,3]))
colnames(data) <- c("x1v", "x2v", "x3v", "x4v")
fx <- fun(data)
y <- fx + rnorm(n)

# check collinearity
cor(data)
cor(sin(3*pi*data[,3]), sin(3*pi*data[,4]))

# fit model
mod <- sm(y ~ x1v + x2v + x3v + x4v, data = data, tprk = FALSE)

# check vif
varinf(mod)



npreg documentation built on July 21, 2022, 1:06 a.m.