varinf | R Documentation |
Computes variance inflation factors for terms of a smooth model.
varinf(object, newdata = NULL)
object |
an object of class "sm" output by the |
newdata |
the data used for variance inflation calculation (if |
Let \kappa_j^2 denote the VIF for the j-th model term.
Values of \kappa_j^2 close to 1 indicate no multicollinearity issues for the j-th term. Larger values of \kappa_j^2 indicate that \eta_j has more collinearity with other terms.
Thresholds of \kappa_j^2 > 5 or \kappa_j^2 > 10 are typically recommended for determining if multicollinearity is too much of an issue.
To understand these thresholds, note that
\kappa_j^2 = \frac{1}{1 - R_j^2}
where R_j^2 is the R-squared for the linear model predicting \eta_j from the remaining model terms.
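As a short worked step, the relation above can be inverted to express each threshold as an R-squared value. This uses only the formula already given and introduces no new symbols:

```latex
R_j^2 = 1 - \frac{1}{\kappa_j^2}
\qquad\Longrightarrow\qquad
\kappa_j^2 = 5 \;\Leftrightarrow\; R_j^2 = 0.8,
\qquad
\kappa_j^2 = 10 \;\Leftrightarrow\; R_j^2 = 0.9
```

In other words, a VIF above 5 means that more than 80% of the variation in \eta_j is linearly explained by the remaining terms, and a VIF above 10 means more than 90%.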
a named vector containing the variance inflation factors for each effect function (in object$terms).
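Because the return value is a named vector, terms exceeding a chosen threshold can be picked out by name. The sketch below uses a made-up vector in place of real varinf() output, so the numbers are purely illustrative, and the names vif, threshold and flagged are hypothetical:

```r
# Made-up VIF values standing in for the output of varinf(); not real results.
vif <- c(x1v = 1.1, x2v = 1.3, x3v = 12.4, x4v = 11.9)

# The more conservative of the two commonly cited cutoffs.
threshold <- 5

# Names of the terms whose VIF exceeds the cutoff.
flagged <- names(vif)[vif > threshold]
flagged
```

With these illustrative values, flagged contains "x3v" and "x4v".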
Suppose that the function can be written as
\eta = \eta_0 + \eta_1 + \eta_2 + ... + \eta_p
where \eta_0 is a constant (intercept) term, and \eta_j denotes the j-th effect function, which is assumed to have mean zero. Note that \eta_j could be a main or interaction effect function for all j = 1, ..., p.
Define the p \times p matrix C with entries
C_{jk} = \cos(\eta_j, \eta_k)
where the cosine is defined with respect to the training data, i.e.,
\cos(\eta_j, \eta_k) = \frac{\sum_{i=1}^n \eta_j(x_i) \eta_k(x_i)}{\sqrt{\sum_{i=1}^n \eta_j^2(x_i)} \sqrt{\sum_{i=1}^n \eta_k^2(x_i)}}
The variance inflation factors are the diagonal elements of C^{-1}, i.e.,
\kappa_j^2 = C^{jj}
where \kappa_j^2 is the VIF for the j-th term, and C^{jj} denotes the j-th diagonal element of the matrix C^{-1}.
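The cosine-matrix definition can be sketched in base R as follows. This is an illustration of the formulas above, not the package's internal code: vif_from_terms is a hypothetical helper, and eta is assumed to be an n x p matrix whose j-th column holds \eta_j(x_i) evaluated at the training points. The term values here are simulated rather than taken from a fitted model.

```r
# Hypothetical helper (not part of the package): VIFs from term evaluations.
# eta: n x p numeric matrix; column j holds eta_j(x_i) for i = 1, ..., n.
vif_from_terms <- function(eta) {
  eta <- as.matrix(eta)
  norms <- sqrt(colSums(eta^2))            # Euclidean norm of each column
  C <- crossprod(eta) / tcrossprod(norms)  # C[j, k] = cos(eta_j, eta_k)
  diag(solve(C))                           # kappa_j^2 = j-th diagonal of C^{-1}
}

# Simulated term values (assumption: these stand in for fitted effect functions).
set.seed(1)
n <- 200
e1 <- sin(2 * pi * runif(n))
e2 <- e1 + rnorm(n, sd = 0.5)  # deliberately collinear with e1
e3 <- rnorm(n)                 # unrelated to the other two
eta <- cbind(term1 = e1, term2 = e2, term3 = e3)

# Center each column so that every term has sample mean zero.
eta <- scale(eta, center = TRUE, scale = FALSE)

vif <- vif_from_terms(eta)
vif
```

When the columns have sample mean zero, as they do here after centering, the cosine equals the Pearson correlation, so diag(solve(cor(eta))) gives the same values.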
Nathaniel E. Helwig <helwig@umn.edu>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer. doi:10.1007/978-1-4614-5369-7
Helwig, N. E. (2020). Multiple and Generalized Nonparametric Regression. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. doi:10.4135/9781526421036885885
See summary.sm for more thorough summaries of smooth models.
See summary.gsm for more thorough summaries of generalized smooth models.
########## EXAMPLE 1 ##########
### 4 continuous predictors
### no multicollinearity
# generate data
set.seed(1)
n <- 100
fun <- function(x){
  sin(pi*x[,1]) + sin(2*pi*x[,2]) + sin(3*pi*x[,3]) + sin(4*pi*x[,4])
}
data <- as.data.frame(replicate(4, runif(n)))
colnames(data) <- c("x1v", "x2v", "x3v", "x4v")
fx <- fun(data)
y <- fx + rnorm(n)
# fit model
mod <- sm(y ~ x1v + x2v + x3v + x4v, data = data, tprk = FALSE)
# check vif
varinf(mod)
########## EXAMPLE 2 ##########
### 4 continuous predictors
### multicollinearity
# generate data
set.seed(1)
n <- 100
fun <- function(x){
  sin(pi*x[,1]) + sin(2*pi*x[,2]) + sin(3*pi*x[,3]) + sin(3*pi*x[,4])
}
data <- as.data.frame(replicate(3, runif(n)))
data <- cbind(data, c(data[1,2], data[2:n,3]))
colnames(data) <- c("x1v", "x2v", "x3v", "x4v")
fx <- fun(data)
y <- fx + rnorm(n)
# check collinearity
cor(data)
cor(sin(3*pi*data[,3]), sin(3*pi*data[,4]))
# fit model
mod <- sm(y ~ x1v + x2v + x3v + x4v, data = data, tprk = FALSE)
# check vif
varinf(mod)