varinf | R Documentation |
Computes variance inflation factors for terms of a smooth model.
varinf(object, newdata = NULL)
object |
an object of class "sm" output by the |
newdata |
the data used for variance inflation calculation (if |
Let κ_j^2 denote the VIF for the j-th model term.
Values of κ_j^2 close to 1 indicate no multicollinearity issues for the j-th term. Larger values of κ_j^2 indicate that η_j has more collinearity with other terms.
Thresholds of κ_j^2 > 5 or κ_j^2 > 10 are typically recommended for flagging problematic multicollinearity.
To understand these thresholds, note that
κ_j^2 = \frac{1}{1 - R_j^2}
where R_j^2 is the R-squared for the linear model predicting η_j from the remaining model terms.
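As a small worked illustration of the relation above, the two common thresholds can be converted to the equivalent R-squared values by solving for R_j^2 = 1 - 1/κ_j^2. The variable names below are illustrative only and are not part of the package.

```r
# Convert VIF thresholds to the R-squared values they imply,
# using R_j^2 = 1 - 1 / kappa_j^2 (rearranged from the formula above).
vif_threshold <- c(5, 10)
r2_threshold <- 1 - 1 / vif_threshold
r2_threshold
# 0.8 0.9
```

In other words, a VIF above 5 means that more than 80% of the variation in η_j is explained by the remaining terms, and a VIF above 10 means more than 90%.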
a named vector containing the variance inflation factors for each effect function (in object$terms).
Suppose that the function can be written as
η = η_0 + η_1 + η_2 + ... + η_p
where η_0 is a constant (intercept) term, and η_j denotes the j-th effect function, which is assumed to have mean zero. Note that η_j could be a main or interaction effect function for all j = 1, ..., p.
Define the p \times p matrix C with entries
C_{jk} = \cos(η_j, η_k)
where the cosine is defined with respect to the training data, i.e.,
\cos(η_j, η_k) = \frac{\sum_{i=1}^n η_j(x_i) η_k(x_i)}{\sqrt{\sum_{i=1}^n η_j^2(x_i)} \sqrt{\sum_{i=1}^n η_k^2(x_i)}}
The variance inflation factors are the diagonal elements of C^{-1}, i.e.,
κ_j^2 = C^{jj}
where κ_j^2 is the VIF for the j-th term, and C^{jj} denotes the j-th diagonal element of the matrix C^{-1}.
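The computation described above can be sketched in base R. This is not the internal code of varinf; it is a minimal illustration that assumes the effect functions have already been evaluated at the training points and stored as the columns of a matrix. The matrix `eta` and its synthetic columns are hypothetical stand-ins for η_1, ..., η_p.

```r
# Hypothetical effect functions evaluated at n training points
# (one column per term); these stand in for eta_1, ..., eta_p.
set.seed(1)
n <- 200
x1 <- runif(n)
x2 <- runif(n)
x3 <- x2 + rnorm(n, sd = 0.1)   # nearly collinear with x2
eta <- cbind(eta1 = sin(pi * x1),
             eta2 = sin(2 * pi * x2),
             eta3 = sin(2 * pi * x3))

# Effect functions are assumed to have mean zero, so center each column.
eta <- scale(eta, center = TRUE, scale = FALSE)

# Cosine matrix: C[j, k] = sum(eta_j * eta_k) / (||eta_j|| * ||eta_k||)
norms <- sqrt(colSums(eta^2))
C <- crossprod(eta) / outer(norms, norms)

# VIFs are the diagonal elements of the inverse cosine matrix.
vif <- diag(solve(C))

# Cross-check against 1 / (1 - R_j^2), where R_j^2 comes from
# regressing the j-th column on the remaining columns.
r2 <- sapply(seq_len(ncol(eta)), function(j) {
  summary(lm(eta[, j] ~ eta[, -j]))$r.squared
})
vif_r2 <- 1 / (1 - r2)

vif
vif_r2
```

Because the columns are centered, the cosine matrix coincides with the correlation matrix, which is why the two calculations agree up to numerical error.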
Nathaniel E. Helwig <helwig@umn.edu>
Gu, C. (2013). Smoothing spline ANOVA models, 2nd edition. New York: Springer. doi: 10.1007/978-1-4614-5369-7
Helwig, N. E. (2020). Multiple and Generalized Nonparametric Regression. In P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, & R. A. Williams (Eds.), SAGE Research Methods Foundations. doi: 10.4135/9781526421036885885
See summary.sm for more thorough summaries of smooth models.
See summary.gsm for more thorough summaries of generalized smooth models.
##########   EXAMPLE 1   ##########
### 4 continuous predictors
### no multicollinearity

# generate data
set.seed(1)
n <- 100
fun <- function(x){
  sin(pi*x[,1]) + sin(2*pi*x[,2]) + sin(3*pi*x[,3]) + sin(4*pi*x[,4])
}
data <- as.data.frame(replicate(4, runif(n)))
colnames(data) <- c("x1v", "x2v", "x3v", "x4v")
fx <- fun(data)
y <- fx + rnorm(n)

# fit model
mod <- sm(y ~ x1v + x2v + x3v + x4v, data = data, tprk = FALSE)

# check vif
varinf(mod)


##########   EXAMPLE 2   ##########
### 4 continuous predictors
### multicollinearity

# generate data
set.seed(1)
n <- 100
fun <- function(x){
  sin(pi*x[,1]) + sin(2*pi*x[,2]) + sin(3*pi*x[,3]) + sin(3*pi*x[,4])
}
data <- as.data.frame(replicate(3, runif(n)))
data <- cbind(data, c(data[1,2], data[2:n,3]))
colnames(data) <- c("x1v", "x2v", "x3v", "x4v")
fx <- fun(data)
y <- fx + rnorm(n)

# check collinearity
cor(data)
cor(sin(3*pi*data[,3]), sin(3*pi*data[,4]))

# fit model
mod <- sm(y ~ x1v + x2v + x3v + x4v, data = data, tprk = FALSE)

# check vif
varinf(mod)