svyvif: Variance inflation factors (VIF) for general linear models...


svyvif R Documentation

Variance inflation factors (VIF) for general linear models fitted with complex survey data

Description

Compute a VIF for fixed effects, general linear regression models fitted with data collected from one- and two-stage complex survey designs.

Usage

svyvif(mobj, X, w, stvar=NULL, clvar=NULL)

Arguments

mobj

model object produced by svyglm. The following families of models are allowed: binomial, gaussian, poisson, quasibinomial, and quasipoisson. Other families allowed by svyglm will produce an error in svyvif.

X

n \times p matrix of real-valued covariates used in fitting a linear regression; n = number of observations, p = number of covariates in model, excluding the intercept. A column of 1's for an intercept should not be included. X should not contain columns for the strata and cluster identifiers (unless those variables are part of the model). No missing values are allowed.

w

n-vector of survey weights used in fitting the model. No missing values are allowed.

stvar

field in mobj that contains the stratum variable in the complex sample design; use stvar = NULL if there are no strata

clvar

field in mobj that contains the cluster variable in the complex sample design; use clvar = NULL if there are no clusters
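
The requirements on X and w can be checked before calling svyvif. The following is a minimal sketch in base R; check_vif_inputs is a hypothetical helper, not part of svydiags, and the demo data are simulated.

```r
# Hypothetical helper (not part of svydiags): basic checks on X and w
# that mirror the argument descriptions above.
check_vif_inputs <- function(X, w) {
  X <- as.matrix(X)
  stopifnot(is.numeric(X), is.numeric(w))
  if (nrow(X) != length(w)) stop("X and w must have the same number of observations")
  if (anyNA(X) || anyNA(w)) stop("missing values are not allowed in X or w")
  # a column with a single distinct value acts like an intercept column
  is_const <- apply(X, 2, function(col) length(unique(col)) == 1)
  if (any(is_const)) stop("X should not contain a constant (intercept) column")
  invisible(TRUE)
}

# simulated inputs, for illustration only
set.seed(1)
X_demo <- cbind(x1 = rnorm(20), x2 = rnorm(20))
w_demo <- runif(20, 1, 5)
check_vif_inputs(X_demo, w_demo)   # passes silently

# adding a column of 1's triggers the intercept check
ok_with_intercept <- tryCatch(check_vif_inputs(cbind(1, X_demo), w_demo),
                              error = function(e) FALSE)
```

The constant-column test is only a heuristic for spotting an intercept; it does not detect other forms of exact collinearity.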

Details

svyvif computes a variance inflation factor (VIF) appropriate for linear models and some general linear models (GLMs) fitted from complex survey data (see Liao & Valliant 2012). A VIF measures the inflation of the variance of a slope estimate caused by nonorthogonality of the predictors, over and above what the variance would be with orthogonality (Theil 1971; Belsley, Kuh, and Welsch 1980). The standard VIF equals 1/(1 - R^2_k), where R_k is the multiple correlation from regressing the k^{th} column of X on the remaining columns. The complex sample value of the VIF for a linear model consists of the standard VIF multiplied by two adjustments, denoted in the output as zeta and varrho. The VIF for a GLM is similar (Liao 2010, chap. 5). There is no widely agreed-upon cutoff value for identifying high values of a VIF, although 10 is a common suggestion.
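
For the linear-model case, the relationship described above can be written compactly. The subscript k on zeta and varrho, and the superscript "svy", are notation assumed here for illustration; see Liao & Valliant (2012) for the definitions of the two adjustments.

```latex
\mathrm{VIF}_k = \frac{1}{1 - R_k^2},
\qquad
\mathrm{VIF}_k^{\mathrm{svy}} = \zeta_k \, \varrho_k \, \mathrm{VIF}_k
% Worked values of the standard VIF:
%   R_k^2 = 0.5  gives  1 / (1 - 0.5) = 2
%   R_k^2 = 0.9  gives  1 / (1 - 0.9) = 10
```

Under the commonly suggested cutoff of 10, a standard VIF is flagged once R^2_k reaches about 0.9.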

Value

p \times 6 matrix with columns:

svy.vif

complex sample VIF

reg.vif

standard VIF, 1/(1 - R^2_k), which omits the factors zeta and varrho; R^2_k is the R-square from a weighted least squares regression of the k^{th} x on the other x's in the regression

zeta

1st multiplicative adjustment to reg.vif

varrho

2nd multiplicative adjustment to reg.vif

zeta.x.varrho

product of the two adjustments to reg.vif

R.square

R-square in the regression of the k^{th} x on the other x's, including the intercept
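
As a rough illustration of the reg.vif and R.square columns, the sketch below computes 1/(1 - R^2_k) from weighted least squares regressions in base R. The data are simulated, and the use of summary(lm(...))$r.squared as the weighted R-square is an assumption; svyvif's internal computation may differ in detail.

```r
# Sketch: standard (non-survey) VIF via weighted least squares, base R only.
# Simulated data; x3 is constructed to be strongly correlated with x1.
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- 0.9 * x1 + rnorm(n, sd = 0.3)
X  <- cbind(x1, x2, x3)
w  <- runif(n, 1, 10)   # stand-in for survey weights

reg_vif <- sapply(seq_len(ncol(X)), function(k) {
  # weighted regression of column k on the other columns, with an intercept
  fit <- lm(X[, k] ~ X[, -k], weights = w)
  1 / (1 - summary(fit)$r.squared)
})
names(reg_vif) <- colnames(X)
reg_vif
```

In this setup x1 and x3 should show clearly inflated values, while x2, which is generated independently, should stay near 1.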

Author(s)

Richard Valliant

References

Belsley, D.A., Kuh, E. and Welsch, R.E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: Wiley-Interscience.

Liao, D. (2010). Collinearity Diagnostics for Complex Survey Data. PhD thesis, University of Maryland. http://hdl.handle.net/1903/10881.

Liao, D., and Valliant, R. (2012). Variance inflation factors in the analysis of complex survey data. Survey Methodology, 38, 53-62.

Theil, H. (1971). Principles of Econometrics. New York: John Wiley & Sons, Inc.

Lumley, T. (2010). Complex Surveys. New York: John Wiley & Sons.

Lumley, T. (2018). survey: analysis of complex survey samples. R package version 3.34.

See Also

Vmat

Examples

library(survey)
library(svydiags)   # provides svyvif and the nhanes2007 data
data(nhanes2007)
X1 <- nhanes2007[order(nhanes2007$SDMVSTRA, nhanes2007$SDMVPSU),]
    # keep only complete cases; logical indexing also works when no rows are incomplete,
    # whereas X1[-which(...),] would return zero rows in that situation
X2 <- X1[complete.cases(X1), ]
nhanes.dsgn <- svydesign(ids = ~SDMVPSU,
                         strata = ~SDMVSTRA,
                         weights = ~WTDRD1, nest=TRUE, data=X2)
    # linear model
m1 <- svyglm(BMXWT ~ RIDAGEYR + as.factor(RIDRETH1) + DR1TKCAL
            + DR1TTFAT + DR1TMFAT, design=nhanes.dsgn)
summary(m1)
    # construct X matrix using model.matrix from stats package
X3 <- model.matrix(~ RIDAGEYR + as.factor(RIDRETH1) + DR1TKCAL + DR1TTFAT + DR1TMFAT,
        data = data.frame(X2))
    # remove col of 1's for intercept with X3[,-1]
svyvif(mobj=m1, X=X3[,-1], w = X2$WTDRD1, stvar=NULL, clvar=NULL)

    # Logistic model
X2$obese <- X2$BMXBMI >= 30
nhanes.dsgn <- svydesign(ids = ~SDMVPSU,
                         strata = ~SDMVSTRA,
                         weights = ~WTDRD1, nest=TRUE, data=X2)
m2 <- svyglm(obese ~ RIDAGEYR + as.factor(RIDRETH1) + DR1TKCAL
             + DR1TTFAT + DR1TMFAT, design=nhanes.dsgn, family="quasibinomial")
summary(m2)
svyvif(mobj=m2, X=X3[,-1], w = X2$WTDRD1, stvar = "SDMVSTRA", clvar = "SDMVPSU")

svydiags documentation built on April 28, 2022, 1:07 a.m.
