robcov: Robust Covariance Matrix Estimates

Description Usage Arguments Value Author(s) References See Also Examples

View source: R/robcov.s

Description

Uses the Huber-White method to adjust the variance-covariance matrix of a fit from maximum likelihood or least squares, to correct for heteroscedasticity and for correlated responses from cluster samples. The method uses the ordinary estimates of regression coefficients and other parameters of the model, but involves correcting the covariance matrix for model misspecification and sampling design. Models currently implemented are models that have a residuals(fit,type="score") function implemented, such as lrm, cph, coxph, and ordinary linear models (ols). The fit must have specified the x=TRUE and y=TRUE options for certain models. Observations in different clusters are assumed to be independent. For the special case where every cluster contains one observation, the corrected covariance matrix returned is the "sandwich" estimator (see Lin and Wei). This is a consistent estimate of the covariance matrix even if the model is misspecified (e.g. heteroscedasticity, underdispersion, wrong covariate form).

For the special case of ols fits, robcov can compute the improved (especially for small samples) Efron estimator that adjusts for natural heterogeneity of residuals (see Long and Ervin (2000) estimator HC3).

Usage

1
robcov(fit, cluster, method=c('huber','efron'))

Arguments

fit

a fit object from the rms series

cluster

a variable indicating groupings. cluster may be any type of vector (factor, character, integer). NAs are not allowed. Unique values of cluster indicate possibly correlated groupings of observations. Note the data used in the fit and stored in fit$x and fit$y may have had observations containing missing values deleted. It is assumed that if any NAs were removed during the original model fitting, an naresid function exists to restore NAs so that the rows of the score matrix coincide with cluster. If cluster is omitted, it defaults to the integers 1,2,...,n to obtain the "sandwich" robust covariance matrix estimate.

method

can set to "efron" for ols fits (only). Default is Huber-White estimator of the covariance matrix.

Value

a new fit object with the same class as the original fit, and with the element orig.var added. orig.var is the covariance matrix of the original fit. Also, the original var component is replaced with the new Huberized estimates. A component clusterInfo is added to contain elements name and n holding the name of the cluster variable and the number of clusters.

Author(s)

Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com

References

Huber, PJ. Proc Fifth Berkeley Symposium Math Stat 1:221–33, 1967.

White, H. Econometrica 50:1–25, 1982.

Lin, DY, Wei, LJ. JASA 84:1074–8, 1989.

Rogers, W. Stata Technical Bulletin STB-8, p. 15–17, 1992.

Rogers, W. Stata Release 3 Manual, deff, loneway, huber, hreg, hlogit functions.

Long, JS, Ervin, LH. The American Statistician 54:217–224, 2000.

See Also

bootcov, naresid, residuals.cph, http://gforge.se/gmisc interfaces rms to the sandwich package

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
# In OLS test against more manual approach
set.seed(1)
n <- 15
x1 <- 1:n
x2 <- sample(1:n)
y <- round(x1 + x2 + 8*rnorm(n))
f <- ols(y ~ x1 + x2, x=TRUE, y=TRUE)
vcov(f)
vcov(robcov(f))
X <- f$x
G <- diag(resid(f)^2)
solve(t(X) %*% X) %*% (t(X) %*% G %*% X) %*% solve(t(X) %*% X)

# Duplicate data and adjust for intra-cluster correlation to see that
# the cluster sandwich estimator completely ignored the duplicates
x1 <- c(x1,x1)
x2 <- c(x2,x2)
y  <- c(y, y)
g <- ols(y ~ x1 + x2, x=TRUE, y=TRUE)
vcov(robcov(g, c(1:n, 1:n)))

# A dataset contains a variable number of observations per subject,
# and all observations are laid out in separate rows. The responses
# represent whether or not a given segment of the coronary arteries
# is occluded. Segments of arteries may not operate independently
# in the same patient.  We assume a "working independence model" to
# get estimates of the coefficients, i.e., that estimates assuming
# independence are reasonably efficient.  The job is then to get
# unbiased estimates of variances and covariances of these estimates.

n.subjects <- 30
ages <- rnorm(n.subjects, 50, 15)
sexes  <- factor(sample(c('female','male'), n.subjects, TRUE))
logit <- (ages-50)/5
prob <- plogis(logit)  # true prob not related to sex
id <- sample(1:n.subjects, 300, TRUE) # subjects sampled multiple times
table(table(id))  # frequencies of number of obs/subject
age <- ages[id]
sex <- sexes[id]
# In truth, observations within subject are independent:
y   <- ifelse(runif(300) <= prob[id], 1, 0)
f <- lrm(y ~ lsp(age,50)*sex, x=TRUE, y=TRUE)
g <- robcov(f, id)
diag(g$var)/diag(f$var)
# add ,group=w to re-sample from within each level of w
anova(g)            # cluster-adjusted Wald statistics
# fastbw(g)         # cluster-adjusted backward elimination
plot(Predict(g, age=30:70, sex='female'))  # cluster-adjusted confidence bands
# or use ggplot(...)

# Get design effects based on inflation of the variances when compared
# with bootstrap estimates which ignore clustering
g2 <- robcov(f)
diag(g$var)/diag(g2$var)


# Get design effects based on pooled tests of factors in model
anova(g2)[,1] / anova(g)[,1]




# A dataset contains one observation per subject, but there may be
# heteroscedasticity or other model misspecification. Obtain
# the robust sandwich estimator of the covariance matrix.


# f <- ols(y ~ pol(age,3), x=TRUE, y=TRUE)
# f.adj <- robcov(f)

Example output

Loading required package: Hmisc
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Loading required package: ggplot2

Attaching package: 'Hmisc'

The following objects are masked from 'package:base':

    format.pval, units

Loading required package: SparseM

Attaching package: 'SparseM'

The following object is masked from 'package:base':

    backsolve

          Intercept         x1         x2
Intercept 23.120194 -1.2173862 -1.2173862
x1        -1.217386  0.2119827 -0.0598094
x2        -1.217386 -0.0598094  0.2119827
           Intercept          x1          x2
Intercept 29.7404557 -1.93410300 -0.96997909
x1        -1.9341030  0.20567429 -0.01033737
x2        -0.9699791 -0.01033737  0.12185152
            x1          x2
x1  0.08445102 -0.08079501
x2 -0.08079501  0.10215950
           Intercept          x1          x2
Intercept 29.7404557 -1.93410300 -0.96997909
x1        -1.9341030  0.20567429 -0.01033737
x2        -0.9699791 -0.01033737  0.12185152

 2  5  6  7  8  9 10 11 12 13 14 15 
 1  1  1  2  5  4  2  4  4  1  3  2 
      Intercept             age            age'        sex=male  age * sex=male 
      0.5625813       0.4984323       0.4335642       0.7985545       0.7886355 
age' * sex=male 
      0.7658881 
                Wald Statistics          Response: y 

 Factor                                   Chi-Square d.f. P     
 age  (Factor+Higher Order Factors)       107.90     4    <.0001
  All Interactions                          1.42     2    0.4924
  Nonlinear (Factor+Higher Order Factors)   2.57     2    0.2770
 sex  (Factor+Higher Order Factors)         2.01     3    0.5707
  All Interactions                          1.42     2    0.4924
 age * sex  (Factor+Higher Order Factors)   1.42     2    0.4924
  Nonlinear                                 0.65     1    0.4210
  Nonlinear Interaction : f(A,B) vs. AB     0.65     1    0.4210
 TOTAL NONLINEAR                            2.57     2    0.2770
 TOTAL NONLINEAR + INTERACTION              3.33     3    0.3435
 TOTAL                                    110.08     5    <.0001
      Intercept             age            age'        sex=male  age * sex=male 
      0.6718317       0.5959739       0.5373309       0.8914862       0.8809869 
age' * sex=male 
      0.9321911 
      age  (Factor+Higher Order Factors) 
                               0.8684332 
                        All Interactions 
                               0.9256947 
 Nonlinear (Factor+Higher Order Factors) 
                               1.0827507 
      sex  (Factor+Higher Order Factors) 
                               0.6801785 
                        All Interactions 
                               0.9256947 
age * sex  (Factor+Higher Order Factors) 
                               0.9256947 
                               Nonlinear 
                               0.9321911 
   Nonlinear Interaction : f(A,B) vs. AB 
                               0.9321911 
                         TOTAL NONLINEAR 
                               1.0827507 
           TOTAL NONLINEAR + INTERACTION 
                               1.0547007 
                                   TOTAL 
                               0.8513904 

rms documentation built on March 18, 2021, 9:05 a.m.

Related to robcov in rms...