cooks.distance.glmgee: Cook's Distance for Generalized Estimating Equations

View source: R/geeglm.R

cooks.distance.glmgee    R Documentation

Description

Computes the one-step approximation of Cook's distance, which measures the effect on the estimates of the parameters in the linear predictor of deleting each cluster/observation in turn. The function can also produce a cluster/observation-index plot of Cook's distance for all parameters in the linear predictor or for a subset of them (via the argument coefs).

Usage

## S3 method for class 'glmgee'
cooks.distance(
  model,
  method = c("Preisser-Qaqish", "full"),
  level = c("clusters", "observations"),
  plot.it = FALSE,
  coefs,
  identify,
  varest = c("robust", "df-adjusted", "model", "bias-corrected"),
  ...
)

Arguments

model

an object of class glmgee.

method

an (optional) character string indicating the method used to compute the one-step approximation. The options are: the one-step approximation described by Preisser and Qaqish (1996), in which the working-correlation matrix is assumed to be known ("Preisser-Qaqish"); and the "authentic" one-step approximation ("full"). By default, method is set to "Preisser-Qaqish".

level

an (optional) character string indicating the level at which Cook's distance is required. The options are: cluster-level ("clusters") and observation-level ("observations"). By default, level is set to "clusters".

plot.it

an (optional) logical indicating whether the plot of Cook's distance is required or just the data matrix on which that plot is based. By default, plot.it is set to FALSE.

coefs

an (optional) character string that (partially) matches the names of some of the parameters in the linear predictor.

identify

an (optional) integer indicating the number of clusters to identify on the plot of Cook's distance. This is only appropriate if plot.it=TRUE.

varest

an (optional) character string indicating the type of estimator used for the variance-covariance matrix of the parameters of interest. The available options are: robust sandwich-type estimator ("robust"), degrees-of-freedom-adjusted estimator ("df-adjusted"), bias-corrected estimator ("bias-corrected"), and the model-based or naive estimator ("model"). By default, varest is set to "robust".

...

further arguments passed to or from other methods. If plot.it=TRUE then ... may be used to include graphical parameters to customize the plot. For example, col, pch, cex, main, sub, xlab, ylab.

Details

Cook's distance is the distance between two sets of estimates of the parameters in the linear predictor, measured using a metric based on the (estimate of the) variance-covariance matrix. At the cluster level, the first set of estimates is computed from the dataset including all clusters, and the second is computed from the dataset in which the i-th cluster is excluded. To reduce the computational burden, the second set of estimates is replaced by its one-step approximation. See the dfbeta.glmgee documentation.
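
As a rough sketch of the distance described above, in notation that is assumed here rather than taken from the package: let \hat{\beta} be the estimate computed from all clusters, \hat{\beta}^{1}_{(i)} the one-step approximation of the estimate computed without the i-th cluster, and \widehat{\mathrm{Var}}(\hat{\beta}) the variance-covariance estimate selected via varest. The scaling constant c is an assumption. It is often taken to be the number of parameters involved, but the exact scaling used by this function is not stated on this page.

```latex
% Sketch only: the symbols and the constant c are illustrative assumptions.
\[
  \mathrm{CD}_i \;=\; \frac{1}{c}\,
  \bigl(\hat{\beta} - \hat{\beta}^{1}_{(i)}\bigr)^{\top}
  \bigl[\widehat{\mathrm{Var}}(\hat{\beta})\bigr]^{-1}
  \bigl(\hat{\beta} - \hat{\beta}^{1}_{(i)}\bigr)
\]
```

When coefs is supplied, the same quadratic form would be restricted to the matching subset of parameters and the corresponding block of the variance-covariance estimate.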

Value

A matrix with as many rows as clusters/observations in the sample and one column containing the values of Cook's distance.

References

Pregibon D. (1981) Logistic regression diagnostics. The Annals of Statistics 9:705-724.

Preisser J.S., Qaqish B.F. (1996) Deletion diagnostics for generalised estimating equations. Biometrika 83:551-562.

Hammill B.G., Preisser J.S. (2006) A SAS/IML software program for GEE and regression diagnostics. Computational Statistics & Data Analysis 51:1197-1212.

Vanegas L.H., Rondon L.M., Paula G.A. (2023) Generalized Estimating Equations using the new R package glmtoolbox. The R Journal 15:105-133.

Examples

###### Example 1: Effect of ozone-enriched atmosphere on growth of sitka spruces
data(spruces)
mod1 <- size ~ poly(days,4) + treat
fit1 <- glmgee(mod1, id=tree, family=Gamma(log), data=spruces, corstr="AR-M-dependent")

### Cook's distance for all parameters in the linear predictor
cooks.distance(fit1, method="full", plot.it=TRUE, col="red", lty=1, lwd=1, cex=0.8,
               col.lab="blue", col.axis="blue", col.main="black", family="mono")

### Cook's distance for the parameter associated to the variable 'treat'
cooks.distance(fit1, coefs="treat", method="full", plot.it=TRUE, col="red", lty=1,
               lwd=1, col.lab="blue", col.axis="blue", col.main="black", cex=0.8)

###### Example 2: Treatment for severe postnatal depression
data(depression)
mod2 <- depressd ~ visit + group
fit2 <- glmgee(mod2, id=subj, family=binomial(logit), corstr="AR-M-dependent", data=depression)

### Cook's distance for all parameters in the linear predictor
cooks.distance(fit2, method="full", plot.it=TRUE, col="red", lty=1, lwd=1, cex=0.8,
               col.lab="blue", col.axis="blue", col.main="black", family="mono")

### Cook's distance for the parameter associated to the variable 'group'
cooks.distance(fit2, coefs="group", method="full", plot.it=TRUE, col="red", lty=1,
               lwd=1, col.lab="blue", col.axis="blue", col.main="black", cex=0.8)


glmtoolbox documentation built on Sept. 11, 2024, 7:32 p.m.