cglm: Fit Generalized Models to cluster correlated data

Description Usage Arguments Details Value References See Also Examples

Description

cglm has the same functionality as glm except that cluster correlated data can be analyzed using the methodology of Chandler and Bate (2007), implemented by the chandwich package.

Usage

1
2
3
4
cglm(formula, family = gaussian, data, weights, subset, na.action,
  start = NULL, etastart, mustart, offset, control = list(...),
  model = TRUE, method = "glm.fit", x = FALSE, y = TRUE,
  contrasts = NULL, cluster = NULL, use_alg = TRUE, ...)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’.

family

a description of the error distribution and link function to be used in the model. For glm this can be a character string naming a family function, a family function or the result of a call to a family function. For glm.fit only the third option is supported. (See family for details of family functions.)

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which glm is called.

weights

an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

start

starting values for the parameters in the linear predictor.

etastart

starting values for the linear predictor.

mustart

starting values for the vector of means.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.

control

a list of parameters for controlling the fitting process. For glm.fit this is passed to glm.control.

model

a logical value indicating whether model frame should be included as a component of the returned value.

method

the method to be used in fitting the model. The default method "glm.fit" uses iteratively reweighted least squares (IWLS): the alternative "model.frame" returns the model frame and does no fitting.

User-supplied fitting functions can be supplied either as a function or a character string naming a function, with a function which takes the same arguments as glm.fit. If specified as a character string it is looked up from within the stats namespace.

x

For glm: logical values indicating whether the response vector and model matrix used in the fitting process should be returned as components of the returned value.

For glm.fit: x is a design matrix of dimension n * p, and y is a vector of observations of length n.

y

For glm: logical values indicating whether the response vector and model matrix used in the fitting process should be returned as components of the returned value.

For glm.fit: x is a design matrix of dimension n * p, and y is a vector of observations of length n.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

cluster

A vector or factor indicating from which cluster the respective loglikelihood contributions from loglik originate. Must have the same length as the vector returned by loglik. If cluster is not supplied then it is set inside adjust_loglik under the assumption that each observation forms its own cluster.

use_alg

A logical scalar. Whether or not to provide to adjust_loglik via the arguments alg_deriv and alg_hess functions to evalulate the derivatives (scores) of the loglikelihood with respect to the parameters and the Hessian of the loglikelihood respectively.

...

For glm: arguments to be used to form the default control argument if it is not supplied directly.

For weights: further arguments passed to or from other methods.

Details

Three adjustments to the independence loglikelihood described in Chandler and Bate (2007) are available. The vertical adjustment is described in Section 6 and two horizontal adjustments are described in Sections 3.2 to 3.4. See the descriptions of type and, for the horizontal adjustments, the descriptions of C_cholesky and C_spectral, in the documentation of adjust_loglik.

The adjustments involve first and second derivatives of the loglikelihood with respect to the model parameters. If use_alg = FALSE then these are estimated using jacobian and optimHess.

If the argument cluster is not provided then the adjustment to the loglikelihood is based on a robust sandwich estimator for the covariance matrix of the parameter estimators, under the assumption that each observation forms its own cluster.

If you only wish to adjust parameter covariance matrices, that is, you are not interested in an adjusted loglikelihood, then the sandwich package will provide equivalent results and is more flexible than chandwichGLM. The vignettes in the sandwich package give a very clear account of this general area.

Value

An object of class "chandwich", returned by adjust_loglik. The function summary (i.e. summary.chandwich) can be used to obtain or print a summary of the results. The function conf_intervals can be used to calculate confidence intervals for each parameter. plot.confint can be used to illustrate the an individual confidence interval in a plot. The function conf_region, and its plot method plot.confreg, can be used to plot confidence regions for pairs of parameters.

References

Chandler, R. E. and Bate, S. (2007). Inference for clustered data using the independence loglikelihood. Biometrika, 94(1), 167-183. http://dx.doi.org/10.1093/biomet/asm015

See Also

glm, chandwich, adjust_loglik.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
### Section 5.2 of sandwich vignette at
# https://cran.r-project.org/web/packages/sandwich/vignettes/sandwich-OOP.pdf

## Load the dataset Affairs from the package AER
data("Affairs", package = "AER")

## Fit Binomial regression model
fm_probit <- glm(I(affairs > 0) ~ age + yearsmarried + religiousness +
                 occupation + rating, data = Affairs,
                 family = binomial(link = "probit"))

# Fit the model using clgm(), adjusting the log-likelihood
# The default cluster is used: adjusted SEs viewed as model-robust SEs
c_probit <- cglm(I(affairs > 0) ~ age + yearsmarried + religiousness +
                occupation + rating, data = Affairs,
                family = binomial(link = "probit"))

library(sandwich)
## Check that
# (1) the unadjusted SEs are close to those from chandwichGLM
# (2) the adjusted SEs from sandwich are close to those from chandwichGLM
coeftest(fm_probit)
coeftest(fm_probit, vcov = sandwich)
summary(c_probit)

## Extra facilities provided by chandwich

# Confidence intervals based on the adjusted log-likelihood
cints <- chandwich::conf_intervals(c_probit)
cints
plot(cints, which_par = "occupation")

# Confidence region for a pair of parameters (slow to run)
conf <- chandwich::conf_region(c_probit, which_pars = 2:3)
plot(conf, conf = c(50, 75, 95, 99))

# Adjusted likelihood ratio test to compare nested models
# The p-value is different from, but in this case very similar to,
# the p-value from the Wald test from coeftest() above
chandwich::compare_models(c_probit, fixed_pars = "occupation")

### Poisson GLM

## Example from the help file for stats::glm()
## Dobson (1990) Page 93: Randomized Controlled Trial :
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
d.AD <- data.frame(treatment, outcome, counts)

glm.D93 <- glm(counts ~ outcome + treatment, family = poisson)
cglm1 <- cglm(counts ~ outcome + treatment, family = poisson)
cglm2 <- cglm(counts ~ outcome + treatment, family = poisson,
              use_alg = FALSE)
summary(glm.D93)
summary(cglm1)
summary(cglm2)

# Confidence intervals for each parameter
cints <- chandwich::conf_intervals(cglm1)
cints
plot(cints, which_par = 2)

# A confidence region for a pair of parameters
conf <- chandwich::conf_region(cglm1, which_pars = 1:2)
plot(conf, conf = c(50, 75, 95, 99))

### Binomial GLM

# Response vector (0/1 numeric or, in this case, a yes/no factor)
w <- glm(degree ~ religion + gender + age, data = carData::WVS,
         family = binomial)
wc <- cglm(degree ~ religion + gender + age, data = carData::WVS,
           family = binomial)
summary(w)
summary(wc)

# Response (two-column) matrix (number of successes, number of failures)

# [Temporary example from the internet]

cuse <- read.table("http://data.princeton.edu/wws509/datasets/cuse.dat",
                   header=TRUE)
lrfit <- glm(cbind(using, notUsing) ~ age + education + wantsMore,
             family = binomial, data = cuse)

NianchuanChen/chandwichGLM documentation built on May 10, 2019, 3:15 a.m.