drgee: Doubly Robust Generalized Estimating Equations

drgee

R Documentation

Doubly Robust Generalized Estimating Equations

Description

drgee is used to estimate an exposure-outcome effect adjusted for additional covariates. The estimation is based on regression models for the outcome, exposure or a combination of both. For clustered data the models may have cluster-specific intercepts.

Usage

drgee(outcome, exposure,
      oformula, eformula, iaformula = formula(~1),
      olink = c("identity", "log", "logit"),
      elink = c("identity", "log", "logit"),
      data, subset = NULL, estimation.method = c("dr", "o", "e"),
      cond = FALSE, clusterid, clusterid.vcov, rootFinder = findRoots,
      intercept = TRUE, ...)

Arguments

`outcome`	The outcome as variable or as a character string naming a variable in the `data` argument. If missing, the outcome is assumed to be the response of `oformula`.
`exposure`	The exposure as variable or as a character string naming a variable in the `data` argument. If missing, the exposure is assumed to be the response of `eformula`.
`oformula`	An expression or formula for the outcome nuisance model.
`eformula`	An expression or formula for the exposure nuisance model.
`iaformula`	An expression or formula whose RHS contains the variables that "interact" with the exposure in the main model, i.e. the variables that are multiplied with the exposure. A "1" will always be added. The default is no interactions, i.e. `iaformula = formula(~1)`.
`olink`	A character string naming the link function in the outcome nuisance model. Has to be `"identity"`, `"log"` or `"logit"`. Default is `"identity"`.
`elink`	A character string naming the link function in the exposure nuisance model. Has to be `"identity"`, `"log"` or `"logit"`. Default is `"identity"`.
`data`	A data frame or environment containing the variables used. If missing, variables are expected to be found in the calling environment of the calling environment.
`subset`	An optional vector defining a subset of the data to be used.
`estimation.method`	A character string naming the desired estimation method. Choose `"o"` for O-estimation, `"e"` for E-estimation or `"dr"` for DR-estimation. Default is `"dr"`.
`cond`	A logical value indicating whether the nuisance models should have cluster-specific intercepts. Requires a `clusterid` argument.
`rootFinder`	A function to solve a system of non-linear equations. Default is `findRoots`.
`clusterid`	A cluster-defining variable or a character string naming a cluster-defining variable in the `data` argument. If it is not found in the `data` argument, it will be searched for in the calling frame. If missing, each observation will be considered to be a separate cluster. This argument is required when `cond = TRUE`.
`clusterid.vcov`	A cluster-defining variable or a character string naming a cluster-defining variable in the `data` argument to be used for adding contributions from the same cluster. These clusters can be different from the clusters defined by `clusterid`. However, each cluster defined by `clusterid` needs to be contained in exactly one cluster defined by `clusterid.vcov`. This variable is useful when the clusters are hierarchical.
`intercept`	A logical value indicating whether the nuisance parameters in doubly robust conditional logistic regression should be fitted using a model with an intercept. Only used for doubly robust conditional logistic regression.
`...`	Further arguments to be passed to the function `rootFinder`.

Details

drgee estimates the parameter \beta in a main model g\{E(Y|A,L)\}-g\{E(Y|A=0,L)\}=\beta^T \{A\cdot X(L)\}, where Y is the outcome of interest, A is the exposure of interest, and L is a vector of covariates that we wish to adjust for. X(L) is a vector-valued function of L. Note that A \cdot X(L) should be interpreted as a columnwise multiplication and that X(L) will always contain a column of 1's. Given a specification of an outcome nuisance model g\{E(Y|A=0,L)\}=\gamma^T V(L) (where V(L) is a function of L), O-estimation is performed. Alternatively, leaving g\{E(Y|A=0,L)\} unspecified and using an exposure nuisance model h\{E(A|L)\}=\alpha^T Z(L) (where h is a link function and Z(L) is a function of L), E-estimation is performed. When g is logit, the exposure nuisance model is required to be of the form logit\{E(A|Y=0,L)\}=\alpha^T Z(L). In this case the exposure needs to be binary.
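The columnwise multiplication A \cdot X(L) can be made concrete with a few lines of base R. This is only an illustrative sketch: the toy data frame `d` and the column labels are hypothetical, and `model.matrix` is used here merely to mimic how an `iaformula` such as `~l1` yields a column of 1's plus `l1`. It is not a description of how drgee builds its design matrices internally.

```r
# Toy data (hypothetical values): exposure a and one covariate l1
d <- data.frame(a = c(0, 1, 1, 0), l1 = c(0.5, -1.0, 2.0, 0.3))

# X(L) for an interaction formula ~ l1: a column of 1's plus l1
X <- model.matrix(~ l1, data = d)

# A . X(L): every column of X is multiplied elementwise by the exposure,
# so row i of X is scaled by a[i]
AX <- d$a * X
colnames(AX) <- c("a", "a:l1")  # labels chosen for this sketch only
AX
```

The first column of `AX` is the exposure itself (it carries \beta_0 in the main model) and the second is the product of the exposure and `l1`.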

Given both an outcome and an exposure nuisance model, DR-estimation can be performed. DR-estimation gives a consistent estimate of the parameter \beta when either the outcome nuisance model or the exposure nuisance model is correctly specified, not necessarily both.

Usage is best explained through an example. Suppose that we are interested in the parameter vector (\beta_0, \beta_1) in a main model logit\{E(Y|A,L_1,L_2)\}-logit\{E(Y|A=0,L_1,L_2)\}=\beta_0 A + \beta_1 A \cdot L_1 where L_1 and L_2 are the covariates that we wish to adjust for. To adjust for L_1 and L_2, we can use an outcome nuisance model logit\{E(Y|A=0,L_1,L_2;\gamma_0, \gamma_1, \gamma_2)\}=\gamma_0 + \gamma_1 L_1 + \gamma_2 L_2 or an exposure nuisance model logit\{E(A|Y=0,L_1,L_2)\}=\alpha_0+\alpha_1 L_1+\alpha_2 L_2 to calculate estimates of \beta_0 and \beta_1 in the main model. We specify the outcome nuisance model as oformula = Y~L_1+L_2 and olink = "logit". The exposure nuisance model is specified as eformula = A~L_1+L_2 and elink = "logit". Since the outcome Y and the exposure A are identified as the LHS of oformula and eformula respectively, and since the outcome link is specified in the olink argument, the only thing left to specify for the main model is the (multiplicative) interactions A\cdot X(L)=A\cdot (1,L_1)^T. This is done by specifying X(L) as iaformula = ~L_1, since 1 is always included in X(L). We can then perform O-estimation, E-estimation or DR-estimation by setting estimation.method to "o", "e" or "dr" respectively. O-estimation uses only the outcome nuisance model, and E-estimation uses only the exposure nuisance model. DR-estimation uses both nuisance models, and gives a consistent estimate of (\beta_0,\beta_1) if either nuisance model is correct, not necessarily both.
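The specification described above might translate into a call along the following lines. This is a sketch under stated assumptions: the data frame `logitdata`, the sample size and all coefficient values are made up for illustration, and the simulated exposure model is not claimed to be a correctly specified model for E(A|Y=0,L_1,L_2). The call is guarded so that the data simulation runs even when the drgee package is not installed.

```r
set.seed(1)
expit <- function(x) 1 / (1 + exp(-x))

## Simulated data (hypothetical coefficients)
n  <- 2000
l1 <- rnorm(n)
l2 <- rnorm(n)
a  <- rbinom(n, 1, expit(0.5 * l1 - 0.5 * l2))
y  <- rbinom(n, 1, expit(0.5 * a + 0.3 * a * l1 - 1 + 0.5 * l1 + 0.5 * l2))
logitdata <- data.frame(y, a, l1, l2)

## DR-estimation with a logit main model and interaction A * L1
if (requireNamespace("drgee", quietly = TRUE)) {
  fit <- drgee::drgee(oformula = y ~ l1 + l2,   # outcome nuisance model
                      eformula = a ~ l1 + l2,   # exposure nuisance model
                      iaformula = ~ l1,         # X(L) = (1, L1)
                      olink = "logit", elink = "logit",
                      data = logitdata, estimation.method = "dr")
  print(summary(fit))
}
```

Changing `estimation.method` to `"o"` or `"e"` in the same call would request O-estimation or E-estimation instead.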

When estimation.method = "o", the RHS of eformula will be ignored. The eformula argument can also be replaced by an exposure argument specifying what the exposure of interest is.

When estimation.method = "e", the RHS of oformula will be ignored. The oformula argument can also be replaced by an outcome argument specifying what the outcome of interest is.

When cond = TRUE, the nuisance models are assumed to have cluster-specific intercepts. These intercepts are not estimated.
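A call with cluster-specific intercepts could look roughly as follows. This is a minimal sketch, assuming clustered data with a continuous exposure and identity links. The data frame `clusterdata`, the cluster effect `u` and all coefficient values are hypothetical, and the drgee call is guarded in case the package is not installed.

```r
set.seed(2)

## Simulated clustered data: 500 clusters of size 2 (hypothetical setup)
n_clusters <- 500
size <- 2
id <- rep(seq_len(n_clusters), each = size)
u  <- rep(rnorm(n_clusters), each = size)   # unobserved cluster effect
l  <- rnorm(n_clusters * size)
a  <- u + 0.5 * l + rnorm(n_clusters * size)
y  <- 1.5 * a + u + l + rnorm(n_clusters * size)
clusterdata <- data.frame(id, y, a, l)

## DR-estimation with cluster-specific intercepts in both nuisance models
if (requireNamespace("drgee", quietly = TRUE)) {
  fit <- drgee::drgee(oformula = y ~ l, eformula = a ~ l,
                      olink = "identity", elink = "identity",
                      cond = TRUE, clusterid = "id",
                      data = clusterdata, estimation.method = "dr")
  print(summary(fit))
}
```

Because `u` affects both the exposure and the outcome but is shared within a cluster, it plays the role of the cluster-specific intercept that cond = TRUE is meant to accommodate.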

When E-estimation or DR-estimation is chosen with olink = "logit", the exposure link will be changed to "logit". Note that this choice of outcome link does not work for DR-estimation when cond = TRUE.

Robust variance for the estimated parameter is calculated using the function robVcov. A cluster robust variance is calculated when a character string naming a cluster variable is supplied in the clusterid argument.

For E-estimation when cond = FALSE and g is the identity or log link, see Robins et al. (1992).

For DR-estimation when cond = TRUE and g is the identity or log link, see Robins (1999). For DR-estimation when g is the logit link, see Tchetgen et al. (2010).

O-estimation can also be performed using the gee function.

Value

drgee returns an object of class drgee containing:

`coefficients`	Estimates of the parameters in the main model.
`vcov`	Robust variance for all main model parameters.
`coefficients.all`	Estimates of all estimated parameters.
`vcov.all`	Robust variance of all parameter estimates.
`optim.object`	An estimation object returned from the function specified in the `rootFinder` argument, if this function is called for the estimation of the main model parameters.
`optim.object.o`	An estimation object returned from the function specified in the `rootFinder` argument, if this function is called for the estimation of the outcome nuisance parameters.
`optim.object.e`	An estimation object returned from the function specified in the `rootFinder` argument, if this function is called for the estimation of the exposure nuisance parameters.
`call`	The matched call.
`estimation.method`	The value of the input argument `estimation.method`.
`data`	The original data object, if given as an input argument.
`oformula`	The original oformula object, if given as an input argument.
`eformula`	The original eformula object, if given as an input argument.
`iaformula`	The original iaformula object, if given as an input argument.

The class methods coef and vcov can be used to extract the estimated parameters and their covariance matrix from a drgee object. summary.drgee produces a summary of the calculations.
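As an illustration of what can be done with the extracted quantities, the sketch below computes Wald-type confidence intervals from a coefficient vector and a covariance matrix. The helper `wald_ci` is hypothetical and not part of drgee, the normal-approximation interval is an assumption of this sketch, and the numbers in `est` and `V` are made up. With a fitted drgee object `fit`, one would pass `coef(fit)` and `vcov(fit)` instead.

```r
## Hypothetical helper: Wald confidence intervals from estimates and
## their covariance matrix (normal approximation assumed)
wald_ci <- function(est, V, level = 0.95) {
  se <- sqrt(diag(V))                  # standard errors
  z  <- qnorm(1 - (1 - level) / 2)     # normal quantile
  cbind(estimate = est, lower = est - z * se, upper = est + z * se)
}

## Made-up values standing in for coef(fit) and vcov(fit)
est <- c(a = 1.5, "a:l1" = 1.0)
V   <- matrix(c(0.010, 0.002,
                0.002, 0.020), nrow = 2,
              dimnames = list(names(est), names(est)))

ci <- wald_ci(est, V)
ci
```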

Author(s)

Johan Zetterqvist, Arvid Sjölander

References

Orsini N., Bellocco R., Sjölander A. (2013), Doubly Robust Estimation in Generalized Linear Models, Stata Journal, 13, 1, pp. 185–205

Robins J.M., Mark S.D., Newey W.K. (1992), Estimating Exposure Effects by Modelling the Expectation of Exposure Conditional on Confounders, Biometrics, 48, pp. 479–495

Robins J.M. (1999), Robust Estimation in Sequentially Ignorable Missing Data and Causal Inference Models, Proceedings of the American Statistical Association Section on Bayesian Statistical Science, pp. 6–10

Tchetgen E.J.T., Robins J.M., Rotnitzky A. (2010), On Doubly Robust Estimation in a Semiparametric Odds Ratio Model, Biometrika, 97, 1, pp. 171–180

Zetterqvist J., Vansteelandt S., Pawitan Y., Sjölander A. (2016), Doubly Robust Methods for Handling Confounding by Cluster, Biostatistics, 17, 2, pp. 264–276

Examples


## DR-estimation when
## the main model is
## E(Y|A,L1,L2)-E(Y|A=0,L1,L2)=beta0*A+beta1*A*L1
## and the outcome nuisance model is
## E(Y|A=0,L1,L2)=gamma0+gamma1*L1+gamma2*L2
## and the exposure nuisance model is
## E(A|L1,L2)=expit(alpha0+alpha1*L1+alpha2*L2)

library(drgee)

expit<-function(x) exp(x)/(1+exp(x))

n<-5000

## Covariates to adjust for
l1<-rnorm(n, mean = 0, sd = 1)
l2<-rnorm(n, mean = 0, sd = 1)

beta0<-1.5
beta1<-1
gamma0<--1
gamma1<--2
gamma2<-2
alpha0<-1
alpha1<-5
alpha2<-3

## Exposure generated from the exposure nuisance model
a<-rbinom(n,1,expit(alpha0 + alpha1*l1 + alpha2*l2))
## Outcome generated from the main model and the
## outcome nuisance model
y<-rnorm(n,
         mean = beta0 * a + beta1 * a * l1 + gamma0 + gamma1 * l1 + gamma2 * l2,
         sd = 1)

simdata<-data.frame(y,a,l1,l2)

## outcome nuisance model misspecified and
## exposure nuisance model correctly specified

## DR-estimation
dr.est <- drgee(oformula = formula(y~l1),
                eformula = formula(a~l1+l2),
                iaformula = formula(~l1),
                olink = "identity", elink = "logit",
                data = simdata, estimation.method = "dr")
summary(dr.est)

## O-estimation
o.est <- drgee(exposure = "a", oformula = formula(y~l1),
               iaformula = formula(~l1), olink = "identity",
               data = simdata, estimation.method = "o")
summary(o.est)

## E-estimation
e.est <- drgee(outcome = "y", eformula = formula(a~l1+l2),
               iaformula = formula(~l1), elink = "logit",
               data = simdata, estimation.method = "e")
summary(e.est)

drgee documentation built on Jan. 16, 2026, 5:19 p.m.