standardize_gee | R Documentation |
standardize_gee
performs regression standardization in linear and log-linear
fixed effects models, at specified values of the exposure, over the sample
covariate distribution. Let Y
, X
, and Z
be the outcome,
the exposure, and a vector of covariates, respectively. It is assumed that
data are clustered with a cluster indicator i
. standardize_gee
uses
fitted fixed effects model, with cluster-specific intercept a_i
(see
details
), to estimate the standardized mean
\theta(x)=E\{E(Y|i,X=x,Z)\}
, where x
is a specific value of
X
, and the outer expectation is over the marginal distribution of
(a_i,Z)
.
standardize_gee(
formula,
link = "identity",
data,
values,
clusterid,
case_control = FALSE,
ci_level = 0.95,
ci_type = "plain",
contrasts = NULL,
family = "gaussian",
reference = NULL,
transforms = NULL
)
formula |
A formula to be used with |
link |
The link function to be used with |
data |
The data. |
values |
A named list or data.frame specifying the variables and values at which marginal means of the outcome will be estimated. |
clusterid |
An optional string containing the name of a cluster identification variable when data are clustered. |
case_control |
Whether the data comes from a case-control study. |
ci_level |
Coverage probability of confidence intervals. |
ci_type |
A string, indicating the type of confidence intervals. Either "plain", which gives untransformed intervals, or "log", which gives log-transformed intervals. |
contrasts |
A vector of contrasts in the following format:
If set to |
family |
The family argument which is used to fit the glm model for the outcome. |
reference |
A vector of reference levels in the following format:
If |
transforms |
A vector of transforms in the following format:
If set to |
standardize_gee
assumes that a fixed effects model
\eta\{E(Y|i,X,Z)\}=a_i+h(X,Z;\beta)
has been fitted. The link
function \eta
is assumed to be the identity link or the log link. The
conditional generalized estimating equation (CGEE) estimate of \beta
is used to obtain estimates of the cluster-specific means:
\hat{a}_i=\sum_{j=1}^{n_i}r_{ij}/n_i,
where
r_{ij}=Y_{ij}-h(X_{ij},Z_{ij};\hat{\beta})
if \eta
is the
identity link, and
r_{ij}=Y_{ij}\exp\{-h(X_{ij},Z_{ij};\hat{\beta})\}
if \eta
is the log link, and (X_{ij},Z_{ij})
is the value of
(X,Z)
for subject j
in cluster i
, j=1,...,n_i
,
i=1,...,n
. The CGEE estimate of \beta
and the estimate of
a_i
are used to estimate the mean E(Y|i,X=x,Z)
:
\hat{E}(Y|i,X=x,Z)=\eta^{-1}\{\hat{a}_i+h(X=x,Z;\hat{\beta})\}.
For
each x
in the x
argument, these estimates are averaged across
all subjects (i.e. all observed values of Z
and all estimated values
of a_i
) to produce estimates
\hat{\theta}(x)=\sum_{i=1}^n
\sum_{j=1}^{n_i} \hat{E}(Y|i,X=x,Z_i)/N,
where N=\sum_{i=1}^n n_i
.
The variance for \hat{\theta}(x)
is obtained by the sandwich formula.
An object of class std_glm
. Obtain numeric results in a data frame with the tidy.std_glm function.
This is a list with the following components:
An unnamed list with one element for each of the requested contrasts. Each element is itself a list with the elements:
Estimated counterfactual means and standard errors for each exposure level
Estimated covariance matrix of counterfactual means
The estimated regression model for the outcome
The estimated exposure model
A character vector of the exposure variable names
Data.frame of the estimates of the contrast with inference
The transform argument used for this contrast
The requested contrast type
The reference level of the exposure
Confidence interval type
Confidence interval level
A named list with the elements:
Estimated counterfactual means and standard errors for each exposure level
Estimated covariance matrix of counterfactual means
The estimated regression model for the outcome
The estimated exposure model
A character vector of the exposure variable names
The variance calculation performed by standardize_gee
does not condition
on the observed covariates \bar{Z}=(Z_{11},...,Z_{nn_i})
. To see how
this matters, note that
var\{\hat{\theta}(x)\}=E[var\{\hat{\theta}(x)|\bar{Z}\}]+var[E\{\hat{\theta}(x)|\bar{Z}\}].
The usual parameter \beta
in a generalized linear model does not
depend on \bar{Z}
. Thus, E(\hat{\beta}|\bar{Z})
is independent
of \bar{Z}
as well (since E(\hat{\beta}|\bar{Z})=\beta
), so that
the term var[E\{\hat{\beta}|\bar{Z}\}]
in the corresponding variance
decomposition for var(\hat{\beta})
becomes equal to 0. However,
\theta(x)
depends on \bar{Z}
through the average over the sample
distribution for Z
, and thus the term
var[E\{\hat{\theta}(x)|\bar{Z}\}]
is not 0, unless one conditions on
\bar{Z}
.
Arvid Sjölander.
Goetgeluk S. and Vansteelandt S. (2008). Conditional generalized estimating equations for the analysis of clustered and longitudinal data. Biometrics 64(3), 772-780.
Martin R.S. (2017). Estimation of average marginal effects in multiplicative unobserved effects panel models. Economics Letters 160, 16-19.
Sjölander A. (2019). Estimation of marginal causal effects in the presence of confounding by cluster. Biostatistics doi: 10.1093/biostatistics/kxz054
require(drgee)
set.seed(4)
n <- 300
ni <- 2
id <- rep(1:n, each = ni)
ai <- rep(rnorm(n), each = ni)
Z <- rnorm(n * ni)
X <- rnorm(n * ni, mean = ai + Z)
Y <- rnorm(n * ni, mean = ai + X + Z + 0.1 * X^2)
dd <- data.frame(id, Z, X, Y)
fit.std <- standardize_gee(
formula = Y ~ X + Z + I(X^2),
link = "identity",
data = dd,
values = list(X = seq(-3, 3, 0.5)),
clusterid = "id"
)
print(fit.std)
plot(fit.std)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.