Estimate Standardized Survivals and Attributable Risks for covariates with missing data

Description

The function nested.stdsurv fits the Cox model to estimate standardized survival curves and attributable risks for covariates that are missing data on some cohort members. All covariates must be factor variables. nested.stdsurv requires knowledge of the variables that missingness depends on, with missingness probability modeled through a glm sampling model. Often, the data is in the form of a case-control sample taken within a cohort. nested.stdsurv allows cases to have missing data, and can extract efficiency from auxiliary variables by including them in the sampling model. nested.stdsurv requires coxph from the survival package.

Usage

1
2
3
4
5
6
nested.stdsurv(outcome, exposures, confounders, samplingmod, data,
               exposureofinterest = "", timeofinterest = Inf,cuminc=FALSE,
               plot = FALSE, plotfilename = "", glmlink = binomial(link = "logit"),
               glmcontrol = glm.control(epsilon = 1e-10, maxit = 10, trace = FALSE),
               coxphcontrol = coxph.control(eps = 1e-10, iter.max = 50),
               missvarwarn = TRUE, ...)

Arguments

Required arguments:

outcome

Survival outcome of interest, must be a Surv object

exposures

The part of the right side of the Cox model that parameterizes the exposures. Never use '*' for interaction, use interaction. Survival probabilities will be computed for each level of the exposures.

confounders

The part of the right side of the Cox model that parameterizes the confounders. Never use '*' for interaction, use interaction.

samplingmod

Right side of the formula for the glm sampling model that models the probability of missingness

data

Data Frame that all variables are in

Optional arguments:

exposureofinterest

The name of the level of the exposures for which attributable risk is desired. Default is the first level of the exposure.

timeofinterest

The time at which survival probabilities and attributable risks are desired. Default is the last event time.

cuminc

Set to T if you want output as cumulative incidence, F for survival

plot

If T, plot the standardized survivals. Default is F.

plotfilename

A string for the filename to save the plot as

glmlink

Sampling model link function, default is logistic regression

glmcontrol

See glm.control

coxphcontrol

See coxph.control

missvarwarn

Warn if there is missing data in the sampling variable. Default is TRUE

...

Any additional arguments to be passed on to glm or coxph

Details

If nested.stdsurv reports that the sampling model "failed to converge", the sampling model will be returned for your inspection. Note that if some sampling probabilities are estimated at 1, the model technically cannot converge, but you get very close to 1, and nested.stdsurv will not report non-convergence for this situation.

Note the following issues.

The data must be in a dataframe and specified in the data statement. No variable can be named 'o.b.s.e.r.v.e.d.' or 'p.i.h.a.t.'. Cases and controls cannot be finely matched on time, but matching on time within large strata is allowed. strata(), cluster() or offset() statements in or confounders are not allowed. Everyone must enter the cohort at the same time on the vival time scale. Must use Breslow Tie-Breaking. All covariates must be factor variables, even if binary. Do not use '*' to mean interaction in exposures or confounders, use interaction.

Value

A List with the following components:

coxmod

The fitted Cox model

samplingmod

The fitted glm sampling model

survtable

Standardized survival (and inference) for each exposure level

riskdifftable

Standardized survival (risk) differences (and inference) for each exposure level, relative to the exposure of interest.

PARtable

Population Attributable Risk (and inference) for the exposure of interest

If plot=T, then the additional component is included:

plotdata

A matrix with data needed to plot the survivals: time, standardized survival for each exposure level, and crude survival. Name of each exposure level is converted to a proper R variable name (these are the column labels).

Note

Requires the MASS library from the VR bundle that is available from the CRAN website.

Author(s)

Hormuzd A. Katki

References

Katki HA, Mark SD. Survival Analysis for Cohorts with Missing Covariate Information. R-News, 8(1) 14-9, 2008. http://www.r-project.org/doc/Rnews/Rnews_2008-1.pdf

Mark, S.D. and Katki, H.A. Specifying and Implementing Nonparametric and Semiparametric Survival Estimators in Two-Stage (sampled) Cohort Studies with Missing Case Data. Journal of the American Statistical Association, 2006, 101, 460-471.

Mark SD, Katki H. Influence function based variance estimation and missing data issues in case-cohort studies. Lifetime Data Analysis, 2001; 7; 329-342

Christian C. Abnet, Barry Lai, You-Lin Qiao, Stefan Vogt, Xian-Mao Luo, Philip R. Taylor, Zhi-Wei Dong, Steven D. Mark, Sanford M. Dawsey. Zinc concentration in esophageal biopsies measured by X-ray fluorescence and cancer risk. Journal of the National Cancer Institute, 2005; 97(4) 301-306

See Also

See Also: nested.coxph, zinc, nested.km, coxph, glm

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
## Simple analysis of zinc and esophageal cancer data:
## We sampled zinc (variable znquartiles) on a fraction of the subjects, with
## sampling fractions depending on cancer status and baseline histology.
## We observed the confounding variables on almost all subjects.
data(zinc)
mod <- nested.stdsurv(outcome="Surv(futime01,ec01==1)",
                      exposures="znquartiles",
                      confounders="sex+agestr+smoke+drink+mildysp+moddysp+sevdysp+anyhist",
                      samplingmod="ec01*basehist",exposureofinterest="Q4",data=zinc)

# This is the output:
#  Standardized Survival for znquartiles by time 5893 
#        Survival  StdErr 95% CI Left 95% CI Right
#  Q1      0.5443 0.07232      0.3932       0.6727
#  Q2      0.7595 0.07286      0.5799       0.8703
#  Q3      0.7045 0.07174      0.5383       0.8203
#  Q4      0.8911 0.06203      0.6863       0.9653
#  Crude   0.7784 0.02491      0.7249       0.8228

#  Standardized Risk Differences vs. znquartiles = Q4 by time 5893 
#             Risk Difference  StdErr 95% CI Left 95% CI Right
#  Q4 - Q1             0.3468 0.10376    0.143412       0.5502
#  Q4 - Q2             0.1316 0.09605   -0.056694       0.3198
#  Q4 - Q3             0.1866 0.09355    0.003196       0.3699
#  Q4 - Crude          0.1126 0.06353   -0.011871       0.2372

# PAR if everyone had znquartiles = Q4 
#     Estimate StdErr 95% PAR CI Left 95% PAR CI Right
# PAR   0.5084 0.2777         -0.4872           0.8375

Questions? Problems? Suggestions? or email at ian@mutexlabs.com.

Please suggest features or report bugs with the GitHub issue tracker.

All documentation is copyright its authors; we didn't write any of that.