survtab_ag: Estimate Survival Time Functions

View source: R/survival_aggregated.R

survtab_ag    R Documentation

Estimate Survival Time Functions

Description

This function estimates survival time functions: survival, relative/net survival, and crude/absolute risk functions (CIF).

Usage

survtab_ag(
  formula = NULL,
  data,
  adjust = NULL,
  weights = NULL,
  surv.breaks = NULL,
  n = "at.risk",
  d = "from0to1",
  n.cens = "from0to0",
  pyrs = "pyrs",
  d.exp = "d.exp",
  n.pp = NULL,
  d.pp = "d.pp",
  d.pp.2 = "d.pp.2",
  n.cens.pp = "n.cens.pp",
  pyrs.pp = "pyrs.pp",
  d.exp.pp = "d.exp.pp",
  surv.type = "surv.rel",
  surv.method = "hazard",
  relsurv.method = "e2",
  subset = NULL,
  conf.level = 0.95,
  conf.type = "log-log",
  verbose = FALSE
)

Arguments

formula

a formula; the response must be the time scale to compute survival time function estimates over, e.g. fot ~ sex. Variables on the right-hand side of the formula separated by + are considered stratifying variables, for which estimates are computed separately. May contain usage of adjust() — see Details and Examples.

data

since popEpi 0.4.0, a data.frame containing variables used in formula and other arguments. aggre objects are recommended as they contain information on any time scales and are therefore safer; for creating aggre objects see as.aggre when your data is already aggregated and aggre for aggregating split Lexis objects.

adjust

can be used as an alternative to passing variables to argument formula within a call to adjust(); e.g. adjust = "agegr". Flexible input.

weights

typically a list of weights or a character string specifying an age group standardization scheme; see the dedicated help page and examples. NOTE: weights = "internal" is based on the counts of persons in follow-up at the start of follow-up (typically T = 0)

surv.breaks

a vector of breaks on the survival time scale. Optional if data is an aggre object and mandatory otherwise. Must define each intended interval; e.g. if data has intervals defined by breaks seq(0, 5, 1/12), then surv.breaks = 0:5 first aggregates the data to the wider annual intervals. It is generally recommended (and sufficient; see Seppa, Dyba and Hakulinen (2015)) to use monthly intervals where applicable.

n

variable containing counts of subjects at-risk at the start of a time interval; e.g. n = "at.risk". Required when surv.method = "lifetable". Flexible input.

d

variable(s) containing counts of subjects experiencing an event. With only one type of event, e.g. d = "deaths". With multiple types of events (for CIF or cause-specific survival estimation), supply e.g. d = c("canD", "othD"). If the survival time function to be estimated does not use multiple types of events, supplying more than one variable to d simply causes the variables to be added together. Always required. Flexible input.

n.cens

variable containing counts of subjects censored during a survival time interval; e.g. n.cens = "alive". Required when surv.method = "lifetable". Flexible input.

pyrs

variable containing total subject-time accumulated within a survival time interval; e.g. pyrs = "pyrs". Required when surv.method = "hazard". Flexible input.

d.exp

variable denoting total "expected numbers of events" (typically computed as pyrs * pop.haz, where pop.haz is the expected hazard level) accumulated within a survival time interval; e.g. d.exp = "d.exp". Required when computing EdererII relative survivals or CIFs based on excess counts of events. Flexible input.

n.pp

variable containing total Pohar-Perme weighted counts of subjects at risk in an interval, supplied in the same way as argument n. Originally computed on the subject level, analogously to pp * as.integer(status == "at-risk"). Required when relsurv.method = "pp". Flexible input.

d.pp

variable(s) containing Pohar-Perme weighted counts of events, supplied in the same way as argument d. Originally computed on the subject level, analogously to pp * as.integer(status == some_event). Required when relsurv.method = "pp". Flexible input.

d.pp.2

variable(s) containing total Pohar-Perme "double-weighted" counts of events, supplied in the same way as argument d. Originally computed on the subject level, analogously to pp * pp * as.integer(status == some_event). Required when relsurv.method = "pp". Flexible input.

n.cens.pp

variable containing total Pohar-Perme weighted counts of censorings, supplied in the same way as argument n.cens. Originally computed on the subject level, analogously to pp * as.integer(status == "censored"). Required when relsurv.method = "pp". Flexible input.

pyrs.pp

variable containing total Pohar-Perme weighted subject-times, supplied in the same way as argument pyrs. Originally computed on the subject level, analogously to pp * pyrs. Required when relsurv.method = "pp". Flexible input.

d.exp.pp

variable containing total Pohar-Perme weighted counts of expected events, supplied in the same way as argument d.exp. Originally computed on the subject level, analogously to pp * d.exp. Required when relsurv.method = "pp". Flexible input.

surv.type

one of 'surv.obs', 'surv.cause', 'surv.rel', 'cif.obs' or 'cif.rel'; defines what kind of survival time function(s) is/are estimated; see Details

surv.method

either 'lifetable' or 'hazard'; determines the method of calculating survival time functions, where the former computes ratios such as p = d/(n - n.cens) and the latter utilizes subject-times (typically person-years) for hazard estimates such as d/pyrs which are used to compute survival time function estimates. The former method requires argument n.cens and the latter argument pyrs to be supplied.

relsurv.method

either 'e2' or 'pp'; defines whether to compute relative survival using the EdererII method or using Pohar-Perme weighting; ignored if surv.type != "surv.rel"

subset

a logical condition; e.g. subset = sex == 1; subsets the data before computations

conf.level

confidence level used in confidence intervals; e.g. 0.95 for 95 percent confidence intervals

conf.type

character string; must be one of "plain", "log-log" and "log"; defines the transformation used on the survival time function to yield confidence intervals via the delta method

verbose

logical; if TRUE, the function is chatty and returns some messages and timings along the process

Value

Returns a table of survival time function values and other information with survival intervals as rows. The table contains some of the following estimates of survival time functions, depending on surv.type:

  • surv.obs - observed (raw, overall) survival

  • surv.obs.K - observed cause-specific survival for cause K

  • CIF_k - cumulative incidence function for cause k

  • CIF.rel - cumulative incidence function using excess cases

  • r.e2 - relative survival, EdererII

  • r.pp - relative survival, Pohar-Perme weighted

The suffix .as implies adjusted estimates, and .lo and .hi imply lower and upper confidence limits, respectively. The prefix SE. stands for standard error.
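As an illustration of how the .lo and .hi limits relate to an estimate and its SE., the following base R sketch applies the delta method on the three scales named by conf.type. The helper surv_ci and the input numbers are hypothetical and are not part of popEpi; the package's own computation may differ in detail.

```r
# Hypothetical helper (not a popEpi function): delta-method confidence
# limits for survival estimates on the plain, log, or log-log scale.
surv_ci <- function(est, se, conf.level = 0.95,
                    conf.type = c("log-log", "log", "plain")) {
  conf.type <- match.arg(conf.type)
  z <- qnorm(1 - (1 - conf.level) / 2)
  switch(conf.type,
    plain = data.frame(lo = est - z * se, hi = est + z * se),
    log = {
      # delta method: SE of log(est) is se / est
      se_log <- se / est
      data.frame(lo = est * exp(-z * se_log), hi = est * exp(z * se_log))
    },
    "log-log" = {
      # delta method: SE of log(-log(est)) is se / (est * |log(est)|)
      se_ll <- se / (est * abs(log(est)))
      data.frame(lo = est^exp(z * se_ll), hi = est^exp(-z * se_ll))
    }
  )
}

# Made-up estimates and standard errors
est <- c(0.90, 0.75, 0.60)
se  <- c(0.02, 0.03, 0.04)
ci  <- surv_ci(est, se, conf.level = 0.95, conf.type = "log-log")
print(cbind(est, ci))
```

For estimates strictly between 0 and 1, the log-log limits also stay between 0 and 1, which the plain limits do not guarantee.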

Basics

This function computes interval-based estimates of survival time functions, where the intervals are set by the user. For product-limit-based estimation see packages survival and relsurv.

If surv.type = 'surv.obs', only 'raw' observed survival is estimated over the chosen time intervals. With surv.type = 'surv.rel', relative survival estimates are supplied in addition to the observed survival figures.

surv.type = 'cif.obs' requests cumulative incidence functions (CIF) to be estimated. CIFs are estimated for each competing risk based on a survival-interval-specific proportional hazards assumption as described by Chiang (1968). With surv.type = 'cif.rel', a CIF is estimated using excess cases as the "cause-specific" cases. Finally, with surv.type = 'surv.cause', cause-specific survival is estimated separately for each type of event.
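The interval-specific proportional hazards idea can be sketched in base R as follows. This is a minimal illustration with made-up counts, assuming hazard-based conditional survival within each interval; it is not taken from the popEpi source, and the variable names are chosen for this example only.

```r
# Toy aggregated data (made-up numbers): three 1-year intervals
delta <- c(1, 1, 1)          # interval lengths
pyrs  <- c(950, 820, 700)    # person-years per interval
canD  <- c(60, 40, 30)       # deaths from cause 1
othD  <- c(20, 25, 30)       # deaths from cause 2

# All-cause hazard and conditional survival within each interval
h_all  <- (canD + othD) / pyrs
p_surv <- exp(-delta * h_all)

# Overall survival at the end and at the start of each interval
surv_end   <- cumprod(p_surv)
surv_start <- c(1, head(surv_end, -1))

# Proportional hazards within an interval: the probability of any event
# is split between causes in proportion to the cause-specific counts
q_can <- (1 - p_surv) * canD / (canD + othD)
q_oth <- (1 - p_surv) * othD / (canD + othD)

# Cumulative incidence: reach the interval alive, then fail from cause k
CIF_canD <- cumsum(surv_start * q_can)
CIF_othD <- cumsum(surv_start * q_oth)

print(data.frame(surv = surv_end, CIF_canD, CIF_othD))
```

At every interval end the two CIFs and the overall survival add up to one, which is a useful sanity check on any CIF table.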

In hazard-based estimation (surv.method = "hazard"), survival time functions are transformations of the estimated corresponding hazard in the intervals. The hazard itself is estimated using counts of events (or excess events) and total subject-time in the interval. Life table estimates (surv.method = "lifetable") are constructed as transformations of probabilities computed using counts of events and counts of subjects at risk.
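The two estimation methods can be compared on toy data with a few lines of base R. The counts below are made up, and the life table part assumes the common actuarial convention of treating censored subjects as at risk for half of the interval; that convention is an assumption of this sketch and is not confirmed by the text above.

```r
# Toy aggregated data (made-up numbers): three 1-year intervals
delta   <- c(1, 1, 1)          # interval lengths
at.risk <- c(1000, 900, 760)   # subjects at risk at interval start
d       <- c(80, 65, 60)       # events in the interval
n.cens  <- c(20, 75, 50)       # censorings in the interval
pyrs    <- c(950, 830, 705)    # person-years in the interval

# Hazard-based: survival is a transformation of the cumulative hazard
haz         <- d / pyrs
surv_hazard <- exp(-cumsum(delta * haz))

# Life table: conditional probabilities from counts.
# ASSUMPTION: effective number at risk = at.risk - n.cens / 2
n_eff          <- at.risk - n.cens / 2
surv_lifetable <- cumprod(1 - d / n_eff)

print(data.frame(surv_hazard, surv_lifetable))
```

With short intervals and moderate hazards the two sets of estimates are typically very close, as they are here.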

The vignette survtab_examples has some practical examples.

Relative survival

When surv.type = 'surv.rel', the user can choose relsurv.method = 'pp', whereupon Pohar-Perme weighting is used. By default relsurv.method = 'e2', i.e. the Ederer II method is used to estimate relative survival.
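A hazard-based Ederer II estimate can be sketched from aggregated counts alone. The numbers are made up and the expected hazard pop.haz is a hypothetical per-interval value; the sketch assumes the excess hazard is estimated as (d - d.exp) / pyrs and is not a copy of the popEpi implementation.

```r
# Toy aggregated data (made-up numbers): three 1-year intervals
delta   <- c(1, 1, 1)
pyrs    <- c(950, 830, 705)        # person-years
d       <- c(80, 65, 60)           # observed deaths
pop.haz <- c(0.020, 0.022, 0.025)  # hypothetical expected hazards
d.exp   <- pyrs * pop.haz          # expected numbers of deaths

# Excess hazard and the resulting relative survival
excess_haz <- (d - d.exp) / pyrs
r.e2       <- exp(-cumsum(delta * excess_haz))

# Equivalent view: observed survival divided by expected survival
surv.obs <- exp(-cumsum(delta * d / pyrs))
surv.exp <- exp(-cumsum(delta * d.exp / pyrs))

print(data.frame(surv.obs, surv.exp, r.e2, ratio = surv.obs / surv.exp))
```

The last two columns agree, which shows why d.exp and pyrs are the only additional inputs needed for this kind of estimate.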

Adjusted estimates

Adjusted estimates in this context mean computing estimates separately by the levels of adjusting variables and returning weighted averages of the estimates. For example, estimates can be computed separately by age group and combined into a weighted average (an age-adjusted estimate).

Adjusting requires specification of both the adjusting variables and the weights for all the levels of the adjusting variables. The former can be accomplished by using adjust() with the argument formula, or by supplying variables directly to argument adjust. E.g. the following are all equivalent:

formula = fot ~ sex + adjust(agegr) + adjust(area)

formula = fot ~ sex + adjust(agegr, area)

formula = fot ~ sex, adjust = c("agegr", "area")

formula = fot ~ sex, adjust = list(agegr, area)

The adjusting variables must match with the variable names in the argument weights; see the dedicated help page. Typically weights are supplied as a list or a data.frame. The former can be done by e.g.

weights = list(agegr = VEC1, area = VEC2),

where VEC1 and VEC2 are vectors of weights (which do not have to add up to one). See survtab_examples for an example of using a data.frame to pass weights.
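The weighted average behind an adjusted estimate can be illustrated in base R. The stratum estimates and weights below are made up, and the sketch assumes the weights are simply rescaled to sum to one before averaging, which is consistent with the statement that they need not add up to one.

```r
# Made-up stratum-specific 5-year survival estimates by age group
est <- c("0-44" = 0.85, "45-64" = 0.70, "65+" = 0.50)

# Weights on any scale, e.g. counts or percentages
w <- c("0-44" = 20, "45-64" = 50, "65+" = 30)

# Rescale weights to sum to one, matching them to strata by name
w_norm   <- w[names(est)] / sum(w)
adjusted <- sum(w_norm * est)

# Multiplying all weights by a constant leaves the result unchanged
w_big     <- w * 10
adjusted2 <- sum(w_big[names(est)] / sum(w_big) * est)

print(c(adjusted = adjusted, adjusted2 = adjusted2))
```

Matching by name mirrors the requirement that the adjusting variables and the names in weights agree.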

Period analysis and other data selection schemes

To calculate e.g. period analysis (delayed entry) estimates, limit the data when or before supplying it to this function. See survtab_examples.

Data requirements

survtab_ag computes estimates of survival time functions using pre-aggregated data. For using subject-level data directly, use survtab. For aggregating data, see lexpand and aggre.

By default, if data is an aggre object (not mandatory), survtab_ag uses exactly the same breaks that were used in splitting the original data (with e.g. lexpand), so it is not necessary to specify surv.breaks. If specified, surv.breaks must be a subset of the pre-existing breaks. When data is not an aggre object, breaks must always be specified. Interval lengths (delta in output) are calculated from whichever breaks are used, so the upper limit of the breaks should be meaningful and never e.g. Inf.
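What aggregating to wider breaks involves can be sketched in base R. The monthly data are made up and the column names simply mirror the defaults above; this is an illustration of the idea, not the code survtab_ag runs.

```r
# Made-up monthly aggregated data covering two years of follow-up
fine_breaks <- seq(0, 2, by = 1/12)
monthly <- data.frame(
  fot  = head(fine_breaks, -1),        # start of each monthly interval
  d    = rep(c(3, 2), each = 12),      # events per month
  pyrs = rep(c(80, 70), each = 12)     # person-years per month
)

# Wider breaks must be a subset of the existing ones (up to rounding)
wide_breaks <- c(0, 1, 2)
is_subset <- all(sapply(wide_breaks,
                        function(b) any(abs(fine_breaks - b) < 1e-8)))
stopifnot(is_subset)

# Assign each monthly interval to the wide interval containing its start;
# the small tolerance guards against floating point error in seq()
monthly$wide <- findInterval(monthly$fot + 1e-8, wide_breaks)

# Counts and person-years are summed within each wide interval
yearly <- aggregate(cbind(d, pyrs) ~ wide, data = monthly, FUN = sum)

# Interval lengths come from the breaks, so the last break must be finite
yearly$delta <- diff(wide_breaks)[yearly$wide]
print(yearly)
```

Had the last break been Inf, the final delta would be infinite, which is why the upper limit of the breaks needs to be a meaningful finite value.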

References

Perme, Maja Pohar, Janez Stare, and Jacques Esteve. "On estimation in relative survival." Biometrics 68.1 (2012): 113-120. doi:10.1111/j.1541-0420.2011.01640.x

Hakulinen, Timo, Karri Seppa, and Paul C. Lambert. "Choosing the relative survival method for cancer survival estimation." European Journal of Cancer 47.14 (2011): 2202-2210. doi:10.1016/j.ejca.2011.03.011

Seppa, Karri, Timo Hakulinen, and Arun Pokhrel. "Choosing the net survival method for cancer survival estimation." European Journal of Cancer (2013). doi:10.1016/j.ejca.2013.09.019

Chiang, Chin Long. Introduction to stochastic processes in biostatistics. 1968. ISBN-13: 978-0471155003

Seppa K., Dyba T. and Hakulinen T.: Cancer Survival, Reference Module in Biomedical Sciences. Elsevier. 08-Jan-2015. doi:10.1016/B978-0-12-801238-3.02745-8

See Also

splitMulti, lexpand, ICSS, sire, and the survtab_examples vignette

Other main functions: Surv(), rate(), relpois_ag(), relpois(), sirspline(), sir(), survmean(), survtab()

Other survtab functions: Surv(), lines.survtab(), plot.survtab(), print.survtab(), summary.survtab(), survtab()

Examples

## see more examples with explanations in vignette("survtab_examples")

#### survtab_ag usage

data("sire", package = "popEpi")
## prepare data for e.g. 5-year "period analysis" for 2008-2012
## note: sire is a simulated cohort integrated into popEpi.
BL <- list(fot=seq(0, 5, by = 1/12),
           per = c("2008-01-01", "2013-01-01"))
x <- lexpand(sire, birth = bi_date, entry = dg_date, exit = ex_date,
             status = status %in% 1:2,
             breaks = BL,
             pophaz = popmort,
             aggre = list(fot))
             
## calculate relative EdererII period method
## NOTE: x is an aggre object here, so surv.breaks are deduced
## automatically
st <- survtab_ag(fot ~ 1, data = x)

summary(st, t = 1:5) ## annual estimates
summary(st, q = list(r.e2 = 0.75)) ## 1st interval where r.e2 < 0.75 at end

plot(st)


## non-aggre data: the first (commented-out) call to survtab_ag would fail
## because a plain data.frame carries no information on the breaks
df <- data.frame(x)
# st <- survtab_ag(fot ~ 1, data = df)
st <- survtab_ag(fot ~ 1, data = df, surv.breaks = BL$fot)

## calculate age-standardised 5-year relative survival ratio using 
## Ederer II method and period approach 

sire$agegr <- cut(sire$dg_age,c(0,45,55,65,75,Inf),right=FALSE)
BL <- list(fot=seq(0, 5, by = 1/12),
           per = c("2008-01-01", "2013-01-01"))
x <- lexpand(sire, birth = bi_date, entry = dg_date, exit = ex_date,
             status = status %in% 1:2,
             breaks = BL,
             pophaz = popmort,
             aggre = list(agegr, fot))

## age standardisation using internal weights (age distribution of 
## patients diagnosed within the period window)
## (NOTE: what is done here is equivalent to using weights = "internal")
w <- aggregate(at.risk ~ agegr, data = x[x$fot == 0], FUN = sum)
names(w) <- c("agegr", "weights")

st <- survtab_ag(fot ~ adjust(agegr), data = x, weights = w)
plot(st, y = "r.e2.as", col = c("blue"))

## age standardisation using ICSS1 weights
data(ICSS)
age_breaks <- c(0, 45, 55, 65, 75, Inf)  # avoid masking the name of cut()
agegr <- cut(ICSS$age, age_breaks, right = FALSE)
w <- aggregate(ICSS1~agegr, data = ICSS, FUN = sum)
names(w) <- c("agegr", "weights")

st <- survtab_ag(fot ~ adjust(agegr), data = x, weights = w)
lines(st, y = "r.e2.as", col = c("red"))


## cause-specific survival
sire$stat <- factor(sire$status, 0:2, c("alive", "canD", "othD"))
x <- lexpand(sire, birth = bi_date, entry = dg_date, exit = ex_date,
             status = stat,
             breaks = BL,
             pophaz = popmort,
             aggre = list(agegr, fot))
st <- survtab_ag(fot ~ adjust(agegr), data = x, weights = w,
                 d = c("fromalivetocanD", "fromalivetoothD"),
                 surv.type = "surv.cause")
plot(st, y = "surv.obs.fromalivetocanD.as")
lines(st, y = "surv.obs.fromalivetoothD.as", col = "red")




WetRobot/popEpi documentation built on Aug. 29, 2023, 3:53 a.m.