survmean: Compute Mean Survival Times Using Extrapolation
In WetRobot/popEpi: Functions for Epidemiological Analysis using Population Data

survmean

R Documentation

Compute Mean Survival Times Using Extrapolation

Description

Computes mean survival times based on survival estimation up to a point in follow-up time (e.g. 10 years), after which survival is extrapolated using an appropriate hazard data file (pophaz) to yield the "full" survival curve. The area under the full survival curve is the mean survival.

Usage

survmean(
  formula,
  data,
  adjust = NULL,
  weights = NULL,
  breaks = NULL,
  pophaz = NULL,
  e1.breaks = NULL,
  e1.pophaz = pophaz,
  r = "auto",
  surv.method = "hazard",
  subset = NULL,
  verbose = FALSE
)

Arguments

`formula`	a `formula`, e.g. `FUT ~ V1` or `Surv(FUT, lex.Xst) ~ V1`. Supplied in the same way as to `⁠[survtab]⁠`, see that help for more info.
`data`	a `Lexis` data set; see `⁠[Epi::Lexis]⁠`.
`adjust`	variables to adjust estimates by, e.g. `adjust = "agegr"`. Flexible input.
`weights`	weights to use to adjust mean survival times. See the dedicated help page for more details on weighting. `survmean` computes curves separately by all variables to adjust by, computes mean survival times, and computes weighted means of the mean survival times. See Examples.
`breaks`	a list of breaks defining the time window to compute observed survival in, and the intervals used in estimation. E.g. `list(FUT = 0:10)` when `FUT` is the follow-up time scale in your data.
`pophaz`	a data set of population hazards passed to `⁠[survtab]⁠` (see the dedicated help page and the help page of `survtab` for more information). Defines the population hazard in the time window where observed survival is estimated.
`e1.breaks`	`NULL` or a list of breaks defining the time window to compute expected survival in, and the intervals used in estimation. E.g. `list(FUT = 0:100)` when `FUT` is the follow-up time scale in your data to extrapolate up to 100 years from where the observed survival curve ends. NOTE: the breaks on the survival time scale MUST include the breaks supplied to argument `breaks`; see Examples. If `NULL`, uses decent defaults (maximum follow-up time of 50 years).
`e1.pophaz`	Same as `pophaz`, except this defines the population hazard in the time window where expected survival is estimated. By default uses the same data as argument `pophaz`.
`r`	either a numeric multiplier such as `0.995`, `"auto"`, or `"autoX"` where `X` is an integer; used to determine the relative survival ratio (RSR) persisting after where the estimated observed survival curve ends. See Details.
`surv.method`	passed to `survtab`; see that help for more info.
`subset`	a logical condition; e.g. `subset = sex == 1`; subsets the data before computations
`verbose`	`logical`; if `TRUE`, the function is returns some messages and results along the run, which may be useful in debugging

Details

Basics

survmean computes mean survival times. For median survival times (i.e. where 50 % of subjects have died or met some other event) use ⁠[survtab]⁠.

The mean survival time is simply the area under the survival curve. However, since full follow-up rarely happens, the observed survival curves are extrapolated using expected survival: E.g. one might compute observed survival till up to 10 years and extrapolate beyond that (till e.g. 50 years) to yield an educated guess on the full observed survival curve.

The area is computed by trapezoidal integration of the area under the curve. This function also computes the "full" expected survival curve from T = 0 till e.g. T = 50 depending on supplied arguments. The expected mean survival time is the area under the mean expected survival curve. This function returns the mean expected survival time to be compared with the mean survival time and for computing years of potential life lost (YPLL).

Results can be formed by strata and adjusted for e.g. age by using the formula argument as in survtab. See also Examples.

Extrapolation tweaks

Argument r controls the relative survival ratio (RSR) assumed to persist beyond the time window where observed survival is computed (defined by argument breaks; e.g. up to FUT = 10). The RSR is simply RSR_i = p_oi / p_ei for a time interval i, i.e. the observed divided by the expected (conditional, not cumulative) probability of surviving from the beginning of a time interval till its end. The cumulative product of RSR_i over time is the (cumulative) relative survival curve.

If r is numeric, e.g. 0.995, that RSR level is assumed to persist beyond the observed survival curve. Numeric r should be ⁠> 0⁠ and expressed at the annual level when using fractional years as the scale of the time variables. E.g. if RSR is known to be 0.95 at the month level, then the annualized RSR is 0.95^12. This enables correct usage of the RSR with survival intervals of varying lengths. When using day-level time variables (such as Dates; see as.Date), numeric r should be expressed at the day level, etc.

If r is "auto" or "auto1", this function computes RSR estimates internally and automatically uses the RSR_i in the last survival interval in each stratum (and adjusting group) and assumes that to persist beyond the observed survival curve. Automatic determination of r is a good starting point, but in situations where the RSR estimate is uncertain it may produce poor results. Using "autoX" such as "auto6" causes survmean to use the mean of the estimated RSRs in the last X survival intervals, which may be more stable. Automatic determination will not use values ⁠>1⁠ but set them to 1. Visual inspection of the produced curves is always recommended: see Examples.

One may also tweak the accuracy and length of extrapolation and expected survival curve computation by using e1.breaks. By default this is whatever was supplied to breaks for the survival time scale, to which

c(seq(1/12, 1, 1/12), seq(1.2, 1.8, 0.2), 2:19, seq(20, 50, 5))

is added after the maximum value, e.g. with breaks = list(FUT = 0:10) we have

⁠..., 10+1/12, ..., 11, 11.2, ..., 2, 3, ..., 19, 20, 25, ... 50⁠

as the e1.breaks. Supplying e1.breaks manually requires the breaks over time survival time scale supplied to argument breaks to be reiterated in e1.breaks; see Examples. NOTE: the default extrapolation breaks assume the time scales in the data to be expressed as fractional years, meaning this will work extremely poorly when using e.g. day-level time scales (such as Date variables). Set the extrapolation breaks manually in such cases.

Value

Returns a data.frame or data.table (depending on getOptions("popEpi.datatable"); see ?popEpi) containing the following columns:

est: The estimated mean survival time
exp: The computed expected survival time
obs: Counts of subjects in data
YPLL: Years of Potential Life Lost, computed as ((exp - est) * obs) — though your time data may be in e.g. days, this column will have the same name regardless. The returned data also has columns named according to the variables supplied to the right-hand-side of the formula.

Author(s)

Joonas Miettinen

Examples


library(Epi)
## take 500 subjects randomly for demonstration
data(sire)
sire <- sire[sire$dg_date < sire$ex_date, ]
set.seed(1L)
sire <- sire[sample(x = nrow(sire), size = 500),]

## NOTE: recommended to use factor status variable
x <- Lexis(entry = list(FUT = 0, AGE = dg_age, CAL = get.yrs(dg_date)),
           exit = list(CAL = get.yrs(ex_date)),
           data = sire,
           exit.status = factor(status, levels = 0:2,
                                labels = c("alive", "canD", "othD")),
           merge = TRUE)

## phony variable
set.seed(1L)
x$group <- rbinom(nrow(x), 1, 0.5)
## age group
x$agegr <- cut(x$dg_age, c(0,45,60,Inf), right=FALSE)

## population hazards data  set
pm <- data.frame(popEpi::popmort)
names(pm) <- c("sex", "CAL", "AGE", "haz")

## breaks to define observed survival estimation
BL <- list(FUT = seq(0, 10, 1/12))

## crude mean survival
sm1 <- survmean(Surv(FUT, lex.Xst != "alive") ~ 1,
                pophaz = pm, data = x, weights = NULL,
                breaks = BL)

sm1 <- survmean(FUT ~ 1,
                pophaz = pm, data = x, weights = NULL,
                breaks = BL)

## mean survival by group
sm2 <- survmean(FUT ~ group,
                pophaz = pm, data = x, weights = NULL,
                breaks = BL)

## ... and adjusted for age using internal weights (counts of subjects)
## note: need also longer extrapolation here so that all curves
## converge to zero in the end.
eBL <- list(FUT = c(BL$FUT, 11:75))
sm3 <- survmean(FUT ~ group + adjust(agegr),
                pophaz = pm, data = x, weights = "internal",
                breaks = BL, e1.breaks = eBL)

## visual inspection of how realistic extrapolation is for each stratum;
## solid lines are observed + extrapolated survivals;
## dashed lines are expected survivals
plot(sm1)

## plotting object with both stratification and standardization
## plots curves for each strata-std.group combination
plot(sm3)

## for finer control of plotting these curves, you may extract
## from the survmean object using e.g.
attributes(sm3)$survmean.meta$curves


#### using Dates

x <- Lexis(entry = list(FUT = 0L, AGE = dg_date-bi_date, CAL = dg_date),
           exit = list(CAL = ex_date),
           data = sire[sire$dg_date < sire$ex_date, ],
           exit.status = factor(status, levels = 0:2,
                                labels = c("alive", "canD", "othD")),
           merge = TRUE)
## phony group variable
set.seed(1L)
x$group <- rbinom(nrow(x), 1, 0.5)


## NOTE: population hazard should be reported at the same scale
## as time variables in your Lexis data.
data(popmort, package = "popEpi")
pm <- data.frame(popmort)
names(pm) <- c("sex", "CAL", "AGE", "haz")
## from year to day level
pm$haz <- pm$haz/365.25
pm$CAL <- as.Date(paste0(pm$CAL, "-01-01"))
pm$AGE <- pm$AGE*365.25

BL <- list(FUT = seq(0, 8, 1/12)*365.25)
eBL <- list(FUT = c(BL$FUT, c(8.25,8.5,9:60)*365.25))
smd <- survmean(FUT ~ group, data = x,
                pophaz = pm, verbose = TRUE, r = "auto5",
                breaks = BL, e1.breaks = eBL)
plot(smd)

WetRobot/popEpi documentation built on June 10, 2025, 11:49 a.m.