# coxed.npsf: Predict expected durations using the GAM method In coxed: Duration-Based Quantities of Interest for the Cox Proportional Hazards Model

## Description

This function is called by coxed and is not intended to be used by itself.

## Usage

 1 coxed.npsf(cox.model, newdata = NULL, coef = NULL, b.ind = NULL) 

## Arguments

 cox.model The output from a Cox proportional hazards model estimated with the coxph function in the survival package or with the cph function in the rms package newdata An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used coef A vector of new coefficients to replace the coefficients attribute of the cox.model. Used primarily for bootstrapping, to recalculate durations using new coefficients derived from a bootstrapped sample. If NULL, the original coefficients are employed b.ind A vector of observation numbers to pass to the estimation sample to construct the a bootstrapped sample with replacement

## Details

The non-parametric step function (NPSF) approach to calculating expected durations from the Cox proportional hazards model, described in Kropko and Harden (2018), uses the method proposed by Cox and Oakes (1984, 107-109) for estimating the cumulative baseline hazard function. This method is nonparametric and results in a step-function representation of the cumulative baseline hazard.

Cox and Oakes (1984, 108) show that the cumulative baseline hazard function can be estimated after fitting a Cox model by

\hat{H}_0(t) = ∑_{τ_j < t}\frac{d_j}{∑_{l\in\Re(τ_j)}\hat{ψ}(l)},

where τ_j represents time points earlier than t, d_j is a count of the total number of failures at τ_j, \Re(τ_j) is the remaining risk set at τ_j, and \hat{ψ}(l) represents the ELP from the Cox model for observations still in the risk set at τ_j. This equation is used calculate the cumulative baseline hazard at all time points in the range of observed durations. This estimate is a stepwise function because time points with no failures do not contribute to the cumulative hazard, so the function is flat until the next time point with observed failures.

We extend this method to obtain expected durations by first calculating the baseline survivor function from the cumulative hazard function, using

\hat{S}_0(t) = \exp[-\hat{H}_0(t)].

Each observation's survivor function is related to the baseline survivor function by

\hat{S}_i(t) = \hat{S}_0(t)^{\hat{ψ}(i)},

where \hat{ψ}(i) is the exponentiated linear predictor (ELP) for observation i. These survivor functions can be used directly to calculate expected durations for each observation. The expected value of a non-negative random variable can be calculated by

E(X) = \int_0^{∞} \bigg(1 - F(t)\bigg)dt,

where F(.) is the cumulative distribution function for X. In the case of a duration variable t_i, the expected duration is

E(t_i) = \int_0^T S_i(t)\,dt,

where T is the largest possible duration and S(t) is the individual's survivor function. We approximate this integral with a right Riemann-sum by calculating the survivor functions at every discrete time point from the minimum to the maximum observed durations, and multiplying these values by the length of the interval between time points with observed failures:

E(t_i) \approx ∑_{t_j \in [0,T]} (t_j - t_{j-1})S_i(t_j).

## Value

Returns a list containing the following components:

 exp.dur A vector of predicted mean durations for the estimation sample if newdata is omitted, or else for the specified new data. baseline.functions The estimated cumulative baseline hazard function and survivor function.

## Author(s)

Jonathan Kropko <jkropko@virginia.edu> and Jeffrey J. Harden <jharden2@nd.edu>

## References

Kropko, J. and Harden, J. J. (2018). Beyond the Hazard Ratio: Generating Expected Durations from the Cox Proportional Hazards Model. British Journal of Political Science https://doi.org/10.1017/S000712341700045X

Cox, D. R., and Oakes, D. (1984). Analysis of Survival Data. Monographs on Statistics & Applied Probability

coxed
  1 2 3 4 5 6 7 8 9 10 11 12 mv.surv <- Surv(martinvanberg$formdur, event = rep(1, nrow(martinvanberg))) mv.cox <- coxph(mv.surv ~ postel + prevdef + cont + ident + rgovm + pgovno + tpgovno + minority, method = "breslow", data = martinvanberg) ed <- coxed.npsf(mv.cox) ed$baseline.functions ed\$exp.dur #Running coxed.npsf() on a bootstrap sample and with new coefficients bsample <- sample(1:nrow(martinvanberg), nrow(martinvanberg), replace=TRUE) newcoefs <- rnorm(8) ed2 <- coxed.npsf(mv.cox, b.ind=bsample, coef=newcoefs)