# coxed.gam: Predict expected durations using the GAM method

From the `coxed` package: Duration-Based Quantities of Interest for the Cox Proportional Hazards Model.

## Description

This function is called by `coxed` and is not intended to be used by itself.

## Usage

```r
coxed.gam(cox.model, newdata = NULL, k = -1, coef = NULL, b.ind = NULL, warn = TRUE)
```

## Arguments

- `cox.model`: The output from a Cox proportional hazards model estimated with the `coxph` function in the `survival` package or with the `cph` function in the `rms` package.
- `newdata`: An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used.
- `k`: The number of knots in the GAM smoother. The default is -1, which leaves the choice to the default basis dimension in the `mgcv` package (see the `choose.k` help page in `mgcv`).
- `coef`: A vector of new coefficients to replace the `coefficients` component of `cox.model`. Used primarily for bootstrapping, to recalculate durations using new coefficients derived from a bootstrapped sample. If `NULL`, the original coefficients are used.
- `b.ind`: A vector of observation numbers used to draw a bootstrapped estimation sample with replacement.
- `warn`: If `TRUE`, displays warnings; if `FALSE`, suppresses them.
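To illustrate what the `b.ind` and `coef` arguments represent, here is a minimal sketch that builds both by hand: a vector of row numbers drawn with replacement, and coefficients re-estimated on that bootstrapped sample. It assumes the `survival` package and uses its `veteran` data as a stand-in estimation sample; the variable names `b.ind`, `boot`, `new.coef`, and `elp.boot` are hypothetical. How `coxed.gam` combines the two arguments internally is not shown here.

```r
library(survival)

# Assumption: survival's veteran data stand in for the estimation sample
vet <- survival::veteran
fit <- coxph(Surv(time, status) ~ karno + age + diagtime,
             data = vet, method = "breslow")

set.seed(2018)
# Hypothetical b.ind-style vector: row numbers drawn with replacement
b.ind <- sample(seq_len(nrow(vet)), nrow(vet), replace = TRUE)
boot <- vet[b.ind, ]

# Hypothetical coef-style vector: coefficients re-estimated on the bootstrap sample
new.coef <- coef(coxph(Surv(time, status) ~ karno + age + diagtime,
                       data = boot, method = "breslow"))

# ELP for the resampled rows, using the new coefficients in place of the originals
X.boot <- model.matrix(fit)[b.ind, , drop = FALSE]
elp.boot <- drop(exp(X.boot %*% new.coef))
summary(elp.boot)
```

Repeating this over many draws gives the kind of bootstrap replicates that the `coef` argument is meant to receive.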

## Details

This function employs the GAM method of generating expected durations described in Kropko and Harden (2018), which proceeds according to five steps. First, it uses coefficient estimates from the Cox model, so researchers must first estimate the model just as they always have. Then the method computes expected values of risk for each observation by matrix-multiplying the covariates by the estimated coefficients from the model, then exponentiating the result. This creates the exponentiated linear predictor (ELP). Then the observations are ranked from smallest to largest according to their values of the ELP. This ranking is interpreted as the expected order of failure; the larger the value of the ELP, the sooner the model expects that observation to fail, relative to the other observations.
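The first three steps (estimate the Cox model, compute the ELP, rank it) can be sketched directly in R. This is a minimal illustration, not the package's code: it assumes the `survival` package and uses its `veteran` data with three covariates chosen only for the example. The ranks are unaffected by exponentiation or by centering the linear predictor, so they match the ranks of `predict(fit, type = "lp")`.

```r
library(survival)

# Assumption: survival's veteran data stand in for the package's example data
vet <- survival::veteran

# Step 1: estimate the Cox model as usual
fit <- coxph(Surv(time, status) ~ karno + age + diagtime,
             data = vet, method = "breslow")

# Step 2: ELP = exp(X %*% beta), from the covariate matrix and the coefficients
X <- model.matrix(fit)
elp <- drop(exp(X %*% coef(fit)))

# Step 3: rank observations from smallest to largest ELP (ties are averaged);
# a larger rank means the model expects that observation to fail sooner
rank.xb <- rank(elp)

head(cbind(elp, rank.xb))
```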

The next step is to connect the model's expected risk for each observation (ELP) to duration time (the observed durations). A generalized additive model (GAM) fits a model to data by using a series of locally estimated polynomial splines set by the user (see, for example, Wood, Pya, and Saefken 2016). It is a flexible means of allowing for nonlinear relationships between variables. `coxed.gam` uses a GAM with a cubic regression spline to draw a smoothed line summarizing the bivariate relationship between the observed durations and the ranks. The GAM fit can then be used directly to compute expected durations, given the covariates, for each observation in the data.
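A minimal sketch of the GAM step, assuming the `mgcv` package and again using survival's `veteran` data in place of the package's example data. It fits the observed durations against the ELP ranks with a cubic regression spline (`bs = "cr"`, with `k = -1` for `mgcv`'s default basis size) and forms approximate 95% bands as the fit plus or minus 1.96 standard errors. The column names mirror the `gam.data` columns used in the package example, but this is an illustration under stated assumptions: it fits every observation, censored or not, and does not reproduce how `coxed.gam` itself handles censoring or `newdata`.

```r
library(survival)
library(mgcv)

# Assumption: survival's veteran data stand in for the package's example data
vet <- survival::veteran
fit <- coxph(Surv(time, status) ~ karno + age + diagtime,
             data = vet, method = "breslow")

# ELP and its rank, as in the earlier steps
elp <- drop(exp(model.matrix(fit) %*% coef(fit)))
gam.data <- data.frame(y = vet$time, rank.xb = rank(elp))

# Cubic regression spline of duration on ELP rank
gam.model <- gam(y ~ s(rank.xb, bs = "cr", k = -1), data = gam.data)

# Fitted values with approximate 95% bands (fit +/- 1.96 * standard error)
pred <- predict(gam.model, se.fit = TRUE)
gam.data$gam_fit <- as.numeric(pred$fit)
gam.data$gam_fit_95lb <- gam.data$gam_fit - 1.96 * as.numeric(pred$se.fit)
gam.data$gam_fit_95ub <- gam.data$gam_fit + 1.96 * as.numeric(pred$se.fit)

# Expected duration for each observation in the estimation sample
exp.dur <- gam.data$gam_fit
summary(exp.dur)
```

Because the smoother is fit to ranks rather than raw ELP values, a few observations with extreme risk scores cannot dominate the shape of the curve.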

## Value

Returns a list containing the following components:

- `exp.dur`: A vector of predicted mean durations for the estimation sample if `newdata` is omitted, or else for the specified new data.
- `gam.model`: Output from the `gam` function in which the durations are fit against the ranks of the exponentiated linear predictors from the Cox model.
- `gam.data`: Fitted values and confidence intervals from the GAM model.

## Author(s)

Jonathan Kropko <jkropko@virginia.edu> and Jeffrey J. Harden <jharden2@nd.edu>

## References

Kropko, J. and Harden, J. J. (2018). Beyond the Hazard Ratio: Generating Expected Durations from the Cox Proportional Hazards Model. British Journal of Political Science. https://doi.org/10.1017/S000712341700045X

Wood, S. N., Pya, N. and Saefken, B. (2016). Smoothing parameter and model selection for general smooth models (with discussion). Journal of the American Statistical Association, 111, 1548-1575. http://dx.doi.org/10.1080/01621459.2016.1180986

## See Also

`gam`, `coxed`
## Examples

```r
library(survival)
library(coxed)

mv.surv <- Surv(martinvanberg$formdur, event = rep(1, nrow(martinvanberg)))
mv.cox <- coxph(mv.surv ~ postel + prevdef + cont + ident + rgovm + pgovno +
                  tpgovno + minority, method = "breslow", data = martinvanberg)

ed <- coxed.gam(mv.cox)
summary(ed$gam.data)
summary(ed$gam.model)
ed$exp.dur

# Plotting the GAM fit
## Not run:
require(ggplot2)
ggplot(ed$gam.data, aes(x = rank.xb, y = y)) +
  geom_point() +
  geom_line(aes(x = rank.xb, y = gam_fit)) +
  geom_ribbon(aes(ymin = gam_fit_95lb, ymax = gam_fit_95ub), alpha = .5) +
  xlab("Cox model LP rank (smallest to largest)") +
  ylab("Duration")
## End(Not run)

# Running coxed.gam() on a bootstrap sample and with new coefficients
bsample <- sample(1:nrow(martinvanberg), nrow(martinvanberg), replace = TRUE)
newcoefs <- rnorm(8)
ed2 <- coxed.gam(mv.cox, b.ind = bsample, coef = newcoefs)
```