predict.emfrail: Predicted hazard and survival curves from an 'emfrail' object
In frailtyEM: Fitting Frailty Models with the EM Algorithm


Predicted hazard and survival curves from an emfrail object

## S3 method for class 'emfrail'
predict(object, newdata = NULL, lp = NULL,
  strata = NULL, quantity = c("cumhaz", "survival"),
  type = c("conditional", "marginal"), conf_int = NULL,
  individual = FALSE, conf_level = 0.95, ...)

`object`	An `emfrail` fit object
`newdata`	A data frame with the same variable names as those that appear in the `emfrail` formula, used to calculate the `lp` (optional).
`lp`	A vector of linear predictor values at which to calculate the curves. Default is 0 (baseline).
`strata`	The name of the strata (if applicable) for which the prediction should be made.
`quantity`	Can be `"cumhaz"` and/or `"survival"`. The quantity to be calculated for the values of `lp`.
`type`	Can be `"conditional"` and/or `"marginal"`. The type of the quantity to be calculated.
`conf_int`	Can be `"regular"` and/or `"adjusted"`. The type of confidence interval to be calculated.
`individual`	Logical. Are the observations in `newdata` from the same individual? See details.
`conf_level`	The confidence level of the intervals. By default, 95% confidence intervals are calculated.
`...`	Ignored

The function calculates predicted cumulative hazard and survival curves for given covariate or linear predictor values; for the former, newdata must be specified and for the latter, lp must be specified. Each row of newdata or element of lp is considered to be a different subject, and the desired predictions are produced for each of them separately.

In newdata two columns may be specified with the names tstart and tstop. In this case, each subject is assumed to be at risk only during the times specified by these two values. If the two are not specified, the predicted curves are produced for a subject that is at risk for the whole follow-up time.

The behaviour is slightly different if individual == TRUE. In this case, all the rows of newdata are assumed to come from the same individual, tstart and tstop must be specified, and the intervals they define must not overlap. This may be used for describing subjects that are not at risk during certain periods or subjects with time-dependent covariate values.

The two "quantities" that can be returned are named cumhaz and survival. If we denote each quantity with q, then the columns with the conditional estimates are named q and the columns with the marginal estimates are named q_m. The columns with confidence bounds take the name of the estimate (conditional or marginal) followed by _l for the lower bound or _r for the upper bound. The bounds calculated with the adjusted standard errors have the name of the regular bounds followed by _a. For example, the adjusted lower bound for the marginal survival is in the column named survival_m_l_a.
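The naming rules above can be sketched as a small function. This is only an illustration: pred_col is a hypothetical helper, not part of frailtyEM, and its argument names are assumptions made for the example.

```r
# Hypothetical helper (not part of frailtyEM): builds a column name
# following the naming rules described above.
pred_col <- function(quantity = c("cumhaz", "survival"),
                     marginal = FALSE,
                     bound = c("none", "lower", "upper"),
                     adjusted = FALSE) {
  quantity <- match.arg(quantity)
  bound <- match.arg(bound)
  name <- quantity
  if (marginal) name <- paste0(name, "_m")        # marginal estimate
  if (bound == "lower") name <- paste0(name, "_l")
  if (bound == "upper") name <- paste0(name, "_r")
  # The "_a" suffix only applies to confidence bounds
  if (adjusted && bound != "none") name <- paste0(name, "_a")
  name
}

# Adjusted lower bound of the marginal survival
pred_col("survival", marginal = TRUE, bound = "lower", adjusted = TRUE)
# "survival_m_l_a"
```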

emfrail only gives the Breslow estimates of the baseline hazard λ_0(t) at the event time points, conditional on the frailty. Let λ(t) be the hazard for a linear predictor of interest. The estimated conditional cumulative hazard is then Λ(t) = ∑_{s= 0}^t λ(s), where the sum runs over the event time points up to t. The variance of Λ(t) can be calculated from the (possibly adjusted) variance-covariance matrix.
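As a rough illustration of the sum above, the following sketch uses made-up hazard increments rather than output from emfrail. It assumes the usual proportional hazards relation, in which the hazard for a given linear predictor is the baseline hazard multiplied by exp(lp); the object names are chosen for the example only.

```r
# Made-up Breslow-type estimates of the baseline hazard at four event times
event_times <- c(2, 5, 9, 14)
lambda0 <- c(0.05, 0.10, 0.08, 0.12)

# Assumed linear predictor value (0 would give the baseline)
lp <- 0.5

# Hazard at each event time for this linear predictor
# (assumption: proportional hazards, lambda = lambda0 * exp(lp))
lambda <- lambda0 * exp(lp)

# Conditional cumulative hazard: running sum over the event times
Lambda <- cumsum(lambda)

data.frame(time = event_times, lp = lp, cumhaz = Lambda)
```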

The conditional survival is obtained by the usual expression S(t) = \exp(-Λ(t)). The marginal survival is given by

\bar S(t) = E \left[\exp(-Λ(t)) \right] = \mathcal{L}(Λ(t)),

i.e. the Laplace transform of the frailty distribution calculated in Λ(t).

The marginal cumulative hazard is obtained as

\bar Λ(t) = - \log \bar S(t).
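To make the Laplace transform step concrete, here is a sketch for one particular case: a gamma frailty with mean 1 and variance frailty_var, whose Laplace transform is (1 + frailty_var * s)^(-1 / frailty_var). The gamma choice, the parameterization by the variance and all the numbers are assumptions for illustration; they are not taken from an emfrail fit and need not match the package's own parameterization.

```r
# Laplace transform of a gamma frailty with mean 1 and variance frailty_var
# (assumed parameterization, for illustration only)
laplace_gamma <- function(s, frailty_var) {
  (1 + frailty_var * s)^(-1 / frailty_var)
}

# Made-up conditional cumulative hazard values
Lambda <- c(0.1, 0.3, 0.6, 1.0)
frailty_var <- 0.5

S_cond <- exp(-Lambda)                        # conditional survival
S_marg <- laplace_gamma(Lambda, frailty_var)  # marginal survival
Lambda_marg <- -log(S_marg)                   # marginal cumulative hazard

# Numerical check at one point: average exp(-z * s0) over the gamma density
s0 <- Lambda[3]
num <- integrate(
  function(z) exp(-z * s0) *
    dgamma(z, shape = 1 / frailty_var, rate = 1 / frailty_var),
  lower = 0, upper = Inf
)$value
all.equal(num, S_marg[3], tolerance = 1e-4)
```

In this sketch the marginal survival is never below the conditional survival at the same Λ(t), which is what averaging over the frailty distribution implies.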

The only standard errors that are available from emfrail are those for λ_0(t). From these, standard errors of \log Λ(t) may be calculated. On this scale, symmetric confidence intervals are built and then transformed to the desired scale.
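The construction of intervals on the log scale can be sketched as follows. The estimates and standard errors are made-up numbers, the normal quantile is an assumption about how the symmetric interval is formed, and the object names simply mirror the column naming convention; this is not the package's internal code.

```r
# Made-up cumulative hazard estimates and standard errors of log(Lambda(t))
Lambda <- c(0.2, 0.5, 0.9)
se_log_Lambda <- c(0.30, 0.25, 0.20)
conf_level <- 0.95

# Normal quantile for a two-sided interval (assumed)
z <- qnorm(1 - (1 - conf_level) / 2)

# Symmetric interval on the log scale
log_lower <- log(Lambda) - z * se_log_Lambda
log_upper <- log(Lambda) + z * se_log_Lambda

# Back to the cumulative hazard scale
cumhaz_l <- exp(log_lower)
cumhaz_r <- exp(log_upper)

# On the survival scale the bounds swap, since exp(-x) is decreasing
survival_l <- exp(-cumhaz_r)
survival_r <- exp(-cumhaz_l)

data.frame(cumhaz = Lambda, cumhaz_l, cumhaz_r,
           survival = exp(-Lambda), survival_l, survival_r)
```

Working on the log scale keeps the bounds for the cumulative hazard positive and the bounds for the survival between 0 and 1.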

The return value is a single data frame (if lp has length 1, newdata has 1 row or individual == TRUE) or a list of data frames corresponding to each value of lp or each row of newdata otherwise. The names of the columns in the returned data frames are as follows: time represents the unique event time points from the data set, lp is the value of the linear predictor (as specified in the input or as calculated from the rows of newdata). By default, for each lp a data frame will contain the following columns: cumhaz, survival, cumhaz_m, survival_m for the cumulative hazard and survival, conditional and marginal, with corresponding confidence bands. The naming of the columns is explained in more detail in the Details section.
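Because the result may be either a single data frame or a list of data frames, downstream code may want to handle both cases. A minimal sketch follows, using mock data frames with made-up numbers in place of real predict() output; stack_predictions is a hypothetical helper, not part of frailtyEM.

```r
# Hypothetical helper (not part of frailtyEM): returns one data frame
# whether the prediction is a single data frame or a list of them.
stack_predictions <- function(pred) {
  if (is.data.frame(pred)) return(pred)
  do.call(rbind, pred)
}

# Mock objects standing in for predict() output, with made-up numbers
single <- data.frame(time = c(2, 5), lp = 0, cumhaz = c(0.05, 0.15))
several <- list(
  data.frame(time = c(2, 5), lp = 0, cumhaz = c(0.05, 0.15)),
  data.frame(time = c(2, 5), lp = 1, cumhaz = c(0.14, 0.41))
)

stack_predictions(single)   # returned unchanged
stack_predictions(several)  # rows of both data frames, told apart by lp
```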

The linear predictor is taken as fixed, so the variability in the estimation of the regression coefficients is not taken into account. The function does not support left truncation (at the moment). This is because, if individual == TRUE and tstart and tstop are specified, the marginal estimates are calculated by integrating over the distribution of the frailty, and not over the distribution of the frailty given the truncation.

For performance reasons, consider running with conf_int = NULL; the reason is that the deltamethod function that is used to calculate the confidence intervals easily becomes slow when there is a large number of time points for the cumulative hazard.

plot.emfrail, autoplot.emfrail

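library(survival)   # provides Surv() and the kidney data set
library(frailtyEM)  # provides emfrail() and this predict method

kidney$sex <- ifelse(kidney$sex == 1, "male", "female")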
m1 <- emfrail(formula = Surv(time, status) ~  sex + age  + cluster(id),
              data =  kidney)

# get all the possible predictions for the value 0 of the linear predictor
predict(m1, lp = 0)

# get the cumulative hazards for two different values of the linear predictor
predict(m1, lp = c(0, 1), quantity = "cumhaz", conf_int = NULL)

# get the cumulative hazards for a female and for a male, both aged 30
newdata1 <- data.frame(sex = c("female", "male"),
                       age = c(30, 30))

predict(m1, newdata = newdata1, quantity = "cumhaz", conf_int = NULL)

# get the cumulative hazards for an individual that changes
# sex from female to male at time 40.
newdata2 <- data.frame(sex = c("female", "male"),
                      age = c(30, 30),
                      tstart = c(0, 40),
                      tstop = c(40, Inf))

predict(m1, newdata = newdata2,
        individual = TRUE,
        quantity = "cumhaz", conf_int = NULL)