bshazard: Nonparametric Smoothing of the Hazard Function

View source: R/bshazard.R

bshazardR Documentation

Nonparametric Smoothing of the Hazard Function

Description

The function estimates the hazard function non parametrically from a survival object (possibly adjusted for covariates). The smoothed estimate is based on B-splines from the perspective of generalized linear mixed models. Left truncated and right censoring data are allowed.

Usage


## Default S3 method:
bshazard(formula,data,nbin=NULL,nk,degree,l0,lambda,phi,alpha,err,verbose)

Arguments

formula

a formula object, which must have a Surv object as the response on the left of the ~ operator. On the right, if desired are the names of the covariates in the Poisson model separated by + operators. For unadjusted estimates the right hand side should be ~1.

data

a data frame containing the variables named in the formula.

nbin

number of bins (equally spaced). If omitted, the function will split the data at each distinct time in the data (censoring or event).

nk

number of knots for B-splines (including minimum and maximum, default is 31): as a rule of thumb the number of observations divided by 4, but more than 40 knots are rarely needed.

degree

degree of B-splines, representing the effective number of parameters in the piecewise segments of splines (default is 1, corresponding to a linear segment)

l0

Starting value for the smoothing parameter lambda (default is 10). Relevant only if lambda=NULL

lambda

smoothing parameter. If not provided (default is NULL) it is estimated from the data as phi times the reciprocal of the variance of the random effect (by an iterative algorithm). A high value imposes a smoother estimate.

phi

overdispersion parameter. If not provided (default is NULL) it is estimated from the data.

alpha

1 minus the level for the two-sided confidence interval for the hazard function. Default is 0.05.

err

relative error for the iterative process in the lambda/phi estimation (default is 0.0001)

verbose

TRUE/FALSE Print each iteration step (default is T)

Details

The time axis is partitioned in a number of intervals equal to the number of different observations (if not fixed otherwise by the option nbin). The number of events in each interval is modeled by a Poisson model and the smoothing parameter (lambda) is estimated by a mixed model framework. The number of knots is 31 by default. The code also allows for overdispersion (phi). Adjustment for covariates can increase the computation time.

Value

The output of the bshazard function can be divided into three parts: 1. a data frame with the original data split in time intervals, 2. the set of vectors containing the estimated hazard and confidence limits and 3. the parameter estimates.

raw.data

data frame with original data split in bins (intervals of time), containing:

time

mid point of each bin

n.event

number of events in each bin

PY

total person-time in each bin (for each covariate value)

raw.hazard

number of events in each interval divided by PY

...

covariate values

nobs

number of records in input data

time

mid point of each bin at which the curve is computed

hazard

hazard estimate for each bin

lower.ci

lower limit of the hazard confidence interval (depends on alpha level)

upper.ci

upper limit of the hazard confidence interval (depends on alpha level)

(cov.value)

values of covariates at which hazard is computed (mean value)

fitted.values

estimated number of events at each time (from the Poisson model)

coefficients

coefficient estimates for each covariate

phi

overdispersion parameter

sv2

variance of the random effect corresponding to the inverse of the smoothing parameter multiplied by phi

df

degrees of freedom representing the effective number of smoothing parameters

Author(s)

Paola Rebora, Agus Salim, Marie Reilly Maintainer: Paola Rebora <paola.rebora@unimib.it>

References

Rebora P, Salim A, Reilly M (2014) bshazard: A Flexible Tool for Nonparametric Smoothing of the Hazard Function.The R Journal Vol. 6/2:114-122.

Lee Y, Nelder JA, Pawitan Y (2006). Generalized Linear Models with Random Effects: Unified Analysis via H-likelihood, volume 106. Chapman & Hall/CRC.

Pawitan Y (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford University Press

Examples

data(cancer,package="survival")
  fit<-bshazard(Surv(time, status==2) ~ 1,data=lung)
  plot(fit$time, fit$hazard,xlab='Time', ylab='Rate/person-days',type='l')
  points(fit$raw.data$time,fit$raw.data$raw.hazard,cex=.3, lwd=3,col=1)
  lines(fit$time, fit$hazard, lty=1, lwd=3)
  lines(fit$time, fit$lower.ci, lty=1, lwd=1)
lines(fit$time, fit$upper.ci, lty=1, lwd=1)
# or alternatively
plot(fit)


#Example to graphically evaluate the Markov assumpion in an illness-death model
### data simulated under EXTENDED SEMI-MARKOV model
set.seed(123)
n  <- 500
beta<-log(3)
R  <- runif(n, min = 0, max =2)
cat <- cut(R, breaks = seq(0,2,0.5), labels = seq(1,4,1))
T12  <- 1/0.2*( (-log(runif(n, min = 0, max = 1))) / (exp(beta*R)))**(1/0.6)
C  <- runif(n, min =0, max = 10)
T12obs <- pmin(T12, C)
status  <- ifelse(T12obs < C, 1, 0)
T012obs <- R+T12obs
#R: simulated time to illness
#cat: time to illness categorised in 4 classes
#T12obs:  time from illness to death or censoring
#T012obs: time from origin to death or censoring
#status: indicator of death (1=death,0=censoring)

# Hazard estimate in the Clock Reset scale (time from illness) by time to illness class
fit <- bshazard(Surv(T12obs[cat == 1],status[cat==1]) ~ 1)
plot(fit,conf.int=FALSE,xlab='Time since illness',xlim=c(0,5),ylim=c(0,10),lwd=2,col=rainbow(1))
for(i in 2:4) {
  fit <- bshazard(Surv(T12obs[cat == i],status[cat==i]) ~ 1)
  lines(fit$time, fit$hazard, type = 'l', lwd = 2, col = rainbow(4)[i])
}
legend("top",title="Time to illness",c("<=0.5","0.5-|1","1-|1.5","1.5-|2"),col=c(rainbow(4)),lwd =2)

# Hazard estimate in the Clock Forward scale (time from origin) by time to illness class
fit <- bshazard(Surv(R[cat == 1],T012obs[cat == 1],status[cat==1]) ~ 1)
plot(fit,conf.int=FALSE,xlab='Time since origin',xlim=c(0,5),ylim=c(0,10),lwd=2,col=rainbow(1))
for(i in 2:4) {
  fit <- bshazard(Surv(R[cat == i],T012obs[cat == i],status[cat==i]) ~ 1)
  lines(fit$time, fit$hazard, type = 'l', lwd = 2, col = rainbow(4)[i])
}
legend("top",title="Time to illness",c("<=0.5","0.5-|1","1-|1.5","1.5-|2"),col=c(rainbow(4)),lwd =2)

#Alternatively an adjusted estimate can be performed, assuming proportionl hazard. 
##This computation can be time consuming!
## Not run: fit.adj <- bshazard(Surv(T12obs,status) ~ cat )
plot(fit.adj,overall=FALSE, xlab = 'Time since illness',col=rainbow(4),lwd=2, xlim = c(0,5), 
ylim = c(0,10))
legend("top",title="Time to illness",c("<=0.5","0.5-|1","1-|1.5","1.5-|2"),col=c(rainbow(4)),lwd =2)
## End(Not run)

### A more general setting with examples of Markov assumption evaluation can be found 
### in Bernasconi et al. Stat in Med 2016
## The function is currently defined as
function (x, ...) 
UseMethod("bshazard")

bshazard documentation built on May 29, 2024, 11:30 a.m.