Competings Risks Regression

Description

Fits a semiparametric model for the cause-specific quantities :

P(T < t, cause=1 | x,z) = P_1(t,x,z) = h( g(t,x,z) )

for a known link-function h() and known prediction-function g(t,x,z) for the probability of dying from cause 1 in a situation with competing causes of death.

We consider the following models : 1) the additive model where h(x)=1-\exp(-x) and

g(t,x,z) = x^T A(t) + (diag(t^p) z)^T β

2) the proportional setting that includes the Fine & Gray (FG) "prop" model and some extensions where h(x)=1-\exp(-\exp(x)) and

g(t,x,z) = (x^T A(t) + (diag(t^p) z)^T β)

The FG model is obtained when x=1, but the baseline is parametrized as \exp(A(t)).

The "fg" model is a different parametrization that contains the FG model, where h(x)=1-\exp(-x) and

g(t,x,z) = (x^T A(t)) \exp((diag(t^p) z)^T β)

The FG model is obtained when x=1.

3) a "logistic" model where h(x)=\exp(x)/( 1+\exp(x)) and

g(t,x,z) = x^T A(t) + (diag(t^p) z)^T β

The "logistic2" is

P_1(t,x,z) = x^T A(t) exp((diag(t^p) z)^T β)/ (1+ x^T A(t) exp((diag(t^p) z)^T β))

The simple logistic model with just a baseline can also be fitted by an alternative procedure that has better small sample properties see prop.odds.subist().

4) the relative cumulative incidence function "rcif" model where h(x)=\exp(x) and

g(t,x,z) = x^T A(t) + (diag(t^p) z)^T β

The "rcif2"

P_1(t,x,z) = (x^T A(t)) \exp((diag(t^p) z)^T β)

Where p by default is 1 for the additive model and 0 for the other models. In general p may be powers of the same length as z.

Usage

1
2
3
4
5
6
7
8
comp.risk(formula,data=sys.parent(),cause,times=NULL,Nit=50,
clusters=NULL,est=NULL,fix.gamma=0,gamma=0,n.sim=0,weighted=0,model="fg",
detail=0,interval=0.01,resample.iid=1,
cens.model="KM",cens.formula=NULL,
time.pow=NULL,time.pow.test=NULL,silent=1,conv=1e-6,
weights=NULL,max.clust=1000,n.times=50,first.time.p=0.05,
estimator=1,trunc.p=NULL,cens.weights=NULL,admin.cens=NULL,
conservative=1,monotone=0,step=NULL)

Arguments

formula

a formula object, with the response on the left of a '~' operator, and the terms on the right. The response must be a survival object as returned by the ‘Event’ function. The status indicator is not important here. Time-invariant regressors are specified by the wrapper const(), and cluster variables (for computing robust variances) by the wrapper cluster().

data

a data.frame with the variables.

cause

For competing risk models specificies which cause we consider.

times

specifies the times at which the estimator is considered. Defaults to all the times where an event of interest occurs, with the first 10 percent or max 20 jump points removed for numerical stability in simulations.

Nit

number of iterations for Newton-Raphson algorithm.

clusters

specifies cluster structure, for backwards compability.

est

possible starting value for nonparametric component of model.

fix.gamma

to keep gamma fixed, possibly at 0.

gamma

starting value for constant effects.

n.sim

number of simulations in resampling.

weighted

Not implemented. To compute a variance weighted version of the test-processes used for testing time-varying effects.

model

"additive", "prop"ortional, "rcif", or "logistic".

detail

if 0 no details are printed during iterations, if 1 details are given.

interval

specifies that we only consider timepoints where the Kaplan-Meier of the censoring distribution is larger than this value.

resample.iid

to return the iid decomposition, that can be used to construct confidence bands for predictions

cens.model

specified which model to use for the ICPW, KM is Kaplan-Meier alternatively it may be "cox"

cens.formula

specifies the regression terms used for the regression model for chosen regression model. When cens.model is specified, the default is to use the same design as specified for the competing risks model.

time.pow

specifies that the power at which the time-arguments is transformed, for each of the arguments of the const() terms, default is 1 for the additive model and 0 for the proportional model.

time.pow.test

specifies that the power the time-arguments is transformed for each of the arguments of the non-const() terms. This is relevant for testing if a coefficient function is consistent with the specified form A_l(t)=beta_l t^time.pow.test(l). Default is 1 for the additive model and 0 for the proportional model.

silent

if 0 information on convergence problems due to non-invertible derviates of scores are printed.

conv

gives convergence criterie in terms of sum of absolute change of parameters of model

weights

weights for estimating equations.

max.clust

sets the total number of i.i.d. terms in i.i.d. decompostition. This can limit the amount of memory used by coarsening the clusters. When NULL then all clusters are used. Default is 1000 to save memory and time.

first.time.p

first point for estimation is pth percentile of cause jump times.

n.times

only uses 50 points for estimation, if NULL then uses all points, subject to p.start condition.

estimator

default estimator is 1.

trunc.p

truncation weight for delayed entry, P(T > entry.time | Z_i), typically Cox model.

cens.weights

censoring weights can be given here rather than calculated using the KM, cox or aalen models.

admin.cens

censoring times for the administrative censoring

conservative

set to 0 to compute correct variances based on censoring weights, default is conservative estimates that are much quicker.

monotone

monotone=0, uses estimating equations

(D_β P_1) w(t) ( Y(t)/G_c(t) - P_1(t,X)) and

montone 1 uses

w(t) ( Y(t)/G_c(t) - P_1(t,X)) and

step

step size for Fisher-Scoring algorithm.

Details

Since timereg version 1.8.4. the response must be specified with the Event function instead of the Surv function and the arguments. For example, if the old code was

comp.risk(Surv(time,cause>0)~x1+x2,data=mydata,cause=mydata$cause,causeS=1)

the new code is

comp.risk(Event(time,cause)~x1+x2,data=mydata,cause=1)

Also the argument cens.code is now obsolete since cens.code is an argument of Event.

Value

returns an object of type 'comprisk'. With the following arguments:

cum

cumulative timevarying regression coefficient estimates are computed within the estimation interval.

var.cum

pointwise variances estimates.

gamma

estimate of proportional odds parameters of model.

var.gamma

variance for gamma.

score

sum of absolute value of scores.

gamma2

estimate of constant effects based on the non-parametric estimate. Used for testing of constant effects.

obs.testBeq0

observed absolute value of supremum of cumulative components scaled with the variance.

pval.testBeq0

p-value for covariate effects based on supremum test.

obs.testBeqC

observed absolute value of supremum of difference between observed cumulative process and estimate under null of constant effect.

pval.testBeqC

p-value based on resampling.

obs.testBeqC.is

observed integrated squared differences between observed cumulative and estimate under null of constant effect.

pval.testBeqC.is

p-value based on resampling.

conf.band

resampling based constant to construct 95% uniform confidence bands.

B.iid

list of iid decomposition of non-parametric effects.

gamma.iid

matrix of iid decomposition of parametric effects.

test.procBeqC

observed test process for testing of time-varying effects

sim.test.procBeqC

50 resample processes for for testing of time-varying effects

conv

information on convergence for time points used for estimation.

Author(s)

Thomas Scheike

References

Scheike, Zhang and Gerds (2008), Predicting cumulative incidence probability by direct binomial regression,Biometrika, 95, 205-220.

Scheike and Zhang (2007), Flexible competing risks regression modelling and goodness of fit, LIDA, 14, 464-483.

Martinussen and Scheike (2006), Dynamic regression models for survival data, Springer.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
data(bmt); 

clust <- rep(1:204,each=2)
addclust<-comp.risk(Event(time,cause)~platelet+age+tcell+cluster(clust),data=bmt,
cause=1,resample.iid=1,n.sim=100,model="additive")
###

addclust<-comp.risk(Event(time,cause)~+1+cluster(clust),data=bmt,cause=1,
		    resample.iid=1,n.sim=100,model="additive")
pad <- predict(addclust,X=1)
plot(pad)

add<-comp.risk(Event(time,cause)~platelet+age+tcell,data=bmt,
cause=1,resample.iid=1,n.sim=100,model="additive")
summary(add)

par(mfrow=c(2,4))
plot(add); 
### plot(add,score=1) ### to plot score functions for test

ndata<-data.frame(platelet=c(1,0,0),age=c(0,1,0),tcell=c(0,0,1))
par(mfrow=c(2,3))
out<-predict(add,ndata,uniform=1,n.sim=100)
par(mfrow=c(2,2))
plot(out,multiple=0,uniform=1,col=1:3,lty=1,se=1)

add<-comp.risk(Event(time,cause)~platelet+age+tcell,data=bmt,
	       cause=1,resample.iid=0,n.sim=0,cens.model="cox",
	       cens.formula=~factor(platelet),model="additive")

out<-predict(add,ndata,se=0,uniform=0)
par(mfrow=c(2,2))
plot(out,multiple=0,se=0,uniform=0,col=1:3,lty=1)

## fits additive model with some constant effects 
add.sem<-comp.risk(Event(time,cause)~
const(platelet)+const(age)+const(tcell),data=bmt,
cause=1,resample.iid=1,n.sim=100,model="additive")
summary(add.sem)

out<-predict(add.sem,ndata,uniform=1,n.sim=100)
par(mfrow=c(2,2))
plot(out,multiple=0,uniform=1,col=1:3,lty=1,se=0)

## Fine & Gray model 
fg<-comp.risk(Event(time,cause)~
const(platelet)+const(age)+const(tcell),data=bmt,
cause=1,resample.iid=1,model="fg",n.sim=100)
summary(fg)

out<-predict(fg,ndata,uniform=1,n.sim=100)

par(mfrow=c(2,2))
plot(out,multiple=1,uniform=0,col=1:3,lty=1,se=0)

## extended model with time-varying effects
fg.npar<-comp.risk(Event(time,cause)~platelet+age+const(tcell),
data=bmt,cause=1,resample.iid=1,model="prop",n.sim=100)
summary(fg.npar); 

out<-predict(fg.npar,ndata,uniform=1,n.sim=100)
head(out$P1[,1:5]); head(out$se.P1[,1:5])

par(mfrow=c(2,2))
plot(out,multiple=1,uniform=0,col=1:3,lty=1,se=0)

## Fine & Gray model with alternative parametrization for baseline
fg2<-comp.risk(Event(time,cause)~const(platelet)+const(age)+const(tcell),data=bmt,
cause=1,resample.iid=1,model="prop",n.sim=100)
summary(fg2)

#################################################################
## Delayed entry models, 
#################################################################
nn <- nrow(bmt)
entrytime <- rbinom(nn,1,0.5)*(bmt$time*runif(nn))
bmt$entrytime <- entrytime
times <- seq(5,70,by=1)

bmtw <- prep.comp.risk(bmt,times=times,time="time",entrytime="entrytime",cause="cause")

## non-parametric model 
outnp <- comp.risk(Event(time,cause)~tcell+platelet+const(age),
		   data=bmtw,cause=1,fix.gamma=1,gamma=0,
 cens.weights=bmtw$cw,weights=bmtw$weights,times=times,n.sim=0)
par(mfrow=c(2,2))
plot(outnp)

outnp <- comp.risk(Event(time,cause)~tcell+platelet,
		   data=bmtw,cause=1,
 cens.weights=bmtw$cw,weights=bmtw$weights,times=times,n.sim=0)
par(mfrow=c(2,2))
plot(outnp)


## semiparametric model 
out <- comp.risk(Event(time,cause)~const(tcell)+const(platelet),data=bmtw,cause=1,
 cens.weights=bmtw$cw,weights=bmtw$weights,times=times,n.sim=0)
summary(out)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.