#' Sample-Weighted Prevalence-Incidence Mixture Models for Competing Incident Events (as of 03/03/2020)
#'
#' This package fits competing risks models to failure time (or survival time) data for two competing events.
#' Failure time for one event of the competing events can be prevalent left-censored, interval-censored or a mixture
#' of truly incident disease and missed prevalent disease when disease ascertainment is not always conducted at baseline,
#' while failure time for the other event is only interval-censored. Baseline is set to be time 0.
#' General transformation,G(x)=(1+r*x)/r if r>0; =x if r=0, is used for a subdistribution hazard function multiplied by an exponential effect of a linear combination of risk factors
#' for flexible incidence models. Logistic regression models are used for prevalence. The IPW log-likelihood approach, which uses the inverse of sample inclusion probabilities,
#' is employed to account for different sampling fractions across strata.
#'
#' @importFrom fdrtool gcmlcm
#' @importFrom nloptr cobyla
#'
#' @param Data Data used to fit the model containing columns for each term in p.model, i.model1 and i.model2 expressions.
#' For stratified random sampling designs, columns denoted samp.wgt and strata are expected indicating the sampling weights and sampling strata.
#' population="super" option, an additional column denoted strata.frac is expected indicating the fraction of the population
#' that consists of each strata. For example, if in the target population there are three strata that occurs with proportions 0.2, 0.4, and 0.6,
#' then strata.frac will take values of 0.2, 0.4 or 0.6.
#' @param p.model The prevalence model for event 1 to be fitted, specified using an expression of the form \emph{C~model}.
#' Elements in the expression are as followed:
#' \itemize{
#' \item c - Numeric variable indicating whether the event was prevalent at time zero, taking values of 1="Yes", 0="No", -999="Unknown";
#' \item model - Linear predictor consisting of a series of terms separated by \emph{+} operators.
#' }
#' @param i.model1 The incidence model for event 1 to be fitted using an expression of the form \emph{C+L1+R1~model1}
#' \itemize{
#' \item C - Numeric variable indicating whether the event was prevalent at time zero, taking values of 1="Yes", 0="No", -999="Unknown";
#' \item L1 - Numeric starting time of the interval in which event 1 occurred, with -999 denoting known prevalent events;
#' \item R1 - Ending time of the interval in which event 1 occurred, with -999 and Inf denoting known prevalent event 1 and right-censoring, respectively;
#' \item model1 - Linear predictor consisting of a series of terms separated by \emph{+} operators.
#' }
#' @param i.model2 The incidence model for event 2 to be fitted, specified using an expression of the form \emph{L2+R2~model2}
#' \itemize{
#' \item L2 - Numeric starting time of the interval in which event 1 occurred, with -999 denoting known prevalent event 1;
#' \item R2 - Ending time of the interval in which event 1 occurred, with -999 and Inf denoting known prevalent event 1 and right-censoring, respectively;
#' \item model2 - Linear predictor consisting of a series of terms separated by \emph{+} operators.
#' }
#' @param trans.r1 The parameter "r" for the transformation function for event 1, G(x)=log(1+rx)/r for r>0;G(x)=x for r=0 (default),which indicates proportional hazards model for the subdistribution hazard function.
#' @param trans.r2 The parameter "r" for the transformation function for event 2. Default to 0.
#' @param n.beta is The number of regressors expressed in the p.model plus 1 (for intercept). If p.model is "C~1", n.beta=1.
#' @param n.gamma1 The number of regressors expressed in the i.model1. If i.model1 is "C+L1+R1~1", n.gamma1=0.
#' @param n.gamma2 The number of regressors expressed in the i.model2. If i.model2 is "L2+R2~1", n.gamma2=0.
#' @param reg.initials The initial values for regression coefficients in the order of (p.model, i.model1, i.model2). The number of components for reg.initials is n.beta+n.gamma1+n.gamma2. Default to be NULL.
#' @param convergence.criteria The criterion for the convergence of the iterated algorithm. Default to 0.001
#' @param iteration.limit The maximum number allowed for the iteration of the algorithm. Default to 250.
#' @param time.interval time.interval determines how finner finite time points are evenly divided, at which subdistribution hazard functions are estimated. Default to 0.1.
#' @param time.list a vector of finite time points at which subdistribution hazard functions are estimated. Default to NULL. For example, when an irregular spaced time points are of interest, time.list=c(1,3,8,10).
#' @param population options="super" and "finite" include variation due to super-population sampling and finite sampling from the super-population and variation due to finite sampling from a finite population, respectively. Default to "super".
#' @param anal.var analytical variance estimation is provided when anal.var=TRUE and the inverse information matrix exists. Default to TRUE.
#'
#' @return The output is a list of class ipw.pi.competing.risks, which contains the following elements.
#' \itemize{
#' \item data.summary: A data frame containing the following:
#' Included subjects - number of observations with complete data;
#' Known prevalent event 1 - the number of events known to be prevalent at time zero;
#' Incident event 1 - the number of event times for event 1 occuring in the interval (L1>=0,R1<Inf] and C=0;
#' Incident event 2 - the number of event times for event 2 occuring in the interval (L2>=0,R2<Inf] and C=0;
#' Left censored event 1 - the number of event times known to occur by R1<Inf, but can also have been prevalent at time zero, that is C=-999;
#' Right censoring - the number of observations right-censored with event time occurring in the interval (L1>0,R1=Inf) or (L2>0,R2=Inf) with C=0;
#' Missing prevalent+right censoring - the number of observations with intervals (0,Inf) for both events 1 and 2 and C=-999.
#' Maximum follow-up time for event 1
#' Maximum follow-up time for event 2
#' Maximum right censoring time for event 1
#' Maximum right censoring time for event 2
#' \item reg.coef: A data frame summarizing parameter values, standard errors, and 95 percent confidence intervals.
#' \item reg.covariance: The analytical asymptotic covariance matrix exists for the regression coefficient estimates
#' \item prevalence: A vector of prevalence estimates given covariates specified in p.model
#' \item subdist.hazard1: A data frame includes two columns, time and estimated subdistribution hazard for event 1.
#' \item subdist.hazard2: A data frame includes two columns, time and estimated subdistribution hazard for event 2.
#' \item subdist.hazard.fn1: A function returns estimated subdistribution hazard for event 1 when time points are input.
#' \item subdist.hazard.fn2: A function returns estimated subdistribution hazard for event 2 when time points are input.
#' \item convergence: Convergence statistics
#' \item run.time.mins: The elapsed computation time in minutes.
#' \item loglike: Sample-weighted log-likelikelihood at the estimated parameters
#' \item trans.r: A vector of the specified parameters for the transformation functions used for events 1 and 2
#' \item models: A vector of the specified models, p.model, i.model1 and i.model2
#' }
#'
#' @author Noorie Hyun, \email{nhyun@mcw.edu}, Xiao Li \email{xiaoli@mcw.edu}
#'
#' @references
#' \itemize{
#' \item Hyun N, Katki HA, Graubard BI.
#' Sample-Weighted Semiparametric Estimation of Cause-Specific Cumulative Risk and Incidence Using Left or Interval-Censored Data from Electronic Health Records. Statistics in Medicine 2020;
#' under the 2nd review.
#' }
#'
#' @export
#'
ipw.pi.competing<-function(Data,p.model,i.model1,i.model2,trans.r1=0,trans.r2=0,n.beta=1, n.gamma1=0,n.gamma2=0,
reg.initials=NULL,convergence.criteria=0.001,iteration.limit=250,
time.interval=0.1,time.list=NULL,population="super",anal.var=TRUE,...){
#library('fdrtool') is required for Lambda estimates
#library('nloptr') is required for regression parameter estimates with constraints
#population option "super" or "finite"
########## Transformation functions #############################
trans<-function(r,x){
if(r==0){
output<-x
}else if(r>0){
output<-log(1+r*x)/r
}
return(output)
}
dG<-function(r,x){
if(r==0){
output<-1
}else if(r>0){
output<-1/(1+r*x)
}
return(output)
}
inv.trans<-function(r,x){
if(r==0){
output<-x
}else if(r>0){
output<- (exp(r*x)-1)/r
}
return(output)
}
######################### semiparametric weighted log-like ########################################################
wobs.semipara.like<-function(na.drop=FALSE,trans.r1,trans.r2,para,n.beta,n.gamma1,n.gamma2,design.mat,xmat1,xmat2,sdata3){
#sdata3 should include the variables of LambdaL and LambddaR
n.para<-n.beta+n.gamma1+n.gamma2
mm<-n.beta+1
mm2<-n.beta+n.gamma1
mm3<-n.beta+n.gamma1+1
betta<-para[1:n.beta]
if(n.gamma1>0&n.gamma2>0){
gam1<-para[mm:mm2]
gam2<-para[mm3:n.para]
gamma.risk1<-xmat1%*%gam1
gamma.risk2<-xmat2%*%gam2
} else if(n.gamma1==0&n.gamma2>0){
gam1<-0
gam2<-para[mm3:n.para]
gamma.risk1<-0
gamma.risk2<-xmat2%*%gam2
} else if(n.gamma1>0&n.gamma2==0){
gam1<-para[mm:mm2]
gam2<-0
gamma.risk1<-xmat1%*%gam1
gamma.risk2<-0
}else if(n.gamma1==0&n.gamma2==0){
gam1<-0
gam2<-0
gamma.risk1<-0
gamma.risk2<-0
}
if(n.beta>1){
beta.risk<-design.mat%*%betta
}else if(n.beta==1){
beta.risk<-design.mat*betta
}
LambdaL1<-sdata3$LambdaL1
LambdaR1<-sdata3$LambdaR1
LambdaL2<-sdata3$LambdaL2
LambdaR2<-sdata3$LambdaR2
event<-sdata3$event
C<-sdata3$C
samp.wgt<-sdata3$samp.wgt
# 3 cases
#C=-999 and event=1 or -999
#C=-999 and event=2/3 ;C=0 and event=1/2/3
#C=1
#event==1/-999
gp1<-which(C==0&event==1)
gp2<-which(C==0&event==2)
gp3<-which(C==0&event==3)
gp4<-which(C==-999&event==1)
gp5<-which(C==-999&event==3)
prev<-exp(beta.risk)/(1+exp(beta.risk))
lp<-log(prev)*samp.wgt
lcomp<-log(1-prev)*samp.wgt
e.risk1<-as.numeric(exp(gamma.risk1))
e.risk2<-as.numeric(exp(gamma.risk2))
######### Transformation of Lambda(t)*e.risk #################
G.L1<-trans(trans.r1,LambdaL1*e.risk1)
G.R1<-trans(trans.r1,LambdaR1*e.risk1)
G.L2<-trans(trans.r2,LambdaL2*e.risk2)
G.R2<-trans(trans.r2,LambdaR2*e.risk2)
cond1<-exp(-G.L1)-exp(-G.R1)
cond2<-exp(-G.L2)-exp(-G.R2)
cond3<-exp(-G.L1)+exp(-G.L2)
sgr1<-which(cond1>0)
sgr2<-which(cond2>0)
sgr3<-which(cond3>1)
B.1<-log( (exp(-G.L1)-exp(-G.R1))[intersect(gp1,sgr1)] )*samp.wgt[intersect(gp1,sgr1)]
B.2<-log( (exp(-G.L2)-exp(-G.R2))[intersect(gp2,sgr2)])*samp.wgt[intersect(gp2,sgr2)]
if(na.drop==TRUE){
B.3<-log( (exp(-G.L1)+exp(-G.L2)-1 )[intersect(gp3,sgr3)])*samp.wgt[intersect(gp3,sgr3)]
}else if(na.drop==FALSE){
B.3<-log( (exp(-G.L1)+exp(-G.L2)-1 )[gp3])*samp.wgt[gp3]
}
#B.3[is.na(B.3)|B.3==-Inf]<-0
B.4<-log( (prev+(1-prev)*(1-exp(-G.R1)))[gp4])*samp.wgt[gp4]
B.5<-log( (prev+(1-prev)*exp(-G.L2))[gp5])*samp.wgt[gp5]
output<-sum(lp[C==1],lcomp[C==0],B.1,B.2,B.3,B.4,B.5)
return(output)
} #pseudo-log-likelihood
wsemi.opt <- function(para){
-wobs.semipara.like(na.drop=FALSE,trans.r1,trans.r2,para,n.beta,n.gamma1,n.gamma2,design.mat,xmat1,xmat2,sdata3)
}
wsemi.opt.subgr <- function(para){
-wobs.semipara.like(na.drop=TRUE,trans.r1,trans.r2,para,n.beta,n.gamma1,n.gamma2,design.mat,xmat1,xmat2,sdata3)
}
score.f<-function(trans.r1,trans.r2,para,n.beta,n.gamma1,n.gamma2,design.mat,xmat1,xmat2,sdata3){
n.samp<-nrow(sdata3)
n.para<-n.beta+n.gamma1+n.gamma2
mm<-n.beta+1
mm2<-n.beta+n.gamma1
mm3<-n.beta+n.gamma1+1
betta<-para[1:n.beta]
if(n.beta>1){
beta.risk<-design.mat%*%betta
}else if(n.beta==1){
beta.risk<-design.mat*betta
}
if(n.gamma1>0){
gam1<-para[mm:mm2]
gamma.risk1<-xmat1%*%gam1
} else if(n.gamma1==0){
gam1<-0
gamma.risk1<-0
}
if(n.gamma2>0){
gam2<-para[mm3:n.para]
gamma.risk2<-xmat2%*%gam2
} else if(n.gamma2==0){
gam2<-0
gamma.risk2<-0
}
LambdaL1<-sdata3$LambdaL1
LambdaR1<-sdata3$LambdaR1
LambdaL2<-sdata3$LambdaL2
LambdaR2<-sdata3$LambdaR2
event<-sdata3$event
C<-sdata3$C
samp.wgt<-sdata3$samp.wgt
# 3 cases
#C=-999 and event=1 or -999
#C=-999 and event=2/3 ;C=0 and event=1/2/3
#C=1
prev<-exp(beta.risk)/(1+exp(beta.risk))
comp.prev<-1/(1+exp(beta.risk))
lp<-log(prev)
lcomp<-log(1-prev)
e.risk1<-as.numeric(exp(gamma.risk1))
e.risk2<-as.numeric(exp(gamma.risk2))
e.beta.risk<-as.numeric(exp(beta.risk))
gp1<-which(C==0&event==1)
gp2<-which(C==0&event==2)
gp3<-which(C==0&event==3)
gp4<-which(C==-999&event==1)
gp5<-which(C==-999&event==3)
L1.e1<-LambdaL1*e.risk1
R1.e1<-LambdaR1*e.risk1
L2.e2<-LambdaL2*e.risk2
R2.e2<-LambdaR2*e.risk2
G.L1<-trans(trans.r1,L1.e1)
G.R1<-trans(trans.r1,R1.e1)
G.L2<-trans(trans.r2,L2.e2)
G.R2<-trans(trans.r2,R2.e2)
S2.L<-exp(- G.L2)
S2.R<-exp(-G.R2)
S1.L<-exp(-G.L1)
S1.R<-exp(-G.R1)
#gradient of beta
gr.beta<-matrix(rep(n.samp*n.beta),n.samp,n.beta)
beta.cons<-rep(0,n.samp)
beta.cons[C==1]<-comp.prev[C==1] #C==1
beta.cons[C==0]<-(-prev)[C==0]
beta.cons[gp5]<-(prev*(1-S2.L)/(e.beta.risk+S2.L))[gp5]
beta.cons[gp4]<-(prev*S1.R/(e.beta.risk+1-S1.R))[gp4]
gr.beta<-beta.cons*design.mat*samp.wgt
gr.beta.nowt<-beta.cons*design.mat
#gradient of gamma1
if(n.gamma1>0){
gamma1.cons<-rep(0,n.samp)
gamma1.cons[gp1]<-((S1.R*dG(trans.r1,R1.e1)*R1.e1-S1.L*dG(trans.r1,L1.e1)*L1.e1)/(S1.L-S1.R) )[gp1]
gamma1.cons[gp3]<-(-S1.L*dG(trans.r1,L1.e1)*L1.e1/(S1.L+S2.L-1) )[gp3]
gamma1.cons[gp4]<-(S1.R*dG(trans.r1,R1.e1)*R1.e1/(e.beta.risk+1-S1.R) )[gp4]
gr.gamma1<-xmat1*gamma1.cons*samp.wgt
gr.gamma1.nowt<-xmat1*gamma1.cons
}
if(n.gamma2>0){
gamma2.cons<-rep(0,n.samp)
gamma2.cons[gp2]<-((S2.R*dG(trans.r2,R2.e2)*R2.e2-S2.L*dG(trans.r2,L2.e2)*L2.e2)/(S2.L-S2.R) )[gp2]
gamma2.cons[gp3]<-((-S2.L*dG(trans.r2,L2.e2)*L2.e2)/(S1.L+S2.L-1) )[gp3]
gamma2.cons[gp5]<-(-S2.L*dG(trans.r2,L2.e2)*L2.e2/(e.beta.risk+S2.L) )[gp5]
gr.gamma2<-xmat2*gamma2.cons*samp.wgt
gr.gamma2.nowt<-xmat2*gamma2.cons
}
if(n.gamma1>0&n.gamma2>0){
output<-cbind(gr.beta,gr.gamma1,gr.gamma2)
output.nowt<-cbind(gr.beta.nowt,gr.gamma1.nowt,gr.gamma2.nowt)
}else if(n.gamma1>0&n.gamma2==0){
output<-cbind(gr.beta,gr.gamma1)
output.nowt<-cbind(gr.beta.nowt,gr.gamma1.nowt)
}else if(n.gamma1==0&n.gamma2>0){
output<-cbind(gr.beta,gr.gamma2)
output.nowt<-cbind(gr.beta.nowt,gr.gamma2.nowt)
}else if(n.gamma1==0&n.gamma2==0){
output<-gr.beta
output.nowt<-gr.beta.nowt
}
output22<-list()
output[is.na(output)]<-0
output.nowt[is.na(output.nowt)]<-0
output22$score.vec<-output
output22$score.vec.nowt<-output.nowt
output22$score<-colSums(output,na.rm=TRUE)
return(output22)
} #score.f
#In the simulation setting prep1_5 with no covariate, this initial values work well.
sub.haz.ini2<-function(x){
hazard.ini<-0.15*(1-exp(-0.15*x))
return(hazard.ini)
}
#In the simulation setting prep2_1; this initial values work well, in particular, w.r.t gamma2.1 estimates and Lambda 2 estimate.
sub.haz.ini1<-function(x){
hazard.ini<-0.25*(1-exp(-0.25*x))
return(hazard.ini)
}
grad.f<-function(para){
gradient<-score.f(trans.r1,trans.r2,para,n.beta,n.gamma1,n.gamma2,design.mat,xmat1,xmat2,sdata3)
return(-gradient$score)
}
cov.mat.ph1<-function(delta.theta,samp.data){
n.para<-ncol(delta.theta)
cov.mat<-0*diag(n.para)
aaa<-which(table(samp.data$strata)!=1)
strata.list<-as.numeric(names(aaa))
for (k in strata.list){
average<-apply(delta.theta[samp.data$strata==k,],2,mean)
delta.theta.gp<-delta.theta[samp.data$strata==k,]
n.gp<-nrow(delta.theta.gp)
ave.mat<-matrix(average,n.gp,n.para,1)
c.delta.theta.gp<-delta.theta.gp-ave.mat
cov.mat<-cov.mat+n.gp/(n.gp-1)*(t(c.delta.theta.gp)%*%c.delta.theta.gp)
}
return(cov.mat)
}#the end of the function cov.mat.ph1
cov.mat.ph2<-function(score.nowt,samp.data){
n.para<-ncol(score.nowt)
cov.mat<-0*diag(n.para)
aaa<-which(table(samp.data$strata)!=1)
strata.list<-as.numeric(names(aaa))
for (k in strata.list){
score.by.strata<-score.nowt[samp.data$strata==k,]
n.gp<-nrow(score.by.strata)
strata.fraction<-as.numeric(samp.data$strata.frac[samp.data$strata==k])[1]
samp.data$s.frc<-1/samp.data$samp.wgt
samp.fraction<-as.numeric(samp.data$s.frc[samp.data$strata==k])[1]
A<- (1/n.gp)*(t(score.by.strata)%*%score.by.strata)
B<-colMeans(score.by.strata)
B2<-B%*%t(B)
cond.var<-A-B2
cov.mat<-cov.mat+(strata.fraction*(1-samp.fraction)/samp.fraction)*(cond.var)
}
return(cov.mat)
}#the end of the function cov.mat.ph2
Lambda.initials<-function(time.list,Lambda,missing.upper,Data,event,time.1,time.2,cap.Lambda){
#Data may not include LambdaL and LambdaR
if(event==1){
#Lambda1<-time.list1*0.015
Lambda.data<-data.frame(time=time.list,Lambda=Lambda)
Lambda.index<-rbind(Lambda.data,rep(0,2),c(Inf,cap.Lambda),rep(missing.upper,2))
sdata<-Data
sdata2<-merge(sdata,Lambda.index,by.x="L1",by.y="time",all.x=TRUE)
colnames(sdata2)[ncol(sdata2)]<-"LambdaL1"
sdata3<-merge(sdata2,Lambda.index,by.x="R1",by.y="time",all.x=TRUE)
colnames(sdata3)[ncol(sdata3)]<-"LambdaR1"
sdata3$LambdaL1[sdata3$L1<time.1]<-0
sdata3$LambdaL1[sdata3$L1>time.2&sdata3$L1<missing.upper]<-max(sdata3$LambdaL1[sdata3$LambdaL1<missing.upper],na.rm=TRUE)
sdata3$LambdaR1[sdata3$R1>time.2&sdata3$R1<missing.upper]<-max(sdata3$LambdaL1[sdata3$LambdaL1<missing.upper],na.rm=TRUE)
}else if(event==2){
Lambda.data<-data.frame(time=time.list,Lambda=Lambda)
Lambda.index<-rbind(Lambda.data,rep(0,2),c(Inf,cap.Lambda),rep(missing.upper,2))
sdata<-Data
sdata2<-merge(sdata,Lambda.index,by.x="L2",by.y="time",all.x=TRUE)
colnames(sdata2)[ncol(sdata2)]<-"LambdaL2"
sdata3<-merge(sdata2,Lambda.index,by.x="R2",by.y="time",all.x=TRUE)
colnames(sdata3)[ncol(sdata3)]<-"LambdaR2"
sdata3$LambdaL2[sdata3$L2<time.1]<-0
sdata3$LambdaL2[sdata3$L2>time.2&sdata3$L2<missing.upper]<-max(sdata3$LambdaL2[sdata3$LambdaL2<missing.upper],na.rm=TRUE)
sdata3$LambdaR2[sdata3$R2>time.2&sdata3$R2<missing.upper]<-max(sdata3$LambdaL2[sdata3$LambdaL2<missing.upper],na.rm=TRUE)
}
output<-list()
output$sdata3<-sdata3
output$Lambda.data<-Lambda.data
output$Lambda.index<-Lambda.index
return(output)
}
L1update<-function(trans.r1,trans.r2,Data,tt.list,Lambda.data,missing.upper){
#Data was updated with new regression parameters estimation (Data<-sdata3)
#Data should include LambdaL1, LambdaR1, LambdaL2 and LambdaR2
#sdata<-samp.data
ssdata<-Data
n.samp<-nrow(ssdata)
ssdata$WL1<-rep(0,n.samp)
ssdata$WL2<-rep(0,n.samp)
ssdata$WL3<-rep(0,n.samp)
ssdata$WL4<-rep(0,n.samp)
samp.wgt<-ssdata$samp.wgt
exc1<-which(abs(ssdata$LambdaR1-ssdata$LambdaL1)<1e-10)
exc2<-which(is.na(ssdata$LambdaL1))
exc3<-which(is.na(ssdata$LambdaR1))
condition<-exp(-trans(trans.r1,ssdata$LambdaL1*ssdata$HR1) )+exp(-trans(trans.r2,ssdata$LambdaL2*ssdata$HR2))
exc4<-which( condition<=1)
subset1<-which(ssdata$C==0&ssdata$event==1&ssdata$L1>0)
subset2<-which(ssdata$C==0&ssdata$event==1&ssdata$R1<missing.upper)
subset3<-which(ssdata$C==0&ssdata$event==3&ssdata$L1>0)
subset4<-which(ssdata$C==-999&ssdata$event==1&ssdata$R1<missing.upper)
# weighted process
WL1<- -exp(-trans(trans.r1,ssdata$LambdaL1*ssdata$HR1) )*ssdata$HR1*dG(trans.r1,ssdata$LambdaL1*ssdata$HR1)/(exp(-trans(trans.r1,ssdata$LambdaL1*ssdata$HR1) )-exp(-trans(trans.r1,ssdata$LambdaR1*ssdata$HR1) ))
ssdata$WL1[subset1]<-WL1[subset1]*samp.wgt[subset1]
ssdata$WL1[union(exc1,exc2)]<-0
# weighted process
WL2<- exp(-trans(trans.r1,ssdata$LambdaR1*ssdata$HR1) )*ssdata$HR1*dG(trans.r1,ssdata$LambdaR1*ssdata$HR1)/(exp(-trans(trans.r1,ssdata$LambdaL1*ssdata$HR1) )-exp(-trans(trans.r1,ssdata$LambdaR1*ssdata$HR1) ))
ssdata$WL2[subset2]<-WL2[subset2]*samp.wgt[subset2]
ssdata$WL2[union(exc1,exc3)]<-0
# weighted process
WL3<- -exp(-trans(trans.r1,ssdata$LambdaL1*ssdata$HR1) )*ssdata$HR1*dG(trans.r1,ssdata$LambdaL1*ssdata$HR1)/(exp(-trans(trans.r1,ssdata$LambdaL1*ssdata$HR1) )+exp(-trans(trans.r2,ssdata$LambdaL2*ssdata$HR2) )-1)
ssdata$WL3[subset3]<-WL3[subset3]*samp.wgt[subset3]
ssdata$WL3[union(exc4,exc2)]<-0
# weighted process
WL4<-exp(-trans(trans.r1,ssdata$LambdaR1*ssdata$HR1) )*ssdata$HR1*dG(trans.r1,ssdata$LambdaR1*ssdata$HR1)/(ssdata$expplus-exp(-trans(trans.r1,ssdata$LambdaR1*ssdata$HR1) ))
ssdata$WL4[subset4]<-WL4[subset4]*samp.wgt[subset4]
ssdata$WL1.sq<-ssdata$WL1^2
ssdata$WL2.sq<-ssdata$WL2^2
ssdata$WL3.sq<-ssdata$WL3^2
ssdata$WL4.sq<-ssdata$WL4^2
k<-0
Q_L<-0
W_L<-0
G_L<-0
TL<-nrow(Lambda.data)
Q_Lambda<-dG_Lambda<-G_Lambda<-W_Lambda<-rep(NA,TL)
for (ordered.t in Lambda.data$time){
k<-k+1
L.process<-as.numeric(ssdata$L1<=ordered.t)
R.process<-as.numeric(ssdata$R1<=ordered.t)
W_Lambda[k]<-sum(R.process*(ssdata$WL2+ssdata$WL4)+L.process*(ssdata$WL1+ssdata$WL3),na.rm=TRUE)
G_Lambda[k]<-sum(R.process*(ssdata$WL2.sq+ssdata$WL4.sq)+L.process*(ssdata$WL1.sq+ssdata$WL3.sq),na.rm=TRUE)
if(k==1){dG_Lambda[k]<-G_Lambda[k]
} else if(k>1){
dG_Lambda[k]<-G_Lambda[k]-G_Lambda[k-1]
}
Q_L<- Q_L+Lambda.data$Lambda[k]*dG_Lambda[k]
Q_Lambda[k]<-W_Lambda[k]+Q_L
}
no.change<-which(dG_Lambda==0)
if(length(no.change)>0){
G_Lambda<-G_Lambda[-no.change]
Q_Lambda<-Q_Lambda[-no.change]
Lambda.data<-Lambda.data[-no.change,]
}
G_Lambda<-c(0,G_Lambda)
Q_Lambda<-c(0,Q_Lambda)
if(length(G_Lambda)!=length(unique(G_Lambda))){
stop("The algorithm for estimating subdistribution hazard function for event 1 does not converge.")
}
GCM = gcmlcm(G_Lambda, Q_Lambda)
time.knots<-which(G_Lambda[-1] %in% GCM$x.knots[-1])
update.Lambda<-data.frame(time=Lambda.data$time[time.knots],Lambda=GCM$slope.knots)
update.Lambda$Lambda[update.Lambda$Lambda<0]<-0
sfun.1 <- stepfun(update.Lambda$time, c(update.Lambda$Lambda[1],update.Lambda$Lambda), f = 1) #CADLAG function
new.Lambda<-data.frame(time=tt.list,Lambda=sfun.1(tt.list))
output<-list()
output$new.Lambda<-new.Lambda
output$sfun<-sfun.1
return(output)
}#L1update
L2update<-function(trans.r1,trans.r2,Data,tt.list,Lambda.data,missing.upper){
#Data was updated with new regression parameters estimation
#Data should include LambdaL1, LambdaR1, LambdaL2 and LambdaR2
#sdata<-samp.data
ssdata<-Data
n.samp<-nrow(ssdata)
ssdata$WL1<-rep(0,n.samp)
ssdata$WL2<-rep(0,n.samp)
ssdata$WL3<-rep(0,n.samp)
ssdata$WL4<-rep(0,n.samp)
samp.wgt<-ssdata$samp.wgt
exc1<-which(abs(ssdata$LambdaR2-ssdata$LambdaL2)<1e-10)
exc2<-which(is.na(ssdata$LambdaL2))
exc3<-which(is.na(ssdata$LambdaR2))
condition<-exp(-trans(trans.r1,ssdata$LambdaL1*ssdata$HR1) )+exp(-trans(trans.r2,ssdata$LambdaL2*ssdata$HR2) )
exc4<-which( condition<=1)
subset1<-which(ssdata$C==0&ssdata$event==2&ssdata$L2>0)
subset2<-which(ssdata$C==0&ssdata$event==2&ssdata$R2<missing.upper)
subset3<-which(ssdata$C==0&ssdata$event==3&ssdata$L2>0)
subset4<-which(ssdata$C==-999&ssdata$event==3&ssdata$L2>0)
# weighted process
WL1<- -exp(-trans(trans.r2,ssdata$LambdaL2*ssdata$HR2))*ssdata$HR2*dG(trans.r2,ssdata$LambdaL2*ssdata$HR2)/(exp(-trans(trans.r2,ssdata$LambdaL2*ssdata$HR2) )-exp(-trans(trans.r2,ssdata$LambdaR2*ssdata$HR2) ))
ssdata$WL1[subset1]<-WL1[subset1]*samp.wgt[subset1]
ssdata$WL1[union(exc1,exc2)]<-0
# weighted process
WL2<- exp(-trans(trans.r2,ssdata$LambdaR2*ssdata$HR2) )*ssdata$HR2*dG(trans.r2,ssdata$LambdaR2*ssdata$HR2)/(exp(-trans(trans.r2,ssdata$LambdaL2*ssdata$HR2) )-exp(-trans(trans.r2,ssdata$LambdaR2*ssdata$HR2) ))
ssdata$WL2[subset2]<-WL2[subset2]*samp.wgt[subset2]
ssdata$WL2[union(exc1,exc3)]<-0
# weighted process
WL3<- -exp(-trans(trans.r2,ssdata$LambdaL2*ssdata$HR2) )*ssdata$HR2*dG(trans.r2,ssdata$LambdaL2*ssdata$HR2)/(exp(-trans(trans.r1,ssdata$LambdaL1*ssdata$HR1) )+exp(-trans(trans.r2,ssdata$LambdaL2*ssdata$HR2) )-1)
ssdata$WL3[subset3]<-WL3[subset3]*samp.wgt[subset3]
ssdata$WL3[union(exc4,exc1)]<-0
# weighted process
WL4<- -exp(-trans(trans.r2,ssdata$LambdaL2*ssdata$HR2) )*ssdata$HR2*dG(trans.r2,ssdata$LambdaL2*ssdata$HR2)/(rep(-1,n.samp)+ssdata$expplus+exp(-trans(trans.r2,ssdata$LambdaL2*ssdata$HR2) ))
ssdata$WL4[subset4]<-WL4[subset4]*samp.wgt[subset4]
ssdata$WL1.sq<-ssdata$WL1^2
ssdata$WL2.sq<-ssdata$WL2^2
ssdata$WL3.sq<-ssdata$WL3^2
ssdata$WL4.sq<-ssdata$WL4^2
k<-0
Q_L<-0
W_L<-0
G_L<-0
TL<-length(tt.list)
Q_Lambda<-dG_Lambda<-G_Lambda<-W_Lambda<-rep(NA,TL)
for (ordered.t in Lambda.data$time){
k<-k+1
L.process<-as.numeric(ssdata$L2<=ordered.t)
R.process<-as.numeric(ssdata$R2<=ordered.t)
W_Lambda[k]<-sum(R.process*(ssdata$WL2)+L.process*(ssdata$WL1+ssdata$WL3+ssdata$WL4),na.rm=TRUE)
G_Lambda[k]<-sum(R.process*(ssdata$WL2.sq)+L.process*(ssdata$WL1.sq+ssdata$WL3.sq+ssdata$WL4.sq),na.rm=TRUE)
if(k==1){dG_Lambda[k]<-G_Lambda[k]
} else if(k>1){
dG_Lambda[k]<-G_Lambda[k]-G_Lambda[k-1]
}
Q_L<- Q_L+Lambda.data$Lambda[k]*dG_Lambda[k]
Q_Lambda[k]<-W_Lambda[k]+Q_L
}
no.change<-which(dG_Lambda==0)
if(length(no.change)>0){
G_Lambda<-G_Lambda[-no.change]
Q_Lambda<-Q_Lambda[-no.change]
Lambda.data<-Lambda.data[-no.change,]
}
G_Lambda<-c(0,G_Lambda)
Q_Lambda<-c(0,Q_Lambda)
if(length(G_Lambda)!=length(unique(G_Lambda))){
stop("The algorithm for estimating subdistribution hazard function for event 2 does not converge.")
}
GCM = gcmlcm(G_Lambda, Q_Lambda)
time.knots<-which(G_Lambda[-1] %in% GCM$x.knots[-1])
update.Lambda<-data.frame(time=Lambda.data$time[time.knots],Lambda=GCM$slope.knots)
update.Lambda$Lambda[update.Lambda$Lambda<0]<-0
sfun.1 <- stepfun(update.Lambda$time, c(update.Lambda$Lambda[1],update.Lambda$Lambda), f = 1) #CADLAG function
new.Lambda<-data.frame(time=tt.list,Lambda=sfun.1(tt.list))
#return(new.Lambda)
output<-list()
output$new.Lambda<-new.Lambda
output$sfun<-sfun.1
return(output)
}#L2update
wPI.get.summary<- function(samp.data) {
n.samp<-nrow(samp.data)
prevalent.event1<- sum(as.numeric(samp.data$C==1&samp.data$event==1))
incident.event1<-sum(as.numeric(samp.data$C==0&samp.data$event==1))
incident.event2<-sum(as.numeric(samp.data$C==0&samp.data$event==2))
right.censoring.C0<-sum(as.numeric(samp.data$C==0&samp.data$event==3))
right.censoring.C999<-sum(as.numeric(samp.data$C==-999&samp.data$event==3))
left.censoring.event1<-sum(as.numeric(samp.data$C==-999&samp.data$event==1))
max.event1.time<-max(samp.data$R1[samp.data$R1<Inf])
max.event2.time<-max(samp.data$R2[samp.data$R2<Inf])
max.right.censoring.time1<-max(samp.data$L1[samp.data$L1>0&samp.data$R1==Inf])
max.right.censoring.time2<-max(samp.data$L2[samp.data$L2>0&samp.data$R2==Inf])
data.summary1<-data.frame(c("Included subjects","Known prevalent event1","Incident event1","Incident event2",
"Left censored event1","Right censoring","Missing prevalent+right censoring", "maximum FU event1",
"maximum FU event2","maximum right censonring time event1","maximum right censonring time event2"),
c(n.samp,prevalent.event1,incident.event1,incident.event2,left.censoring.event1,
right.censoring.C0,right.censoring.C999,max.event1.time,max.event2.time,
max.right.censoring.time1,max.right.censoring.time2), stringsAsFactors=FALSE)
names(data.summary1)<- c("label", "no.obs")
return(data.summary1)
}
prev.fun<-function(beta){
prev<-exp(beta)/(1+exp(beta))
return(prev)
}
################ Data Preparation ####################
#change variable names for outcomes, L, R
fml0<-as.formula(p.model)
fml1<-as.formula(i.model1)
fml2<-as.formula(i.model2)
if(attr(terms(fml0),"response")!=1){
stop("No response variable in model1")
}
if(attr(terms(fml1),"response")!=1){
stop("No response variable in model1")
}
if(attr(terms(fml2),"response")!=1){
stop("No response variable in model2")
}
if (!(c("event") %in% names(Data))) {
stop("No event variable in the data")
}
allvars0<-all.vars(fml0)
allvars1<-all.vars(fml1)
allvars2<-all.vars(fml2)
names(Data)[which(names(Data) %in% allvars1[1])]<-"C"
names(Data)[which(names(Data) %in% allvars1[2])]<-"L1"
names(Data)[which(names(Data) %in% allvars1[3])]<-"R1"
names(Data)[which(names(Data) %in% allvars2[1])]<-"L2"
names(Data)[which(names(Data) %in% allvars2[2])]<-"R2"
allvars0[1]<-c("C")
allvars1[1:3]<-c("C","L1","R1")
allvars2[1:2]<-c("L2","R2")
#change the response variable name in p.model formula
p.model.split<-strsplit(p.model,split="~")
p.model1<-paste("C~",p.model.split[[1]][2])
fml0<-as.formula(p.model1)
i.model1.split<-strsplit(i.model1,split="~")
i.model11<-paste("C+L1+R1~",i.model1.split[[1]][2])
fml1<-as.formula(i.model11)
i.model2.split<-strsplit(i.model2,split="~")
i.model22<-paste("L2+R2~",i.model2.split[[1]][2])
fml2<-as.formula(i.model22)
remove(p.model1,i.model11,i.model22)
#change the response variable name in i.model formula
if(c("samp.wgt") %in% names(Data)) {
allvars<-c(allvars0,allvars1,allvars2,"samp.wgt")
if(sum(Data$samp.wgt<0)>0){
stop("\"samp.weight\" should be positive.")
}
} else if(!(c("samp.wgt") %in% names(Data))){
allvars<-c(allvars0,allvars1,allvars2,"samp.wgt")
Data$samp.wgt<-1
}
if(c("strata") %in% names(Data)) {
allvars<-c(allvars,"strata")
} else if(!(c("strata") %in% names(Data))){
allvars<-c(allvars,"strata")
Data$strata<-1
}
if(population=="super"){
if(c("strata.frac") %in% names(Data)){
allvars<-c(allvars,"strata.frac")
}
}
allvars<-c(allvars,"event")
allvars<-unique(allvars)
samp.data2<-Data[,allvars]
samp.data<-samp.data2[complete.cases(samp.data2),] #excluding cases with NA
###############################################################################################
#time.points2<-time points to be set for cumulative hazard estimate
#fixed.initials initial parameters for beta and gamma just in case when logistic model doesn't converge.
n.samp<-nrow(samp.data)
n.regpara<-n.beta+n.gamma1+n.gamma2
n.gamma<-n.gamma1+n.gamma2
mm<-n.beta+1
mm2<-n.beta+n.gamma1
mm3<-n.beta+n.gamma1+1
iter.limit<-400 #iteration for cumulative hazard function
########### Initial parameters ################################################
if(is.null(reg.initials)){
if(n.gamma1==0&n.gamma2==0){
if (n.beta==1){
initials1<--3
initials2<--3
}else if(n.beta>1){
initials1<-c(-3,rep(0.1,n.beta-1))
ini.data<-samp.data
ini.data<-ini.data[ini.data$C==1|ini.data$C==0,]
beta.ini<-glm(fml0, data = ini.data, family = "binomial")
initials2<-beta.ini$coefficients
remove(beta.ini,ini.data)
}
} else if(n.gamma1>0&n.gamma2>0) {
if(n.beta==1) {
initials1<-c(-3,rep(0.1,n.gamma))
ini.data<-samp.data
ini.data$event1<-as.numeric(samp.data$event==1)
ini.data$event2<-as.numeric(samp.data$event==2)
gam1.ini<-glm(paste0("event1~",i.model1.split[[1]][2],sep=""), data = ini.data, family = "binomial")
gam2.ini<-glm(paste0("event2~",i.model2.split[[1]][2],sep=""), data = ini.data, family = "binomial")
initials2<-c(-3,gam1.ini$coefficients[-1],gam2.ini$coefficients[-1])
remove(ini.data,gam1.ini,gam2.ini)
}else if(n.beta>1){
initials1<-c(-3,rep(0.1,n.beta-1),rep(0.1,n.gamma))
ini.data<-samp.data
ini.data$event1<-as.numeric(samp.data$event==1)
ini.data$event2<-as.numeric(samp.data$event==2)
gam1.ini<-glm(paste0("event1~",i.model1.split[[1]][2],sep=""), data = ini.data, family = "binomial")
gam2.ini<-glm(paste0("event2~",i.model2.split[[1]][2],sep=""), data = ini.data, family = "binomial")
ini.data<-ini.data[ini.data$C==1|ini.data$C==0,]
beta.ini<-glm(fml0, data = ini.data, family = "binomial")
initials2<-c(beta.ini$coefficients,gam1.ini$coefficients[-1],gam2.ini$coefficients[-1])
remove(ini.data,beta.ini,gam1.ini,gam2.ini)
}
} else if(n.gamma1>0&n.gamma2==0){
if(n.beta==1) {
initials1<-c(-3,rep(0.1,n.gamma1))
ini.data<-samp.data
ini.data$event1<-as.numeric(samp.data$event==1)
gam1.ini<-glm(paste0("event1~",i.model1.split[[1]][2],sep=""), data = ini.data, family = "binomial")
initials2<-c(-3,gam1.ini$coefficients[-1])
remove(ini.data,gam1.ini)
}else if(n.beta>1){
initials1<-c(-3,rep(0.1,n.beta-1),rep(0.1,n.gamma1))
ini.data<-samp.data
ini.data$event1<-as.numeric(samp.data$event==1)
gam1.ini<-glm(paste0("event1~",i.model1.split[[1]][2],sep=""), data = ini.data, family = "binomial")
ini.data<-ini.data[ini.data$C==1|ini.data$C==0,]
beta.ini<-glm(fml0, data = ini.data, family = "binomial")
initials2<-c(beta.ini$coefficients,gam1.ini$coefficients[-1])
remove(ini.data,beta.ini,gam1.ini)
}
} else if(n.gamma1==0&n.gamma2>0){
if(n.beta==1) {
initials1<-c(-3,rep(0.1,n.gamma2))
ini.data<-samp.data
ini.data$event2<-as.numeric(samp.data$event==2)
gam2.ini<-glm(paste0("event2~",i.model2.split[[1]][2],sep=""), data = ini.data, family = "binomial")
initials2<-c(-3,gam2.ini$coefficients[-1])
remove(ini.data,gam2.ini)
}else if(n.beta>1){
initials1<-c(-3,rep(0.1,n.beta-1),rep(0.1,n.gamma2))
ini.data<-samp.data
ini.data$event2<-as.numeric(samp.data$event==2)
gam2.ini<-glm(paste0("event2~",i.model2.split[[1]][2],sep=""), data = ini.data, family = "binomial")
ini.data<-ini.data[ini.data$C==1|ini.data$C==0,]
beta.ini<-glm(fml0, data = ini.data, family = "binomial")
initials2<-c(beta.ini$coefficients,gam2.ini$coefficients[-1])
remove(ini.data,beta.ini,gam2.ini)
}
} else {
if(length(reg.initials)!=n.regpara){
stop("The number of regression parameter initials are not matched with the model.")
} else {
initials2<-reg.initials
}
}
}
missing.upper1<-max(samp.data$R1[samp.data$R1<Inf],samp.data$L1)*1.5+2 #event type=1;the initial Lambda=time.list*1.5 and 2 is an arbitrary positive number
missing.upper2<-max(samp.data$R2[samp.data$R2<Inf],samp.data$L2)*1.5+2 #event type=2;the initial Lambda=time.list*1.5 and 2 is an arbitrary positive number
missing.upper<-max(missing.upper1,missing.upper2)
tab.event<-table(samp.data$event[samp.data$C==0],samp.data$samp.wgt[samp.data$C==0])
tab.event2<-t(tab.event)*as.numeric(colnames(tab.event))
marginal.prop<-colSums(prop.table(tab.event2))
cap.subdist1<-marginal.prop[1]+marginal.prop[3]*0.4
cap.subdist2<-marginal.prop[2]+marginal.prop[3]*0.4
cap.L1<--log(1-cap.subdist1)
cap.L2<--log(1-cap.subdist2)
samp.data$L1[samp.data$L1==-999]<-missing.upper
samp.data$R1[samp.data$R1==-999]<-missing.upper
samp.data$L2[samp.data$L2==-999]<-missing.upper
samp.data$R2[samp.data$R2==-999]<-missing.upper
samp.data$id<-seq(1,nrow(samp.data))
############## Lambda1 Initials #################################################
Jn1<-sort(unique(c(samp.data$L1[samp.data$L1>0&samp.data$L1<missing.upper],
samp.data$R1[samp.data$R1<missing.upper])))
Jn2<-sort(unique(c(samp.data$L2[samp.data$L2>0&samp.data$L2<missing.upper],
samp.data$R2[samp.data$R2<missing.upper])))
#ordered statistics for time: min{X_i: T_i<=X_i} and max{X_i:X_i<=T_i}
time1.1<-min(samp.data$R1[samp.data$R1<missing.upper])
time1.2<-max(samp.data$L1[samp.data$L1>0&samp.data$L1<missing.upper])
time2.1<-min(samp.data$R2[samp.data$R2<missing.upper])
time2.2<-max(samp.data$L2[samp.data$L2>0&samp.data$L2<missing.upper])
if(time1.2==0|time2.2==0){
stop("NPMLE condition: the maximum left time point should be positive.")
}
if(is.null(time.interval)){
time.interval<- 0.1
time.points1<-seq(0,time1.2,by=time.interval)
time.points2<-seq(0,time2.2,by=time.interval)
}else {
time.points1<-seq(0,time1.2,by=time.interval)
time.points2<-seq(0,time2.2,by=time.interval)
}
time.list1<-Jn1[Jn1>=time1.1&Jn1<=time1.2]
time.list2<-Jn2[Jn2>=time2.1&Jn2<=time2.2]
time1.95<-quantile(time.list1,c(0.95))
time2.95<-quantile(time.list2,c(0.95))
Lamb.ini1<-Lambda.initials(time.list1,sub.haz.ini2(time.list1),missing.upper,samp.data,event=1,time1.1,time1.2,cap.L1)
old.Lambda1<-Lamb.ini1$Lambda.data
Lamb.ini2<-Lambda.initials(time.list2,sub.haz.ini2(time.list2),missing.upper,Data=Lamb.ini1$sdata3,event=2,time2.1,time2.2,cap.L2)
old.Lambda2<-Lamb.ini2$Lambda.data
sdata3<-Lamb.ini2$sdata3
mf0 <- model.frame(formula=fml0, data=sdata3)
design.mat <- model.matrix(attr(mf0, "terms"), data=mf0)
mf <- model.frame(formula=fml1, data=sdata3)
xmat1 <- model.matrix(attr(mf, "terms"), data=mf)
mf2 <- model.frame(formula=fml2, data=sdata3)
xmat2 <- model.matrix(terms(fml2), data=mf2)
if(ncol(xmat2)>1){
xmat2<-as.matrix(xmat2[,-1])
} else if(ncol(xmat2)==1){
xmat2<-0
}
if(ncol(xmat1)>1){
xmat1<-as.matrix(xmat1[,-1])
} else if(ncol(xmat1)==1){
xmat1<-0
}
if(n.beta>1){
expplus.est<-as.numeric(1+exp(design.mat%*%initials1[1:n.beta]))
}else if(n.beta==1){
expplus.est<-as.numeric(1+exp(design.mat*initials1[1]))
}
if(n.gamma1>0){
cox.hr.est1<-as.numeric(exp(xmat1%*%initials1[mm:mm2]))
} else {
cox.hr.est1<-1
}
if(n.gamma2>0){
cox.hr.est2<-as.numeric(exp(xmat2%*%initials1[mm3:n.regpara]))
} else {
cox.hr.est2<-1
}
sdata3$HR1<-cox.hr.est1
sdata3$HR2<-cox.hr.est2
sdata3$expplus<-expplus.est
########################### LOOPING ###########################################################
old.wNR.est<-initials2
iteration<-0
keep.going<-1
ptm<-proc.time()
diff.estimates<-100
while(keep.going==1){
iteration<- iteration +1
###################### Cumulative Hazard Estimate ##############################
# update Lambda2
iter<-0
keep.looping<-1
old.Lambda22<-old.Lambda2
#L2OUT5<-list()
while(keep.looping==1){
iter<-iter+1
L2out<-L2update(trans.r1,trans.r2,sdata3,time.list2,old.Lambda22,missing.upper)
# L2OUT5[[iter]]<-L2out
new.Lambda2<-L2out$new.Lambda
diff.Lambda2<-max(abs(new.Lambda2$Lambda[time.list2<time2.95]-old.Lambda22$Lambda[time.list2<time2.95]))
old.Lambda22<-new.Lambda2
#update sdata3 with new.Lambda2
remove.ind2<-which(names(sdata3) %in% c("LambdaL2","LambdaR2"))
Lamb.ini2<-Lambda.initials(time.list2,new.Lambda2$Lambda,missing.upper,sdata3[,-remove.ind2],event=2,time2.1,time2.2,cap.L2)
sdata3<-Lamb.ini2$sdata3
if (diff.Lambda2<convergence.criteria|iter>iter.limit){
keep.looping<-0
}
} #sdata3 is sorted by L2 and R2
#For now, Lambda2 estimate is back to the estimate from the previous step:
remove.ind2<-which(names(sdata3) %in% c("LambdaL2","LambdaR2"))
Lamb.ini2<-Lambda.initials(time.list2,old.Lambda2$Lambda,missing.upper,sdata3[,-remove.ind2],event=2,time2.1,time2.2,cap.L2)
sdata3<-Lamb.ini2$sdata3
## update 1 ##
iter<-0
keep.looping<-1
old.Lambda11<-old.Lambda1
while(keep.looping==1){
iter<-iter+1
L1out<-L1update(trans.r1,trans.r2,sdata3,time.list1,old.Lambda11,missing.upper)
new.Lambda1<-L1out$new.Lambda
diff.Lambda1<-max(abs(new.Lambda1$Lambda[time.list1<time1.95]-old.Lambda11$Lambda[time.list1<time1.95]))
old.Lambda11<-new.Lambda1
#update sdata3 with new.Lambda1
remove.ind1<-which(names(sdata3) %in% c("LambdaL1","LambdaR1"))
Lamb.ini1<-Lambda.initials(time.list1,new.Lambda1$Lambda,missing.upper,sdata3[,-remove.ind1],event=1,time1.1,time1.2,cap.L1)
sdata3<-Lamb.ini1$sdata3
if (diff.Lambda1<convergence.criteria|iter>iter.limit){
keep.looping<-0
}
} #sdata3 is sorted by L1 and R1
#Lambda2 estimate is now updated:
remove.ind2<-which(names(sdata3) %in% c("LambdaL2","LambdaR2"))
Lamb.ini2<-Lambda.initials(time.list2,new.Lambda2$Lambda,missing.upper,sdata3[,-remove.ind2],event=2,time2.1,time2.2,cap.L2)
sdata3<-Lamb.ini2$sdata3
mf0 <- model.frame(formula=fml0, data=sdata3)
design.mat <- model.matrix(attr(mf0, "terms"), data=mf0)
label0<-paste0("logit:",colnames(design.mat))
mf <- model.frame(formula=fml1, data=sdata3)
xmat1 <- model.matrix(attr(mf, "terms"), data=mf)
mf2 <- model.frame(formula=fml2, data=sdata3)
xmat2 <- model.matrix(terms(fml2), data=mf2)
if(ncol(xmat2)>1){
label2<-paste0("i.model2:",colnames(xmat2)[-1])
xmat2<-as.matrix(xmat2[,-1])
} else if(ncol(xmat2)==1){
xmat2<-0
label2<-NULL
}
if(ncol(xmat1)>1){
label1<-paste0("i.model1:",colnames(xmat1)[-1])
xmat1<-as.matrix(xmat1[,-1])
} else if(ncol(xmat1)==1){
xmat1<-0
label1<-NULL
}
labels<-c(label0,label1,label2)
r.cens.ind<-which(sdata3$C==0&sdata3$R1==Inf&sdata3$R2==Inf)
if(n.gamma1>0){
const.mat1<-as.matrix(xmat1[r.cens.ind,])
}
if(n.gamma2>0) {
const.mat2<-as.matrix(xmat2[r.cens.ind,])
}
################# start updating regression parameters
const.ineq <- function(para) {
grisk1<-as.numeric(const.mat1%*%para[mm:mm2])
grisk2<-as.numeric(const.mat2%*%para[mm3:n.regpara])
L.erisk1<-exp(grisk1)*sdata3$LambdaL1[r.cens.ind]
L.erisk2<-exp(grisk2)*sdata3$LambdaL2[r.cens.ind]
S1<-exp(-trans(trans.r1,L.erisk1))
S2<-exp(-trans(trans.r2,L.erisk2))
constr=S1+S2-1.001 #constr>=0
return(constr)
}
const.ineq2 <- function(para) {
grisk1<-as.numeric(const.mat1%*%para[mm:mm2])
grisk2<-0
L.erisk1<-exp(grisk1)*sdata3$LambdaL1[r.cens.ind]
L.erisk2<-exp(grisk2)*sdata3$LambdaL2[r.cens.ind]
S1<-exp(-trans(trans.r1,L.erisk1))
S2<-exp(-trans(trans.r2,L.erisk2))
constr=S1+S2-1.001 #constr>=0
return(constr)
}
const.ineq3 <- function(para) {
grisk1<-0
grisk2<-as.numeric(const.mat2%*%para[mm3:n.regpara])
L.erisk1<-exp(grisk1)*sdata3$LambdaL1[r.cens.ind]
L.erisk2<-exp(grisk2)*sdata3$LambdaL2[r.cens.ind]
S1<-exp(-trans(trans.r1,L.erisk1))
S2<-exp(-trans(trans.r2,L.erisk2))
constr=S1+S2-1.001 #constr>=0
return(constr)
}
#resw<-solnl(X=old.wNR.est,objfun=wsemi.opt.subgr,confun=const.ineq)
if(n.gamma1>0&n.gamma2>0){
resw<-cobyla(old.wNR.est, fn=wsemi.opt.subgr, hin = const.ineq)
}else if(n.gamma1>0&n.gamma2==0){
resw<-cobyla(old.wNR.est, fn=wsemi.opt.subgr, hin = const.ineq2)
}else if(n.gamma1==0&n.gamma2>0){
resw<-cobyla(old.wNR.est, fn=wsemi.opt.subgr, hin = const.ineq3)
}else if(n.gamma1==0&n.gamma2==0){
resw<-optim(old.wNR.est,wsemi.opt,gr=grad.f,method="BFGS")
}
############### insert the cut codes
################# end of updating regression parameters
wNR.est<-resw$par
#print(wNR.est)
diff.reg.para<-max(abs(old.wNR.est-wNR.est))
if(n.beta>1){
expplus.est<-as.numeric(1+exp(design.mat%*%wNR.est[1:n.beta]))
}else if(n.beta==1){
expplus.est<-as.numeric(1+exp(design.mat*wNR.est[1]))
}
if(n.gamma1>0){
cox.hr.est1<-as.numeric(exp(xmat1%*%wNR.est[mm:mm2]))
} else {
cox.hr.est1<-1
}
if(n.gamma2>0){
cox.hr.est2<-as.numeric(exp(xmat2%*%wNR.est[mm3:n.regpara]))
} else {
cox.hr.est2<-1
}
sdata3$HR1<-cox.hr.est1
sdata3$HR2<-cox.hr.est2
sdata3$expplus<-expplus.est
diff.estimates<-max(diff.reg.para,diff.Lambda1,diff.Lambda2)
diff.reg.est<-diff.reg.para
diff.Lambda.est1<-diff.Lambda1
diff.Lambda.est2<-diff.Lambda2
old.wNR.est<-wNR.est
old.Lambda1<-new.Lambda1
old.Lambda2<-new.Lambda2
if(diff.estimates<convergence.criteria|iteration>iteration.limit){
keep.going<-0
run.time<-((proc.time()-ptm)[3])/60
}
} #while(keep.going==1){
par.est<-wNR.est
prev.est<-prev.fun(par.est)
if(is.null(time.list)){
Lambda.data1<-new.Lambda1
Lambda.data2<-new.Lambda2
}else{
Lambda.data1<-data.frame(time=time.list,Lambda=L1out$sfun(time.list))
Lambda.data2<-data.frame(time=time.list,Lambda=L2out$sfun(time.list))
}
wloglike<-wobs.semipara.like(na.drop=FALSE,trans.r1,trans.r2,par.est,n.beta,n.gamma1,n.gamma2,design.mat,xmat1,xmat2,sdata3)
############ Variance estimation for regression parameter estimates ###########
if(anal.var==TRUE){
score.reg<-score.f(trans.r1,trans.r2,par.est,n.beta,n.gamma1,n.gamma2,design.mat,xmat1,xmat2,sdata3)
score.nowt<-score.reg$score.vec.nowt
score.wt<-score.reg$score.vec
info.mat<- (t(score.wt)%*%score.nowt)
N<-(sum(sdata3$samp.wgt))
inv.info2<-try(solve(info.mat,tol=1e-300),silent=TRUE)
if('try-error' %in% class(inv.info2)){
cat("inverse-matrix problem in info.mat\n")
}else {
inv.info<-inv.info2
if(population=="super"){
inv.info.mat<- inv.info*N #because of subgroup analysis, sum of sampling weights replaces N
cov.mat.phase1<-inv.info.mat/N
cov.mat.by.strata<-cov.mat.ph2(score.nowt,sdata3)
cov.mat.phase2<-(inv.info.mat%*%cov.mat.by.strata%*%inv.info.mat)/N
cov.mat<-cov.mat.phase1+cov.mat.phase2
theta.se<-sqrt(diag(cov.mat))
} else if(population=="finite"){
delta.s<-score.reg$score.vec
cov.mat.theta<-cov.mat.ph1(delta.s,samp.data)
cov.mat<-inv.info%*%cov.mat.theta%*%inv.info
theta.se<-sqrt(diag(cov.mat))
}
para.95UL<-as.numeric(par.est+1.96*theta.se)
para.95LL<-as.numeric(par.est-1.96*theta.se)
} #if('try-error' %in% class(inv.info2)){
}#if anal.var==TRUE
output<-list()
output$data.summary<-wPI.get.summary(samp.data)
if(exists('theta.se')){
output$reg.coef<-data.frame(cbind(label=labels,coef=par.est,se=theta.se,ll95=para.95LL,ul95=para.95UL) )
rownames(output$reg.coef)<-NULL
output$reg.covariance<-cov.mat
}else {
output$reg.coef<-data.frame(cbind(label=labels,coef=par.est))
rownames(output$reg.coef)<-NULL
output$reg.covariance<-NA
}
output$prevalence<-prev.est
output$subdist.hazard1<-Lambda.data1
output$subdist.hazard2<-Lambda.data2
output$subdist.hazard.fn1<-L1out$sfun
output$subdist.hazard.fn2<-L2out$sfun
output$convergence<-c(iteration=iteration,converg.overall=diff.estimates, convg.diff.reg=diff.reg.est,convg.diff.L1=diff.Lambda.est1,convg.diff.L2=diff.Lambda.est2)
output$run.time.mins<-run.time
output$loglike<-wloglike
output$trans.r<-c(r1=trans.r1,r2=trans.r2)
output$models<-c(p.model=p.model,i.model1=i.model1,i.model2=i.model2)
#Used optim algorithm in each iteration
return(output)
} # the end of the function compting.lc.semipara
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.