R/twostage.R

##' @title Twostage survival model for multivariate survival data
##'
##' @description
##' Fits Clayton-Oakes or bivariate Plackett models for bivariate survival data
##' using marginals that are on Cox form. The dependence can be modelled via
##' \enumerate{
##' \item  Regression design on dependence parameter.
##' \item  Random effects, additive gamma model.
##' }
##'
##' If clusters contain more than two subjects, we use a composite likelihood
##' based on the pairwise bivariate models; for the full MLE see twostageMLE.
##'
##' The two-stage model is constructed such that,
##' given the gamma distributed random effects, the survival functions within a cluster
##' are assumed independent, and the marginal survival functions are on Cox form (or additive form)
##' \deqn{
##' P(T > t| x) = S(t|x)= exp( -exp(x^T \beta) A_0(t) )
##' }
##'
##' One possibility is to model the variance within clusters via a regression design. One
##' can then specify a regression structure for the independent gamma distributed
##' random effect of each cluster, such that the variance is given by
##' \deqn{
##'  \theta = h( z_j^T \alpha)
##' }
##' where \eqn{z} is specified by theta.des. With var.link=1 the exponential link
##' \eqn{h(x)=exp(x)} is used, and with var.link=0 the identity link \eqn{h(x)=x}.
##' The reported standard errors are based on the estimated information from the
##' likelihood assuming that the marginals are known (unlike twostageMLE and the
##' additive gamma model below).
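##'
##' As a hedged sketch (reusing the diabetes data and the marginal phreg fit margph from the
##' examples below), a dependence regression could be specified via theta.des as
##' \preformatted{
##' tdes <- model.matrix(~-1+factor(adult),diabetes)
##' fit <- survival.twostage(margph,data=diabetes,clusters=diabetes$id,
##'                          theta.des=tdes,var.link=1)
##' }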
##'
##' Can also fit a structured additive gamma random effects model, such
##' as the ACE or ADE model for survival data.  In this case
##' random.design specifies the random effects for each subject within a cluster. This is
##' a matrix of 1's and 0's with dimension n x d, where d is the number of random effects.
##' For a cluster with two subjects, we let the random.design rows be
##'  \eqn{v_1} and \eqn{v_2},
##' such that the random effect for subject
##' 1 is \deqn{v_1^T (Z_1,...,Z_d)} for d random effects. Each random effect
##' has an associated parameter \eqn{(\lambda_1,...,\lambda_d)}.
##' By construction the random effects \eqn{Z_j} of subject 1 are Gamma distributed with
##' mean \eqn{\lambda_j/(v_1^T \lambda)}
##' and variance \eqn{\lambda_j/(v_1^T \lambda)^2}. Note that the random effect
##' \eqn{v_1^T (Z_1,...,Z_d)} then has mean 1 and variance \eqn{1/(v_1^T \lambda)}.
##' It is here assumed that \eqn{lamtot=v_1^T \lambda} is fixed within clusters,
##' as it is for the ACE model below.
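##'
##' As a hedged illustration with d=3 random effects, a pair sharing one random effect and
##' each having an individual effect could have random.design rows
##' \preformatted{
##' v1 <- c(1,1,0)  # subject 1: shared Z_1 plus own Z_2
##' v2 <- c(1,0,1)  # subject 2: shared Z_1 plus own Z_3
##' }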
##'
##' Based on these parameters, the relative contribution (the heritability, h) is
##' equivalent to the expected values of the random effects: \eqn{\lambda_j/(v_1^T \lambda)}.
##'
##' The DEFAULT parametrization (var.par=1) uses the variances of the random effects
##' \deqn{
##' \theta_j  = \lambda_j/(v_1^T \lambda)^2
##' }
##' With var.par=0 the \eqn{\lambda_j}'s are used directly as parameters.
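##'
##' For example, with \eqn{v_1=(1,1)} and \eqn{\lambda=(2,1)} we get \eqn{v_1^T \lambda=3},
##' variances 2/9 and 1/9, and relative contributions 2/3 and 1/3; a small sketch of this conversion is
##' \preformatted{
##' lambda <- c(2,1)
##' lamtot <- sum(lambda)
##' lambda/lamtot^2   # var.par=1 parameters (variances): 0.222 0.111
##' lambda/lamtot     # relative contributions (e.g. heritability): 0.667 0.333
##' }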
##'
##' For both types of models the basic model assumption is that,
##' given the random effects of the clusters, the survival distributions within a cluster
##' are independent and of the form
##' \deqn{
##' P(T > t| x,z) = exp( -Z \cdot Laplace^{-1}(lamtot,lamtot,S(t|x)) )
##' }
##' using the inverse Laplace transform of the gamma distribution with mean 1 and variance \eqn{1/lamtot}.
##'
##' The parameters \eqn{(\lambda_1,...,\lambda_d)} are related to the parameters of the model
##' by a regression construction (a d x k matrix) that links the \eqn{d}
##' \eqn{\lambda} parameters
##' with the \eqn{k} underlying \eqn{\theta} parameters
##' \deqn{
##' \lambda = theta.des  \theta
##' }
##' where theta.des specifies this low-dimensional association. The default is a diagonal matrix.
##' This can be used to make structural assumptions about the variances of the random effects,
##' as is needed for the ACE model for example.
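##'
##' A hedged sketch of such a structural theta.des, with three random effects sharing
##' two underlying parameters, could be
##' \preformatted{
##' theta.des <- rbind(c(1,0),c(1,0),c(0,1))
##' ## with theta=c(0.5,0.3) this gives lambda=(0.5,0.5,0.3)
##' }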
##'
##' The case.control option can be used together with the pair specification of the pairwise parts
##' of the estimating equations. Here it is assumed that the second subject of each pair is the proband.
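##'
##' A hedged sketch of the pairs argument (hypothetical row indices into the data; under
##' case-control sampling the second column holds the probands):
##' \preformatted{
##' pairs <- rbind(c(1,2),c(3,4),c(5,6))
##' }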
##'
##' @references
##'
##' Scheike (2019), Twostage estimation of additive gamma frailty models for survival data,
##' work in progress.
##'
##' Shih and Louis (1995), Inference on the association parameter in copula models for
##' bivariate survival data, Biometrics.
##'
##' Glidden (2000), A Two-Stage estimator of the dependence parameter for the
##' Clayton-Oakes model, LIDA.
##'
##' Scheike, Holst, Hjelmborg (2015), Measuring early or late dependence for bivariate
##' twin data, LIDA.
##'
##' Scheike, Holst, Hjelmborg (2014), Estimating heritability for cause specific mortality
##' based on twins studies, LIDA.
##'
##' Eriksson and Scheike (2015), Additive gamma frailty models for competing risks data,
##' Biometrics.
##'
##' @examples
##' data(diabetes)
##'
##' # Marginal Cox model  with treat as covariate
##' margph <- phreg(Surv(time,status)~treat+cluster(id),data=diabetes)
##' ### Clayton-Oakes, MLE
##' fitco1<-twostageMLE(margph,data=diabetes,theta=1.0)
##' summary(fitco1)
##'
##' ### Plackett model
##' mph <- phreg(Surv(time,status)~treat+cluster(id),data=diabetes)
##' fitp <- survival.twostage(mph,data=diabetes,theta=3.0,Nit=40,
##'                clusters=diabetes$id,var.link=1,model="plackett")
##' summary(fitp)
##'
##' ### Clayton-Oakes
##' fitco2 <- survival.twostage(mph,data=diabetes,theta=0.0,detail=0,
##'                  clusters=diabetes$id,var.link=1,model="clayton.oakes")
##' summary(fitco2)
##' fitco3 <- survival.twostage(margph,data=diabetes,theta=1.0,detail=0,
##'                  clusters=diabetes$id,var.link=0,model="clayton.oakes")
##' summary(fitco3)
##'
##' ### without covariates but with stratified baseline
##' marg <- phreg(Surv(time,status)~+strata(treat)+cluster(id),data=diabetes)
##' fitpa <- survival.twostage(marg,data=diabetes,theta=1.0,
##'                 clusters=diabetes$id,model="clayton.oakes")
##' summary(fitpa)
##'
##' fitcoa <- survival.twostage(marg,data=diabetes,theta=1.0,clusters=diabetes$id,
##'                  model="clayton.oakes")
##' summary(fitcoa)
##'
##' ### Piecewise constant cross hazards ratio modelling
##' ########################################################
##'
##' d <- subset(simClaytonOakes(2000,2,0.5,0,stoptime=2,left=0),!truncated)
##' udp <- piecewise.twostage(c(0,0.5,2),data=d,method="optimize",
##'                           id="cluster",timevar="time",
##'                           status="status",model="clayton.oakes",silent=0)
##' summary(udp)
##'
##' \donttest{ ## Reduce Ex.Timings
##' ### Same model using the strata option, a bit slower
##' ########################################################
##' ## makes the survival pieces for different areas in the plane
##' ##ud1=surv.boxarea(c(0,0),c(0.5,0.5),data=d,id="cluster",timevar="time",status="status")
##' ##ud2=surv.boxarea(c(0,0.5),c(0.5,2),data=d,id="cluster",timevar="time",status="status")
##' ##ud3=surv.boxarea(c(0.5,0),c(2,0.5),data=d,id="cluster",timevar="time",status="status")
##' ##ud4=surv.boxarea(c(0.5,0.5),c(2,2),data=d,id="cluster",timevar="time",status="status")
##'
##' ## everything done in one call
##' ud <- piecewise.data(c(0,0.5,2),data=d,timevar="time",status="status",id="cluster")
##' ud$strata <- factor(ud$strata);
##' ud$intstrata <- factor(ud$intstrata)
##'
##' ## makes strata specific id variable to identify pairs within strata
##' ## se's computed based on the id variable across strata "cluster"
##' ud$idstrata <- ud$id+(as.numeric(ud$strata)-1)*2000
##'
##' marg2 <- timereg::aalen(Surv(boxtime,status)~-1+factor(num):factor(intstrata),
##'                data=ud,n.sim=0,robust=0)
##' tdes <- model.matrix(~-1+factor(strata),data=ud)
##' fitp2 <- survival.twostage(marg2,data=ud,se.clusters=ud$cluster,clusters=ud$idstrata,
##'                 model="clayton.oakes",theta.des=tdes,step=0.5)
##' summary(fitp2)
##'
##' ### now fitting the model with symmetry, i.e. strata 2 and 3 same effect
##' ud$stratas <- ud$strata;
##' ud$stratas[ud$strata=="0.5-2,0-0.5"] <- "0-0.5,0.5-2"
##' tdes2 <- model.matrix(~-1+factor(stratas),data=ud)
##' fitp3 <- survival.twostage(marg2,data=ud,clusters=ud$idstrata,se.cluster=ud$cluster,
##'                 model="clayton.oakes",theta.des=tdes2,step=0.5)
##' summary(fitp3)
##'
##' ### same model using strata option, a bit slower
##' fitp4 <- survival.twostage(marg2,data=ud,clusters=ud$cluster,se.cluster=ud$cluster,
##'                 model="clayton.oakes",theta.des=tdes2,step=0.5,strata=ud$strata)
##' summary(fitp4)
##' }
##'
##' \donttest{ ## Reduce Ex.Timings
##' ### structured random effects model additive gamma ACE
##' ### simulate structured two-stage additive gamma ACE model
##' data <- simClaytonOakes.twin.ace(4000,2,1,0,3)
##' out <- twin.polygen.design(data,id="cluster")
##' pardes <- out$pardes
##' pardes
##' des.rv <- out$des.rv
##' head(des.rv)
##' aa <- phreg(Surv(time,status)~x+cluster(cluster),data=data,robust=0)
##' ts <- survival.twostage(aa,data=data,clusters=data$cluster,detail=0,
##' 	       theta=c(2,1),var.link=0,step=0.5,
##' 	       random.design=des.rv,theta.des=pardes)
##' summary(ts)
##' }
##'
##' @keywords survival
##' @author Thomas Scheike
##' @param margsurv Marginal model
##' @param data data frame
##' @param method Scoring method "nr", for lava NR optimizer
##' @param detail Detail
##' @param clusters Cluster variable
##' @param silent Debug information
##' @param weights Weights
##' @param theta Starting values for variance components
##' @param theta.des design for dependence parameters; when pairs are given, the index of the
##' theta-design for each pair is given as column 5 of pairs
##' @param var.link Link function for variance:  exp-link.
##' @param baseline.iid to adjust for baseline estimation, using phreg function on same data.
##' @param model model
##' @param marginal.trunc marginal left truncation probabilities
##' @param marginal.survival optional vector of marginal survival probabilities
##' @param strata strata for fitting, see example
##' @param se.clusters clusters for the se calculation with the iid decomposition
##' @param numDeriv to get numDeriv version of second derivative, otherwise uses sum of squared scores for each pair
##' @param random.design random effect design for the additive gamma model; when pairs are given, the
##' indices of the random.design rows for the pair are given as columns 3:4 of pairs
##' @param pairs matrix with rows of indices (two columns) for the pairs considered in the pairwise
##' composite score, useful for case-control sampling when the marginal is known.
##' @param dim.theta dimension of the theta parameter for pairs situation.
##' @param numDeriv.method method for numDeriv; "simple" is used to speed things up since the second derivative is less important.
##' @param additive.gamma.sum for two.stage=0, this is the specification of the lamtot in the models via
##' a matrix that is multiplied onto the parameters theta (dimensions = (number of random effects) x (number
##' of theta parameters)); when NULL all parameters are summed.
##' @param var.par is 1 for the default parametrization with the variances of the random effects,
##' var.par=0 specifies that the \eqn{\lambda_j}'s are used as parameters.
##' @param no.opt for not optimizing
##' @param ... Additional arguments to the maximizer NR of lava.
##' @aliases survival.twostage twostage.aalen twostage.cox.aalen twostage.coxph twostage.phreg randomDes readmargsurv 
##' @export survival.twostage
survival.twostage <- function(margsurv,data=parent.frame(),
    method="nr",detail=0,clusters=NULL,
    silent=1,weights=NULL,theta=NULL,theta.des=NULL,
    var.link=1,baseline.iid=1,model="clayton.oakes",
    marginal.trunc=NULL,marginal.survival=NULL,strata=NULL,
    se.clusters=NULL,numDeriv=1,random.design=NULL,pairs=NULL,dim.theta=NULL,
    numDeriv.method="simple",additive.gamma.sum=NULL,var.par=1,no.opt=FALSE,...)
{#
  requireNamespace("timereg")
# setting up design and variables
iid <- 1
two.stage <- 1; rate.sim <- 1; sym=1; var.func <- NULL
if (model=="clayton.oakes" || model=="gamma") dep.model <- 1
else if (model=="plackett" || model=="or") dep.model <- 2
else stop("Model must by either clayton.oakes or plackett \n");
start.time <- NULL; ptrunc <- NULL; psurvmarg <- NULL; status <- NULL; score.iid <- NULL
fix.baseline <- 0; convergence.bp <- 1;  ### to control if baseline profiler converges

  if ((!is.null(margsurv)) | (!is.null(marginal.survival))) fix.baseline <- 1
  antpers <- nrow(data); RR <-  rep(1,antpers);


  if (!is.null(margsurv)) {
     rrr <-  readmargsurv(margsurv,data,clusters)
     psurvmarg <- rrr$psurvmarg; ptrunc <- rrr$ptrunc; start.time <- rrr$entry;
     time2 <- rrr$exit; status <- rrr$status; clusters <- rrr$clusters
  }

  if (is.null(psurvmarg)) psurvmarg <- rep(1,antpers);
  if (!is.null(marginal.survival)) psurvmarg <- marginal.survival
  if (!is.null(marginal.trunc)) ptrunc <- marginal.trunc
  if (is.null(ptrunc)) ptrunc <- rep(1,length(psurvmarg))
  if (is.null(weights)==TRUE) weights <- rep(1,antpers);
  if (is.null(strata)==TRUE) strata<- rep(1,antpers);
  if (length(strata)!=antpers) stop("Strata must have length equal to number of data points \n");

  # cluster set up
  cluster.call <- clusters
  out.clust <- cluster.index(clusters);
  clusters <- out.clust$clusters
  maxclust <- out.clust$maxclust
  antclust <- out.clust$cluster.size
  clusterindex <- out.clust$idclust
  clustsize <- out.clust$cluster.size
  call.secluster <- se.clusters

  if (is.null(se.clusters)) { se.clusters <- clusters; antiid <- nrow(clusterindex);} else  {
      iids <-  unique(se.clusters);
      antiid <- length(iids);
      if (is.numeric(se.clusters)) se.clusters <-  fast.approx(iids,se.clusters)-1
       else se.clusters <- as.integer(factor(se.clusters, labels = seq(antiid)))-1
  }
  if (length(se.clusters)!=length(clusters)) stop("Length of se.clusters and clusters must be the same\n");

  #

  ### setting design for random variables, in particular when pairs are given
  ddd <- randomDes(dep.model,random.design,theta.des,theta,antpers,additive.gamma.sum,pairs,var.link,clusterindex,dim.theta)
  random.design=ddd$random.design;clusterindex=ddd$clusterindex;
  antpairs=ddd$antpairs;
  theta=ddd$theta;ptheta=ddd$ptheta;theta.des=ddd$theta.des
  pair.structure=ddd$pair.structure; dep.model=ddd$dep.model
  dim.rv <- ddd$dim.rv; additive.gamma.sum=ddd$additive.gamma.sum

  theta.score<-rep(0,ptheta);
  Stheta<-var.theta<-matrix(0,ptheta,ptheta);
  #
  ascertained <- 0

  obj <- function(par)
  { #
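     ## 'oout' (set in the enclosing scope) controls what is returned:
     ##   0: the log-likelihood with (negated) gradient and hessian attributes, used by lava::NR
     ##   1: the gradient only, used by numDeriv for the second derivative
     ##   2: the full output list from the C routine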

     if (pair.structure==0 | dep.model!=3) Xtheta <- as.matrix(theta.des) %*% matrix(c(par),nrow=ptheta,ncol=1);
     if (pair.structure==1 & dep.model==3) Xtheta <- matrix(0,antpers,1); ## not needed

      if (var.link==1 & dep.model==3) epar <- c(exp(par)) else epar <- c(par)
      partheta <- epar

      if (var.par==1 & dep.model==3) {
	 ## map from variance parameters to the lambda parametrization
	 if (is.null(var.func)) {
	    sp <- sum(epar)
	    partheta <- epar/sp^2
         } else partheta <- epar ## par.func(epar)
      }

      if (pair.structure==0) {
	      outl<-.Call("twostageloglikeRV", # only two stage model for this option
	      icause=status,ipmargsurv=psurvmarg,
	      itheta=c(partheta), ithetades=theta.des,
	      icluster=clusters,iclustsize=clustsize,iclusterindex=clusterindex,
	      ivarlink=var.link,iid=iid,iweights=weights,isilent=silent,idepmodel=dep.model,
	      itrunkp=ptrunc,istrata=as.numeric(strata),iseclusters=se.clusters,iantiid=antiid,
	      irvdes=random.design,iags=additive.gamma.sum,iascertained=ascertained,
              PACKAGE="mets") #
      }
      else { # pair-structure
          outl<-.Call("twostageloglikeRVpairs", #
          icause=status,ipmargsurv=psurvmarg, itheta=c(partheta), ithetades=theta.des,
          icluster=clusters,iclustsize=clustsize,iclusterindex=clusterindex, ivarlink=var.link,
	  iiid=iid,iweights=weights,isilent=silent, idepmodel=dep.model,
	  itrunkp=ptrunc,istrata=as.numeric(strata), iseclusters=se.clusters,iantiid=antiid,
	  irvdes=random.design, iags=additive.gamma.sum, iascertained=ascertained,PACKAGE="mets")
	  #
      } #

    if (detail==3) print(c(partheta,outl$loglike))

    ## variance parametrization, and inverse.link
    if (dep.model==3) {
    if (var.par==1) {
         ## derivative of the map from variance parameters to lambda, summing over all random effects
         if (is.null(var.func)) {
	 if (var.link==0)  {
             mm <- matrix(-epar*2*sp,length(epar),length(epar))
	     diag(mm) <- sp^2-epar*2*sp
	 } else {
            mm <- -c(epar) %o% c(epar)*2*sp
            diag(mm) <- epar*sp^2-epar^2*2*sp
	 }
	    mm <- mm/sp^4
	 } else mm  <- numDeriv::hessian(var.func,par)
      } else {
	   if (var.link==0) mm <- diag(length(epar)) else
			  mm <- diag(length(c(epar)))*c(epar)
      }
      }


    if (dep.model==3) {
       outl$score <-  t(mm) %*% outl$score
       outl$Dscore <- t(mm) %*% outl$Dscore %*% mm
       if (iid==1) { outl$score.iid <- t(t(mm) %*% t(outl$score.iid))
               if (inherits(margsurv,"phreg")) {
	          outl$D1thetal  <- t(t(mm) %*% t(outl$D1thetal))
	          outl$D2thetal  <- t(t(mm) %*% t(outl$D2thetal))
	       }
       }
    }


    outl$gradient <- outl$score 
    outl$hessian <- outl$Dscore

    if (oout==0) ret <- with(outl,structure(outl$loglike,gradient=-gradient,hessian=-hessian))
    else if (oout==1) ret <- outl$gradient 
    else if (oout==2) ret <- outl

    return(ret)
  } #

  score1 <- NULL
  theta.iid <- NULL
  p <- theta

  oout <- 0
  opt <- NULL
  if (no.opt==FALSE) {
      if (tolower(method)=="nr") {
          tim <- system.time(opt <- lava::NR(p,obj,...))
          opt$timing <- tim
          opt$estimate <- opt$par
      } else {
          opt <- nlm(obj,p,...)
          opt$method <- "nlm"
      }
      cc <- opt$estimate;  ## names(cc) <- colnames(X)
      oout <- 2
      val <- c(list(coef=cc),obj(opt$estimate))
 } else {oout <- 2; val <- c(list(coef=p),obj(p))}

     if (numDeriv>=1) {
	 p <- val$coef
         oout <- 1
         if (detail==1 ) cat("starting numDeriv for second derivative \n");
         val$hessian <- numDeriv::jacobian(obj,p,method=numDeriv.method)
         if (detail==1 ) cat("finished numDeriv for second derivative \n");
        }


    ### update p, but note that score and derivative in fact related to previous p
    if (!is.nan(sum(p))) {
      if (detail==1 && iid==1) cat("iid decomposition\n");
         if (iid==1) {  #
            score.iid <- val$score.iid
	    val$theta.iid <- score.iid
	    theta.iid <- score.iid
	    ucc <-  unique(cluster.call)
	    if (length(ucc)== nrow(theta.iid))
		    rownames(theta.iid) <- unique(cluster.call)
                ### let the iid for theta be just the score to start; correction for the marginal follows for the phreg call

                if (inherits(margsurv,"phreg") & baseline.iid==1) {  # adjust for baseline when phreg is used

                  if ((margsurv$no.opt) | is.null(margsurv$coef)) fixbeta<- 1 else fixbeta <- 0
		  xx <- margsurv$cox.prep
                  id  <-  xx$id+1
		  cumhazt <- c(rrr$cum)
		  rr <- c(rrr$RR)

		  ## these refers to id given in cluster.call
		  xx <- margsurv$cox.prep
		  D1thetal<- val$D1thetal
		  D2thetal<- val$D2thetal
		  Dlamthetal <- -(D1thetal+D2thetal)
		  ## ordered as original data
                  Fbeta <- t(Dlamthetal) %*% (margsurv$X *c(cumhazt*psurvmarg))

                  ### timeordered
		  ## t- not needed because using S(T-) for survival already
                  Gbasedt <- Dlamthetal[xx$ord+1,,drop=FALSE]*psurvmarg[xx$ord+1]*rr[xx$ord+1]
                  Gbase <- apply(Gbasedt,2,revcumsumstrata,xx$strata,xx$nstrata)

                  if (fixbeta==0) {  ### iid after beta of marginal model
		     Z <- xx$X
		     U <- E <- matrix(0,nrow(xx$X),margsurv$p)
		     E[xx$jumps+1,] <- margsurv$E
		     U[xx$jumps+1,] <- margsurv$U
		     invhess <- -solve(margsurv$hessian)
		     S0i <- rep(0,length(xx$strata))
		     S0i[xx$jumps+1] <- 1/margsurv$S0
		     cumhaz <- c(cumsumstrata(S0i,xx$strata,xx$nstrata))
		     EdLam0 <- apply(E*S0i,2,cumsumstrata,xx$strata,xx$nstrata)
	             rr <- c(xx$sign*exp(Z %*% coef(margsurv) + xx$offset))
		     ### Martingale for all subjects
		     MGt <- U[,drop=FALSE]-(Z*cumhaz-EdLam0)*rr*c(xx$weights)
		     UUbeta <- apply(MGt,2,sumstrata,id-1,max(id))
		     UUbeta <- UUbeta %*% invhess
		     GbaseEdLam0 <- t(Gbase) %*% (E*S0i)
		     Fbeta   <-  Fbeta -  GbaseEdLam0
		     Fbetaiid <- UUbeta %*% t(Fbeta)
	          }

	  ###  \int_0^\tau (GBase(s) / S_0(s)) dN_i(s) - dLambda_i(s)
	  xx <- margsurv$cox.prep
	  S0i  <- S0i2 <- rep(0,length(xx$strata))
	  S0i[xx$jumps+1] <- 1/margsurv$S0
	  S0i2[xx$jumps+1] <- 1/margsurv$S0^2

	  GbasedN <- Gbase*S0i
	  GbasedLam0 <- apply(Gbase*S0i2,2,cumsumstrata,xx$strata,xx$nstrata)

	  if (fixbeta==0) rr <- c(xx$sign*exp(Z %*% coef(margsurv) + xx$offset)) else rr <- c(xx$sign*exp( xx$offset))
	  MGt <- (GbasedN[,drop=FALSE]-GbasedLam0*rr)*c(xx$weights)
	  MGti <- apply(MGt,2,sumstrata,id-1,max(id))

	  if (fixbeta==0) MGti <-  MGti + Fbetaiid
	  theta.iid <-  -theta.iid + MGti
	  val$theta.iid <- theta.iid
	  } #
   } #
   }

  hess <- val$hessian
  if (!is.na(sum(hess))) hessi <- lava::Inverse(val$hessian) else hessi <- diag(nrow(val$hessian))

# handling output
  loglikeiid <- NULL; robvar.theta <- NULL; likepairs <- NULL; marginal.surv <- psurvmarg; 
  marginal.trunc <- ptrunc; 

  if (iid==1) {
  if (dep.model==3 & pair.structure==1) likepairs <- val$likepairs
  if (dep.model==3 & two.stage==0) {
	  hessi <- -1*hessi
	  all.likepairs <- val$all.likepairs
	  colnames(all.likepairs) <- c("surv","dt","ds","dtds","cause1","cause2")
  }
     theta.iid <- val$theta.iid %*% hessi
### rownames not set to make more robust
### if (is.null(call.secluster)) rownames(theta.iid) <- unique(cluster.call) else rownames(theta.iid) <- unique(se.clusters)
     robvar.theta  <- crossprod(theta.iid)
     loglikeiid <- val$loglikeiid
    } else { all.likepairs <- NULL}
  var.theta <- robvar.theta
  if (is.null(robvar.theta)) var.theta <- hessi

  if (!is.null(colnames(theta.des))) thetanames <- colnames(theta.des) else thetanames <- paste("dependence",1:length(theta),sep="")
  theta <- matrix(theta,length(c(theta)),1)
  if (length(thetanames)==nrow(theta)) { rownames(theta) <- thetanames; rownames(var.theta) <- colnames(var.theta) <- thetanames; }


  ud <- list(theta=matrix(val$coef,ncol=1),coef=val$coef,score=val$gradient,hess=hess,hessi=hessi,var.theta=var.theta,
     model=model,robvar.theta=robvar.theta,
     loglike=val$loglike,loglikeiid=loglikeiid,likepairs=likepairs,
     theta.iid=theta.iid,thetanames=thetanames,
     score1=score1,Dscore=val$Dscore,
     marginal.surv=marginal.surv,marginal.trunc=marginal.trunc,
     se=diag(robvar.theta)^.5,score.iid=-score.iid,theta.des=theta.des,random.design=random.design)
  class(ud) <- "mets.twostage"
  attr(ud,"response") <- "survival"
  attr(ud,"Formula") <- formula
  attr(ud,"clusters") <- clusters
  attr(ud,"cluster.call") <- cluster.call
  attr(ud,"secluster") <- c(se.clusters)
  attr(ud,"sym")<-sym;
  attr(ud,"var.link")<-var.link;
  attr(ud,"var.par")<-var.par;
  attr(ud,"var.func")<-var.func;
  attr(ud,"ptheta")<-ptheta
  attr(ud,"antpers")<-antpers;
  attr(ud,"antclust")<-antclust;
  attr(ud,"Type") <- model
  attr(ud,"twostage") <- two.stage
  attr(ud,"additive-gamma") <- (dep.model==3)*1
  if (!is.null(marginal.trunc)) attr(ud,"trunclikeiid")<- val$trunclikeiid
  if (dep.model==3 & two.stage==0) attr(ud,"all.likepairs")<- all.likepairs
  if (dep.model==3 ) attr(ud,"additive.gamma.sum") <- additive.gamma.sum

if (dep.model==3 & pair.structure==0) {
       attr(ud, "pardes") <- theta.des
       attr(ud, "theta.des") <- theta.des
       attr(ud, "rv1") <-    random.design[1,]
    }
    if (dep.model==3 & pair.structure==1) {
       nrv <- clusterindex[1,6]
       attr(ud, "theta.des") <- matrix(theta.des[1,1:(nrv*ptheta)],nrv,ptheta)
       attr(ud, "pardes") <-    matrix(theta.des[1,1:(nrv*ptheta)],nrv,ptheta)
       attr(ud, "rv1") <-    random.design[1,]
       attr(ud, "nrv") <- clusterindex[1,6]
    }


  return(ud);
  #

} #


##' @title Twostage survival model for multivariate survival data
##' @description older, extended version of survival.twostage with more options and a different maximizer.
##' @keywords survival
##' @author Thomas Scheike
##' @param margsurv Marginal model
##' @param data data frame
##' @param method Scoring method "nr", "nlminb", "optimize", "nlm"
##' @param Nit Number of iterations
##' @param detail Detail
##' @param clusters Cluster variable
##' @param silent Debug information
##' @param weights Weights
##' @param control Optimization arguments
##' @param theta Starting values for variance components
##' @param theta.des design for dependence parameters; when pairs are given, the index of the
##' theta-design for each pair is given as column 5 of pairs
##' @param var.link Link function for variance:  exp-link.
##' @param baseline.iid to adjust for baseline estimation, using phreg function on same data.
##' @param step Step size
##' @param model model
##' @param marginal.trunc marginal left truncation probabilities
##' @param marginal.survival optional vector of marginal survival probabilities
##' @param marginal.status related to marginal survival probabilities
##' @param strata strata for fitting, see example
##' @param se.clusters clusters for the se calculation with the iid decomposition
##' @param numDeriv to get numDeriv version of second derivative, otherwise uses the sum of squared scores
##' @param random.design random effect design for the additive gamma model; when pairs are given, the
##' indices of the random.design rows for the pair are given as columns 3:4 of pairs
##' @param pairs matrix with rows of indices (two columns) for the pairs considered in the pairwise
##' composite score, useful for case-control sampling when the marginal is known.
##' @param dim.theta dimension of the theta parameter for pairs situation.
##' @param numDeriv.method method for numDeriv; "simple" is used to speed things up since the second derivative is less important.
##' @param additive.gamma.sum for two.stage=0, this is the specification of the lamtot in the models via
##' a matrix that is multiplied onto the parameters theta (dimensions = (number of random effects) x (number
##' of theta parameters)); when NULL all parameters are summed.
##' @param var.par is 1 for the default parametrization with the variances of the random effects,
##' var.par=0 specifies that the \eqn{\lambda_j}'s are used as parameters.
##' @param cr.models competing risks models for two.stage=0, should be given as a list with models for each cause
##' @param case.control assumes case-control structure for "pairs" with the second column being the probands;
##' when this option is used the baseline of the survival model is profiled out via the paired estimating equations.
##' @param ascertained if the pairs are sampled only when there is an event. This is in contrast to
##' case.control sampling where a proband is given. This can be combined with control probands. A pair-call
##' of twostage is needed, and the second column of pairs is the first jump time with an event for ascertained pairs,
##' or the time of the control proband.
##' @param shut.up to make the program more silent in the context of iterative procedures for case-control
##' @export survival.twostageCC
survival.twostageCC <- function(margsurv,data=parent.frame(),
    method="nr",Nit=60,detail=0,clusters=NULL,
    silent=1,weights=NULL, control=list(),theta=NULL,theta.des=NULL,
    var.link=1,baseline.iid=1,step=0.5,model="clayton.oakes",
    marginal.trunc=NULL,marginal.survival=NULL,marginal.status=NULL,strata=NULL,
    se.clusters=NULL,numDeriv=0, random.design=NULL, pairs=NULL,dim.theta=NULL,
    numDeriv.method="simple",additive.gamma.sum=NULL,var.par=1,cr.models=NULL,
    case.control=0,ascertained=0,shut.up=0)
{#
# setting up design and variables
iid <- 1
two.stage <- 1; rate.sim <- 1; sym=1; var.func <- NULL
if (model=="clayton.oakes" || model=="gamma") dep.model <- 1
else if (model=="plackett" || model=="or") dep.model <- 2
else stop("Model must by either clayton.oakes or plackett \n");
start.time <- NULL; ptrunc <- NULL; psurvmarg <- NULL; status <- NULL; score.iid <- NULL
fix.baseline <- 0; convergence.bp <- 1;  ### to control if baseline profiler converges

  if ((!is.null(margsurv)) | (!is.null(marginal.survival))) fix.baseline <- 1
  antpers <- nrow(data); RR <-  rep(1,antpers);

  if (!is.null(margsurv)) {
     rrr <-  readmargsurv(margsurv,data,clusters)
     psurvmarg <- rrr$psurvmarg; ptrunc <- rrr$ptrunc; start.time <- rrr$entry;
     time2 <- rrr$exit; status <- rrr$status; clusters <- rrr$clusters
  }

  if (is.null(psurvmarg)) psurvmarg <- rep(1,antpers);
  if (!is.null(marginal.survival)) psurvmarg <- marginal.survival
  if (!is.null(marginal.trunc)) ptrunc <- marginal.trunc
  if (is.null(ptrunc)) ptrunc <- rep(1,length(psurvmarg))
  if (!is.null(marginal.status)) status <- marginal.status
  if (is.null(status) & is.null(cr.models)) stop("must give status variable for survival via either the marginal model (margsurv), marginal.status or cr.models \n");
  if (is.null(weights)==TRUE) weights <- rep(1,antpers);
  if (is.null(strata)==TRUE) strata<- rep(1,antpers);
  if (length(strata)!=antpers) stop("Strata must have length equal to number of data points \n");

  # cluster set up
  cluster.call <- clusters
  out.clust <- cluster.index(clusters);
  clusters <- out.clust$clusters
  maxclust <- out.clust$maxclust
  antclust <- out.clust$cluster.size
  clusterindex <- out.clust$idclust
  clustsize <- out.clust$cluster.size
  call.secluster <- se.clusters

  if (is.null(se.clusters)) { se.clusters <- clusters; antiid <- nrow(clusterindex);} else  {
      iids <-  unique(se.clusters);
      antiid <- length(iids);
      if (is.numeric(se.clusters)) se.clusters <-  fast.approx(iids,se.clusters)-1
       else se.clusters <- as.integer(factor(se.clusters, labels = seq(antiid)))-1
  }
  if (length(se.clusters)!=length(clusters)) stop("Length of se.clusters and clusters must be the same\n");

  #

  ### setting design for random variables, in particular when pairs are given
  ddd <- randomDes(dep.model,random.design,theta.des,theta,antpers,additive.gamma.sum,pairs,var.link,clusterindex,dim.theta)
  random.design=ddd$random.design;clusterindex=ddd$clusterindex;
  antpairs=ddd$antpairs;
  theta=ddd$theta;ptheta=ddd$ptheta;theta.des=ddd$theta.des
  pair.structure=ddd$pair.structure; dep.model=ddd$dep.model
  dim.rv <- ddd$dim.rv; additive.gamma.sum=ddd$additive.gamma.sum

  theta.score<-rep(0,ptheta);
  Stheta<-var.theta<-matrix(0,ptheta,ptheta);
  #

###  setting up arguments for Aalen baseline profile estimates
 if (fix.baseline==0)  { #  when baseline is estimated
 if (is.null(cr.models)) stop("give hazard models for different causes, ex cr.models=list(Surv(time,status==1)~+1,Surv(time,status==2)~+1) \n")

      if (case.control==1 || ascertained==1) { #

         data1 <-        data[pairs[,1],]
         data.proband <- data[pairs[,2],]
	 weights1 <-     weights[pairs[,1]]

        # setting up designs for jump times
        timestatus <- all.vars(cr.models[[1]])
        if (is.null(status)) status <- data[,timestatus[2]]
	alltimes      <- data[,timestatus[1]]
	times         <- data1[,timestatus[1]]
	lstatus       <- data1[,timestatus[2]]
	timescase     <- data.proband[,timestatus[1]]
	lstatuscase   <- data.proband[,timestatus[2]]
	### organize increments according to overall jump-times
	jumps         <- lstatus!=0
	dtimes        <- times[jumps]
	dtimescase    <- timescase[jumps]
	st            <- order(dtimes)
	dtimesst      <- dtimes[st]
	dtimesstcase  <- dtimescase[st]
	dcauses       <- lstatus[jumps][st]
	dcausescase   <- lstatuscase[jumps][st]
	ids           <- (1:nrow(data1))[jumps][st]
	### delayed entry for case because of ascertained sampling
	### controls are however control probands, and have entry=0
	entry <- timescase*lstatuscase
	data1$entry <- entry
	cr.models2 <- list()
	if (ascertained==1) {
           for (i in 1:length(cr.models)) {
	      cr.models2[[i]] <- update(cr.models[[i]],as.formula(paste("Surv(entry,",timestatus[1],",",timestatus[2],")~.",sep="")))
	    }
	} else cr.models2 <- cr.models

        nc <- 0
	for (i in 1:length(cr.models)) {
              X <- timereg::aalen.des(as.formula(cr.models[[i]]),data=data1)$X
	      nc <- nc+ncol(X)
	}
	dBaalen    <- matrix(0,length(dtimes),nc)
	xjump      <- array(0,c(length(cr.models),nc,length(ids)))
	xjumpcase  <- array(0,c(length(cr.models),nc,length(ids)))

	## first compute marginal aalen models for all causes
	a <- list(); da <- list();
        ### starting values for iteration
        Bit <- Bitcase <- c()
	for (i in 1:length(cr.models)) { #
	      a[[i]]  <-  timereg::aalen(as.formula(cr.models2[[i]]),data=data1,robust=0,weights=weights1)
	      da[[i]] <- apply(a[[i]]$cum[,-1,drop=FALSE],2,diff)
	      jumpsi  <- (1:length(dtimes))[dcauses==i]
              X       <- timereg::aalen.des(as.formula(cr.models[[i]]),data=data1)$X
              Xcase   <- timereg::aalen.des(as.formula(cr.models[[i]]),data=data.proband)$X
	      if (i==1) fp <- 1
	      indexc  <- fp:(fp+ncol(X)-1)
	      dBaalen[jumpsi,indexc] <- da[[i]]
              xjump[i,indexc,]       <- t(X[ids,])
              xjumpcase[i,indexc,]   <- t(Xcase[ids,])
	      fp <- fp+ncol(X)
	      ### starting values
	      Bit <- cbind(Bit,cpred(a[[i]]$cum,dtimesst)[,-1,drop=FALSE])
       } #
       Bit.ini <- Bit
       #

####  organize subject specific random variables and design
####  already done in basic pairwise setup
	mtheta.des <- theta.des[ids,drop=FALSE]
	### array randomdes to jump times (subjects)
	mrv.des <- random.design[ids,drop=FALSE]
###	nrv.des <- pairs.rvs[ids]
      } #

 }  else {
	 mrv.des <- mtheta.des <- margthetades <- matrix(0,1,1);
	 xjump <- dBaalen <- matrix(0,1,1); nrv.des <- 3
 } #

  loglike <- function(par)
  { #
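     ## 'oout' (set in the enclosing scope) controls what is returned:
     ##   0: minus the log-likelihood (used by nlminb/optimize/nlm)
     ##   1: the sum of squared scores
     ##   2: the full output list from the C routine
     ##   otherwise (3): the score vector, used by numDeriv for the hessian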

     if (pair.structure==0 | dep.model!=3) Xtheta <- as.matrix(theta.des) %*% matrix(c(par),nrow=ptheta,ncol=1);
     if (pair.structure==1 & dep.model==3) Xtheta <- matrix(0,antpers,1); ## not needed

      if (var.link==1 & dep.model==3) epar <- c(exp(par)) else epar <- c(par)
      partheta <- epar

      if (var.par==1 & dep.model==3) {
	 ## map from variance parameters to the lambda parametrization
	 if (is.null(var.func)) {
	    sp <- sum(epar)
	    partheta <- epar/sp^2
         } else partheta <- epar ## par.func(epar)
      }


      if (pair.structure==0) {
	      outl<-.Call("twostageloglikeRV", # only two stage model for this option
	      icause=status,ipmargsurv=psurvmarg,
	      itheta=c(partheta), ithetades=theta.des,
	      icluster=clusters,iclustsize=clustsize,iclusterindex=clusterindex,
	      ivarlink=var.link,iid=iid,iweights=weights,isilent=silent,idepmodel=dep.model,
	      itrunkp=ptrunc,istrata=as.numeric(strata),iseclusters=se.clusters,iantiid=antiid,
	      irvdes=random.design,iags=additive.gamma.sum,iascertained=ascertained,
              PACKAGE="mets") #
      }
      else { # pair-structure
	     ## twostage model,    case.control option,   profile out baseline
	     ## conditional model, case.control option,   profile out baseline
         if (fix.baseline==0) ## if baseline is not given
         {
          cum1 <- cbind(dtimesst,Bit)
          if ( (case.control==1 || ascertained==1) & (convergence.bp==1)) { #  profiles out baseline under case-control/ascertainment sampling

###	      ## initial values , only one cr.model for survival
             if (detail>1) plot(dtimesst,Bit,type="l",main="Bit")

             if (ncol(Bit)==0) Bit <- Bit.ini
             Bitcase  <- cpred(cbind(dtimesst,Bit),dtimesstcase)[,-1,drop=FALSE]
             Bitcase <- .Call("MatxCube",Bitcase,dim(xjumpcase),xjumpcase,PACKAGE="mets")$X

             for (j in 1:5) { # profile via iteration
                   cncc <- .Call("BhatAddGamCC",1,dBaalen,dcauses,dim(xjump),xjump,
		                 c(partheta),dim(mtheta.des),mtheta.des,additive.gamma.sum,var.link,
				 dim(mrv.des),mrv.des,nrv.des,1,Bit,Bitcase,dcausescase,PACKAGE="mets")
		   d <- max(abs(Bit-cncc$B))
		   if (detail>1) print(d)
		   Bit <- cncc$B
		   if (detail>1) print(summary(cncc$caseweights))
		   cum1 <- cbind(dtimesst,cncc$B)
		   Bitcase  <-cbind(cpred(cum1,dtimesstcase)[,-1])
		   if (detail>1) lines(dtimesst,Bit,col=j+1);
		   if (is.na(d)) {
			  if (shut.up==0) cat("Baseline profiler gives missing values\n");
		          Bit <- Bit.ini; cum1 <- cbind(dtimesst,Bit); convergence.bp <<- 0; break;
		   }
		   Bitcase <- .Call("MatxCube",Bitcase,dim(xjumpcase),xjumpcase,PACKAGE="mets")$X
		   if (d<0.00001) break;
           } #


	   nulrow    <- rep(0,ncol(Bit)+1)
	   pbases    <- cpred(rbind(nulrow,cbind(dtimesst,Bit)),alltimes)[,-1,drop=FALSE]
           X         <- timereg::aalen.des(as.formula(cr.models[[1]]),data=data)$X
	   psurvmarg <- exp(-apply(X*pbases,1,sum))  ## psurv given baseline
	   if (ascertained==1) {
              Xcase     <- timereg::aalen.des(as.formula(cr.models[[1]]),data=data.proband)$X
              X         <- timereg::aalen.des(as.formula(cr.models[[1]]),data=data1)$X
	      pba.case  <- cpred(rbind(nulrow,cbind(dtimesst,Bit)),entry)[,-1,drop=FALSE]
	      ptrunc    <- rep(0,nrow(data))
	      ### for control probands ptrunc=1, thus no adjustment
	      ptrunc[pairs[,1]]     <- exp(-apply(X*    pba.case,1,sum)*lstatuscase) ## delayed entry at time of ascertainment proband
 	      ptrunc[pairs[,2]]     <- exp(-apply(Xcase*pba.case,1,sum)*lstatuscase)
	   }

          } #
          }

          outl<-.Call("twostageloglikeRVpairs", #
          icause=status,ipmargsurv=psurvmarg, itheta=c(partheta), ithetades=theta.des,
          icluster=clusters,iclustsize=clustsize,iclusterindex=clusterindex, ivarlink=var.link,
	  iiid=iid,iweights=weights,isilent=silent, idepmodel=dep.model,
	  itrunkp=ptrunc,istrata=as.numeric(strata), iseclusters=se.clusters,iantiid=antiid,
	  irvdes=random.design, iags=additive.gamma.sum, iascertained=ascertained,PACKAGE="mets")
	  #

          if (fix.baseline==0)  {
              outl$baseline <- cum1;
	      outl$marginal.surv <- psurvmarg;
	      outl$marginal.trunc <- ptrunc
	  }

      } #

    if (detail==3) print(c(partheta,outl$loglike))

    ## variance parametrization, and inverse.link
    if (dep.model==3) {
    if (var.par==1) {
         ## derivative of the map from variance parameters to lambda, summing over all random effects
         if (is.null(var.func)) {
	 if (var.link==0)  {
###		 print(c(sp,epar))
             mm <- matrix(-epar*2*sp,length(epar),length(epar))
	     diag(mm) <- sp^2-epar*2*sp
###	    print(mm)
	 } else {
            mm <- -c(epar) %o% c(epar)*2*sp
            diag(mm) <- epar*sp^2-epar^2*2*sp
###	    print(mm)
	 }
	    mm <- mm/sp^4
	 } else mm  <- numDeriv::hessian(var.func,par)
      } else {
	   if (var.link==0) mm <- diag(length(epar)) else
			  mm <- diag(length(c(epar)))*c(epar)
      }
      }

    if (dep.model==3) {
       outl$score <-  t(mm) %*% outl$score
       outl$Dscore <- t(mm) %*% outl$Dscore %*% mm
       if (iid==1) { outl$score.iid <- t(t(mm) %*% t(outl$score.iid))
               if (inherits(margsurv,"phreg")) {
	          outl$D1thetal  <- t(t(mm) %*% t(outl$D1thetal))
	          outl$D2thetal  <- t(t(mm) %*% t(outl$D2thetal))
	       }
       }
    }

    attr(outl,"gradient") <-outl$score
    if (oout==0) ret <- c(-1*outl$loglike) else if (oout==1) ret <- sum(outl$score^2) else if (oout==2) ret <- outl else ret <- outl$score
    return(ret)
  } #

  if (method=="optimize" && ptheta!=1) {
     cat("optimize only works for d==1, score.mehod set to nlminb \n");
     method <- "nlminb";
  }

  score1 <- NULL
  theta.iid <- NULL
  logl <- NULL
  p <- theta

    if (method=="nr") { #
        oout <- 2;  ### output control for obj
        if (Nit>0)
            for (i in 1:Nit)
            {
                    out <- loglike(p)
		    ## updating starting values for cumulative baselines
		    if (fix.baseline==0) Bit <- out$baseline[,-1,drop=FALSE]
		    if (fix.baseline==1) hess <-  out$Dscore
                    ###  uses simple second derivative for computing derivative of score
		    if (numDeriv==2 || (((fix.baseline==0)) & (i==1))) {
			    oout <- 3
			    hess <- numDeriv::jacobian(loglike,p,method="simple")
			    oout <- 2
		    }
                    if (!is.na(sum(hess))) hessi <- lava::Inverse(hess) else hessi <- hess
                    if (detail==1) {#
                        cat(paste("Fisher-Scoring ===================: it=",i,"\n"));
                        cat("theta:");print(c(theta))
                        cat("loglike:");cat(c(out$loglike),"\n");
                        cat("score:");cat(c(out$score),"\n");
                        cat("hess:\n"); cat(hess,"\n");
                    }#
                    delta <- step*( hessi %*% out$score )
		    ### update p, but note that score and derivative in fact related to previous p
		    ### unless Nit=0,
	            if (Nit>0) {
                    p <- p - delta
                    theta <- p;
		    }
                    if (is.nan(sum(out$score))) break;
                    if (sum(abs(out$score))<0.00001) break;
                    if (max(theta)>20 & var.link==1) { cat("theta too large lacking convergence \n"); break; }
           }
        if (!is.nan(sum(p))) {
            if (detail==1 && iid==1) cat("iid decomposition\n");
            out <- loglike(p)
            logl <- out$loglike
            score1 <- score <- out$score
            oout <- 0;
            hess1 <- hess  <- -1*out$Dscore
           if (iid==1) {  #
		score.iid <- out$score.iid
		out$theta.iid <- score.iid
		theta.iid <- score.iid
		ucc <-  unique(cluster.call)
		if (length(ucc)== nrow(theta.iid))
		    rownames(theta.iid) <- unique(cluster.call)
		### let the iid for theta be just the score to start; correction for the marginal follows for the phreg call

                if (inherits(margsurv,"phreg") & baseline.iid==1) {  # adjust for baseline when phreg is used

                  if ((margsurv$no.opt) | is.null(margsurv$coef)) fixbeta<- 1 else fixbeta <- 0
		  xx <- margsurv$cox.prep
                  id  <-  xx$id+1
		  cumhazt <- c(rrr$cum)
		  rr <- c(rrr$RR)

		  ## these refers to id given in cluster.call
		  xx <- margsurv$cox.prep
		  D1thetal<- out$D1thetal
		  D2thetal<- out$D2thetal
		  Dlamthetal <- -(D1thetal+D2thetal)
		  ## ordered as original data
                  Fbeta <- t(Dlamthetal) %*% (margsurv$X *cumhazt*psurvmarg)

                  ### timeordered
		  ## t- not needed because using S(T-) for survival already
                  Gbasedt <- Dlamthetal[xx$ord+1,,drop=FALSE]*psurvmarg[xx$ord+1]*rr[xx$ord+1]
                  Gbase <- apply(Gbasedt,2,revcumsumstrata,xx$strata,xx$nstrata)
###		  print(Gbase)

                  if (fixbeta==0) {  ### iid after beta of marginal model
		     Z <- xx$X
		     U <- E <- matrix(0,nrow(xx$X),margsurv$p)
		     E[xx$jumps+1,] <- margsurv$E
		     U[xx$jumps+1,] <- margsurv$U
		     invhess <- -solve(margsurv$hessian)
		     S0i <- rep(0,length(xx$strata))
		     S0i[xx$jumps+1] <- 1/margsurv$S0
		     cumhaz <- c(cumsumstrata(S0i,xx$strata,xx$nstrata))
		     EdLam0 <- apply(E*S0i,2,cumsumstrata,xx$strata,xx$nstrata)
	             rr <- c(xx$sign*exp(Z %*% coef(margsurv) + xx$offset))
		     ### Martingale for all subjects
		     MGt <- U[,drop=FALSE]-(Z*cumhaz-EdLam0)*rr*c(xx$weights)
		     UUbeta <- apply(MGt,2,sumstrata,id-1,max(id))
		     UUbeta <- UUbeta %*% invhess
		     GbaseEdLam0 <- t(Gbase) %*% (E*S0i)
		     Fbeta   <-  Fbeta -  GbaseEdLam0
		     Fbetaiid <- UUbeta %*% t(Fbeta)
	          }

	  ###  \int_0^\tau (GBase(s) / S_0(s)) dN_i(s) - dLambda_i(s)
	  xx <- margsurv$cox.prep
	  S0i  <- S0i2 <- rep(0,length(xx$strata))
	  S0i[xx$jumps+1] <- 1/margsurv$S0
	  S0i2[xx$jumps+1] <- 1/margsurv$S0^2

	  GbasedN <- Gbase*S0i
	  GbasedLam0 <- apply(Gbase*S0i2,2,cumsumstrata,xx$strata,xx$nstrata)

	  if (fixbeta==0) rr <- c(xx$sign*exp(Z %*% coef(margsurv) + xx$offset)) else rr <- c(xx$sign*exp( xx$offset))
	  MGt <- (GbasedN[,drop=FALSE]-GbasedLam0*rr)*c(xx$weights)
	  MGti <- apply(MGt,2,sumstrata,id-1,max(id))

	  if (fixbeta==0) MGti <-  MGti + Fbetaiid
	  theta.iid <-  -theta.iid + MGti
	  out$theta.iid <- theta.iid
	  } #
   } #


            if (detail==1 && iid==1) cat("finished iid decomposition\n");
	    ### for profile solutions update the second derivative at the final estimate
	    if (numDeriv==2 || ((fix.baseline==0))) {
                    oout <- 3
                    hess <- numDeriv::jacobian(loglike,p,method="simple")
		    oout <- 2
		    }
        }
        if (numDeriv>=1) {
            if (detail==1 ) cat("starting numDeriv for second derivative \n");
            oout <- 0;
            score2 <- numDeriv::jacobian(loglike,p)
	    score1 <- matrix(score2,ncol=1)
            oout <- 3
            hess <- numDeriv::jacobian(loglike,p,method="simple")
            if (detail==1 ) cat("finished numDeriv for second derivative \n");
        }
        if (detail==1 & Nit==0) {#
            cat(paste("Fisher-Scoring ===================: final","\n"));
            cat("theta:");print(c(p))
            cat("loglike:");cat(c(out$loglike),"\n");
            cat("score:");cat(c(out$score),"\n");
            cat("hess:\n"); cat(hess,"\n");
        }#
        if (!is.na(sum(hess))) hessi <- lava::Inverse(hess) else hessi <- diag(nrow(hess))
        #
  } else if (method=="nlminb") { # nlminb optimizer
    oout <- 0;
    tryCatch(opt <- nlminb(theta,loglike,control=control),error=function(x) NA)
    if (detail==1) print(opt);
    if (detail==1 && iid==1) cat("iid decomposition\n");
    oout <- 2
    theta <- opt$par
    out <- loglike(opt$par)
    logl <- out$loglike
    score1 <- score <- out$score
    hess1 <- hess <- -1* out$Dscore
    if (iid==1) { theta.iid <- out$score.iid; score.iid <- out$score.iid }
    if (numDeriv==1) {
    if (detail==1 ) cat("numDeriv hessian start\n");
      oout <- 3; ## returns score
      hess <- numDeriv::jacobian(loglike,opt$par)
    if (detail==1 ) cat("numDeriv hessian done\n");
    }
    hessi <- lava::Inverse(hess);
  #
  } else if (method=="optimize" && ptheta==1) { #  optimizer
    oout <- 0;
    if (var.link==1) {mino <- -20; maxo <- 10;} else {mino <- 0.001; maxo <- 100;}
    tryCatch(opt <- optimize(loglike,c(mino,maxo)));
    if (detail==1) print(opt);
    opt$par <- opt$minimum
    theta <- opt$par
    if (detail==1 && iid==1) cat("iid decomposition\n");
    oout <- 2
    out <- loglike(opt$par)
    logl <- out$loglike
    score1 <- score <- out$score
    hess1 <- hess <- -1* out$Dscore
    if (numDeriv==1) {
    if (detail==1 ) cat("numDeriv hessian start\n");
      oout <- 3;  ## to get jacobian
      hess <- numDeriv::jacobian(loglike,theta)
    if (detail==1 ) cat("numDeriv hessian done\n");
    }
    hessi <- lava::Inverse(hess);
    if (iid==1) { theta.iid <- out$score.iid; score.iid <- out$score.iid }
  #
  } else if (method=="nlm") { # nlm optimizer
    iid <- 0; oout <- 0;
    tryCatch(opt <- nlm(loglike,theta,hessian=TRUE,print.level=detail),error=function(x) NA)
    iid <- 1;
    hess <- opt$hessian
    score <- opt$gradient
    if (detail==1) print(opt);
    hessi <- lava::Inverse(hess);
    theta <- opt$estimate
    if (detail==1 && iid==1) cat("iid decomposition\n");
    oout <- 2
    out <- loglike(opt$estimate)
    logl <- out$loglike
    score1 <- out$score
    hess1 <- out$Dscore
    if (iid==1) { theta.iid <- out$score.iid; score.iid <- out$score.iid }
  #
  }  else stop("method must be one of: optimize (dim=1), nlm, nlminb, nr\n");

# handling output
  loglikeiid <- NULL
  robvar.theta <- NULL
  likepairs <- NULL
  if (fix.baseline==1)  { marginal.surv <- psurvmarg; marginal.trunc <- ptrunc; } else { marginal.surv <- out$marginal.surv; marginal.trunc <- out$marginal.trunc;}

  if (iid==1) {
  if (dep.model==3 & pair.structure==1) likepairs <- out$likepairs
  if (dep.model==3 & two.stage==0) {
	  hessi <- -1*hessi
	  all.likepairs <- out$all.likepairs
	  colnames(all.likepairs) <- c("surv","dt","ds","dtds","cause1","cause2")
  }
     theta.iid <- out$theta.iid %*% hessi
### rownames not set to make more robust
###     if (is.null(call.secluster)) rownames(theta.iid) <- unique(cluster.call) else rownames(theta.iid) <- unique(se.clusters)
     robvar.theta  <- crossprod(theta.iid)
     loglikeiid <- out$loglikeiid
    } else { all.likepairs <- NULL}
  var.theta <- robvar.theta
  if (is.null(robvar.theta)) var.theta <- hessi

  if (!is.null(colnames(theta.des))) thetanames <- colnames(theta.des) else thetanames <- paste("dependence",1:length(theta),sep="")
  theta <- matrix(theta,length(c(theta)),1)
  if (length(thetanames)==nrow(theta)) { rownames(theta) <- thetanames; rownames(var.theta) <- colnames(var.theta) <- thetanames; }
  if (!is.null(logl)) logl <- -1*logl

  if (convergence.bp==0) theta <- rep(NA,length(theta))
  ud <- list(theta=theta,score=score,hess=hess,hessi=hessi,var.theta=var.theta,
	     model=model,robvar.theta=robvar.theta,
             theta.iid=theta.iid,loglikeiid=loglikeiid,likepairs=likepairs,
	     thetanames=thetanames,loglike=logl,score1=score1,Dscore=out$Dscore,
	     marginal.surv=marginal.surv,marginal.trunc=marginal.trunc,
	     baseline=out$baseline,se=diag(robvar.theta)^.5,
	     score.iid=-score.iid,theta.des=theta.des,random.design=random.design)
  class(ud) <- "mets.twostage"
  attr(ud,"response") <- "survival"
  attr(ud,"Formula") <- formula
  attr(ud,"clusters") <- clusters
  attr(ud,"cluster.call") <- cluster.call
  attr(ud,"secluster") <- c(se.clusters)
  attr(ud,"sym")<-sym;
  attr(ud,"var.link")<-var.link;
  attr(ud,"var.par")<-var.par;
  attr(ud,"var.func")<-var.func;
  attr(ud,"ptheta")<-ptheta
  attr(ud,"antpers")<-antpers;
  attr(ud,"antclust")<-antclust;
  attr(ud,"Type") <- model
  attr(ud,"twostage") <- two.stage
  attr(ud,"additive-gamma") <- (dep.model==3)*1
  if (!is.null(marginal.trunc)) attr(ud,"trunclikeiid")<- out$trunclikeiid
  if (dep.model==3 & two.stage==0) attr(ud,"all.likepairs")<- all.likepairs
  if (dep.model==3 ) attr(ud,"additive.gamma.sum") <- additive.gamma.sum

if (dep.model==3 & pair.structure==0) {
       attr(ud, "pardes") <- theta.des
       attr(ud, "theta.des") <- theta.des
       attr(ud, "rv1") <-    random.design[1,]
    }
    if (dep.model==3 & pair.structure==1) {
       nrv <- clusterindex[1,6]
       attr(ud, "theta.des") <- matrix(theta.des[1,1:(nrv*ptheta)],nrv,ptheta)
       attr(ud, "pardes") <-    matrix(theta.des[1,1:(nrv*ptheta)],nrv,ptheta)
       attr(ud, "rv1") <-    random.design[1,]
       attr(ud, "nrv") <- clusterindex[1,6]
    }



  return(ud);
  #

} #


##' @export
randomDes <- function(dep.model,random.design,theta.des,theta,antpers,ags,pairs,var.link,clusterindex,dim.theta)
{
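   ## sets up the random effect design and the dependence design (theta.des),
   ## provides default starting values for theta, and recodes clusterindex to
   ## 0-based pair indices when the model is specified through pairs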
   additive.gamma.sum <- ags

  if (!is.null(random.design)) { ### different parameters for Additive random effects 
     dep.model <- 3
     dim.rv <- ncol(random.design);
     if (is.null(theta.des)) theta.des<-diag(dim.rv);
  } else {
      random.design <- matrix(0,1,1);  dim.rv <- 1;
      additive.gamma.sum <- matrix(1,1,1);
  }

  if (is.null(theta.des)) ptheta<-1;
  if (is.null(theta.des)) theta.des<-matrix(1,antpers,ptheta); ###  else theta.des<-as.matrix(theta.des);
  ptheta<-ncol(theta.des)

  if (is.matrix(pairs)) {
       if ( ncol(pairs)==6)  ptheta<- dim.theta else ptheta<-ncol(theta.des)
  }

  theta.des <- as.matrix(theta.des)
  if (is.null(theta)==TRUE) {
         if (var.link==1) theta<- rep(-0.7,ptheta);
         if (var.link==0) theta<- rep(exp(-0.7),ptheta);
  }

  if (length(theta)!=ptheta) {
         theta<-rep(theta[1],ptheta);
  }
  theta.score<-rep(0,ptheta);Stheta<-var.theta<-matrix(0,ptheta,ptheta);
  antpairs <- 1; ### to define

  if (is.null(additive.gamma.sum)) additive.gamma.sum <- matrix(1,dim.rv,ptheta)
  if (!is.null(pairs)) { pair.structure <- 1;} else  pair.structure <- 0;

  if (pair.structure==1 & dep.model==3) { #
  antpairs <- nrow(pairs);

  if ( (ncol(pairs)==2))
  {
      pairs <- cbind(pairs,pairs,1,ncol(random.design))
      theta.des <- matrix(c(theta.des),1,length(c(theta.des)))
  }
   clusterindex <- cbind(pairs[,1:5]-1,ncol(random.design))
  } #

  if (pair.structure==1 & dep.model!=3) { #
       antpairs <- nrow(pairs);
       if ((ncol(pairs)==2)) { ## when only pairs are given we refer to rows of theta.des from the 3rd column: same as index of pair minus 1
       pairs <- cbind(pairs,pairs[,1])
       }
       if (ncol(pairs)!=3) stop("with pairstructure and theta.des, 3rd column of pairs must give relevant design from theta.des\n")
       ## index in c are Rindex -1
       clusterindex <- pairs-1;
  }

###  print(head(pairs)); print(head(theta.des))
###  print(dim(pairs)); print(dim(theta.des))

  return(list(random.design=random.design,clusterindex=clusterindex,
	 antpairs=antpairs,pair.structure=pair.structure,
	 dep.model=dep.model,dim.rv=dim.rv, additive.gamma.sum=additive.gamma.sum,
         theta=theta,ptheta=ptheta,theta.des=theta.des))

}



##' @export
readmargsurv <- function(margsurv,data,clusters)
{
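## extracts marginal survival and left truncation probabilities, entry and exit times,
## status, clusters, cumulative hazard and relative risk (RR) from an
## aalen/cox.aalen, coxph or phreg marginal model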
start.time <- 0

if (inherits(margsurv,c("aalen","cox.aalen"))) {
	 formula<-attr(margsurv,"Formula");
	 beta.fixed <- attr(margsurv,"beta.fixed")
	 if (is.null(beta.fixed)) beta.fixed <- 1;
	 ldata <- timereg::aalen.des(formula,data=data,model="cox.aalen");
	 id <- attr(margsurv,"id");
	 mclusters <- attr(margsurv,"cluster.call")
	 X<-ldata$X;
	 time<-ldata$time2;
	 Z<-ldata$Z;
	 status<-ldata$status;
	 time2 <- attr(margsurv,"stop");
	 start.time <- attr(margsurv,"start")
	 antpers<-nrow(X);
         if (is.null(Z)==TRUE) {npar<-TRUE; semi<-0;}  else {
		     Z<-as.matrix(Z); npar<-FALSE; semi<-1;}
         if (npar==TRUE) {Z<-matrix(0,antpers,1); pz<-1; fixed<-0;} else {fixed<-1;pz<-ncol(Z);}
	 px<-ncol(X);

         if (is.null(clusters) && is.null(mclusters)) stop("No cluster variable specified in marginal or twostage call\n");
         if (is.null(clusters)) clusters <- mclusters
	 if (nrow(X)!=length(clusters)) stop("Length of Marginal survival data not consistent with cluster length\n"); 
	 } else if (inherits(margsurv,"phreg")) { 
            ### setting up newdata with factors and strata
                antpers <- nrow(data)
		 rr <- readPhreg(margsurv,data,nr=FALSE)
		 time2 <- rr$exit
		 if (!is.null(rr$entry)) start.time <- rr$entry
		 else start.time <- rep(0,antpers);
		 status <- rr$status
	         if (is.null(clusters)) clusters <- rr$clusters
	
} else { ### coxph 
	  antpers <- margsurv$n
	  id <- 0:(antpers-1)
	  mt <- model.frame(margsurv)
	  Y <- model.extract(mt, "response")
	  if (!inherits(Y, "Surv")) stop("Response must be a survival object")
	  if (attr(Y, "type") == "right") {
	      time2 <- Y[, "time"];
	      status <- Y[, "status"]
		start.time <- rep(0,antpers);
		} else {
		 start.time <- Y[, 1];
		 time2 <- Y[, 2];
		 status <- Y[, 3];
		}
	   Z <- matrix(1,antpers,length(coef(margsurv)));

	   if (is.null(clusters)) stop("must give clusters for coxph\n");
	   cluster.call <- clusters;
	   X <- matrix(1,antpers,1); ### Z <- matrix(0,antpers,1); ### no use for these
	   px <- 1; pz <- ncol(Z);
	   start <- rep(0,antpers);
	   beta.fixed <- 0
	   semi <- 1
}

  if (!is.null(start.time)) {
      if (any(abs(start.time)>0)) lefttrunk <- 1  else lefttrunk <- 0;
  } else lefttrunk <- 0

if (!is.null(margsurv))  {
  if (inherits(margsurv,c("aalen","cox.aalen")))  { 
         resi <- timereg::residualsTimereg(margsurv,data=data)
         RR  <- resi$RR
	 cum <- resi$cumhaz/RR
	 psurvmarg <- exp(-resi$cumhaz);
         ptrunc <- rep(1,length(psurvmarg));
	 if (lefttrunk==1) ptrunc <- exp(-resi$cumhazleft); 
  } else if (inherits(margsurv,"coxph")) {  
    ### some problems here when data is different from data used in margsurv
       residuals <- residuals(margsurv)
       cum <- cumhaz <- status-residuals
       psurvmarg <- exp(-cumhaz);
       cumhazleft <- rep(0,antpers)
       ptrunc <- rep(1,length(psurvmarg));
       RR<- exp(margsurv$linear.predictors+sum(margsurv$means*coef(margsurv)))
       cum <- cum/RR
        if ((lefttrunk==1)) {
         stop("Use cox.aalen function for truncation case \n");
         baseout <- survival::basehaz(margsurv,centered=FALSE);
         cumt <- cbind(baseout$time,baseout$hazard)
	 cumt <- cpred(cumt,start.time)[,2]
	 ptrunc <- exp(-cumt * RR)
	}
  } else if (inherits(margsurv,"phreg")) {

	ppsurvmarg <- predict(margsurv,data,tminus=TRUE,times=time2,individual.time=TRUE,se=FALSE)
        psurvmarg <- ppsurvmarg$surv
        cum <- ppsurvmarg$cumhaz
	RR <- ppsurvmarg$RR
        ptrunc <- rep(1,length(psurvmarg));
        if ((lefttrunk==1)) {
	  ptrunc <- predict(margsurv,data,tminus=TRUE,times=start.time,individual.time=TRUE,se=FALSE)$surv
	}
  }
}

if (is.null(clusters) &  (!inherits(margsurv,"phreg"))) stop("must give clusters")

return(list(psurvmarg=psurvmarg,ptrunc=ptrunc,entry=start.time,exit=time2,
	    status=status,clusters=clusters,cum=cum,RR=RR))

} #
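### a minimal usage sketch for readmargsurv, kept as comments; the data and model follow
### the twostageMLE example below and are only meant as an illustration:
### library(mets); data(diabetes)
### m  <- phreg(Surv(time,status==1)~treat+cluster(id),data=diabetes)
### mm <- readmargsurv(m,data=diabetes,clusters=diabetes$id)
### str(mm)  ## psurvmarg, ptrunc, entry, exit, status, clusters, cum, RR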

##' @title Twostage survival model fitted by pseudo MLE
##'
##' @description
##' Fits the Clayton-Oakes model for clustered survival data
##' using marginals that are on Cox form in the likelihood for the dependence parameter,
##' as in Glidden (2000). The dependence can be modelled via a
##' \enumerate{
##' \item  Regression design on dependence parameter.
##' }
##'
##' We allow a regression structure for the independent gamma distributed
##' random effects and their variances that may depend on cluster covariates. So
##' \deqn{
##'  \theta = h( z_j^T \alpha)
##' }
##' where \eqn{z} is specified by theta.des. The link function is the exponential when var.link=1.
##' @references
##'
##' Measuring early or late dependence for bivariate twin data
##' Scheike, Holst, Hjelmborg (2015), LIDA
##'
##' Twostage modelling of additive gamma frailty models for survival data.
##' Scheike and Holst, working paper
##'
##' Shih and Louis (1995) Inference on the association parameter in copula models for bivariate
##' survival data, Biometrics, (1995).
##'
##' Glidden (2000), A Two-Stage estimator of the dependence
##' parameter for the Clayton Oakes model, LIDA, (2000).
##'
##' @examples
##' data(diabetes)
##' dd <- phreg(Surv(time,status==1)~treat+cluster(id),diabetes)
##' oo <- twostageMLE(dd,data=diabetes)
##' summary(oo)
##'
##' theta.des <- model.matrix(~-1+factor(adult),diabetes)
##'
##' oo <-twostageMLE(dd,data=diabetes,theta.des=theta.des)
##' summary(oo)
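##'
##' ## a hedged extra sketch (same diabetes data): use the exponential link for the
##' ## variance parameters via var.link=1, so the reported coefficients are on log-scale
##' oo2 <- twostageMLE(dd,data=diabetes,theta.des=theta.des,var.link=1)
##' summary(oo2)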
##' @keywords survival
##' @author Thomas Scheike
##' @param margsurv Marginal model from phreg
##' @param data data frame
##' @param theta Starting values for variance components
##' @param theta.des design for dependence parameters; when pairs are given this could be a
##' (pairs) x (number of parameters)  x (max number of random effects) matrix
##' @param var.link Link function for the variance; if 1 the exponential link is used
##' @param method type of optimizer, default is Newton-Raphson "NR"
##' @param no.opt to not optimize, for example to get score and iid for specific theta
##' @param weights cluster specific weights for the score equations, given with length equal to the number of rows in the data
##' @param se.cluster specifies how the influence functions are summed before being squared when computing the variance. Note that the id from the marginal model is used to construct the MLE, and these scores can then be summed within the levels of se.cluster.
##' @param ... arguments to be passed to  optimizer
##' @export
twostageMLE <-function(margsurv,data=parent.frame(),
theta=NULL,theta.des=NULL,var.link=0,method="NR",no.opt=FALSE,weights=NULL,se.cluster=NULL,...)
{
 if (!inherits(margsurv,"phreg"))  stop("Must use phreg for this \n");

 clusters <- margsurv$cox.prep$id
 n <- nrow(margsurv$cox.prep$X)


 if (is.null(theta.des)==TRUE) ptheta<-1;
 if (is.null(theta.des)==TRUE) theta.des<-matrix(1,n,ptheta) else theta.des<-as.matrix(theta.des);
  ptheta<-ncol(theta.des);
  if (nrow(theta.des)!=n) stop("Theta design does not have correct dim");

  if (is.null(theta)==TRUE) {
      if (var.link==1) theta<- rep(0.0,ptheta);
      if (var.link==0) theta<- rep(1.0,ptheta);
  }
  if (length(theta)!=ptheta) theta<-rep(theta[1],ptheta);
  theta.score<-rep(0,ptheta);Stheta<-var.theta<-matrix(0,ptheta,ptheta);

  max.clust <- length(unique(clusters))
  theta.iid <- matrix(0,max.clust,ptheta)

  xx <- margsurv$cox.prep
  nn <- length(xx$strata)

  if (is.null(weights)) weights <-  rep(1,nn)
  if (length(weights)!=nn) stop("Weights do not have right length")

  statusxx <- rep(0,length(xx$strata))
  statusxx[xx$jumps+1] <- 1
  xx$status <- statusxx
  mid <- max(xx$id)+1

  ## Ni.(t-), Ni.(t)
  Nsum <- cumsumstratasum(statusxx,xx$id,mid,type="all")
  ## Ni.(tau)
  Ni.tau <- sumstrata(statusxx,xx$id,mid)
  ## cumhaz(T_i-)
  S0i2 <- S0i <- rep(0, length(xx$strata))
  S0i[xx$jumps + 1] <- 1/margsurv$S0
  cumhazD <- cumsumstratasum(S0i, xx$strata, xx$nstrata)$lagsum
  if (!is.null(margsurv$coef)) RR <- exp(xx$X %*% margsurv$coef) else RR  <-  rep(1,nn)
  H <- c(cumhazD * RR)

  cc <- cluster.index(xx$id)
  firstid <- cc$firstclustid+1
  if (max(cc$cluster.size)==1) stop("No clusters! max cluster size=1\n");

  ### order after time-sorted data
  theta.des <- theta.des[xx$ord+1,,drop=FALSE]
  weightsid <- weights <- weights[xx$ord+1]
  ### clusterspecific weights
  weights <- weights[firstid]

  thetaX<- as.matrix(theta.des[firstid,,drop=FALSE])

obj <- function(par,all=FALSE)
{

if (var.link==1) epar <- c(exp(c(par))) else epar <- c(par)

thetav <- c(as.matrix(theta.des) %*%  c(epar))
thetai <-thetav[firstid]

Hs <- sumstrata(H*exp(thetav*H),xx$id,mid)
R <- sumstrata((exp(thetav*H)-1),xx$id,mid) + 1
H2 <- sumstrata(H^2*exp(thetav*H),xx$id,mid)

l1 <- sumstrata(log(1+thetav*Nsum$lagsum)*statusxx,xx$id,mid)
l2 <- sumstrata(statusxx*H,xx$id,mid)
l3 <- -((thetai)^{-1}+Ni.tau) * log(R)
logliid <- (l1 + thetai* l2 + l3)*c(weights)
logl <- sum(logliid)
ploglik <- logl

## first derivative
l1s <- sumstrata(Nsum$lagsum/(1+thetav*Nsum$lagsum)*statusxx,xx$id,mid)
l2s <- (thetai^{-2}) * log(R)
l3s <- -(thetai^{-1}+Ni.tau) * Hs / R
Dltheta <- (l1s+l2s+l3s+l2)*c(weights)
scoreiid <- thetaX* c(Dltheta)

## second derivative
D2N <- -sumstrata(Nsum$lagsum^2/(1+thetav*Nsum$lagsum)^2*statusxx,xx$id,mid)
Dhes  <- c(D2N+(2/thetai^2)*Hs/R-(2/thetai^3)*log(R)-(1/thetai+Ni.tau)*(H2*R-Hs*Hs)/R^2)
Dhes  <-  Dhes * c(weights)

if (var.link==1) {
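## chain rule for the log-link: with theta = exp(par) the score picks up a factor theta,
## and the second derivative becomes theta^2 * d2l/dtheta2 + theta * dl/dtheta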
scoreiid  <-  scoreiid * c(thetai);  
Dhes<- Dhes* thetai^2 + thetai *  Dltheta
}

gradient <- apply(scoreiid,2,sum)
hessian <-  crossprod(thetaX,thetaX*c(Dhes))
hess2 <-    crossprod(scoreiid)

val <- list(id=xx$id,score.iid=scoreiid,logl.iid=logliid,ploglik=ploglik,
            gradient=-gradient,hessian=hessian,hess2=hess2)

if (all) return(val);

with(val, structure(-ploglik,gradient=-gradient,hessian=hessian))
}

  opt <- NULL
  if (no.opt==FALSE) {
      if (tolower(method)=="nr") {
          opt <- lava::NR(theta,obj,...)
          opt$estimate <- opt$par
      } else {
          opt <- nlm(obj,theta,...)
          opt$method <- "nlm"
      }
      cc <- opt$estimate;
###   names(cc) <- colnames(theta.des)
      val <- c(list(coef=cc),obj(opt$estimate,all=TRUE))
  } else val <- c(list(coef=theta),obj(theta,all=TRUE))
  val$score <- val$gradient


  theta <- matrix(c(val$coef),length(c(val$coef)),1)
  if (!is.null(colnames(theta.des))) thetanames <- colnames(theta.des) else thetanames <- paste("dependence",1:length(c(theta)),sep="")
  if (length(thetanames)==length(c(theta))) { rownames(theta) <- thetanames; rownames(var.theta) <- colnames(var.theta) <- thetanames; }

  hessianI <- solve(val$hessian)
  val$theta.iid.naive  <-  val$score.iid %*% hessianI

  ### iid due to Marginal model
  biid <- 1
  if (biid==1) {
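  ## propagate the uncertainty of the marginal phreg estimates into the dependence parameter:
  ## Ft/Gbase collect derivatives of the pseudo-score with respect to the cumulative hazard,
  ## Fbeta the derivative with respect to beta, and these are combined below with the
  ## martingale iid representation of the baseline and regression estimates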

  if ((margsurv$no.opt) | is.null(margsurv$coef)) fixbeta<- 1 else fixbeta <- 0

   id  <-  xx$id+1
   if (var.link==1) epar <- c(exp(theta)) else epar <- c(theta)

   thetaX<- as.matrix(theta.des[firstid,,drop=FALSE])
   thetav <- c(as.matrix(theta.des) %*%  c(epar))

   Hs <- c(sumstrata(H*exp(thetav*H),xx$id,mid))
   R <- c(sumstrata((exp(thetav*H)-1),xx$id,mid)) + 1
   H2 <- c(sumstrata(H^2*exp(thetav*H),xx$id,mid) )

   Ft <- ((1/(thetav*R[id]))*exp(thetav*H)-(1/thetav+Ni.tau[id])*(1+thetav*H)*exp(thetav*H)/R[id]+statusxx+(1+thetav*Ni.tau[id])*exp(thetav*H)*Hs[id]/R[id]^2)

  if (var.link==1) {
	Ft  <-  Ft * c(exp(theta.des %*% val$coef));
  }
  Gt <- c(RR *Ft * xx$sign * weightsid)
  Ft  <- c( Ft * H * weightsid )
  Fbeta <-  t(theta.des) %*% ( xx$X * Ft )
  Gbase <- apply(Gt* theta.des,2,revcumsumstrata,xx$strata,xx$nstrata)

  ### beta part
  if (fixbeta==0) {
	  Z <- xx$X
	  U <- E <- matrix(0,nrow(xx$X),margsurv$p)
	  E[xx$jumps+1,] <- margsurv$E
	  U[xx$jumps+1,] <- margsurv$U
          invhess <- -solve(margsurv$hessian)
	  S0i <- rep(0,length(xx$strata))
	  S0i[xx$jumps+1] <- 1/margsurv$S0
	  cumhaz <- c(cumsumstrata(S0i,xx$strata,xx$nstrata))
	  EdLam0 <- apply(E*S0i,2,cumsumstrata,xx$strata,xx$nstrata)
	  rr <- c(xx$sign*exp(Z %*% coef(margsurv) + xx$offset))
	  ### Martingale  as a function of time and for all subjects to handle strata
	  MGt <- U[,drop=FALSE]-(Z*cumhaz-EdLam0)*rr*c(xx$weights)
          UUbeta <- apply(MGt,2,sumstrata,id-1,max(id))
	  UUbeta <- UUbeta %*% invhess
	  GbaseEdLam0 <- t(Gbase) %*% (E*S0i)
	  Fbeta   <-  Fbeta -  GbaseEdLam0
          Fbetaiid <- UUbeta %*% t(Fbeta)
  }

  ###  \int_0^\tau (GBase(s) / S_0(s)) dN_i(s) - dLambda_i(s)
  xx <- margsurv$cox.prep
  S0i <- rep(0,length(xx$strata))
  S0i[xx$jumps+1] <- 1/margsurv$S0
  S0i2[xx$jumps+1] <- 1/margsurv$S0^2

  GbasedN <- Gbase*S0i
  GbasedLam0 <- apply(Gbase*S0i2,2,cumsumstrata,xx$strata,xx$nstrata)

  if (fixbeta==0) rr <- c(xx$sign*exp(Z %*% coef(margsurv) + xx$offset)) else rr <- c(xx$sign*exp( xx$offset))
  MGt <- (GbasedN[,drop=FALSE]-GbasedLam0*rr)*c(xx$weights)
  MGti <- apply(MGt,2,sumstrata,id-1,max(id))

###  if (fixbeta==0) print(cbind(val$score.iid,MGti,Fbetaiid)) else print(cbind(val$score.iid,MGti))

  if (fixbeta==0) MGti <- MGti+Fbetaiid
  theta.iid <-  val$score.iid + MGti
  theta.iid  <-  theta.iid %*% hessianI
  ## take names from phreg call
  if (!is.null(margsurv$id)) rownames(theta.iid) <- unique(margsurv$id)
  val$theta.iid <- theta.iid

  }

  ### if se.cluster is there we cluster iid-decomp before squaring 
  if (!is.null(se.cluster)) 
	  if (length(se.cluster)!=length(clusters)) stop("Length of se.cluster and clusters must be the same\n");

  if (!is.null(se.cluster)) {
      iids <-  unique(se.cluster);
      nseclust  <- length(iids);
      if (is.numeric(se.cluster)) se.cluster <-  fast.approx(iids,se.cluster)-1
       else se.cluster <- as.integer(factor(se.cluster, labels = seq(nseclust)))-1

      val$theta.iid <-       apply(val$theta.iid,2,sumstrata,se.cluster,nseclust)
      val$theta.iid.naive <- apply(val$theta.iid.naive,2,sumstrata,se.cluster,nseclust)
  }

  var <- robvar.theta  <-  var.theta  <-  crossprod(val$theta.iid)
  naive.var  <-   crossprod(val$theta.iid.naive)

  val  <- c(val,list(theta=theta,var.theta=var.theta,robvar.theta=robvar.theta,
		     var=var,thetanames=thetanames,model="clayton.oakes",
	             se=diag(robvar.theta)^.5,
	             var.naive=naive.var))
  class(val) <- "mets.twostage"
  attr(val,"clusters") <- clusters
  attr(val,"secluster") <- c(se.cluster)
  attr(val,"var.link")<-var.link;
  attr(val,"ptheta")<-ptheta
  attr(val,"n")<-n ;
  attr(val,"response")<- "survival"
  attr(val,"additive-gamma")<-0
  attr(val,"twostage") <- "two.stage"

  return(val)
}

##' @title Survival model for multivariate survival data
##'
##' @description
##' Fits an additive gamma frailty model
##' with additive hazard conditional on the random effects
##' \deqn{
##' \lambda_{ij} = (V_{ij}^T Z) (X_{ij}^T \alpha(t))
##' }
##' The baseline \eqn{\alpha(t)} is profiled out using
##' marginal modelling adjusted for the random effects structure as in Eriksson and Scheike (2015).
##' One advantage of the standard frailty model is that one can deal with competing risks
##' for this model.
##'
##' For all models the
##' standard errors do not reflect the uncertainty of the baseline estimates, and might therefore be a bit too small.
##' To remedy this one can do bootstrapping.
##'
##' If clusters contain more than two subjects, we use a composite likelihood
##' based on the pairwise bivariate models. Can also fit an additive gamma random
##' effects model, described in detail below.
##'
##' We allow a regression structure for the independent gamma distributed
##' random effects and their variances that may depend on cluster covariates. So
##' \deqn{
##'  \theta = z_j^T \alpha
##' }
##' where \eqn{z} is specified by theta.des.
##' The reported standard errors are based on the estimated information from the
##' likelihood assuming that the marginals are known.
##'
##' Can also fit a structured additive gamma random effects model, such
##' as the ACE, ADE model for survival data.
##'
##' Now random.design specifies the random effects for each subject within a cluster. This is
##' a matrix of 1's and 0's with dimension n x d, with d random effects.
##' For a cluster with two subjects, we let the random.design rows be
##'  \eqn{v_1} and \eqn{v_2}.
##' Such that the random effects for subject
##' 1 is \deqn{v_1^T (Z_1,...,Z_d)}, for d random effects. Each random effect
##' has an associated parameter \eqn{(\lambda_1,...,\lambda_d)}.
##' By construction subject 1's random effects are Gamma distributed with
##' mean \eqn{\lambda_j/v_1^T \lambda}
##' and variance \eqn{\lambda_j/(v_1^T \lambda)^2}. Note that the random effect
##' \eqn{v_1^T (Z_1,...,Z_d)} has mean 1 and variance \eqn{1/(v_1^T \lambda)}.
##' It is here assumed that  \eqn{lamtot=v_1^T \lambda} is fixed over all clusters
##' as it would be for the ACE model below.
##' The lamtot parameter may be specified separately for some sets of the parameters
##' if the additive.gamma.sum (ags) matrix is specified, and then lamtot for the
##' j'th random effect is \eqn{ags_j^T \lambda}.
##'
##' Based on these parameters the relative contribution (the heritability, h) is
##' equivalent to  the expected values of the random effects  \eqn{\lambda_j/v_1^T \lambda}
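##'
##' As a small illustration (ours, not tied to a particular data set), a cluster with two
##' subjects that share one random effect and have one individual random effect each could
##' be specified as
##' \preformatted{
##' ## rows are the two subjects, columns the d=3 random effects (shared, ind1, ind2)
##' random.design <- rbind(c(1,1,0),
##'                        c(1,0,1))
##' ## map the d=3 lambda's to k=2 parameters (shared variance, individual variance)
##' theta.des <- rbind(c(1,0),
##'                    c(0,1),
##'                    c(0,1))
##' }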
##'
##' The DEFAULT parametrization uses the variances of the random effects
##' \deqn{
##' \theta_j  = \lambda_j/(v_1^T \lambda)^2
##' }
##' For alternative parametrizations one can specify how the parameters relate to \eqn{\lambda_j}
##' via the var.par argument (with var.par=0 the \eqn{\lambda_j}'s are used as parameters directly).
##'
##' Given the random effects the survival distributions within a cluster are independent and
##' on the form
##' \deqn{
##' P(T > t| x,z) = \exp( -z  A(t) \exp( x^T \beta))
##' }
##'
##' The parameters \eqn{(\lambda_1,...,\lambda_d)}
##' are related to the parameters of the model
##' by a regression construction \eqn{pard} (d x k), that links the \eqn{d}
##' \eqn{\lambda} parameters
##' with the (k) underlying \eqn{\theta} parameters
##' \deqn{
##' \lambda = theta.des  \theta
##' }
##' here using theta.des to specify this low-dimensional association. The default is a diagonal matrix.
##'
##' The case.control option can be used with the pair specification of the pairwise parts
##' of the estimating equations. Here it is assumed that the second subject of each pair is the
##' proband.
##' @keywords survival
##' @author Thomas Scheike
##' @param margsurv Marginal model
##' @param data data frame
##' @param method Scoring method "nr", "nlminb", "optimize", "nlm"
##' @param Nit Number of iterations
##' @param detail Detail
##' @param clusters Cluster variable
##' @param silent Debug information
##' @param weights Weights
##' @param control Optimization arguments
##' @param theta Starting values for variance components
##' @param theta.des design for dependence parameters; when pairs are given this could be a
##' (pairs) x (number of parameters)  x (max number of random effects) matrix
##' @param var.link Link function for variance
##' @param iid Calculate i.i.d. decomposition
##' @param step Step size
##' @param model model
##' @param marginal.trunc marginal left truncation probabilities
##' @param marginal.survival optional vector of marginal survival probabilities
##' @param marginal.status related to marginal survival probabilities
##' @param strata strata for fitting, see example
##' @param se.clusters for clusters for se calculation with iid
##' @param max.clust max se.clusters for se calculation with iid
##' @param numDeriv to get numDeriv version of second derivative, otherwise uses sum of squared score
##' @param random.design random effect design for additive gamma model, when pairs are given this is
##' a (pairs) x (2) x (max number random effects) matrix, see pairs.rvs below
##' @param pairs matrix with rows of indices (two columns) for the pairs considered in the pairwise
##' composite score, useful for case-control sampling when the marginal is known.
##' @param pairs.rvs for the additive gamma model when random.design and theta.des are given as arrays,
##' this specifies the number of random effects for each pair.
##' @param numDeriv.method uses the "simple" method to speed things up, since the second derivative is less critical.
##' @param additive.gamma.sum for two.stage=0, this is the specification of lamtot in the models via
##' a matrix that is multiplied onto the parameters theta (dimension = number of random effects x number
##' of theta parameters); when NULL all parameters are summed.
##' @param var.par is 1 for the default parametrization with the variances of the random effects,
##' var.par=0 specifies that the \eqn{\lambda_j}'s are used as parameters.
##' @param cr.models competing risks models for two.stage=0, should be given as a list with models for each cause
##' @param case.control assumes case-control structure for "pairs" with the second column being the probands;
##' when this option is used the twostage model is profiled out via the paired estimating equations for the
##' survival model.
##' @param ascertained if the pairs are sampled only when there is an event. This is in contrast to
##' case.control sampling where a proband is given. This can be combined with control probands. A pair-call
##' of twostage is needed, and the second column of pairs is the first jump time with an event for ascertained pairs,
##' or the time of the control proband.
##' @param shut.up to make the program more silent in the context of iterative procedures for case-control
##' and ascertained sampling
survival.iterative <- function(margsurv,data=parent.frame(),method="nr",Nit=60,detail=0,clusters=NULL,
             silent=1,weights=NULL, control=list(),theta=NULL,theta.des=NULL,
             var.link=1,iid=1,step=0.5,model="clayton.oakes",
             marginal.trunc=NULL,marginal.survival=NULL,marginal.status=NULL,strata=NULL,
             se.clusters=NULL,max.clust=NULL,numDeriv=0,random.design=NULL,pairs=NULL,pairs.rvs=NULL,numDeriv.method="simple",
             additive.gamma.sum=NULL,var.par=1,cr.models=NULL,case.control=0,ascertained=0,shut.up=0)
{ #
# setting up design and variables
two.stage <- 0;
rate.sim <- 1; sym=1; var.func <- NULL
if (model=="clayton.oakes" || model=="gamma") dep.model <- 1
else if (model=="plackett" || model=="or") dep.model <- 2
else stop("Model must by either clayton.oakes or plackett \n");
start.time <- NULL; ptrunc <- NULL; psurvmarg <- NULL; status <- NULL
fix.baseline <- 0;
convergence.bp <- 1;  ### to control if baseline profiler converges
if ((!is.null(margsurv)) | (!is.null(marginal.survival))) fix.baseline <- 1


if (!is.null(margsurv)) {
if (inherits(margsurv,c("aalen","cox.aalen"))) { #
	 formula<-attr(margsurv,"Formula");
	 beta.fixed <- attr(margsurv,"beta.fixed")
	 if (is.null(beta.fixed)) beta.fixed <- 1;
	 ldata<- timereg::aalen.des(formula,data=data,model="cox.aalen");
	 id <- attr(margsurv,"id");
	 mclusters <- attr(margsurv,"cluster.call")
	 X<-ldata$X;
	 time<-ldata$time2;
	 Z<-ldata$Z;
	 status<-ldata$status;
	 time2 <- attr(margsurv,"stop");
	 start.time <- attr(margsurv,"start")
	 antpers<-nrow(X);
         if (is.null(Z)==TRUE) {npar<-TRUE; semi<-0;}  else { Z<-as.matrix(Z); npar<-FALSE; semi<-1;}
         if (npar==TRUE) {Z<-matrix(0,antpers,1); pz<-1; fixed<-0;} else {fixed<-1;pz<-ncol(Z);}
	 px<-ncol(X);

         if (is.null(clusters) && is.null(mclusters)) stop("No cluster variable specified in marginal or twostage call\n");
         if (is.null(clusters)) clusters <- mclusters
###	 else if (sum(abs(clusters-mclusters))>0)
###         cat("Warning: Clusters for marginal model different than those specified for two.stage\n");

###         if (!is.null(attr(margsurv,"max.clust")))
###         if ((attr(margsurv,"max.clust")< attr(margsurv,"orig.max.clust")) && (!is.null(mclusters)))
###		  cat("Warning: Probably want to estimate marginal model with max.clust=NULL\n");

	 if (nrow(X)!=length(clusters)) stop("Length of Marginal survival data not consistent with cluster length\n");
#
} else { ### coxph #
	  notaylor <- 1
	  antpers <- margsurv$n
	  id <- 0:(antpers-1)
	  mt <- model.frame(margsurv)
	  Y <- model.extract(mt, "response")
	  if (!inherits(Y, "Surv")) stop("Response must be a survival object")
	  if (attr(Y, "type") == "right") {
	      time2 <- Y[, "time"];
	      status <- Y[, "status"]
		start.time <- rep(0,antpers);
		} else {
		 start.time <- Y[, 1];
		 time2 <- Y[, 2];
		 status <- Y[, 3];
		}
###	   Z <- na.omit(model.matrix(margsurv)[,-1]) ## Discard intercept
	   Z <- matrix(1,antpers,length(coef(margsurv)));

	   if (is.null(clusters)) stop("must give clusters for coxph\n");
	   cluster.call <- clusters;
	   X <- matrix(1,antpers,1); ### Z <- matrix(0,antpers,1); ### no use for these
	   px <- 1; pz <- ncol(Z);
	   start <- rep(0,antpers);
	   beta.fixed <- 0
	   semi <- 1
###	   start.time <- 0
} #
} else { start.time<- 0}

###print("00"); print(summary(psurvmarg)); print(summary(ptrunc))

  if (!is.null(start.time)) {
      if (any(abs(start.time)>0)) lefttrunk <- 1  else lefttrunk <- 0;
  } else lefttrunk <- 0

if (!is.null(margsurv))  {
  if (inherits(margsurv,c("aalen","cox.aalen")))  { #
         resi <- timereg::residualsTimereg(margsurv,data=data)
         RR  <- resi$RR
	 psurvmarg <- exp(-resi$cumhaz);
         ptrunc <- rep(1,length(psurvmarg));
	 if (lefttrunk==1) ptrunc <- exp(-resi$cumhazleft);
  } #
  else if (inherits(margsurv,"coxph")) {  #
	### some problems here when data is different from data used in margsurv
       notaylor <- 1
       residuals <- residuals(margsurv)
       cumhaz <- status-residuals
       psurvmarg <- exp(-cumhaz);
       cumhazleft <- rep(0,antpers)
       ptrunc <- rep(1,length(psurvmarg));
       RR<- exp(margsurv$linear.predictors-sum(margsurv$means*coef(margsurv)))
        if ((lefttrunk==1)) {
         stop("Use cox.aalen function for truncation case \n");
         baseout <- survival::basehaz(margsurv,centered=FALSE);
         cum <- cbind(baseout$time,baseout$hazard)
	 cum <- cpred(cum,start.time)[,2]
	 ptrunc <- exp(-cum * RR)
	}
  } #
}

  antpers <- nrow(data); ## mydim(marginal.survival)[1]
  RR <-  rep(1,antpers);

  if (is.null(psurvmarg)) psurvmarg <- rep(1,antpers);
  if (!is.null(marginal.survival)) psurvmarg <- marginal.survival
  if (!is.null(marginal.trunc)) ptrunc <- marginal.trunc
  if (is.null(ptrunc)) ptrunc <- rep(1,length(psurvmarg))

###  print(summary(psurvmarg)); print(summary(ptrunc))

  if (!is.null(marginal.status)) status <- marginal.status
  if (is.null(status) & is.null(cr.models)) stop("must give status variable for survival via either the marginal model (margsurv), marginal.status or cr.models \n");

  if (is.null(weights)==TRUE) weights <- rep(1,antpers);
  if (is.null(strata)==TRUE) strata<- rep(1,antpers);
  if (length(strata)!=antpers) stop("Strata must have length equal to number of data points \n");

  # cluster set up
  cluster.call <- clusters
  out.clust <- cluster.index(clusters);
  clusters <- out.clust$clusters
  maxclust <- out.clust$maxclust
  antclust <- out.clust$cluster.size
  clusterindex <- out.clust$idclust
  clustsize <- out.clust$cluster.size
  call.secluster <- se.clusters

  if (is.null(se.clusters)) { se.clusters <- clusters; antiid <- nrow(clusterindex);} else  {
      iids <-  unique(se.clusters);
      antiid <- length(iids);
      if (is.numeric(se.clusters)) se.clusters <-  fast.approx(iids,se.clusters)-1
       else se.clusters <- as.integer(factor(se.clusters, labels = seq(antiid)))-1
  }
  if (length(se.clusters)!=length(clusters)) stop("Length of se.clusters and clusters must be the same\n");

  if ((!is.null(max.clust))) if (max.clust< antiid) {
        coarse.clust <- TRUE
	qq <- unique(quantile(se.clusters, probs = seq(0, 1, by = 1/max.clust)))
	qqc <- cut(se.clusters, breaks = qq, include.lowest = TRUE)
	se.clusters <- as.integer(qqc)-1
	max.clusters <- length(unique(se.clusters))
	maxclust <- max.clust
	antiid <- max.clusters
  }
  #

###  pxz <- px + pz;

   if (!is.null(random.design)) { ### different parameters for Additive random effects 
     dep.model <- 3

###  if (is.null(random.design)) random.design <- matrix(1,antpers,1);
     dim.rv <- ncol(random.design);
     if (is.null(theta.des)) theta.des<-diag(dim.rv);
 } else { random.design <- matrix(0,1,1);  dim.rv <- 1;
           additive.gamma.sum <- matrix(1,1,1);
   }

  if (is.null(theta.des)) ptheta<-1;
  if (is.null(theta.des)) theta.des<-matrix(1,antpers,ptheta); ###  else theta.des<-as.matrix(theta.des);

  if (length(dim(theta.des))==3) ptheta<-dim(theta.des)[2] else if (length(dim(theta.des))==2) ptheta<-ncol(theta.des)
  if (nrow(theta.des)!=antpers & dep.model!=3 ) stop("Theta design does not have correct dim");

  if (length(dim(theta.des))!=3) theta.des <- as.matrix(theta.des)

  if (is.null(theta)==TRUE) {
         if (var.link==1) theta<- rep(-0.7,ptheta);
         if (var.link==0) theta<- rep(exp(-0.7),ptheta);
  }

  if (length(theta)!=ptheta) {
         theta<-rep(theta[1],ptheta);
  }
  theta.score<-rep(0,ptheta);Stheta<-var.theta<-matrix(0,ptheta,ptheta);

  if (maxclust==1) stop("No clusters, maxclust size=1\n");

  antpairs <- 1; ### to define
  if (is.null(additive.gamma.sum)) additive.gamma.sum <- matrix(1,dim.rv,ptheta)

  if (!is.null(pairs)) { pair.structure <- 1;} else  pair.structure <- 0;

  if (pair.structure==1 & dep.model==3) { #
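### when pairs are given together with the additive gamma model, expand theta.des and
### random.design to per-pair arrays (last dimension = number of pairs) so that each pair
### carries its own dependence design and random-effect design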
### something with dimensions of rv.des theta.des
       antpairs <- nrow(pairs);
       if ( (length(dim(theta.des))!=3)  & (length(dim(random.design))==3) )
       {
          Ptheta.des <- array(0,c(nrow(theta.des),ncol(theta.des),antpairs))
          for (i in 1:antpairs) Ptheta.des[,,i] <- theta.des
       theta.des <- Ptheta.des
       }
       if ( (length(dim(theta.des))==3)  & (length(dim(random.design))!=3) )
       {
           rv.des <- array(0,c(2,ncol(random.design),antpairs))
           for (i in 1:antpairs) {
		   rv.des[1,,i] <- random.design[pairs[i,1],]
		   rv.des[2,,i] <- random.design[pairs[i,2],]
	   }
       random.design <- rv.des
       }
       if ( (length(dim(theta.des))!=3)  & (length(dim(random.design))!=3) )
       {
          Ptheta.des <- array(0,c(nrow(theta.des),ncol(theta.des),antpairs))
          rv.des <- array(0,c(2,ncol(random.design),antpairs))
          for (i in 1:antpairs) {
		   rv.des[1,,i] <- random.design[pairs[i,1],]
		   rv.des[2,,i] <- random.design[pairs[i,2],]
                   Ptheta.des[,,i] <- theta.des
	   }
       theta.des <- Ptheta.des
       random.design <- rv.des
       }
       if (max(pairs)>antpers) stop("Indices of pairs should refer to given data \n");
       if (is.null(pairs.rvs)) pairs.rvs <- rep(dim(random.design)[2],antpairs)
       clusterindex <- pairs-1;
  } #

  if (pair.structure==1 & dep.model!=3) {
       clusterindex <- pairs-1;
       antpairs <- nrow(pairs);
       pairs.rvs <- 1
  }

  #

###  setting up arguments for Aalen baseline profile estimates
 if (fix.baseline==0)  { #  when the baseline is estimated
 if (is.null(cr.models)) stop("give hazard models for the different causes, e.g. cr.models=list(Surv(time,status==1)~+1,Surv(time,status==2)~+1) \n")

      if (case.control==0 & ascertained==0) { #
        # setting up random effects and covariates for marginal modelling
        timestatus <- all.vars(cr.models[[1]])
	times <- data[,timestatus[1]]
	if (is.null(status)) status <- data[,timestatus[2]]
	lstatus <- data[,timestatus[2]]
	### organize increments according to overall jump-times
	jumps <- lstatus!=0
	dtimes <- times[jumps]
	st <- order(dtimes)
	dtimesst <- dtimes[st]
	dcauses <- lstatus[jumps][st]
	ids <- (1:nrow(data))[jumps][st]

        nc <- 0
	for (i in 1:length(cr.models)) {
              a2 <- timereg::aalen.des(as.formula(cr.models[[i]]),data=data)
	      X <- a2$X
	      nc <- nc+ncol(X)
	}
	dBaalen <- matrix(0,length(dtimes),nc)
	xjump <- array(0,c(length(cr.models),nc,length(ids)))

	## first compute marginal aalen models for all causes
	a <- list(); da <- list();
	for (i in 1:length(cr.models)) {
	      a[[i]] <-  timereg::aalen(as.formula(cr.models[[i]]),data=data,robust=0,weights=weights)
              a2 <- timereg::aalen.des(as.formula(cr.models[[i]]),data=data)
	      X <- a2$X
	      da[[i]] <- apply(a[[i]]$cum[,-1,drop=FALSE],2,diff)
	      jumpsi <- (1:length(dtimes))[dcauses==i]
	      if (i==1) fp <- 1
	      indexc <- fp:(fp+ncol(X)-1)
	      dBaalen[jumpsi,indexc] <- da[[i]]
              xjump[i,indexc,] <- t(X[ids,])
	      fp <- ncol(X)+1
        }

	#

       	#### organize subject specific random variables and design
        ###  for additive gamma model
	#
	dimt <- dim(theta.des[,,1,drop=FALSE])[-3]
	dimr <- dim(random.design[,,,drop=FALSE])
	mtheta.des <- array(0,c(dimt,nrow(data)))
	mrv.des <- array(0,c(dimr[1]/2,dimr[2],nrow(data)))
	nrv.des <- rep(0,nrow(data))
	nrv.des[pairs[,1]] <- pairs.rvs
	nrv.des[pairs[,2]] <- pairs.rvs
	mtheta.des[,,pairs[,1]] <- theta.des
	mtheta.des[,,pairs[,2]] <- theta.des
	mrv.des[,,pairs[,1]] <- random.design[1:(dimr[1]/2),,,drop=FALSE]
	mrv.des[,,pairs[,2]] <- random.design[(dimr[1]/2+1):dimr[1],,,drop=FALSE]
	### array thetades to jump times (subjects)
	mtheta.des <- mtheta.des[,,ids,drop=FALSE]
	### array randomdes to jump times (subjects)
	mrv.des <- mrv.des[,,ids,drop=FALSE]
	nrv.des <- pairs.rvs[ids]
	#

      } #

      if (case.control==1 || ascertained==1) { #

         data1 <-        data[pairs[,1],]
         data.proband <- data[pairs[,2],]
	 weights1 <-     weights[pairs[,1]]

# setting up designs for jump times
        timestatus <- all.vars(cr.models[[1]])
        if (is.null(status)) status <- data[,timestatus[2]]
	alltimes      <- data[,timestatus[1]]
	times         <- data1[,timestatus[1]]
	lstatus       <- data1[,timestatus[2]]
	timescase     <- data.proband[,timestatus[1]]
	lstatuscase   <- data.proband[,timestatus[2]]
	### organize increments according to overall jump-times
	jumps         <- lstatus!=0
	dtimes        <- times[jumps]
	dtimescase    <- timescase[jumps]
	st            <- order(dtimes)
	dtimesst      <- dtimes[st]
	dtimesstcase  <- dtimescase[st]
	dcauses       <- lstatus[jumps][st]
	dcausescase   <- lstatuscase[jumps][st]
	ids           <- (1:nrow(data1))[jumps][st]
	### delayed entry for case because of ascertained sampling
	### controls are however control probands, and have entry=0
	entry <- timescase*lstatuscase
	data1$entry <- entry
	cr.models2 <- list()
	if (ascertained==1) {
           for (i in 1:length(cr.models)) {
	      cr.models2[[i]] <- update(cr.models[[i]],as.formula(paste("Surv(entry,",timestatus[1],",",timestatus[2],")~.",sep="")))
	    }
	} else cr.models2 <- cr.models

        nc <- 0
	for (i in 1:length(cr.models)) {
              X <- timereg::aalen.des(as.formula(cr.models[[i]]),data=data1)$X
	      nc <- nc+ncol(X)
	}
	dBaalen    <- matrix(0,length(dtimes),nc)
	xjump      <- array(0,c(length(cr.models),nc,length(ids)))
	xjumpcase  <- array(0,c(length(cr.models),nc,length(ids)))

	## first compute marginal aalen models for all causes
	a <- list(); da <- list();
        ### starting values for iteration
        Bit <- Bitcase <- c()
	for (i in 1:length(cr.models)) { #
	      a[[i]]  <-  timereg::aalen(as.formula(cr.models2[[i]]),data=data1,robust=0,weights=weights1)
	      da[[i]] <- apply(a[[i]]$cum[,-1,drop=FALSE],2,diff)
	      jumpsi  <- (1:length(dtimes))[dcauses==i]
              X       <- timereg::aalen.des(as.formula(cr.models[[i]]),data=data1)$X
              Xcase   <- timereg::aalen.des(as.formula(cr.models[[i]]),data=data.proband)$X
	      if (i==1) fp <- 1
	      indexc  <- fp:(fp+ncol(X)-1)
	      dBaalen[jumpsi,indexc] <- da[[i]]
              xjump[i,indexc,]       <- t(X[ids,])
              xjumpcase[i,indexc,]   <- t(Xcase[ids,])
	      fp <- fp+ncol(X)
	      ### starting values
	      Bit <- cbind(Bit,cpred(a[[i]]$cum,dtimesst)[,-1,drop=FALSE])
       } #
       Bit.ini <- Bit
       #

####  organize subject specific random variables and design
####  already done in basic pairwise setup
	mtheta.des <- theta.des[,,ids,drop=FALSE]
	### array randomdes to jump times (subjects)
	mrv.des <- random.design[,,ids,drop=FALSE]
	nrv.des <- pairs.rvs[ids]
      } #

 }  else {
	 mrv.des <- array(0,c(1,1,1)); mtheta.des <- array(0,c(1,1,1)); margthetades <- array(0,c(1,1,1));
	 xjump <- array(0,c(1,1,1)); dBaalen <- matrix(0,1,1); nrv.des <- 3
 } #

  loglike <- function(par)
  { #
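
###  oout (set in the enclosing scope) controls what is returned:
###  0 = minus log-likelihood, 1 = sum of squared scores, 2 = full output list, otherwise the score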

     if (pair.structure==0 | dep.model!=3) Xtheta <- as.matrix(theta.des) %*% matrix(c(par),nrow=ptheta,ncol=1);
     if (pair.structure==1 & dep.model==3) Xtheta <- matrix(0,antpers,1); ## not needed

      if (var.link==1 & dep.model==3) epar <- c(exp(par)) else epar <- c(par)
      partheta <- epar

      if (var.par==1 & dep.model==3) {
	 ## from variances theta_j to lambda_j: theta_j = lambda_j/(sum lambda)^2 implies lambda_j = theta_j/(sum theta)^2
	 if (is.null(var.func)) {
	    sp <- sum(epar)
	    partheta <- epar/sp^2
         } else partheta <- epar ## par.func(epar)
      }


      if (pair.structure==0) {
	      outl<-.Call("twostageloglikeRV", # only two stage model for this option
	      icause=status,ipmargsurv=psurvmarg,
	      itheta=c(partheta),
	      ithetades=theta.des,
	      icluster=clusters,iclustsize=clustsize,iclusterindex=clusterindex,
	      ivarlink=var.link,iid=iid,iweights=weights,isilent=silent,idepmodel=dep.model,
	      itrunkp=ptrunc,istrata=as.numeric(strata),iseclusters=se.clusters,iantiid=antiid,
	      irvdes=random.design,iags=additive.gamma.sum,iascertained=ascertained,
              PACKAGE="mets") #
      }
      else { ## pair-structure
	     ## twostage model,    case.control option,   profile out baseline
	     ## conditional model, case.control option,   profile out baseline
      if (two.stage==1) { # two-stage model

         if (fix.baseline==0) ## if baseline is not given
         {
          cum1 <- cbind(dtimesst,Bit)
          if ( (case.control==1 || ascertained==1) & (convergence.bp==1)) { #  profiles out baseline under case-control/ascertainment sampling

             ###  initial values , only one cr.model for survival
             if (detail>1) plot(dtimesst,Bit,type="l",main="Bit")

             if (ncol(Bit)==0) Bit <- Bit.ini
             Bitcase  <- cpred(cbind(dtimesst,Bit),dtimesstcase)[,-1,drop=FALSE]
             Bitcase <- .Call("MatxCube",Bitcase,dim(xjumpcase),xjumpcase,PACKAGE="mets")$X

             for (j in 1:5) { # profile via iteration
                   cncc <- .Call("BhatAddGamCC",1,dBaalen,dcauses,dim(xjump),xjump,
		                 c(partheta),dim(mtheta.des),mtheta.des,additive.gamma.sum,var.link,
				 dim(mrv.des),mrv.des,nrv.des,1,Bit,Bitcase,dcausescase,PACKAGE="mets")
		   d <- max(abs(Bit-cncc$B))
		   if (detail>1) print(d)
		   Bit <- cncc$B
		   if (detail>1) print(summary(cncc$caseweights))
		   cum1 <- cbind(dtimesst,cncc$B)
		   Bitcase  <-cbind(cpred(cum1,dtimesstcase)[,-1])
		   if (detail>1) lines(dtimesst,Bit,col=j+1);
		   if (is.na(d)) {
			  if (shut.up==0) cat("Baseline profiler gives missing values\n");
		          Bit <- Bit.ini; cum1 <- cbind(dtimesst,Bit); convergence.bp <<- 0; break;
		   }
		   Bitcase <- .Call("MatxCube",Bitcase,dim(xjumpcase),xjumpcase,PACKAGE="mets")$X
		   if (d<0.00001) break;
           } #


	   nulrow    <- rep(0,ncol(Bit)+1)
	   pbases    <- cpred(rbind(nulrow,cbind(dtimesst,Bit)),alltimes)[,-1,drop=FALSE]
           X         <- timereg::aalen.des(as.formula(cr.models[[1]]),data=data)$X
	   psurvmarg <- exp(-apply(X*pbases,1,sum))  ## psurv given baseline
	   if (ascertained==1) {
              Xcase     <- timereg::aalen.des(as.formula(cr.models[[1]]),data=data.proband)$X
              X         <- timereg::aalen.des(as.formula(cr.models[[1]]),data=data1)$X
	      pba.case  <- cpred(rbind(nulrow,cbind(dtimesst,Bit)),entry)[,-1,drop=FALSE]
	      ptrunc    <- rep(0,nrow(data))
	      ### for control probands ptrunc=1, thus no adjustment
	      ptrunc[pairs[,1]]     <- exp(-apply(X*    pba.case,1,sum)*lstatuscase) ## delayed entry at time of ascertainment proband
 	      ptrunc[pairs[,2]]     <- exp(-apply(Xcase*pba.case,1,sum)*lstatuscase)
	   }

          } #
          }


          outl<-.Call("twostageloglikeRVpairs", #
          icause=status,ipmargsurv=psurvmarg,
          itheta=c(partheta),
	  ithetades=theta.des,
          icluster=clusters,iclustsize=clustsize,iclusterindex=clusterindex,
          ivarlink=var.link,iiid=iid,iweights=weights,isilent=silent,idepmodel=dep.model,
          itrunkp=ptrunc,istrata=as.numeric(strata),iseclusters=se.clusters,iantiid=antiid,
          irvdes=random.design, iags=additive.gamma.sum,
	  iascertained=ascertained,PACKAGE="mets")
	  #

          if (fix.baseline==0)  {
              outl$baseline <- cum1;
	      outl$marginal.surv <- psurvmarg;
	      outl$marginal.trunc <- ptrunc
	  }

      } #
      else { #  survival model two.stage==0

	 entry.cause  <- rep(0,nrow(data))

         ### update aalen type baseline
	 if (fix.baseline==0)  { #

         if ((case.control==1 || ascertained==1) & (convergence.bp==1)) { #  profiles out baseline under case-control/ascertainment sampling
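	     ## profiling sketch: alternate between updating the baseline increments given the
	     ## current dependence parameters (BhatAddGamCC) and re-evaluating the proband
	     ## cumulative hazards (Bitcase) at the proband times, until the baseline stabilises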

	     if (detail>1) print(summary(Bit))
             if (detail>1) matplot(dtimesst,Bit,type="l",main="Bit",ylim=c(0,2))
             if (ncol(Bit)==0) Bit <- Bit.ini
             Bitcase  <- cpred(cbind(dtimesst,Bit),dtimesstcase)[,-1,drop=FALSE]
             Bitcase <- .Call("MatxCube",Bitcase,dim(xjumpcase),xjumpcase,PACKAGE="mets")$X

            for (j in 1:10) { # profile via iteration
                   profile.baseline <- .Call("BhatAddGamCC",0,dBaalen,dcauses,dim(xjump),xjump,
	                               c(partheta), dim(mtheta.des),mtheta.des, additive.gamma.sum,var.link,
	                               dim(mrv.des),mrv.des,nrv.des,1,Bit,Bitcase,dcausescase,PACKAGE="mets")
                   d <- max(abs(Bit-profile.baseline$B))
		   Bit <- profile.baseline$B
		   cum1 <- cbind(dtimesst,Bit)
		   Bitcase  <-cbind(cpred(cum1,dtimesstcase)[,-1])

		   if (detail>1) matlines(dtimesst,Bit,col=j+1);
		   if (is.na(d)) {
			  if (shut.up==0) cat("Baseline profiler gives missing values\n");
		          Bit <- Bit.ini; cum1 <- cbind(dtimesst,Bit); convergence.bp <<- 0; break;
		   }
		   Bitcase <- .Call("MatxCube",Bitcase,dim(xjumpcase),xjumpcase,PACKAGE="mets")$X
                   if (d<0.00001) break;
           } #


           ### makes cumulative hazard for all subjects
	   nulrow <- rep(0,ncol(Bit)+1)
	   pbases <- cpred(rbind(nulrow,cbind(dtimesst,Bit)),alltimes)[,-1,drop=FALSE]
	   psurvmarg <- c()     ### update psurvmarg
	   if (ascertained==1 || case.control==1)  { ### sets up truncation probabilities to match situation
		   pbase.case  <- cpred(rbind(nulrow,cbind(dtimesst,Bit)),timescase)[,-1,drop=FALSE]
		   ptrunc <- c() ### update ptrunc
	   }
           for (i in 1:length(cr.models)) {
                  if (i==1) fp <- 1
                  a2 <- timereg::aalen.des(as.formula(cr.models[[i]]),data=data)
	          X <- a2$X
		  indexc <- fp:(fp+ncol(X)-1)
	          psurvmarg <- cbind(psurvmarg,apply(X*pbases[,indexc],1,sum))
	          if (ascertained==1 || case.control==1) {
                     Xcase     <- timereg::aalen.des(as.formula(cr.models[[i]]),data=data.proband)$X
                     X         <- timereg::aalen.des(as.formula(cr.models[[i]]),data=data1)$X
		     ptrunc.new <- rep(0,length(alltimes))
		     ## delayed entry at time of ascertainment proband, for case control no adjustment for first
		     if (ascertained==1) ptrunc.new[pairs[,1]] <- apply(X*pbase.case[,indexc,drop=FALSE],1,sum)*lstatuscase
	             else   ptrunc.new[pairs[,1]] <- 0
		     ## for second component adjustment for marginal or ascertainment
		     ptrunc.new[pairs[,2]] <- apply(Xcase*pbase.case[,indexc,drop=FALSE],1,sum)
		     ptrunc <- cbind(ptrunc,ptrunc.new)
		  }
	          fp <- fp+ncol(X)
	   }

          } #
	  else { # profile out baseline, recursive estimator


          profile.baseline  <- .Call("BhatAddGam",recursive=1,
          dBaalen,dcauses,dim(xjump),xjump,c(partheta),
	  dim(mtheta.des),mtheta.des,
	  additive.gamma.sum,0,dim(mrv.des),mrv.des,0,matrix(0,1,1),PACKAGE="mets")


	  marg.model <- "no-cox"
          if (marg.model=="cox") {
	      Bit <- profile.baseline$B
              Bit <- .Call("MatxCube",Bit,dim(xjump),xjump,PACKAGE="mets")$X
	      caseweights <- profile.baseline$caseweights

	      pp <- timereg::timereg.formula(as.formula(cr.models[[i]]))
	      profile.cox <- timereg::cox.aalen(pp,data=data,robust=0,caseweight=caseweights)
	      print(summary(profile.cox))
	      plot(profile.cox)
	  }

	   nulrow <- rep(0,ncol(dBaalen)+1)
	   pbases <- cpred(rbind(nulrow,cbind(dtimesst,profile.baseline$B)),times)[,-1,drop=FALSE]
	   psurvmarg <- c()
           for (i in 1:length(cr.models)) {
	          if (i==1) fp <- 1
                  a2 <- timereg::aalen.des(as.formula(cr.models[[i]]),data=data)
	          X <- a2$X
	          indexc <- fp:(fp+ncol(X)-1)
	          psurvmarg <- cbind(psurvmarg,apply(X*pbases[,indexc],1,sum))
	          fp <- fp+ncol(X)
	  }
          ### no truncation in this case ?
	  ptrunc <- 0*psurvmarg

	  } #

	 } #

      ### cumulative hazard for this model  when fix.baseline==1
      if (fix.baseline==1 ) {
	      psurvmarg <- -log(psurvmarg);
	      ptrunc <- -log(ptrunc);
      }


      outl<-.Call("survivalloglikeRVpairs",icause=status,ipmargsurv=as.matrix(psurvmarg),
      itheta=c(partheta),
      ithetades=theta.des,
      icluster=clusters,iclustsize=clustsize,iclusterindex=clusterindex,
      iiid=iid,iweights=weights,isilent=silent,idepmodel=dep.model,
      itrunkp=as.matrix(ptrunc),istrata=as.numeric(strata),iseclusters=se.clusters,iantiid=antiid,
      irvs=pairs.rvs,iags=additive.gamma.sum,ientry.cause=entry.cause,iascertained=(ascertained+case.control>0)*1,
      PACKAGE="mets")

      if (fix.baseline==0)  {
	      outl$baseline <- cbind(dtimesst,profile.baseline$B);
	      outl$marginal.surv <- psurvmarg;
	      outl$marginal.trunc <- ptrunc
      }

      } #
      }

    if (detail==3) print(c(partheta,outl$loglike))

    ## variance parametrization, and inverse.link
    if (dep.model==3) {
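    ## mm below is the Jacobian of the map from the reported parameters (on the chosen
    ## parametrization and link scale) to the internally used lambda parameters; it is
    ## applied via the chain rule to the score, Dscore and the iid decomposition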
    if (var.par==1) {
         ## Jacobian of the map from variances to lambda's, summing over all random effects
         if (is.null(var.func)) {
	 if (var.link==0)  {
             mm <- matrix(-epar*2*sp,length(epar),length(epar))
	     diag(mm) <- sp^2-epar*2*sp
	 } else {
            mm <- -c(epar) %o% c(epar)*2*sp
            diag(mm) <- epar*sp^2-epar^2*2*sp
	 }
	    mm <- mm/sp^4
	 } else mm  <- numDeriv::hessian(var.func,par)
      } else {
	   if (var.link==0) mm <- diag(length(epar)) else
			  mm <- diag(length(c(epar)))*c(epar)
      }
      }

    if (dep.model==3) {
       outl$score <-  t(mm) %*% outl$score
       outl$Dscore <- t(mm) %*% outl$Dscore %*% mm
       if (iid==1) outl$theta.iid <- t(t(mm) %*% t(outl$theta.iid))
    }

    attr(outl,"gradient") <-outl$score
    if (oout==0) ret <- c(-1*outl$loglike) else if (oout==1) ret <- sum(outl$score^2) else if (oout==2) ret <- outl else ret <- outl$score
    return(ret)
  } #

  if (method=="optimize" && ptheta!=1) {
     cat("optimize only works for d==1, score.mehod set to nlminb \n");
     method <- "nlminb";
  }

  score1 <- NULL
  theta.iid <- NULL
  logl <- NULL
  p <- theta

    if (method=="nr") { #
        oout <- 2;  ### output control for obj
        if (Nit>0)
            for (i in 1:Nit)
            {
                    out <- loglike(p)
		    ## updating starting values for cumulative baselines
		    if (fix.baseline==0) Bit <- out$baseline[,-1,drop=FALSE]
		    if (fix.baseline==1) hess <-  out$Dscore
                    ###  uses simple second derivative for computing derivative of score
		    if (numDeriv==2 || (((fix.baseline==0)) & (i==1))) {
			    oout <- 3
			    hess <- numDeriv::jacobian(loglike,p,method="simple")
			    oout <- 2
		    }
                    if (!is.na(sum(hess))) hessi <- lava::Inverse(hess) else hessi <- hess
                    if (detail==1) {#
                        cat(paste("Fisher-Scoring ===================: it=",i,"\n"));
                        cat("theta:");print(c(theta))
                        cat("loglike:");cat(c(out$loglike),"\n");
                        cat("score:");cat(c(out$score),"\n");
                        cat("hess:\n"); cat(hess,"\n");
                    }#
                    delta <- step*( hessi %*% out$score )
		    ### update p, but note that score and derivative in fact related to previous p
		    ### unless Nit=0,
	            if (Nit>0) {
                    p <- p - delta
                    theta <- p;
		    }
                    if (is.nan(sum(out$score))) break;
                    if (sum(abs(out$score))<0.00001) break;
                    if (max(theta)>20 & var.link==1) { cat("theta too large, lacking convergence \n"); break; }
           }
        if (!is.nan(sum(p))) {
            if (detail==1 && iid==1) cat("iid decomposition\n");
            out <- loglike(p)
            logl <- out$loglike
            score1 <- score <- out$score
            oout <- 0;
            hess1 <- hess  <- -1*out$Dscore

            if (iid==1) { theta.iid <- out$theta.iid
                if (inherits(margsurv,"phreg")) {  #
                    ## order after time
			xx <- margsurv$cox.prep
		       D1dltheta1 <- out$D1dltheta1[xx$order+1,]
		       D2dltheta1 <- out$D1dltheta1[xx$order+1,]

		       ### baseline iid
			S0i2 <- S0i <- rep(0,length(xx$strata))
			S0i[xx$jumps+1] <-  1/margsurv$S0
			rr <- exp(margsurv$cox.prep$X  %*% margsurv$coef)
			cumhazt <- cumsumstratasum(S0i,xx$strata,xx$nstrata)$lagsum
			psurvmarg <- exp(-cumhazt*rr)
	        } #
	   }

            if (detail==1 && iid==1) cat("finished iid decomposition\n");
	    ### for profile solutions update second derivative at final
	    if (numDeriv==2 || ((fix.baseline==0))) {
                    oout <- 3
                    hess <- numDeriv::jacobian(loglike,p,method="simple")
		    oout <- 2
		    }
        }
        if (numDeriv>=1) {
            if (detail==1 ) cat("starting numDeriv for second derivative \n");
            oout <- 0;
            score2 <- numDeriv::jacobian(loglike,p)
	    score1 <- matrix(score2,ncol=1)
            oout <- 3
            hess <- numDeriv::jacobian(loglike,p,method="simple")
            if (detail==1 ) cat("finished numDeriv for second derivative \n");
        }
        if (detail==1 & Nit==0) {#
            cat(paste("Fisher-Scoring ===================: final","\n"));
            cat("theta:");print(c(p))
            cat("loglike:");cat(c(out$loglike),"\n");
            cat("score:");cat(c(out$score),"\n");
            cat("hess:\n"); cat(hess,"\n");
        }#
        if (!is.na(sum(hess))) hessi <- lava::Inverse(hess) else hessi <- diag(nrow(hess))
        #
  } else if (method=="nlminb") { # nlminb optimizer
    oout <- 0;
    if (two.stage==0) oout <- 1 ## score
    tryCatch(opt <- nlminb(theta,loglike,control=control),error=function(x) NA)
    if (detail==1) print(opt);
    if (detail==1 && iid==1) cat("iid decomposition\n");
    oout <- 2
    theta <- opt$par
    out <- loglike(opt$par)
    logl <- out$loglike
    score1 <- score <- out$score
    hess1 <- hess <- -1* out$Dscore
    if (iid==1) theta.iid <- out$theta.iid
    if (numDeriv==1) {
    if (detail==1 ) cat("numDeriv hessian start\n");
      oout <- 3; ## returns score
      hess <- numDeriv::jacobian(loglike,opt$par)
    if (detail==1 ) cat("numDeriv hessian done\n");
    }
    hessi <- lava::Inverse(hess);
  #
  } else if (method=="optimize" && ptheta==1) { #  optimizer
    oout <- 0;
    if (var.link==1) {mino <- -20; maxo <- 10;} else {mino <- 0.001; maxo <- 100;}
    tryCatch(opt <- optimize(loglike,c(mino,maxo)));
    if (detail==1) print(opt);
    opt$par <- opt$minimum
    theta <- opt$par
    if (detail==1 && iid==1) cat("iid decomposition\n");
    oout <- 2
    out <- loglike(opt$par)
    logl <- out$loglike
    score1 <- score <- out$score
    hess1 <- hess <- -1* out$Dscore
    if (numDeriv==1) {
    if (detail==1 ) cat("numDeriv hessian start\n");
      oout <- 3;  ## to get jacobian
      hess <- numDeriv::jacobian(loglike,theta)
    if (detail==1 ) cat("numDeriv hessian done\n");
    }
    hessi <- lava::Inverse(hess);
    if (iid==1) theta.iid <- out$theta.iid
  #
  } else if (method=="nlm") { # nlm optimizer
    iid <- 0; oout <- 0;
    tryCatch(opt <- nlm(loglike,theta,hessian=TRUE,print.level=detail),error=function(x) NA)
    iid <- 1;
    hess <- opt$hessian
    score <- opt$gradient
    if (detail==1) print(opt);
    hessi <- lava::Inverse(hess);
    theta <- opt$estimate
    if (detail==1 && iid==1) cat("iid decomposition\n");
    oout <- 2
    out <- loglike(opt$estimate)
    logl <- out$loglike
    score1 <- out$score
    hess1 <- out$Dscore
    if (iid==1) theta.iid <- out$theta.iid
  #
  }  else stop("method = optimize(dim=1) nlm nlminb nr\n");

# handling output
  loglikeiid <- NULL
  robvar.theta <- NULL
  likepairs <- NULL
  if (fix.baseline==1)  { marginal.surv <- psurvmarg; marginal.trunc <- ptrunc; } else { marginal.surv <- out$marginal.surv; marginal.trunc <- out$marginal.trunc;}

  if (iid==1) {
  if (dep.model==3 & pair.structure==1) likepairs <- out$likepairs
  if (dep.model==3 & two.stage==0) {
	  hessi <- -1*hessi
	  all.likepairs <- out$all.likepairs
	  colnames(all.likepairs) <- c("surv","dt","ds","dtds","cause1","cause2")
  }
     theta.iid <- out$theta.iid %*% hessi
     if (is.null(call.secluster) & is.null(max.clust)) rownames(theta.iid) <- unique(cluster.call) else rownames(theta.iid) <- unique(se.clusters)
     robvar.theta  <- crossprod(theta.iid)
     loglikeiid <- out$loglikeiid
    } else { all.likepairs <- NULL}
  var.theta <- robvar.theta
  if (is.null(robvar.theta)) var.theta <- hessi

  if (!is.null(colnames(theta.des))) thetanames <- colnames(theta.des) else thetanames <- paste("dependence",1:length(theta),sep="")
  theta <- matrix(theta,length(c(theta)),1)
  if (length(thetanames)==nrow(theta)) { rownames(theta) <- thetanames; rownames(var.theta) <- colnames(var.theta) <- thetanames; }
  if (!is.null(logl)) logl <- -1*logl

  if (convergence.bp==0) theta <- rep(NA,length(theta))
  ud <- list(theta=theta,score=score,hess=hess,hessi=hessi,var.theta=var.theta,model=model,robvar.theta=robvar.theta,
             theta.iid=theta.iid,loglikeiid=loglikeiid,likepairs=likepairs,
	     thetanames=thetanames,loglike=logl,score1=score1,Dscore=out$Dscore,
	     marginal.surv=marginal.surv,marginal.trunc=marginal.trunc,baseline=out$baseline,
	     se=diag(robvar.theta)^.5)
  class(ud) <- "mets.twostage"
  attr(ud,"response") <- "survival"
  attr(ud,"Formula") <- formula
  attr(ud,"clusters") <- clusters
  attr(ud,"cluster.call") <- cluster.call
  attr(ud,"secluster") <- c(se.clusters)
  attr(ud,"sym")<-sym;
  attr(ud,"var.link")<-var.link;
  attr(ud,"var.par")<-var.par;
  attr(ud,"var.func")<-var.func;
  attr(ud,"ptheta")<-ptheta
  attr(ud,"antpers")<-antpers;
  attr(ud,"antclust")<-antclust;
  attr(ud,"Type") <- model
  attr(ud,"twostage") <- two.stage
  attr(ud,"additive-gamma") <- (dep.model==3)*1
  if (!is.null(marginal.trunc)) attr(ud,"trunclikeiid")<- out$trunclikeiid
  if (dep.model==3 & two.stage==0) attr(ud,"all.likepairs")<- all.likepairs
  if (dep.model==3 ) attr(ud,"additive.gamma.sum") <- additive.gamma.sum
  #likepairs=likepairs,##  if (dep.model==3 & pair.structure==1) attr(ud, "likepairs") <- c(out$likepairs)
  if (dep.model==3 & pair.structure==0) {
       attr(ud, "pardes") <- theta.des
       attr(ud, "theta.des") <- theta.des
       attr(ud, "rv1") <-    random.design[1,]
    }
    if (dep.model==3 & pair.structure==1) {
       nrv <- clusterindex[1,6]
       attr(ud, "theta.des") <- matrix(theta.des[1,1:(nrv*ptheta)],nrv,ptheta)
       attr(ud, "pardes") <-    matrix(theta.des[1,1:(nrv*ptheta)],nrv,ptheta)
       attr(ud, "rv1") <-    random.design[1,]
       attr(ud, "nrv") <- clusterindex[1,6]
    }

  return(ud);
  #

} #


##' @export
summary.mets.twostage <- function(object,digits = 3,silent=0,theta.des=NULL,...)
{ #
  if (!(inherits(object,"mets.twostage"))) stop("Must be a Two-Stage object")

  var.link<-attr(object,"var.link");
  var.par  <- attr(object,"var.par");
  model <- object$model
  if ((model=="plackett" || model=="or") )         model <- "or"
  if ((model=="clayton.oakes" || model=="gamma") ) model <- "gamma"
  if ((attr(object,"additive-gamma")==1) & (silent==0)) addgam <- TRUE else addgam <- FALSE

  if ((model=="or") && (silent==0)) cat("Dependence parameter for Odds-Ratio (Plackett) model \n");
  if (attr(object,"response")=="binomial") response <- "binomial" else response <- "survival"
  if ((model=="gamma") && (silent==0)) {
	  cat("Dependence parameter for Clayton-Oakes model\n")
	  if (var.par==1 || !addgam) cat("Variance of Gamma distributed random effects \n");
	  if (var.par==0 && addgam) cat("Inverse of variance of Gamma distributed random effects \n");
  }
  if (var.link==1 && silent==0) cat("With log-link \n")
  if ((sum(abs(object$score))>0.0001) & (silent==0))  {
	  cat(" Variance parameters did not converge, allow more iterations.\n");
	  cat(paste("Score:",object$score,"  \n"));
  }
  theta <- object$theta
  if (is.null(rownames(theta)))  rownames(theta) <- paste("dependence",1:nrow(theta),sep="")

  coefs <- coef.mets.twostage(object,response=response,...);

  if (attr(object,"additive-gamma")==1 & (!is.null(object$robvar.theta))  ) {
      var.link <- attr(object,"var.link");
      var.par  <- attr(object,"var.par");
      rv1      <- attr(object,"rv1");
      if (is.matrix(rv1)) rv1 <- rv1[1,]
      theta.des <- attr(object,"pardes");
      ags <- attr(object,"additive.gamma.sum");
      ptheta <- attr(object,"ptheta")
      npar <- nrow(object$theta)
      theta <- object$theta[seq(1,ptheta),1,drop=FALSE]
      robvar.theta <- object$robvar.theta[seq(1,ptheta),seq(1,ptheta)]
      if (var.link==1) par <- theta.des %*% exp(theta) else  par <- theta.des %*% theta

      if (model=="or" || model=="plackett") var.par<-1 ## same as var.par=0 for this model

      if (attr(object,"twostage")==0) {
      }

      if (var.par==0) { #
	      if (var.link==1) {
		  fp <- function(p,d,t){ res <- exp(p*t)/(sum(rv1* c(theta.des %*% matrix(exp(p),ncol=1))))^d;
					     if (t==0) res <- res[1]; return(res); }
                  pare <-   lava::estimate(coef=theta,vcov=robvar.theta,f=function(p) exp(p),labels=rownames(theta))
	      } else {
		      fp <- function(p,d,t) {  res <- (p^t)/(sum(rv1* c(theta.des %*% matrix(p,ncol=1))))^d;
					     if (t==0) res <- res[1]; return(res); }
	          pare <- NULL
	      }

             e      <- lava::estimate(coef=theta,vcov=robvar.theta,f=function(p) fp(p,1,1),labels=rownames(theta))
             vare <-   lava::estimate(coef=theta,vcov=robvar.theta,f=function(p) fp(p,2,1),labels=rownames(theta))
             vartot <- lava::estimate(coef=theta,vcov=robvar.theta,f=function(p) fp(p,1,0))
      } #
      if (var.par==1) { #

	   if (var.link==1) { #
	     fp <- function(p,d,t){  res <- exp(p*t)/(sum(rv1* c(theta.des %*% matrix(exp(p),ncol=1))))^d;
                                     if (t==0) res <- res[1]; return(res); }
             e      <-   lava::estimate(coef=theta,vcov=robvar.theta,f=function(p) fp(p,1,1),labels=rownames(theta))
             vare   <-   lava::estimate(coef=theta,vcov=robvar.theta,f=function(p) exp(p),labels=rownames(theta))
             vartot <-   lava::estimate(coef=theta,vcov=robvar.theta,f=function(p) sum(exp(p)))
      } else {
              fp <- function(p,d,t) {  res <- (p^t)/(sum(rv1* c(theta.des %*% matrix(p,ncol=1))))^d;
                                     if (t==0) res <- res[1]; return(res); }
              e     <-  lava::estimate(coef=theta,vcov=robvar.theta,f=function(p) fp(p,1,1),labels=rownames(theta))
              pare  <-  lava::estimate(coef=theta,vcov=robvar.theta,f=function(p) fp(p,2,1),labels=rownames(theta))
              vartot <- lava::estimate(coef=theta,vcov=robvar.theta,f=function(p) sum(p))
	      vare <- NULL
	       } #
      } #
      res <- list(estimates=coefs, type=attr(object,"Type"),h=e,vare=vare,vartot=vartot)

  } else {
	 if (var.link==1) { #
	     if (model=="or") {
		or <-  lava::estimate(coef=object$theta,vcov=object$var.theta,f=function(p) exp(p),labels=rownames(theta))
	        res <- list(estimates=coefs,or=or,type=attr(object,"Type"))
	     } else {
	        vargam <-  lava::estimate(coef=object$theta,vcov=object$var.theta,f=function(p) exp(p),labels=rownames(theta))
                res <- list(estimates=coefs,vargam=vargam, type=attr(object,"Type"))
	     }
      } else {
	     if (model=="or") {
             lor <- lava::estimate(coef=object$theta,vcov=object$var.theta,f=function(p) log(p),labels=rownames(theta))
	     res <- list(estimates=coefs,log.or=lor,type=attr(object,"Type"))
	     } else  {
             res <- list(estimates=coefs,type=attr(object,"Type"))
	     }
      } #
  }

  class(res) <- "summary.mets.twostage"
  res
} #

##' @export
coef.mets.twostage <- function(object,var.link=NULL,response="survival",...)
{ #
  pt <- attr(object,"ptheta")
  theta <- object$theta[seq(pt),1]

  if (is.null(object$robvar.theta))
	  var.theta  <-  object$var.theta[seq(1,pt),seq(1,pt),drop=FALSE] else
	  var.theta  <-   object$robvar.theta[seq(1,pt),seq(1,pt),drop=FALSE]
  se <- diag(var.theta)^.5

  if (is.null(var.link)) {
     if (attr(object,"var.link")==1) vlink <- 1 else vlink <- 0
  } else vlink <- var.link
  res <- cbind(theta, se )
  wald <- theta/se
  waldp <- (1 - pnorm(abs(wald))) * 2
  if (response=="survival") {
       if (object$model=="plackett") {
       spearman <- alpha2spear(theta,link=vlink)
       Dspear <- numDeriv::jacobian(alpha2spear,theta,link=vlink)
       var.spearman <- Dspear %*% var.theta %*%  Dspear
       se.spearman <- diag(var.spearman)^.5
       res <- as.matrix(cbind(res, wald, waldp,spearman,se.spearman))
       if (vlink==1) colnames(res) <- c("log-Coef.", "SE","z", "P-val","Spearman Corr.","SE")
	  else colnames(res) <- c("Coef.", "SE","z", "P-val","Spearman Corr.","SE")
	  if ((!is.null(object$thetanames)) & (nrow(res)==length(object$thetanames))) rownames(res)<-object$thetanames
       }
       if (object$model=="clayton.oakes") {
       kendall <- alpha2kendall(theta,link=vlink)
       Dken <- numDeriv::jacobian(alpha2kendall,theta,link=vlink)
       var.kendall<- Dken %*% var.theta %*%  Dken
       se.kendall <- diag(var.kendall)^.5
       res <- as.matrix(cbind(res, wald, waldp,kendall,se.kendall))
       if (vlink==1) colnames(res) <- c("log-Coef.", "SE","z", "P-val","Kendall tau","SE")
       else colnames(res) <- c("Coef.", "SE","z", "P-val","Kendall tau","SE")
       if ((!is.null(object$thetanames)) & (nrow(res)==length(object$thetanames))) rownames(res)<-object$thetanames
       }
  }
  return(res)
} #

##' @export
print.mets.twostage<-function(x,digits=3,...)
{ #
  cat("\n")
  print(summary(x,silent=0));
} #

##' @export
plot.mets.twostage<-function(x,pointwise.ci=1,robust=0,specific.comps=FALSE,
		level=0.05,
		start.time=0,stop.time=0,add.to.plot=FALSE,mains=TRUE,
                xlab="Time",ylab ="Cumulative regression function",...)
{ #
  if (!(inherits(x, 'two.stage'))) stop("Must be a Two-Stage object")
  object <- x; rm(x);

  B<-object$cum; V<-object$var.cum; p<-dim(B)[[2]];
  if (robust>=1) V<-object$robvar.cum;

  if (sum(specific.comps)==FALSE) comp<-2:p else comp<-specific.comps+1
  if (stop.time==0) stop.time<-max(B[,1]);

  med<-B[,1]<=stop.time & B[,1]>=start.time
  B<-B[med,]; Bs<-B[1,];  B<-t(t(B)-Bs); B[,1]<-B[,1]+Bs[1];
  V<-V[med,]; Vs<-V[1,]; V<-t( t(V)-Vs);
  Vrob<-object$robvar.cum;
  Vrob<-Vrob[med,]; Vrobs<-Vrob[1,]; Vrob<-t( t(Vrob)-Vrobs);

  c.alpha<- qnorm(1-level/2)
  for (v in comp) {
    c.alpha<- qnorm(1-level/2)
    est<-B[,v];ul<-B[,v]+c.alpha*V[,v]^.5;nl<-B[,v]-c.alpha*V[,v]^.5;
    if (add.to.plot==FALSE)
      {
        plot(B[,1],est,ylim=1.05*range(ul,nl),type="s",xlab=xlab,ylab=ylab)
        if (mains==TRUE) title(main=colnames(B)[v]); }
    else lines(B[,1],est,type="s");
    if (pointwise.ci>=1) {
      lines(B[,1],ul,lty=pointwise.ci,type="s");
      lines(B[,1],nl,lty=pointwise.ci,type="s"); }
    if (robust>=1) {
      lines(B[,1],ul,lty=robust,type="s");
      lines(B[,1],nl,lty=robust,type="s"); }
    abline(h=0);
  }
}  #

##' @export
matplot.mets.twostage <- function(object,...)
{ #
B <- object$baseline
matplot(B[,1],B[,-1],type="s",...)
} #

##' @export
predict.mets.twostage <- function(object,X=NULL,Z=NULL,times=NULL,times2=NULL,theta.des=NULL,diag=TRUE,...)
{ #
time.coef <- data.frame(object$cum)
if (!is.null(times)) {
cum <- cpred(object$cum,times);
cum2 <- cpred(object$cum,times);
} else { cum <- object$cum; cum2 <- object$cum }
if (!is.null(times2)) cum2 <- cpred(object$cum,times2);

if (is.null(X)) X <- 1;
if (is.null(X) & (!is.null(Z))) { Z <- as.matrix(Z);  X <- matrix(1,nrow(Z),1)}
if (is.null(Z) & (!is.null(X)))  {X <- as.matrix(X);  Z <- matrix(0,nrow(X),1); gamma <- 0}

if (diag==FALSE) {
   time.part <-  X %*% t(cum[,-1])
   time.part2 <-  X %*% t(cum2[,-1])
   if (!is.null(object$gamma)) { RR <- exp( Z %*% gamma );
       cumhaz <- t( t(time.part) * RR ); cumhaz2 <- t( t(time.part2) * RR )}
	    else { cumhaz <- time.part;  cumhaz2 <- time.part2;   }
} else {
	time.part <-  apply(as.matrix(X*cum[,-1]),1,sum)
	time.part2 <-  apply(as.matrix(X*cum2[,-1]),1,sum)
}

if (!is.null(object$gamma)) {
	RR<- exp(Z%*%gamma);
	cumhaz <- t( t(time.part) * RR );
	cumhaz2 <- t( t(time.part2) * RR )} else {
		cumhaz <- time.part;  cumhaz2 <- time.part2;
}
S1 <- exp(-cumhaz); S2 <- exp(-cumhaz2)

if (attr(object,"var.link")==1) theta  <- exp(object$theta) else theta <- object$theta
if (!is.null(theta.des)) theta <- c(theta.des %*% object$theta)

if (diag==FALSE) St1t2<- (outer(c(S1)^{-(theta)},c(S2)^{-(theta)},FUN="+") - 1)^(-(1/theta)) else
St1t2<- ((S1^{-(theta)}+S2^{-(theta)})-1)^(-(1/theta))

out=list(St1t2=St1t2,S1=S1,S2=S2,times=times,times2=times2,theta=theta)
return(out)
} #
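
### Minimal usage sketch for predict.mets.twostage (illustration only, hence
### wrapped in if (FALSE)). It assumes 'fit' is a survival.twostage fit whose
### marginal was, say, an intercept-only aalen model, so that fit$cum and
### fit$theta are available.
if (FALSE) {
  pp <- predict(fit, X=1, times=2, times2=5)
  pp$S1     ## marginal survival at time 2
  pp$S2     ## marginal survival at time 5
  pp$St1t2  ## joint survival P(T1>2, T2>5) computed via the Clayton-Oakes form
}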

##' @export ascertained.pairs
ascertained.pairs <-function (pairs,data,cr.models,bin=FALSE)
{
      timestatus <- all.vars(cr.models)
      ### put the subject with the first event in the second column and
      ### only use pairs where that first event is an observed event
      apairs <- pairs
      if (bin==TRUE) fj <- ifelse(data[pairs[,1],timestatus[1]] > data[pairs[,2],timestatus[1]],1,2)
      else fj <- ifelse(data[pairs[,1],timestatus[1]] < data[pairs[,2],timestatus[1]],1,2)
      ### swap columns when the first subject has the earlier event
      apairs[fj==1,1] <- pairs[fj==1,2]
      apairs[fj==1,2] <- pairs[fj==1,1]
      ### keep only pairs where the first (earlier) event is an observed jump
      if (bin==FALSE) {
	      jmpf <- (data[apairs[,2],timestatus[2]]==1)
	      apairs <- apairs[data[apairs[,2],timestatus[2]]==1,]
              attr(pairs,"jump-first") <- jmpf
      }
      pairs <- apairs
      return(pairs)
} 

##' @export
alpha2spear <- function(theta,link=1) { #
   if (link==1) theta <- exp(theta)
if (length(theta)>1) {
   out <- c()
   for (thet in theta) {
   if (thet!=1) out <- c(out,( (thet+1)/(thet-1) -2* thet* log(thet)/ (thet-1)^2))
   else out <- c(out,0)
   }
} else { if (theta!=1) out <- ( (theta+1)/(theta-1) -2* theta* log(theta)/ (theta-1)^2) else out <- 0 }

return(out)
} #
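
### Small illustration (not run): alpha2spear maps the Plackett odds ratio to
### the Spearman correlation; an odds ratio of 2 gives a correlation of about 0.23.
if (FALSE) {
  alpha2spear(2, link=0)       ## odds ratio given directly
  alpha2spear(log(2), link=1)  ## same parameter given on log-scale
}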

##' @export
alpha2kendall <- function(theta,link=0) {  #
   if (link==1) theta <- exp(theta)
   return(1/(1+2/theta))
} #
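
### Small illustration (not run): for the Clayton-Oakes model the dependence
### parameter theta maps to Kendall's tau = 1/(1+2/theta), so theta=2 gives 0.5.
if (FALSE) {
  alpha2kendall(2)              ## 0.5
  alpha2kendall(log(2), link=1) ## same, with theta given on log-scale
}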

##' @export piecewise.twostage
piecewise.twostage <- function(cut1,cut2,data=parent.frame(),timevar="time",status="status",id="id",covars=NULL,covars.pairs=NULL,num=NULL,
            method="optimize",Nit=100,detail=0,silent=1,weights=NULL,
            control=list(),theta=NULL,theta.des=NULL,var.link=1,
	    step=0.5,model="plackett",data.return=0)
{ #
iid <- 1
ud <- list()
if (missing(cut2)) cut2 <- cut1;
nc1 <- length(cut1); nc2 <- length(cut2)
names1 <- names2 <- c()
theta.mat <- se.theta.mat <- cor.mat <- score.mat <- se.cor.mat <- matrix(0,nc1-1,nc2-1);
clusters <- data[,id]
cluster.call <- clusters
idi <- unique(data[,id]);

if (iid==1) { theta.iid <- matrix(0,length(idi),(nc1-1)*(nc2-1));
              rownames(theta.iid) <- idi
            } else theta.iid <- NULL

thetal <- c()
k <- 0;
for (i1 in 2:nc1)
for (i2 in 2:nc2)
{
k <-(i1-2)*(nc2-1)+(i2-1)
if (silent<=0) cat(paste("Data-set ",k,"out of ",(nc1-1)*(nc2-1)),"\n");
datalr <- surv.boxarea(c(cut1[i1-1],cut2[i2-1]),c(cut1[i1],cut2[i2]),data,timevar=timevar,
status=status,id=id,covars=covars,covars.pairs=covars.pairs,num=num,silent=silent)
if (silent<=-1) print("back in piecewise.twostage");
if (silent<=-1) print(summary(datalr));
if (silent<=-1) print(head(datalr));
if (silent<=-1) print(summary(datalr[,id]));
 boxlr <- list(left=c(cut1[i1-1],cut2[i2-1]),right=c(cut1[i1],cut2[i2]))
datalr$tstime <- datalr[,timevar]
datalr$tsstatus <- datalr[,status]
datalr$tsid <- datalr[,id]
###

if (is.null(covars))
f <- as.formula(with(attributes(datalr),paste("Surv(",time,",",status,")~-1+factor(",num,")")))
else f <- as.formula(with(attributes(datalr),paste("Surv(",time,",",status,")~-1+factor(",num,"):",covars)))
marg1 <- timereg::aalen(f,data=datalr,n.sim=0,robust=0)

fitlr<-  survival.twostageCC(marg1,data=datalr,clusters=datalr$tsid,model=model,method=method,
Nit=Nit,detail=detail,silent=silent,weights=weights,
baseline.iid=0,control=control,theta=theta,theta.des=theta.des,
var.link=var.link, step=step)
####
coef <- coef(fitlr)
theta.mat[i1-1,i2-1] <- fitlr$theta
se.theta.mat[i1-1,i2-1] <- fitlr$var.theta^.5
cor.mat[i1-1,i2-1] <- coef[1,5]
se.cor.mat[i1-1,i2-1] <- coef[1,6]
score.mat[i1-1,i2-1] <- fitlr$score
if (data.return==0)
ud[[k]] <- list(index=c(i1,i2),left=c(cut1[i1-1],cut2[i2-1]),right=c(cut1[i1],cut2[i2]),fitlr=fitlr)
if (data.return==1)
ud[[k]] <- list(index=c(i1,i2),left=c(cut1[i1-1],cut2[i2-1]),right=c(cut1[i1],cut2[i2]),fitlr=fitlr,data=datalr)
if (i2==2) names1 <- c(names1, paste(cut1[i1-1],"-",cut1[i1]))
if (i1==2) names2 <- c(names2, paste(cut2[i2-1],"-",cut2[i2]))
thetal <- c(thetal,fitlr$theta)

if ((silent<=-1) & (iid==1)) print(head(fitlr$theta.iid));
if ((silent<=-1) & (iid==1)) {
print(idi) ; print(datalr$tsid)
print(dim(fitlr$theta.iid))
print(head(fitlr$theta.iid))
print(dim(theta.iid))
print(length( idi %in% unique(datalr$tsid)))
}
if (iid==1) theta.iid[idi %in% unique(datalr$tsid),k] <-c(fitlr$theta.iid)
###if (iid==1) theta.iid[rownames(fitlr$theta.iid),k] <-  fitlr$theta.iid
}

var.thetal <- NULL
if (iid==1)  var.thetal <- t(theta.iid) %*% theta.iid

colnames(score.mat) <- colnames(cor.mat) <-  colnames(se.cor.mat)  <- colnames(se.theta.mat) <- colnames(theta.mat) <- names1;
rownames(score.mat) <- rownames(cor.mat) <-  rownames(se.cor.mat) <-  rownames(se.theta.mat) <- rownames(theta.mat) <- names2;

ud <- list(model.fits=ud,theta=theta.mat,var.theta=se.theta.mat^2,
	   se.theta=se.theta.mat,thetal=thetal,thetal.iid=theta.iid,var.thetal=var.thetal,model=model,
	   cor=cor.mat,se.cor=se.cor.mat,score=score.mat);
class(ud)<-"pc.twostage"
attr(ud,"var.link")<-var.link;
attr(ud, "Type") <- model
return(ud);
} #
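
### Minimal usage sketch (not run). Here 'dd' is assumed to be bivariate
### survival data in long format, two rows per cluster, with columns
### time, status and id; the time axis is split at the given cut points.
if (FALSE) {
  cuts <- c(0, 0.5, 1, 2)
  ud <- piecewise.twostage(cuts, data=dd, timevar="time", status="status",
                           id="id", model="clayton.oakes")
  summary(ud)  ## dependence parameter and Kendall's tau for each time-region
}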

##' @export piecewise.data
piecewise.data <- function(cut1,cut2,data=parent.frame(),timevar="time",status="status",id="id",covars=NULL,covars.pairs=NULL,num=NULL,silent=1)
{ #
ud <- list()
if (missing(cut2)) cut2 <- cut1;
nc1 <- length(cut1); nc2 <- length(cut2)
dataud <- c()

k <- 0;
for (i1 in 2:nc1)
for (i2 in 2:nc2)
{
k <-(i1-2)*(nc2-1)+(i2-1)
if (silent<=0) cat(paste("Data-set ",k,"out of ",(nc1-1)*(nc2-1)),"\n");
 datalr <- surv.boxarea(c(cut1[i1-1],cut2[i2-1]),c(cut1[i1],cut2[i2]),data,timevar=timevar,
			status=status,id=id,covars=covars,covars.pairs=covars.pairs,num=num,silent=silent)
if (silent<=-1) print(summary(datalr));
if (silent<=-1) print(head(datalr));
datalr$tstime <- datalr[,timevar]
datalr$tsstatus <- datalr[,status]
datalr$tsid <- datalr[,id]
datalr$strata <- paste( c(cut1[i1-1],cut2[i2-1]),c(cut1[i1],cut2[i2]),collapse=",",sep="-")
datalr$intstrata <-
c(paste(c(cut1[i1-1],cut1[i1]),collapse=",",sep="-"),paste( c(cut2[i2-1],cut2[i2]),collapse=",",sep="-"))

if (silent<=-1) print(head(datalr));
dataud <- rbind(dataud,datalr)
}

return(data.frame(dataud))
} #

##' @export
summary.pc.twostage <- function(object,var.link=NULL,...)
{ #
  if (!(inherits(object,"pc.twostage"))) stop("Must be a piecewise-constant two-stage object")

  res <- list(estimates=object$theta,se=object$se.theta,cor=object$cor,se.cor=object$se.cor,
	      model=object$model,score=object$score)
  class(res) <- "summary.pc.twostage"
  attr(res,"var.link")<-attr(object,"var.link");
  attr(res, "Type") <- object$model
  res
} #

##' @export
print.pc.twostage <- function(x,var.link=NULL,...)
{ #
   if (!(inherits(x,"pc.twostage"))) stop("Must be a Piecewise constant two-Stage object")
   print( summary(x,var.link=var.link,...))
} #

##' @export
print.summary.pc.twostage <- function(x,var.link=NULL, digits=3,...)
{ #

  if (is.null(var.link)) { if (attr(x,"var.link")==1) vlink <- 1 else vlink <- 0; } else vlink <- var.link
  print(vlink)

  if (x$model=="plackett") cat("Dependence parameter for Plackett model \n");
  if (x$model=="clayton.oakes") cat("Dependence parameter for Clayton-Oakes model \n");

  if (max(abs(x$score))>0.001) { cat("Score of log-likelihood for parameter estimates (too large?)\n"); print(x$score);cat("\n\n");}

  if (vlink==1) cat("log-coefficient for dependence parameter (SE) \n")  else cat("Dependence parameter (SE) \n");
  print(coefmat(x$estimate,x$se,digits=digits,...))
  cat("\n")

  if (x$model=="plackett") {cat("Spearman Correlation (SE) \n");cor.type <- "Spearman Correlation"; }
  if (x$model=="clayton.oakes") {cat("Kendall's tau (SE) \n"); cor.type <- "Kendall's tau";}

  print(coefmat(x$cor,x$se.cor,digits,...))
  cat("\n")
} #

##' @export
coefmat <- function(est,stderr,digits=3,...) { #
  myest <- round(10^digits*(est))/10^digits;
  myest <- paste(ifelse(myest<0,""," "),myest,sep="")
  mysd <- round(10^digits*(stderr))/10^digits;
  res <- matrix(paste(format(myest)," (",format(mysd),")",sep=""),ncol=ncol(est))
  dimnames(res) <- dimnames(est)
  colnames(res) <- paste("",colnames(res))
  noquote(res)
} #
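
### Small illustration (not run) of the coefmat formatting helper.
if (FALSE) {
  est <- matrix(c(0.5, 1.2), 1, 2, dimnames=list("dependence", c("0-1", "1-2")))
  se  <- matrix(c(0.1, 0.3), 1, 2)
  coefmat(est, se)   ## " 0.5 (0.1)"  " 1.2 (0.3)"
}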

##' Wrapper for easy fitting of Clayton-Oakes or bivariate Plackett models for bivariate survival data
##'
##' Fits a two-stage model for describing dependence in survival data
##' using marginals that are on Cox or Aalen form via the twostage function, but
##' the call is different and easier, and the data manipulation is built into the function.
##' Useful in particular for family design data.
##'
##' If clusters contain more than two subjects, the algorithm uses a composite likelihood
##' based on the pairwise bivariate models.
##'
##' The reported standard errors are based on the estimated information from the
##' likelihood assuming that the marginals are known.
##'
##' @examples
##' library(mets)
##' data("prt",package="mets")
##' prtsam <- blocksample(prt,idvar="id",1e3,replace=FALSE)
##' margp <- coxph(Surv(time,status==1)~factor(country),data=prtsam)
##' fitco <- survival.twostage(margp,data=prtsam,clusters=prtsam$id)
##' summary(fitco)
##'
##' des <- model.matrix(~-1+factor(zyg),data=prtsam);
##' fitco <- survival.twostage(margp,data=prtsam,theta.des=des,clusters=prtsam$id)
##' summary(fitco)
##' rm(prtsam)
##'
##' dfam <- simSurvFam(1000)
##' dfam <- fast.reshape(dfam,var=c("x","time","status"))
##'
##' desfs <- function(x,num1="num1",num2="num2")
##' {
##' pp <- (x[num1]=="m")*(x[num2]=="f")*1   ## mother-father
##' pc <- (x[num1]=="m" | x[num1]=="f")*(x[num2]=="b1" | x[num2]=="b2")*1 ## mother-child
##' cc <- (x[num1]=="b1")*(x[num2]=="b1" | x[num2]=="b2")*1               ## child-child
##' c(pp,pc,cc)
##' }
##'
##' marg <- coxph(Surv(time,status)~factor(num),data=dfam)
##' out3 <- easy.survival.twostage(marg,data=dfam,time="time",status="status",id="id",
##'              deshelp=0,
##'              method="nr",theta.formula=desfs,
##'              model="plackett",
##'              desnames=c("parent-parent","parent-child","child-child"))
##' summary(out3)
##'
##' @keywords survival twostage
##' @export easy.survival.twostage
##' @param margsurv model
##' @param data data frame
##' @param method Scoring method
##' @param status Status at exit time
##' @param time Exit time
##' @param entry Entry time
##' @param id name of cluster variable in data frame
##' @param Nit Number of iterations
##' @param detail Detail for more output for iterations
##' @param silent Debug information
##' @param weights Weights for log-likelihood, can be used for each type of outcome in 2x2 tables.
##' @param control Optimization arguments
##' @param theta Starting values for variance components
##' @param theta.formula design for dependence, either a formula or a design function
##' @param desnames names for dependence parameters
##' @param deshelp if 1 then prints out some of the data sets that are used, and on which the design function operates
##' @param var.link Link function for variance (exp link)
##' @param step Step size for newton-raphson
##' @param model plackett or clayton-oakes model
##' @param marginal.surv vector of marginal survival probabilities
##' @param strata strata for fitting
##' @param se.clusters clusters for iid decomposition for robust standard errors
easy.survival.twostage <- function(margsurv=NULL,data=parent.frame(),method="nlminb",
status="status",time="time",entry=NULL,id="id", Nit=60,detail=0, silent=1,weights=NULL, control=list(),
theta=NULL,theta.formula=NULL,desnames=NULL,deshelp=0,var.link=1,
step=0.5,model="plackett",marginal.surv=NULL,strata=NULL,se.clusters=NULL)
{ #
### marginal truncation probability, to be computed from the model
pentry <- NULL

if (is.null(marginal.surv))
if (inherits(margsurv,"coxph"))
{ #
    coxformula <- margsurv$formula
    X <- model.matrix(coxformula,data=data)[,-1];
    baseout <- survival::basehaz(margsurv,centered=FALSE);
    baseout <- cbind(baseout$time,baseout$hazard)
    cumh <-  cpred(baseout,data[,time])[,2]
    RR<-exp(X %*% coef(margsurv))
    ps<-exp(-cumh*RR)
    #
  } else if (inherits(margsurv,"phreg"))
  {  #
	if (!is.null(margsurv$coef))
		rr <- c(exp(margsurv$X  %*% margsurv$coef))
	else rr <- rep(1,nrow(margsurv$X))
        ps <- exp(-rr * cpred(margsurv$cumhaz,margsurv$exit)[,2])
  } #
  else stop("marginal survival probabilities must be given as marginal.sur or margsurv \n");

  data <- cbind(data,ps)
  if (!is.null(pentry)) data <- cbind(data,pentry)

  ### make all pairs in the families,
  fam <- familycluster.index(data[,id])
  data.fam <- data[fam$familypairindex,]
  data.fam$subfam <- fam$subfamilyindex

  ### make dependency design using wide format for all pairs
  data.fam.clust <- fast.reshape(data.fam,id="subfam")
  if (is.function(theta.formula)) {
     desfunction <- compiler::cmpfun(theta.formula)
    if (deshelp==1){
 	  cat("These names appear in wide version of pairs for dependence \n")
	  cat("design function must be defined in terms of these: \n")
	  cat(names(data.fam.clust)); cat("\n")
	  cat("Here is head of wide version with pairs\n")
	  print(head(data.fam.clust)); cat("\n")
    }
    des.theta  <- t( apply(data.fam.clust,1,desfunction))
    colnames(des.theta) <- desnames
    desnames <- desnames
     } else {
	  if (is.null(theta.formula)) theta.formula <- ~+1
          des.theta <- model.matrix(theta.formula,data=data.fam.clust)
          desnames <- colnames(des.theta);
     }
     data.fam.clust <- cbind(data.fam.clust,des.theta)
     if (deshelp==1) {
	 cat("These names appear in wide version of pairs for dependence \n")
	     print(head(data.fam.clust))
     }

    ### back to long format keeping only needed variables
     if (is.null(pentry))
    data.fam <- fast.reshape(data.fam.clust,varying=c(id,"ps",status))
    else data.fam <- fast.reshape(data.fam.clust,varying=c(id,"ps",status,"pentry"))
    if (deshelp==1) {
	cat("Back to long format for twostage (head)\n");
        print(head(data.fam));
	cat("\n")
	cat(paste("cluster=",id,",  subcluster (pairs)=subfam \n"));
	cat(paste("design variables ="));
	cat(desnames)
	cat("\n")
    }

    if (is.null(pentry)) ptrunc <- NULL else ptrunc <- data.fam[,pentry]

    out <- survival.twostageCC(NULL,data=data.fam,
                    clusters=data.fam$subfam,
		    theta.des=as.matrix(data.fam[,desnames]),
                    detail=detail, method=method, Nit=Nit,step=step,
                    theta=theta, var.link=var.link,model=model,
                    marginal.survival=data.fam[,"ps"],
                    marginal.status=data.fam[,status],
		    marginal.trunc=ptrunc,baseline.iid=0,se.clusters=data.fam[,id])
   return(out)
} #

##' @export
simSurvFam <- function(n,beta=0.0,theta=1,lam0=0.5,lam1=1,lam2=1,ctime=10,...) { #
xm <- rbinom(n,1,0.5); xf <- rbinom(n,1,0.5);
xb1 <- rbinom(n,1,0.5); xb2 <- rbinom(n,1,0.5);
###
zf <- rgamma(n,shape=lam1); zb <- rgamma(n,shape=lam2);
tm <- rexp(n)/(zf*exp(xm*beta)*lam0)
tf <- rexp(n)/(zf*exp(xf*beta)*lam0)
tb1 <- rexp(n)/((zf+zb)*exp(xb1*beta)*2*lam0)
tb2 <- rexp(n)/((zf+zb)*exp(xb2*beta)*2*lam0)
cm <- ifelse(tm<ctime,1,0); cf <- ifelse(tf<ctime,1,0);
cb1 <- ifelse(tb1<ctime,1,0); cb2 <- ifelse(tb2<ctime,1,0);
tm <- ifelse(tm<ctime,tm,ctime); tf <- ifelse(tf<ctime,tf,ctime)
tb1 <- ifelse(tb1<ctime,tb1,ctime); tb2 <- ifelse(tb2<ctime,tb2,ctime)
#
data.frame(xm=xm,xf=xf,xb1=xb1,xb2=xb2,timem=tm,timef=tf,timeb1=tb1,timeb2=tb2,statusm=cm,statusf=cf,
	   statusb1=cb1,statusb2=cb2,id=1:n)
} #
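
### Small illustration (not run): simulate family data in wide format and
### reshape to long format, as in the easy.survival.twostage example above.
if (FALSE) {
  dfam <- simSurvFam(500)
  dlong <- fast.reshape(dfam, var=c("x","time","status"))
  head(dlong)
}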

##' @export
object.defined <- function(object)
{
   exists(as.character(substitute(object)))
}
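
### Small illustration (not run): the argument is only substituted, never
### evaluated, so passing an undefined name returns FALSE instead of an error.
if (FALSE) {
  x <- 1
  object.defined(x)          ## TRUE
  object.defined(never.set)  ## FALSE ('never.set' is a hypothetical undefined name)
}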

##' @export
twin.polygen.design <-function (data,id="id",zyg="DZ",zygname="zyg",type="ace",tv=NULL,...) { #
  ### twin case: data are assumed sorted by id with both twins on consecutive rows
  id <- data[,id]
  tv <- diff(c(NA,id))
  tv[tv!=0 | is.na(tv)] <- 1   ## first twin within each pair
  tv[tv==0] <- 2               ## second twin within each pair

  zygbin <- (data[,zygname]==zyg)*1
  zygdes=model.matrix(~-1+factor(zygbin),data)
  n <- length(zygbin)

  if (type=="ace") { ### ace #
  ### random effects for each cluster
  DZns <- cbind((zygbin==1)*(tv==1)*cbind(rep(1,n),rep(0,n))+
		(zygbin==1)*(tv==2)*cbind(rep(0,n),rep(1,n)))
  des.rv <- cbind(zygdes,DZns,1)
  colnames(des.rv) <- c("MZ","DZ","DZns1","DZns2","env")
  pard <- rbind(c(1,0), c(0.5,0),c(0.5,0), c(0.5,0), c(0,1))
  } #

  if (type=="ae") { ### ae #
  ###AE model
  DZns <- cbind((zygbin==1)*(tv==1)*cbind(rep(1,n),rep(0,n))+
		(zygbin==1)*(tv==2)*cbind(rep(0,n),rep(1,n)))
  des.rv <- cbind(zygdes,DZns)
  colnames(des.rv) <- c("MZ","DZ","DZns1","DZns2")
  pard <- rbind(c(1,0), c(0.5,0),c(0.5,0), c(0.5,0))[,1,drop=FALSE]
  } #

  if (type=="dce") { ### dce #
  ### DCE
  ### random effects for each cluster
  DZns <- cbind((zygbin==1)*(tv==1)*cbind(rep(1,n),rep(0,n))+(zygbin==1)*(tv==2)*cbind(rep(0,n),rep(1,n)))
  des.rv <- cbind(zygdes,DZns,1)
  colnames(des.rv) <- c("MZ","DZ","DZns1","DZns2","env")
  pard <- rbind(c(1,0), c(0.25,0),c(0.75,0), c(0.75,0), c(0,1))
  } #

  if (type=="ade") { ### ade #
  #ADE
  DZns <- cbind((zygbin==1)*(tv==1)*cbind(rep(1,n),rep(0,n))+(zygbin==1)*(tv==2)*cbind(rep(0,n),rep(1,n)))
  des.rv <- cbind(zygdes,DZns,DZns,1)
  pard <- rbind(c(1,0),c(0.25,0),c(0.75,0),c(0.75,0),c(0,1),c(0,0.5),c(0,0.5),c(0,0.5) )
  pardes <- matrix(pard,n,16,byrow=TRUE)
  des.rv <- NULL
  } #

  if (type=="adce") { ### adce #
  } #

  if (type=="de") { ### ae #
  DZns <- cbind((zygbin==1)*(tv==1)*cbind(rep(1,n),rep(0,n))+
		(zygbin==1)*(tv==2)*cbind(rep(0,n),rep(1,n)))
  des.rv <- cbind(zygdes,DZns)
  colnames(des.rv) <- c("MZ","DZ","DZns1","DZns2")
  pard <- rbind(c(1,0), c(0.25,0),c(0.75,0), c(0.75,0))[,1,drop=FALSE]
  } #

  if (type=="un") { ### ae #
  des.rv <- cbind(zygdes)
  colnames(des.rv) <- c("MZ","DZ")
  pard <- rbind(c(1,0),c(0,1))
  } #

res <- list(pardes=pard,des.rv=des.rv)
return(res)
} #
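
### Minimal usage sketch (not run), using the prt twin data shipped with mets.
### The data are assumed to be sorted by id with both twins on consecutive rows.
if (FALSE) {
  data(prt)
  ace <- twin.polygen.design(prt, id="id", zygname="zyg", zyg="DZ", type="ace")
  head(ace$des.rv)  ## random-effects design: MZ, DZ, DZns1, DZns2, env
  ace$pardes        ## parameter design linking the random effects to (A, C)
}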

##' @export
ace.family.design <-function (data,id="id",member="type",mother="mother",father="father",child="child",child1="child",type="ace",...) {
#
  ### standard family case
###  nid <- table(data[,id])
  id <- data[,id]

  if (type=="ace") { ### ace #
  ### random effects for each cluster
	  mom <- 1*(data[,member]==mother)
	  mo <- cbind(mom*1,1,1,1)*(mom)
	  fa <- (data[,member]==father)
	  fad <- cbind(fa,1,1,1)*fa
	  ch1 <- (data[,member]==child)*(data[,child1]==1)
	  cc1 <- cbind(ch1,1,0,0,1,1,0,0)*ch1
	  ch2 <- (data[,member]==child)*(data[,child1]==0)
	  cc2 <- cbind(ch2,0,1,0,1,0,1,0)*ch2
	  des.rv <- cbind(cbind(mo,fad)+cc1+cc2,1)

	  colnames(des.rv) <- c("m1","m2","m3","m4","f1","f2","f3","f4","env")
	  pard <- rbind( c(0.25,0), c(0.25,0),c(0.25,0), c(0.25,0),
			 c(0.25,0), c(0.25,0),c(0.25,0), c(0.25,0), c(0,1))
  } #

  if (type=="ae") { ### ae #
  ###AE model
          mom <- 1*(data[,member]==mother)
	  mo <- cbind(mom*1,1,1,1)*(mom)
	  fa <- (data[,member]==father)
	  fad <- cbind(fa,1,1,1)*fa
	  ch1 <- (data[,member]==child)*(data[,child1]==1)
	  cc1 <- cbind(ch1,1,0,0,1,1,0,0)*ch1
	  ch2 <- (data[,member]==child)*(data[,child1]==0)
	  cc2 <- cbind(ch2,0,1,0,1,0,1,0)*ch2
	  des.rv <- cbind(cbind(mo,fad)+cc1+cc2)
	  colnames(des.rv) <- c("m1","m2","m3","m4","f1","f2","f3","f4")
	  pard <- rbind( c(0.25), c(0.25),c(0.25), c(0.25),
			 c(0.25), c(0.25),c(0.25), c(0.25))
  } #

  if (type=="dce") { ### dce #
  ### DCE
  ### random effects for each cluster
	  stop("not done yet");
          mom <- 1*(data[,member]==mother)
	  mo <- cbind(mom*1,1,1,1)*(mom)
	  fa <- (data[,member]==father)
	  fad <- cbind(fa,1,1,1)*fa
	  ch1 <- (data[,member]==child)*(data[,child1]==1)
	  cc1 <- cbind(ch1,1,0,0,1,1,0,0)*ch1
	  ch2 <- (data[,member]==child)*(data[,child1]==0)
	  cc2 <- cbind(ch2,0,1,0,1,0,1,0)*ch2
	  des.rv <- cbind(cbind(mo,fad)+cc1+cc2)
	  colnames(des.rv) <- c("m1","m2","m3","m4","f1","f2","f3","f4")
	  pard <- rbind( c(0.25), c(0.25),c(0.25), c(0.25),
			 c(0.25), c(0.25),c(0.25), c(0.25))

  } #

  if (type=="ade") { ### ade #
  #ADE
    stop("not done yet");
###    pard <- rbind(c(1,0),c(0.25,0),c(0.75,0),c(0.75,0),c(0,1),c(0,0.5),c(0,0.5),c(0,0.5) )
###    pardes <- matrix(pard,n,16,byrow=TRUE)
  } #

  if (type=="adce") { ### adce #
	  stop("not done yet");
  } #

  if (type=="de") { ### ae #
    stop("not done yet");
    pard <- rbind(c(1,0), c(0.25,0),c(0.75,0), c(0.75,0))[,1,drop=FALSE]
  } #

  if (type=="un") { ### ae #
	 stop("not done yet");
         pard <- diag(4)
  } #

res <- list(pardes=pard,des.rv=des.rv)
return(res)
} #

##' @export
make.pairwise.design  <- function(pairs,kinship,type="ace")
{ #
### makes pairwise random effects design for shared and non-shared random effects
### kinship gives shared genes for each pair

uk <- sort(unique(kinship))
iuk <- fast.approx(uk,kinship)

if (type=="ace")  {
   theta.des <-  matrix(0,length(uk),8)
   i <- 0
   for (uuk in uk)  {
     i <- i+1
     theta.des[i,] <-c( rbind(c(uuk,0), c(1-uuk,0), c(1-uuk,0), c(0,1)))
   }
   random.des <- rbind(c(1,1,0,1),c(1,0,1,1))
   rvs <- rep(4,length(kinship))
   new.pairs <- cbind(pairs,1,2,iuk,4)
}

if (type=="ae") {
  i <- 0
  theta.des <-  matrix(0,length(uk),3)
  for (uuk in uk)  {
    i <- i+1
    theta.des[i,] <- rbind(c(uuk), c(1-uuk), c(1-uuk))
  }
  random.des <- rbind(c(1,1,0),c(1,0,1))
  rvs <- rep(3,length(kinship))
  new.pairs <- cbind(pairs,1,2,iuk,3)
}

if (type=="ade") {
   stop(" not specified yet \n")
   theta.des <-  matrix(0,length(uk),8)
   i <- 0
   for (uuk in uk)  {
      i <- i+1
      theta.des[i,] <- rbind(c((uuk==1)+(uuk!=1)*uuk*0.5,0), c(1-(uuk==1)+(uuk!=1)*uuk*0.5,0),
			c(1-(uuk==1)+(uuk!=1)*uuk*0.5,0),c(0,1))
   }
   random.des <- rbind(c(1,1,0,1),c(1,0,1,1))
   rvs <- rep(4,length(kinship))
   new.pairs <- cbind(pairs,1,2,iuk,4)
}

if (type=="ad") {
   stop(" not specified yet \n")
   i <- 0
   for (uuk in uk)  {
   i <- i+1
   theta.des[i,] <- rbind(c((uuk==1)+(uuk!=1)*uuk*0.5,0), c(1-(uuk==1)+(uuk!=1)*uuk*0.5,0),
			c(1-(uuk==1)+(uuk!=1)*uuk*0.5,0),c(0,1))
   random.des <- rbind(c(1,1,0),c(1,0,1))
   new.pairs <- cbind(pairs,1,2,iuk,2)
   }
}

return(list(new.pairs=new.pairs,theta.des=theta.des,random.design=random.des))
} #
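
### Minimal usage sketch (not run) with two hypothetical pairs: an MZ pair
### (kinship 1) and a DZ pair (kinship 0.5), as in a classical twin design.
if (FALSE) {
  pairs   <- rbind(c(1,2), c(3,4))
  kinship <- c(1, 0.5)
  des <- make.pairwise.design(pairs, kinship, type="ace")
  des$random.design  ## shared/non-shared genetic effects plus environment
  des$theta.des      ## one parameter-design row per distinct kinship value
}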


##' Relative risk for additive gamma model
##'
##' Computes the relative risk for additive gamma model at time 0
##'
##' @references
##'
##' Eriksson and Scheike (2015), Additive Gamma frailty models for competing risks data, Biometrics.
##'
##' @examples
##' lam0 <- c(0.5,0.3)
##' pars <- c(1,1,1,1,0,1)
##' ## genetic random effects, cause1, cause2 and overall
##' parg <- pars[c(1,3,5)]
##' ## environmental random effects, cause1, cause2 and overall
##' parc <- pars[c(2,4,6)]
##'
##' ## simulate competing risks with two causes with hazards 0.5 and 0.3
##' ## ace for each cause, and overall ace
##' out <- simCompete.twin.ace(10000,parg,parc,0,2,lam0=lam0,overall=1,all.sum=1)
##'
##' ## setting up design for running the model
##' mm <- familycluster.index(out$cluster)
##' head(mm$familypairindex,n=10)
##' pairs <- matrix(mm$familypairindex,ncol=2,byrow=TRUE)
##' tail(pairs,n=12)
##' #
##' kinship <- (out[pairs[,1],"zyg"]=="MZ")+ (out[pairs[,1],"zyg"]=="DZ")*0.5
##'
##' # dout <- make.pairwise.design.competing(pairs,kinship,
##' #          type="ace",compete=length(lam0),overall=1)
##' # head(dout$ant.rvs)
##' ## MZ
##' # dim(dout$theta.des)
##' # dout$random.design[,,1]
##' ## DZ
##' # dout$theta.des[,,nrow(pairs)]
##' # dout$random.design[,,nrow(pairs)]
##' #
##' # thetades <- dout$theta.des[,,1]
##' # x <- dout$random.design[,,1]
##' # x
##' ##EVaddGam(rep(1,6),x[1,],x[3,],thetades,matrix(1,18,6))
##'
##' # thetades <- dout$theta.des[,,nrow(out)/2]
##' # x <- dout$random.design[,,nrow(out)/2]
##' ##EVaddGam(rep(1,6),x[1,],x[4,],thetades,matrix(1,18,6))
##' @author Thomas Scheike
##' @export
##' @param theta parameter vector for the additive gamma random effects
##' @param x1 random-effects design (row of random.design) for subject 1
##' @param x2 random-effects design (row of random.design) for subject 2
##' @param thetades theta design matrix relating the parameters to the random-effect variances
##' @param ags additive.gamma.sum matrix used to compute the total variance within a cluster
EVaddGam <- function(theta,x1,x2,thetades,ags)
{ #
	pars <- thetades %*% theta
	lamtot <- ags %*% theta

	mvar <- pars/lamtot
	vvar <- pars/lamtot^2

	x1mvar <- sum(x1 * mvar)
	x2mvar <- sum(x2 * mvar)
	x1x2vvar <- sum(x1*x2*vvar)

	list(x1m=x1mvar,mx2=x2mvar,
	     dN=x1x2vvar/x2mvar)
} #


##' @export
twostage.aalen <- function(object,...) survival.twostage(object,...)

##' @export
twostage.cox.aalen <- function(object,...) survival.twostage(object,...)

##' @export
twostage.coxph <- function(object,...) survival.twostage(object,...)

##' @export
twostage.phreg <- function(object,...) survival.twostage(object,...)
