#' Estimate a mixed proportional hazard model
#'
#' @description
#' \code{mphcrm} implements estimation of a mixed proportional hazard competing risk model.
#' The proportional part of the hazard is of the form \eqn{\exp(X \beta)}, where \eqn{X} is a
#' matrix of covariates, and \eqn{\beta} is a vector of parameters to estimate.
#' In addition there is an intercept term \eqn{\mu}, i.e. the hazard is \eqn{\exp(X \beta + \mu)}.
#' Several transitions are possible, and there is a set of \eqn{X}, \eqn{\beta}, and \eqn{\mu} for
#' each possible transition.
#'
#' Each individual may have several observations, each of which ends either with a transition
#' or without one. Since the risks compete, more than one transition may be possible
#' for an observation, but at most one is taken at the end of the period.
#'
#' For each individual \eqn{i} there is a likelihood as a function of \eqn{\mu}, called \eqn{M_i(\mu)}.
#'
#' The mixture part is that the \eqn{\mu}'s are stochastic: there are probabilities \eqn{p_j}
#' and, for each such \eqn{j}, a vector of masspoints \eqn{\mu_j} (one entry for each transition).
#'
#' So the full likelihood for an individual is \eqn{L_i = \sum_j p_j M_i(\mu_j)}.
#'
#' The \code{mphcrm()} function maximizes the log likelihood \eqn{\sum_i \log L_i} over
#' \eqn{p_j}, \eqn{\mu_j}, and \eqn{\beta}.
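#'
#' As a conceptual sketch in R (not package code; \code{Mi} is a hypothetical function
#' evaluating \eqn{M_i(\mu)} for one individual), the mixture likelihood is
#' \preformatted{
#' Li <- function(p, mu, Mi) sum(p * sapply(mu, Mi))
#' }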
#'
#' In addition to the covariates specified by a formula, a variable which records the duration of each
#' observation must be specified.
#'
#' In some datasets it is known that not all risks are present at all times. For example, you cannot
#' lose your job when you do not have one. In this case the risks which are present should be specified.
#'
#' The estimation starts out with one masspoint, maximizes the likelihood, tries to add another
#' point, and continues in this fashion.
#'
#' @param formula
#' A formula specifying the covariates.
#' In a formula like \code{d ~ x1 + x2 + ID(id) + D(dur) + C(job,alpha) + S(state)},
#' the \code{d} is the transition which is taken, coded as an integer where \code{0} means
#' no transition, and otherwise \code{d} is the number of the transition which is taken.
#' \code{d} can also be a factor, in which case the level which is no transition must be named
#' \code{"0"} or \code{"none"}. If \code{d} is an integer, the levels for transitions will be named
#' \code{"t1"}, \code{"t2"}, ..., and \code{"none"}.
#'
#' The \code{x1+x2} part is like in \code{\link{lm}}, i.e. ordinary covariates or factors.
#'
#' The \code{D()} specifies the covariate which holds the duration of each observation. The
#' transition in \code{d} is assumed to be taken at the end of this period.
#'
#' The \code{ID()} part specifies the covariate which holds the individual identification.
#'
#' The \code{S()} specifies the covariate which holds an index into the \code{risksets} list.
#'
#' The contents of these three specials are wrapped in \code{I()}, so it is possible to have
#' calculations inside them.
#'
#' If the covariates differ among the transitions, one can specify covariates conditional on
#' the transition taken. If e.g. the covariates \code{alpha} and \code{x3} should only explain transition to
#' job, specify \code{C(job, alpha+x3)}. This comes in addition to the ordinary covariates.
#' The name "\code{job}" refers to a level in the factor \code{d}, the transition taken.
#' @param data
#' A data frame which contains the covariates. It must be sorted on individuals.
#' @param timing
#' character. The timing in the duration model. Can be one of
#' \itemize{
#' \item \code{"exact"}. The timing is exact, the transition occured at the end of the observation interval.
#' \item \code{"interval"}. The transition occured some time during the observation interval. This model
#' can be notoriously hard to estimate due to unfavourable numerics.
#' \item \code{"none"}. There is no timing, the transition occured, or not. A logit model is used.
#' }
#' @param risksets
#' A list of character vectors. Each vector names the transitions, i.e. which risks are present for
#' the observation. The elements of the vectors must be levels of the covariate which is the
#' left hand side of the \code{formula}.
#' If the state variable in the formula is a factor, the \code{risksets} argument should be a named list, with
#' names matching the levels of \code{state}. If all risks are present at all times, the \code{risksets}-argument
#' can be specified as \code{NULL}, or ignored.
#' @param subset
#' For specifying a subset of the dataset, similar to \code{\link{lm}}.
#' @param na.action
#' For handling of missing cases, similar to \code{\link{lm}}.
#' @param control
#' List of control parameters for the estimation. See \code{\link{mphcrm.control}}.
#' @return
#' A list with one entry for each iteration, ordered in reverse, so that the most recent
#' iteration comes first. Ordinarily you will be interested in the first entry.
#' @details
#' The estimation starts by estimating the null model, i.e. with all parameters set to 0;
#' only one intercept for each transition is estimated.
#'
#' Then it estimates the full model, still with one intercept in each transition.
#'
#' After the initial model has been estimated, it tries to add a masspoint to the mixing
#' distribution, then estimates the model with this new distribution.
#'
#' The algorithm continues to add masspoints in this way until either it can not improve the likelihood, or
#' the number of iterations specified in \code{control$iters} is reached.
#'
#' The result of every iteration is returned in a list.
#'
#' If you interrupt \code{mphcrm} it will catch the interrupt and return with the
#' estimates it has found so far. This behaviour can be switched off with \code{control$trap.interrupt=FALSE}.
#' @note
#' The algorithm is not fully deterministic. New points are searched for randomly; there
#' is no canonical order in which they can be found. It can happen that a point found
#' early makes the rest of the estimation hard, so that it terminates early. This happens
#' in particular when using interval timing. One should then
#' make a couple of runs to ensure they yield reasonably equal results.
#' @examples
#' data(durdata)
#' head(durdata)
#' risksets <- list(c('job','program'), c('job'))
#' Fit <- mphcrm(d ~ x1+x2 + C(job,alpha) + ID(id) + D(duration) + S(alpha+1), data=durdata,
#' risksets=risksets, control=mphcrm.control(threads=1,iters=2))
#' best <- Fit[[1]]
#' summary(best)
#' @seealso A description of the dataset is available in \code{\link{datagen}} and \code{\link{durdata}},
#' and in the vignette \code{vignette("whatmph")}
#' @export
mphcrm <- function(formula,data,risksets=NULL,
timing=c('exact','interval','none'),
subset, na.action, control=mphcrm.control()) {
timing <- match.arg(timing)
F <- formula
# create model frame
mf <- match.call(expand.dots=TRUE)
m <- match(c('formula','data','subset','na.action'), names(mf), 0L)
mf <- mf[c(1L, m)]
mf$drop.unused.levels <- TRUE
mf[[1L]] <- quote(model.frame)
dataset <- mymodelmatrix(F,mf,risksets,parent.frame())
dataset$timing <- timing
id <- dataset$id
if(length(unique(id)) != length(rle(as.integer(id))$values)) {
stop('dataset must be sorted on ID')
}
# zero-based index of beginning of spells. padded with one after the last observation
dataset$spellidx <- c(0L,which(diff(as.integer(id))!=0),length(id))
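# e.g. id = c(1,1,2,3,3,3) gives spellidx = c(0L,2L,3L,6L), i.e. spell i spans
# observations spellidx[i]+1 .. spellidx[i+1]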
dataset$nspells <- length(dataset$spellidx)-1L
pset <- makeparset(dataset,1)
# default to a no-op callback
if(is.null(control$callback)) control$callback <- function(...) {}
if(!is.null(control$cluster)) {prepcluster(dataset,control); on.exit(cleancluster(control$cluster))}
z <- pointiter(dataset,pset,control)
structure(z,call=match.call())
}
#' Control parameters for mphcrm
#'
#' @description
#' Modify the default estimation parameters for
#' \code{\link{mphcrm}}.
#' @param ...
#' parameters that can be adjusted. See the \code{vignette("whatmph")} for more details.
#' \itemize{
#' \item threads. integer. The number of threads to use. Defaults to \code{getOption('durmod.threads')}.
#' \item iters. integer. The maximum number of iterations to run. Defaults to 50.
#' \item ll.improve. numeric. The minimum improvement in log-likelihood over the previous iteration
#' required to continue the estimation. Defaults to 0.001.
#' \item newpoint.maxtime. numeric. For how many seconds should a global search for a new point
#' improving the likelihood be conducted before we continue with the best we have found. Defaults to
#' 120.
#' \item callback. A
#' user-specified \code{function(fromwhere, opt, dataset, control,
#' ...)} which is called after each optimization step. It can be
#' used to report what is happening, check whatever it wants, and
#' optionally stop the estimation by calling stop(). In this case,
#' \code{mphcrm()} will return with the most recently estimated set
#' of parameters. See the help on \code{\link{mphcrm.callback}} for
#' information on the argument.
#' \item trap.interrupt. logical. Should interrupts be trapped so that \code{mphcrm} returns gracefully
#' with the estimates obtained so far? Defaults to \code{interactive()}.
#' \item cluster. Cluster specification from package \pkg{parallel} or \pkg{snow}.
#' }
#' @note
#' There are more parameters documented in the \code{vignette("whatmph")}. Some of them
#' can be useful. Instead of cluttering
#' the source code with constants and values required by various optimization routines, they
#' have been put in this control list.
#' @return
#' List of control parameters suitable for the \code{control}
#' argument of \code{\link{mphcrm}}.
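#' @examples
#' # a small sketch: raise the iteration cap and use a single thread
#' ctrl <- mphcrm.control(iters=100, threads=1)
#' ctrl$iters
#' # parameter names may be partially matched when unambiguous
#' mphcrm.control(iter=25)$iters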
#' @export
mphcrm.control <- function(...) {
ctrl <- list(iters=50,threads=getOption('durmod.threads'),gradient=TRUE, fisher=TRUE,
method='BFGS', gdiff=TRUE, minprob=1e-20, eqtol=1e-3, newprob=1e-4, jobname='mphcrm',
overshoot=0.001,
startprob=1e-4,
ll.improve=1e-3, e.improve=1e-3,
trap.interrupt=interactive(),
tspec='%T', newpoint.maxtime=120,
lowint=4,highint=2,
tol=1e-4,
itfac=20L,
fishblock=128L,
addmultiple=Inf,
callback=mphcrm.callback,
numgrad=FALSE,
hessian=FALSE,
cluster=NULL,
nodeshares=NULL)
args <- list(...)
fullargs <- names(ctrl)[pmatch(names(args), names(ctrl))]
bad <- names(args)[is.na(fullargs)]
fullargs[is.na(fullargs)] <- bad
if(length(bad)) message("unknown control parameter(s): ", paste(bad, collapse=', '))
ctrl[fullargs] <- args
ctrl
}
#' Default callback function for mphcrm
#'
#' The default callback function prints a line whenever estimation with a masspoint is
#' completed.
#' @param fromwhere
#' a string which identifies which step in the algorithm it is called from. \code{fromwhere=='full'} means
#' that it is a full estimation of all the parameters. There are also other codes, e.g. when adding a point, or
#' when removing duplicate points. When some optimization is completed, it is called with the
#' return status from \code{\link{optim}} (and on some occasions from \code{\link[nloptr]{nloptr}}).
#'
#' @param opt
#' Typically the result of a call to \code{\link{optim}}.
#' @param dataset
#' The dataset in a structured form.
#' @param control
#' The \code{control} argument given to \code{\link{mphcrm}}
#' @param ...
#' other arguments
#' @details
#' If you write your own callback function it will replace the default function, but you can
#' of course call the default callback from your own callback function, and in addition print your
#' own diagnostics, or save the intermediate \code{opt} in a file, or whatever. You can even
#' stop the estimation by doing a \code{stop('<some message>')}, and \code{mphcrm} will return
#' with the estimates done so far, provided the control parameter \code{trap.interrupt=TRUE}.
#'
#' @note
#' Beware that
#' \code{control} contains a reference to the callback function, which may contain a reference
#' to the top-level environment, which may contain the full dataset. So if you save \code{control}
#' to file, you may end up saving the entire dataset.
#' @examples
#' callback <- function(fromwhere, opt, dataset, control, ...) {
#' # call the standard callback to print a diagnostic line
#' mphcrm.callback(fromwhere, opt, dataset, control, ...)
#' # print the distribution and two coefficients
#' if(fromwhere == 'full') {
#' print(round(mphdist(opt),6))
#' print(summary(opt)$coefs[c('job.alpha','job.x1'),])
#' }
#' }
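#' # use it by passing it in the control argument, e.g.
#' # ctrl <- mphcrm.control(callback=callback)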
#' @export
mphcrm.callback <- local({
lastfull <- Sys.time()
function(fromwhere, opt, dataset, control, ...) {
now <- Sys.time()
jobname <- control$jobname
if(fromwhere == 'removepoints') {
p <- a2p(opt$pargs)
bad <- list(...)[['remove']]
cat(jobname, format(Sys.time(),control$tspec), 'remove probs', sprintf(' %.2e',p[bad]),'\n')
} else if(fromwhere=='equalpoints') {
cat(jobname, format(Sys.time(), control$tspec), 'equal points combined:\n')
print(list(...)[['eqpoints']])
} else {
if(is.numeric(opt$convergence) && opt$convergence != 0) {
if(fromwhere != 'newpoint' || !(opt$convergence %in% c(2,5))) {
mess <- if(is.null(opt$message)) '' else opt$message
cat(sprintf('%s %s %s: convergence failure %.4f, %d %s\n',
jobname, format(now,control$tspec), fromwhere,
opt$value, opt$convergence, mess))
}
}
}
if(fromwhere != 'full') return()
tdiff <- timestr(difftime(now,lastfull,units='s'))
assign('lastfull',now,environment(sys.function()))
rc <- if(!is.null(opt$fisher)) rcond(opt$fisher) else NA
grd <- if(!is.null(opt$gradient)) sqrt(mean(opt$gradient^2)) else NA
p <- a2p(opt$par$pargs)
cat(jobname,format(now,control$tspec),
sprintf('i:%d p:%d L:%.4f g:%.3g mp:%.5g rc:%.2g e:%.4f t:%s\n',
control$mainiter, length(opt$par$pargs)+1, opt$value, grd, min(p), rc, -sum(p*log(p)),
tdiff))
# if(rc < sqrt(.Machine$double.eps)) {message(sprintf('%s ***bad condition: %.2e',jobname,rc)); return(FALSE)}
}
})
pointiter <- function(dataset,pset,control) {
# optimize null model first
assign('lastfull',Sys.time(),envir=environment(mphcrm.callback))
arg0 <- flatten(pset)
arg0[] <- 0
pset <- unflatten(arg0)
LL0 <- function(arg,dataset,pset) {
for(i in seq_along(pset$parset)) {
pset$parset[[i]]$mu[] <- arg[i]
}
-mphloglik(dataset,pset,control=control)
}
opt0 <- optim(runif(length(pset$parset),-10,0), LL0,method='BFGS',dataset=dataset,pset=pset)
# message('zero model: '); print(opt0$par)
for(i in seq_along(pset$parset)) {
pset$parset[[i]]$mu[] <- opt0$par[i]
}
class(pset) <- 'mphcrm.pset'
opt0$value <- -opt0$value
opt0$par <- pset
opt0$mainiter <- 0
opt0$nobs <- dataset$nobs
opt0$nspells <- dataset$nspells
opt0$entropy <- 0
class(opt0) <- 'mphcrm.opt'
intr <- FALSE
prevopt <- opt0
iopt <- opt <- list(nullmodel=rescale(dataset,opt0))
control$callback('nullmodel',opt[['nullmodel']],dataset,control)
tryCatch(
{
i <- 0; improve <- TRUE; redo <- FALSE
while(improve && i < control$iters || redo) {
i <- i+1
control$mainiter <- i
newopt <- ml(dataset,pset,control)
newopt$mainiter <- i
unscaleopt <- rescale(dataset,newopt)
# insert into list
opt <- c(list(unscaleopt), opt)
names(opt)[1L] <- sprintf('iter%d',i)
control$callback('full',unscaleopt, dataset,control)
# check termination
redo <- attr(newopt$par,'badremoved') && !redo # redo once if bad point
ll.improve <- newopt$value - prevopt$value
e.improve <- abs(newopt$entropy - prevopt$entropy)
improve <- (ll.improve > control$ll.improve || e.improve > control$e.improve)
prevopt <- newopt
if(improve && i < control$iters || redo) pset <- addpoint(dataset,newopt$par,newopt$value,control)
}
if(!improve) opt <- opt[-1]
badset <- badpoints(opt[[1]]$par,control)
if(isTRUE(attr(badset, 'badremoved'))) {
opt[[1]]$par <- optfull(dataset,badset,control)
}
},
error=function(e) {
assign('intr',TRUE,environment(sys.function()))
assign('iopt',structure(opt,status=conditionMessage(e)),environment(sys.function()))
if(!control$trap.interrupt) stop(e)
warning(e, 'Error occurred, returning most recent estimate')
},
interrupt=function(e) {
assign('intr',TRUE,environment(sys.function()))
assign('iopt',structure(opt,status='interrupted'),environment(sys.function()))
if(!control$trap.interrupt) stop('interrupt')
# try(control$callback('interrupt', iopt, dataset, control))
warning(e, "returning most recent estimate ")
message('... interrupt cleanup ...')
},
finally=if(intr) opt <- iopt)
structure(opt,class='mphcrm.list')
}
# Rescale parameters
#
#
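# The covariates were centred and scaled in mymodelmatrix(): z = (x - offset)*scale.
# Since z'b = x'(scale*b) - sum(offset*scale*b), the coefficients on the original scale
# are beta = scale*estbeta, and the intercepts are mu = estmu - sum(offset*beta).
# The gradient, Fisher matrix, and hessian are divided by the scale factors accordingly.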
rescale <- function(dataset, opt) {
for(tr in names(dataset$data)) {
offset <- attr(dataset$data[[tr]],'recode')$offset
scale <- attr(dataset$data[[tr]],'recode')$scale
opt$par$parset[[tr]]$pars[] <- opt$par$parset[[tr]]$pars*scale
muadj <- sum(opt$par$parset[[tr]]$pars*offset)
opt$par$parset[[tr]]$mu <- opt$par$parset[[tr]]$mu - muadj
}
if(is.null(opt$gradient)) return(opt)
scales <- unlist(lapply(dataset$data, function(dd) attr(dd,'recode')[[2]]))
nm <- names(opt$gradient)
scalevec <- rep(1,length(nm))
names(scalevec) <- nm
scalevec[names(scales)] <- scales
scalemat <- scalevec * rep(scalevec,each=length(scalevec))
opt$gradient <- opt$gradient / scalevec
if(!is.null(opt$numgrad)) opt$numgrad <- opt$numgrad / scalevec
if(!is.null(opt$fisher)) opt$fisher <- opt$fisher / scalemat
if(!is.null(opt$hessian)) opt$hessian <- opt$hessian / scalemat
opt
}
optprobs <- function(dataset,pset,control) {
val <- flatten(pset)
probpos <- grep('^pargs[0-9]*$',names(val))
pfun <- function(a) {
pset$pargs[] <- a
-mphloglik(dataset,pset,control=control)
}
gpfun <- function(a) {
val[probpos] <- a
-attr(mphloglik(dataset,unflatten(val),dogradient=TRUE, onlyprobs=TRUE, control=control),'gradient')[probpos]
}
# go all the way with the relative tolerance, we need to get out of bad places.
aopt <- optim(pset$pargs,pfun,gpfun,method='BFGS',
control=list(trace=0,REPORT=1,
maxit=max(200,control$itfac*length(pset$pargs)),
reltol=1e-14))
# message('probs:',sprintf(' %.7f',a2p(aopt$par)), ' value: ',aopt$value)
pset$pargs[] <- aopt$par
control$callback('prob',aopt,dataset,control)
pset
}
optdist <- function(dataset,pset,control) {
# find position of the distribution
val <- flatten(pset)
distpos <- grep('(\\.mu[0-9]+|pargs[0-9]*)$',names(val))
dfun <- function(a) {
val[distpos] <- a
-mphloglik(dataset,unflatten(val),control=control)
}
gdfun <- function(a) {
val[distpos] <- a
-attr(mphloglik(dataset,unflatten(val),dogradient=TRUE, onlydist=TRUE, control=control),'gradient')[distpos]
}
dopt <- optim(val[distpos],dfun,gdfun,method='BFGS',
control=list(trace=0,REPORT=1,maxit=max(200,
control$itfac*length(distpos)),
reltol=1e-14))
val[distpos] <- dopt$par
dopt$par <- unflatten(val)
control$callback('dist',dopt,dataset,control)
structure(dopt$par,value=-dopt$value)
}
newpoint <- function(dataset,pset,value,control) {
gdiff <- control$gdiff
pr <- a2p(pset$pargs)
newprob <- if(gdiff) 0 else control$newprob
newpr <- c( (1 - newprob)*pr, newprob)
newset <- makeparset(dataset,length(pr)+1,pset)
newset$pargs[] <- p2a(newpr)
np <- length(newpr)
ntrans <- length(pset$parset)
fun <- function(mu,gdiff) {
for(i in seq_along(mu)) {
newset$parset[[i]]$mu[np] <- mu[i]
}
-mphloglik(dataset,newset,gdiff=gdiff,control=control)
}
# find ranges for the mus.
# stick to the median +/- something in each dimension
med <- mphmedian(pset)
low <- log(med) - control$lowint
high <- log(med) + control$highint
args <- runif(length(low),0,1)*(high-low) + low
muopt <- nloptr::nloptr(args, fun,lb=low, ub=high, gdiff=gdiff,
opts=list(algorithm='NLOPT_GN_ISRES',
stopval=if(gdiff) -control$overshoot else -value-control$ll.improve,
xtol_rel=0, xtol_abs=0,
maxtime=control$newpoint.maxtime,
ranseed=sample(.Machine$integer.max,1),
maxeval=10000*length(args),population=20*length(args)))
if(!(muopt$status %in% c(0,2))) {
muopt$convergence <- muopt$status
muopt$value <- -muopt$objective
control$callback('newpoint',muopt,dataset,control)
# that one failed, try broader interval
newset$pargs[] <- p2a(c(pr,0))
gdiff <- TRUE
muopt <- nloptr::nloptr(muopt$solution, fun, lb=low-control$lowint, ub=high+control$highint,gdiff=TRUE,
opts=list(algorithm='NLOPT_GN_ISRES',stopval=-control$overshoot,
xtol_rel=0, xtol_abs=0,
maxtime=control$newpoint.maxtime,
ranseed=sample(.Machine$integer.max,1),
maxeval=10000*length(args),population=20*length(args)))
}
muopt$eval_f <- NULL # remove, it may contain a big environment, so unsuitable to save
muopt$value <- -muopt$objective
muopt$convergence <- muopt$status
control$callback('newpoint',muopt,dataset,control)
for(i in seq_along(newset$parset)) {
newset$parset[[i]]$mu[np] <- muopt$solution[i]
}
if(gdiff) {
# start with a tiny probability; 0 can't be used, since the parameter would be -Inf
newset$pargs[] <- p2a(c((1-control$startprob)*pr,control$startprob))
}
optprobs(dataset,newset,control)
}
badpoints <- function(pset,control) {
# are any masspoints equal?
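# two points i,j count as equal when the relative difference
# |exp(mu_i) - exp(mu_j)| / (exp(mu_i) + exp(mu_j)) is below eqtol in every transition;
# the duplicate is dropped and its probability added to the surviving point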
np <- length(pset$pargs)+1
p <- a2p(pset$pargs)
okpt <- p > control$minprob
jobname <- control$jobname
for(i in seq_len(np)) {
mui <- sapply(pset$parset, function(pp) pp$mu[i])
for(j in seq_len(i-1)) {
if(!okpt[j]) next
muj <- sapply(pset$parset, function(pp) pp$mu[j])
if(max(abs(exp(muj) - exp(mui))/(exp(muj)+exp(mui))) < control$eqtol) {
mumat <- rbind(muj,mui)
rownames(mumat) <- c(j,i)
colnames(mumat) <- names(pset$parset)
control$callback('equalpoints',pset,NULL,control,eqpoints=cbind(prob=c(p[j],p[i]),exp(mumat)))
# cat(sprintf('%s %s points %d and %d are equal\n',jobname, format(Sys.time(),control$tspec),j,i),'\n')
# print()
okpt[i] <- FALSE
p[j] <- p[j]+p[i]
}
}
}
if(!all(okpt)) {
control$callback('removepoints',pset,NULL,control,remove=!okpt)
} else {
return(structure(pset,badremoved=FALSE))
}
p <- p[okpt]
p <- p/sum(p)
pset$pargs <- p2a(p)
for(i in seq_along(pset$parset)) {
pset$parset[[i]]$mu <- pset$parset[[i]]$mu[okpt]
}
structure(pset,badremoved=!all(okpt))
}
# some functions for use in optim
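# they share the signature f(args, skel, dataset, ctrl) expected by our optim() calls, e.g.
#   args <- flatten(pset)
#   opt <- optim(args, LL, gLL, method='BFGS', skel=attr(args,'skeleton'), dataset=dataset, ctrl=control)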
#negative log likelihood
LL <- function(args,skel, dataset,ctrl) {
pset <- unflatten(args,skel)
-mphloglik(dataset,pset,control=ctrl)
}
# gradient of negative log likelihood
gLL <- function(args,skel, dataset,ctrl) {
pset <- unflatten(args,skel)
-attr(mphloglik(dataset,pset,dogradient=TRUE,control=ctrl),'gradient')
}
# fisher matrix
fLL <- function(args, skel, dataset, ctrl) {
pset <- unflatten(args,skel)
-attr(mphloglik(dataset,pset,dofisher=TRUE,control=ctrl),'fisher')
}
# numerical gradient, slow
dLL <- function(args, skel, dataset, ctrl) {
numDeriv::grad(LL, args, dataset=dataset, skel=skel, ctrl=ctrl)
}
# hessian, numerically from gradient, slow
hLL <- function(args, skel, dataset, ctrl) {
numDeriv::jacobian(gLL,args,dataset=dataset,skel=skel,ctrl=ctrl)
}
optfull <- function(dataset, pset, control) {
if(!identical(control$method,'BFGS')) {
method <- control$method
if(!length(grep('^NLOPT_',method))) method <- paste('NLOPT_LD_',method,sep='')
args <- flatten(pset)
lb <- rep(-Inf,length(args))
ub <- rep(Inf,length(args))
fun <- function(args,skel,dataset,ctrl) {
pset <- unflatten(args,skel)
val <- mphloglik(dataset,pset,dogradient=TRUE,control=ctrl)
list(objective=-val, gradient=-attr(val,'gradient'))
}
nlopt <- nloptr::nloptr(args, fun, lb=lb, ub=ub,
skel=attr(args,'skeleton'), ctrl=control, dataset=dataset,
opts=list(algorithm=method,
maxeval=control$itfac*length(args),
ftol_abs=control$tol, xtol_rel=0))
nlopt$eval_f <- NULL
nlopt$par <- unflatten(nlopt$solution,attr(args,'skeleton'))
nlopt$value <- nlopt$objective
nlopt$convergence <- if(nlopt$status %in% c(1,3)) 0 else nlopt$status
return(nlopt)
}
args <- flatten(pset)
val <- mphloglik(dataset,pset,dogradient=TRUE,control=control)
opt <- optim(args,LL,gLL,method=control$method,
control=list(trace=0,REPORT=10,maxit=control$itfac*length(args),lmm=60,
reltol=1e-14),
skel=attr(args,'skeleton'), ctrl=control,dataset=dataset)
opt$par <- unflatten(opt$par)
opt
}
addpoint <- function(dataset,pset,value,control) {
# find a new point
# optimize probabilities
newset <- pset
control$minprob <- 0
pval <- value
repeat {
newset <- newpoint(dataset,newset,pval,control)
newset <- optdist(dataset,newset,control)
val <- attr(newset,'value')
if(val <= pval+abs(control$addmultiple)) break
pval <- val
}
newset
}
ml <- function(dataset,pset,control) {
opt <- optfull(dataset,pset,control)
# control$minprob <- 0
opt$par <- badpoints(opt$par, control)
sol <- opt$par
# reorder the masspoints, highest probability first
probs <- a2p(sol$pargs)
oo <- order(probs,decreasing=TRUE)
sol$pargs <- p2a(probs[oo])
sol$parset <- lapply(sol$parset, function(pp) {
nm <- names(pp$mu)
pp$mu <- pp$mu[oo]
names(pp$mu) <- nm
pp
})
skel <- attr(flatten(sol),'skeleton')
opt$value <- -opt$value
opt$par <- sol
p <- a2p(sol$pargs)
opt$entropy <- -sum(p*log(p))
# compute the (negative) Fisher matrix; its inverse serves as a covariance matrix, and it is fast to compute
nm <- names(flatten(opt$par))
if(isTRUE(control$gradient)) {
opt$gradient <- gLL(flatten(opt$par),skel,dataset,control)
names(opt$gradient) <- nm
}
if(isTRUE(control$fisher)) {
opt$fisher <- fLL(flatten(opt$par),skel,dataset,control)
dimnames(opt$fisher) <- list(row=nm,col=nm)
}
if(isTRUE(control$numgrad)) {
opt$numgrad <- dLL(flatten(opt$par), skel, dataset, control)
names(opt$numgrad) <- nm
}
if(isTRUE(control$hessian)) {
opt$hessian <- hLL(flatten(opt$par),skel,dataset,control)
dimnames(opt$hessian) <- list(nm,nm)
}
opt$nobs <- dataset$nobs
opt$nspells <- dataset$nspells
structure(opt,class='mphcrm.opt')
}
#' Convert probability parameters to probabilities
#'
#' @description
#' \code{\link{mphcrm}} parametrizes the probabilities that it optimizes.
#' For \eqn{n+1} probabilities there are \eqn{n} parameters \eqn{a_j}, such that
#' probability \eqn{P_i = \frac{\exp(a_i)}{\sum_j \exp(a_j)}}, where we assume that \eqn{a_0 = 0}.
#'
#' @param a
#' a vector of parameters
#'
#' @examples
#' # Draw 5 parameters
#' a <- rnorm(5)
#' a
#' # make 6 probabilities
#' p <- a2p(a)
#' p
#' # convert back
#' p2a(p)
#' @return \code{a2p} returns a vector probabilities with sum 1.
#' @export
a2p <- function(a) {b <- c(0,a); p <- exp(b)/sum(exp(b)); ifelse(is.na(p),1,p)} # softmax with a_0=0; ifelse() guards against NaN if exp() overflows
#' @rdname a2p
#' @param p
#' a vector of probabilities with sum(p) = 1
#' @return
#' \code{p2a} returns a vector of parameters.
#' @export
p2a <- function(p) log(p/p[1])[-1]
makeparset <- function(dataset,npoints,oldset) {
parset <- lapply(dataset$data, function(tt) {
list(pars=structure(rep(0,nrow(tt$mat)),names=rownames(tt$mat)),
facs=lapply(tt$faclist, function(ff) structure(rep(0,nlevels(ff)), names=levels(ff))),
mu=structure(rep(0,npoints), names=paste('mu',1:npoints,sep=''))
)
})
newset <- as.relistable(list(parset=parset, pargs=rnorm(npoints-1)))
if(!missing(oldset)) {
oldset <- flatten(oldset)
newset <- flatten(newset)
common <- intersect(names(oldset),names(newset))
newset[common] <- oldset[common]
newset <- unflatten(newset)
}
class(newset) <- c('mphcrm.pset',class(newset))
newset
}
mymodelmatrix <- function(formula,mf,risksets,frame) {
#### Handle the specials ####
mt <- terms(formula, specials=c('ID','D', 'S', 'C'))
spec <- attr(mt,'specials')
# replace with I() in formula
F <- formula
splist <- lapply(unlist(spec), function(sp) attr(mt,'variables')[sp+1L])
remove <- function(f, r) update(f, as.formula(substitute(. ~ . - R, list(R=r[[1]]))))
pureF <- Reduce(remove, splist, F)
Slist <- splist[names(splist) %in% c('ID','D','S')]
Ilist <- lapply(Slist, function(ss) substitute(I(R)(), list(R=ss[[1]][[2]])))
addI <- function(f,r) update(f, as.formula(substitute(. ~ . + I(R), list(R=r[[1]][[2]]))))
IF <- Reduce(addI,Slist,pureF)
# then the C-parts, it needs special treatment
Clist <- splist[!(names(splist) %in% c('ID','D','S'))]
names(Clist) <- sapply(Clist, function(cc) eval(cc[[1L]], list(C=function(a,b) as.character(substitute(a)))))
cvars <- lapply(Clist, function(cc) eval(cc[[1]], list(C=function(a,b) substitute(b))))
addc <- function(f,r) update(f, as.formula(substitute(. ~ . + R, list(R=r))))
IF <- Reduce(addc,cvars,IF)
# Now, IF is a suitable formula with all specials removed.
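# e.g. d ~ x1 + ID(id) + D(dur) + S(s) + C(job,alpha) yields
# pureF: d ~ x1, and IF: d ~ x1 + I(id) + I(dur) + I(s) + alpha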
mf[[2L]] <- IF
mf <- eval(mf,frame)
# Pick up the special covariates ID, D, and S
id <- as.integer(mf[[as.character(Ilist$ID)]])
if(length(Ilist$D))
duration <- as.numeric(mf[[as.character(Ilist$D)]])
else
duration <- rep(1,nrow(mf))
# and the response, convert to factor with appropriate levels
orig.d <- model.response(mf)
df <- as.factor(orig.d)
# put the zero/none/null/0 level first for conversion to integer
# i.e. ensure that the no-transition is zero
L <- levels(df)
nlpos <- L %in% c('0','none')
ztrans <- sum(nlpos) > 0
if(sum(nlpos) > 1) stop("There can't be both a 0 and \"none\" in the outcome")
newlev <- if(ztrans) c(L[nlpos],L[!nlpos]) else L
df <- factor(df,levels=newlev)
if(!is.factor(orig.d)) levels(df) <- paste('t',levels(df),sep='')
# make a zero-based d for use in C
d <- as.integer(df)-ztrans
# analyze the formula to figure out which
# covariates explain which transitions
# split off factors
# first check that the conditional covariates specifies existing transitions.
tlevels <- levels(df)
if(ztrans) tlevels <- tlevels[-1]
if(length(Clist)) {
trnames <- names(Clist)
enm <- match(trnames,tlevels)
if(anyNA(enm)) stop(sprintf('transition to %s specified in conditional covariate, but no such transition exists.\n',
trnames[is.na(enm)]))
}
state <- NULL
if(length(Ilist$S)) {
state <- mf[[as.character(Ilist$S)]]
if(!is.factor(state)) state <- as.integer(state)
}
hasriskset <- !is.null(risksets)
if(hasriskset && is.null(state))
warning("Riskset is specified, but no state S(). All risks are assumed to be present.")
if(is.null(state)) hasriskset <- FALSE
if(hasriskset) {
if(is.factor(state)) {
m <- match(levels(state), names(risksets))
if(anyNA(m)) stop(sprintf('level %s of state is not in the riskset names\n',levels(state)[is.na(m)]))
# recode state to integer
state <- m[state]
} else {
srange <- range(state)
if(srange[1] != 1) stop('smallest state must be 1 (index into riskset)')
if(srange[2] > length(risksets)) {
stop(sprintf('max state is %d, but there are only %d risksets\n',srange[2],length(risksets)))
}
}
# recode risksets from transition level names to integers
risksets <- lapply(risksets, function(set) {
ind <- match(set,tlevels)
if(anyNA(ind)) stop(sprintf('Non-existent transition %s in risk set\n',set[is.na(ind)]))
ind
})
# check if transitions are taken which are not in the riskset
badtrans <- mapply(function(dd,r) dd != 0 && !(dd %in% r), d, risksets[state])
if(any(badtrans)) {
n <- which(badtrans)[1]
stop(sprintf("In observation %d(id=%d), a transition to %s is taken, but the riskset of the state(%d) does not allow it",
n, id[n], tlevels[d[n]], state[n]))
}
} else {
state <- 0L
risksets <- list()
}
transitions <- nlevels(df)-ztrans
cls <- attr(terms(mf),'dataClasses')
# for each transition, make a model matrix for the numeric covariates,
# and a list of factors
data <- lapply(seq_len(transitions), function(t) {
# name of this transition
thistr <- levels(df)[t+ztrans]
# find the formula for
# this transition. Add in all the conditional covariates for this transition
ff <- pureF
if(thistr %in% names(cvars)) {
# awkward, cvars may have duplicate names if C(foo, a+b) + C(foo,d+c)
for(i in (which(names(cvars) %in% thistr)))
ff <- update(ff, as.formula(bquote(. ~ . + .(cvars[[i]]))))
}
mt <- terms(ff,keep.order=TRUE)
fact <- attr(mt,'factors')
# now, keep for the model matrix only those terms which contain no factors;
# terms with factors (or interactions with factors) are handled separately below
keep <- colSums(fact) > 0
fact <- fact[,keep,drop=FALSE]
attr(mt,'term.labels') <- attr(mt,'term.labels')[keep]
keep <- unlist(lapply(colnames(fact), function(lab) {
contains <- rownames(fact)[fact[,lab] > 0]
if(!any(cls[contains]=='factor')) return(lab)
return(NULL)
}))
attr(mt,'factors') <- fact[,keep,drop=FALSE]
attr(mt,'intercept') <- 0
# create model matrix from the constructed terms
mat <- model.matrix(mt,mf)
# which observations are under risk for this transition?
if(length(risksets)) {
riskobs <- sapply(risksets[state], function(r) t %in% r)
} else riskobs <- seq_along(df)
# remove constant covariates
var0 <- apply(mat[riskobs,,drop=FALSE],2,var) == 0
if(any(var0)) {
message(sprintf('*** Covariate %s is constant for transition %s, removing\n',
colnames(mat)[var0], thistr))
mat <- mat[,!var0,drop=FALSE]
}
mat <- t(mat)
# then the factor related stuff
faclist <- lapply(colnames(fact), function(term) {
codes <- fact[,term]
contains <- rownames(fact)[codes > 0]
# are there any factors in this term?
isfac <- cls[contains] == 'factor'
if(!any(isfac)) return(NULL)
# interact all the factors in the term
# but some are with contrasts, some are not. The code is 2 if no contrast, 1 otherwise
# make a list of the factors
# Here's the interaction with a covariate.
x <- NULL
if(any(!isfac))
x <- as.matrix(Reduce('*',eval(as.call(lapply(c('list',contains[!isfac]),as.name)),
mf,environment(formula))))
if(is.null(x)) x <- numeric(0)
codes <- codes[contains[isfac]]
flist <- eval(as.call(lapply(c('list',contains[isfac]),as.name)), mf,environment(formula))
# remove a reference level if contrasts
fl <- mapply(function(f,useall) {
if(useall) return(f)
# reference is the first level that is observed
factor(f,exclude=levels(factor(f[riskobs]))[1])
}, flist, codes==2, SIMPLIFY=FALSE)
# interact them
iaf <- Reduce(`:`, fl)
# we then remove interaction levels which are never observed/constant in a state
# which can make this transition
f <- factor(iaf[riskobs])
xx <- if(length(x)) x[riskobs] else 1
flev <- levels(f)
if(length(flev) == 0) {
message(sprintf("*** Removing %s from transition %s, no variation\n",term,thistr))
return(NULL) #no more levels, discard entire factor
}
val <- numeric(length(f))
excl <- flev[sapply(flev, function(ll) {
idx <- which(f==ll)
if(!length(idx)) return(TRUE)
val[idx] <- if(length(xx)>1) xx[idx] else 1
var(val) == 0
})]
if(length(excl))
message(sprintf("*** Removing level %s from %s in transition %s, no variation\n",excl,term,thistr))
iaf <- factor(iaf,levels=flev,exclude=excl)
# iaf <- factor(iaf,levels=levels(factor(iaf[riskobs])))
structure(iaf,x=x)
})
names(faclist) <- colnames(fact)
faclist <- Filter(Negate(is.null), faclist)
# centre and scale the covariates:
# subtract the mean, then divide by the largest absolute deviation from the mean,
# so each covariate lies in [-1,1] (e.g. x = c(1,3,5) becomes c(-1,0,1))
# the estimated betas must be scaled back: beta <- estbeta*scale
# the gradient and Fisher matrix must also be adjusted
# the intercept estimates, the mus, must be adjusted:
# mu <- estmu - sum(offset*beta)
# all this happens in the function rescale()
offset <- apply(mat,1,mean)
# scale <- 1/apply(mat,1,sd)
scale <- 1/apply(mat,1,function(x) max(abs(x-mean(x))))
scalemat <- (mat-offset)*scale
structure(list(mat=scalemat,faclist=faclist), recode=list(offset=offset,scale=scale))
})
names(data) <- tlevels
dataset <- list(data=data, d=d, nobs=nrow(mf), tlevels=tlevels,duration=duration,id=id,state=state,
risksets=risksets)
dataset
}