phmm: Proportional Hazards Model with Mixed Effects
In phmm: Proportional Hazards Mixed-Effects Model

Description Usage Arguments Details Value References See Also Examples

Fits a proportional hazards regression model incorporating random effects. The function implements an EM algorithm using Markov Chain Monte Carlo (MCMC) at the E-step as described in Vaida and Xu (2000).

phmm(
  formula,
  data,
  subset,
  na.action = na.fail,
  Sigma = "identity",
  varcov = "diagonal",
  NINIT = 10,
  VARSTART = 1,
  MAXSTEP = 100,
  CONVERG = 90,
  Gbs = 100,
  Gbsvar = 1000,
  verbose = FALSE,
  maxtime = 120,
  random
)

`formula`	model formula for the fixed and random components of the model (as in `glmer`). An intercept is implicitly included in the model by estimation of the error distribution. As a consequence `-1` in the model formula does not have any effect. The left-hand side of the `formula` must be a `Surv` object.
`data`	optional data frame in which to interpret the variables occuring in the formulas.
`subset`	subset of the observations to be used in the fit.
`na.action`	function to be used to handle any `NA`s in the data. The user is discouraged to change a default value `na.fail`.
`Sigma`	initial covariance matrix for the random effects. Defaults to "identity".
`varcov`	constraint on `Sigma`. Currently only `"diagonal"` is supported.
`NINIT`	number of starting values supplied to Adaptive Rejection Metropolis Sampling (ARMS) algorithm.
`VARSTART`	starting value of the variances of the random effects.
`MAXSTEP`	number of EM iterations.
`CONVERG`	iteration after which Gibbs sampling size changes from Gbs to Gbsvar.
`Gbs`	initial Gibbs sampling size (until CONVERG iterations).
`Gbsvar`	Gibbs sampling size after CONVERG iterations.
`verbose`	Set to `TRUE` to print EM steps.
`maxtime`	maximum time in seconds, before aborting EM iterations. Defaults to 120 seconds.
`random`	The argument `random` is no longer used. Random components are are expressed in `formula`.

The proportional hazards model with mixed effects is equipped to handle clustered survival data. The model generalizes the usual frailty model by allowing log-linearl multivariate random effects. The software can only handle random effects from a multivariate normal distribution. Maximum likelihood estimates of the regression parameters and variance components is gotten by EM algorithm, with Markov chain Monte Carlo (MCMC) used in the E-step.

Care must be taken to ensure the MCMC-EM algorithm has converged, as the algorithm stops after MAXSTEP iterations. No convergence criteria is implemented. It is advised to plot the estimates at each iteration using the plot method. For more on MCMC-EM convergence see Booth and Hobart (1999).

The function produces an object of class "phmm" consisting of:

Gilks WR and Wild P. (1992) Adaptive rejection sampling for Gibbs sampling. Applied Statistics 41, pp 337-348.

Donohue, MC, Overholser, R, Xu, R, and Vaida, F (January 01, 2011). Conditional Akaike information under generalized linear and proportional hazards mixed models. Biometrika, 98, 3, 685-700.

Vaida F and Xu R. 2000. "Proportional hazards model with random effects", Statistics in Medicine, 19:3309-3324.

Gamst A, Donohue M, and Xu R (2009). Asymptotic properties and empirical evaluation of the NPMLE in the proportional hazards mixed-effects model. Statistica Sinica, 19, 997.

Xu R, Gamst A, Donohue M, Vaida F, and Harrington DP. 2006. Using Profile Likelihood for Semiparametric Model Selection with Application to Proportional Hazards Mixed Models. Harvard University Biostatistics Working Paper Series, Working Paper 43.

Booth JG and Hobert JP. Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society, Series B 1999; 61:265-285.

survfit, Surv.

n <- 50      # total sample size
nclust <- 5  # number of clusters
clusters <- rep(1:nclust,each=n/nclust)
beta0 <- c(1,2)
set.seed(13)
#generate phmm data set
Z <- cbind(Z1=sample(0:1,n,replace=TRUE),
           Z2=sample(0:1,n,replace=TRUE),
           Z3=sample(0:1,n,replace=TRUE))
b <- cbind(rep(rnorm(nclust),each=n/nclust),rep(rnorm(nclust),each=n/nclust))
Wb <- matrix(0,n,2)
for( j in 1:2) Wb[,j] <- Z[,j]*b[,j]
Wb <- apply(Wb,1,sum)
T <- -log(runif(n,0,1))*exp(-Z[,c('Z1','Z2')]%*%beta0-Wb)
C <- runif(n,0,1)
time <- ifelse(T<C,T,C)
event <- ifelse(T<=C,1,0)
mean(event)
phmmd <- data.frame(Z)
phmmd$cluster <- clusters
phmmd$time <- time
phmmd$event <- event

fit.phmm <- phmm(Surv(time, event) ~ Z1 + Z2 + (-1 + Z1 + Z2 | cluster),
   phmmd, Gbs = 100, Gbsvar = 1000, VARSTART = 1,
   NINIT = 10, MAXSTEP = 100, CONVERG=90)
summary(fit.phmm)
plot(fit.phmm)