epifit: Model fitting function for epifit package
In epifit: Flexible Modelling Functions for Epidemiological Data Analysis

Description Usage Arguments Details Value References See Also Examples

This function maximizes an arbitrary likelihood including generalized linear models and Cox partial likelihood.

epifit(modelexpr = NULL, preexpr = "", timedepexpr = "",
  nullparam = NULL, data = NULL, subset = NULL, weight = NULL,
  na.action = NULL, opt = c("newrap", "BFGS", "CG", "Nelder-Mead", "SANN"),
  tol1 = 1e-08, tol2 = 1e-08, maxiter = 200, init = NULL,
  verbatim = 0, ...)

`modelexpr`	a character string or string vector specifying the model. See ‘Details’.
`preexpr`	a character string or string vector specifying R expressions executed before ‘modelexpr’. Multiple expressions separated by ‘;’ is allowed. See ‘Details’.
`timedepexpr`	a character string or string vector specifying R expressions executed during Cox regression at each time of event occurrence. Typical use is incorporating time-dependent variables. Not yet implemented completely.
`nullparam`	a numeric vector specifying null model. The default is zero value is streched to a vector of same length as parameters, which indicate a null model where all parameter values are equal to zero.
`data`	a data.frame in which to interpret the variables named in the formula, or in the subset and the weights argument.
`subset`	an expression indicating which subset of the rows of data should be used in the fit. All observations are included by default.
`weight`	vector of case weights. If weights is a vector of integers, then the estimated coefficients are equivalent to estimating the model from data with the individual cases replicated as many times as indicated by weights. Only supported ‘breslow’ and ‘efron’ ties specification in Cox regression models and other likelihood specifications.
`na.action`	a missing-data filter function. This is applied when data.frame is supplied as ‘data’ parameter. Default is `options()$na.action`.
`opt`	a character string specifying the method for optimization. When sQuotenewrap is specified, `nlm` function that uses Newton-type algorithm used for obtaining maximum likelihood estimate. For the rest of specifications (‘BFGS’ for quasi-Newton method, ‘CG’ for conjugate gradients method, ‘SANN’ for simulated annealing, ‘Nelder-Mead’ for Nelder and Mead simplex method), `optim` is used. The default is ‘newrap’.
`tol1`	a numeric value specifying `gtol` in nlm, `abstol` in optim.
`tol2`	a numeric value specifying `stol` in nlm, `reltol` in optim.
`maxiter`	a integer value specifying the maximum number of iterations. Defaults is 200.
`init`	a numeric vector or list specifying initial values for the parameters specified in the form of vector. To search for the best initial values for parameters, provide a numeric vector including candidate values for a parameter with the name of its parameter. A default value can be specified as a list component without name. For example, there are three parameters (beta0, beta1, beta2) in a model and specified as `init=list(0, beta1=c(0,1,2)`, initial parameter values tried is (0, 0, 0), (0, 1, 0), and (0, 2, 0), and the best initial value is used for estimation.
`verbatim`	a integer value from 0 (minimum) to 2 (maximum) controlling the amount of information printed during calculation.
`...`	for the arguments used in the inner functions (currently not used).

This function provides flexible model fitting. The main model specification is written in modelexpr. modelexpr consisits of two parts separated by ‘~’. The distribution is specified in the first part, and the second part includes variable name which follows the specified distribution in the first part. Available distributional specifications are ‘cox’(Cox partial likelihood), ‘pois’(Poisson distribution), ‘norm’(normal distribution), ‘binom’(binomial distribution), ‘nbinom’(negative binomial distribution), ‘gamma’(gamma distribution) and ‘weibull’(Weibull distribution). Options can be specified for some distribution specification after ‘/’ in the distribution specification part. One optional specification format is ‘optionname=value’. Multiple options separated by ‘,’ can also be specified.

For Cox regressions, time and status variable must be specified in parentheses like cox(time, status). Counting process type of input is also supported for time-dependent Cox regressions (cox(time1, time2, status)), and see examples. Some options are available for Cox regressions, and ‘efron’, ‘breslow’, ‘discrete’ and ‘average’ is available for tie handling method. ties(discrete) specification corresponds to ‘exact’ ties specification in coxph function, and ties(average) corresonds to ‘exact’ specification in SAS PHREG procecure. See references for further details. Strata option which specifies a variable indicating strata is also available in Cox regressions. Subset option which has same functinality as subset argument below is also available for Cox regressions and other distribution specifications. For other distribution specifications, parameters must be specified in parentheses. For poisson distribution, mean parameter must be specified as pois(mean). Note that each parameter specificaiton can be a variable or R equation. For other distributions, norm(mean, variance), binom(size, probability), nbinom(size, probability), gamma(shape, scale), weibull(shape, scale).

When distributions are specified, additional R expressions can be specified in preexpr argument. R expressions are parsed to make variable list. Variables which exist in data.frame or the global environment must be vector, and the rest of variables are regarded as parameters. If you define variable ‘mu’ in preexpr, you can use ‘mu’ in modelexpr argument. Refer Poisson regression examples below.

Time dependent covariate in Cox regression is supported experimentally. Time can be referred as time_inner_ at event time. Refer time dependent covariate example below.

a list containing the result of model fitting including parameter estimates, variance of parameter estimates, log likelihood and so on.

DeLong, D. M., Guirguis, G.H., and So, Y.C. (1994). Efficient computation of subset selection probabilities with application to Cox regression. Biometrika 81, 607-611.

Gail, M. H., Lubin, J. H., and Rubinstein, L. V. (1981). Likelihood calculations for matched case-control studies and survival studies with tied death times. Biometrika 68, 703-707.

AIC.epifit, print.epifit

library(survival)

# Make sample data
set.seed(123)
nsub <- 20
follow <- 5
x <- rnorm(nsub)
rate <- exp(-0.5 + x)
etime <- rweibull(nsub, 1, 1/rate)
status <- as.integer(etime < follow)
time <- pmin(follow, etime)
dat <- data.frame(status, x, time)

coxph(Surv(time, status)~x, data=dat)
modelexpr <- "cox(time,status)~exp(beta*x)"
epifit(modelexpr=modelexpr, data=dat)

glm(status ~ x + offset(log(time)), family=poisson(), data=dat)
preexpr <- "mu <- time*exp(beta0 + beta1*x)"
modelexpr <- "pois(mu) ~ status"
epifit(modelexpr=modelexpr, preexpr=preexpr, data=dat)

# The simplest test data set from coxph function
test1 <- list(time=c(4,3,1,1,2,2,3),
              status=c(1,1,1,0,1,1,0),
              x=c(0,2,1,1,1,0,0),
              sex=c(0,0,0,0,1,1,1))

# Cox regressions with strata
coxph(Surv(time, status) ~ x + strata(sex), data=test1)
modelexpr <- "cox(time,status)/strata=sex~exp(beta*x)"
epifit(modelexpr=modelexpr, data=test1)

# Tie specification example in Cox regressions
coxph(Surv(time, status) ~ x + strata(sex), data=test1, ties="breslow")
modelexpr <- "cox(time,status)/strata=sex,ties=breslow~exp(beta*x)"
epifit(modelexpr=modelexpr, data=test1)

# Average partial likelihood
modelexpr <- "cox(time,status)/strata=sex,ties=average~exp(beta*x)"
epifit(modelexpr=modelexpr, data=test1)

# Conditional logistic regression for matched case-control studies
# hypothetical data
conlog <- data.frame(strata=c(1,1,2,2,3,3,4,4,5,5), outcome=c(1,0,1,0,1,0,1,0,1,0),
                     cov=c(1,3,2,1,5,2,4,2,2,2))
# Make dummy survival time so that all the cases in a matched set have the same survival
# time value, and the corresponding controls are censored at later times
conlog <- cbind(conlog, dummy=(2 - conlog$outcome))
coxph(Surv(dummy, outcome)~cov + strata(strata), ties="exact", data=conlog)
modelexpr <- "cox(dummy,outcome)/ties=discrete,strata=strata~exp(beta*cov)"
epifit(modelexpr=modelexpr, data=conlog)


# Joint model example (for demonstrating technical specifications)
nsub <- 1000
follow <- 10
x <- rnorm(nsub)
dose <- rweibull(nsub, 0.5, 1/(2.84)^2)
rate <- exp(-1 + x)*(1 + 0.5*dose)

# Generate survival data
etime <- rweibull(nsub, 1, 1/rate)
status <- as.integer(etime < follow)
time <- pmin(follow, etime)
dat2 <- data.frame(event=status, py=time, x, dose, model=1)

# Generate person-year table (baseline is different)
py <- runif(nsub)*follow
rate2 <- exp(-0.5 + 0.5*x)*(1 + 0.5*dose)
event <- sapply(rate2*py, function(x){rpois(1, x)})
dat3 <- cbind(pytable(event, py, cbind(x,dose)), model=2)
dat4 <- rbind(dat2, dat3)

modelexpr <- c("cox(py,event)/subset=(model==1)~exp(beta0*x)*(1 + beta*dose)",
             "pois(py*exp(beta1 + beta2*x)*(1 + beta*dose))/subset=(model==2) ~ event")
epifit(modelexpr, data=dat4)

# Time dependent covariate example
id <- 1:8
group <- c(0, 0, 0, 0, 1, 1, 1, 1)
time <- c(4, 5, 7, 9, 6, 10, 11, 12)
event <- c(1, 1, 0, 1, 1, 1, 1, 0)
dat5 <- data.frame(id=id, group=group, time=time, event=event)
modelexpr <- "cox(time, event) ~ exp(beta1*group + beta2*t_g)"
# t_g is time-dependent variable created by using event time time_inner_ (created automatically)
timedepexpr <- "t_g <- time_inner_ * group"
epifit(modelexpr=modelexpr, timedepexpr=timedepexpr, data=dat5)

coxph(Surv(time, event) ~ group + tt(group),
      tt = function(x, t, ...){x * t},data=dat5)

cov0 <- data.frame(id=id, time=0, value=0*group)
cov4 <- data.frame(id=id, time=3.9, value=4*group)
cov5 <- data.frame(id=id, time=4.9, value=5*group)
cov6 <- data.frame(id=id, time=5.9, value=6*group)
cov9 <- data.frame(id=id, time=8.9, value=9*group)
cov10 <- data.frame(id=id, time=9.9, value=10*group)
cov11 <- data.frame(id=id, time=10.9, value=11*group)
tdata <- data.frame(id=id, group=group)
tdata <- tmerge(tdata, dat5, id, status=event(time, event))
tdata <- tmerge(tdata, cov0, id, t_g=tdc(time, value))
tdata <- tmerge(tdata, cov4, id, t_g=tdc(time, value))
tdata <- tmerge(tdata, cov5, id, t_g=tdc(time, value))
tdata <- tmerge(tdata, cov6, id, t_g=tdc(time, value))
tdata <- tmerge(tdata, cov9, id, t_g=tdc(time, value))
tdata <- tmerge(tdata, cov10, id, t_g=tdc(time, value))
tdata <- tmerge(tdata, cov11, id, t_g=tdc(time, value))
coxph(Surv(tstart, tstop, status) ~ group + t_g, data=tdata)
epifit("cox(tstart, tstop, status) ~ exp(beta1*group + beta2*t_g)", data=tdata)

Call:
coxph(formula = Surv(time, status) ~ x, data = dat)

    coef exp(coef) se(coef)     z      p
x 1.1614    3.1945   0.3702 3.138 0.0017

Likelihood ratio test=11.89  on 1 df, p=0.0005637
n= 20, number of events= 19 

Choice of initial value of parameters are:
beta 
   0 
Call:
epifit(modelexpr = modelexpr, data = dat)

     coef se(coef)    z      p
beta 1.16     0.37 3.14 0.0017

Call:  glm(formula = status ~ x + offset(log(time)), family = poisson(), 
    data = dat)

Coefficients:
(Intercept)            x  
    -0.6453       0.7150  

Degrees of Freedom: 19 Total (i.e. Null);  18 Residual
Null Deviance:	    20.99 
Residual Deviance: 13.67 	AIC: 55.67

Choice of initial value of parameters are:
beta0 beta1 
    0     0 
Call:
epifit(modelexpr = modelexpr, preexpr = preexpr, data = dat)

        coef se(coef)     z       p
beta0 -0.645    0.234 -2.76 0.00584
beta1  0.715    0.260  2.75 0.00599
Call:
coxph(formula = Surv(time, status) ~ x + strata(sex), data = test1)

    coef exp(coef) se(coef)     z     p
x 0.8023    2.2307   0.8224 0.976 0.329

Likelihood ratio test=1.09  on 1 df, p=0.2971
n= 7, number of events= 5 

Choice of initial value of parameters are:
beta 
   0 
Call:
epifit(modelexpr = modelexpr, data = test1)

      coef se(coef)     z     p
beta 0.802    0.822 0.976 0.329
Call:
coxph(formula = Surv(time, status) ~ x + strata(sex), data = test1, 
    ties = "breslow")

    coef exp(coef) se(coef)     z    p
x 0.7357    2.0870   0.8045 0.915 0.36

Likelihood ratio test=0.94  on 1 df, p=0.3315
n= 7, number of events= 5 

Choice of initial value of parameters are:
beta 
   0 
Call:
epifit(modelexpr = modelexpr, data = test1)

      coef se(coef)     z    p
beta 0.736    0.804 0.915 0.36

Choice of initial value of parameters are:
beta 
   0 
Call:
epifit(modelexpr = modelexpr, data = test1)

      coef se(coef)     z     p
beta 0.946    0.973 0.973 0.331
Call:
coxph(formula = Surv(dummy, outcome) ~ cov + strata(strata), 
    data = conlog, ties = "exact")

      coef exp(coef) se(coef)     z     p
cov 0.5002    1.6490   0.5635 0.888 0.375

Likelihood ratio test=0.94  on 1 df, p=0.332
n= 10, number of events= 5 

Choice of initial value of parameters are:
beta 
   0 
Call:
epifit(modelexpr = modelexpr, data = conlog)

     coef se(coef)     z     p
beta  0.5    0.564 0.888 0.375

Choice of initial value of parameters are:
beta0  beta beta1 beta2 
    0     0     0     0 
Call:
epifit(modelexpr = modelexpr, data = dat4)

        coef se(coef)     z p
beta0  0.933   0.0419  22.3 0
beta   0.502   0.0498  10.1 0
beta1 -0.505   0.0215 -23.5 0
beta2  0.490   0.0158  31.1 0
Warning message:
In nlm(f = LogLikelihood, p = init, iterlim = maxiter, hessian = TRUE,  :
  NA/Inf replaced by maximum positive value

Choice of initial value of parameters are:
beta1 beta2 
    0     0 
Call:
epifit(modelexpr = modelexpr, timedepexpr = timedepexpr, data = dat5)

        coef se(coef)       z     p
beta1 -0.332    3.945 -0.0842 0.933
beta2 -0.215    0.628 -0.3418 0.733
Call:
coxph(formula = Surv(time, event) ~ group + tt(group), data = dat5, 
    tt = function(x, t, ...) {
        x * t
    })

             coef exp(coef) se(coef)      z     p
group     -0.3323    0.7173   3.9455 -0.084 0.933
tt(group) -0.2148    0.8067   0.6287 -0.342 0.733

Likelihood ratio test=2.5  on 2 df, p=0.2859
n= 8, number of events= 6 
Call:
coxph(formula = Surv(tstart, tstop, status) ~ group + t_g, data = tdata)

         coef exp(coef) se(coef)      z     p
group -0.3323    0.7173   3.9455 -0.084 0.933
t_g   -0.2148    0.8067   0.6287 -0.342 0.733

Likelihood ratio test=2.5  on 2 df, p=0.2859
n= 38, number of events= 6 

Choice of initial value of parameters are:
beta1 beta2 
    0     0 
Call:
epifit(modelexpr = "cox(tstart, tstop, status) ~ exp(beta1*group + beta2*t_g)", 
    data = tdata)

        coef se(coef)       z     p
beta1 -0.332    3.945 -0.0842 0.933
beta2 -0.215    0.628 -0.3418 0.733