npdf_cox: Cox model for grouped survival data with nonparametric...

Description Usage Arguments Value References Examples

Description

This function fits a Cox proportional hazards model to grouped survival data, where the shared group-specific frailties have a nonparametric discrete distribution. An EM algorithm is used to maximise the marginal partial likelihood.

Usage

1
2
3
npdf_cox(formula, groups, data, K = 2, estK = TRUE,
  criterion = "BIC", eps_conv = 10^-4, se_method = c("louis",
  "exact"))

Arguments

formula

A formula expression in conventional R linear modelling syntax. The response must be a survival time constructed by the Surv function from the survival package, and any covariates are given on the right-hand side. For example,

Surv(time, dead) ~ age + sex

Only Surv objects of type="right" are supported, corresponding to right-censored observations.

groups

name of the variable which indicates the group in which each individual belongs (e.g. the hospital that the individual is treated in). This can be integer, factor or character. The name should be unquoted.

data

A data frame in which to find variables supplied in formula. If not given, the variables should be in the working environment.

K

initial number of latent populations, or clusters of groups which have the same discrete frailty.

estK

If TRUE (the default) then multiple models are fitted with number of latent groups ranging from 1 to K. The "best fitting" model according to the criterion specified in criterion is then highlighted when printing the object returned by this function.

If FALSE then the number of latent populations is fixed at K.

criterion

Criterion used to choose the best-fitting model to highlight when estK is TRUE.

"Laird" for the Laird criterion. Running from K latent populations to 1 latent population, this criterion selects the maximum number of latent populations that are non empty as the best K.

"AIC" for Akaike's information criterion.

"BIC" for the Bayesian information criterion (the default).

eps_conv

convergence tolerance for the EM algorithm.

se_method

Method or methods used to compute the standard errors. A character vector containing one or more of the following:

"louis" The method of Louis (1982) based on an approximation to the information matrix.

"exact" In this method the standard errors are computed directly from the observed information matrix obtained by analytic differentiation.

"numeric" This method uses numerical differentiation to approximate the information matrix, and is substantially slower.

By default this is c("louis","exact") because these two methods are equally fast. So that SEs from both these two methods are calculated and presented. Set se_method=NULL to compute no standard errors.

Value

If estK=FALSE this returns a list of class npdf which includes information about the model fit, including estimates and standard errors.

If estK=TRUE this returns a list of class npdflist. This has an element models that contains a list of length K, with one component of class npdf for each fitted model.

comparison is a matrix composed of K rows and 5 columns (K, K_fitted, llik, AIC, BIC). K_fitted is the number of estimated latent populations, which can be equal to or less than K. llik stands for log-likelihood, AIC for Akaike Information Criterion and BIC for Bayesian Information Criterion.

Kopt is optimal model under each criterion.

criterion is the preferred criterion.

In either case, the data frame used for the fit (the "model frame") is appended as a component mf.

References

Gasperoni, F., Ieva, F., Paganoni, A.M., Jackson, C. and Sharples, L. (2018). Nonparametric frailty Cox models for hierarchical time-to-event data. Biostatistics.

Laird, N. (1978). Nonparametric maximum likelihood estimation of a mixing distribution. Journal of the American Statistical Association, 73(364), 805–811.

Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society. Series B, 44(2), 226–233.

Examples

1
2
3
test <- npdf_cox( Surv(time, status) ~ x, groups=family, data=weibdata2030, K = 4, eps_conv=10^-4)
test    # optimal model (by all criteria) has 2 latent populations
test$models[[1]] # examine alternative model with 1 latent population

discfrail documentation built on May 2, 2019, 9:58 a.m.