genfrail: Generate survival data

View source: R/genfrail.R

genfrailR Documentation

Generate survival data

Description

Generate clustered survival data from a shared frailty model, with hazard function given by

S(t)=\exp [-\Lambda_0(t) \omega_i \exp (\beta Z_{ij})]

where \Lambda_0 is the cumulative baseline hazard, \omega_i is the frailty value of cluster i, \beta is the regression coefficient vector, and Z_ij is the covariate vector for individual i in cluster j.

The baseline hazard can be specified by the inverse cumualative baseline hazard, cumulative baseline hazard, or simply the baseline hazard. Frailty values can be sampled from gamma, power variance function (PVF), log-normal, inverse Gaussian, and positive stable distributions.

Usage

genfrail(N = 300, K = 2, K.param = c(2, 0), beta = c(log(2)),
         frailty = "gamma", theta = c(2), 
         covar.distr = "normal", covar.param = c(0, 1), covar.matrix = NULL,
         censor.distr = "normal", censor.param = c(130, 15), 
         censor.rate = NULL, censor.time = NULL,
         lambda_0 = NULL, Lambda_0 = NULL, Lambda_0_inv = NULL, 
         round.base = NULL, control, ...)

Arguments

N

integer; number of clusters

K

integer, string, or vector; If an integer, the number of members in each cluster. If a string, the name of the distribution to sample the cluster sizes from. This can be one of: "poisson", "pareto", or "uniform". The K.param argument specifies the distribution parameters. If a vector, must be of length N and contains the integer size of each cluster.

K.param

vector of the cluster size distribution parameters if K is a string. If "possion", the vector should contain the rate and truncated value (see rtpois). If "pareto", the exponent, lower, and upper bounds (see rtzeta). If "uniform", the lower (noninclusive) and upper (inclusive) bounds.

beta

vector of regression coefficients.

frailty

string name of the frailty distribution. Can be one of: "gamma", "pvf", "lognormal", "invgauss", "posstab", or "none". See dgamma_r,dpvf_r, dlognormal_r, dinvgauss_r, posstab_r for the respective density functions. (Also see the *_c for C implementations of the respective density functions.)

theta

vector the frailty distribution parameters

covar.distr

string distribution to sample covariates from. Can be one of: "normal", "uniform", "zero"

covar.param

vector covariate distribution parameters.

covar.matrix

matrix with dimensions c(NK, length(beta)) that contains the desired covariates. If not NULL, this overrides covar.distr and covar.param.

censor.distr

string censoring distribution to use. Followup times are sampled from the censoring distribution to simulate non-informative right censorship. The censoring distribution can be one of: "normal", "lognormal", "uniform", "none".

censor.param

vector of censoring distribution parameters. For normal and lognormal censorship, this should be c(mu,sigma) where mu is the mean and sigma is the standard deviation (Note: this is still the mean and standard deviation for lognormal). For uniform censorship, the vector c(lower, upper) should specify the lower and upper bounds.

censor.rate

numeric value between 0 and 1 to specify the empirical censoring rate. The mean specified in the censor.param parameter is adjusted to achieve a desired censoring rate if censor.rate is given. Note that the standard deviation (the second parameter in censor.param) must still be specified so that the problem is identifiable. For uniform censorship, the interval given by c(lower, upper) is adjusted to achieve the desired censorship, while keeping the variance fixed (i.e., upper - lower does not change).

censor.time

vector of right-censorship times. This must have length N*K and specifies the right-censoring times of each observation. Note that this overrides all other censor.* params and cannot be used with variable cluster sizes.

lambda_0

function baseline hazard. Only one of lambda_0, Lambda_0, and Lambda_0_inv need to be specified. Passing the baseline hazard (lambda_0) is the most computationally expensive since this requires numerical integration inside a root-finding algorithm.

Lambda_0

function cumulative baseline hazard. This overrides lambda_0.

Lambda_0_inv

function inverse cumulative baseline hazard. This overrides both lambda_0 and Lambda_0.

round.base

numeric if specified, round the followup times to the nearest round.base

control

control parameters in the form of a genfrail.control object

...

additional arguments will be passed to genfrail.control

Value

A data.frame with row-observations is returned.

family

the cluster

rep

the member within each cluster

time

observed followup time

status

failure indicator

Z1...

covariates, where there are length(beta) Z columns

Author(s)

John V. Monaco, Malka Gorfine, and Li Hsu.

See Also

fitfrail

Examples

# Generate the same dataset 3 different ways

# Using the baseline hazard (least efficient)
set.seed(1234)
dat.1 <- genfrail(N = 300, K = 2, 
                  beta = c(log(2),log(3)),
                  frailty = "gamma", theta = 2,
                  lambda_0=function(t, tau=4.6, C=0.01) (tau*(C*t)^tau)/t)

# Using the cumulative baseline hazard
set.seed(1234)
dat.2 <- genfrail(N = 300, K = 2, 
                  beta = c(log(2),log(3)),
                  frailty = "gamma", theta = 2, 
                  Lambda_0 = function(t, tau=4.6, C=0.01) (C*t)^tau)

# Using the inverse cumulative baseline hazard (most efficient)
set.seed(1234)
dat.3 <- genfrail(N = 300, K = 2, 
                  beta = c(log(2),log(3)),
                  frailty = "gamma", theta = 2, 
                  Lambda_0_inv=function(t, tau=4.6, C=0.01) (t^(1/tau))/C)

# Generate data with PVF frailty, truncated Poisson cluster sizes, normal
# covariates, and 0.35 censorship from a lognormal distribution
set.seed(1234)
dat.4 <- genfrail(N = 100, K = "poisson", K.param=c(5, 1), 
                  beta = c(log(2),log(3)),
                  frailty = "pvf", theta = 0.3, 
                  covar.distr = "lognormal", 
                  censor.rate = 0.35) # Use the default baseline hazard

# Cluster sizes have size >= 2, summarized by
summary(dat.4)

# An oscillating baseline hazard
set.seed(1234)
dat.5 <- genfrail(lambda_0=function(t, tau=4.6, C=0.01, A=2, f=0.1) 
                              A^sin(f*pi*t) * (tau*(C*t)^tau)/t)

# Uniform censorship with 0.25 censoring rate
set.seed(1234)
dat.6 <- genfrail(N = 300, K = 2, 
                  beta = c(log(2),log(3)),
                  frailty = "gamma", theta = 2, 
                  censor.distr = "uniform", 
                  censor.param = c(50, 150), 
                  censor.rate = 0.25,
                  Lambda_0_inv=function(t, tau=4.6, C=0.01) (t^(1/tau))/C)

frailtySurv documentation built on Aug. 14, 2023, 1:06 a.m.