estimateMRH: Joint estimation of the hazard rate and covariate effects...

Description Usage Arguments Details Value Author(s) References Examples

Description

estimateMRH is a function used to jointly estimate the hazard rate for time-to-event data using the multi-resolution hazard method. Covariates can be included with or without the proportional hazards assumption.

Usage

1
2
3
4
5
estimateMRH(formula, data, M, numberBins, maxStudyTime, outfolder = "MRHresults", 
	prune = FALSE, prune.alpha = 0.05, prune.levels = NULL, burnIn = 50000, 
	maxIter = 5e+05, thin = 10, Rmp.init = NULL, a.init = 10, lambda.init, 
	beta.init = NULL, k.fixed, gamma.fixed, GR = FALSE, convGraphs = TRUE, 
	fix.burnIn = FALSE, fix.thin = FALSE, fix.max = FALSE, continue.chain = FALSE)

Arguments

formula

A formula object, with the response on the left of a ~ operator, and the covariates on the right. The response must be a survival object as returned by the Surv function. The non-proportional covariates can be included in the formula by using the function nph(). (See Example.)

data

A data frame in which to interpret the variables named in the formula.

M

The value of M that dictates the time "resolution" of the hazard estimate, approximated via 2^M hazard increments.

numberBins

An optional value that can be used instead of 'M' to dictate the number of bins. This value must be a power of 2.

maxStudyTime

Allows the user to set the end of the study time to something other than the maximum observed or censored failure time. This is typically used if a specific bin width is desired.

outfolder

The name given to the folder (subdirectory of the working directory) that stores the output for the MRH estimation routine (output includes the MCMC chains text file, the convergence graphs, and hazard rate graphs). Note that this folder will be automatically placed in the working directory. A pathname for the folder may also be specified, but the path must also be accessible from the working directory.

prune

If set to TRUE, two adjacent bins that are constructed via the same split parameter (Rm,p, or 1-Rm,p) are merged if the estimated hazard rate in these two bins are statistically similar. Pruning hypothesis tests (with the null hypothesis that the two bins have equal hazard rates) are performed using a modified Fisher's exact test, based on the 2 by 2 table composed of the number of failures within the time interval and at-risk patients at the end of the time interval, for each pair of adjacent bins sharing a split parameter. If the test fails to reject the null hypothesis at the alpha level, that split parameter (Rm,p) is set to 0.5. If a non-proportional covariate is used, each hazard rate is pruned separately. The user may also enter a pruning vector of their choice, which must contain a 0 or 1 for each Rmp in the model. For the non-proportional hazards model, the user-entered pruning indicator can either be a vector, which prunes all hazard rates the same, or it can be a matrix with each column containing a pruning vector for each hazard rate group, allowing the pruning for each hazard rate to be different. To make a user-defined pruning vector, please email Vanja Dukic <vanja.dukic@colorado.edu> or Yolanda Hagar <yolanda.hagar@colorado.edu>.

prune.alpha

The significance level for Fisher's exact test if the "prune" option is set to TRUE. The default is alpha = 0.05.

prune.levels

If pruning is used, the number of desired levels to be pruned (ranges from 1 to M). The default to prune all M levels. For multiple hazards, a single value or a vector with different pruning levels may be entered, with one value for each hazard.

burnIn

The number of iterations in the burn-in for the MCMC chain. Default is 50,000. See details.

maxIter

The maximum number of iterations in the MCMC routine. Default is 1,000,000.

thin

The thinning parameter that denotes the thinning of the MCMC chain to reduce autocorrelation. Default value is 10. See Details.

Rmp.init

The initial values for Rmp (split) parameters. If no values are entered, the default initial values is 0.5 for all Rmp.

a.init

The initial value for "a", a parameter in the (Gamma) prior for the baseline cumulative hazard H. If no value is entered, the default value is 10.

lambda.init

The initial value for "lambda", a parameter in the (Gamma) prior for the baseline cumulative hazard H. If no value is entered, the default value is -log(mean(delta))/a, where delta is the vector of censoring indicators. This initialization of lambda approximates the cumulative hazard rate divided by the value of a.

beta.init

The initial value(s) for the beta (covariate effect) parameter(s). If no value is entered, the default values are obtained from the Cox proportional hazards estimates.

k.fixed

Can either be a boolean indicator or a numeric value assigned to k, a parameter in the Beta prior for the split parameters Rmp. By default (or if set equal to TRUE), k is fixed at 0.5, implying zero prior correlation among the hazard increments. The user may also specify the fixed value of k by entering a numeric value for k.fixed greater than 0. When k is greater than 0.5, the increments are positively correlated a priori. Correspondingly, for k less than 0.5, the hazard increments will have negative prior correlation. If k.fixed is set to FALSE, then k will be sampled in the MCMC routine.

gamma.fixed

Can either be a boolean indicator or a numeric vector of length 2^M-1 for values assigned to the gamma parameters (parameters in the Beta prior for the split parameters Rmp), with one gamma.mp associated with each Rmp, and E(Rmp) = gamma.mp. The gamma parameter allows the user to a priori "center" the baseline hazard in each bin. If gamma.fixed is omitted or set equal to TRUE, the values are fixed at 0.5, assuming centering occurs in the middle of the bin. The user may also specify values (between 0 and 1) for each gamma. If gamma.fixed is set to FALSE, then each gamma parameter will be sampled in the MCMC routine.

GR

Denotes whether the Gelman-Rubin test statistic will be applied to the results of the MCMC chain. See Details.

convGraphs

Allows user to specify if graphs for convergence should be saved. Includes trace plots, density plots, and running means. Default is TRUE.

fix.burnIn

If set equal to TRUE, the specified burn-in number (i.e. the ""burnIn"" value) will be fixed throughout the routine, and will not be increased, regardless of the convergence diagnostics performed by the routine. See details.

fix.thin

If set equal to TRUE, the specified thinning value (i.e. the "thin" value) will be fixed throughout the routine, and will not be increased, regardless of the autocorrelation diagnostics performed by the routine. See details.

fix.max

If set equal to TRUE, the specified maximum number of iterations (i.e. the "maxIter" value) will be fixed throughout the routine, and will not be decreased, regardless of convergence diagnostics. See details.

continue.chain

If set to TRUE, the MCMC routine will continue running where the routine left off. The same output folder (“outfolder"") must be used, and parameters will be initialized with the last recorded sample from the MCMC chain in the previous run. Any thinning or burn-in specifications made in the model call will be ignored, and only values from the text file containing the chains will be used. The maximum number of new iterations is specified by “maxIter", and the new MCMC chains are appended to the existing text file in the output folder.

Details

This function returns the estimate of the hazard rate using the Multi-Resolution Hazard (MRH) method. The user must have the survival time and censoring variable for each subject. Parameters are estimated using MCMC.

After the first 100,000 MCMC iterations, the results are checked to determine an appropriate thinning value. The auto-correlation estimate, available via the coda package, is examined, and the first lag with an autocorrelation below 0.10 for any parameter is taken as the new thinning value. Default or user entered burn-in and thin values are taken as the minimum possible thinning values, and the routine only checks if the values should be greater (not smaller) than those specified by the user.

Every 100,000 iterations, the results are checked to determine if there is evidence of convergence and to determine an appropriate burn-in number. This is done through the Geweke diagnostic test using the geweke.diag() function available in the coda package. The geweke.diag() function returns the resulting z-score from the Geweke test for each parameter. If any of the z-scores are outside of the 0.005 range, then convergence is assumed not to have been reached. If convergence is not reached, 20,000 more iterations are burned, and the Geweke diagnostic test is performed again. This continues until there are fewer than 1,000 retained iterations left (i.e. retained after thinning). If there is no evidence of convergence, and if there are fewer than 1,000 retained iterations, the MCMC sampler runs another 100,000 times. This continues until the maximum number of iterations (maxIter) is reached, at which point the routine stops and the results are returned to the user. If the maximum number of iterations is reached before convergence and the user would like to continue the routine, the last parameter estimates in the output file can be used as initial values in the another call of the routine. The user has the option to fix the burn-in number, maximum number of iterations, and thinning value if fix.burnIn, fix.max, or fix.thin are set equal to TRUE. The convergence routine will not change or check these values if they are fixed.

If the user would like to run the routine multiple times on the same data set and test for convergence using the Gelman-Rubin diagnostic test, the user should specify GR = TRUE when calling the MRH estimation function. This way the initial parameter values will be randomly sampled to explore the parameter space. Additionally, this will fix the user-entered or default burn-in number, and thinning value, and the number of MCMC iterations will be fixed at the maximum number of iterations (maxIter) so that the diagnostic test can be performed. NOTE: If the user desires less than the default 1,000,000 maximum number of iterations and has set GR = TRUE, maxIter should be changed. Likewise, if fix.max is set equal to TRUE, the routine will run exactly maxIter number of times, regardless of convergence.

There may be instances in which the user may desire to combine some of the hazard rate bins, particularly if there is evidence that the hazard rate does not change from one bin to the next. In these cases, a pruning indicator can be used to denote which bins can be combined, with a "0" indicating the bin should not be combined with another bin, and a "1" indicating it should. While it is possible to specify this manually, the user can use the built-in pruning function by setting prune = TRUE, which performs Fisher's exact test to determine if bins can be combined based on observed failures and censoring in each bin. Based on the results of the hypothesis tests, certain bins may be pruned. If there are multiple hazards being calculated (i.e. the non-proportional hazards model is used), by default each hazard rate is pruned separately. The user may create their own pruning vector or matrix, although care must be taken in accounting for the tree-like structure of the Rmp parameters. For assistance with creation of a pruning vector or matrix, please contact Vanja Dukic <vanja.dukic@colorado.edu> or Yolanda Hagar <yolanda.hagar@colorado.edu>.

Value

estimateMRH returns a list with class MRH that contains the results of the MCMC algorithm. The routine also writes a file of the thinned and burned chain of MCMC iterations and saves pdf graphs of the hazard function with credible bounds in an MRH results folder (with the default title "MRHresults"). The components returned in the MRH fitted object are:

summary

A summary table containing the rounded estimates and central credible intervals for the estimates of the within-bin hazard rate (labeled “h.binj" for the jth bin) and the covariate effects. Estimates are calculated as the median of the thinned and burned MCMC chain, and bounds on the credible intervals are calculated as the 2.5% and 97.5% of the thinned and burned MCMC chain.

hazardRate

The estimate and lower and upper bounds of the 95% central credible interval for the hazard rate in each bin (labeled “h.binj" for the hazard rate of bin j).

beta

The estimate and lower and upper bounds of the 95% central credible interval for the covariate effects. This can include the log-ratios of the non-proportional hazards if the non-proportional assumption is used, which are labeled with “binj" following the covariate name to denote the log-ratio estimate within the jth bin.

SurvivalCurve

The estimate and lower and upper bounds of the 95% central credible interval for the survival curve. In the non-proportional hazards setting, separate survival curves and credible intervals are provided for each strata of the covariate.

CumulativeHazard

The estimate and lower and upper bounds of the 95% central credible interval for the cumulative hazard. In the non-proportional hazards setting, separate cumulative hazard curves and credible intervals are provided for each strata of the covariate.

d

The estimate and lower and upper bounds of the 95% central credible interval for the within-bin cumulative hazard (denoted as ‘d’) for each bin.

H

The estimate and lower and upper bounds of the 95% central credible interval for the baseline cumulative hazard H at the end of the study (denoted as ‘H00’ or 'H(tJ)' in manuscripts).

Rmp

The estimate and lower and upper bounds of the 95% central credible interval for the split parameters Rmp.

gamma

The estimate and lower and upper bounds of the 95% central credible interval for the gamma parameter (used in the prior for the Rmp split parameters). This is only returned to the user if k.fixed is set equal to FALSE when fitting the model.

k

The estimate and lower and upper bounds of the 95% central credible interval for the k parameters (used in the priors for the Rmp split parameters). This is only returned to the user if gamma.fixed is set equal to FALSE when fitting the model.

AIC

Akaike's Information Criterion, calculated as 2p-2ln(L), where k is the number of parameters in the model and ln(L) is the minimum of the likelihood values calculated at each chain iteration. The number of parameters ‘p’ is calculated as 2^M (one for each split parameter Rmp, and one for the cumulative hazard at H), plus 2 for a and lambda (parameters in the Gamma prior for H), and one for each covariate included under the proportional hazards assumption. If k and/or gamma (parameters in the prior for Rmp) are sampled, the number of estimated parameters is increased by 1 for k and 2^M-1 for gamma. If a covariate is included under the non-proportional hazards assumption, the number of estimated parameters (excluding any covariates included under the proportional hazards assumption) is multiplied by the number of strata in the non-proportional covariate.

BIC

Bayesian Information Criterion, calculated as -2ln(L) + p*log(n)

DIC

Deviance Information Criterion, calculated as .5*var(-2*ln(L)) + mean(-2ln(L)), where ln(L) is calculated at each retained chain iteration.

burnIn

The number of iterations burned during the MCMC routine.

thin

The thinning parameter used to account for autocorrelation.

TotalIters

The number of iterations the MCMC algorithm performed before convergence.

convergence

A TRUE/FALSE indicator denoting whether there is evidence that the algorithm converged based on methods used in the MRH routine.

gelman.rubin.used

Denotes whether the Gelman-Rubin option was used in the routine.

fix.thin

A TRUE/FALSE indicator denoting whether the thinning value was fixed by the user.

fix.burnin

A TRUE/FALSE indicator denoting whether the burn-in value was fixed by the user.

fix.max

A TRUE/FALSE indicator denoting whether the maximum value was fixed by the user.

InitialValues

The initial values of the parameters at the start of the MCMC chain.

gamma.fixed

Returns the fixed value of gamma or FALSE if the gamma parameters are sampled.

k.fixed

Returns the fixed value of k or FALSE if k is sampled.

runtime

The amount of time the routine ran in hours

outfolder

The pathname of the folder containing the MCMC chains and the graphical output produced by the routine.

maxStudyTime

The maximum time in the study (either as a censored observation or an observed failure)

Author(s)

Yolanda Hagar <yolanda.hagar@colorado.edu>

References

P. Bouman, J. Dignam, V. Dukic, XL. Meng. (2005) Bayesian multiresolution hazard model with application to an aids reporting delay study. Statistica Sinica, 102, 1145–1157.

Bouman, P., Dignam, J., Dukic, V. (2007), A multiresolution hazard model for multi-center survival studies: Application to Tamoxifen treatment in early stage breast cancer. JASA. 102, 1145–1157.

Dukic, V., Dignam, J. (2007), Bayesian hierarchical multiresolution hazard model for the study of time-dependent failure patterns in early stage breast cancer. Bayesian Analysis. 2, 591–610.

Chen, Y., Hagar, Y., Dignam, J., Dukic, V. (2014), Pruned Multiresolution Hazard (PMRH) models for time-to-event data. In review. Available upon request via email to vanja.dukic@colorado.edu.

http://amath.colorado.edu/faculty/vdukic/software/MRH.html

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
#####################################################
# NOTE: Examples may take a few minutes, so please be
# patient. In addition, warning messages about
# convergence may appear as more iterations are
# typically needed for chain convergence.  
	
########### PROPORTIONAL HAZARDS EXAMPLE ############
# Examine the NCCTG lung cancer data set (from the survival package),  
# and quantify how age, gender, and physician rated 
# Karnofsky performance scores affect survival times (in days).  
# Assume the hazards are proportional for all covariates.

data(cancer)

# Adjust "status" so that it is a 0/1 
# variable (currently it is 1 = censored, 2 = observed death)
cancer$censorvar = cancer$status - 1

# Run the estimateMRH routine.  Set the maximum 
# study time to 960 days, which makes each bin
# 120 days long.  This censors 0 extra subjects 
# (see FindBinWidth() for an example).  Save
# the output in a folder titled 'MRH_lung' 
# (default is 'MRHresults').
# Generally it is recommended to use a higher burn-in value, 
# thinning value, and maximum number
# of iterations, but for illustrative purposes 
# these values have been lowered.
# Note that the routine may produce a warning 
# message that the algorithm has not converged,
# as typically more iterations are needed for convergence.
# However, for the purposes of this example, the number
# of iterations is sufficient.
## Not run: 
fit.lung = estimateMRH(formula = Surv(time, censorvar) ~ 
	age + as.factor(sex) + ph.karno, data = cancer,
	M = 3, maxStudyTime = 960, burnIn = 200, maxIter = 1000, 
	thin = 1, outfolder = 'MRH_lung')
## End(Not run)


# See all items returned in the model fit
## Not run: 
	fit.lung

## End(Not run)
# See the main summary
## Not run: 
fit.lung$summary

## End(Not run)
# NOTE: If estimateMRH is run as a background job, 
# or if the output folder has been saved for use
# at a later instance, then  the fit can be calculated 
# using the as.MRH() and summary.MRH() functions.  
# See the those help pages or the vignette for 
# more information.

# Run the same model as above, but with pruning. 
# Save the output in a folder titled 'MRH_lung_prune'
## Not run: 
fit.lung.prune = estimateMRH(formula = Surv(time, censorvar) ~ 
	age + as.factor(sex) + ph.karno, data = cancer,
	M = 3, maxStudyTime = 960, burnIn = 200, maxIter = 1000, 
	thin = 1, prune = TRUE, outfolder = 'MRH_lung_prune')
## End(Not run)


########### NON-PROPORTIONAL HAZARDS EXAMPLE ############
# Examine the tongue data set (from the KMsurv package), and
# quantify how the rumor DNA profile 
# affects survival time (in weeks).
data(tongue)

# Fit the MRH model, including tumor type using 
# the non-proportional hazards model.  
# With 16 bins (M = 4), each bin represents 25 weeks.
# Generally it is recommended to use a higher burn-in value, 
# thinning value, and maximum number
# of iterations, but for illustrative purposes 
# these values have been lowered.
# Note that the routine may produce a warning 
# message that the algorithm has not converged,
# as typically more iterations are needed for convergence.
# However, for the purposes of this example, the number
# of iterations is sufficient.
## Not run: 
fit.tongue = estimateMRH(formula = Surv(time, delta) ~ 
	nph(type), data = tongue, M = 4, 
	burnIn = 200, maxIter = 2000, thin = 1, 
	outfolder = 'MRH_tongue_nph')
## End(Not run)

# Get the time-varying hazard ratios 
## Not run: 
	fit.tongue$beta

## End(Not run)

MRH documentation built on May 2, 2019, 11:10 a.m.