STpredictor_xvBLH: This function performs a cross validation on the full data...


Description

Using the full data set provided by the user, this function splits the data k times into a small validation set and a much larger training set. In each split, the regression coefficients of the model are estimated from the training set and used to predict the survival times of the validation set. The patients can then be split into two groups around a cut-off value (cut.off), which is also specified by the user.
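
The splitting step can be pictured with a short base R sketch. It randomly assigns each of n subjects to one of k folds and, for each fold, separates the validation indices from the training indices. The helper make_folds is hypothetical and not part of RCASPAR; STpredictor_xvBLH may partition the subjects differently (for example, without shuffling).

```r
# Hypothetical helper: randomly assign n subjects to k roughly equal folds.
make_folds <- function(n, k, seed = 1) {
  set.seed(seed)
  # rep_len() cycles the labels 1..k over n positions; sample() shuffles them
  sample(rep_len(seq_len(k), n))
}

n <- 20
k <- 10
fold <- make_folds(n, k)

for (i in seq_len(k)) {
  validation_idx <- which(fold == i)   # small held-out set
  training_idx   <- which(fold != i)   # the remaining, much larger set
  # Here the model would be fitted on training_idx and then used to
  # predict the survival times of the subjects in validation_idx.
}

table(fold)  # every fold holds n / k = 2 subjects
```

With 20 subjects and k = 10, each validation set contains 2 subjects and each training set 18.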

Usage

STpredictor_xvBLH(geData, survData, k = 10, cut.off,
  file = paste(getwd(), "STpredictor.xv.BLH_results", sep = "/"),
  q = 1, s = 1, a = 2, b = 2, groups = 3, geneweights = NULL,
  BLHs = NULL, method = "BFGS", noprior = 1, extras = list())

Arguments

geData

A matrix with the covariate data of the full set of subjects, with the covariates in the columns and the subjects in the rows. The cell in row i and column j holds the value of covariate j for subject i.

survData

The survival data of the entire set of subjects. It is a data frame with at least the columns “True_STs” and “censored”, which hold the observed survival times and the censoring status of the subjects, respectively. Censored patients are coded “1”, while patients who experience an event are coded “0”.
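
A minimal survival data frame can be built by hand, as in the sketch below. The values are made up for illustration, and the 0/1 coding assumes, as the column name suggests, that censored subjects are coded 1 and subjects with an observed event 0.

```r
# Made-up survival data in the layout expected for survData
toy_surv <- data.frame(
  True_STs = c(12.5, 30.1, 7.2, 45.0),  # observed survival times
  censored = c(0, 1, 0, 1)              # 1 = censored, 0 = event observed (assumed)
)

# Check that the required columns exist and that the status is coded 0/1
stopifnot(all(c("True_STs", "censored") %in% names(toy_surv)))
stopifnot(all(toy_surv$censored %in% c(0, 1)))

sum(toy_surv$censored == 0)  # number of observed events: 2
```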

k

The number of folds in the cross-validation, i.e. the number of times the data are split into a training set and a validation set.

cut.off

The cut-off value around which the patients are grouped according to their predicted survival times.
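
The grouping around cut.off can be sketched as follows, assuming (from the descriptions of short_survivors and long_survivors under Value) that patients whose predicted survival time is below the cut-off are short survivors and all others are long survivors. The predicted times are made up.

```r
# Made-up predicted survival times for five validation patients
pred <- data.frame(
  Predicted_STs = c(1.2, 4.8, 2.9, 3.0, 6.1),
  PatientOrderValidation = 1:5
)
cut.off <- 3

# Short survivors: predicted time below the cut-off.
# Long survivors: predicted time at least equal to the cut-off.
short_survivors <- pred[pred$Predicted_STs < cut.off, ]
long_survivors  <- pred[pred$Predicted_STs >= cut.off, ]

short_survivors$PatientOrderValidation  # 1 3
long_survivors$PatientOrderValidation   # 2 4 5
```

Note that a patient predicted to live exactly cut.off (patient 4 here) falls into the long-survivor group.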

file

The path of the file to which the log of this session is saved.

q

The first of the two parameters of the prior distribution on the weights (regression coefficients) in the model.

s

The second of the two parameters of the prior distribution on the weights (regression coefficients) in the model.

a

The shape parameter for the gamma distribution used as a prior on the baseline hazards.

b

The scale parameter for the gamma distribution used as a prior on the baseline hazards.

groups

The number of partitions of the time axis, each of which is assigned its own baseline hazard. This number should equal the number of initial baseline hazard values passed in BLHs.
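
For orientation, a Cox model with a stepwise (piecewise-constant) baseline hazard of the kind described in Ibrahim, Chen & Sinha (2005) can be written as below. This is a standard textbook formulation given as an assumption, not necessarily the exact parameterization used by RCASPAR: J = groups, the interval boundaries are tau_0 = 0 < tau_1 < ... < tau_J (with tau_J taken as infinity; their placement is not specified on this page), lambda_j is the baseline hazard on interval j, w is the vector of regression coefficients, x_i holds the covariates of subject i, and the gamma prior uses shape a and scale b as described for those arguments.

```latex
h(t \mid \mathbf{x}_i) = \lambda_j \exp(\mathbf{w}^\top \mathbf{x}_i),
  \qquad t \in (\tau_{j-1}, \tau_j], \quad j = 1, \dots, J

S(t \mid \mathbf{x}_i) = \exp\Big( -\exp(\mathbf{w}^\top \mathbf{x}_i)
  \sum_{j=1}^{J} \lambda_j \, \max\big(0, \min(t, \tau_j) - \tau_{j-1}\big) \Big)

\lambda_j \sim \mathrm{Gamma}(\text{shape} = a, \ \text{scale} = b), \qquad j = 1, \dots, J
```

The sum is the cumulative baseline hazard: each interval contributes its hazard times the portion of that interval lying before t.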

geneweights

A vector with the initial values of the weights (regression coefficients) for the covariates. The default is NULL, in which case a vector of zeros of length ncol(geData) is used as the starting value.

BLHs

A vector with the initial values of the baseline hazards, of length groups. The default is NULL, in which case a vector of length groups is created whose values are the mode (the point of maximum density) of the gamma prior with parameters a and b.
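
The documented defaults for the starting values can be reconstructed roughly as follows. This is a hypothetical sketch (default_inits is not part of RCASPAR): it assumes that "the maximum of the gamma distribution" means its mode and that b is a scale parameter, so that the mode is (a - 1) * b for a >= 1; if b were a rate instead, the mode would be (a - 1) / b.

```r
# Hypothetical reconstruction of the documented default starting values.
# Assumes b is a gamma *scale* parameter, so the mode is (a - 1) * b for a >= 1.
default_inits <- function(geData, groups, a, b) {
  list(
    geneweights = rep(0, ncol(geData)),  # one zero per covariate
    BLHs        = rep((a - 1) * b, groups)  # gamma mode for every time interval
  )
}

toy_ge <- matrix(rnorm(20 * 2), nrow = 20, ncol = 2)  # 20 subjects, 2 covariates
inits <- default_inits(toy_ge, groups = 3, a = 2, b = 2)
inits$geneweights  # 0 0
inits$BLHs         # 2 2 2

# Numerical check of the mode with base R's dgamma() and optimize()
mode_num <- optimize(function(x) dgamma(x, shape = 2, scale = 2),
                     interval = c(0, 20), maximum = TRUE)$maximum
mode_num  # approximately 2
```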

method

The preferred optimization method. It can be one of the following:
"Nelder-Mead": the Nelder-Mead simplex algorithm.
"L-BFGS-B": the limited-memory BFGS quasi-Newton method with box constraints.
"BFGS": the BFGS quasi-Newton method.
"CG": the conjugate gradient descent method.
"SANN": the simulated annealing algorithm.

noprior

An integer indicating the number of iterations to be done without assuming a prior on the regression coefficients.

extras

Extra arguments to be passed to the optimization function optim. For details, see the documentation of optim.
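
For readers unfamiliar with optim, the sketch below minimizes a toy quadratic with one of the methods listed for the method argument and passes a tolerance through optim's control list. It only illustrates base R's optim; that STpredictor_xvBLH forwards extras as optim's control argument is an assumption suggested by the use of reltol in the example, not something this page states.

```r
# Toy objective: a quadratic with its minimum at (1, -2)
f  <- function(w) sum((w - c(1, -2))^2)
gr <- function(w) 2 * (w - c(1, -2))  # analytic gradient of f

extras <- list(reltol = 1e-10)        # hypothetical extras list

fit <- optim(par = c(0, 0), fn = f, gr = gr,
             method = "BFGS", control = extras)

fit$par          # close to c(1, -2)
fit$convergence  # 0 indicates successful convergence
```

Swapping method = "BFGS" for "CG" or "Nelder-Mead" reaches the same minimum here, typically with a different number of function evaluations.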

Value

predicted_STs

A data frame of the results for all patients, with the columns True_STs (the observed survival times), Predicted_STs (the predicted survival times), censored (the censoring status of the patient), absolute_error (the absolute difference between the predicted and observed survival times) and PatientOrderValidation (the patient's number).
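
The absolute_error column can be derived from the other columns as in this sketch. The survival times are made up, and the column layout simply follows the description above.

```r
# Made-up observed and predicted survival times for four patients
predicted_STs <- data.frame(
  True_STs = c(5.0, 2.5, 8.0, 1.0),
  Predicted_STs = c(4.0, 3.5, 8.0, 2.5),
  censored = c(0, 1, 0, 0),
  PatientOrderValidation = 1:4
)

# absolute_error: unsigned difference between predicted and observed times
predicted_STs$absolute_error <-
  abs(predicted_STs$Predicted_STs - predicted_STs$True_STs)

predicted_STs$absolute_error  # 1.0 1.0 0.0 1.5
```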

short_survivors

A data frame of the results for the patients whose predicted survival time is less than the cut-off value, with the columns True_STs (the observed survival times), Predicted_STs (the predicted survival times), censored (the censoring status of the patient), absolute_error (the absolute difference between the predicted and observed survival times) and PatientOrderValidation (the patient's number).

long_survivors

A data frame of the results for the patients whose predicted survival time is at least the cut-off value, with the columns True_STs (the observed survival times), Predicted_STs (the predicted survival times), censored (the censoring status of the patient), absolute_error (the absolute difference between the predicted and observed survival times) and PatientOrderValidation (the patient's number).

weights

A vector of the regression coefficients averaged over the k training sets.

baselineHs

A vector of the baseline hazards averaged over the k training sets.
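
Averaging over the k training sets amounts to taking column means, assuming the estimates from each training set are stacked as the rows of a matrix (how the function stores them internally is not documented here). All numbers below are made up.

```r
# Made-up coefficients from k = 3 training sets (rows) for 2 covariates (columns)
fold_weights <- rbind(
  c(0.20, -1.10),
  c(0.40, -0.90),
  c(0.30, -1.00)
)
# Made-up baseline hazards from the same 3 folds, for groups = 3 intervals
fold_BLHs <- rbind(
  c(0.10, 0.25, 0.40),
  c(0.12, 0.20, 0.44),
  c(0.08, 0.30, 0.36)
)

mean_weights <- colMeans(fold_weights)  # mean coefficient per covariate
mean_BLHs    <- colMeans(fold_BLHs)     # mean baseline hazard per interval
mean_weights  # 0.3 -1.0
mean_BLHs     # 0.10 0.25 0.40
```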

Author(s)

Douaa Mugahid

References

The basic model is the Cox regression model, first introduced by Sir David Cox in:
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society, Series B, 34(2), 187-220.
The extension of the Cox model to its stepwise form was adapted from:
Ibrahim, J. G., Chen, M.-H. & Sinha, D. (2005). Bayesian Survival Analysis (2nd ed.). New York: Springer.
Kaderali, L. (2006). A Hierarchical Bayesian Approach to Regression and its Application to Predicting Survival Times in Cancer Patients. Aachen: Shaker.
The prior on the regression coefficients was adopted from:
Mazur, J., Ritter, D., Reinelt, G. & Kaderali, L. (2009). Reconstructing Non-Linear Dynamic Models of Gene Regulation using Stochastic Sampling. BMC Bioinformatics, 10, 448.

See Also

STpredictor_BLH

Examples

library(RCASPAR)
data(Bergamaschi)
data(survData)
# 10-fold cross-validation on the first 20 subjects and 2 covariates
STpredictor_xvBLH(geData = Bergamaschi[1:20, 1:2], survData = survData[1:20, 9:10], k = 10,
                  cut.off = 3, file = paste(getwd(), "STpredictor.xv.BLH_results", sep = "/"),
                  q = 1, s = 1, a = 2, b = 2, groups = 3, geneweights = NULL, BLHs = NULL,
                  method = "CG", noprior = 1, extras = list(reltol = 1))

RCASPAR documentation built on Nov. 8, 2020, 6:56 p.m.