iterateBMAsurv.train: Iterative Bayesian Model Averaging: training


Description

Survival analysis and variable selection on microarray data. This is a multivariate technique that selects a small number of relevant variables (typically genes) for survival analysis of microarray data. This function performs the training phase. It repeatedly calls bic.surv from the BMA package until all variables are exhausted. The variables in the dataset are assumed to be pre-sorted by rank.

Usage

iterateBMAsurv.train (x, surv.time, cens.vec, curr.mat, stopVar=0, nextVar, nbest=10, maxNvar=25, maxIter=200000, thresProbne0=1, verbose = FALSE, suff.string="")

Arguments

x

Data matrix where columns are variables and rows are observations. The variables (columns) are assumed to be sorted using a univariate measure. In the case of gene expression data, the columns (variables) represent genes, while the rows (observations) represent samples.

surv.time

Vector of survival times for the patient samples. Survival times are assumed to be presented in uniform format (e.g., months or days), and the length of this vector should be equal to the number of rows in x.

cens.vec

Vector of censor data for the patient samples. In general, 0 = censored and 1 = uncensored. The length of this vector should equal the number of rows in x and the number of elements in surv.time.

curr.mat

Matrix of independent variables in the active bic.surv window. There can be at most maxNvar variables in the window at any given time.

stopVar

0 to continue iterations, 1 to stop iterations (default 0)

nextVar

Integer placeholder indicating the next variable to be brought into the active bic.surv window.

nbest

A number specifying the number of models of each size returned to bic.surv in the BMA package. The default is 10.

maxNvar

A number indicating the maximum number of variables used in each iteration of bic.surv from the BMA package. The default is 25.

maxIter

A number indicating the maximum iterations of bic.surv. The default is 200000.

thresProbne0

A number specifying the threshold, in percent, for the posterior probability that each variable (gene) is non-zero. Variables (genes) whose posterior probability falls below this threshold are dropped in the iterative application of bic.surv. The default is 1 percent.

verbose

A boolean variable indicating whether or not to print interim information to the console. The default is FALSE.

suff.string

A string for writing to file.
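The shape constraints stated above for x, surv.time, cens.vec and curr.mat can be checked before training. The following is a minimal sketch in base R: check_train_inputs is a hypothetical helper that is not part of iterativeBMAsurv, the toy data is invented, and capping the window at ncol(x) when there are fewer than maxNvar columns is an assumption.

```r
# Hypothetical helper (not part of iterativeBMAsurv): checks the shape
# constraints described in the Arguments section and builds the initial window.
check_train_inputs <- function(x, surv.time, cens.vec, maxNvar = 25) {
  stopifnot(is.matrix(x))
  # One survival time and one censoring flag per row (sample) of x
  stopifnot(length(surv.time) == nrow(x))
  stopifnot(length(cens.vec) == nrow(x))
  # Censoring indicator: 0 = censored, 1 = uncensored
  stopifnot(all(cens.vec %in% c(0, 1)))
  # Assumption: the window cannot be wider than the data matrix itself
  n.window <- min(maxNvar, ncol(x))
  list(curr.mat = x[, seq_len(n.window), drop = FALSE],
       nextVar  = n.window + 1)
}

# Toy data: 6 samples, 4 pre-sorted variables (values are arbitrary)
set.seed(1)
toy.x    <- matrix(rnorm(24), nrow = 6, dimnames = list(NULL, paste0("g", 1:4)))
toy.time <- c(5, 8, 12, 3, 9, 7)
toy.cens <- c(1, 0, 1, 1, 0, 1)
init     <- check_train_inputs(toy.x, toy.time, toy.cens, maxNvar = 3)
```

Here init$curr.mat holds the first three columns of toy.x and init$nextVar points at the fourth, mirroring how curr.mat and nextVar are initialized in the Examples section.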

Details

The training phase consists of first ordering all the variables (genes) by a univariate measure such as Cox Proportional Hazards Regression, and then iteratively applying the bic.surv algorithm from the BMA package. In the first application of the bic.surv algorithm, the top maxNvar univariate ranked genes are used. After each application of the bic.surv algorithm, the genes with probne0 < thresProbne0 are dropped, and the next univariate ordered genes are added to the active bic.surv window.
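The window update described above can be sketched in base R without calling bic.surv. This is an illustration of the bookkeeping only, not the package's implementation: iterate_window and score_fun are hypothetical names, score_fun stands in for the probne0 values that bic.surv would report, and the scores are invented. This page does not say what happens when no variable falls below the threshold, so the sketch simply stops in that case.

```r
# Sketch of the window bookkeeping only. `score_fun` is a hypothetical
# stand-in that returns one probne0 value (in percent) per window variable.
iterate_window <- function(var.names, score_fun, maxNvar = 3, thresProbne0 = 1) {
  n.window <- min(maxNvar, length(var.names))
  window   <- var.names[seq_len(n.window)]   # top-ranked variables first
  nextVar  <- n.window + 1                   # next ranked variable to bring in
  repeat {
    probne0 <- score_fun(window)
    keep    <- window[probne0 >= thresProbne0]  # drop probne0 < thresProbne0
    n.free  <- maxNvar - length(keep)
    # Assumption: stop when nothing was dropped or no ranked variables remain
    if (n.free == 0 || nextVar > length(var.names)) {
      window <- keep
      break
    }
    # Refill the freed slots with the next variables in rank order
    last    <- min(nextVar + n.free - 1, length(var.names))
    window  <- c(keep, var.names[nextVar:last])
    nextVar <- last + 1
  }
  list(window = window, nextVar = nextVar)
}

# Invented scores for six pre-sorted variables
toy.scores <- c(g1 = 100, g2 = 0, g3 = 50, g4 = 0.5, g5 = 0.2, g6 = 80)
res <- iterate_window(names(toy.scores), function(v) toy.scores[v],
                      maxNvar = 3, thresProbne0 = 1)
```

With these scores, g2, g4 and g5 are dropped in turn as each falls below the 1 percent threshold, and the loop ends with g1, g3 and g6 in the window once all six variables have been explored.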

Value

On the last iteration of bic.surv, four items are returned:

curr.mat

A vector containing the names of the variables (genes) from the final iteration of bic.surv.

stopVar

The ending value of stopVar after all iterations.

nextVar

The ending value of nextVar after all iterations.

An object of class bic.surv resulting from the last iteration of bic.surv. The object is a list consisting of the following components:

namesx

the names of the variables in the last iteration of bic.surv.

postprob

the posterior probabilities of the models selected.

label

labels identifying the models selected.

bic

values of BIC for the models.

size

the number of independent variables in each of the models.

which

a logical matrix with one row per model and one column per variable indicating whether that variable is in the model.

probne0

the posterior probability that each variable is non-zero (in percent).

postmean

the posterior mean of each coefficient (from model averaging).

postsd

the posterior standard deviation of each coefficient (from model averaging).

condpostmean

the posterior mean of each coefficient conditional on the variable being included in the model.

condpostsd

the posterior standard deviation of each coefficient conditional on the variable being included in the model.

mle

matrix with one row per model and one column per variable giving the maximum likelihood estimate of each coefficient for each model.

se

matrix with one row per model and one column per variable giving the standard error of each coefficient for each model.

reduced

a logical indicating whether any variables were dropped before model averaging.

dropped

a vector containing the names of those variables dropped before model averaging.

call

the matched call that created the bic.surv object.
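To show how several of these components relate, the sketch below uses a hand-built list in place of a fitted object. mock.fit and all of its values are invented for illustration. It assumes that probne0 is the summed posterior probability, in percent, of the models that contain each variable, as is usual in model averaging.

```r
# Hand-built stand-in for a fitted object; every value here is invented.
mock.fit <- list(
  namesx   = c("g1", "g2", "g3", "g4"),
  postprob = c(0.7, 0.3),                           # one entry per model
  which    = rbind(c(TRUE, FALSE, TRUE,  FALSE),    # model 1 uses g1 and g3
                   c(TRUE, FALSE, FALSE, TRUE)),    # model 2 uses g1 and g4
  probne0  = c(100, 0, 70, 30)                      # percent, one per variable
)

# Names of variables with non-zero posterior inclusion probability
selected <- mock.fit$namesx[mock.fit$probne0 > 0]

# Number of variables in each model, which is what `size` describes
model.size <- rowSums(mock.fit$which)

# Assumed relation: weight each model's row of `which` by its posterior
# probability, then sum per variable and convert to percent
recomputed <- 100 * colSums(mock.fit$which * mock.fit$postprob)
```

In this toy case, selected contains g1, g3 and g4, and recomputed matches the probne0 values stored in the list.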

Note

The BMA package is required.

References

Annest, A., Yeung, K.Y., Bumgarner, R.E., and Raftery, A.E. (2008). Iterative Bayesian Model Averaging for Survival Analysis. Manuscript in Progress.

Raftery, A.E. (1995). Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.

Volinsky, C., Madigan, D., Raftery, A., and Kronmal, R. (1997) Bayesian Model Averaging in Proportional Hazard Models: Assessing the Risk of a Stroke. Applied Statistics 46: 433-448.

Yeung, K.Y., Bumgarner, R.E. and Raftery, A.E. (2005) Bayesian Model Averaging: Development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics 21: 2394-2402.

See Also

iterateBMAsurv.train.wrapper, iterateBMAsurv.train.predict.assess, singleGeneCoxph, predictBicSurv, trainData, trainSurv, trainCens, testData

Examples

library(BMA)
library(iterativeBMAsurv)
data(trainData)
data(trainSurv)
data(trainCens)
data(testData)

## Training data should be pre-sorted before beginning

## Initialize the matrix for the active bic.surv window with variables 1 through maxNvar
maxNvar <- 25
curr.mat <- trainData[, 1:maxNvar]
nextVar <- maxNvar + 1

## Training phase: select relevant genes using nbest=5 for fast computation
ret.bic.surv <- iterateBMAsurv.train (x=trainData, surv.time=trainSurv, cens.vec=trainCens, curr.mat, stopVar=0, nextVar, nbest=5, maxNvar=25)

## Apply bic.surv again using selected genes
ret.bma <- bic.surv (x=ret.bic.surv$curr.mat, surv.t=trainSurv, cens=trainCens, nbest=5, maxCol=(maxNvar+1))

## Get the matrix for genes with probne0 > 0 (select columns, keeping all rows)
ret.gene.mat <- ret.bic.surv$curr.mat[, ret.bma$probne0 > 0]

## Get the gene names from ret.gene.mat
selected.genes <- dimnames(ret.gene.mat)[[2]]

## Show the posterior probabilities of selected models
ret.bma$postprob

## Get the subset of test data with the genes from the last iteration of
## 'bic.surv'
curr.test.dat <- testData[, selected.genes]

## Compute the predicted risk scores for the test samples
y.pred.test <- apply (curr.test.dat, 1, predictBicSurv, postprob.vec=ret.bma$postprob, mle.mat=ret.bma$mle)

Example output

Loading required package: BMA
Loading required package: survival
Loading required package: leaps
Loading required package: robustbase

Attaching package: 'robustbase'

The following object is masked from 'package:survival':

    heart

Loading required package: inline
Loading required package: rrcov
Scalable Robust Estimators with High Breakdown Point (version 1.4-4)

Loading required package: splines
17: Explored up to variable # 100
Iterate bic.surv is done!
Selected genes:
 [1] "X31687" "X33840" "X31242" "X16948" "X31471" "X17154" "X28531" "X19241"
 [9] "X26146" "X17804" "X27332" "X17241" "X32212" "X29911" "X33558" "X33013"
[17] "X27884" "X33706" "X16817" "X31968" "X30209" "X29650" "X25054" "X16988"
[25] "X32904"
Posterior probabilities of selected genes:
 [1] 100.0  47.5  47.3   2.4  38.5  28.5  40.1  96.7   2.8   1.7   0.0  59.9
[13]   0.0   0.0  10.0   0.0   2.5  58.3   2.1  98.8  28.4   7.1  95.1   0.0
[25] 100.0
 [1] 0.075782322 0.068183539 0.062240254 0.056227073 0.045761712 0.044794588
 [7] 0.043328132 0.042831731 0.039567629 0.039285627 0.038997242 0.034867824
[13] 0.032225236 0.030210326 0.026904418 0.025508701 0.025052995 0.024869256
[19] 0.021711946 0.021061750 0.020689119 0.020114454 0.017345536 0.017179713
[25] 0.017104052 0.015294500 0.014059561 0.014050900 0.012658966 0.010182444
[31] 0.008768581 0.007844758 0.007014883 0.006609877 0.006555310 0.005115046
There were 50 or more warnings (use warnings() to see the first 50)

iterativeBMAsurv documentation built on Nov. 8, 2020, 11:10 p.m.