Iterative Bayesian Model Averaging: training step
Description
Classification and variable selection on microarray data. This is a multivariate technique to select a small number of relevant variables (typically genes) to classify microarray samples. This function performs the training phase. The data is assumed to consist of two classes. Logistic regression is used for classification.
Usage
iterateBMAglm.train (train.expr.set, train.class, p=100, nbest=10, maxNvar=30, maxIter=20000, thresProbne0=1)

Arguments
train.expr.set 
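an ExpressionSet object containing the training data. Rows are assumed to represent variables (genes) and columns to represent samples or experiments.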
train.class 
class vector for the observations (samples or experiments) in the training data. Class numbers are assumed to start from 0, and the length of this class vector should equal the number of samples in train.expr.set. Since the data is assumed to have two classes, the class vector should consist of zeros and ones.
p 
a number indicating the maximum number of top univariate genes used in the iterative BMA algorithm. This number is assumed to be less than the total number of genes in the training data. A larger p usually requires longer computational time as more iterations of the BMA algorithm are potentially applied. The default is 100. 
nbest 
a number specifying the number of models of each size returned to bic.glm in the BMA package. The default is 10.
maxNvar 
a number indicating the maximum number of variables used in each iteration of bic.glm from the BMA package. The default is 30.
maxIter 
a number indicating the maximum number of iterations of bic.glm. The default is 20000.
thresProbne0 
a number specifying the threshold for the posterior probability that each variable (gene) is nonzero (in percent). Variables (genes) whose posterior probability is below this threshold are dropped in the iterative application of bic.glm. The default is 1 percent.
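The constraints stated for train.class and p can be checked before training. The following is a minimal sketch in base R; check_training_inputs is a hypothetical helper that is not part of the iterativeBMA package, and it assumes the expression values are available as a plain matrix with genes in rows and samples in columns.

```r
# Hypothetical pre-flight check; not part of the iterativeBMA package.
# expr_mat: numeric matrix, genes in rows and samples in columns (assumed layout).
check_training_inputs <- function(expr_mat, train_class, p = 100) {
  problems <- character(0)
  if (length(train_class) != ncol(expr_mat)) {
    problems <- c(problems, "class vector length differs from number of samples")
  }
  if (!all(train_class %in% c(0, 1))) {
    problems <- c(problems, "class labels must be 0 or 1")
  }
  if (p >= nrow(expr_mat)) {
    problems <- c(problems, "p must be less than the number of genes")
  }
  problems  # empty character vector when every check passes
}

# Toy data: 200 genes by 10 samples
set.seed(1)
expr_mat <- matrix(rnorm(200 * 10), nrow = 200)

ok  <- check_training_inputs(expr_mat, rep(c(0, 1), 5), p = 100)
bad <- check_training_inputs(expr_mat, c(1, 2, 1), p = 500)
```

Here ok is empty, while bad lists one message per violated assumption.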
Details
The training phase consists of first ordering all the variables (genes) by a univariate measure called the between-groups to within-groups sums-of-squares (BSS/WSS) ratio, and then iteratively applying the bic.glm algorithm from the BMA package. In the first application of the bic.glm algorithm, the top maxNvar univariate-ranked genes are used. After each application of the bic.glm algorithm, the genes with probne0 < thresProbne0 are dropped, and the next genes in the univariate ordering are added to the BMA window.
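The ranking and window update described above can be illustrated in base R. This is a sketch under stated assumptions, not the package's implementation: bss_wss is a hypothetical helper using the usual definition of the ratio (class-size-weighted squared deviations of class means from the overall mean, divided by the within-class sum of squares), and the probne0 values are made up to stand in for the output of one bic.glm run.

```r
# Hypothetical helper: BSS/WSS ratio for one gene x given class labels cls.
bss_wss <- function(x, cls) {
  overall <- mean(x)
  bss <- 0
  wss <- 0
  for (k in unique(cls)) {
    xk <- x[cls == k]
    bss <- bss + length(xk) * (mean(xk) - overall)^2  # between-groups part
    wss <- wss + sum((xk - mean(xk))^2)               # within-groups part
  }
  bss / wss
}

# Toy data: 4 genes (rows) by 6 samples (columns), two classes
expr <- rbind(
  g1 = c(1, 2, 3, 11, 12, 13),
  g2 = c(5, 6, 7, 5, 6, 7),
  g3 = c(1, 2, 3, 4, 5, 6),
  g4 = c(2, 2, 3, 9, 8, 9)
)
cls <- c(0, 0, 0, 1, 1, 1)

# Step 1: univariate ranking by BSS/WSS, largest first
ratios <- apply(expr, 1, bss_wss, cls = cls)
ranked <- names(sort(ratios, decreasing = TRUE))

# Step 2: the first window holds the top maxNvar ranked genes
maxNvar <- 2
thresProbne0 <- 50
window <- ranked[seq_len(maxNvar)]
remaining <- ranked[-seq_len(maxNvar)]

# Made-up posterior inclusion probabilities (percent), standing in for
# the probne0 component that a bic.glm run on the window would return.
probne0 <- c(g4 = 90, g1 = 10)

# Step 3: drop genes below the threshold and refill from the ranking
kept <- window[unname(probne0[window]) >= thresProbne0]
refill <- head(remaining, maxNvar - length(kept))
window <- c(kept, refill)
```

In the real procedure this update repeats, with bic.glm rerun on each new window, until the top p ranked genes have been considered.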
Value
An object of class bic.glm returned by the last iteration of bic.glm. The object is a list consisting of the following components:
namesx 
the names of the variables in the last iteration of bic.glm.
postprob 
the posterior probabilities of the models selected. 
deviance 
the estimated model deviances. 
label 
labels identifying the models selected. 
bic 
values of BIC for the models. 
size 
the number of independent variables in each of the models. 
which 
a logical matrix with one row per model and one column per variable indicating whether that variable is in the model. 
probne0 
the posterior probability that each variable is nonzero (in percent). 
postmean 
the posterior mean of each coefficient (from model averaging). 
postsd 
the posterior standard deviation of each coefficient (from model averaging). 
condpostmean 
the posterior mean of each coefficient conditional on the variable being included in the model. 
condpostsd 
the posterior standard deviation of each coefficient conditional on the variable being included in the model. 
mle 
matrix with one row per model and one column per variable giving the maximum likelihood estimate of each coefficient for each model. 
se 
matrix with one row per model and one column per variable giving the standard error of each coefficient for each model. 
reduced 
a logical indicating whether any variables were dropped before model averaging. 
dropped 
a vector containing the names of those variables dropped before model averaging. 
call 
the matched call that created the bic.glm object.
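To show how some of these components relate to one another, here is a small sketch on a hand-built list. The list fit is a toy stand-in, not output from bic.glm, and the computation assumes that probne0 is the sum of the posterior probabilities of the models that include a variable, expressed in percent, which is how the component is described above.

```r
# Toy stand-in for a fitted object: 3 models over 2 variables.
# The values are invented for illustration only.
fit <- list(
  namesx   = c("geneA", "geneB"),
  postprob = c(0.6, 0.3, 0.1),
  which    = rbind(c(TRUE, FALSE),   # model 1: geneA only
                   c(TRUE, TRUE),    # model 2: both genes
                   c(FALSE, TRUE))   # model 3: geneB only
)

# Number of variables in each model (the 'size' component)
size <- rowSums(fit$which)

# Posterior inclusion probability per variable, in percent:
# add up postprob over the models (rows) that contain the variable.
probne0 <- 100 * colSums(fit$postprob * fit$which)
names(probne0) <- fit$namesx

# Variables whose inclusion probability exceeds 50 percent
selected <- fit$namesx[probne0 > 50]
```

The same indexing pattern, namesx filtered by a condition on probne0, is what the Examples section uses to pick the selected genes.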
Note
The BMA and Biobase packages are required.
References
Raftery, A.E. (1995). Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.
Yeung, K.Y., Bumgarner, R.E. and Raftery, A.E. (2005). Bayesian Model Averaging: Development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics 21: 2394-2402.
See Also
iterateBMAglm.train.predict, iterateBMAglm.train.predict.test, bma.predict, brier.score
Examples
library (Biobase)
library (BMA)
library (iterativeBMA)
data(trainData)
data(trainClass)
## training phase: select relevant genes
ret.bic.glm <- iterateBMAglm.train (train.expr.set=trainData, trainClass, p=100)
## get the selected genes with probne0 > 0
ret.gene.names <- ret.bic.glm$namesx[ret.bic.glm$probne0 > 0]
## show the posterior probabilities of selected models
ret.bic.glm$postprob
data (testData)
## get the subset of test data with the genes from the last iteration of bic.glm
curr.test.dat <- t(exprs(testData)[ret.gene.names,])
## to compute the predicted probabilities for the test samples
y.pred.test <- apply (curr.test.dat, 1, bma.predict, postprobArr=ret.bic.glm$postprob, mleArr=ret.bic.glm$mle)
## compute the Brier Score if the class labels of the test samples are known
data (testClass)
brier.score (y.pred.test, testClass)
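
For readers who want to see the idea behind the prediction and scoring steps without installing the packages, the following base R sketch may help. It is not the code of bma.predict or brier.score: predict_bma and brier_sum are hypothetical helpers, the first column of mle is assumed to hold the intercept, and the Brier score is taken here as the sum of squared differences between true class and predicted probability, which may differ from the package's normalization.

```r
# Hypothetical helper: model-averaged class-1 probability for one sample.
# mle: one row per model, first column assumed to be the intercept.
predict_bma <- function(x, postprob, mle) {
  lin <- as.vector(mle %*% c(1, x))  # linear predictor under each model
  sum(postprob * plogis(lin))        # weight per-model probabilities by postprob
}

# Hypothetical helper: sum of squared errors of predicted probabilities.
brier_sum <- function(pred, truth) {
  sum((truth - pred)^2)
}

# Toy inputs: 2 models over 2 genes, invented for illustration
postprob <- c(0.5, 0.5)
mle <- rbind(c(0, 1, 0),   # model 1: intercept 0, geneA coefficient 1
             c(0, 0, 0))   # model 2: all coefficients 0

# Two test samples (rows) with values for geneA and geneB
test_x <- rbind(s1 = c(2, 0),
                s2 = c(0, 5))
truth <- c(1, 0)

preds <- apply(test_x, 1, predict_bma, postprob = postprob, mle = mle)
score <- brier_sum(preds, truth)
```

Averaging the per-model probabilities, rather than predicting from the averaged coefficients, keeps each model's logistic fit intact before weighting by its posterior probability.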
