Classification and variable selection on microarray data. This is a multivariate technique to select a small number of relevant variables (typically genes) to classify microarray samples. This function performs both the training and the prediction steps. The data are assumed to consist of two classes. Logistic regression is used for classification.
iterateBMAglm.train.predict (train.expr.set, test.expr.set, train.class, p=100, nbest=10, maxNvar=30, maxIter=20000, thresProbne0=1)

train.expr.set 
an 
test.expr.set 
an 
train.class 
class vector for the observations (samples or experiments) in the training data. Class numbers are assumed to start from 0, and the length of this class vector should equal the number of rows in train.dat. Since two-class data are assumed, the class vector should consist of zeros and ones.
p 
a number indicating the maximum number of top univariate genes used in the iterative BMA algorithm. This number is assumed to be less than the total number of genes in the training data. A larger p usually requires longer computational time as more iterations of the BMA algorithm are potentially applied. The default is 100. 
nbest 
a number specifying the number of models of each size
returned to 
maxNvar 
a number indicating the maximum number of variables used in
each iteration of 
maxIter 
a number indicating the maximum number of iterations of

thresProbne0 
a number specifying the threshold for the posterior
probability that each variable (gene) is nonzero (in
percent). Variables (genes) with such posterior
probability less than this threshold are dropped in
the iterative application of 
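The thresProbne0 rule described above can be illustrated with a minimal base R sketch. The gene names and posterior probabilities below are invented for illustration and are not output from the package; the sketch only assumes that the probabilities are expressed in percent and that variables strictly below the threshold are dropped.

```r
# Hypothetical posterior probabilities (in percent) that each gene's
# coefficient is nonzero; these values are made up for illustration.
probne0 <- c(geneA = 100, geneB = 0.4, geneC = 12.5, geneD = 0)

# Threshold in percent, matching the documented default of 1.
thresProbne0 <- 1

# Genes below the threshold are dropped; the rest are kept.
keep <- names(probne0)[probne0 >= thresProbne0]
print(keep)
```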
This function consists of a training phase and a prediction phase. The training phase first orders all the variables (genes) by a univariate measure called the between-groups to within-groups sums-of-squares (BSS/WSS) ratio, and then iteratively applies the bic.glm algorithm from the BMA package. The prediction phase uses the variables (genes) selected in the training phase to classify the samples in the test set.
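As a rough illustration of the univariate ordering step, the sketch below computes a BSS/WSS ratio per gene in base R. It is not the package's own code: the helper name bss_wss_ratio, the toy data, and the layout (rows are samples, columns are genes) are assumptions made for this example.

```r
# Hypothetical helper: BSS/WSS ratio for each gene.
# expr: numeric matrix, rows = samples, columns = genes (assumed layout)
# cls:  vector of 0/1 class labels, one per sample
bss_wss_ratio <- function(expr, cls) {
  overall <- colMeans(expr)
  bss <- numeric(ncol(expr))
  wss <- numeric(ncol(expr))
  for (k in unique(cls)) {
    in_k <- cls == k
    class_mean <- colMeans(expr[in_k, , drop = FALSE])
    # Between-groups part: class size times squared distance of the
    # class mean from the overall mean.
    bss <- bss + sum(in_k) * (class_mean - overall)^2
    # Within-groups part: squared deviations from the class mean.
    centered <- sweep(expr[in_k, , drop = FALSE], 2, class_mean)
    wss <- wss + colSums(centered^2)
  }
  bss / wss
}

# Toy data: 6 samples, 3 genes; only g1 separates the two classes.
expr <- cbind(g1 = c(1, 2, 3, 11, 12, 13),
              g2 = c(5, 1, 3, 2, 4, 6),
              g3 = c(2, 4, 6, 1, 3, 5))
cls <- c(0, 0, 0, 1, 1, 1)

ratio <- bss_wss_ratio(expr, cls)
top <- names(sort(ratio, decreasing = TRUE))  # genes ordered by BSS/WSS
```

Genes with a large ratio have class means that are far apart relative to the spread within each class, which is why they are placed first in the ordering.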
A vector consisting of the predicted probability that each test sample belongs to class 1 is returned.
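One way such a probability vector might be used is sketched below. The probabilities and true classes are invented, the 0.5 cutoff is an assumption rather than something the function applies, and the squared-error summary is only in the spirit of a Brier score; the exact scaling used by brier.score is not stated here.

```r
# Hypothetical predicted probabilities of class 1 for four test samples
# (made-up values, not real output of iterateBMAglm.train.predict).
pred_prob <- c(0.92, 0.15, 0.60, 0.08)
true_class <- c(1, 0, 0, 0)

# Assign class 1 when the predicted probability exceeds 0.5 (assumed cutoff).
pred_class <- as.integer(pred_prob > 0.5)

# Number of misclassified test samples.
n_errors <- sum(pred_class != true_class)

# Sum of squared differences between true classes and predicted probabilities.
sq_err <- sum((true_class - pred_prob)^2)
```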
The BMA and Biobase packages are required.
Raftery, A.E. (1995). Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.
Yeung, K.Y., Bumgarner, R.E. and Raftery, A.E. (2005). Bayesian Model Averaging: Development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics 21: 2394-2402.
iterateBMAglm.train, iterateBMAglm.train.predict.test, brier.score
library(Biobase)
library(BMA)
library(iterativeBMA)
## load the training data, training class labels and test data
data(trainData)
data(trainClass)
data(testData)
## train on trainData and predict class 1 probabilities for testData
ret.vec <- iterateBMAglm.train.predict(train.expr.set=trainData, test.expr.set=testData, trainClass, p=100)
## compute the Brier Score
data(testClass)
brier.score(ret.vec, testClass)
