The function preprocess
performs preprocessing of microarray data.
Xtrain 
a (ntrain x p) data matrix of predictors. 
Xtest 
a (ntest x p) matrix containing the predictors for the test data set.
Threshold 
a vector of length 2 containing the values (threshmin,threshmax) for
thresholding data in preprocess. Values below threshmin are set to threshmin and values above
threshmax are set to threshmax. If 
Filtering 
a vector of length 2 containing the values (FiltMin,FiltMax) for filtering genes
in preprocess. Genes with max/min <= FiltMin and (max - min) <= FiltMax are excluded, where max and min are the largest and smallest values of a gene across samples.
If 
log10.scale 
a logical value; if TRUE, the data are log10-transformed. 
row.stand 
a logical value; if TRUE, each row (sample) of the data is standardised. 
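A minimal R sketch of the Threshold and Filtering rules as worded above, applied to a toy matrix with samples in rows and genes in columns. The helpers threshold_matrix and filter_genes are hypothetical (they are not plsgenomics functions), the toy values are made up, and computing max and min per gene across samples is an assumption.

```r
# Hypothetical helpers illustrating the Threshold and Filtering rules;
# they are NOT part of plsgenomics.

# Floor values at threshold[1] (threshmin) and cap them at threshold[2] (threshmax).
threshold_matrix <- function(X, threshold = c(100, 16000)) {
  pmin(pmax(X, threshold[1]), threshold[2])   # pmax/pmin keep the matrix dims
}

# Return the indices of the genes (columns) to keep. Following the text as
# worded, a gene is excluded when max/min <= FiltMin and (max - min) <= FiltMax.
filter_genes <- function(X, filtering = c(5, 500)) {
  gmax <- apply(X, 2, max)   # per-gene maximum across samples
  gmin <- apply(X, 2, min)   # per-gene minimum across samples
  excluded <- (gmax / gmin <= filtering[1]) & (gmax - gmin <= filtering[2])
  which(!excluded)
}

# Toy data (made up): 3 samples (rows) x 4 genes (columns), filled column-wise
X <- matrix(c(  50,   200,  300,    # gene 1
               600,  1000, 4000,    # gene 2
               100, 20000, 9000,    # gene 3
               150,   160,  170),   # gene 4
            nrow = 3)
Xt <- threshold_matrix(X)   # 50 -> 100 and 20000 -> 16000
keep <- filter_genes(Xt)
keep                        # 2 3
```

Genes 1 and 4 vary too little (max/min <= 5 and max - min <= 500) and are dropped; genes 2 and 3 are kept.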
The preprocessing steps recommended by Dudoit et al. (2002) are performed: thresholding, filtering, log10 transformation and row standardisation. The default values are those suited to the Colon data.
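The steps can be strung together as in the following R sketch. preprocess_sketch is a hypothetical re-implementation for illustration only, not the plsgenomics source. Two points are assumptions: which genes to filter is decided on the training data alone and the same genes are then dropped from the test data, and row standardisation means scaling each sample (row) to mean 0 and standard deviation 1.

```r
# Hypothetical re-implementation of the preprocessing steps, for illustration
# only; it is NOT the plsgenomics source. Assumptions: the gene filter is
# computed on Xtrain and reused for Xtest, and row standardisation means
# mean 0 / standard deviation 1 for each sample (row).
preprocess_sketch <- function(Xtrain, Xtest = NULL,
                              Threshold = c(100, 16000),
                              Filtering = c(5, 500),
                              log10.scale = TRUE, row.stand = TRUE) {
  has_test <- !is.null(Xtest)

  # 1. Thresholding: floor at Threshold[1], ceiling at Threshold[2]
  if (!is.null(Threshold)) {
    Xtrain <- pmin(pmax(Xtrain, Threshold[1]), Threshold[2])
    if (has_test) Xtest <- pmin(pmax(Xtest, Threshold[1]), Threshold[2])
  }

  # 2. Filtering: drop genes with max/min <= FiltMin and (max - min) <= FiltMax
  if (!is.null(Filtering)) {
    gmax <- apply(Xtrain, 2, max)
    gmin <- apply(Xtrain, 2, min)
    keep <- !((gmax / gmin <= Filtering[1]) & (gmax - gmin <= Filtering[2]))
    Xtrain <- Xtrain[, keep, drop = FALSE]
    if (has_test) Xtest <- Xtest[, keep, drop = FALSE]
  }

  # 3. Base-10 log transformation
  if (log10.scale) {
    Xtrain <- log10(Xtrain)
    if (has_test) Xtest <- log10(Xtest)
  }

  # 4. Row standardisation: scale() works on columns, so transpose twice
  if (row.stand) {
    Xtrain <- t(scale(t(Xtrain)))
    if (has_test) Xtest <- t(scale(t(Xtest)))
  }

  list(pXtrain = Xtrain, pXtest = Xtest)
}

# Toy data (made up): 3 training and 2 test samples, 4 genes
Xtr <- matrix(c( 50,   200,  300,   600, 1000, 4000,
                100, 20000, 9000,   150,  160,  170), nrow = 3)
Xte <- matrix(c(120, 80,   700, 3000,   500, 20000,   160, 155), nrow = 2)
res <- preprocess_sketch(Xtr, Xte)
dim(res$pXtrain)  # 3 2
dim(res$pXtest)   # 2 2
```

Because the filter is decided on Xtrain, both returned matrices have the same p' columns, matching the (ntrain x p') and (ntest x p') shapes of the returned list.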
A list with the following components:
pXtrain 
the (ntrain x p') matrix containing the preprocessed training data, where p' is the number of genes retained after filtering. 
pXtest 
the (ntest x p') matrix containing the preprocessed test data. 
Sophie Lambert-Lacroix (http://membrestimc.imag.fr/Sophie.Lambert/) and Julie Peyre (http://wwwlmc.imag.fr/lmcsms/Julie.Peyre/).
Dudoit, S., Fridlyand, J. and Speed, T. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97, 77–87.
# load plsgenomics library
library(plsgenomics)
# load Colon data
data(Colon)
# random split: 27 samples of class 2 and 14 of class 1 form the training set
set.seed(1)  # make the random split reproducible
IndexLearn <- c(sample(which(Colon$Y==2),27),sample(which(Colon$Y==1),14))
Xtrain <- Colon$X[IndexLearn,]
Ytrain <- Colon$Y[IndexLearn]
# the remaining samples form the test set
Xtest <- Colon$X[-IndexLearn,]
# preprocess data
resP <- preprocess(Xtrain=Xtrain, Xtest=Xtest, Threshold=c(100,16000), Filtering=c(5,500),
                   log10.scale=TRUE, row.stand=TRUE)
# number of genes retained after preprocessing
dim(resP$pXtrain)[2]
