preprocess: Preprocessing of microarray data


View source: R/preprocess.R

Description

The function preprocess performs the preprocessing of microarray data.

Usage

preprocess(Xtrain, Xtest = NULL, Threshold = c(100, 16000), Filtering = c(5, 500),
           log10.scale = TRUE, row.stand = TRUE)

Arguments

Xtrain

a (ntrain x p) data matrix of predictors. Xtrain must be a matrix. Each row corresponds to an observation and each column to a predictor variable.

Xtest

a (ntest x p) matrix containing the predictors for the test data set. Xtest may also be a vector of length p (corresponding to only one test observation).

Threshold

a vector of length 2 containing the values (threshmin, threshmax) used to threshold the data. Values below threshmin are raised to threshmin (floor) and values above threshmax are lowered to threshmax (ceiling). If Threshold is NULL, no thresholding is done. Likewise, if the value given for Threshold is not valid, no thresholding is done.

Filtering

a vector of length 2 containing the values (FiltMin, FiltMax) used to filter genes. Genes with max/min <= FiltMin and (max - min) <= FiltMax are excluded, where max and min are computed over the observations for each gene. If Filtering is NULL, no filtering is done. Likewise, if the value given for Filtering is not valid, no filtering is done.

log10.scale

a logical value; if TRUE, a log10 transformation is applied to the data.

row.stand

a logical value; if TRUE, each row (observation) is standardised, i.e. centred and scaled across genes.
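The Threshold and Filtering rules can be illustrated on a toy matrix. The helpers threshold_sketch and filter_sketch below are hypothetical: they are a minimal sketch written from the argument descriptions above, not code taken from the plsgenomics package.

```r
# Hypothetical helpers illustrating the Threshold and Filtering arguments;
# they are NOT part of the plsgenomics package.

# Thresholding: values below Threshold[1] are raised to it and values above
# Threshold[2] are lowered to it. pmax()/pmin() keep the matrix attributes
# (dim, dimnames) of their first argument.
threshold_sketch <- function(X, Threshold = c(100, 16000)) {
  pmin(pmax(X, Threshold[1]), Threshold[2])
}

# Filtering: a gene (column) is excluded when max/min <= Filtering[1] and
# (max - min) <= Filtering[2], following the rule stated for Filtering.
filter_sketch <- function(X, Filtering = c(5, 500)) {
  gmax <- apply(X, 2, max)
  gmin <- apply(X, 2, min)
  excluded <- (gmax / gmin <= Filtering[1]) & (gmax - gmin <= Filtering[2])
  X[, !excluded, drop = FALSE]
}

# Toy data: 2 observations (rows) x 3 genes (columns)
X <- cbind(g1 = c(50, 200), g2 = c(100, 20000), g3 = c(1000, 1400))
Xt <- threshold_sketch(X)
print(Xt)
print(colnames(filter_sketch(Xt)))
```

On this toy matrix only g2 passes the filter: after thresholding, g1 (max/min = 2, range 100) and g3 (max/min = 1.4, range 400) satisfy both exclusion conditions.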

Details

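The function performs the preprocessing steps recommended by Dudoit et al. (2002): thresholding, filtering, log10 transformation and standardisation of each observation, controlled by the arguments Threshold, Filtering, log10.scale and row.stand respectively. The default values are those suited to the Colon data set.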
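The four steps can be chained as in the following base-R sketch. It is written from the argument descriptions above and is not the plsgenomics source: the function name preprocess_sketch is hypothetical, validity checks on Threshold and Filtering are omitted, and computing the gene filter on Xtrain alone and reusing it for Xtest is an assumption (the Value section only states that both outputs have the same p' columns).

```r
# Hypothetical sketch of the four preprocessing steps; NOT the plsgenomics
# source. Validity checks on Threshold/Filtering are omitted, and the gene
# filter is computed on Xtrain only and reused for Xtest (an assumption).
preprocess_sketch <- function(Xtrain, Xtest = NULL,
                              Threshold = c(100, 16000), Filtering = c(5, 500),
                              log10.scale = TRUE, row.stand = TRUE) {
  # A single test observation may be supplied as a vector of length p
  if (!is.null(Xtest) && is.null(dim(Xtest))) Xtest <- matrix(Xtest, nrow = 1)

  # Step 1: floor at Threshold[1], ceiling at Threshold[2]
  clip <- function(X) {
    if (is.null(Threshold)) X else pmin(pmax(X, Threshold[1]), Threshold[2])
  }
  Xtrain <- clip(Xtrain)

  # Step 2: genes (columns) to keep, using the Filtering rule on Xtrain
  keep <- rep(TRUE, ncol(Xtrain))
  if (!is.null(Filtering)) {
    gmax <- apply(Xtrain, 2, max)
    gmin <- apply(Xtrain, 2, min)
    keep <- !((gmax / gmin <= Filtering[1]) & (gmax - gmin <= Filtering[2]))
  }

  # Steps 3 and 4: log10 transform, then standardise each row (observation)
  # to mean 0 and variance 1 across the retained genes
  finish <- function(X) {
    X <- X[, keep, drop = FALSE]
    if (log10.scale) X <- log10(X)
    if (row.stand) X <- t(scale(t(X)))
    X
  }

  list(pXtrain = finish(Xtrain),
       pXtest = if (is.null(Xtest)) NULL else finish(clip(Xtest)))
}

# Toy data: 3 training observations and 1 test observation, 4 genes
Xtrain <- rbind(c(50, 300, 1000, 5000),
                c(200, 900, 1100, 20000),
                c(120, 3000, 1300, 800))
Xtest <- c(80, 400, 1200, 30000)

res <- preprocess_sketch(Xtrain, Xtest)
dim(res$pXtrain)  # 3 2: genes 1 and 3 are filtered out
dim(res$pXtest)   # 1 2
```

Reusing one gene selection for both sets is what keeps pXtrain and pXtest column-compatible. Note also that the log10 step relies on positive values, which thresholding with threshmin > 0 guarantees.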

Value

A list with the following components:

pXtrain

the (ntrain x p') matrix containing the preprocessed train data, where p' (at most p) is the number of genes retained after filtering.

pXtest

the (ntest x p') matrix containing the preprocessed test data.

Author(s)

Sophie Lambert-Lacroix (http://membres-timc.imag.fr/Sophie.Lambert/) and Julie Peyre (http://www-lmc.imag.fr/lmc-sms/Julie.Peyre/).

References

Dudoit, S., Fridlyand, J. and Speed, T. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97, 77–87.

Examples

# load plsgenomics library
library(plsgenomics)

# load Colon data
data(Colon)

# random learning set: 27 observations with Y == 2 and 14 with Y == 1
# (the split is random; call set.seed() first for reproducible results)
IndexLearn <- c(sample(which(Colon$Y == 2), 27), sample(which(Colon$Y == 1), 14))

Xtrain <- Colon$X[IndexLearn, ]
Ytrain <- Colon$Y[IndexLearn]
Xtest <- Colon$X[-IndexLearn, ]

# preprocess data
resP <- preprocess(Xtrain = Xtrain, Xtest = Xtest, Threshold = c(100, 16000),
                   Filtering = c(5, 500), log10.scale = TRUE, row.stand = TRUE)

# how many genes remain after preprocessing?
dim(resP$pXtrain)[2]

Example output

[1] 1157

plsgenomics documentation built on May 2, 2019, 4:51 p.m.