variable.selection | R Documentation |
The function variable.selection performs variable selection for binary classification.
variable.selection(X, Y, nvar=NULL)
X |
a (n x p) data matrix of predictors. X may be a matrix or a data frame. Each row corresponds to an observation and each column corresponds to a predictor variable. |
Y |
a vector of length n giving the classes of the n observations. The two classes must be coded as 1,2. |
nvar |
the number of variables to be returned. If |
The function variable.selection orders the variables according to the absolute value of the weight defining the first PLS component. This ordering is equivalent to the ordering obtained with the F-statistic and t-test with equal variances (Boulesteix, 2004).
For computational reasons, the function variable.selection does not use the PLS algorithm, but the obtained ordering of the variables is exactly equivalent to the ordering obtained using the PLS weights output by pls.regression.
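The equivalence described above can be checked numerically with base R alone. The following is a minimal sketch, not the package's implementation. It assumes the columns of X are standardized to unit variance, the setting in which the equivalence with the equal-variance t-test is usually derived, and it uses simulated data. The names Xs, yc, w, ord_pls and ord_t are illustrative only.

```r
# Simulated two-class data (an assumption for illustration only)
set.seed(1)
n <- 20
p <- 6
X <- matrix(rnorm(n * p), n, p)
Y <- rep(c(1, 2), each = n / 2)
X[Y == 2, 1] <- X[Y == 2, 1] + 2   # make the first variable informative

# First PLS weight vector, up to a normalizing constant:
# cross-product of the standardized predictors with the centered response
Xs <- scale(X)                     # center columns and scale to unit variance
yc <- Y - mean(Y)
w <- drop(crossprod(Xs, yc))
ord_pls <- order(abs(w), decreasing = TRUE)

# Two-sample t-statistic with equal variances, one per variable
tstat <- unname(apply(X, 2, function(x) {
  t.test(x[Y == 1], x[Y == 2], var.equal = TRUE)$statistic
}))
ord_t <- order(abs(tstat), decreasing = TRUE)

# Barring ties, both orderings should coincide
identical(ord_pls, ord_t)
```

With standardized columns, the absolute weight of a variable is proportional to the absolute difference between its two class means, and the squared t-statistic is a monotone function of that difference, so both criteria rank the variables identically.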
A vector of length nvar (or of length p if nvar=NULL) containing the indices of the variables to be selected. The variables are ordered from the best to the worst variable.
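Because the return value is a vector of column indices, it can be used directly to subset the predictor matrix. The sketch below uses a hard-coded index vector sel as a hypothetical stand-in for the output of variable.selection, so that it runs without the package. The values in sel are not an actual result.

```r
# Small predictor matrix: 4 observations, 3 variables
X <- matrix(c(4, 3, 3, 4, 1, 0, 6, 7, 3, 5, 5, 9), 4, 3, byrow = FALSE)

# Hypothetical selection result (best variable first); in practice this
# would come from something like variable.selection(X, Y, nvar = 2)
sel <- c(3L, 1L)

# Keep only the selected columns, in ranked order;
# drop = FALSE keeps a matrix even when a single variable is selected
X_sel <- X[, sel, drop = FALSE]
dim(X_sel)
```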
Anne-Laure Boulesteix (https://www.ibe.med.uni-muenchen.de/mitarbeiter/professoren/boulesteix/index.html)
A. L. Boulesteix (2004). PLS dimension reduction for classification with microarray data, Statistical Applications in Genetics and Molecular Biology 3, Issue 1, Article 33.
A. L. Boulesteix, K. Strimmer (2007). Partial least squares: a versatile tool for the analysis of high-dimensional genomic data, Briefings in Bioinformatics 8(1), 32–44.
S. de Jong (1993). SIMPLS: an alternative approach to partial least squares regression, Chemometrics Intell. Lab. Syst. 18, 251–263.
pls.regression.
# load plsgenomics library
library(plsgenomics)
# generate X and Y (4 observations and 3 variables)
X<-matrix(c(4,3,3,4,1,0,6,7,3,5,5,9),4,3,byrow=FALSE)
Y<-c(1,1,2,2)
# select the 2 best variables
variable.selection(X,Y,nvar=2)
# order the 3 variables
variable.selection(X,Y)
# load the leukemia data
data(leukemia)
# select the 50 best variables from the leukemia data
variable.selection(leukemia$X,leukemia$Y,nvar=50)