The function `variable.selection` performs variable selection for binary classification.

```
variable.selection(X, Y, nvar=NULL)
```

| Argument | Description |
|----------|-------------|
| `X` | an (n x p) data matrix of predictors. `X` may be a matrix or a data frame. Each row corresponds to an observation and each column corresponds to a predictor variable. |
| `Y` | a vector of length n giving the classes of the n observations. The two classes must be coded as 1 and 2. |
| `nvar` | the number of variables to be returned. If `nvar=NULL` (the default), all p variables are returned. |

The function `variable.selection` orders the variables according to the absolute value of the weight defining the first PLS component. This ordering is equivalent to the ordering obtained with the F-statistic and the t-test with equal variances (Boulesteix, 2004).
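A short derivation shows why these orderings agree. It is a sketch under stated assumptions, not a description of the package internals: each predictor is assumed to be standardized to unit variance, and the first weight is taken to be proportional to the covariance between the predictor and the centered class label. The symbols below (class sizes, class means, sums of squares) are notation introduced here for illustration.

```latex
% Notation (introduced here): n_1, n_2 are the class sizes, n = n_1 + n_2,
% \bar{x}_{j1}, \bar{x}_{j2} are the class means of predictor j,
% s_j is its overall standard deviation, BSS/WSS are between/within sums of squares.
\begin{align*}
\tilde{y}_i &= y_i - \bar{y} =
  \begin{cases} -n_2/n, & y_i = 1 \\ \phantom{-}n_1/n, & y_i = 2 \end{cases} \\
w_j &\propto \frac{1}{s_j}\sum_{i=1}^{n} x_{ij}\,\tilde{y}_i
     = \frac{n_1 n_2}{n}\,\frac{\bar{x}_{j2}-\bar{x}_{j1}}{s_j} \\
(n-1)\,s_j^2 &= \mathrm{BSS}_j + \mathrm{WSS}_j, \qquad
\mathrm{BSS}_j = \frac{n_1 n_2}{n}\,(\bar{x}_{j2}-\bar{x}_{j1})^2 \\
w_j^2 &\propto \frac{\mathrm{BSS}_j}{\mathrm{BSS}_j+\mathrm{WSS}_j}
      = \frac{F_j}{F_j + n - 2}, \qquad
F_j = t_j^2 = \frac{\mathrm{BSS}_j}{\mathrm{WSS}_j/(n-2)}
\end{align*}
```

Because F/(F + n - 2) is strictly increasing in F, sorting the predictors by the absolute weight gives the same order as sorting by the F-statistic or by the absolute equal-variance t statistic.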

For computational reasons, the function `variable.selection` does not use the PLS algorithm, but the obtained ordering of the variables is exactly equivalent to the ordering obtained using the PLS weights output by `pls.regression`.
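The equivalence can be checked numerically on a small example. The sketch below is not the package source and does not call `pls.regression`: it assumes the columns of `X` are standardized, computes the first PLS weight vector in closed form as proportional to `t(Xs) %*% yc`, and compares the resulting ordering with the one given by the equal-variance t statistic from base R's `t.test`. The names `Xs`, `yc`, `rank_pls` and `rank_t` are illustrative.

```r
# Toy data: 4 observations, 3 predictors, classes coded 1/2
X <- matrix(c(4, 3, 3, 4, 1, 0, 6, 7, 3, 5, 5, 9), 4, 3, byrow = FALSE)
Y <- c(1, 1, 2, 2)

# Assumption: predictors are standardized before the weight is computed
Xs <- scale(X)        # center each column and scale it to unit variance
yc <- Y - mean(Y)     # centered class labels

# First PLS weight vector, proportional to t(Xs) %*% yc, normalised to unit length
w <- drop(crossprod(Xs, yc))
w <- w / sqrt(sum(w^2))
rank_pls <- order(abs(w), decreasing = TRUE)

# Equal-variance two-sample t statistic for each predictor, no PLS involved
t_stat <- apply(X, 2, function(x) {
  t.test(x[Y == 1], x[Y == 2], var.equal = TRUE)$statistic
})
rank_t <- order(abs(t_stat), decreasing = TRUE)

# Both orderings should coincide
print(rank_pls)
print(rank_t)
print(identical(rank_pls, rank_t))
```

On this toy data both orderings are 2, 3, 1: the second predictor separates the classes best, and the first has identical class means.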

A vector of length `nvar` (or of length p if `nvar=NULL`) containing the indices of the variables to be selected. The variables are ordered from the best to the worst variable.
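As a usage sketch, the returned indices can be used directly to subset the columns of `X`. The snippet assumes the plsgenomics package is installed and skips the call otherwise; `sel` and `X_reduced` are illustrative names.

```r
# Toy data: 4 observations, 3 predictors, classes coded 1/2
X <- matrix(c(4, 3, 3, 4, 1, 0, 6, 7, 3, 5, 5, 9), 4, 3, byrow = FALSE)
Y <- c(1, 1, 2, 2)

if (requireNamespace("plsgenomics", quietly = TRUE)) {
  # Indices of the 2 top-ranked predictors, best first
  sel <- plsgenomics::variable.selection(X, Y, nvar = 2)

  # Keep only the selected predictors, in ranked order;
  # drop = FALSE keeps a matrix even if a single column is selected
  X_reduced <- X[, sel, drop = FALSE]
  print(dim(X_reduced))
}
```

Using `drop = FALSE` keeps the result a matrix when `nvar=1`, which matters if the reduced data is passed on to functions that expect a matrix.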

Anne-Laure Boulesteix (http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/index.html)

A. L. Boulesteix (2004). PLS dimension reduction for classification with microarray data,
Statistical Applications in Genetics and Molecular Biology **3**, Issue 1, Article 33.

A. L. Boulesteix, K. Strimmer (2007). Partial least squares: a versatile tool for the analysis of high-dimensional genomic data, Briefings in Bioinformatics **7**, 32–44.

S. de Jong (1993). SIMPLS: an alternative approach to partial least squares
regression, Chemometrics Intell. Lab. Syst. **18**, 251–263.

```
# load plsgenomics library
library(plsgenomics)

# generate X and Y (4 observations and 3 variables)
X <- matrix(c(4, 3, 3, 4, 1, 0, 6, 7, 3, 5, 5, 9), 4, 3, byrow = FALSE)
Y <- c(1, 1, 2, 2)

# select the 2 best variables
variable.selection(X, Y, nvar = 2)

# order the 3 variables
variable.selection(X, Y)

# load the leukemia data
data(leukemia)

# select the 50 best variables from the leukemia data
variable.selection(leukemia$X, leukemia$Y, nvar = 50)
```
