fs.pls: Feature Selection Using PLS


View source: R/mt_fs.R

Description

Feature selection using the regression coefficients and the VIP (variable importance in projection) values of partial least squares (PLS).

Usage

  fs.pls(x, y, pls = "simpls", ncomp = 10, ...)
  fs.plsvip(x, y, ncomp = 10, ...)
  fs.plsvip.1(x, y, ncomp = 10, ...)
  fs.plsvip.2(x, y, ncomp = 10, ...)

Arguments

x

A data frame or matrix of data, with samples in rows and features in columns.

y

A factor or vector of class labels, one per row of x.

pls

A method for calculating PLS scores and loadings. The following methods are supported:

  • simpls: SIMPLS algorithm.

  • kernelpls: kernel algorithm.

  • oscorespls: orthogonal scores algorithm.

For details, see simpls.fit, kernelpls.fit and oscorespls.fit in package pls.

ncomp

The number of components to be used.

...

Arguments passed to or from other methods.

Details

fs.pls ranks features by their PLS regression coefficients. Because a classification (categorical) problem is encoded as a dummy multiple-response matrix, the coefficients form a matrix rather than a vector, so the Mahalanobis distance of the coefficients is used to rank the features. (Other summaries, such as the sum of the absolute values of the coefficients or the square root of the coefficients, could be used instead.)
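The idea of collapsing a coefficient matrix into one score per feature can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: coef_mat is a hypothetical, randomly generated coefficient matrix (features in rows, dummy responses in columns), and the distance is taken from the column means using the sample covariance of the rows.

```r
## Hypothetical coefficient matrix: 6 features (rows) x 3 dummy responses (columns).
## In practice this would come from a fitted PLS model; here it is simulated.
set.seed(1)
coef_mat <- matrix(rnorm(18), nrow = 6,
                   dimnames = list(paste0("f", 1:6), paste0("class", 1:3)))

## Squared Mahalanobis distance of each feature's coefficient vector
## from the column means (assumed centre and covariance, for illustration).
maha <- mahalanobis(coef_mat, center = colMeans(coef_mat), cov = cov(coef_mat))

## Alternative summary mentioned in the text: sum of absolute coefficients.
abs_sum <- rowSums(abs(coef_mat))

## Features ordered from largest to smallest score.
fs_order <- names(sort(maha, decreasing = TRUE))
fs_order
```

Either score gives one number per feature, so the features can be sorted; the two summaries need not produce the same ordering.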

fs.plsvip and fs.plsvip.1 select features using the Mahalanobis distance and the absolute values of the PLS VIP scores, respectively.

fs.plsvip.2 is similar to fs.plsvip and fs.plsvip.1, but the categorical response is not converted to a dummy multiple-response matrix.
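The difference between a dummy multiple-response matrix and a single response can be illustrated in base R. This is a sketch under assumptions: the class labels are hypothetical, model.matrix is used here only to show the indicator encoding, and the numeric coding via as.numeric is one possible single-response encoding; the package's own encodings may differ.

```r
## Hypothetical class labels for five samples.
y <- factor(c("a", "b", "c", "a", "b"))

## Dummy (indicator) encoding: one column per class, one 1 per row.
dummy <- model.matrix(~ y - 1)
dim(dummy)

## Single-response encoding (assumed for illustration): class codes as numbers.
single <- as.numeric(y)
single
```

With the dummy encoding, PLS yields one coefficient or VIP column per class, which is why a per-feature summary such as a distance is needed; with a single response there is only one value per feature.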

Value

A list with components:

fs.rank

A vector of feature ranking scores.

fs.order

A vector of feature order from best to worst.

stats

A vector of the measurements on which the ranking is based.
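How the three components might relate can be sketched as follows. The values in stats are made up, and the sketch assumes that a larger measurement means a more important feature and that rank 1 is best; the package's handling of ties and signs may differ.

```r
## Hypothetical per-feature measurements (larger = more important, by assumption).
stats <- c(f1 = 0.8, f2 = 2.5, f3 = 0.1, f4 = 1.7)

## Rank 1 goes to the feature with the largest measurement.
fs.rank <- rank(-stats, ties.method = "first")

## Feature names ordered from best to worst.
fs.order <- names(stats)[order(fs.rank)]

fs.rank
fs.order
```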

Author(s)

Wanchang Lin

See Also

feat.rank.re

Examples

## prepare data set
library(mt)
data(abr1)
cls <- factor(abr1$fact$class)
dat <- abr1$pos
## dat <- abr1$pos[,110:1930]

## replace zeros with NAs
dat <- mv.zene(dat)

## missing values summary
mv <- mv.stats(dat, grp=cls) 
mv    ## View the missing value pattern

## filter missing value variables
## dim(dat)
dat <- dat[,mv$mv.var < 0.15]
## dim(dat)

## fill NAs with mean
dat <- mv.fill(dat,method="mean")

## log transformation
dat <- preproc(dat, method="log10")

## select class "1" and "2" for feature ranking
ind <- grepl("1|2", cls)
mat <- dat[ind,,drop=FALSE] 
mat <- as.matrix(mat)
grp <- cls[ind, drop=TRUE]   

## apply PLS methods for feature selection
res.pls      <- fs.pls(mat,grp, ncomp=4)
res.plsvip   <- fs.plsvip(mat,grp, ncomp=4)
res.plsvip.1 <- fs.plsvip.1(mat,grp, ncomp=4)
res.plsvip.2 <- fs.plsvip.2(mat,grp, ncomp=4)

## check differences among these methods
fs.order <- data.frame(pls      = res.pls$fs.order,
                       plsvip   = res.plsvip$fs.order,
                       plsvip.1 = res.plsvip.1$fs.order,
                       plsvip.2 = res.plsvip.2$fs.order)
head(fs.order, 20)

mt documentation built on Feb. 2, 2022, 1:07 a.m.
