DaMiR.FSelect: Feature selection for classification


View source: R/Feature_Selection.R

Description

This function identifies the principal components (PCs) that correlate with class and uses them in a backward variable elimination procedure to remove non-informative features.

Usage

DaMiR.FSelect(
  data,
  df,
  th.corr = 0.6,
  type = c("spearman", "pearson"),
  th.VIP = 3,
  nPlsIter = 1
)

Arguments

data

A transposed data frame or matrix of normalized expression data. Rows and columns should correspond to observations and features, respectively.

df

A data frame with known variables; it must include at least one column labeled 'class'.

th.corr

Minimum threshold of correlation between class and PCs; default is 0.6. Note: if df$class has more than two levels, this option is disabled and the number of PCs is set to 3.

type

Type of correlation metric; default is "spearman"

th.VIP

Threshold used by the bve_pls function to remove non-important variables; default is 3

nPlsIter

Number of times bve_pls is run. Each iteration produces a set of selected features; the sets are usually similar to each other but not exactly the same. When nPlsIter is > 1, the sets are intersected, so that only the most robust features are selected. Default is 1
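
The intersection step described for nPlsIter can be pictured with a minimal base R sketch. The feature names and the three selection sets below are made up for illustration; they are not output from bve_pls, and the sketch is not taken from the package source.

```r
# Hypothetical feature sets standing in for three bve_pls runs
# (names are illustrative only, not real output).
runs <- list(
  c("geneA", "geneB", "geneC", "geneD"),
  c("geneA", "geneC", "geneD", "geneE"),
  c("geneA", "geneC", "geneF")
)

# Keep only the features selected in every run.
robust_features <- Reduce(intersect, runs)
print(robust_features)
```

Because a feature must appear in every run to survive, increasing the number of runs can only keep the result the same size or shrink it.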

Details

The function reduces the number of features in order to retain the variables that are most informative for classification. First, the PCs obtained by principal component analysis (PCA) are correlated with "class". The correlation threshold is set by the user through the th.corr argument: the higher the threshold, the fewer PCs are returned. Importantly, if df$class has more than two levels, the number of PCs is automatically set to 3. In a binary experimental setting, users should set th.corr with care, because the number of PCs it retains also affects the total number of selected features.

Then, the bve_pls function of the plsVarSel package is applied. This function couples a backward variable elimination procedure with a partial least squares approach to remove the variables that are less informative with respect to class. The returned vector of variables can be further reduced by the DaMiR.FReduct function, the next step of the workflow, in order to obtain a subset of non-correlated putative predictors.
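
The first step, correlating PCs with class and keeping those above a threshold, can be sketched in base R as follows. This is only an illustration on simulated data, not the package's implementation: the simulated matrix, the variable names (x, class_label, th_corr), the use of prcomp with scaling, and the choice of the absolute correlation are all assumptions of the sketch.

```r
set.seed(1)

# Simulated data: 40 observations (rows) x 10 features (columns).
n <- 40
x <- matrix(rnorm(n * 10), nrow = n)
class_label <- factor(rep(c("A", "B"), each = n / 2))

# Shift the first four features in class B so that some PCs carry class signal.
x[class_label == "B", 1:4] <- x[class_label == "B", 1:4] + 4

# PCA on centered and scaled features.
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Absolute Spearman correlation between each PC and the numeric-coded class
# (taking the absolute value is an assumption of this sketch).
pc_corr <- abs(cor(pca$x, as.numeric(class_label), method = "spearman"))[, 1]

# PCs whose correlation with class reaches the threshold.
th_corr <- 0.6
selected_pcs <- names(pc_corr)[which(pc_corr >= th_corr)]
print(selected_pcs)
```

Raising th_corr in this sketch can only shorten selected_pcs, which mirrors the relationship between th.corr and the number of PCs described above.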

Value

A list containing:

Author(s)

Mattia Chiesa, Luca Piacentini

References

Tahir Mehmood, Kristian Hovde Liland, Lars Snipen and Solve Saebo (2011). A review of variable selection methods in Partial Least Squares Regression. Chemometrics and Intelligent Laboratory Systems 118, pp. 62-69.

See Also

Examples

# load the package and the example data:
library(DaMiRseq)
data(data_norm)
data(df)
# extract expression data from the SummarizedExperiment object
# and transpose the matrix so that rows are observations:
t_data <- t(assay(data_norm))
# keep only the first 100 features
t_data <- t_data[, seq_len(100)]
# select class-related features
data_reduced <- DaMiR.FSelect(t_data, df,
                              th.corr = 0.7, type = "spearman", th.VIP = 1)

BioinfoMonzino/DaMiRseq documentation built on Aug. 22, 2021, 3:11 p.m.