View source: R/Feature_Selection.R
This function identifies the class-correlated principal components (PCs), which are then used to implement a backward variable elimination procedure for the removal of non-informative features.
DaMiR.FSelect(data, df, th.corr = 0.6, type = c("spearman", "pearson"),
  th.VIP = 3, nPlsIter = 1)
data |
A transposed data frame or a matrix of normalized expression data. Rows and columns should be observations and features, respectively |
df |
A data frame with known variables; at least one column labeled 'class' must be included |
th.corr |
Minimum threshold of correlation between class and PCs; default is 0.6. Note: if df$class has more than two levels, this option is disabled and the number of PCs is set to 3. |
type |
Type of correlation metric; default is "spearman" |
th.VIP |
Threshold for |
nPlsIter |
Number of times that bve_pls has to run. Each iteration produces a set of selected features, usually similar to each other but not exactly the same. When nPlsIter is > 1, the intersection of the sets of selected features is taken, so that only the most robust features are retained. Default is 1 |
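As a rough illustration of the intersection step described for nPlsIter, the sketch below intersects several feature sets in base R. The gene names and the `run_sets` list are hypothetical placeholders, not output of DaMiR.FSelect, and this is not necessarily how the package implements the step internally.

```r
# Hypothetical feature sets, standing in for the results of three selection runs
run_sets <- list(
  c("geneA", "geneB", "geneC", "geneD"),
  c("geneA", "geneC", "geneD", "geneE"),
  c("geneA", "geneC", "geneF")
)

# Keep only the features that appear in every run
robust_features <- Reduce(intersect, run_sets)
print(robust_features)
```

Features selected in only some of the runs (here geneB, geneD, geneE and geneF) are dropped, which is why a larger nPlsIter tends to give a smaller but more stable set.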
The function aims to reduce the number of features in order to obtain the most informative variables for classification purposes. First, the PCs obtained by principal component analysis (PCA) are correlated with "class". The correlation threshold is defined by the user through the th.corr argument: the higher the threshold, the fewer PCs are returned. Importantly, if df$class has more than two levels, the number of PCs is automatically set to 3. In a binary experimental setting, users should take care to set the th.corr argument appropriately, because it also affects the total number of selected features, which ultimately depends on the number of PCs. The bve_pls function of the plsVarSel package is then applied. This function exploits a backward variable elimination procedure coupled with a partial least squares approach to remove those variables that are less informative with respect to class. The returned vector of variables is further reduced by the subsequent DaMiR.FReduct function in order to obtain a subset of non-correlated putative predictors.
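The first step described above, correlating PCs with class and keeping those above a threshold, can be sketched in base R as follows. This is a minimal illustration on simulated data using stats::prcomp and stats::cor; the data, the variable names (`expr`, `th_corr`, `selected_pcs`) and the choice to inspect only the first 10 PCs are assumptions made for the example, and the code is not taken from the DaMiR.FSelect source.

```r
# Simulated data (for illustration only): 40 observations x 30 features
set.seed(1)
n_obs <- 40
class <- factor(rep(c("A", "B"), each = n_obs / 2))
expr <- matrix(rnorm(n_obs * 30), nrow = n_obs,
               dimnames = list(NULL, paste0("feat", 1:30)))
# Shift the first ten features in class B so that they carry class signal
expr[class == "B", 1:10] <- expr[class == "B", 1:10] + 3

# PCA on the observations-by-features matrix
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Absolute Spearman correlation between the scores of the first 10 PCs
# and the numerically coded class
pc_class_cor <- abs(apply(pca$x[, 1:10], 2, cor,
                          y = as.numeric(class), method = "spearman"))

# Keep the PCs whose correlation with class reaches the threshold
th_corr <- 0.6
selected_pcs <- names(pc_class_cor)[pc_class_cor >= th_corr]
print(selected_pcs)
```

Raising `th_corr` can only shrink `selected_pcs`, which mirrors the statement that a higher correlation threshold returns fewer PCs.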
A list containing:
An expression matrix with only informative features.
A data frame with class and optional variables information.
Mattia Chiesa, Luca Piacentini
Tahir Mehmood, Kristian Hovde Liland, Lars Snipen and Solve Saebo (2011). A review of variable selection methods in Partial Least Squares Regression. Chemometrics and Intelligent Laboratory Systems 118, pp. 62-69.
bve_pls
DaMiR.FReduct
# use example data:
library(DaMiRseq)
data(data_norm)
data(df)
# extract expression data from SummarizedExperiment object
# and transpose the matrix:
t_data <- t(assay(data_norm))
# keep only the first 100 features
t_data <- t_data[, seq_len(100)]
# select class-related features
data_reduced <- DaMiR.FSelect(t_data, df,
  th.corr = 0.7, type = "spearman", th.VIP = 1)