DaMiR.FSort: Order features by importance, using the RReliefF filter


View source: R/Feature_Selection.R

Description

This function ranks features by their importance, as measured by the RReliefF score.
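RReliefF is an adaptation of the Relief filter, and the Relief family broadly scores a feature by how well its values separate nearby observations of different classes. The R sketch below shows that idea using the simpler original two-class Relief update (nearest hit versus nearest miss). It is not RReliefF and not the DaMiRseq implementation. The function `relief_sketch` and its `f_sample` argument are hypothetical, and using the fraction as the share of observations drawn as reference instances is an assumption made for illustration.

```r
# Minimal sketch of the classic two-class Relief weight update.
# NOT RReliefF and NOT DaMiRseq code: illustration only.
# X: numeric matrix, rows = observations, columns = features
# y: factor with exactly two levels
# f_sample: hypothetical fraction of observations used as reference instances
relief_sketch <- function(X, y, f_sample = 1) {
  X <- as.matrix(X)
  y <- as.factor(y)
  stopifnot(nlevels(y) == 2, nrow(X) == length(y))
  # Rescale every feature to [0, 1] so per-feature differences are comparable
  rng <- apply(X, 2, function(v) max(v) - min(v))
  rng[rng == 0] <- 1
  Xs <- sweep(sweep(X, 2, apply(X, 2, min)), 2, rng, "/")
  n <- nrow(Xs)
  m <- max(1, round(f_sample * n))
  idx <- sample.int(n, m)
  w <- setNames(numeric(ncol(Xs)), colnames(X))
  for (i in idx) {
    # Manhattan distance from reference instance i to every observation
    d <- rowSums(abs(sweep(Xs, 2, Xs[i, ])))
    d[i] <- Inf  # never pick the instance itself as a neighbour
    same <- y == y[i]
    hit <- which(same)[which.min(d[same])]     # nearest same-class neighbour
    miss <- which(!same)[which.min(d[!same])]  # nearest other-class neighbour
    # Reward features that differ from the nearest miss,
    # penalise features that differ from the nearest hit
    w <- w + (abs(Xs[i, ] - Xs[miss, ]) - abs(Xs[i, ] - Xs[hit, ])) / m
  }
  sort(w, decreasing = TRUE)
}

# Toy data: one feature tracks the class, four are pure noise
set.seed(1)
y_demo <- factor(rep(c("A", "B"), each = 20))
X_demo <- cbind(signal = as.numeric(y_demo) + rnorm(40, sd = 0.1),
                matrix(rnorm(160), 40, 4,
                       dimnames = list(NULL, paste0("noise", 1:4))))
w_demo <- relief_sketch(X_demo, y_demo, f_sample = 0.75)
w_demo
```

On this toy data the `signal` feature should rank first, while the noise columns receive weights close to zero.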

Usage

DaMiR.FSort(data, df, fSample = 1)

Arguments

data

A transposed data frame of expression data, i.e. counts transformed by vst or rlog. A log2-transformed expression matrix is also accepted. Rows and columns should be observations and features, respectively.

df

A data frame with the class and known variables; at least one column named 'class' must be included.

fSample

Fraction of samples to be used by the RReliefF algorithm; the default is 1.
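Because a run can take a long time, it may help to check the argument conventions above before calling DaMiR.FSort. The helper below is a hypothetical sketch, not part of DaMiRseq. It assumes that `data` and `df` have one row per observation and that `fSample` is a fraction in (0, 1].

```r
# Hypothetical pre-flight check (not part of DaMiRseq) for DaMiR.FSort inputs.
# Assumptions: 'data' has observations in rows and features in columns,
# 'df' has one row per observation, and 'fSample' lies in (0, 1].
check_fsort_inputs <- function(data, df, fSample = 1) {
  if (nrow(data) != nrow(df))
    stop("'data' and 'df' must have the same number of rows (observations)")
  if (!"class" %in% colnames(df))
    stop("'df' must contain a column named 'class'")
  if (!is.numeric(fSample) || length(fSample) != 1 || is.na(fSample) ||
      fSample <= 0 || fSample > 1)
    stop("'fSample' must be a single number in (0, 1]")
  invisible(TRUE)
}

# Toy inputs: 4 observations (rows) x 3 features (columns)
toy_data <- as.data.frame(matrix(rnorm(12), nrow = 4,
                                 dimnames = list(paste0("s", 1:4),
                                                 paste0("g", 1:3))))
toy_df <- data.frame(class = factor(c("A", "A", "B", "B")),
                     batch = c(1, 2, 1, 2),
                     row.names = paste0("s", 1:4))
check_fsort_inputs(toy_data, toy_df, fSample = 0.75)
```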

Details

This function is very time-consuming when the number of features is large. We observed a quadratic relationship between execution time and the number of features, so we also provide a formula that lets users estimate the time this step will take for a given number of features:

T = 0.0011 * N^2 - 0.1822 * N + 27.092

where T is the execution time and N is the number of genes (features). We strongly recommend filtering out non-informative features before performing this step.
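The formula can be wrapped in a small R function to estimate the cost of a run before starting it. `estimate_fsort_time` is a hypothetical helper; since no time unit is stated for the formula, its values are best used to compare different feature counts.

```r
# Hypothetical helper: evaluates the empirical runtime formula
#   T = 0.0011 * N^2 - 0.1822 * N + 27.092
# where N is the number of features (genes). The time unit is not stated.
estimate_fsort_time <- function(n_features) {
  0.0011 * n_features^2 - 0.1822 * n_features + 27.092
}

estimate_fsort_time(c(100, 1000, 5000))  # about 19.9, 944.9 and 26616.1
```

Because the N^2 term dominates, going from 1,000 to 5,000 features raises the estimate roughly 28-fold, which is why pre-filtering pays off.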

Value

A data frame with two columns, in which features are sorted by importance score.

A plot of the first 50 features, ordered by importance.
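A common next step is to keep only the top-ranked features. The sketch below assumes that the returned data frame stores feature names as row names, already sorted by decreasing importance. `select_top_features`, the toy `score` column and its values are hypothetical and not part of DaMiRseq.

```r
# Hypothetical helper (not part of DaMiRseq): keep the k top-ranked features.
# Assumes row names of 'importance' are feature names sorted by decreasing
# importance, and that the columns of 'data' are features.
select_top_features <- function(data, importance, k) {
  k <- min(k, nrow(importance))             # never ask for more than exist
  keep <- rownames(importance)[seq_len(k)]  # names of the top k features
  data[, keep, drop = FALSE]                # subset columns, keep 2-D shape
}

# Toy example with made-up scores
expr_toy <- data.frame(g1 = c(1, 2), g2 = c(3, 4), g3 = c(5, 6))
toy_importance <- data.frame(score = c(0.9, 0.5, 0.1),
                             row.names = c("g3", "g1", "g2"))
top2 <- select_top_features(expr_toy, toy_importance, k = 2)
colnames(top2)  # "g3" "g1"
```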

Author(s)

Mattia Chiesa, Luca Piacentini

References

Marko Robnik-Sikonja, Igor Kononenko: An adaptation of Relief for attribute estimation in regression. In: Fourteenth International Conference on Machine Learning, 296-304, 1997

See Also

relief, DaMiR.FSelect, DaMiR.FReduct

Examples

# use example data:
data(data_reduced)
data(df)
# rank features by importance:
df.importance <- DaMiR.FSort(data_reduced[, 1:10],
                             df, fSample = 0.75)

BioinfoMonzino/DaMiRseq documentation built on Aug. 22, 2021, 3:11 p.m.