Ranking the features according to their importance

Description

The rankFeatures function performs Recursive Feature Elimination (RFE) on subsets of the feature matrix. For each subset, the features are ranked according to the weight attributed to them by the SVM at each round of elimination, and the average rank of each feature over the subsets is returned. We recommend saving the object containing the ranked features for use in the following steps.
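
The averaging step described above can be illustrated with a small base-R sketch. The rank matrix is made-up data (one row per subset, one column per feature, 1 = most important), not actual rankFeatures output, and the object names are hypothetical.

```r
# Hypothetical per-subset rankings (1 = most important); illustration only.
ranks <- rbind(
  subset1 = c(f1 = 1, f2 = 3, f3 = 2),
  subset2 = c(f1 = 2, f2 = 3, f3 = 1),
  subset3 = c(f1 = 1, f2 = 2, f3 = 3)
)

# Average the rank of each feature over the subsets, then sort so that
# the feature with the best (lowest) average rank comes first.
avg.rank <- sort(colMeans(ranks))
print(avg.rank)
```

A feature that is consistently ranked near the top across subsets ends up with a low average rank, which is less sensitive to the composition of any single subset.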

Usage

rankFeatures(data, cl = 1, halve.above = 100, valid.times = 10,
  kernel = "linear", cost = 1, gamma = 1,
  numcores = ifelse(.Platform$OS.type == "windows", 1, parallel::detectCores()
  - 1), file.prefix = NULL)

Arguments

data

data.frame containing the training set

cl

Integer indicating the number of the column that contains the response vector classifying positive and negative regions (default = 1)

halve.above

Integer threshold controlling the elimination speed during RFE (default = 100). While the number of remaining features is greater than halve.above, all features are ranked at each round and the lowest-ranked half (the features that contribute the least to the model) is removed for the next round. Once the number of features is lower than or equal to halve.above, the features are removed one by one.
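
As a rough illustration of this schedule, the sketch below computes how many features remain after each round. The helper rfe.schedule is hypothetical and not part of the package; in particular, how an odd number of features is halved is an assumption made here (the smaller half is dropped).

```r
# Hypothetical helper: number of features remaining at each RFE round.
# Assumption: with an odd count, floor(n / 2) features are removed.
rfe.schedule <- function(n.features, halve.above = 100) {
  n <- n.features
  sizes <- n
  while (n > 1) {
    if (n > halve.above) {
      n <- n - floor(n / 2)  # remove the lowest-ranked half
    } else {
      n <- n - 1             # remove features one by one
    }
    sizes <- c(sizes, n)
  }
  sizes
}

schedule <- rfe.schedule(10, halve.above = 4)
print(schedule)
```

A lower halve.above speeds up the elimination on large feature sets, at the price of a coarser ranking among the features that are discarded in bulk.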

valid.times

Integer indicating how many times the training set will be split (default = 10). This number must be smaller than the sizes of both the positive and the negative sets.

kernel

SVM kernel, a character string: "linear" or "radial" (default = "linear")

cost

The SVM cost parameter for both linear and radial kernels (default = 1). If NULL, the function mcTune is run.

gamma

The SVM gamma parameter for the radial kernel (default = 1). If the kernel is radial and gamma is NULL, the function mcTune is run.

numcores

Number of cores to use for parallel computing (default: 1 on Windows, otherwise the number of available cores in the machine - 1)

file.prefix

A character string that will be used as a prefix for the output file. If it is NULL (default), no file is written.

Value

A three-column data frame with the ranked features. The first column contains the feature names, the second the original position of each feature in the feature matrix, and the third the average rank over the subsets.
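
A data frame of this shape can be used to keep only the best-ranked features of a training set. The sketch below works on mock objects: the column names FeatureName, FeatureID and AvgRank are assumptions chosen for illustration, and the actual names returned by rankFeatures may differ.

```r
# Mock ranking; column names are hypothetical, not taken from the package.
ranking <- data.frame(
  FeatureName = c("featB", "featA", "featC"),
  FeatureID   = c(3, 2, 4),       # original column position in the data
  AvgRank     = c(1.2, 1.9, 2.9), # average rank over the subsets
  stringsAsFactors = FALSE
)

# Mock training set with the response in column 1 (as with cl = 1).
train <- data.frame(
  cl    = c(1, -1, 1, -1),
  featA = 1:4,
  featB = 4:1,
  featC = c(0, 1, 0, 1)
)

# Keep the response plus the top.k features with the lowest average rank.
top.k <- 2
ordered <- ranking[order(ranking$AvgRank), ]
keep <- head(ordered$FeatureName, top.k)
reduced <- train[, c("cl", keep)]
print(colnames(reduced))
```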

Examples

data(crm.features)
cost <- 1
gamma <- 1
# feature.ranking <- rankFeatures(data = crm.features, cost = cost, gamma = gamma,
#     kernel = "linear", file.prefix = "test", halve.above = 10)