mRMRfs: Feature Subset Selection using mRMR


Description

An implementation of the mRMR (minimum redundancy, maximum relevance) feature selection framework, written for the purposes of the DISESOR project. In this version the random probe test is used as the main stopping criterion.
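The greedy selection idea behind mRMR can be sketched in base R as follows. This is an illustration only, not the code of mRMRfs: the function name greedyMRMR, the use of a numeric matrix instead of a data.table, and the "relevance minus mean redundancy" score are assumptions made for the example, and the sketch stops at nMax rather than using a random probe test.

```r
# Hypothetical sketch of greedy mRMR selection (not the package implementation).
# X is a numeric matrix with features in columns; y is a numeric target vector.
greedyMRMR <- function(X, y, nMax = 3) {
  # relevance: absolute Pearson correlation of each feature with the target
  relevance <- abs(cor(X, y))[, 1]
  selected <- unname(which.max(relevance))
  while (length(selected) < min(nMax, ncol(X))) {
    candidates <- setdiff(seq_len(ncol(X)), selected)
    # redundancy: mean absolute correlation with the already selected features
    redundancy <- sapply(candidates, function(j) {
      mean(abs(cor(X[, j], X[, selected, drop = FALSE])))
    })
    score <- relevance[candidates] - redundancy
    selected <- c(selected, candidates[which.max(score)])
  }
  selected
}

set.seed(7)
n <- 300
a <- rnorm(n)
b <- rnorm(n)
y <- a + b
f1 <- a + rnorm(n, sd = 0.1)    # informative about the first component of y
f2 <- f1 + rnorm(n, sd = 0.05)  # near copy of f1: relevant but redundant
f3 <- b + rnorm(n, sd = 0.1)    # informative about the second component of y
f4 <- rnorm(n)                  # pure noise
X <- cbind(f1, f2, f3, f4)

sel <- greedyMRMR(X, y, nMax = 2)
sel
```

With nMax = 2 the sketch picks f3 together with only one of the two near-duplicate features f1 and f2, because the second copy is penalized for redundancy.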

Usage

mRMRfs(dataT, target, dependencyF = corrDependency,
  randomnessTest = permutationTest, Nprobes = 1000,
  allowedRandomness = 0.01, nMax = 20, nCores = 1, ...)

Arguments

dataT

a data table in data.table format. Columns of the table should correspond to features (attributes) and rows should represent cases (objects).

target

a vector of target values for data in dataT. For the default dependency function it is assumed that values of target are numeric, integer or binary.

dependencyF

a function for computing dependencies between pairs of attributes and between attributes and the decision. The default is the absolute value of Pearson's correlation (function corrDependency).
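The measure described for the default can be illustrated with a few lines of base R. The helper absCorr below is hypothetical and is not the package's corrDependency. The interface that mRMRfs expects from a dependency function is not described on this page, so this shows only the measure itself.

```r
# Hypothetical helper: absolute value of Pearson's correlation between two vectors.
# It illustrates the measure only; it is not the package's corrDependency.
absCorr <- function(x, y) {
  abs(stats::cor(as.numeric(x), as.numeric(y)))
}

set.seed(1)
x <- rnorm(100)
y <- -2 * x + rnorm(100, sd = 0.1)  # strong negative linear relationship
absCorr(x, y)                       # close to 1 despite the negative sign
```

Taking the absolute value means that a strongly negative relationship counts as a strong dependency, just like a strongly positive one.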

randomnessTest

a function implementing a randomness test used as a stopping criterion. The default (permutationTest) is a permutation test based on random probes.

Nprobes

an integer specifying the number of probes to use in the estimation of attribute irrelevance (for the stopping criterion). The default is 1000.

allowedRandomness

a numeric value specifying the maximum allowed probability that an attribute is irrelevant. The default is 0.01.
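One way to read randomnessTest, Nprobes and allowedRandomness together is sketched below in base R. The function probeIrrelevance is hypothetical and is not the package's permutationTest; it assumes that a probe is a random permutation of the candidate attribute and that irrelevance is estimated as the fraction of probes that score at least as high as the attribute itself.

```r
# Hypothetical sketch of a random probe (permutation) test; not the package code.
# Returns the estimated probability that the feature is irrelevant to the target.
probeIrrelevance <- function(feature, target, nProbes = 1000) {
  observed <- abs(cor(feature, target))
  # each probe is a shuffled copy of the feature, which breaks any real dependency
  probeScores <- replicate(nProbes, abs(cor(sample(feature), target)))
  mean(probeScores >= observed)
}

set.seed(42)
target <- rnorm(200)
relevant <- target + rnorm(200, sd = 0.5)  # depends on the target
noise <- rnorm(200)                        # independent of the target

allowedRandomness <- 0.01
pRelevant <- probeIrrelevance(relevant, target)
pNoise <- probeIrrelevance(noise, target)

pRelevant <= allowedRandomness  # TRUE: no probe matches such a strong dependency
pNoise <= allowedRandomness     # an attribute failing this check would stop selection
```

Under this reading, a larger Nprobes gives a finer estimate of the irrelevance probability at the cost of more computation, and a smaller allowedRandomness makes the stopping rule stricter.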

nMax

an integer specifying the maximal number of features that can be returned by the function. The default is 20.

nCores

an integer specifying the number of available processor cores for parallel computations using forking. The default is 1 for compatibility with Windows systems.
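Fork-based parallelism in R is typically done with parallel::mclapply, which does not support mc.cores greater than 1 on Windows. The sketch below is an assumption about how per-attribute dependencies could be computed in parallel, not the package's internal code; the matrix X and vector y are made-up data.

```r
# Hypothetical sketch: computing per-feature dependencies with forking.
# mc.cores = 1 runs sequentially and therefore also works on Windows.
set.seed(3)
X <- matrix(rnorm(500), ncol = 5)
y <- rnorm(100)
nCores <- 1L

deps <- unlist(parallel::mclapply(
  seq_len(ncol(X)),
  function(j) abs(cor(X[, j], y)),  # dependency of feature j on the target
  mc.cores = nCores
))
deps
```

On Unix-like systems nCores can be raised up to the value reported by parallel::detectCores().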

...

optional arguments (currently ignored).

Value

an integer vector representing indexes of attributes from the selected subset.

Author(s)

Andrzej Janusz

References

Hanchuan Peng, Fuhui Long, and Chris Ding. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell., 27(8):1226–1238, 2005.

Examples

#############################################
data(methaneSampleData)

## an experiment on a sample from the data used in a data mining competition - 
## IJCRS'15 Data Challenge: Mining Data from Coal Mines 
## (https://knowledgepit.fedcsis.org/contest/view.php?id=109).
## The whole data set can be downloaded from the competition web page.

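## select features with mRMR; the target is 1 for 'warning' labels and 0 otherwise
mrmrAttrs = mRMRfs(dataT = methaneData$methaneTraining,
                   target = methaneData$methaneTrainingLabels[, as.integer(V2 == 'warning')],
                   dependencyF = corrDependency)

## indexes of the selected attributes
mrmrAttrs

## fit a linear model (Gaussian family, identity link) on the selected attributes only
regModel = glm(targets ~.,
               cbind(methaneData$methaneTraining[, mrmrAttrs, with = FALSE],
                     targets = methaneData$methaneTrainingLabels[, as.integer(V2 == 'warning')]),
               family = gaussian(link = "identity"))

## score the test cases and compute the AUC against the test labels
preds = predict(regModel, methaneData$methaneTest, type = "response")
caTools::colAUC(preds, methaneData$methaneTestLabels[, V2])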

janusza/RmRMR documentation built on May 18, 2019, 2:39 p.m.