mRMRfs: Feature Subset Selection using mRMR
In janusza/RmRMR: Feature Subset Selection Using the mRMR Approach


An implementation of the mRMR (minimum redundancy, maximum relevance) feature selection framework, written for the purposes of the DISESOR project. In this version the random probe test is used as the main stopping criterion.
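The greedy idea behind mRMR can be illustrated with a short base R sketch. This is not the package's implementation: the function name `greedy_mrmr`, the difference-based score (relevance minus mean redundancy), and the synthetic data are all assumptions made for illustration, and the stopping rule here is a fixed feature count rather than the random probe test.

```r
# Illustrative greedy mRMR selection in base R (NOT the RmRMR implementation).
# Relevance and redundancy are both measured by absolute Pearson correlation.
greedy_mrmr <- function(X, y, n_max = 3) {
  X <- as.matrix(X)
  relevance <- abs(cor(X, y))[, 1]   # dependency of each feature on the target
  selected <- integer(0)
  candidates <- seq_len(ncol(X))
  while (length(selected) < n_max && length(candidates) > 0) {
    if (length(selected) == 0) {
      score <- relevance[candidates]
    } else {
      # mean absolute correlation with the features selected so far
      redundancy <- rowMeans(abs(cor(X[, candidates, drop = FALSE],
                                     X[, selected, drop = FALSE])))
      score <- relevance[candidates] - redundancy
    }
    best <- candidates[which.max(score)]
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
  }
  selected
}

# Synthetic data (hypothetical): x2 nearly duplicates x1, x4 is pure noise.
set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)
x3 <- rnorm(n)
x4 <- rnorm(n)
y  <- x1 + x3 + rnorm(n, sd = 0.1)
X  <- cbind(x1, x2, x3, x4)

sel <- greedy_mrmr(X, y, n_max = 2)
print(sel)
```

Because `x1` and `x2` are almost identical, the redundancy penalty should keep the sketch from selecting both of them, even though each is strongly related to `y` on its own.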

mRMRfs(dataT, target, dependencyF = corrDependency,
  randomnessTest = permutationTest, Nprobes = 1000,
  allowedRandomness = 0.01, nMax = 20, nCores = 1, ...)

`dataT`	a data table in `data.table` format. Columns of the table should correspond to features (attributes) and rows should represent cases (objects).
`target`	a vector of target values for data in `dataT`. For the default dependency function it is assumed that values of `target` are numeric, integer or binary.
`dependencyF`	a function for computing dependencies between pairs of attributes and between attributes and the decision (target). The default is the absolute value of Pearson's correlation (function `corrDependency`).
`randomnessTest`	a function implementing a randomness test used as the stopping criterion. The default (`permutationTest`) is a permutation test based on random probes.
`Nprobes`	an integer specifying the number of probes to use when estimating attribute irrelevance (for the stopping criterion). The default is `1000`.
`allowedRandomness`	a numeric value specifying allowed attribute irrelevance probability. The default is `0.01`.
`nMax`	an integer specifying the maximum number of features that can be returned by the function. The default is `20`.
`nCores`	an integer specifying the number of available processor cores for parallel computations using forking. The default is `1` for compatibility with Windows systems.
`...`	optional arguments (currently ignored).
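The roles of `Nprobes` and `allowedRandomness` can be illustrated with a minimal random probe test in base R. This is a sketch under stated assumptions, not the package's `permutationTest`: the helper `probe_irrelevance`, its signature, and the add-one correction are hypothetical. It shuffles an attribute to create random probes and estimates how often a probe depends on the target at least as strongly as the real attribute does.

```r
# Hypothetical helper (NOT the package's permutationTest): estimates how often a
# shuffled copy of x (a random probe) depends on y at least as strongly as x does.
probe_irrelevance <- function(x, y, n_probes = 1000,
                              dependency = function(a, b) abs(cor(a, b))) {
  observed <- dependency(x, y)
  probe_scores <- replicate(n_probes, dependency(sample(x), y))
  # add-one correction keeps the estimate strictly positive
  (sum(probe_scores >= observed) + 1) / (n_probes + 1)
}

# Synthetic data (hypothetical): one informative attribute and one noise attribute.
set.seed(42)
n <- 300
y <- rnorm(n)
informative <- y + rnorm(n)   # correlated with the target
noise <- rnorm(n)             # generated independently of the target

p_informative <- probe_irrelevance(informative, y)
p_noise <- probe_irrelevance(noise, y)

allowed_randomness <- 0.01
print(p_informative <= allowed_randomness)   # does the informative attribute pass?
print(p_noise <= allowed_randomness)         # does the noise attribute pass?
```

Under this reading, an attribute whose estimated irrelevance probability exceeds the allowed threshold would be rejected, and a selection procedure using the test as its stopping criterion would stop adding features at that point.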

an integer vector representing indexes of attributes from the selected subset.

Andrzej Janusz

Hanchuan Peng, Fuhui Long, and Chris Ding. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell., 27(8):1226–1238, 2005.

#############################################
data(methaneSampleData)

## an experiment on a sample from the data used in a data mining competition - 
## IJCRS'15 Data Challenge: Mining Data from Coal Mines 
## (https://knowledgepit.fedcsis.org/contest/view.php?id=109).
## The whole data set can be downloaded from the competition web page.

mrmrAttrs = mRMRfs(dataT = methaneData$methaneTraining,
                   target = methaneData$methaneTrainingLabels[, as.integer(V2 == 'warning')],
                   dependencyF = corrDependency)

mrmrAttrs

regModel = glm(targets ~.,
               cbind(methaneData$methaneTraining[, mrmrAttrs, with = FALSE],
                     targets = methaneData$methaneTrainingLabels[, as.integer(V2 == 'warning')]),
               family = gaussian(link = "identity"))

preds = predict(regModel, methaneData$methaneTest, type = "response")
caTools::colAUC(preds, methaneData$methaneTestLabels[, V2])