overlapping: Measures of overlapping
In ECoL: Complexity Measures for Supervised Problems


View source: R/overlapping.R

Classification task. The overlapping measures evaluate how informative the available features are for separating the classes. If there is at least one highly discriminative feature in the dataset, the problem can be considered simpler than if there is no such attribute.

overlapping(...)

## Default S3 method:
overlapping(x, y, measures = "all",
  summary = c("mean", "sd"), ...)

## S3 method for class 'formula'
overlapping(formula, data, measures = "all",
  summary = c("mean", "sd"), ...)

`...`	Not used.
`x`	A data.frame containing only the input attributes.
`y`	A factor response vector with one label for each row/component of x.
`measures`	A list of measure names, or `"all"` to include all of them.
`summary`	A list of summarization functions, or empty for all values. See the summarization method for more information. (Default: `c("mean", "sd")`)
`formula`	A formula to define the class column.
`data`	A data.frame dataset containing the input attributes and class.

The following measures are allowed for this method:

"F1": Maximum Fisher's Discriminant Ratio (F1) measures the overlap between the values of the features in the different classes and takes the value of the largest discriminant ratio among all the available features.
"F1v": Directional-vector maximum Fisher's discriminant ratio (F1v) complements F1 by searching for a vector able to separate two classes after the training examples have been projected onto it.
"F2": Volume of the overlapping region (F2) computes the overlap of the distributions of the feature values within the classes. F2 can be determined by finding, for each feature, its minimum and maximum values in the classes.
"F3": The individual feature efficiency of a feature is the ratio between the number of examples that are not in the overlapping region of two classes and the total number of examples. The maximum individual feature efficiency (F3) returns the maximum of these values among the input features.
"F4": Collective feature efficiency (F4) gives an overview of how various features may work together in data separation. First, the most discriminative feature according to F3 is selected, and all examples that can be separated by this feature are removed from the dataset. This step is repeated on the remaining dataset until all the features have been considered or no example remains. F4 returns the ratio of examples that have been discriminated.
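As a rough illustration of the definitions above for a two-class problem, the following base-R sketch computes F1, F1v, F2 and F3 directly from their descriptions. The helper names (`split_classes`, `f1_sketch`, `f1v_sketch`, `overlap_bounds`, `f2_sketch`, `f3_sketch`) are hypothetical and not part of ECoL. The F1v direction is assumed to be Fisher's linear discriminant direction, d = W^-1 (mean_a - mean_b), where W is a class-proportion-weighted within-class covariance. This is not ECoL's implementation. `overlapping()` may normalize these quantities differently and handles multiclass data in its own way, so the numbers need not match the iris output in the Examples.

```r
# Two-class sketches of F1, F1v, F2 and F3 in base R.
# Hypothetical helpers for illustration only; not ECoL's implementation.

# Split the feature matrix by class; y must have exactly two classes
split_classes <- function(x, y) {
  y <- droplevels(as.factor(y))
  stopifnot(nlevels(y) == 2)
  x <- as.matrix(x)
  list(a = x[y == levels(y)[1], , drop = FALSE],
       b = x[y == levels(y)[2], , drop = FALSE])
}

# F1: largest per-feature Fisher ratio (mean_a - mean_b)^2 / (var_a + var_b)
f1_sketch <- function(x, y) {
  s <- split_classes(x, y)
  ratio <- (colMeans(s$a) - colMeans(s$b))^2 /
    (apply(s$a, 2, var) + apply(s$b, 2, var))
  max(ratio)
}

# F1v: Fisher ratio along d = W^-1 (mean_a - mean_b), where W is the
# class-proportion-weighted within-class covariance (assumed formulation)
f1v_sketch <- function(x, y) {
  s <- split_classes(x, y)
  n_a <- nrow(s$a)
  n_b <- nrow(s$b)
  delta <- colMeans(s$a) - colMeans(s$b)
  W <- (n_a * cov(s$a) + n_b * cov(s$b)) / (n_a + n_b)
  d <- solve(W, delta)
  B <- tcrossprod(delta)  # between-class scatter: delta %*% t(delta)
  drop(t(d) %*% B %*% d) / drop(t(d) %*% W %*% d)
}

# Per-feature bounds of the region where both classes have values
overlap_bounds <- function(a, b) {
  list(lo = pmax(apply(a, 2, min), apply(b, 2, min)),
       hi = pmin(apply(a, 2, max), apply(b, 2, max)))
}

# F2: product over features of (overlap length / full range of the feature)
f2_sketch <- function(x, y) {
  s <- split_classes(x, y)
  bnd <- overlap_bounds(s$a, s$b)
  full <- pmax(apply(s$a, 2, max), apply(s$b, 2, max)) -
    pmin(apply(s$a, 2, min), apply(s$b, 2, min))
  prod(pmax(0, bnd$hi - bnd$lo) / full)
}

# F3: best per-feature fraction of examples outside the overlapping region
f3_sketch <- function(x, y) {
  s <- split_classes(x, y)
  bnd <- overlap_bounds(s$a, s$b)
  xm <- rbind(s$a, s$b)
  eff <- vapply(seq_len(ncol(xm)), function(j) {
    inside <- xm[, j] >= bnd$lo[j] & xm[, j] <= bnd$hi[j]
    mean(!inside)
  }, numeric(1))
  max(eff)
}

# Usage: setosa vs versicolor from iris
two <- subset(iris, Species != "virginica")
res <- c(F1  = f1_sketch(two[, 1:4], two$Species),
         F1v = f1v_sketch(two[, 1:4], two$Species),
         F2  = f2_sketch(two[, 1:4], two$Species),
         F3  = f3_sketch(two[, 1:4], two$Species))
print(res)
```

Setosa and versicolor do not overlap at all on Petal.Length, so in this sketch F2 is 0 and F3 is 1. This matches the intuition in the description: a single highly discriminative feature makes the problem simple.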
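The iterative procedure behind F4 can be sketched in base R under the same caveats. `f4_sketch` is a hypothetical helper, not ECoL's implementation, and the overlapping region is taken per feature as in the description of F3. The early stop once a single class remains is an assumption not stated in the description. The value ECoL reports may be scaled or aggregated differently, especially for multiclass data.

```r
# Sketch of F4 (collective feature efficiency) for a two-class problem in base R.
# f4_sketch is a hypothetical helper, not ECoL's implementation.
f4_sketch <- function(x, y) {
  y <- droplevels(as.factor(y))
  stopifnot(nlevels(y) == 2)
  x <- as.matrix(x)
  remaining <- rep(TRUE, nrow(x))   # examples not yet discriminated
  features <- seq_len(ncol(x))      # features not yet considered
  while (length(features) > 0 && any(remaining)) {
    xr <- x[remaining, , drop = FALSE]
    in_a <- y[remaining] == levels(y)[1]
    if (all(in_a) || !any(in_a)) {
      # Assumption: once one class is exhausted, every remaining example is separable
      remaining[] <- FALSE
      break
    }
    # For each candidate feature, flag remaining examples outside its overlapping region
    outside <- vapply(features, function(j) {
      lo <- max(min(xr[in_a, j]), min(xr[!in_a, j]))
      hi <- min(max(xr[in_a, j]), max(xr[!in_a, j]))
      xr[, j] < lo | xr[, j] > hi
    }, logical(nrow(xr)))
    best <- which.max(colSums(outside))  # most discriminative feature (F3 criterion)
    remaining[which(remaining)[outside[, best]]] <- FALSE
    features <- features[-best]
  }
  mean(!remaining)  # ratio of examples that have been discriminated
}

# Usage on two class pairs from iris
setosa_vs_versicolor <- subset(iris, Species != "virginica")
versicolor_vs_virginica <- subset(iris, Species != "setosa")
f4_easy <- f4_sketch(setosa_vs_versicolor[, 1:4], setosa_vs_versicolor$Species)
f4_hard <- f4_sketch(versicolor_vs_virginica[, 1:4], versicolor_vs_virginica$Species)
print(c(easy = f4_easy, hard = f4_hard))
```

On setosa vs versicolor, the first selected feature already separates every example, so the sketch returns 1 for that pair.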

A list named by the requested overlapping measures. Each element holds the values of that measure, summarized by the functions given in `summary`.

Albert Orriols-Puig, Nuria Macia and Tin K Ho. (2010). Documentation for the data complexity library in C++. Technical Report. La Salle - Universitat Ramon Llull.

Other complexity-measures: balance, correlation, dimensionality, linearity, neighborhood, network, smoothness


## Extract all overlapping measures for classification task
library(ECoL)
data(iris)
overlapping(Species ~ ., iris)

$F1
     mean        sd 
0.2775642 0.2612623 

$F1v
      mean         sd 
0.02679963 0.03377042 

$F2
       mean          sd 
0.006381766 0.011053544 

$F3
     mean        sd 
0.1233333 0.2136196 

$F4
      mean         sd 
0.04333333 0.07505553