complexity: Complexity meta-features


View source: R/complexity.R

Description

The complexity group is a set of measures that characterize the complexity of classification problems based on aspects that quantify the linearity of the data, the presence of informative features, and the sparsity and dimensionality of the datasets.

Usage

complexity(...)

## Default S3 method:
complexity(x, y, features = "all", summary = c("mean", "sd"), ...)

## S3 method for class 'formula'
complexity(formula, data, features = "all", summary = c("mean", "sd"), ...)

Arguments

...

Not used.

x

A data.frame containing only the input attributes.

y

A factor response vector with one label for each row/component of x.

features

A list of feature names or "all" to include all of them. The supported values are described in the Details section. (Default: "all")

summary

A list of summarization functions or empty for all values. See the post.processing method for more information. (Default: c("mean", "sd"))

formula

A formula to define the class column.

data

A data.frame dataset containing the input attributes and the class.
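
As a hedged illustration of the features and summary arguments, the sketch below requests two measures and summarizes them with min and max instead of the defaults. It assumes the mfe package is installed and that the strings "min" and "max" are accepted as summarization function names through post.processing; the variable name res is only illustrative.

```r
# Sketch only: the call runs when the mfe package is available.
res <- NULL
if (requireNamespace("mfe", quietly = TRUE)) {
  # Request two measures and summarize each with min and max
  # (assumed to be accepted summary names) instead of mean and sd.
  res <- mfe::complexity(Species ~ ., iris,
                         features = c("F1", "N1"),
                         summary = c("min", "max"))
  str(res)
}
```

Passing the formula and the data.frame selects the formula method; the same request could be made through the default method by supplying x and y separately.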

Details

The following features are allowed for classification problems:

"C1"

Entropy of class proportions.

"C2"

Multi-class imbalance ratio.

"F1"

Fisher's discriminant ratio.

"F1v"

The directional-vector Fisher's discriminant ratio.

"F2"

Overlapping of the per-class bounding boxes.

"F3"

Maximum individual feature efficiency.

"F4"

Collective feature efficiency.

"L1"

Distance of erroneous instances to a linear classifier.

"L2"

Training error of a linear classifier.

"L3"

Nonlinearity of a linear classifier.

"LSC"

Local-Set cardinality average.

"N1"

Fraction of points lying on the class boundary.

"N2"

Average intra/inter class nearest neighbor distances.

"N3"

Leave-one-out error rate of the 1-nearest neighbor algorithm.

"N4"

Nonlinearity of the one-nearest neighbor classifier.

"T1"

Fraction of maximum covering spheres on data.

"T2"

Average number of samples per dimension.

"T3"

Average intrinsic dimensionality per number of examples.

"T4"

Intrinsic dimensionality proportion.
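
To make two of the descriptions above concrete, here is a minimal base-R sketch of a class-proportion entropy (in the spirit of "C1") and a leave-one-out 1-nearest-neighbor error rate (in the spirit of "N3"). It is not the package's own code: the function names class_entropy and loo_1nn_error are hypothetical, and the normalization by log(k) and the use of Euclidean distance are assumptions, so the values may differ from what complexity() returns.

```r
# Illustrative base-R sketches; not the mfe package's own code.

# Entropy of the class proportions, divided by log(k) so that a
# perfectly balanced problem scores 1 (the normalization is an assumption).
class_entropy <- function(y) {
  p <- table(y) / length(y)
  p <- p[p > 0]                  # ignore unused factor levels
  if (length(p) < 2) return(0)   # a single class carries no entropy
  as.numeric(-sum(p * log(p)) / log(length(p)))
}

# Leave-one-out error of a 1-nearest-neighbor classifier, using
# Euclidean distance on numeric attributes (the metric is an assumption).
loo_1nn_error <- function(x, y) {
  d <- as.matrix(dist(x))
  diag(d) <- Inf                 # a point may not be its own neighbor
  nearest <- apply(d, 1, which.min)
  mean(y[nearest] != y)
}

class_entropy(iris$Species)
loo_1nn_error(iris[, 1:4], iris$Species)
```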

It is also possible to request a subgroup of features:

"balance"

Include the measures C1 and C2.

"dimensionality"

Include the measures T2, T3 and T4.

"linearity"

Include the measures L1, L2 and L3.

"neighborhood"

Include the measures N1, N2, N3, N4, T1 and LSC.

"network"

Include the measures Density, ClsCoef and Hubs.

"overlapping"

Include the measures F1, F1v, F2, F3 and F4.

Value

A list named by the requested meta-features.
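
A short sketch of requesting one of the subgroups listed in Details and inspecting the returned list. It assumes the mfe package is installed; res is an illustrative name, and because the exact shape of each list element is not stated here, the code only prints it rather than relying on it.

```r
# Sketch only: the call runs when the mfe package is available.
res <- NULL
if (requireNamespace("mfe", quietly = TRUE)) {
  # Per the Details section, "dimensionality" covers T2, T3 and T4.
  res <- mfe::complexity(Species ~ ., iris, features = "dimensionality")
  print(names(res))  # names of the returned meta-features
  str(res)           # structure of each element
}
```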

References

Lorena, A. C., Garcia, L. P. F., Lehmann, J., Souto, M. C. P., and Ho, T. K. (2019). How Complex Is Your Classification Problem?: A Survey on Measuring Classification Complexity. ACM Computing Surveys, 52(5).

Lorena, A. C., Maciel, A. I., de Miranda, P. B. C., Costa, I. G., and Prudencio, R. B. C. (2018). Data complexity meta-features for regression problems. Machine Learning, 107(1):209-246.

Ho, T., and Basu, M. (2002). Complexity measures of supervised classification problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3):289-300.

See Also

Other meta-features: clustering(), concept(), general(), infotheo(), itemset(), landmarking(), model.based(), relative(), statistical()

Examples

## Extract all metafeatures
complexity(Species ~ ., iris)

## Extract some metafeatures
complexity(iris[30:120, 1:4], iris[30:120, 5], c("F1", "F2", "linearity"))

mfe documentation built on July 1, 2020, 10:46 p.m.