complexity: Extract the complexity measures from datasets


View source: R/complexity.R

Description

This function extracts the complexity measures from classification and regression tasks. To do so, the measures take into account the overlap between classes imposed by feature values, the separability and distribution of the data points, and structural properties obtained by representing the dataset as a graph. To set specific parameters for a group, use that group's characterization function.

Usage

complexity(...)

## Default S3 method:
complexity(x, y, groups = "all", summary = c("mean",
  "sd"), ...)

## S3 method for class 'formula'
complexity(formula, data, groups = "all",
  summary = c("mean", "sd"), ...)
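Both methods take the same groups and summary arguments. The sketch below, which assumes the ECoL package is installed, computes only the "balance" group for iris through each interface. Since both calls receive the same inputs and the same response, the two named vectors are expected to agree.

```r
library(ECoL)

data(iris)

# Default method: x holds only the input attributes, y the response vector.
by_xy <- complexity(iris[, 1:4], iris$Species, groups = "balance")

# Formula method: the formula names the output column of data.
by_formula <- complexity(Species ~ ., iris, groups = "balance")

print(by_xy)
all.equal(by_xy, by_formula)
```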

Arguments

...

Not used.

x

A data.frame containing only the input attributes.

y

A response vector with one value for each row/component of x.

groups

A list of complexity measure groups, or "all" to include all of them.

summary

A list of summarization functions, or empty for all values. See the summarization method for more information. (Default: c("mean", "sd"))

formula

A formula to define the output column.

data

A data.frame dataset containing the input and output attributes.

Details

The following groups are allowed for this method:

"overlapping"

The feature overlapping measures characterize how informative the available features are to separate the classes. See overlapping for more details.
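To illustrate what this group computes, the base-R sketch below evaluates a per-feature F1 value (maximum Fisher's discriminant ratio) for iris, assuming each feature's value is 1 / (1 + r), where r is the between-class sum of squares divided by the within-class sum of squares. The helper name f1_per_feature is hypothetical and not part of ECoL. Summarizing the four per-feature values with mean and sd mirrors the default summary = c("mean", "sd") and gives about 0.2776 and 0.2613, close to overlapping.F1.mean and overlapping.F1.sd in the example output.

```r
# Hypothetical helper: per-feature F1 = 1 / (1 + r), where r is the
# between-class scatter divided by the within-class scatter of the feature.
f1_per_feature <- function(x, y) {
  y <- as.factor(y)
  sapply(x, function(v) {
    class_means <- tapply(v, y, mean)
    class_sizes <- tapply(v, y, length)
    between <- sum(class_sizes * (class_means - mean(v))^2)
    within  <- sum((v - class_means[as.integer(y)])^2)
    1 / (1 + between / within)
  })
}

data(iris)
f1 <- f1_per_feature(iris[, 1:4], iris$Species)
# Summarize the per-feature values as summary = c("mean", "sd") would.
f1_summary <- c(mean = mean(f1), sd = sd(f1))
print(round(f1_summary, 4))
```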

"neighborhood"

Neighborhood measures characterize the presence and density of same or different classes in local neighborhoods. See neighborhood for more details.
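A typical measure of this kind, N3, is usually described as the leave-one-out error rate of a 1-nearest-neighbour classifier. The sketch below computes that rate in base R with Euclidean distance on min-max-scaled features. The scaling, the distance and the helper name n3_sketch are assumptions, so the result may differ from neighborhood.N3.mean in the example output.

```r
# Hypothetical helper: leave-one-out 1-NN error rate (the idea behind N3).
n3_sketch <- function(x, y) {
  # Min-max scale each feature to [0, 1] (assumed preprocessing).
  scaled <- apply(as.matrix(x), 2, function(v) (v - min(v)) / diff(range(v)))
  d <- as.matrix(dist(scaled))
  diag(d) <- Inf                      # an example may not be its own neighbour
  nearest <- apply(d, 1, which.min)   # index of each example's nearest neighbour
  errors <- y[nearest] != y           # TRUE where the neighbour's class differs
  mean(errors)
}

data(iris)
n3_sketch(iris[, 1:4], iris$Species)
```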

"linearity"

Linearity measures try to quantify whether the labels can be linearly separated. See linearity for more details.
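For regression tasks, Lorena et al. (2018) relate linearity to how well a multiple linear regression fits the data. The base-R sketch below assumes min-max-normalized inputs and output, and takes L1 as the mean absolute residual and L2 as the mean squared residual. Under those assumptions, L2 for cars comes out at about 0.0217, close to linearity.L2.mean in the regression part of the example output. The helper name linearity_sketch is hypothetical, and the sketch does not cover the classification variant, which relies on a linear classifier instead.

```r
# Hypothetical helper: residual-based linearity measures for regression.
linearity_sketch <- function(x, y) {
  minmax <- function(v) (v - min(v)) / diff(range(v))
  x <- as.data.frame(lapply(x, minmax))  # assumed normalization of inputs
  y <- minmax(y)                         # and of the output
  fit <- lm(y ~ ., data = cbind(x, y = y))
  res <- residuals(fit)
  c(L1 = mean(abs(res)),  # mean absolute residual
    L2 = mean(res^2))     # mean squared residual
}

data(cars)
linearity_sketch(cars["dist"], cars$speed)
```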

"dimensionality"

The dimensionality measures compute information on how smoothly the examples are distributed within the attributes. See dimensionality for more details.
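The three dimensionality values in the example output can be sketched in base R: T2 as the number of attributes over the number of examples, T3 as the number of principal components needed to keep 95% of the variance over the number of examples, and T4 as the ratio of those two counts. The helper name, the 95% threshold and the use of prcomp on unscaled data are assumptions. For iris the sketch yields 4/150, 2/150 and 0.5, and for the single-feature cars data 0.02, 0.02 and 1, the same values the example output reports.

```r
# Hypothetical helper: PCA-based dimensionality ratios.
dimensionality_sketch <- function(x, threshold = 0.95) {
  x <- as.matrix(x)
  n <- nrow(x)
  m <- ncol(x)
  pca <- prcomp(x)
  explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  m_pca <- which(explained >= threshold)[1]  # components for 95% variance
  c(T2 = m / n, T3 = m_pca / n, T4 = m_pca / m)
}

data(iris)
dims <- dimensionality_sketch(iris[, 1:4])
dims
```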

"balance"

Class balance measures take into account the numbers of examples per class in the dataset. See balance for more details.
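Both balance measures can be sketched from the class counts alone, following the definitions commonly used in the complexity literature: C1 is the entropy of the class proportions divided by its maximum, log(number of classes), and C2 is 1 - 1/IR, where IR is a multiclass imbalance ratio. The helper name balance_sketch is hypothetical. For the perfectly balanced iris data it returns C1 = 1 and C2 = 0, matching the example output.

```r
# Hypothetical helper: class-balance measures from class counts.
balance_sketch <- function(y) {
  counts <- table(droplevels(as.factor(y)))
  n  <- sum(counts)
  nc <- length(counts)
  p  <- counts / n
  C1 <- -sum(p * log(p)) / log(nc)                 # normalized class entropy
  IR <- (nc - 1) / nc * sum(counts / (n - counts))  # multiclass imbalance ratio
  C2 <- 1 - 1 / IR
  c(C1 = C1, C2 = C2)
}

data(iris)
balance_sketch(iris$Species)
# An imbalanced toy response lowers C1 and raises C2.
balance_sketch(factor(c(rep("a", 90), rep("b", 10))))
```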

"network"

Network measures represent the dataset as a graph and extract structural information from it. See network for more details.
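To make the graph representation concrete, the sketch below builds an epsilon-neighbourhood graph over the examples, keeps only edges between examples of the same class, and returns one minus the fraction of possible edges present, which is assumed here to be the meaning of the Density measure. The threshold eps = 0.15, the Gower-style distance (mean absolute difference of min-max-scaled features) and the helper name are assumptions, so the value need not match network.Density in the example output.

```r
# Hypothetical helper: epsilon-NN graph density (the idea behind Density).
density_sketch <- function(x, y, eps = 0.15) {
  minmax <- function(v) (v - min(v)) / diff(range(v))
  scaled <- sapply(x, minmax)
  # Gower-style distance for numeric data: mean absolute scaled difference.
  d <- as.matrix(dist(scaled, method = "manhattan")) / ncol(scaled)
  y <- as.character(y)
  adj <- d < eps & outer(y, y, "==")  # close pairs of the same class
  diag(adj) <- FALSE
  n <- nrow(adj)
  edges <- sum(adj) / 2               # adjacency matrix is symmetric
  1 - edges / (n * (n - 1) / 2)       # 1 - fraction of possible edges present
}

data(iris)
density_sketch(iris[, 1:4], iris$Species)
```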

"correlation"

Correlation measures capture the relationship of the feature values with the outputs. See correlation for more details.
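As a rough illustration of a feature-output association, the sketch below computes the absolute Spearman rank correlation between each input and the output. For cars the single feature gives about 0.830, the same value reported as correlation.C2.mean in the regression part of the example output. The helper name is hypothetical, and the other correlation measures (C3, C4) involve further steps not shown here.

```r
# Hypothetical helper: absolute Spearman correlation of each feature with y.
feature_output_correlation <- function(x, y) {
  sapply(x, function(v) abs(cor(v, y, method = "spearman")))
}

data(cars)
rho <- feature_output_correlation(cars["dist"], cars$speed)
print(round(rho, 4))
```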

"smoothness"

Smoothness measures estimate the smoothness of the function that must be fitted to the data. See smoothness for more details.
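One way to probe smoothness, and the way the S3 measure of Lorena et al. (2018) is commonly described, is the error of a 1-nearest-neighbour regressor. The sketch below computes leave-one-out squared errors on min-max-normalized data and summarizes them with mean and sd. The normalization, the tie handling (the first nearest neighbour wins) and the helper name are assumptions, so the numbers may differ from smoothness.S3.mean and smoothness.S3.sd in the example output.

```r
# Hypothetical helper: leave-one-out 1-NN regression errors (idea behind S3).
nn_regression_errors <- function(x, y) {
  minmax <- function(v) (v - min(v)) / diff(range(v))
  xs <- sapply(x, minmax)
  ys <- minmax(y)
  d <- as.matrix(dist(xs))
  diag(d) <- Inf                     # exclude each example itself
  nearest <- apply(d, 1, which.min)  # first nearest neighbour on ties
  (ys[nearest] - ys)^2               # squared error per example
}

data(cars)
err <- nn_regression_errors(cars["dist"], cars$speed)
c(mean = mean(err), sd = sd(err))
```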

Value

A numeric vector named by the requested complexity measures.
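Because the result is a plain named numeric vector whose names follow the group.measure or group.measure.summary pattern seen in the example output, base R is enough to regroup it. The short vector below copies a few values from the classification part of the example output; splitting on the prefix before the first dot is an illustrative choice, not an ECoL function.

```r
# A few values copied from the classification part of the example output.
res <- c(balance.C1 = 1, balance.C2 = 0,
         dimensionality.T2 = 0.026666667, dimensionality.T4 = 0.5,
         overlapping.F1.mean = 0.277564193, overlapping.F1.sd = 0.261262259)

# The group is the prefix before the first dot of each name.
group <- sub("\\..*$", "", names(res))
by_group <- split(res, group)
names(by_group)
by_group$overlapping
```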

References

Tin K Ho and Mitra Basu. (2002). Complexity measures of supervised classification problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 3, 289–300.

Albert Orriols-Puig, Nuria Macia and Tin K Ho. (2010). Documentation for the data complexity library in C++. Technical Report. La Salle - Universitat Ramon Llull.

Ana C Lorena, Aron I Maciel, Pericles B C Miranda, Ivan G Costa and Ricardo B C Prudencio. (2018). Data complexity meta-features for regression problems. Machine Learning, 107, 1, 209–246.

Examples

## Extract all complexity measures for a classification task
data(iris)
complexity(Species ~ ., iris)

## Extract all complexity measures for a regression task
data(cars)
complexity(speed ~ ., cars)

Example output

 overlapping.F1.mean    overlapping.F1.sd overlapping.F1v.mean 
         0.277564193          0.261262259          0.026799630 
  overlapping.F1v.sd  overlapping.F2.mean    overlapping.F2.sd 
         0.033770417          0.006381766          0.011053544 
 overlapping.F3.mean    overlapping.F3.sd  overlapping.F4.mean 
         0.123333333          0.213619600          0.043333333 
   overlapping.F4.sd      neighborhood.N1 neighborhood.N2.mean 
         0.075055535          0.106666667          0.198144442 
  neighborhood.N2.sd neighborhood.N3.mean   neighborhood.N3.sd 
         0.146693341          0.060000000          0.238282445 
neighborhood.N4.mean   neighborhood.N4.sd neighborhood.T1.mean 
         0.013333333          0.115081918          0.055555556 
  neighborhood.T1.sd     neighborhood.LSC    linearity.L1.mean 
         0.090949961          0.816400000          0.004335693 
     linearity.L1.sd    linearity.L2.mean      linearity.L2.sd 
         0.007509640          0.013333333          0.023094011 
   linearity.L3.mean      linearity.L3.sd    dimensionality.T2 
         0.006666667          0.011547005          0.026666667 
   dimensionality.T3    dimensionality.T4           balance.C1 
         0.013333333          0.500000000          1.000000000 
          balance.C2      network.Density      network.ClsCoef 
         0.000000000          0.833288591          0.267977020 
   network.Hubs.mean      network.Hubs.sd 
         0.838050831          0.275331937

correlation.C2.mean   correlation.C2.sd correlation.C3.mean   correlation.C3.sd 
         0.83035684                  NA          0.08000000                  NA 
     correlation.C4   linearity.L1.mean     linearity.L1.sd   linearity.L2.mean 
         0.56000000          0.11991838          0.08629891          0.02167897 
    linearity.L2.sd   linearity.L3.mean     linearity.L3.sd  smoothness.S1.mean 
                 NA          0.01672446          0.02105020          0.18172983 
   smoothness.S1.sd  smoothness.S2.mean    smoothness.S2.sd  smoothness.S3.mean 
         0.10309899          0.11812522          0.10108113          0.03632653 
   smoothness.S3.sd  smoothness.S4.mean    smoothness.S4.sd   dimensionality.T2 
         0.03690202          0.03459402          0.04527191          0.02000000 
  dimensionality.T3   dimensionality.T4 
         0.02000000          1.00000000 

ECoL documentation built on Nov. 5, 2019, 9:07 a.m.
