infotheo: Information-theoretic meta-features



Description

Information-theoretic meta-features are particularly appropriate for describing discrete (categorical) attributes, but they also apply to continuous ones, provided these are discretized first.

Usage

infotheo(...)

## Default S3 method:
infotheo(
  x,
  y,
  features = "all",
  summary = c("mean", "sd"),
  transform = TRUE,
  ...
)

## S3 method for class 'formula'
infotheo(
  formula,
  data,
  features = "all",
  summary = c("mean", "sd"),
  transform = TRUE,
  ...
)

Arguments

...

Further arguments passed to the summarization functions.

x

A data.frame containing only the input attributes.

y

A factor response vector with one label for each row/component of x.

features

A list of feature names, or "all" to include all of them. The supported values are described in the details section. (Default: "all")

summary

A list of summarization functions, or empty to return all values without summarization. See the post.processing method for more information. (Default: c("mean", "sd"))

transform

A logical value indicating whether the numeric attributes should be transformed (discretized). If FALSE, they are ignored. (Default: TRUE)

formula

A formula to define the class column.

data

A data.frame containing the input attributes and the class.

Details

The following features are allowed for this method:

"attrConc"

Attribute concentration: Goodman and Kruskal's tau measure, also known as the concentration coefficient, computed for each pair of attributes (multi-valued).

"attrEnt"

Attribute entropy, a measure of the randomness of each attribute in the dataset (multi-valued).

"classConc"

Class concentration: similar to "attrConc", but computed between each attribute and the class (multi-valued).

"classEnt"

Class entropy, which describes how much information is necessary to specify the class in the dataset.

"eqNumAttr"

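Equivalent number of attributes: the class entropy divided by the mean mutual information between the attributes and the class, i.e., roughly how many attributes would be needed to optimally solve the classification task if each carried the average amount of information about the class.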

"jointEnt"

Joint entropy, the entropy of the joint distribution of each attribute and the class (multi-valued).

"mutInf"

Mutual information, that is, the information shared between each attribute and the class in the dataset (multi-valued).

"nsRatio"

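Noise ratio, which describes the amount of irrelevant information contained in the dataset: the difference between the mean attribute entropy and the mean mutual information, divided by the mean mutual information.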
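The features above can be sketched in base R as follows. This is a minimal sketch, not the mfe implementation: it assumes every attribute is already discrete, measures entropy in bits, and uses hypothetical helper names (entropy_bits, conc_tau, infotheo_sketch). Mutual information is derived as H(X) + H(C) - H(X, C), and eqNumAttr and nsRatio follow the definitions given in the list. For the concentration features, the direction of tau (how well the second variable is predicted from the first) and the ordering of attribute pairs are assumptions and may differ from mfe.

```r
# Minimal sketch of the information-theoretic meta-features (not mfe's code).
# Assumes every column of `x` and the class `y` are already discrete.

# Shannon entropy in bits of one discrete vector, or the joint entropy
# of several vectors of equal length
entropy_bits <- function(...) {
  p <- prop.table(table(...))  # (joint) relative frequencies
  p <- p[p > 0]                # 0 * log2(0) is taken as 0
  -sum(p * log2(p))
}

# Goodman and Kruskal's tau: proportional reduction in the error of
# predicting `y` once `x` is known (concentration coefficient)
conc_tau <- function(x, y) {
  pij   <- prop.table(table(as.character(x), as.character(y)))
  p_row <- rowSums(pij)  # marginal distribution of x
  p_col <- colSums(pij)  # marginal distribution of y
  (sum(pij^2 / p_row) - sum(p_col^2)) / (1 - sum(p_col^2))
}

infotheo_sketch <- function(x, y) {
  attrEnt  <- sapply(x, entropy_bits)        # H(X) per attribute
  classEnt <- entropy_bits(y)                # H(C)
  jointEnt <- sapply(x, entropy_bits, y)     # H(X, C) per attribute
  mutInf   <- attrEnt + classEnt - jointEnt  # I(X; C)

  # All ordered pairs of distinct attributes, used for attrConc
  pairs <- expand.grid(i = seq_along(x), j = seq_along(x))
  pairs <- pairs[pairs$i != pairs$j, ]

  list(
    attrConc  = mapply(function(i, j) conc_tau(x[[i]], x[[j]]), pairs$i, pairs$j),
    attrEnt   = attrEnt,
    classConc = sapply(x, conc_tau, y),
    classEnt  = classEnt,
    eqNumAttr = classEnt / mean(mutInf),
    jointEnt  = jointEnt,
    mutInf    = mutInf,
    nsRatio   = (mean(attrEnt) - mean(mutInf)) / mean(mutInf)
  )
}

# Toy data: attribute `a` determines the class, `b` is independent of it
res <- infotheo_sketch(data.frame(a = c(1, 1, 2, 2), b = c(1, 2, 1, 2)),
                       c("u", "u", "v", "v"))
str(res)
```

On this toy data, a shares 1 bit with the class and b shares none, so mutInf is 1 and 0, eqNumAttr is 1 / 0.5 = 2, and nsRatio is (1 - 0.5) / 0.5 = 1.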

When transform=TRUE, this method discretizes the numeric attributes with the unsupervised procedure provided by the discretize function, using its default parameters.
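A minimal sketch of an equal-frequency discretization in base R. Assumption: the attrEnt values in the example output (close to log2(5), about 2.32 bits, for the 150 rows of iris) are consistent with equal-frequency binning into about n^(1/3) bins, which we believe matches the defaults of infotheo::discretize. The helper names discretize_equalfreq and discretize_numeric are hypothetical, and the exact bin edges may differ from those used by mfe.

```r
# Equal-frequency discretization sketch (hypothetical helpers, not mfe's API).
# Assumed default: about n^(1/3) bins.
discretize_equalfreq <- function(v, nbins = floor(length(v)^(1/3))) {
  probs  <- seq(0, 1, length.out = nbins + 1)
  # Quantile cut points; ties can merge neighbouring bins
  breaks <- unique(quantile(v, probs = probs, names = FALSE))
  if (length(breaks) < 2) {
    return(rep(1L, length(v)))  # constant column: a single bin
  }
  cut(v, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# Discretize the numeric columns of a data.frame, keep the others unchanged
discretize_numeric <- function(x) {
  x[] <- lapply(x, function(col) {
    if (is.numeric(col)) discretize_equalfreq(col) else col
  })
  x
}

# iris has 150 rows, so each numeric attribute gets at most floor(150^(1/3)) = 5 bins
head(discretize_numeric(iris))
```

Equal-frequency bins keep each category roughly equally populated, so a discretized attribute's entropy stays close to log2 of its number of bins unless ties merge bins.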

Value

A list named by the requested meta-features.

References

Donald Michie, David J. Spiegelhalter, Charles C. Taylor, and John Campbell. Machine Learning, Neural and Statistical Classification, volume 37. Ellis Horwood Upper Saddle River, 1994.

Alexandros Kalousis and Melanie Hilario. Model selection via meta-learning: a comparative study. International Journal on Artificial Intelligence Tools, volume 10, pages 525 - 554, 2001.

Ciro Castiello, Giovanna Castellano, and Anna Maria Fanelli. Meta-data: Characterization of input features for meta-learning. In 2nd International Conference on Modeling Decisions for Artificial Intelligence (MDAI), pages 457 - 468, 2005.

See Also

Other meta-features: clustering(), complexity(), concept(), general(), itemset(), landmarking(), model.based(), relative(), statistical()

Examples

## Extract all meta-features
infotheo(Species ~ ., iris)

## Extract some meta-features
infotheo(iris[1:4], iris$Species, features=c("classEnt", "jointEnt"))

## Extract all meta-features without summarizing the results
infotheo(Species ~ ., iris, summary=c())

## Use other summarization functions
infotheo(Species ~ ., iris, summary=c("min", "median", "max"))

## Do not transform the data: only categorical attributes are used, and
## since iris has none, every meta-feature is NA
infotheo(Species ~ ., iris, transform=FALSE)

Example output

## All meta-features, summarized with mean and sd
$attrConc
     mean        sd 
0.2098049 0.1195880 

$attrEnt
      mean         sd 
2.27719128 0.06103943 

$classConc
     mean        sd 
0.2734739 0.1409110 

$classEnt
[1] 1.584963

$eqNumAttr
[1] 1.878064

$jointEnt
     mean        sd 
3.0182196 0.3821883 

$mutInf
     mean        sd 
0.8439342 0.4222026 

$nsRatio
[1] 1.698304

## Selected meta-features (classEnt, jointEnt)
$classEnt
[1] 1.584963

$jointEnt
     mean        sd 
3.0182196 0.3821883 

## All meta-features without summarization (summary=c())
$attrConc
 non.aggregated1  non.aggregated2  non.aggregated3  non.aggregated4 
      0.08478340       0.26374940       0.23291127       0.09183055 
 non.aggregated5  non.aggregated6  non.aggregated7  non.aggregated8 
      0.11612406       0.12836408       0.25689542       0.12161444 
 non.aggregated9 non.aggregated10 non.aggregated11 non.aggregated12 
      0.42995680       0.23009982       0.13924810       0.42208100 

$attrEnt
non.aggregated.Sepal.Length  non.aggregated.Sepal.Width 
                   2.315653                    2.186232 
non.aggregated.Petal.Length  non.aggregated.Petal.Width 
                   2.308260                    2.298620 

$classConc
non.aggregated.Sepal.Length  non.aggregated.Sepal.Width 
                  0.1882864                   0.1197396 
non.aggregated.Petal.Length  non.aggregated.Petal.Width 
                  0.3847270                   0.4011426 

$classEnt
[1] 1.584963

$eqNumAttr
[1] 1.878064

$jointEnt
non.aggregated.Sepal.Length  non.aggregated.Sepal.Width 
                   3.281389                    3.410577 
non.aggregated.Petal.Length  non.aggregated.Petal.Width 
                   2.698910                    2.682002 

$mutInf
non.aggregated.Sepal.Length  non.aggregated.Sepal.Width 
                  0.6192261                   0.3606172 
non.aggregated.Petal.Length  non.aggregated.Petal.Width 
                  1.1943125                   1.2015809 

$nsRatio
[1] 1.698304

## All meta-features, summarized with min, median and max
$attrConc
      min    median       max 
0.0847834 0.1846740 0.4299568 

$attrEnt
     min   median      max 
2.186232 2.303440 2.315653 

$classConc
      min    median       max 
0.1197396 0.2865067 0.4011426 

$classEnt
[1] 1.584963

$eqNumAttr
[1] 1.878064

$jointEnt
     min   median      max 
2.682002 2.990150 3.410577 

$mutInf
      min    median       max 
0.3606172 0.9067693 1.2015809 

$nsRatio
[1] 1.698304

## transform=FALSE (iris has no categorical attributes)
$attrConc
mean   sd 
  NA   NA 

$attrEnt
mean   sd 
  NA   NA 

$classConc
mean   sd 
  NA   NA 

$classEnt
[1] NA

$eqNumAttr
[1] NA

$jointEnt
mean   sd 
  NA   NA 

$mutInf
mean   sd 
  NA   NA 

$nsRatio
[1] NA
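The scalar features in the output are tied to the multi-valued ones by simple arithmetic. The base-R sketch below copies the non-aggregated attrEnt and jointEnt values printed for the summary=c() call and recomputes mutInf, eqNumAttr and nsRatio, assuming mutInf = H(X) + H(C) - H(X, C), eqNumAttr = H(C) / mean(mutInf) and nsRatio = (mean(attrEnt) - mean(mutInf)) / mean(mutInf). classEnt is log2(3) because iris has three equally frequent species, which suggests entropies are reported in bits.

```r
# Non-aggregated values copied from the summary=c() output above
attrEnt  <- c(Sepal.Length = 2.315653, Sepal.Width = 2.186232,
              Petal.Length = 2.308260, Petal.Width = 2.298620)
jointEnt <- c(Sepal.Length = 3.281389, Sepal.Width = 3.410577,
              Petal.Length = 2.698910, Petal.Width = 2.682002)
# Three equally frequent species: H(C) = log2(3) bits
classEnt <- log2(3)

# Mutual information per attribute: I(X; C) = H(X) + H(C) - H(X, C)
mutInf <- attrEnt + classEnt - jointEnt

# Equivalent number of attributes and noise ratio from the averages
eqNumAttr <- classEnt / mean(mutInf)
nsRatio   <- (mean(attrEnt) - mean(mutInf)) / mean(mutInf)

print(round(c(eqNumAttr = eqNumAttr, nsRatio = nsRatio), 6))
```

The recomputed values agree with the printed mutInf, eqNumAttr and nsRatio to within the rounding of the copied inputs.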

mfe documentation built on July 1, 2020, 10:46 p.m.