statistical: Statistical meta-features

Description

Statistical meta-features are standard statistical measures that describe the numerical properties of the data distribution. Because they require numerical attributes, categorical data are transformed to numerical.

Usage

statistical(...)

## Default S3 method:
statistical(
  x,
  y,
  features = "all",
  summary = c("mean", "sd"),
  by.class = FALSE,
  transform = TRUE,
  ...
)

## S3 method for class 'formula'
statistical(
  formula,
  data,
  features = "all",
  summary = c("mean", "sd"),
  by.class = FALSE,
  transform = TRUE,
  ...
)

Arguments

...

Further arguments passed to the summarization functions.

x

A data.frame containing only the input attributes.

y

A factor response vector with one label for each row/component of x.

features

A list of feature names, or "all" to include all of them. The Details section describes the valid values for this group.

summary

A list of summarization functions, or empty to return all values without summarization. See the post.processing method for more information. (Default: c("mean", "sd"))

by.class

A logical value indicating if the meta-features must be computed for each group of samples belonging to different output classes. (Default: FALSE)

transform

A logical value indicating if the categorical attributes should be transformed. If FALSE they will be ignored. (Default: TRUE)

formula

A formula to define the class column.

data

A data.frame containing the input attributes and the class column.
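
As a rough illustration of the summary and by.class arguments described above, the following base R sketch summarizes a multi-valued measure (one value per attribute) with the default functions "mean" and "sd". It does not call the package; the variable names are hypothetical, and pooling the per-class values before summarizing is an assumption about how by.class behaves, not a statement of the package's implementation.

```r
# Sketch only: a base R stand-in, not the package's own code.
x <- iris[1:4]
y <- iris$Species

# A multi-valued measure: one standard deviation per attribute.
sd_values <- sapply(x, sd)

# Summarize the values with the default functions, "mean" and "sd".
summary_funs <- c("mean", "sd")
summarized <- sapply(summary_funs, function(f) do.call(f, list(sd_values)))

# by.class analogue (assumption): compute the measure within each class,
# pool the resulting values, then summarize them in the same way.
per_class <- unlist(lapply(split(x, y), function(g) sapply(g, sd)))
summarized_by_class <- sapply(summary_funs, function(f) do.call(f, list(per_class)))

print(summarized)
print(summarized_by_class)
```

With an empty summary, the unsummarized values (sd_values here) would be returned as they are.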

Details

The following features are allowed for this method:

"canCor"

Canonical correlations between the predictive attributes and the class (multi-valued).

"gravity"

Center of gravity, which is the distance between the center instance of the majority class and the center instance of the minority class.

"cor"

Absolute attribute correlation, which measures the correlation between each pair of numeric attributes in the dataset (multi-valued). This measure accepts an extra argument called method = c("pearson", "kendall", "spearman"). See cor for more details.

"cov"

Absolute attribute covariance, which measures the covariance between each pair of numeric attributes in the dataset (multi-valued).

"nrDisc"

Number of discriminant functions.

"eigenvalues"

Eigenvalues of the covariance matrix (multi-valued).

"gMean"

Geometric mean of attributes (multi-valued).

"hMean"

Harmonic mean of attributes (multi-valued).

"iqRange"

Interquartile range of attributes (multi-valued).

"kurtosis"

Kurtosis of attributes (multi-valued).

"mad"

Median absolute deviation of attributes (multi-valued).

"max"

Maximum value of attributes (multi-valued).

"mean"

Mean value of attributes (multi-valued).

"median"

Median value of attributes (multi-valued).

"min"

Minimum value of attributes (multi-valued).

"nrCorAttr"

Number of attribute pairs with high correlation (multi-valued only when by.class=TRUE).

"nrNorm"

Number of attributes with a normal distribution. The Shapiro-Wilk normality test is used to assess whether an attribute is normally distributed (multi-valued only when by.class=TRUE).

"nrOutliers"

Number of attributes with outlier values. Tukey's boxplot algorithm is used to determine whether an attribute has outliers (multi-valued only when by.class=TRUE).

"range"

Range of attributes (multi-valued).

"sd"

Standard deviation of the attributes (multi-valued).

"sdRatio"

Statistical test for homogeneity of covariances.

"skewness"

Skewness of attributes (multi-valued).

"sparsity"

Attribute sparsity, which represents the degree of discreteness of each attribute in the dataset (multi-valued).

"tMean"

Trimmed mean of attributes (multi-valued). It is the arithmetic mean computed after excluding 20% of the lowest and highest instances.

"var"

Attribute variance (multi-valued).

"wLambda"

Wilks' Lambda.
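
Several of the measures listed above can be approximated with base R alone. The sketch below illustrates "cor", "nrCorAttr", "nrNorm" and "nrOutliers" on the iris attributes; it is not the package's implementation. The significance level alpha and the correlation threshold cor_threshold are hypothetical values chosen for illustration, and the package may use different ones.

```r
# Sketch only: base R approximations of a few measures, not the package's code.
x <- iris[1:4]

# "cor": absolute correlation for each pair of attributes (upper triangle only).
cor_matrix <- abs(stats::cor(x, method = "pearson"))
cor_values <- cor_matrix[upper.tri(cor_matrix)]

# "nrCorAttr": number of attribute pairs above an assumed threshold.
cor_threshold <- 0.5  # hypothetical cut-off for "high" correlation
nr_cor_attr <- sum(cor_values > cor_threshold)

# "nrNorm": attributes for which the Shapiro-Wilk test does not reject normality.
alpha <- 0.05  # hypothetical significance level
p_values <- sapply(x, function(col) stats::shapiro.test(col)$p.value)
nr_norm <- sum(p_values > alpha)

# "nrOutliers": attributes with at least one point beyond Tukey's boxplot whiskers.
has_outliers <- sapply(x, function(col) length(grDevices::boxplot.stats(col)$out) > 0)
nr_outliers <- sum(has_outliers)

print(c(nrCorAttr = nr_cor_attr, nrNorm = nr_norm, nrOutliers = nr_outliers))
```

Each count here is a single number for the whole dataset, which is consistent with these measures becoming multi-valued only when they are computed per class.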

This method uses simple binarization to transform the categorical attributes when transform=TRUE.
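
A minimal sketch of what simple binarization can look like, assuming one 0/1 indicator column per factor level. The toy data frame and the use of model.matrix are illustrative assumptions; the package's internal transformation may differ in detail.

```r
# Sketch only: one possible binarization of a categorical attribute.
df <- data.frame(
  size = c(1.2, 3.4, 2.2, 5.1),
  color = factor(c("red", "green", "blue", "green"))
)

# Build one indicator column per level; "- 1" drops the intercept so that
# every level of the factor gets its own column.
binary <- stats::model.matrix(~ color - 1, data = df)

# Combine the numeric attribute with the binarized columns.
transformed <- cbind(df["size"], as.data.frame(binary))
print(transformed)
```

After this step every column is numeric, so the statistical measures can be computed on the former categorical attribute as well.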

Value

A list named by the requested meta-features.

References

Ciro Castiello, Giovanna Castellano, and Anna M. Fanelli. Meta-data: Characterization of input features for meta-learning. In 2nd International Conference on Modeling Decisions for Artificial Intelligence (MDAI), pages 457-468, 2005.

Shawkat Ali and Kate A. Smith. On learning algorithm selection for classification. Applied Soft Computing, volume 6, pages 119-138, 2006.

See Also

Other meta-features: clustering(), complexity(), concept(), general(), infotheo(), itemset(), landmarking(), model.based(), relative()

Examples

## Extract all meta-features
statistical(Species ~ ., iris)

## Extract some meta-features
statistical(iris[1:4], iris[5], c("cor", "nrNorm"))

## Extract all meta-features without summarizing the results
statistical(Species ~ ., iris, summary=c())

## Use another summarization function
statistical(Species ~ ., iris, summary=c("min", "median", "max"))

## Extract statistical measures using by.class approach
statistical(Species ~ ., iris, by.class=TRUE)

## Do not transform the data (categorical attributes are ignored)
statistical(Species ~ ., iris, transform=FALSE)

rivolli/mfe documentation built on March 29, 2022, 11:08 p.m.