Statistical meta-features are standard statistical measures that describe the numerical properties of a data distribution. Because they require numerical attributes, categorical data are transformed to numerical.
... | Further arguments passed to the summarization functions.
x | A data.frame containing only the input attributes.
y | A factor response vector with one label for each row/component of x.
features | A list of features names or
summary | A list of summarization functions or empty for all values. See the post.processing method for more information. (Default:
by.class | A logical value indicating whether the meta-features must be computed for each group of samples belonging to different output classes. (Default: FALSE)
transform | A logical value indicating whether the categorical attributes should be transformed. If
formula | A formula to define the class column.
data | A data.frame containing the input attributes and the class. The details section describes the valid values for this group.
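The two calling styles (a formula plus data, or x plus y) describe the same inputs. As a rough illustration of how a formula could be resolved into the x and y pieces, the base R sketch below uses `stats::model.frame`. The helper name `split_formula_sketch` is hypothetical and is not part of this package; the method's actual internals may differ.

```r
# Hypothetical helper (not from the package): resolve a formula and a
# data.frame into the x (input attributes) and y (class) arguments.
split_formula_sketch <- function(formula, data) {
  # model.frame places the response in the first column.
  frame <- stats::model.frame(formula, data)
  list(
    x = frame[-1],   # data.frame with the input attributes only
    y = frame[[1]]   # factor with one label per row of x
  )
}

parts <- split_formula_sketch(Species ~ ., iris)
str(parts$x)
table(parts$y)
```

With the iris data, `parts$x` holds the four measurement columns and `parts$y` the Species factor, which mirrors the `iris[1:4]` and `iris[5]` pair used in the examples.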
The following features are allowed for this method:
canCor: Canonical correlations between the predictive attributes and the class (multi-valued).
gravity: Center of gravity, which is the distance between the instance in the center of the majority class and the instance in the center of the minority class.
cor: Absolute attributes correlation, which measures the correlation between each pair of the numeric attributes in the dataset (multi-valued). This measure accepts an extra argument called method = c("pearson", "kendall", "spearman"). See cor for more details.
cov: Absolute attributes covariance, which measures the covariance between each pair of the numeric attributes in the dataset (multi-valued).
nrDisc: Number of discriminant functions.
eigenvalues: Eigenvalues of the covariance matrix (multi-valued).
gMean: Geometric mean of attributes (multi-valued).
hMean: Harmonic mean of attributes (multi-valued).
iqRange: Interquartile range of attributes (multi-valued).
kurtosis: Kurtosis of attributes (multi-valued).
mad: Median absolute deviation of attributes (multi-valued).
max: Maximum value of attributes (multi-valued).
mean: Mean value of attributes (multi-valued).
median: Median value of attributes (multi-valued).
min: Minimum value of attributes (multi-valued).
nrCorAttr: Number of attribute pairs with high correlation (multi-valued when by.class=TRUE).
nrNorm: Number of attributes with normal distribution. The Shapiro-Wilk Normality Test is used to assess whether or not an attribute is normally distributed (multi-valued only when by.class=TRUE).
nrOutliers: Number of attributes with outlier values. Tukey's boxplot algorithm is used to determine whether or not an attribute has outliers (multi-valued only when by.class=TRUE).
range: Range of attributes (multi-valued).
sd: Standard deviation of the attributes (multi-valued).
sdRatio: Statistical test for homogeneity of covariances.
skewness: Skewness of attributes (multi-valued).
sparsity: Attributes sparsity, which represents the degree of discreteness of each attribute in the dataset (multi-valued).
tMean: Trimmed mean of attributes (multi-valued). It is the arithmetic mean excluding the 20% of the lowest and highest instances.
var: Attributes variance (multi-valued).
wLambda: Wilks Lambda.
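To make the idea of a multi-valued feature concrete, the base R sketch below reproduces the absolute attribute correlation by hand and then summarizes it with mean and sd. It is an illustration under the assumption that each attribute pair is counted once (upper triangle of the correlation matrix); it is not the package's own code.

```r
# Numeric input attributes of the built-in iris dataset.
x <- iris[1:4]

# Absolute Pearson correlation for every pair of attributes. Keeping only
# the upper triangle counts each pair once and drops the diagonal.
cor_matrix <- abs(stats::cor(x, method = "pearson"))
pair_values <- cor_matrix[upper.tri(cor_matrix)]

# A multi-valued feature yields one value per pair (six for four
# attributes); summarization functions reduce them to a fixed-size result.
cor_summary <- c(mean = mean(pair_values), sd = stats::sd(pair_values))
print(pair_values)
print(cor_summary)
```

On iris, the six pair values and their mean and sd agree with the `$cor` entries shown in the example output on this page.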
This method uses simple binarization to transform the categorical attributes when transform=TRUE.
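As an illustration of what a simple binarization can look like, the base R sketch below expands each factor column into one 0/1 indicator column per level and leaves numeric columns untouched. The helper `binarize_sketch`, the toy data, and the column naming scheme are assumptions made for this example; the package's exact encoding may differ.

```r
# Hypothetical helper (not from the package): one indicator column per
# factor level, numeric columns passed through unchanged.
binarize_sketch <- function(df) {
  pieces <- lapply(names(df), function(name) {
    column <- df[[name]]
    if (!is.factor(column)) {
      return(stats::setNames(data.frame(column), name))
    }
    indicators <- lapply(levels(column), function(level) {
      as.integer(column == level)
    })
    names(indicators) <- paste(name, levels(column), sep = ".")
    as.data.frame(indicators)
  })
  do.call(cbind, pieces)
}

# Toy dataset with one numeric and one categorical attribute.
toy <- data.frame(
  size = c(1.2, 3.4, 2.2),
  color = factor(c("red", "blue", "red"))
)
binarized <- binarize_sketch(toy)
print(binarized)
```

After this step every column is numeric, so measures such as the mean or the correlation can be computed on attributes that were originally categorical.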
A list named by the requested meta-features.
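Because the result is a named list, it can be flattened into a single named numeric vector when one row per dataset is needed, for example to build a meta-dataset. The sketch below uses a hand-built list with values copied from the example output on this page rather than calling the method itself.

```r
# Stand-in for a returned list, using values from the example output.
result <- list(
  cor = c(mean = 0.5941160, sd = 0.3375443),
  nrNorm = 1
)

# unlist() joins the feature name and the summary name with a dot.
flat <- unlist(result)
print(flat)
```

The flattened names take the form `cor.mean`, `cor.sd`, and `nrNorm`, so single-valued features keep their plain name.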
Ciro Castiello, Giovanna Castellano, and Anna M. Fanelli. Meta-data: Characterization of input features for meta-learning. In 2nd International Conference on Modeling Decisions for Artificial Intelligence (MDAI), pages 457-468, 2005.
Shawkat Ali and Kate A. Smith. On learning algorithm selection for classification. Applied Soft Computing, volume 6, pages 119-138, 2006.
Other meta-features: clustering(), complexity(), concept(), general(), infotheo(), itemset(), landmarking(), model.based(), relative()
## Extract all meta-features
statistical(Species ~ ., iris)
## Extract some meta-features
statistical(iris[1:4], iris[5], c("cor", "nrNorm"))
## Extract all meta-features without summarizing the results
statistical(Species ~ ., iris, summary=c())
## Use another summarization function
statistical(Species ~ ., iris, summary=c("min", "median", "max"))
## Extract statistical measures using by.class approach
statistical(Species ~ ., iris, by.class=TRUE)
## Do not transform the data (using only numeric attributes)
statistical(Species ~ ., iris, transform=FALSE)
$canCor
mean sd
0.7280090 0.3631869
$gravity
[1] 3.208281
$cor
mean sd
0.5941160 0.3375443
$cov
mean sd
0.5966542 0.5582672
$nrDisc
[1] 2
$eigenvalues
mean sd
1.143239 2.058771
$gMean
mean sd
3.223073 2.022943
$hMean
mean sd
2.978389 2.145948
$iqRange
mean sd
1.700000 1.275408
$kurtosis
mean sd
-0.8105361 0.7326910
$mad
mean sd
1.0934175 0.5785782
$max
mean sd
5.425000 2.443188
$mean
mean sd
3.464500 1.918485
$median
mean sd
3.612500 1.919364
$min
mean sd
1.850000 1.808314
$nrCorAttr
[1] 0.5
$nrNorm
[1] 1
$nrOutliers
[1] 1
$range
mean sd
3.575 1.650
$sd
mean sd
0.9478671 0.5712994
$sdRatio
[1] 1.277229
$skewness
mean sd
0.06273198 0.29439896
$sparsity
mean sd
0.02871478 0.01103236
$tMean
mean sd
3.470556 1.904802
$var
mean sd
1.143239 1.332546
$wLambda
[1] 0.02343863
$cor
mean sd
0.5941160 0.3375443
$nrNorm
[1] 1
$canCor
non.aggregated1 non.aggregated2
0.9848209 0.4711970
$gravity
[1] 3.208281
$cor
non.aggregated1 non.aggregated2 non.aggregated3 non.aggregated4 non.aggregated5
0.1175698 0.8717538 0.4284401 0.8179411 0.3661259
non.aggregated6
0.9628654
$cov
non.aggregated1 non.aggregated2 non.aggregated3 non.aggregated4 non.aggregated5
0.0424340 1.2743154 0.3296564 0.5162707 0.1216394
non.aggregated6
1.2956094
$nrDisc
[1] 2
$eigenvalues
non.aggregated1 non.aggregated2 non.aggregated3 non.aggregated4
4.22824171 0.24267075 0.07820950 0.02383509
$gMean
non.aggregated.Sepal.Length non.aggregated.Sepal.Width
5.7857204 3.0265978
non.aggregated.Petal.Length non.aggregated.Petal.Width
3.2382668 0.8417075
$hMean
non.aggregated.Sepal.Length non.aggregated.Sepal.Width
5.7289051 2.9958151
non.aggregated.Petal.Length non.aggregated.Petal.Width
2.6941655 0.4946708
$iqRange
non.aggregated.Sepal.Length non.aggregated.Sepal.Width
1.3 0.5
non.aggregated.Petal.Length non.aggregated.Petal.Width
3.5 1.5
$kurtosis
non.aggregated.Sepal.Length non.aggregated.Sepal.Width
-0.6058125 0.1387047
non.aggregated.Petal.Length non.aggregated.Petal.Width
-1.4168574 -1.3581792
$mad
non.aggregated.Sepal.Length non.aggregated.Sepal.Width
1.03782 0.44478
non.aggregated.Petal.Length non.aggregated.Petal.Width
1.85325 1.03782
$max
non.aggregated.Sepal.Length non.aggregated.Sepal.Width
7.9 4.4
non.aggregated.Petal.Length non.aggregated.Petal.Width
6.9 2.5
$mean
non.aggregated.Sepal.Length non.aggregated.Sepal.Width
5.843333 3.057333
non.aggregated.Petal.Length non.aggregated.Petal.Width
3.758000 1.199333
$median
non.aggregated.Sepal.Length non.aggregated.Sepal.Width
5.80 3.00
non.aggregated.Petal.Length non.aggregated.Petal.Width
4.35 1.30
$min
non.aggregated.Sepal.Length non.aggregated.Sepal.Width
4.3 2.0
non.aggregated.Petal.Length non.aggregated.Petal.Width
1.0 0.1
$nrCorAttr
[1] 0.5
$nrNorm
[1] 1
$nrOutliers
[1] 1
$range
non.aggregated.Sepal.Length non.aggregated.Sepal.Width
3.6 2.4
non.aggregated.Petal.Length non.aggregated.Petal.Width
5.9 2.4
$sd
non.aggregated.Sepal.Length non.aggregated.Sepal.Width
0.8280661 0.4358663
non.aggregated.Petal.Length non.aggregated.Petal.Width
1.7652982 0.7622377
$sdRatio
[1] 1.277229
$skewness
non.aggregated.Sepal.Length non.aggregated.Sepal.Width
0.3086407 0.3126147
non.aggregated.Petal.Length non.aggregated.Petal.Width
-0.2694109 -0.1009166
$sparsity
non.aggregated.Sepal.Length non.aggregated.Sepal.Width
0.02205177 0.03705865
non.aggregated.Petal.Length non.aggregated.Petal.Width
0.01670048 0.03904820
$tMean
non.aggregated.Sepal.Length non.aggregated.Sepal.Width
5.797778 3.040000
non.aggregated.Petal.Length non.aggregated.Petal.Width
3.842222 1.202222
$var
non.aggregated.Sepal.Length non.aggregated.Sepal.Width
0.6856935 0.1899794
non.aggregated.Petal.Length non.aggregated.Petal.Width
3.1162779 0.5810063
$wLambda
[1] 0.02343863
$canCor
min median max
0.4711970 0.7280090 0.9848209
$gravity
[1] 3.208281
$cor
min median max
0.1175698 0.6231906 0.9628654
$cov
min median max
0.0424340 0.4229635 1.2956094
$nrDisc
[1] 2
$eigenvalues
min median max
0.02383509 0.16044012 4.22824171
$gMean
min median max
0.8417075 3.1324323 5.7857204
$hMean
min median max
0.4946708 2.8449903 5.7289051
$iqRange
min median max
0.5 1.4 3.5
$kurtosis
min median max
-1.4168574 -0.9819959 0.1387047
$mad
min median max
0.44478 1.03782 1.85325
$max
min median max
2.50 5.65 7.90
$mean
min median max
1.199333 3.407667 5.843333
$median
min median max
1.300 3.675 5.800
$min
min median max
0.1 1.5 4.3
$nrCorAttr
[1] 0.5
$nrNorm
[1] 1
$nrOutliers
[1] 1
$range
min median max
2.4 3.0 5.9
$sd
min median max
0.4358663 0.7951519 1.7652982
$sdRatio
[1] 1.277229
$skewness
min median max
-0.2694109 0.1038621 0.3126147
$sparsity
min median max
0.01670048 0.02955521 0.03904820
$tMean
min median max
1.202222 3.441111 5.797778
$var
min median max
0.1899794 0.6333499 3.1162779
$wLambda
[1] 0.02343863
$canCor
mean sd
0.7280090 0.3631869
$gravity
[1] 3.208281
$cor
mean sd
0.4850530 0.2124471
$cov
mean sd
0.07154263 0.07234487
$nrDisc
[1] 2
$eigenvalues
mean sd
0.1518663 0.2187384
$gMean
mean sd
3.444764 2.018251
$hMean
mean sd
3.424851 2.014514
$iqRange
mean sd
0.4625000 0.2071177
$kurtosis
mean sd
-0.07541906 0.64345348
$mad
mean sd
0.3521175 0.1925954
$max
mean sd
4.258333 2.333339
$mean
mean sd
3.464500 2.021852
$median
mean sd
3.458333 2.014587
$min
mean sd
2.633333 1.669150
$nrCorAttr
mean sd
0.5000000 0.4409586
$nrNorm
mean sd
2.6666667 0.5773503
$nrOutliers
mean sd
2 1
$range
mean sd
1.6250000 0.7374711
$sd
mean sd
0.3577631 0.1613754
$sdRatio
[1] 1.277229
$skewness
mean sd
0.1199744 0.4378457
$sparsity
mean sd
0.06017094 0.03608774
$tMean
mean sd
3.455833 2.011284
$var
mean sd
0.1518663 0.1221409
$wLambda
[1] 0.02343863
$canCor
mean sd
0.7280090 0.3631869
$gravity
[1] 3.208281
$cor
mean sd
0.5941160 0.3375443
$cov
mean sd
0.5966542 0.5582672
$nrDisc
[1] 2
$eigenvalues
mean sd
1.143239 2.058771
$gMean
mean sd
3.223073 2.022943
$hMean
mean sd
2.978389 2.145948
$iqRange
mean sd
1.700000 1.275408
$kurtosis
mean sd
-0.8105361 0.7326910
$mad
mean sd
1.0934175 0.5785782
$max
mean sd
5.425000 2.443188
$mean
mean sd
3.464500 1.918485
$median
mean sd
3.612500 1.919364
$min
mean sd
1.850000 1.808314
$nrCorAttr
[1] 0.5
$nrNorm
[1] 1
$nrOutliers
[1] 1
$range
mean sd
3.575 1.650
$sd
mean sd
0.9478671 0.5712994
$sdRatio
[1] 1.277229
$skewness
mean sd
0.06273198 0.29439896
$sparsity
mean sd
0.02871478 0.01103236
$tMean
mean sd
3.470556 1.904802
$var
mean sd
1.143239 1.332546
$wLambda
[1] 0.02343863
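The by.class=TRUE output above can be read as: compute the measure separately inside each class, then summarize all the per-class values together. The base R sketch below does this for the `mean` feature only; it is an illustration of the idea, not the package's implementation.

```r
# Split the input attributes by class and compute the attribute means
# within each class. The result is a 4 x 3 matrix (attributes x classes).
class_means <- sapply(split(iris[1:4], iris$Species), colMeans)
print(class_means)

# Summarize the twelve per-class values with mean and sd.
by_class_summary <- c(
  mean = mean(as.vector(class_means)),
  sd = stats::sd(as.vector(class_means))
)
print(by_class_summary)
```

For iris this gives the same figures as the `$mean` entry of the by.class example output (mean 3.464500, sd 2.021852). The mean equals the value from the run without by.class because the three classes have the same number of rows, while the sd differs because it is taken over twelve values instead of four.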