correlation: Measures of feature correlation


View source: R/correlation.R

Description

Regression task. These measures calculate the correlation between the values of each feature and the output. If at least one feature is highly correlated with the output, this indicates that simpler functions can be fitted to the data.

Usage

correlation(...)

## Default S3 method:
correlation(x, y, measures = "all", summary = c("mean", "sd"), ...)

## S3 method for class 'formula'
correlation(formula, data, measures = "all", summary = c("mean", "sd"), ...)

Arguments

...

Not used.

x

A data.frame containing only the input attributes.

y

A response vector with one value for each row/component of x.

measures

A list of measure names, or "all" to include all of them.

summary

A list of summarization functions, or empty to return all values. See the summarization method for more information. (Default: c("mean", "sd"))

formula

A formula to define the output column.

data

A data.frame containing the input and output attributes.
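To illustrate how the formula interface relates to the x/y arguments, the base-R sketch below splits cars into an output vector and an input data.frame using speed ~ . as the formula. This is only an illustration with standard stats functions; it is an assumption that the formula method of correlation() separates the data in an equivalent way, not a description of ECoL's code.

```r
# Illustration only (assumed equivalence, not ECoL's code): split a data
# frame into inputs x and output y using the formula speed ~ .
data(cars)
mf <- stats::model.frame(speed ~ ., data = cars)
y <- stats::model.response(mf)  # output column: speed
x <- mf[, -1, drop = FALSE]     # input columns: everything else (dist)
str(x)
str(y)
```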

Details

The following measures are allowed for this method:

"C1"

Feature correlation to the output (C1) calculates the absolute value of the Spearman correlation between each feature and the output.

"C2"

Average feature correlation to the output (C2) computes the average of the Spearman correlations of all features to the output.

"C3"

Individual feature efficiency (C3) calculates, for each feature, the number of examples that must be removed from the dataset until a high Spearman correlation with the output is achieved.

"C4"

Collective feature efficiency (C4) computes the ratio of examples removed from the dataset based on an iterative process of linear fitting between the features and the target attribute.
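The four measures above can be sketched in base R as follows. This is a hypothetical illustration, not ECoL's implementation: the helper names (feature_output_cor, avg_feature_cor, c3_removals, c4_sketch), the use of absolute values in C2, the 0.9 correlation threshold and the rank-based removal rule in C3, and the min-max scaling, 0.1 residual tolerance and feature-selection order in C4 are all assumptions. mtcars is used only because it has several numeric features.

```r
# Hypothetical base-R sketches of C1-C4; NOT ECoL's implementation.
# Assumptions: Spearman correlation via stats::cor(), C3 threshold 0.9,
# C4 residual tolerance 0.1 on min-max scaled data.

# C1: absolute Spearman correlation of each feature with the output.
feature_output_cor <- function(x, y) {
  vapply(x, function(col) abs(stats::cor(col, y, method = "spearman")),
         numeric(1))
}

# C2: average of the per-feature absolute correlations (abs() is assumed).
avg_feature_cor <- function(x, y) mean(feature_output_cor(x, y))

# C3 (one feature): number of examples removed until |rho| > threshold.
c3_removals <- function(xj, y, threshold = 0.9) {
  removed <- 0
  while (length(y) > 2) {
    rho <- stats::cor(xj, y, method = "spearman")
    # A constant feature yields NA; the sketch simply stops in that case.
    if (is.na(rho) || abs(rho) > threshold) break
    # Assumed removal rule: drop the example whose ranks disagree most
    # with the direction (sign) of the current correlation.
    d <- if (rho >= 0) {
      abs(rank(xj) - rank(y))
    } else {
      abs(rank(xj) + rank(y) - (length(y) + 1))
    }
    i <- which.max(d)
    xj <- xj[-i]
    y <- y[-i]
    removed <- removed + 1
  }
  removed
}

# C4: fraction of examples removed by repeated one-feature linear fits.
c4_sketch <- function(x, y, tol = 0.1) {
  rescale <- function(v) {  # assumed min-max scaling to [0, 1]
    r <- range(v)
    if (r[2] == r[1]) return(rep(0, length(v)))
    (v - r[1]) / (r[2] - r[1])
  }
  x <- as.data.frame(lapply(x, rescale))
  y <- rescale(y)
  remaining <- rep(TRUE, length(y))
  features <- names(x)
  while (length(features) > 0 && sum(remaining) > 1) {
    xr <- x[remaining, features, drop = FALSE]
    yr <- y[remaining]
    # Constant columns give NA correlations (with a warning); treat as 0.
    rho <- suppressWarnings(feature_output_cor(xr, yr))
    rho[is.na(rho)] <- 0
    # Fit the output on the most correlated unused feature.
    best <- names(rho)[which.max(rho)]
    xb <- xr[[best]]
    fit <- stats::lm(yr ~ xb)
    # Examples the current fit already explains (small residuals) are removed.
    explained <- abs(stats::residuals(fit)) <= tol
    remaining[which(remaining)[explained]] <- FALSE
    features <- setdiff(features, best)
  }
  sum(!remaining) / length(y)
}

x <- mtcars[, c("wt", "hp", "qsec")]
y <- mtcars$mpg
c1 <- feature_output_cor(x, y)
c1_summary <- c(mean = mean(c1), sd = stats::sd(c1))  # mean/sd of per-feature values
c2 <- avg_feature_cor(x, y)
c3 <- vapply(x, c3_removals, numeric(1), y = y)
c4 <- c4_sketch(x, y)
```

Changing threshold or tol changes the C3 and C4 values, so numbers from these sketches should not be compared directly with the output of correlation().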

Value

A list named by the requested correlation measures.

References

Ana C Lorena and Aron I Maciel and Pericles B C Miranda and Ivan G Costa and Ricardo B C Prudencio. (2018). Data complexity meta-features for regression problems. Machine Learning, 107, 1, 209–246.

See Also

Other complexity-measures: balance(), dimensionality(), featurebased(), linearity(), neighborhood(), network(), smoothness()

Examples

## Extract all correlation measures for regression task
data(cars)
correlation(speed ~ ., cars)
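The measures and summary arguments can restrict the output. The sketch below, assuming the ECoL package is installed, requests only C1 and C2 and only their mean; the guard skips the call when the package is unavailable.

```r
## Extract only C1 and C2, summarized by their mean (assumes ECoL is installed)
if (requireNamespace("ECoL", quietly = TRUE)) {
  data(cars)
  res <- ECoL::correlation(speed ~ ., cars,
                           measures = c("C1", "C2"), summary = "mean")
  print(res)
}
```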

lpfgarcia/ECoL documentation built on Dec. 22, 2020, 1:41 a.m.