dimensionality: Measures of dimensionality


View source: R/dimensionality.R

Description

These measures give an indication of data sparsity. They capture how sparse a dataset is. Sparse datasets tend to have regions of low density, and such regions are known to make it more difficult to extract good classification and regression models.

Usage

dimensionality(...)

## Default S3 method:
dimensionality(x, y, measures = "all", ...)

## S3 method for class 'formula'
dimensionality(formula, data, measures = "all", ...)

Arguments

...

Not used.

x

A data.frame containing only the input attributes.

y

A response vector with one value for each row/component of x.

measures

A list of measure names, or "all" to include all of them.

formula

A formula to define the output column.

data

A data.frame dataset containing the input and output attributes.

Details

The following measures are allowed for this method:

"T2"

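Average number of points per dimension (T2) relates the number of examples to the dimensionality of the dataset. Judging by the example output at the end of this page (0.0267 = 4/150 for iris, which has 150 examples and 4 input attributes), the value is reported as the number of dimensions divided by the number of examples.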

"T3"

Average number of points per PCA (T3) is similar to T2, but uses the number of PCA components needed to represent 95% of the data variability as the basis for the sparsity assessment.

"T4"

Ratio of the PCA Dimension to the Original (T4) estimates the proportion of relevant dimensions in a dataset, as the ratio between the number of PCA components and the number of original dimensions.
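
The three definitions above can be illustrated with a small base R sketch. This is not the ECoL implementation: the function name `dimensionality_sketch`, its `threshold` argument, and the use of a centered, unscaled `stats::prcomp` are assumptions made for illustration. The ratios are taken as dimensions (or PCA components) per example, which is the convention the iris figures in the example output suggest.

```r
# Hand-rolled sketch, not the ECoL implementation.
# Assumption: each value is a count of dimensions (or PCA components)
# divided by the number of examples or original dimensions.
dimensionality_sketch <- function(x, threshold = 0.95) {
  x <- as.matrix(x)                      # numeric input attributes only
  n <- nrow(x)                           # number of examples
  m <- ncol(x)                           # number of original dimensions
  pca <- stats::prcomp(x)                # centered, unscaled PCA (an assumption)
  cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  k <- which(cum_var >= threshold)[1]    # components needed to reach the threshold
  c(T2 = m / n, T3 = k / n, T4 = k / m)
}

res <- dimensionality_sketch(iris[, 1:4])
print(res)
```

For the four numeric iris attributes, two components are enough to pass the 95% threshold, so the sketch gives 4/150, 2/150 and 2/4, in line with the first row of the example output.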

Value

A list named by the requested dimensionality measure.
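
As a hedged illustration of the default method and of looking up a result by name, the sketch below requests two measures and reads one of them back. It assumes the ECoL package is installed and that `measures` accepts a character vector of the names listed under Details. The call is skipped when the package is not available.

```r
x <- iris[, 1:4]      # input attributes only
y <- iris$Species     # one response value per row of x

# Assumes the ECoL package is installed; skipped otherwise.
if (requireNamespace("ECoL", quietly = TRUE)) {
  res <- ECoL::dimensionality(x, y, measures = c("T2", "T4"))
  print(names(res))   # names of the returned measures
  print(res[["T2"]])  # look up a single measure by its name
}
```

Indexing with `[[` by name is used here because it works whether the result is a list or a named vector.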

References

Ana C Lorena, Ivan G Costa, Newton Spolaor and Marcilio C P Souto. (2012). Analysis of complexity indices for classification problems: Cancer gene expression data. Neurocomputing 75, 1, 33–42.

See Also

Other complexity-measures: balance, correlation, linearity, neighborhood, network, overlapping, smoothness

Examples

## Extract all dimensionality measures for classification task
data(iris)
dimensionality(Species ~ ., iris)

## Extract all dimensionality measures for regression task
data(cars)
dimensionality(speed ~ ., cars)

Example output

        T2         T3         T4 
0.02666667 0.01333333 0.50000000 
  T2   T3   T4 
0.02 0.02 1.00 

ECoL documentation built on Nov. 5, 2019, 9:07 a.m.
