dimensionality: Measures of dimensionality
In lpfgarcia/ECoL: Complexity Measures for Supervised Problems

These measures give an indication of data sparsity. They capture how sparsely the examples are distributed in the input space: sparse datasets tend to have regions of low density, and such regions are known to make it more difficult to extract good classification and regression models.

dimensionality(...)

## Default S3 method:
dimensionality(x, y, measures = "all", ...)

## S3 method for class 'formula'
dimensionality(formula, data, measures = "all", ...)

`...`	Not used.
`x`	A data.frame containing only the input attributes.
`y`	A response vector with one value for each row/component of x.
`measures`	A list of measure names, or `"all"` to include all of them.
`formula`	A formula to define the output column.
`data`	A data.frame containing the input and output attributes.

The following measures are allowed for this method:

"D1": Average number of points per dimension (D1) is the ratio between the number of examples and the dimensionality of the dataset.
"D2": Average number of points per PCA dimension (D2) is similar to D1, but uses the number of PCA components needed to represent 95% of the data variability as the basis for assessing sparsity.
"D3": Ratio of the PCA dimension to the original dimension (D3) estimates the proportion of relevant dimensions among the original dimensions of the dataset.
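
The three definitions above can be illustrated with a minimal sketch in base R. It follows the textual descriptions only: `dimensionality_sketch` and its `threshold` argument are hypothetical names introduced here and are not part of ECoL, and the package's own implementation may differ, for example in how it preprocesses attributes or in the direction of the ratios.

```r
# Hypothetical helper (not part of ECoL): computes the three ratios
# as described in the text, for a purely numeric input table.
dimensionality_sketch <- function(x, threshold = 0.95) {
  x <- as.matrix(x)
  n <- nrow(x)  # number of examples
  m <- ncol(x)  # number of original dimensions

  # PCA on centered, unscaled data (an assumption of this sketch).
  pca <- stats::prcomp(x)

  # Cumulative proportion of variance explained by the components.
  explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

  # Smallest number of components reaching the variability threshold.
  m_pca <- which(explained >= threshold)[1]

  list(
    D1 = n / m,      # points per original dimension
    D2 = n / m_pca,  # points per retained PCA component
    D3 = m_pca / m   # retained PCA components per original dimension
  )
}

# Usage on the numeric columns of iris (150 examples, 4 attributes).
res <- dimensionality_sketch(iris[, 1:4])
print(res)
```

Under these definitions D2 times D3 equals D1, so the three values are not independent: D3 shows how much of the original dimensionality survives the PCA reduction, and D2 rescales D1 accordingly.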

A list with one element per requested dimensionality measure, named by the measure.

Ana C Lorena, Ivan G Costa, Newton Spolaor and Marcilio C P Souto. (2012). Analysis of complexity indices for classification problems: Cancer gene expression data. Neurocomputing 75, 1, 33–42.

Other complexity-measures: balance(), correlation(), featurebased(), linearity(), neighborhood(), network(), smoothness()

## Extract all dimensionality measures for classification task
data(iris)
dimensionality(Species ~ ., iris)

## Extract all dimensionality measures for regression task
data(cars)
dimensionality(speed ~ ., cars)