friedmandata: Friedman's classification benchmark data
In klaR: Classification and Visualization

friedman.data

Friedman's classification benchmark data

Description

Function to generate 3-class classification benchmarking data as introduced by J.H. Friedman (1989).

Usage

friedman.data(setting = 1, p = 6, samplesize = 40, asmatrix = FALSE)

Arguments

`setting`	the problem setting (integer 1,2,...,6).
`p`	number of variables (6, 10, 20 or 40).
`samplesize`	sample size (number of observations, >=6).
`asmatrix`	if `TRUE`, results are returned as a matrix, otherwise as a data frame (default).
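As a quick check of the documented argument ranges, the following sketch generates one small dataset for every combination of `setting` and `p` and checks the dimensions described under Value. It assumes the klaR package is installed. The `samplesize` of 30 is an arbitrary choice, and `grid` and `dims` are illustrative names only.

```r
library(klaR)

# Every documented combination of the `setting` and `p` arguments
grid <- expand.grid(setting = 1:6, p = c(6, 10, 20, 40))

# Generate one small dataset per combination and record its dimensions
dims <- t(mapply(function(setting, p) {
  dim(friedman.data(setting = setting, p = p, samplesize = 30))
}, grid$setting, grid$p))

# 30 rows each, and p + 1 columns (class labels plus p variables)
all(dims[, 1] == 30)
all(dims[, 2] == grid$p + 1)
```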

Details

When J.H. Friedman introduced Regularized Discriminant Analysis (rda) in 1989, he used artificially generated data to test the procedure and to compare its performance with that of Linear and Quadratic Discriminant Analysis (see also lda and qda).
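The kind of comparison described above can be sketched on a single training/test pair. This example assumes klaR and MASS (which provides `lda()` and `qda()`) are installed. The choice of setting 2, the fixed `gamma` and `lambda` values copied from the Examples section, and the helper `error_rate` are illustrative assumptions and do not reproduce Friedman's study design. `qda()` can fail when a class has too few training observations relative to `p`, so that fit is wrapped in `tryCatch` and dropped if it errors.

```r
library(klaR)
library(MASS)  # lda() and qda()

set.seed(3)
training <- friedman.data(setting = 2, p = 6, samplesize = 40)
test     <- friedman.data(setting = 2, p = 6, samplesize = 100)

# Hypothetical helper: test-set misclassification rate of a fitted model
error_rate <- function(fit) {
  pred <- predict(fit, test[, -1])$class
  mean(as.character(pred) != as.character(test[, 1]))
}

fits <- list(
  lda = lda(class ~ ., data = training),
  # qda() may fail if a class is too small; keep NULL in that case
  qda = tryCatch(qda(class ~ ., data = training), error = function(e) NULL),
  rda = rda(class ~ ., data = training, gamma = 0.74, lambda = 0.77)
)

# Error rates of the models that could be fitted
res <- sapply(Filter(Negate(is.null), fits), error_rate)
res
```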

Six different settings were considered to demonstrate potential strengths and weaknesses of the new method:

1. equal spherical covariance matrices,
2. unequal spherical covariance matrices,
3. equal, highly ellipsoidal covariance matrices with mean differences in the low-variance subspace,
4. equal, highly ellipsoidal covariance matrices with mean differences in the high-variance subspace,
5. unequal, highly ellipsoidal covariance matrices with zero mean differences, and
6. unequal, highly ellipsoidal covariance matrices with nonzero mean differences.

For each of the six settings, data were generated with 6, 10, 20 and 40 variables.

Classification performance was then measured by repeatedly creating training datasets of 40 observations and estimating the misclassification rates on test sets of 100 observations.
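A minimal sketch of this repeated train/test protocol, assuming klaR is installed. It uses the fixed `gamma` and `lambda` values from the Examples section instead of tuning them, and the number of replicates (20) is arbitrary. `one_replicate` is a hypothetical helper, not part of klaR.

```r
library(klaR)

# Hypothetical helper: one replicate of the train/test protocol
one_replicate <- function(setting = 1, p = 6, gamma = 0.74, lambda = 0.77) {
  training <- friedman.data(setting, p, 40)   # training set of 40 observations
  test     <- friedman.data(setting, p, 100)  # independent test set of 100
  fit  <- rda(class ~ ., data = training, gamma = gamma, lambda = lambda)
  pred <- predict(fit, test[, -1])
  # Misclassification rate on the test set
  mean(as.character(pred$class) != as.character(test[, 1]))
}

set.seed(42)
errors <- replicate(20, one_replicate())
mean(errors)  # average test error over 20 replicates
```

Averaging over replicates smooths out the variation caused by random class proportions and by the small training sets.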

The number of classes is always 3. Class labels are assigned to observations at random (with equal probabilities), so the class proportions differ from dataset to dataset. To ensure that covariances can be estimated at all, every dataset contains at least two observations from each class.
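One way to satisfy both requirements (equal-probability labels and at least two observations per class) is rejection sampling: draw labels and redraw until the constraint holds. The sketch below illustrates that idea in base R. It is an assumption for illustration, not klaR's actual implementation, and `draw_labels` is a hypothetical name. The `stopifnot` guard mirrors the documented minimum `samplesize` of 6.

```r
# Hypothetical sketch (not klaR's code): draw labels 1..k with equal
# probabilities, redrawing until every class occurs at least twice.
draw_labels <- function(n, k = 3, min_per_class = 2) {
  stopifnot(n >= k * min_per_class)  # e.g. n >= 6 for 3 classes
  repeat {
    labels <- sample(seq_len(k), size = n, replace = TRUE)
    counts <- tabulate(labels, nbins = k)
    if (all(counts >= min_per_class)) return(labels)
  }
}

set.seed(7)
labels <- draw_labels(40)
table(labels)
```

For small samples, conditioning on the constraint makes the resulting class proportions slightly more balanced than unconstrained draws.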

Value

Depending on `asmatrix`, either a data frame or a matrix with `samplesize` rows and `p`+1 columns. The first column contains the class labels; the remaining columns contain the variables.
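The following sketch illustrates the two return types and the layout of the result, assuming klaR is installed. The setting, `p` and `samplesize` values are arbitrary.

```r
library(klaR)

set.seed(1)
df <- friedman.data(setting = 3, p = 10, samplesize = 50)
m  <- friedman.data(setting = 3, p = 10, samplesize = 50, asmatrix = TRUE)

is.data.frame(df)  # data frame by default
is.matrix(m)       # matrix when asmatrix = TRUE
dim(df)            # samplesize rows, p + 1 columns

# First column: class labels; each of the 3 classes appears at least twice
table(df[, 1])
```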

Author(s)

Christian Röver, roever@statistik.tu-dortmund.de

References

Friedman, J.H. (1989): Regularized Discriminant Analysis. In: Journal of the American Statistical Association 84, 165-175.

Examples

# Reproduce the 1st setting with 6 variables.
# The error rate should be roughly 9 percent.
library(klaR)
training <- friedman.data(setting = 1, p = 6, samplesize = 40)
x <- rda(class ~ ., data = training, gamma = 0.74, lambda = 0.77)
test <- friedman.data(setting = 1, p = 6, samplesize = 100)
y <- predict(x, test[, -1])
errormatrix(test[, 1], y$class)

klaR documentation built on May 29, 2024, 5:20 a.m.
