Madelon data set: synthetic data from NIPS 2003 feature selection challenge

Description

This is a two-class classification problem. The difficulty is that the problem is multivariate and highly non-linear. Of the 500 features, 20 are real features and 480 are noise features.
The data set comes from the UCI repository and was discretized using median cutoffs.
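
The description does not say exactly how the median cutoffs were applied. One plausible reading is that each feature is binarized at its own median, with the cutoff computed on the training set and reused for the test set. The following is a minimal sketch under that assumption, on toy stand-in data rather than the real Madelon values; the function name `median_discretize` is hypothetical and not part of any package.

```python
from statistics import median


def median_discretize(train_rows, test_rows):
    """Binarize each feature at its training-set median (assumed scheme)."""
    # One cutoff per column, computed from the training rows only.
    cutoffs = [median(col) for col in zip(*train_rows)]

    def binarize(rows):
        # 1 if the value is strictly above its column cutoff, else 0.
        return [[int(v > c) for v, c in zip(row, cutoffs)] for row in rows]

    return binarize(train_rows), binarize(test_rows), cutoffs


# Toy stand-in data: 4 training rows and 1 test row, 2 features each.
train_x = [[1, 40], [2, 30], [3, 20], [4, 10]]
test_x = [[3, 5]]

train_bin, test_bin, cutoffs = median_discretize(train_x, test_x)
print(cutoffs)
print(train_bin)
print(test_bin)
```

Computing the cutoffs from the training rows alone keeps test-set information out of the preprocessing step. Whether the discretized data described here was produced that way is not stated.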

Usage

Format

TrainX

A matrix with 2000 rows and 500 columns.

TrainY

A vector of length 2000, one class label per row of TrainX.

TestX

A matrix with 600 rows and 500 columns.

TestY

A vector of length 600, one class label per row of TestX.

References

UCI madelon data set