toys: A simulated dataset called toys data

toys    R Documentation

A simulated dataset called toys data

Description

toys is a simple simulated dataset for a binary classification problem, introduced by Weston et al. (2003).

Format

The format is a list of 2 components:

x

a data frame containing the input variables: 100 observations of 200 variables

y

output variable: a factor with 2 levels "-1" and "1"

Details

This is an equiprobable two-class problem, Y in {-1,1}, with six true variables, the others being noise. The simulation model is defined through the conditional distribution of the X^j given Y=y:

  • with probability 0.7, X^j ~ N(yj,1) for j=1,2,3 and X^j ~ N(0,1) for j=4,5,6 ;

  • with probability 0.3, X^j ~ N(0,1) for j=1,2,3 and X^j ~ N(y(j-3),1) for j=4,5,6 ;

  • the other variables are noise, X^j ~ N(0,1) for j=7,...,p.

After simulation, all variables are standardized.
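
The simulation model above can be sketched in a few lines of R. This is only an illustrative sketch, not the code used to build toys: the function name simulate_toys and its arguments are hypothetical, the choice between the two cases (probability 0.7 or 0.3) is assumed to be made independently for each observation, and standardization is assumed to mean centering and scaling each column with scale().

```r
# Hypothetical helper (not part of VSURF): simulate data following the
# model described in Details. Assumes the 0.7 / 0.3 choice is drawn
# independently for each observation.
simulate_toys <- function(n = 100, p = 200, seed = 1) {
  stopifnot(p >= 6)
  set.seed(seed)

  # Equiprobable class labels in {-1, 1}
  y <- sample(c(-1, 1), n, replace = TRUE)

  # Start from pure N(0, 1) noise for every variable
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)

  # TRUE: first case (variables 1-3 informative), with probability 0.7
  # FALSE: second case (variables 4-6 informative), with probability 0.3
  first <- runif(n) < 0.7

  for (j in 1:3) {
    # First case: X^j ~ N(y * j, 1) for j = 1, 2, 3
    x[first, j] <- x[first, j] + y[first] * j
    # Second case: X^(j + 3) ~ N(y * j, 1), i.e. N(y(j-3), 1) for j = 4, 5, 6
    x[!first, j + 3] <- x[!first, j + 3] + y[!first] * j
  }

  # Standardize each column (assumed: zero mean, unit standard deviation)
  x <- scale(x)

  list(x = as.data.frame(x), y = factor(y, levels = c(-1, 1)))
}

sim <- simulate_toys()
dim(sim$x)
table(sim$y)
```

The returned list mirrors the layout described in Format (a data frame x and a two-level factor y), so it can be passed to the same functions as toys, but the simulated values will not match those stored in the package.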

Source

Weston, J., Elisseeff, A., Schoelkopf, B., Tipping, M. (2003), Use of the zero norm with linear models and kernel methods, J. Machine Learn. Res. 3, 1439-1461

Examples

library(VSURF)
data(toys)
toys.rf <- randomForest::randomForest(toys$x, toys$y)
toys.rf

## Not run: 
# VSURF applied for toys data:
# (a few minutes to execute)
data(toys)
toys.vsurf <- VSURF(toys$x, toys$y)
toys.vsurf

## End(Not run)


robingenuer/VSURF documentation built on July 15, 2024, 8:18 p.m.