data_cls: Classification toy dataset
In etree: Classification and Regression with Structured and Mixed-Type Data

Description

A simple dataset containing simulated values for a nominal response variable and four covariates of mixed and partially structured types. The data-generating process is based on Example 4.7 ("Signal shape classification", pages 73-77) from Saito (1994).

Usage

data_cls

Format

List with two elements: covs, which is a list containing the covariates, and resp, which is a factor of length 150 representing the response variable. The response variable is divided into three classes whose labels are cylinder (Cyl), bell (Bel) and funnel (Fun). The four covariates in covs all have length 150 and are characterized as follows:

Nominal: Cyl observations are given level 1 with probability 0.8 and levels 2 and 3 with probability 0.1 each, Bel observations are given level 2 with probability 0.8 and levels 1 and 3 with probability 0.1 each, Fun observations are given level 3 with probability 0.8 and levels 1 and 2 with probability 0.1 each;
Numeric: coefficients for one of the basis functions used in the B-spline expansion of the curves, which are in turn specified as in Saito (1994);
Functional: curves as specified in Saito (1994);
Graphs: Erdős-Rényi graphs with connection probability 0.10 for Cyl observations, 0.125 for Bel observations, 0.15 for Fun observations.
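
The class-dependent rules for the nominal and graph covariates can be illustrated with a minimal base-R sketch. It is not the code used to build data_cls: the balanced design of 50 observations per class, the graph size of 20 nodes, the seed, and the helper names draw_level and sim_er are all assumptions made for illustration, since none of them is stated above. The numeric and functional covariates are omitted because they depend on the curve specification in Saito (1994).

```r
# Illustrative simulation only; not the etree package's own generating code.
set.seed(1)  # arbitrary seed, chosen only for reproducibility

# Assumption: 50 observations per class (only the total of 150 is documented).
classes <- c("Cyl", "Bel", "Fun")
resp <- factor(rep(classes, each = 50), levels = classes)

# Nominal covariate: the level matching the class index has probability 0.8,
# the remaining two levels have probability 0.1 each.
draw_level <- function(k) {  # hypothetical helper
  probs <- rep(0.1, 3)
  probs[k] <- 0.8
  sample(1:3, size = 1, prob = probs)
}
nominal <- factor(vapply(as.integer(resp), draw_level, integer(1)), levels = 1:3)

# Graph covariate: Erdős-Rényi graphs stored as symmetric 0/1 adjacency matrices.
# Assumption: 20 nodes per graph; the graph size is not documented here.
p_edge <- c(Cyl = 0.10, Bel = 0.125, Fun = 0.15)
sim_er <- function(n_nodes, p) {  # hypothetical helper
  adj <- matrix(0L, n_nodes, n_nodes)
  upper <- upper.tri(adj)
  # Each possible edge is drawn independently with probability p
  adj[upper] <- rbinom(sum(upper), size = 1, prob = p)
  adj + t(adj)  # mirror the upper triangle; the diagonal stays zero
}
graphs <- lapply(as.character(resp), function(cl) sim_er(20, p_edge[[cl]]))

# Cross-tabulate the class against the simulated nominal level
table(resp, nominal)
```

In the resulting table, most Cyl observations should fall in level 1, most Bel observations in level 2, and most Fun observations in level 3, which is the kind of association the nominal covariate is designed to carry.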