data_reg: Regression toy dataset
In etree: Classification and Regression with Structured and Mixed-Type Data


Description

A simple dataset containing simulated values for a numeric response variable and four covariates of mixed type, some of which are structured (curves and graphs). The data-generating process is based on Section 5 (“Example: synthetic data”) of Serban and Wasserman (2005).

Usage

data_reg

Format

List with two elements: covs, a list containing the covariates, and resp, a numeric vector of length 200 containing the response variable. The response is generated as in Serban and Wasserman (2005). The four covariates in covs all have length 200 and are defined as follows:

Nominal: level 0 for observations with a negative response, level 1 otherwise;
Numeric: coefficients with respect to one of the basis functions used in the B-spline expansion of the curves, which are in turn specified as in Serban and Wasserman (2005);
Functional: curves as specified in Serban and Wasserman (2005), with 50 observations coming from each of the four curve shapes;
Graphs: Erdős-Rényi graphs whose connection probability is a transformation of the response: normally distributed noise with mean 0 and standard deviation 7 is added to the response, and the result is rescaled to lie between 0.2 and 0.8.
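The construction of the Nominal and Graphs covariates can be illustrated with a short base-R sketch. This is not the code used to build data_reg. The response below is a hypothetical placeholder rather than the Serban and Wasserman (2005) model, "standardizing between 0.2 and 0.8" is assumed to mean min-max rescaling, and the helper sample_er() and the number of nodes per graph (n_nodes) are hypothetical choices made only for illustration.

```r
# Sketch only: ASSUMED placeholder response, ASSUMED min-max rescaling,
# hypothetical helper sample_er() and hypothetical n_nodes.
set.seed(1)

n <- 200
resp <- rnorm(n, mean = 0, sd = 10)  # hypothetical placeholder response

# Nominal covariate: level 0 for negative responses, level 1 otherwise
nominal <- factor(ifelse(resp < 0, 0, 1), levels = c(0, 1))

# Connection probabilities: add N(0, 7^2) noise, then rescale to [0.2, 0.8]
noisy <- resp + rnorm(n, mean = 0, sd = 7)
p <- 0.2 + 0.6 * (noisy - min(noisy)) / (max(noisy) - min(noisy))

# Sample one undirected Erdos-Renyi graph G(n_nodes, prob) as an adjacency
# matrix: each pair of distinct nodes is linked independently with prob
sample_er <- function(n_nodes, prob) {
  adj <- matrix(0L, n_nodes, n_nodes)
  upper <- upper.tri(adj)
  adj[upper] <- rbinom(sum(upper), size = 1, prob = prob)
  adj + t(adj)  # symmetrize; the diagonal stays 0 (no self-loops)
}

n_nodes <- 15  # hypothetical number of nodes per graph
graphs <- lapply(p, function(prob) sample_er(n_nodes, prob))
```

Observations with a larger (noisy) response thus tend to yield denser graphs. With the igraph package installed, igraph::sample_gnp(n_nodes, prob) could replace the hand-written sampler.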

References

Serban, N., and Wasserman, L. (2005). CATS: clustering after transformation and smoothing. Journal of the American Statistical Association, 100(471), 990-999.

etree documentation built on July 16, 2022, 9:05 a.m.