ImbR: Synthetic Regression Data Set

Description

Simulated data set for an imbalanced regression domain. The rare cases correspond to the higher extreme values of the target and lie on a circle perturbed by white noise. The normal cases follow a normal distribution with elliptical contours, centered at the center of that circle.

Usage

data(ImbR)

Format

The data set has 2 continuous features (X1 and X2) and a continuous target variable (denoted as Tgt). The rare examples, i.e., cases with higher values of the target variable, occur in 5% of the data. The ImbR data set has 1000 examples.

ImbR data has been simulated as follows:

- lower Tgt values: $(X1, X2) \sim \mathbf{N}_{2}\left(\mathbf{10}_{2}, \mathbf{2.5}_{2}\right)$ and $Tgt \sim \mathbf{\Gamma}\left(0.5, 1\right) + 10$

- higher Tgt values: $(X1, X2) \sim \left(\rho \cos(\theta) + 10, \rho \sin(\theta) + 10\right)$, where $\rho \sim \mathbf{9}_{2} + \mathbf{N}_{2}\left(\mathbf{0}_{2}, \mathbf{I}_{2}\right)$ and $\theta \sim \mathbf{U}_{2}\left(\mathbf{0}_{2}, 2\pi \mathbf{I}_{2}\right)$, and $Tgt \sim \mathbf{\Gamma}\left(1, 1\right) + 20$
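The simulation scheme can be sketched in base R as follows. This is only an illustrative reading of the description, not the code used to build ImbR. The helper name `simulate_imbr_like` is hypothetical, and several choices are assumptions: the covariance of the normal cases is taken as 2.5 times the identity matrix, the Gamma parameters are read as (shape, rate), and a single radius and a single angle are drawn per rare example.

```r
# Hypothetical helper, not part of UBL: draws a data set shaped like ImbR.
simulate_imbr_like <- function(n = 1000, rare_frac = 0.05, seed = 1) {
  set.seed(seed)
  n_rare <- round(n * rare_frac)
  n_norm <- n - n_rare

  # Normal cases: bivariate normal centred at (10, 10).
  # Assumption: independent coordinates, each with variance 2.5.
  normal <- data.frame(
    X1  = rnorm(n_norm, mean = 10, sd = sqrt(2.5)),
    X2  = rnorm(n_norm, mean = 10, sd = sqrt(2.5)),
    Tgt = rgamma(n_norm, shape = 0.5, rate = 1) + 10  # assumption: (shape, rate)
  )

  # Rare cases: noisy circle of radius about 9 around (10, 10).
  # Assumption: one radius and one angle per example.
  rho   <- 9 + rnorm(n_rare)
  theta <- runif(n_rare, min = 0, max = 2 * pi)
  rare <- data.frame(
    X1  = rho * cos(theta) + 10,
    X2  = rho * sin(theta) + 10,
    Tgt = rgamma(n_rare, shape = 1, rate = 1) + 20    # assumption: (shape, rate)
  )

  rbind(normal, rare)
}

sim <- simulate_imbr_like()
```

Under these assumptions the rare cases are exactly the rows with Tgt above 20, apart from the very unlikely event that a normal case draws a Gamma value larger than 10.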

Author(s)

Paula Branco, Rita Ribeiro and Luis Torgo

Examples

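A minimal sketch of how the data set might be loaded and inspected, assuming the UBL package is installed. The helper `rare_share` and its cutoff of 20 are illustrative assumptions based on the simulation description in the Format section, not part of UBL.

```r
# Hypothetical helper (not part of UBL): share of rows whose target exceeds a cutoff.
# Assumption: a cutoff of 20 separates the rare cases, as suggested by the Format section.
rare_share <- function(d, cutoff = 20) {
  mean(d$Tgt > cutoff)
}

# Only touch the packaged data when UBL is actually installed.
if (requireNamespace("UBL", quietly = TRUE)) {
  data(ImbR, package = "UBL")
  str(ImbR)          # column names and types
  summary(ImbR$Tgt)  # distribution of the target variable
  rare_share(ImbR)   # fraction of rows with Tgt above the assumed cutoff
}
```

If the cutoff assumption holds, the last value should be close to the 5% stated in the Format section.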

UBL documentation built on July 13, 2017, 5:02 p.m.