ImbR: Synthetic Regression Data Set


Description

Simulated data set for imbalanced regression domains. The rare cases correspond to the higher extreme values of the target variable and are generated on a circle with white noise. The normal cases follow a normal distribution with elliptical contours, centered at the center of that circle.

Usage

data(ImbR)

Format

The data set has 2 continuous features (X1 and X2) and a continuous target variable (Tgt). The rare examples, i.e., cases with higher values of the target variable, make up 5% of the data. ImbR has 1000 examples.

ImbR data has been simulated as follows:

- lower Tgt values: (X1, X2) \sim \mathbf{N}_{2}\left(\mathbf{10}_{2}, \mathbf{2.5}_{2}\right) and Tgt \sim \mathbf{\Gamma}\left(0.5, 1\right) + 10

- higher Tgt values: (X1, X2) \sim \left(\rho \cos(\theta) + 10, \rho \sin(\theta) + 10\right), where \rho \sim \mathbf{9}_{2} + \mathbf{N}_{2}\left(\mathbf{0}_{2}, \mathbf{I}_{2}\right) and \theta \sim \mathbf{U}_{2}\left(\mathbf{0}_{2}, 2\pi \mathbf{I}_{2}\right), and Tgt \sim \mathbf{\Gamma}\left(1, 1\right) + 20
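The recipe above can be sketched in base R as follows. This is an illustrative re-simulation, not the authors' original script. It assumes that 2.5 is the variance of each coordinate with X1 and X2 independent (which gives circular rather than elliptical contours; a non-diagonal covariance would be needed for ellipses), that the Gamma parameters are (shape, rate), and that the 5% rare share is split exactly as 50 of 1000 cases. The seed and the names `normal`, `rare` and `sim` are arbitrary.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility

n      <- 1000                     # total number of examples (see Format)
n_rare <- round(0.05 * n)          # 5% rare cases
n_norm <- n - n_rare

# Normal cases: bivariate Gaussian centred at (10, 10).
# Assumption: variance 2.5 per coordinate, X1 and X2 independent.
normal <- data.frame(
  X1  = rnorm(n_norm, mean = 10, sd = sqrt(2.5)),
  X2  = rnorm(n_norm, mean = 10, sd = sqrt(2.5)),
  Tgt = rgamma(n_norm, shape = 0.5, rate = 1) + 10
)

# Rare cases: noisy circle of radius about 9 around (10, 10).
rho   <- 9 + rnorm(n_rare)                     # radius with unit-variance noise
theta <- runif(n_rare, min = 0, max = 2 * pi)  # angle, uniform on [0, 2*pi]
rare <- data.frame(
  X1  = rho * cos(theta) + 10,
  X2  = rho * sin(theta) + 10,
  Tgt = rgamma(n_rare, shape = 1, rate = 1) + 20
)

# Stack both groups into one data frame with the same columns as ImbR.
sim <- rbind(normal, rare)
```

Because a Gamma variate is non-negative, every rare case in this sketch has Tgt of at least 20, while normal cases start at 10 and rarely move far above it.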

Author(s)

Paula Branco paobranco@gmail.com, Rita Ribeiro rpribeiro@dcc.fc.up.pt and Luis Torgo ltorgo@dcc.fc.up.pt

Examples

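The original example code did not survive on this page. As a stand-in, here is a minimal sketch of how the data set might be loaded and inspected, assuming the UBL package is installed. The cut-off of 20 used to flag rare cases is an assumption derived from the simulation formulas in the Format section, not a value stated by the authors.

```r
# Run the example only if UBL is available; has_ubl is a helper name chosen here.
has_ubl <- requireNamespace("UBL", quietly = TRUE)

if (has_ubl) {
  data(ImbR, package = "UBL")      # load the data set into the workspace

  str(ImbR)                        # structure: X1, X2 and Tgt
  print(summary(ImbR$Tgt))         # distribution of the target variable

  # Share of cases at or above the assumed rare-case cut-off of 20;
  # this should be close to 0.05 if the data match the description.
  print(mean(ImbR$Tgt >= 20))
}
```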

paobranco/UBL documentation built on May 6, 2021, 6:57 p.m.