ImbR: Synthetic Regression Data Set

ImbR R Documentation

Synthetic Regression Data Set

Description

Simulated data set for an imbalanced regression domain. The rare cases correspond to the highest (extreme) values of the target and lie on a circle perturbed by white noise. The normal cases follow a normal distribution with elliptical contours, centered at the center of that circle.

Usage

data(ImbR)

Format

The data set has 2 continuous features (X1 and X2) and a continuous target variable (Tgt). The rare examples, i.e., the cases with higher values of the target variable, make up 5% of the data. ImbR has 1000 examples.

ImbR data has been simulated as follows:

- lower Tgt values: (X1, X2)\sim \mathbf{N}_{2} \left(\mathbf{10}_{2}, \mathbf{2.5}_{2}\right) and Tgt\sim \mathbf{\Gamma} \left( 0.5, 1 \right) +10

- higher Tgt values: (X1, X2)\sim \left(\rho \cos(\theta) + 10, \rho \sin(\theta) + 10 \right), where \rho \sim \mathbf{9}_{2}+\mathbf{N}_{2} \left(\mathbf{0}_{2}, \mathbf{I}_{2} \right) and \theta \sim \mathbf{U}_{2} \left( \mathbf{0}_{2}, 2\pi \mathbf{I}_{2} \right), and Tgt\sim \mathbf{\Gamma} \left( 1,1 \right) + 20
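
The recipe above can be sketched in base R. This is an illustrative reconstruction, not the code used to build ImbR, and it rests on several assumptions: 2.5 is read as a common standard deviation for two independent normal components, the Gamma parameters are read as (shape, rate), \rho and \theta are drawn once per rare case, and the 5% share is applied as exactly 50 of 1000 rows. The names n_rare, norm_cases, rare_cases and sim are made up for the example.

```r
set.seed(42)  # arbitrary seed, only to make the sketch reproducible

n      <- 1000
n_rare <- round(0.05 * n)  # 5% rare cases (assumed to be an exact count)
n_norm <- n - n_rare

# Lower Tgt values: features around (10, 10).
# Assumption: independent components with standard deviation 2.5.
norm_cases <- data.frame(
  X1  = rnorm(n_norm, mean = 10, sd = 2.5),
  X2  = rnorm(n_norm, mean = 10, sd = 2.5),
  Tgt = rgamma(n_norm, shape = 0.5, rate = 1) + 10
)

# Higher Tgt values: points on a circle of radius about 9 around (10, 10),
# with standard normal noise added to the radius.
rho   <- 9 + rnorm(n_rare)
theta <- runif(n_rare, min = 0, max = 2 * pi)
rare_cases <- data.frame(
  X1  = rho * cos(theta) + 10,
  X2  = rho * sin(theta) + 10,
  Tgt = rgamma(n_rare, shape = 1, rate = 1) + 20
)

# Stack both groups into one data frame shaped like ImbR
sim <- rbind(norm_cases, rare_cases)
```

Under these assumptions every rare case has Tgt of at least 20, while a normal case exceeds 20 only with negligible probability, so the two groups are well separated on the target.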

Author(s)

Paula Branco paobranco@gmail.com, Rita Ribeiro rpribeiro@dcc.fc.up.pt and Luis Torgo ltorgo@dcc.fc.up.pt

Examples

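library(UBL)  # ImbR is distributed with the UBL package
data(ImbR)
summary(ImbR)

# the rare cases appear at the high end of Tgt
boxplot(ImbR$Tgt)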
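
The share of rare cases stated under Format can also be checked directly. The sketch below is a hedged illustration: rare_share is a hypothetical helper, not a UBL function, and the cutoff of 20 is an assumption taken from the "+ 20" shift applied to the higher Tgt values in the simulation recipe. The UBL part only runs when the package is installed.

```r
# Hypothetical helper: fraction of rows whose target is at or above a cutoff.
# cutoff = 20 is an assumption based on the "+ 20" shift of the rare cases.
rare_share <- function(d, cutoff = 20) {
  mean(d$Tgt >= cutoff)
}

if (requireNamespace("UBL", quietly = TRUE)) {
  data(ImbR, package = "UBL")
  print(dim(ImbR))         # number of examples and columns
  print(rare_share(ImbR))  # should be close to the documented 5%

  # Feature space, with the assumed rare cases highlighted
  plot(ImbR$X1, ImbR$X2,
       col = ifelse(ImbR$Tgt >= 20, "red", "grey"),
       xlab = "X1", ylab = "X2")
}
```

If the reported share differs noticeably from 0.05, the cutoff is probably not the right way to separate the two groups, and a relevance function on Tgt would be the safer choice.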

UBL documentation built on Oct. 8, 2023, 1:07 a.m.