buildOutliers: Build Outliers in Data Distribution


buildOutliers    R Documentation

Build Outliers in Data Distribution


Builds outlier values and replaces randomly selected data points with them. This is an internal function and is currently not exported from the package.





numeric vector. This is the target vector which is processed for outlier generation.


Outliers are common in production data. For instance, in the retail industry, there are days such as Black Friday when sales are far higher than the daily average for the year. So that the generated synthetic data resembles production data, the package conjurer uses this function to build such outliers.

This function takes a numeric vector and randomly selects at least 1 data point, and at most 3 percent of the data points, to be replaced with an outlier. The methodology is based on a popular method of identifying outliers; for more details, refer to the function 'outlier' in the R package 'GmAMisc'. The process for generating outliers is as follows.

  1. First, the interquartile range (IQR) of the numeric vector is computed.

  2. Second, a random number between 1.5 and 3 is generated.

  3. Finally, the random number above is multiplied by the IQR to compute the outlier.

    The steps above are repeated at least once and for a maximum of 3 percent of the data points.
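
The steps above can be sketched in R as follows. This is a minimal illustration, not the package's internal code: the function name `sketch_outliers` and the parameter `max_share` are hypothetical, and the sketch assumes the outlier is the plain product of the random multiplier and the IQR, as the text states. How the package picks the number of replaced points within the 1 to 3 percent range is not documented here, so a uniform draw is assumed.

```r
# Hypothetical helper illustrating the documented steps; not part of conjurer.
sketch_outliers <- function(x, max_share = 0.03) {
  stopifnot(is.numeric(x), length(x) > 0)
  n <- length(x)

  # Replace at least 1 point and at most 3 percent of the points.
  upper <- max(1, floor(max_share * n))
  k <- sample.int(upper, 1)        # assumed: uniform choice in 1..upper

  # Positions to overwrite, chosen at random without replacement.
  idx <- sample.int(n, k)

  # Step 1: interquartile range of the input vector.
  iqr <- stats::IQR(x)

  # Steps 2 and 3: one multiplier in [1.5, 3] per replaced point, times the IQR.
  x[idx] <- stats::runif(k, min = 1.5, max = 3) * iqr
  x
}

set.seed(1)
sales <- rnorm(200, mean = 100, sd = 10)
with_outliers <- sketch_outliers(sales)
sum(with_outliers != sales)  # number of replaced points, between 1 and 6
```

The returned vector has the same length as the input; only the selected positions change.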


A numeric vector with random values replaced with outlier values.

conjurer documentation built on May 1, 2022, 9:05 a.m.