fast_outlier_id: Analyzes the values of a given column list in a given...

View source: R/fast_outliers.R

Description

Analyzes the values of the given columns in a dataframe and identifies outliers using either the Z-score algorithm or the interquartile range algorithm. Returns a dataframe containing the following columns: column name, a list of the outliers' index positions, and the percentage of total counts considered outliers. Modifies an existing dataframe, with missing values imputed based on the chosen method.
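To make the two detection rules concrete, here is a minimal sketch in base R. It is not the package's implementation: the helper names `zscore_outliers` and `iqr_outliers` are hypothetical, and the cutoffs (an absolute z-score above 3, and 1.5 times the interquartile range beyond the quartiles) are common conventions assumed here, not values confirmed by the redahelper source.

```r
# Hypothetical helpers, not part of redahelper.
# The cutoffs (3 and 1.5) are conventional assumptions.

# Z-score rule: flag values more than `cutoff` standard deviations from the mean.
zscore_outliers <- function(x, cutoff = 3) {
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  which(abs(z) > cutoff)
}

# Interquartile range rule: flag values beyond k * IQR outside the quartiles.
iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  which(x < q[1] - fence | x > q[2] + fence)
}

x <- c(1:19, 100)
zscore_outliers(x)  # index positions flagged by the z-score rule
iqr_outliers(x)     # index positions flagged by the IQR rule
```

Both helpers return index positions, which mirrors the "outlier's index position" column described above. In this toy vector, both rules flag only the last element.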

Usage

fast_outlier_id(
  data,
  cols = "All",
  method = "z-score",
  threshold_low_freq = 0.05
)

Arguments

data

dataframe - Dataframe to be analyzed

cols

list - List containing the columns to be analyzed. Defaults to "All".

method

string - Method used to identify outliers (methods available are: "Z score" or "Interquantile Range"). Defaults to "z-score".

threshold_low_freq

double - Threshold for evaluating outliers in categorical columns. Defaults to 0.05.
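One plausible reading of `threshold_low_freq` is that a category counts as an outlier when its share of the column falls below the threshold. The sketch below illustrates that reading in base R. The helper `low_freq_outliers` is hypothetical, and the strict `<` comparison is an assumption rather than confirmed package behavior.

```r
# Hypothetical helper, not part of redahelper.
# Assumes a category is an outlier when its relative frequency is strictly
# below the threshold.
low_freq_outliers <- function(x, threshold_low_freq = 0.05) {
  counts <- table(x)                        # counts per category, NAs excluded
  share <- as.vector(counts) / sum(counts)  # relative frequency per category
  rare <- names(counts)[share < threshold_low_freq]
  which(x %in% rare)                        # index positions of rare values
}

x <- c(rep("a", 30), rep("b", 19), "c")
low_freq_outliers(x)  # "c" makes up 2% of the values, below the 5% threshold
```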

Value

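dataframe - Contains the column name, a list of the outliers' index positions, and the percentage of total counts considered outliers.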

Examples

fast_outlier_id(data = iris, cols = c("Sepal.Length", "Sepal.Width"), method = "z-score")

UBC-MDS/redahelper documentation built on April 2, 2020, 3:59 a.m.