knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Here we will demonstrate how to use py_outliers_utils to deal with the outliers in a dataset and plot the distribution of the dataset:

Libarary package

library(routliersutils)

Create a dataframe

We need to create a dataframe to work with.

df <- data.frame(SepalLengthCm = c(5.1, 4.9, 4.7, 5.5, 5.1, 10, 54, 5.0, 5.2, 5.3, 5.1),
                          SepalWidthCm = c(1.4, 1.4, 10, 2.0, 0.7, 1.6, 1.2, 1.4, 1.8, 1.5, 2.1),
                          PetalWidthCm = c(0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.4, 0.2, 5))
df

Identify outliers

We can identify outliers using outlier_identifier. Note that this function will return a dataframe with the summary of the outlier identified by the method, with an additional column having if row has outlier or not if return_df = True.

outlier_identifier(df, columns=c('SepalWidthCm',  'PetalWidthCm'), return_df=FALSE)

Trim outliers

We can trim outliers using trim_outliers. This function will return a dataframe which the outlier has already process by the chosen method.

trim_outliers(df,identifier='IQR', method='trim')

Visualize outliers

We can trim outliers using visualize_outliers. This function will return a ggplot of data distribution with given method.

visualize_outliers(df, columns=c("SepalWidthCm", "PetalWidthCm"), type="violin")


UBC-MDS/r_outliers_utils documentation built on Feb. 7, 2022, 9:12 a.m.