knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Here we demonstrate how to use routliersutils to handle outliers in a dataset and plot the distribution of the data:
library(routliersutils)
We need to create a dataframe to work with.
df <- data.frame(
  SepalLengthCm = c(5.1, 4.9, 4.7, 5.5, 5.1, 10, 54, 5.0, 5.2, 5.3, 5.1),
  SepalWidthCm = c(1.4, 1.4, 10, 2.0, 0.7, 1.6, 1.2, 1.4, 1.8, 1.5, 2.1),
  PetalWidthCm = c(0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.4, 0.2, 5)
)
df
We can identify outliers using outlier_identifier. Note that this function returns a dataframe summarizing the outliers identified by the method; if return_df = TRUE, the returned dataframe has an additional column indicating whether each row contains an outlier.
outlier_identifier(df, columns=c('SepalWidthCm', 'PetalWidthCm'), return_df=FALSE)
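To make the idea of an interquartile-range check concrete, here is a minimal base R sketch of the common 1.5 * IQR rule applied to the same two columns. The helper `iqr_flags` is a hypothetical name introduced for illustration, and this is an assumption about what such a check typically looks like, not the implementation used by outlier_identifier.

```r
# Same example data as above, repeated so the sketch runs on its own.
df <- data.frame(
  SepalLengthCm = c(5.1, 4.9, 4.7, 5.5, 5.1, 10, 54, 5.0, 5.2, 5.3, 5.1),
  SepalWidthCm = c(1.4, 1.4, 10, 2.0, 0.7, 1.6, 1.2, 1.4, 1.8, 1.5, 2.1),
  PetalWidthCm = c(0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.4, 0.2, 5)
)

# Hypothetical helper: TRUE where a value lies more than 1.5 * IQR
# below the first quartile or above the third quartile.
iqr_flags <- function(x) {
  q1 <- quantile(x, 0.25, names = FALSE)
  q3 <- quantile(x, 0.75, names = FALSE)
  spread <- q3 - q1
  x < q1 - 1.5 * spread | x > q3 + 1.5 * spread
}

# Row positions of the flagged values in each selected column.
cols <- c("SepalWidthCm", "PetalWidthCm")
outlier_rows <- lapply(df[cols], function(x) which(iqr_flags(x)))
outlier_rows
```

With this data the rule flags row 3 in SepalWidthCm (the value 10) and row 11 in PetalWidthCm (the value 5).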
We can trim outliers using trim_outliers. This function returns a dataframe in which the outliers have already been processed by the chosen method.
trim_outliers(df, identifier='IQR', method='trim')
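As a rough illustration of what trimming means, the base R sketch below drops every row that has at least one value outside the 1.5 * IQR fences. The helper `outside_fences` is a hypothetical name, and the row-dropping behaviour is an assumption made for this example; the package's own handling of method='trim' may differ.

```r
# Same example data as above, repeated so the sketch runs on its own.
df <- data.frame(
  SepalLengthCm = c(5.1, 4.9, 4.7, 5.5, 5.1, 10, 54, 5.0, 5.2, 5.3, 5.1),
  SepalWidthCm = c(1.4, 1.4, 10, 2.0, 0.7, 1.6, 1.2, 1.4, 1.8, 1.5, 2.1),
  PetalWidthCm = c(0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.4, 0.2, 5)
)

# Hypothetical helper: TRUE where a value lies outside
# [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
outside_fences <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  x < q[1] - 1.5 * diff(q) | x > q[2] + 1.5 * diff(q)
}

# Logical matrix with one row per observation and one column per variable.
flags <- vapply(df, outside_fences, logical(nrow(df)))

# Keep only the rows in which no column was flagged.
trimmed <- df[rowSums(flags) == 0, , drop = FALSE]
trimmed
```

On this data, rows 3, 6, 7 and 11 are dropped, leaving 7 of the 11 observations.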
We can visualize outliers using visualize_outliers. This function returns a ggplot of the data distribution, drawn with the given plot type.
visualize_outliers(df, columns=c("SepalWidthCm", "PetalWidthCm"), type="violin")
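For readers who want to see how a comparable plot could be built by hand, the following sketch draws a violin plot of the same two columns directly with ggplot2. It assumes ggplot2 is installed and is only an approximation of the kind of figure described above, not the code behind visualize_outliers.

```r
library(ggplot2)

# Same example data as above, repeated so the sketch runs on its own.
df <- data.frame(
  SepalLengthCm = c(5.1, 4.9, 4.7, 5.5, 5.1, 10, 54, 5.0, 5.2, 5.3, 5.1),
  SepalWidthCm = c(1.4, 1.4, 10, 2.0, 0.7, 1.6, 1.2, 1.4, 1.8, 1.5, 2.1),
  PetalWidthCm = c(0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.4, 0.2, 5)
)

# Reshape the selected columns to long format: stack() returns a data
# frame with a `values` column and an `ind` factor naming the source column.
long <- stack(df[c("SepalWidthCm", "PetalWidthCm")])

# One violin per column; extreme values show up as long thin tails.
p <- ggplot(long, aes(x = ind, y = values)) +
  geom_violin() +
  labs(x = "Column", y = "Value")
p
```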