The goal of tidy.outliers is to make many outlier removal methods easy to use. Currently implemented are:
Simple methods:
Model Methods:
The package works on the principle that every basic step_outlier_* function returns an outlier "score" that can be used for filtering, where 0 is a very low outlier score and 1 is a very high one. You could, for example, filter out all rows whose outlier score is greater than 0.9.
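As a rough illustration of the score idea, here is a minimal base R sketch. The data frame and the `.outliers_example` column are hypothetical stand-ins for what a scoring step might produce; they are not output from the package itself.

```r
# Hypothetical scores in [0, 1]; in practice a step_outliers_* step
# would compute these. Column name is made up for illustration.
scored <- data.frame(
  id = 1:5,
  .outliers_example = c(0.05, 0.42, 0.95, 0.10, 0.91)
)

threshold <- 0.9

# Rows at or below the threshold are kept; rows above it are treated as outliers.
kept <- scored[scored$.outliers_example <= threshold, ]
removed <- scored[scored$.outliers_example > threshold, ]

print(kept)
print(removed)
```

The threshold is a judgment call: a higher value removes fewer rows.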
tidy.outliers is not yet on CRAN. Once it is released, you will be able to install it with:
#install.packages("tidy.outliers")
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("brunocarlin/tidy.outliers")
library(recipes)
library(dplyr) # for select(), arrange(), desc() and glimpse() used below
library(tidy.outliers)
I keep mpg as an example outcome, since you shouldn't remove outliers based on your outcome. You also shouldn't remove outliers from testing data, so by default the package's steps are skipped when predicting.
rec_obj <- recipe(mpg ~ ., data = mtcars) |>
  step_outliers_maha(all_numeric(), -all_outcomes()) |>
  step_outliers_lookout(all_numeric(), -contains(r"(.outliers)"), -all_outcomes()) |>
  prep(mtcars)

bake(rec_obj, new_data = NULL) |>
  select(contains(r"(.outliers)")) |>
  arrange(.outliers_lookout |> desc())

rec_obj2 <- recipe(mpg ~ ., data = mtcars) |>
  step_outliers_maha(all_numeric(), -all_outcomes()) |>
  step_outliers_lookout(all_numeric(), -contains(r"(.outliers)"), -all_outcomes()) |>
  step_outliers_remove(contains(r"(.outliers)")) |>
  prep(mtcars)
We filtered one row from the dataset and automatically removed the extra outlier columns.
bake(rec_obj2,new_data = NULL) |> glimpse()
mtcars |> glimpse()
We can also see which rows were flagged as outliers, along with their scores:
tidy(rec_obj2,number = 3) |> arrange(aggregation_results |> desc())
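To picture what the removal step does conceptually, here is a small base R sketch. It assumes the scores are combined with a row-wise mean and that rows above 0.9 are dropped; both choices are assumptions made for illustration, and the package's actual aggregation function and threshold may differ. The data are made up.

```r
# Made-up data with two score columns named like the ones produced above.
scored <- data.frame(
  car = c("a", "b", "c"),
  .outliers_maha = c(0.20, 0.97, 0.50),
  .outliers_lookout = c(0.10, 0.99, 0.95)
)

# Identify the score columns by their ".outliers" prefix.
is_score <- grepl("^\\.outliers", names(scored))

# Assumption: aggregate the scores with a row-wise mean.
aggregated <- rowMeans(scored[, is_score, drop = FALSE])

# Assumption: drop rows whose aggregated score exceeds 0.9,
# then drop the score columns themselves.
cleaned <- scored[aggregated <= 0.9, !is_score, drop = FALSE]

print(cleaned)
```

Note that row "c" survives even though one of its scores is high, because the mean of its two scores stays below the threshold. A different aggregation, such as the maximum, would be stricter.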
The package was made to play nicely with tune and friends from tidymodels. Check out the article on our GitHub pkgdown page!
Although it is possible to manually change the underlying model's parameters through the options argument, it would be nice to be able to tune those internal parameters as well.
So instead of this:
rec_obj2 <- recipe(mpg ~ ., data = mtcars) |>
  step_outliers_outForest(
    all_numeric(),
    -all_outcomes(),
    options = list(
      impute_multivariate_control = list(
        num.trees = 200
      )
    )
  )
You would write something like this:
rec_obj2 <- recipe(mpg ~ ., data = mtcars) |>
  step_outliers_outForest(
    all_numeric(),
    -all_outcomes(),
    options = list(
      impute_multivariate_control = list(
        num.trees = tune::tune('tree')
      )
    )
  )
The main problem is that this would require manually going model by model and incorporating those arguments as tunable components.