knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Install the outlierfix package for setting up

library(outlierfix)

Let us create some sample data for fixing outliers later.

dt <- data.frame(age= c(89,23, 26, 21, 31, 38, 34, 25, 65,32, 36 ,38, 35 ,30 ,35, 20,11),income=c(1200,1400,1350,1600,7580,3620,2500,4230,4830,3820,5230,3360,2860,3120,2970,3000,4500))
dt

1. Detecting and fixing outliers in an interactive way

Firstly, we focus on the age, set the age limit within 20-60, which make more sense for working age. When interactive==TRUE, users can enter the number of chosen method for identifying outliers, according to the distribution and boxplot of data to choose the subsequent fix method.

fixout_conti(dt, col_name="age",rangeLU=c(20,60),plot=TRUE)

2. Detecting and fixing outliers in an automatic way

choose column income to fix the outliers automatically. In this case, we want to have all figures in range but exclude the income equals to 1200. Missing values are excluded in default.

fixout_conti(dt, col_name ="income",interactive = FALSE,exclude = 1200, plot = TRUE)

Save the output as new data table.

dt1=fixout_conti(dt, col_name ="income",interactive = FALSE,exclude = 1200, plot = TRUE)
dt1

This function can be used to process multiple columns in a data table when leveraging loop. It is worth noting this approach is not suitable when the fixed method related to "drop" because some rows of data has been dropped while processing sequentially.

col_names <- c("age", "income" )
fix.m <-c("asNARight", "asNALeft" )
dt1 <- dt
#remove outliers 
for( i in 1: length(col_names)){
  dt1=fixout_conti(dt1,col_name=col_names[i], fix.m = fix.m[i])
}


sssEos/outlierfix documentation built on Dec. 23, 2021, 4:32 a.m.