knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Load the outlierfix package to get started.
library(outlierfix)
Let us create some sample data for fixing outliers later.
dt <- data.frame(
  age = c(89, 23, 26, 21, 31, 38, 34, 25, 65, 32, 36, 38, 35, 30, 35, 20, 11),
  income = c(1200, 1400, 1350, 1600, 7580, 3620, 2500, 4230, 4830, 3820, 5230, 3360, 2860, 3120, 2970, 3000, 4500)
)
dt
First, we focus on age and restrict it to the range 20-60, which is a more sensible range for working age. When interactive = TRUE, the user enters the number of the chosen method for identifying outliers, using the distribution and boxplot of the data to decide on the subsequent fix method.
fixout_conti(dt, col_name="age",rangeLU=c(20,60),plot=TRUE)
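To make the idea of a range limit concrete, here is a minimal base-R sketch of flagging ages outside 20-60 and replacing them with NA. It does not use outlierfix and is not a description of how fixout_conti works internally. The names lower, upper, out_of_range and age_fixed are illustrative, and treating the bounds as inclusive is an assumption.

```r
# Ages from the sample data above
age <- c(89, 23, 26, 21, 31, 38, 34, 25, 65, 32, 36, 38, 35, 30, 35, 20, 11)

# Assumed inclusive limits for working age
lower <- 20
upper <- 60

# TRUE where a value falls outside [lower, upper]
out_of_range <- age < lower | age > upper

# Values that would be flagged under this rule
age[out_of_range]

# One possible fix: replace flagged values with NA, keeping the vector length
age_fixed <- replace(age, out_of_range, NA)
age_fixed
```

With these limits the flagged values are 89, 65 and 11; the value 20 sits on the boundary and is kept under the inclusive assumption.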
Next, choose the income column and fix its outliers automatically. In this case, we want all figures to be in range but exclude the income value equal to 1200. Missing values are excluded by default.
fixout_conti(dt, col_name ="income",interactive = FALSE,exclude = 1200, plot = TRUE)
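As a rough illustration of what an automatic, non-interactive check might look like, the base-R sketch below sets aside the value 1200 and missing values, then applies the standard boxplot rule from boxplot.stats to the rest. This is one plausible reading of the exclude argument, not the documented behaviour of fixout_conti, and the names excluded, candidates and outliers are hypothetical.

```r
# Income values from the sample data above
income <- c(1200, 1400, 1350, 1600, 7580, 3620, 2500, 4230, 4830,
            3820, 5230, 3360, 2860, 3120, 2970, 3000, 4500)

# Assumption: the excluded value and NAs are left out of outlier detection
excluded <- 1200
candidates <- income[!is.na(income) & !(income %in% excluded)]

# Boxplot rule: values beyond 1.5 times the hinge spread from the hinges
outliers <- boxplot.stats(candidates)$out
outliers

# Positions of those outliers in the original vector
which(income %in% outliers)
```

Other detection rules would flag a different set of values, so the result here should not be expected to match the package output exactly.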
Save the output as a new data table.
dt1 <- fixout_conti(dt, col_name = "income", interactive = FALSE, exclude = 1200, plot = TRUE)
dt1
This function can process multiple columns of a data table when called in a loop. Note that this approach is not suitable when the fix method involves "drop", because rows are removed from the data while the columns are processed sequentially.
col_names <- c("age", "income")
fix.m <- c("asNARight", "asNALeft")
dt1 <- dt
# remove outliers
for (i in seq_along(col_names)) {
  dt1 <- fixout_conti(dt1, col_name = col_names[i], fix.m = fix.m[i])
}
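The caveat about "drop" methods can be seen with a small base-R sketch that does not rely on outlierfix. The helper flag_outliers is hypothetical and uses the boxplot rule purely for illustration. Marking outliers as NA keeps every row, whereas dropping rows column by column means each later column is assessed on a smaller data set.

```r
dt <- data.frame(
  age = c(89, 23, 26, 21, 31, 38, 34, 25, 65, 32, 36, 38, 35, 30, 35, 20, 11),
  income = c(1200, 1400, 1350, 1600, 7580, 3620, 2500, 4230, 4830,
             3820, 5230, 3360, 2860, 3120, 2970, 3000, 4500)
)

# Hypothetical helper: TRUE where a value is a boxplot-rule outlier
flag_outliers <- function(x) x %in% boxplot.stats(x)$out

cols <- c("age", "income")

# NA-based fix: the number of rows never changes
dt_na <- dt
for (col in cols) {
  dt_na[[col]][flag_outliers(dt_na[[col]])] <- NA
}

# Drop-based fix: rows removed for one column are gone before the next is checked
dt_drop <- dt
for (col in cols) {
  dt_drop <- dt_drop[!flag_outliers(dt_drop[[col]]), , drop = FALSE]
}

nrow(dt)
nrow(dt_na)
nrow(dt_drop)
```

In the drop-based loop the income outliers are computed only on the rows that survived the age step, so the result depends on the order of the columns.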