ez.outlier | R Documentation |
univariate outlier cleanup
ez.outlier(
x,
col = NULL,
method = c("z", "mad", "iqr"),
cutoff = NA,
fillout = c("null", "na", "mean", "median"),
hack = FALSE,
plot = FALSE,
na.rm = TRUE,
print2scr = TRUE
)
x |
a data frame or a vector |
col |
passed to |
method |
outlier detection method: z-score, MAD (median absolute deviation), or IQR (Tukey's interquartile range rule) |
cutoff |
abs(x) > cutoff will be treated as outliers. Default/auto values (i.e. if NA):
|
fillout |
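how to handle outliers: replace with NA ("na"), the mean ("mean"), or the median ("median"), computed column-wise for a data frame; or remove them ("null"). Removal is only available for a vector or a data frame with a single column specified; otherwise it automatically switches to "na". |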
hack |
if TRUE, call mapply to try each method/cutoff pair. method and cutoff must have the same length, or one of them must be a scalar, i.e., different methods each with a corresponding cutoff, or the same method with different cutoffs. |
plot |
if TRUE, draw a boxplot and histogram before and after outlier processing. |
Returns a new data frame or vector. If hack=TRUE, returns nothing.
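The fillout options can be pictured with a small sketch on a plain vector. This is not the package's implementation: fill_outliers is a hypothetical helper, and computing the mean or median from the non-outlying values only is an assumption made for illustration.

```r
# Hypothetical helper, not part of ez.outlier: apply one fillout strategy
# to a vector x, given a logical mask is_out marking the outliers.
fill_outliers <- function(x, is_out,
                          fillout = c("null", "na", "mean", "median")) {
  fillout <- match.arg(fillout)
  is_out <- is_out & !is.na(is_out)   # treat NA flags as "not an outlier"
  keep <- x[!is_out]                  # assumption: fill values come from non-outliers
  switch(fillout,
    null   = keep,                                        # drop outliers
    na     = replace(x, is_out, NA),                      # blank them out
    mean   = replace(x, is_out, mean(keep, na.rm = TRUE)),
    median = replace(x, is_out, median(keep, na.rm = TRUE))
  )
}

x <- c(1, 2, 3, 100)
flag <- c(FALSE, FALSE, FALSE, TRUE)
fill_outliers(x, flag, "null")     # shorter vector, outlier removed
fill_outliers(x, flag, "median")   # same length, outlier replaced
```

Removal changes the length of the result, which is presumably why "null" is restricted to a vector or a single column.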
univariate outlier approach
The Z-score method relies on the mean and standard deviation of a group of data to measure central
tendency and dispersion. This is troublesome, because the mean and standard deviation are highly
affected by outliers – they are not robust. In fact, the skewing that outliers bring is one of the
biggest reasons for finding and removing outliers from a dataset!
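Another drawback of the Z-score method is that it behaves strangely in small datasets. With the sample standard deviation, the largest |z| any
point can reach in a dataset of n items is (n - 1) / sqrt(n), so with a cutoff of 3 the Z-score method can never detect an outlier if the dataset has 10 or fewer items.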
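The small-sample limitation can be checked directly. The sketch below is illustrative only: z_outliers and max_z are hypothetical helpers, not functions of this package, and the cutoff of 3 is an assumed value rather than the documented default.

```r
# Hypothetical helper: flag values whose |z| exceeds the cutoff.
z_outliers <- function(x, cutoff = 3) {
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  abs(z) > cutoff
}

# Largest |z| a single point can reach in a sample of n items when the
# sample standard deviation (n - 1 denominator) is used.
max_z <- function(n) (n - 1) / sqrt(n)

# Ten items, one of them wildly different from the rest.
x <- c(2.1, 2.4, 2.2, 2.6, 2.3, 2.5, 2.0, 2.7, 2.4, 50)
any(z_outliers(x))   # FALSE: the extreme value inflates mean and sd
max_z(10)            # about 2.85, below the cutoff of 3
```

However extreme the tenth value is made, its z-score cannot exceed max_z(10), so it is never flagged at this cutoff.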
The median absolute deviation (MAD) method, also called the modified z-score, replaces the mean and standard deviation with the median and MAD, which are robust measures of central tendency and dispersion, respectively.
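A minimal sketch of the modified z-score, assuming stats::mad with its default scaling constant (1.4826). mad_outliers is a hypothetical helper, and the cutoff of 3.5 is a commonly quoted value used here as an assumption, not this function's default.

```r
# Hypothetical helper: flag values whose modified z-score exceeds the cutoff.
mad_outliers <- function(x, cutoff = 3.5) {
  med <- median(x, na.rm = TRUE)
  # mad() multiplies the raw median absolute deviation by 1.4826 so that
  # it is comparable to the standard deviation for normal data.
  spread <- mad(x, na.rm = TRUE)
  abs(x - med) / spread > cutoff
}

# Same kind of data as a plain z-score would struggle with:
# ten items, one extreme value.
x <- c(2.1, 2.4, 2.2, 2.6, 2.3, 2.5, 2.0, 2.7, 2.4, 50)
which(mad_outliers(x))   # 10: only the extreme value is flagged
```

Because the median and MAD barely move when one value is extreme, the score of that value stays large. Note that the sketch does not guard against a MAD of zero, which occurs when more than half of the values are identical.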
The interquartile range (IQR) method, like the modified Z-score method, uses a robust measure of dispersion.
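Tukey's rule can be sketched as follows. iqr_outliers is a hypothetical helper, not part of this package; the multiplier 1.5 matches the cutoff used for 'iqr' in the examples, and the quartiles come from quantile() with its default type.

```r
# Hypothetical helper: flag values outside Tukey's fences,
# [Q1 - cutoff * IQR, Q3 + cutoff * IQR].
iqr_outliers <- function(x, cutoff = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]                 # interquartile range
  x < q[1] - cutoff * spread | x > q[2] + cutoff * spread
}

x <- c(2.1, 2.4, 2.2, 2.6, 2.3, 2.5, 2.0, 2.7, 2.4, 50)
which(iqr_outliers(x))   # 10: only the extreme value falls outside the fences
```

The quartiles depend only on the middle of the data, so a single extreme value does not widen the fences.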
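set.seed(1234)
x = rnorm(10)
# same method (mad) tried with three different cutoffs
iris %>% ez.outlier(1,fillout='na',plot=TRUE,hack=TRUE,method=c('mad'),cutoff=c(1,3,2))
# three methods, each paired with its own cutoff
iris %>% ez.outlier(1,fillout='null',plot=TRUE,hack=TRUE,method=c('z','mad','iqr'),cutoff=c(3,5,1.5))
# three methods, each with its default (auto) cutoff
iris %>% ez.outlier(1,fillout='null',plot=TRUE,hack=TRUE,method=c('z','mad','iqr'),cutoff=NA)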