quantileTrim: Filter data by interquartile range

View source: R/np_utility.R

Description

quantileTrim takes a numeric vector and removes data points that lie more than threshold times the interquartile range below the first quartile or above the third quartile. If returnFilter is TRUE, the function instead returns a named list containing the trimmed data and a logical vector marking which elements of x were kept.
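The trimming rule can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the NicePlots source. iqrTrimSketch is a hypothetical name, quartiles come from quantile() with its default type = 7, and points lying exactly on a cutoff are kept.

```r
# Hypothetical sketch of IQR-based trimming; not the NicePlots implementation.
# Assumes quantile()'s default type = 7 and keeps points lying exactly on a cutoff.
iqrTrimSketch <- function(x, threshold = 3, na.rm = FALSE, returnFilter = FALSE) {
  # First and third quartiles (quantile() errors on NA unless na.rm = TRUE)
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = na.rm, names = FALSE)
  spread <- q[2] - q[1]                  # the interquartile range
  lower <- q[1] - threshold * spread     # lower cutoff
  upper <- q[2] + threshold * spread     # upper cutoff
  # TRUE for points to keep; NA entries are treated as removed
  keep <- !is.na(x) & x >= lower & x <= upper
  if (returnFilter) list(data = x[keep], filter = keep) else x[keep]
}

iqrTrimSketch(c(1:9, 100), threshold = 1.5)
# [1] 1 2 3 4 5 6 7 8 9
```

Here Q1 = 3.25 and Q3 = 7.75, so with threshold = 1.5 the cutoffs are -3.5 and 14.5 and only 100 is dropped.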

Usage

quantileTrim(x, threshold = 3, na.rm = FALSE, returnFilter = FALSE)

Arguments

x

a numeric vector, or an object compatible with the quantile function.

threshold

numeric; the number of interquartile ranges outside the inner 50% of the data to use as the trimming cutoff. Typical values are 1.5 for outliers and 3 for extreme outliers.
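To see how threshold moves the cutoffs, the snippet below computes the lower and upper fences for 1.5 and 3 on a small made-up vector, using base R only. It assumes quantile()'s default type = 7, which quantileTrim may or may not use.

```r
# Made-up data with one moderately large value
x <- c(1:9, 15)
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)  # Q1 = 3.25, Q3 = 7.75
iqr <- q[2] - q[1]                                      # 4.5
# One column of (lower, upper) fences per threshold
fences <- sapply(c(outlier = 1.5, extreme = 3),
                 function(k) c(lower = q[1] - k * iqr, upper = q[2] + k * iqr))
fences
# Points outside each pair of fences
x[x < fences["lower", "outlier"] | x > fences["upper", "outlier"]]  # [1] 15
x[x < fences["lower", "extreme"] | x > fences["upper", "extreme"]]  # numeric(0)
```

The value 15 counts as an outlier at 1.5 IQRs but not as an extreme outlier at 3 IQRs.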

na.rm

logical; if TRUE, NA values are removed from x before the data are analyzed.
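Because x must be compatible with quantile, it helps to know how base R's quantile() treats missing values: it stops with an error unless na.rm = TRUE. The snippet shows that base behavior only. Whether quantileTrim forwards na.rm to quantile() in exactly this way is an assumption.

```r
# Base R behavior of quantile() with missing values
x <- c(2, 4, NA, 6, 8)
# Without na.rm, quantile() signals an error; capture its message instead
msg <- tryCatch(quantile(x), error = function(e) conditionMessage(e))
msg
# With na.rm = TRUE the NA is dropped before the quantiles are computed
quantile(x, na.rm = TRUE)
```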

returnFilter

logical; if TRUE, the function returns a list containing both the trimmed data and a logical vector that can be used to filter other objects of the same length as x.
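A filter vector is useful for subsetting related data in step with x. The sketch below builds such a logical vector by hand from one data-frame column and uses it to drop whole rows. The data frame, its column names, and the hand-computed filter are all hypothetical stand-ins for the filter that quantileTrim returns.

```r
# Hypothetical data: an id column plus one measurement column
df <- data.frame(id = 1:8, value = c(10, 12, 11, 13, 12, 11, 95, 12))
q <- quantile(df$value, probs = c(0.25, 0.75), names = FALSE)  # Q1, Q3
margin <- 1.5 * (q[2] - q[1])                                   # threshold * IQR
# Hand-built stand-in for a returned filter: TRUE where the value is kept
filter <- df$value >= q[1] - margin & df$value <= q[2] + margin
# The same logical vector removes whole rows, not just the measured values
df[filter, ]
```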

Details

The interquartile range (IQR), also known as the H-spread, is the range spanned by the middle 50% of the data. It is used as a measure of dispersion around the median and, more frequently, to detect outliers. Here a data point is removed if $x < Q_{1} - \mathrm{threshold} \times \mathrm{IQR}$ or $x > Q_{3} + \mathrm{threshold} \times \mathrm{IQR}$, where $Q_{1}$ and $Q_{3}$ are the 25th and 75th percentiles (the first and third quartiles) of the data and $\mathrm{IQR} = Q_{3} - Q_{1}$.
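As a worked illustration with made-up data, take x = (2, 4, 4, 5, 6, 7, 30) and assume the quartiles are computed as in R's default quantile() (type 7):

```latex
\begin{aligned}
Q_{1} &= 4, \qquad Q_{3} = 6.5, \qquad \mathrm{IQR} = Q_{3} - Q_{1} = 2.5 \\
\mathrm{threshold} = 1.5 &: \quad [\,Q_{1} - 3.75,\; Q_{3} + 3.75\,] = [\,0.25,\; 10.25\,] \\
\mathrm{threshold} = 3 &: \quad [\,Q_{1} - 7.5,\; Q_{3} + 7.5\,] = [\,-3.5,\; 14\,]
\end{aligned}
```

With either threshold only the value 30 lies outside the cutoffs and would be removed.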

Value

The trimmed numeric vector or, if returnFilter is TRUE, a named list with elements data and filter containing the trimmed data and the logical filtering vector, respectively.

See Also

quantile.

Examples

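set.seed(42)  # make the random example reproducible
x <- rnorm(1000)
# Mean followed by the (min, max) range, before trimming
paste0(mean(x), " (", paste(range(x), collapse = ", "), ")")
x <- quantileTrim(x, threshold = 1.5)
# The same summary after removing points beyond 1.5 * IQR
paste0(mean(x), " (", paste(range(x), collapse = ", "), ")")

# Example using the returnFilter option:
myData <- c(NA, rnorm(100), NA, NA, rnorm(100), NA, NA, NA, rnorm(300), NA, 10000)
myIndex <- seq_along(myData)  # companion vector with the same length as myData
newData <- quantileTrim(myData, na.rm = TRUE, returnFilter = TRUE)
identical(newData$data, myData[newData$filter])
# The same filter subsets any other object of the same length as myData
keptIndex <- myIndex[newData$filter]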

ZachHunter/NicePlots.R documentation built on Sept. 23, 2023, 4:04 a.m.