outlierRemoveDataset: Calls is.outlier for all of the data columns (cols) of the...


Description

Calls is.outlier on each of the specified data columns (cols) of the provided data.frame (x) and returns the data frame with NA in place of outliers.

Usage

outlierRemoveDataset(x, mcut = 6.2, by = NA, cols)

Arguments

x

A data.frame with sample data, metadata, etc.

mcut

Number of median absolute deviations (MADs) a data point needs to be from the median to be considered an outlier. Default is 6.2.

by

Column name to group data by for outlier removal (e.g. by line, by run, etc.). If NA (the default), outliers are identified across the whole dataset.

cols

Vector of column numbers or names in x to remove outliers from.
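
The mcut rule can be illustrated on a single vector. The sketch below is an assumption about how a MAD-based cutoff might work, not the package's is.outlier implementation, which is not shown on this page. The helper name mad_outlier_sketch and the choice of an unscaled MAD (constant = 1) are hypothetical.

```r
# Hypothetical helper: NOT the package's is.outlier, only an illustration
# of a "more than mcut MADs from the median" rule.
mad_outlier_sketch <- function(v, mcut = 6.2) {
  med <- median(v, na.rm = TRUE)
  # Assumption: raw (unscaled) MAD; is.outlier may scale it differently.
  dev <- mad(v, center = med, constant = 1, na.rm = TRUE)
  # Replace values further than mcut MADs from the median with NA.
  v[!is.na(v) & abs(v - med) > mcut * dev] <- NA
  v
}

v <- c(1, 2, 3, 4, 5, 100)
res <- mad_outlier_sketch(v)
print(res)
```

Under these assumptions only the value 100 lies more than 6.2 MADs from the median, so it is the only entry replaced by NA.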

Value

Returns a data frame in the same format as the input, but with outliers in each of the specified columns changed to NA.
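
Because the result keeps the shape of the input, one way to see how many values were removed is to compare the NA pattern before and after. The sketch below uses stand-in data rather than a real call to outlierRemoveDataset; the data frames before and after and the flagged value are hypothetical.

```r
# Stand-in data: 'after' mimics what the function might return for 'before'
# if the value 50 in column a were flagged (hypothetical result).
before <- data.frame(id = c("A", "A", "B", "B"),
                     a = c(1.0, 1.2, 50, NA),
                     b = c(0.3, 0.4, 0.5, 0.6))
after <- before
after$a[3] <- NA

cols <- c("a", "b")
# Count cells that were observed in the input but are NA in the output.
n_removed <- colSums(!is.na(before[, cols]) & is.na(after[, cols]))
print(n_removed)
```

Values that were already NA in the input are not counted, so the totals reflect only newly removed points.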

Author(s)

Greg Ziegler

Examples

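library(ionomicsUtils)

set.seed(1)
# Three normally distributed variables
x <- rnorm(100)
y <- rnorm(100)
z <- rnorm(100)
# Add one extreme value at each end of every variable
x <- c(-10, x, 10)
y <- c(-20, y, 20)
z <- c(-30, z, 30)
# Random group labels (A to E) used for the grouped example
df <- data.frame(id = sample(LETTERS[1:5], length(x), replace = TRUE), x, y, z)

# By entire dataset
dfOR <- outlierRemoveDataset(df, 6.2, by = NA, c("x", "y", "z"))
summary(dfOR)

# Look for outliers within groups
dfOR <- outlierRemoveDataset(df, 6.2, by = "id", c("x", "y", "z"))
summary(dfOR)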

## The function is currently defined as
function (x, mcut = 6.2, by = NA, cols) 
{
    for (i in cols) {
        if (is.na(by)) {
            # No grouping: screen the whole column at once
            x[, i] <- is.outlier(x[, i], mcut)
        }
        else {
            # Screen each group of the 'by' column separately
            for (j in unique(x[, by])) {
                if (is.na(j)) {
                  # Rows with a missing group label are treated as one group
                  x[is.na(x[, by]), i] <- is.outlier(x[is.na(x[, by]), i], mcut)
                }
                else {
                  x[x[, by] == j & !(is.na(x[, by])), i] <- is.outlier(x[x[, by] == j & !(is.na(x[, by])), i], mcut)
                }
            }
        }
    }
    return(x)
}

gziegler/ionomicsUtils documentation built on June 20, 2019, 8:04 p.m.