detect_outliers_per_group: Detect outliers per group


View source: R/detect_outliers.R

Description

Detects outliers within each group of observations in a distance matrix, using a chosen method to score how separated each point is and a chosen criterion to set the outlier threshold.

Usage

detect_outliers_per_group(distMatrix, groups, method = "MD",
  criterion = "MAD", LOF_k = 2, MAD_trim = 2, boxplot_trim = 1.5)

Arguments

distMatrix

Numeric, distance matrix

groups

Factor, vector assigning each observation in the matrix to a specific group.

method

Character, method for measuring separation of one point from all other points. The following are accepted: "MdD": median distance; "MD": average (mean) distance; "MAH": Mahalanobis distances (outCoDa); "LOF": Local Outlier Factor Score (lof).

criterion

Numeric/Character, the criterion used for separating outliers. The following are accepted: <Numeric, 0-1>: number between 0 and 1, sets a quantile type of threshold; "boxplot": outliers are those singled out as points in a boxplot; "MAD": threshold is given by Median Absolute Deviation.

LOF_k

Numeric, when method = "LOF", the size of the neighborhood. See lof.

MAD_trim

Numeric, when criterion = "MAD", the multiplier of the MAD used to calculate the outlier threshold.

boxplot_trim

Numeric, when criterion = "boxplot", the multiplier of the interquartile range (IQR) used to calculate the outlier threshold.
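
The "MD" and "MdD" methods can be illustrated with a minimal sketch in base R. The helper group_distance_score below is hypothetical and is not part of cerUB; it assumes that each point is scored against the other members of its own group and that the point itself is excluded from the summary. The package's actual implementation may differ on both counts.

```r
# Hypothetical helper, not part of cerUB: per-point mean ("MD") or
# median ("MdD") distance to the other members of the same group.
group_distance_score <- function(distMatrix, groups, summary_fun = mean) {
  d <- as.matrix(distMatrix)      # accepts a "dist" object or a square matrix
  groups <- as.factor(groups)
  scores <- numeric(nrow(d))
  for (g in levels(groups)) {
    idx <- which(groups == g)
    for (i in idx) {
      others <- setdiff(idx, i)   # assumption: exclude the point itself
      scores[i] <- if (length(others) > 0) {
        summary_fun(d[i, others])
      } else {
        NA_real_                  # a single-member group has no neighbours
      }
    }
  }
  scores
}

# One score per observation, computed within each iris species
md  <- group_distance_score(dist(iris[, 1:4]), iris$Species, mean)
mdd <- group_distance_score(dist(iris[, 1:4]), iris$Species, median)
head(round(md, 3))
```

Larger scores indicate points that sit further from the rest of their group; the criterion argument then decides which scores are large enough to count as outliers.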

Examples

## Not run: 

# Project the four measurements onto principal components for plotting
pca <- princomp(iris[, 1:4])

# Outliers within each species, scored by mean distance ("MD")
irisSpeciesOutliers_MD <- detect_outliers_per_group(dist(iris[, 1:4]),
                                                    iris$Species,
                                                    method = "MD",
                                                    criterion = "MAD")
# Same data, scored by Local Outlier Factor ("LOF")
irisSpeciesOutliers_LOF <- detect_outliers_per_group(dist(iris[, 1:4]),
                                                     iris$Species,
                                                     method = "LOF",
                                                     criterion = "MAD")
# Plot the first two components, coloured by species
plot(pca$scores[, 1:2],
    col = iris$Species,
    main = "Outliers per group")
# Mark the outliers found by each method
points(pca$scores[irisSpeciesOutliers_MD$index, 1:2],
       col = "purple", pch = 2, cex = 1.5)
points(pca$scores[irisSpeciesOutliers_LOF$index, 1:2],
       col = "orange", pch = 6, cex = 1.5)
legend(0.65 * max(pca$scores[,1]), max(pca$scores[,2]),
       c("MD", "LOF"), pch = c(2, 6), col = c("purple", "orange"))


## End(Not run)
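
To make the three criterion options more concrete, here is a rough sketch of how each one could turn a vector of separation scores into a threshold. The helper flag_outliers is hypothetical and is not part of cerUB. It assumes that only high scores are flagged, that the "MAD" threshold is the median plus MAD_trim times stats::mad (which applies the 1.4826 scaling constant by default), and that the "boxplot" threshold is the third quartile plus boxplot_trim times the IQR, which may differ slightly from the whiskers that boxplot itself draws.

```r
# Hypothetical helper, not part of cerUB: flag high scores under each criterion.
flag_outliers <- function(scores, criterion = "MAD",
                          MAD_trim = 2, boxplot_trim = 1.5) {
  if (is.numeric(criterion)) {
    # Quantile-type threshold, e.g. criterion = 0.95
    threshold <- stats::quantile(scores, probs = criterion, names = FALSE)
  } else if (criterion == "boxplot") {
    # Upper fence: third quartile + boxplot_trim * IQR
    threshold <- stats::quantile(scores, 0.75, names = FALSE) +
      boxplot_trim * stats::IQR(scores)
  } else if (criterion == "MAD") {
    # Median + MAD_trim * MAD
    threshold <- stats::median(scores) + MAD_trim * stats::mad(scores)
  } else {
    stop("Unknown criterion: ", criterion)
  }
  which(scores > threshold)   # indices of the flagged observations
}

# Toy scores: the last observation is clearly separated from the rest
scores <- c(1.0, 1.1, 0.9, 1.2, 1.0, 5.0)
flag_outliers(scores, "MAD")
flag_outliers(scores, "boxplot")
flag_outliers(scores, 0.9)
```

Raising MAD_trim or boxplot_trim moves the threshold further from the bulk of the scores, so fewer observations are flagged.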

Andros-Spica/cerUB documentation built on June 9, 2020, 9:22 p.m.