# R/generateOutliers.R In outForest: Multivariate Outlier Detection and Replacement

#### Documented in generateOutliers

```
#' Adds Outliers to a Vector, Matrix or Data Frame
#'
#' Takes a vector, matrix or data frame and replaces some numeric values by outliers.
#'
#' @importFrom stats sd rnorm
#' @param x A vector, matrix or \code{data.frame}.
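#' @param p Proportion of outliers to add to \code{x}. If \code{x} is a \code{data.frame}, \code{p} can also be a vector with one proportion per column, or a named vector, in which case only the named columns are modified (see examples).
#' @param sd_factor Each outlier is generated by shifting the original value by a realization of a normal random variable with mean 0 and standard deviation equal to \code{sd_factor} times the sample standard deviation of the original values.
#' @param seed An integer seed passed to \code{set.seed()}. If \code{NULL} (the default), no seed is set.
#' @return \code{x} with outliers added to its numeric values. Non-numeric input, or non-numeric columns of a data frame, is returned unchanged.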
#' @export
#' @examples
#' generateOutliers(1:10, seed = 334, p = 0.3)
#' generateOutliers(cbind(1:10, 10:1), p = 0.2)
#' head(generateOutliers(iris, p = c(0, 0, 0.5, 0.5, 0.5)))
#' head(generateOutliers(iris, p = c(Sepal.Length = 0.2)))
generateOutliers <- function(x, p = 0.05, sd_factor = 5, seed = NULL) {
  stopifnot(p >= 0, p <= 1, is.atomic(x) || is.data.frame(x))
  if (!is.null(seed)) {
    set.seed(seed)
  }
  # vector or matrix
  if (is.atomic(x)) {
    return(.generateOutliersVec(z = x, p = p, sdf = sd_factor))
  }
  # data frame: an unnamed p applies to all columns (in order),
  # a named p only to those named columns that exist in x
  if (is.null(names(p))) {
    v <- names(x)
  } else {
    v <- intersect(names(p), names(x))
    p <- p[v]  # subset by name so each proportion stays paired with its column
  }
  x[, v] <- Map(.generateOutliersVec, z = x[, v, drop = FALSE], p = p, sdf = sd_factor)
  x
}

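# Helper function: adds outliers to a single vector (or matrix)
.generateOutliersVec <- function(z, p, sdf) {
  # non-numeric input (e.g. factors, characters) is returned unchanged
  if (!is.numeric(z)) {
    return(z)
  }
  n <- length(z)
  m <- round(p * n)
  idx <- sample(n, m)  # positions that receive an outlier
  # shift each selected value by a draw from N(0, (sdf * sd(z))^2);
  # `sd =` must be named, since the second positional argument of rnorm() is the mean
  z[idx] <- z[idx] + rnorm(m, sd = sdf * sd(z, na.rm = TRUE))
  z
}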
```
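The documented rule for `sd_factor` can be illustrated outside the package. The sketch below uses `shift_values()`, a hypothetical helper written only for this illustration; it assumes the intended shift is a zero-mean normal draw whose standard deviation is `sd_factor` times the sample standard deviation, and it also shows why that standard deviation has to be passed to `rnorm()` by name.

```r
# Hypothetical helper (not outForest's code): shift m = round(p * n)
# randomly chosen values by draws from N(0, (sd_factor * s)^2),
# where s is the sample standard deviation of the non-missing values.
shift_values <- function(z, p, sd_factor) {
  n <- length(z)
  m <- round(p * n)                 # number of values to perturb
  idx <- sample(n, m)               # distinct positions to perturb
  s <- sd(z, na.rm = TRUE)
  # `sd =` is named on purpose: rnorm(m, s) would set the *mean* to s
  z[idx] <- z[idx] + rnorm(m, mean = 0, sd = sd_factor * s)
  z
}

set.seed(1)
z <- as.numeric(1:100)
out <- shift_values(z, p = 0.1, sd_factor = 5)
sum(out != z)  # 10 positions changed

# The pitfall in isolation: the second positional argument is the mean
draws <- rnorm(10000, 3)
c(mean = mean(draws), sd = sd(draws))  # roughly 3 and 1
```

Under this assumption the absolute shift averages about `sd_factor * sqrt(2 / pi)` sample standard deviations (roughly 4 for the default of 5), which is what pushes the perturbed values away from the bulk of the data.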

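For data frames, the way `p` is paired with columns can also be shown in isolation. `match_proportions()` below is a hypothetical helper, not part of outForest; it mirrors the column selection in `generateOutliers()` under the assumption that a named `p` is subset by name (so each value stays with its column) and that an unnamed `p` is recycled over all columns.

```r
# Hypothetical helper (not part of outForest): pair each data frame column
# with the outlier proportion that would be applied to it.
match_proportions <- function(p, df) {
  if (is.null(names(p))) {
    # unnamed p: one entry per column, recycled if shorter
    v <- names(df)
    p <- rep_len(p, length(v))
  } else {
    # named p: only names that are real columns are kept,
    # and p is subset by name so values stay aligned with v
    v <- intersect(names(p), names(df))
    p <- p[v]
  }
  names(p) <- v
  p
}

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), g = c("x", "y", "z"))
match_proportions(c(b = 0.2, typo = 0.5), df)  # only column "b", with 0.2
match_proportions(0.1, df)                     # 0.1 for a, b and g
```

A proportion assigned to a non-numeric column such as `g` has no effect, because `.generateOutliersVec()` returns non-numeric input unchanged.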
outForest documentation built on Jan. 31, 2022, 9:07 a.m.