R/generateOutliers.R

#' Adds Outliers to a Vector, Matrix or Data Frame
#'
#' Takes a vector, matrix or data frame and replaces a proportion of its numeric values with outliers. Non-numeric input is returned unchanged.
#'
#' @importFrom stats sd rnorm
#' @param x A vector, matrix or \code{data.frame}.
#' @param p Proportion of outliers to add to \code{x}. If \code{x} is a \code{data.frame}, \code{p} can also be a vector of proportions, one per column, or a named vector whose names select the columns to modify (see examples).
#' @param sd_factor Each outlier is generated by shifting the original value up or down (random sign) by \code{sd_factor} times a realization of a normal random variable whose mean is the sample standard deviation of the original values and whose standard deviation is 1.
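#' @param seed Optional integer seed passed to \code{set.seed()} for reproducibility. If \code{NULL} (default), the random number generator state is left untouched.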
#' @return \code{x} with outliers.
#' @export
#' @examples
#' generateOutliers(1:10, seed = 334, p = 0.3)
#' generateOutliers(cbind(1:10, 10:1), p = 0.2)
#' head(generateOutliers(iris))
#' head(generateOutliers(iris, p = 0.2))
#' head(generateOutliers(iris, p = c(0, 0, 0.5, 0.5, 0.5)))
#' head(generateOutliers(iris, p = c(Sepal.Length = 0.2)))
#' @seealso \code{\link{outForest}}.
generateOutliers <- function(x, p = 0.05, sd_factor = 5, seed = NULL) {
  stopifnot(p >= 0, p <= 1, is.atomic(x) || is.data.frame(x))
  if (!is.null(seed)) {
    set.seed(seed)
  }
  # vector or matrix
  if (is.atomic(x)) {
    return(.generateOutliersVec(z = x, p = p, sdf = sd_factor))
  }
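  # data frame: select the target columns and align p with them
  v <- names(x)
  if (!is.null(names(p))) {
    v <- intersect(names(p), names(x))
    p <- p[v]  # drop entries of p that do not name a column of x
  }
  x[, v] <- Map(.generateOutliersVec, z = x[, v, drop = FALSE], p = p, sdf = sd_factor)
  x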
}

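# Helper function: adds outliers to a single vector or matrix.
# Non-numeric input is returned unchanged.
.generateOutliersVec <- function(z, p, sdf) {
  if (!is.numeric(z)) {
    return(z)
  }
  n <- length(z)
  m <- round(p * n)
  idx <- sample(n, m)  # positions to turn into outliers
  # Shift each selected value up or down by sdf times a normal draw
  # with mean sd(z) and standard deviation 1
  z[idx] <- z[idx] + sample(c(-1, 1), m, replace = TRUE) *
    sdf * rnorm(m, mean = sd(z, na.rm = TRUE))
  z
}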
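
# A minimal sanity-check sketch, not part of the package API. The helper name
# .countChangedValues is hypothetical and introduced here only for
# illustration. It counts the positions whose values differ between an
# original numeric vector and a modified copy, which is one way to check that
# roughly round(p * n) values were replaced by outliers.
.countChangedValues <- function(before, after) {
  stopifnot(is.numeric(before), is.numeric(after),
            length(before) == length(after))
  # NA in the same position on both sides counts as unchanged
  both_na <- is.na(before) & is.na(after)
  differs <- (is.na(before) != is.na(after)) | (!both_na & before != after)
  sum(differs, na.rm = TRUE)
}
# Possible usage, assuming generateOutliers() as defined above:
#   x <- as.numeric(1:100)
#   .countChangedValues(x, generateOutliers(x, p = 0.1, seed = 1))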

outForest documentation built on Jan. 31, 2022, 9:07 a.m.