R/generateOutliers.R

Defines functions .generateOutliersVec generateOutliers

Documented in generateOutliers

#' Adds Outliers
#'
#' Takes a vector, matrix or data frame and replaces some numeric values with outliers.
#'
#' @param x A vector, matrix or `data.frame`.
#' @param p Proportion of outliers to add to `x`. In case `x` is a `data.frame`, `p` can
#'   also be a vector of proportions per column or a named vector (see examples).
#' @param sd_factor Each outlier is generated by shifting the original value, in a
#'   random direction, by a realization of a normal random variable whose mean is
#'   `sd_factor` times the original sample standard deviation.
#' @param seed An optional integer seed. If `NULL` (the default), no seed is set.
#' @returns `x` with outliers.
#' @export
#' @examples
#' generateOutliers(1:10, seed = 334, p = 0.3)
#' generateOutliers(cbind(1:10, 10:1), p = 0.2)
#' head(generateOutliers(iris))
#' head(generateOutliers(iris, p = 0.2))
#' head(generateOutliers(iris, p = c(0, 0, 0.5, 0.5, 0.5)))
#' head(generateOutliers(iris, p = c(Sepal.Length = 0.2)))
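#'
#' # A minimal sketch: compare input and output to see which positions were
#' # replaced (the object names `x` and `x_out` are only illustrative)
#' x <- 1:10
#' x_out <- generateOutliers(x, p = 0.3, seed = 334)
#' which(x_out != x)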
#' @seealso [outForest()]
generateOutliers <- function(x, p = 0.05, sd_factor = 5, seed = NULL) {
  stopifnot(p >= 0, p <= 1, is.atomic(x) || is.data.frame(x))
  if (!is.null(seed)) {
    set.seed(seed)
  }
  # vector or matrix
  if (is.atomic(x)) {
    return(.generateOutliersVec(z = x, p = p, sdf = sd_factor))
  }
  # data frame
  v <- if (is.null(names(p))) names(x) else intersect(names(p), names(x))
  # keep only the proportions that belong to the selected columns
  if (!is.null(names(p))) p <- p[v]
  x[, v] <- Map(.generateOutliersVec, z = x[, v, drop = FALSE], p = p, sdf = sd_factor)
  x
}

# Helper function: adds outliers to a single numeric vector or matrix.
# Non-numeric input is returned unchanged.
.generateOutliersVec <- function(z, p, sdf) {
  if (!is.numeric(z)) {
    return(z)
  }
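  n <- length(z)
  m <- round(p * n)    # number of values to replace
  idx <- sample(n, m)  # positions that become outliers
  # shift each selected value, in a random direction, by sdf * N(sd(z), 1)
  z[idx] <- z[idx] + sample(c(-1, 1), m, replace = TRUE) *
    sdf * stats::rnorm(m, mean = stats::sd(z, na.rm = TRUE))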
  z
}

outForest documentation built on May 31, 2023, 5:55 p.m.