winsorize: Winsorize data

View source: R/winsorize.R

winsorizeR Documentation

Winsorize data

Description

Winsorize data

Usage

winsorize(data, ...)

## S3 method for class 'numeric'
winsorize(
  data,
  threshold = 0.2,
  method = "percentile",
  robust = FALSE,
  verbose = TRUE,
  ...
)

Arguments

data

data frame or vector.

...

Currently not used.

threshold

The amount of winsorization, depends on the value of method:

  • For method = "percentile": the amount to winsorize from each tail. The value of threshold must be between 0 and 0.5 and of length 1.

  • For method = "zscore": the number of SD/MAD-deviations from the mean/median (see robust). The value of threshold must be greater than 0 and of length 1.

  • For method = "raw": a vector of length 2 with the lower and upper bound for winsorization.

method

One of "percentile" (default), "zscore", or "raw".

robust

Logical, if TRUE, winsorizing through the "zscore" method is done via the median and the median absolute deviation (MAD); if FALSE, via the mean and the standard deviation.

verbose

Not used anymore since datawizard 0.6.6.

Details

Winsorizing or winsorization is the transformation of statistics by limiting extreme values in the statistical data to reduce the effect of possibly spurious outliers. The distribution of many statistics can be heavily influenced by outliers. A typical strategy is to set all outliers (values beyond a certain threshold) to a specified percentile of the data; for example, a ⁠90%⁠ winsorization would see all data below the 5th percentile set to the 5th percentile, and data above the 95th percentile set to the 95th percentile. Winsorized estimators are usually more robust to outliers than their more standard forms.

Value

A data frame with winsorized columns or a winsorized vector.

See Also

  • Functions to rename stuff: data_rename(), data_rename_rows(), data_addprefix(), data_addsuffix()

  • Functions to reorder or remove columns: data_reorder(), data_relocate(), data_remove()

  • Functions to reshape, pivot or rotate data frames: data_to_long(), data_to_wide(), data_rotate()

  • Functions to recode data: rescale(), reverse(), categorize(), recode_values(), slide()

  • Functions to standardize, normalize, rank-transform: center(), standardize(), normalize(), ranktransform(), winsorize()

  • Split and merge data frames: data_partition(), data_merge()

  • Functions to find or select columns: data_select(), extract_column_names()

  • Functions to filter rows: data_match(), data_filter()

Examples

hist(iris$Sepal.Length, main = "Original data")

hist(winsorize(iris$Sepal.Length, threshold = 0.2),
  xlim = c(4, 8), main = "Percentile Winsorization"
)

hist(winsorize(iris$Sepal.Length, threshold = 1.5, method = "zscore"),
  xlim = c(4, 8), main = "Mean (+/- SD) Winsorization"
)

hist(winsorize(iris$Sepal.Length, threshold = 1.5, method = "zscore", robust = TRUE),
  xlim = c(4, 8), main = "Median (+/- MAD) Winsorization"
)

hist(winsorize(iris$Sepal.Length, threshold = c(5, 7.5), method = "raw"),
  xlim = c(4, 8), main = "Raw Thresholds"
)

# Also works on a data frame:
winsorize(iris, threshold = 0.2)


datawizard documentation built on Oct. 6, 2024, 1:08 a.m.