winsorize | R Documentation |
Winsorize data
winsorize(data, ...)
## S3 method for class 'numeric'
winsorize(
data,
threshold = 0.2,
method = "percentile",
robust = FALSE,
verbose = TRUE,
...
)
data |
data frame or vector. |
... |
Currently not used. |
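threshold |
The amount of winsorization, depends on the value of method. For "percentile", the proportion to winsorize from each tail (e.g. 0.2); for "zscore", the number of SD/MAD deviations from the mean/median (e.g. 1.5, see robust); for "raw", a vector of length 2 with the lower and upper bounds (e.g. c(5, 7.5)). |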
method |
One of "percentile" (default), "zscore", or "raw". |
robust |
Logical, if TRUE, winsorizing through the "zscore" method is done via the median and the median absolute deviation (MAD); if FALSE, via the mean and the standard deviation. |
verbose |
Not used anymore since |
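The "zscore" method and the robust argument described above can be illustrated with a small base R sketch. The helper name winsorize_zscore_sketch() is hypothetical and is not part of the package; it only assumes that bounds are computed as centre +/- threshold * spread and that values outside those bounds are clamped to them. The package's own implementation may differ in its details.

```r
# Hypothetical helper for illustration only; not the package's internal code.
winsorize_zscore_sketch <- function(x, threshold = 1.5, robust = FALSE) {
  stopifnot(is.numeric(x), length(threshold) == 1, threshold > 0)
  if (robust) {
    # Robust variant: median and median absolute deviation (MAD).
    # stats::mad() applies the usual 1.4826 scaling constant by default.
    centre <- stats::median(x, na.rm = TRUE)
    spread <- stats::mad(x, na.rm = TRUE)
  } else {
    # Classical variant: mean and standard deviation.
    centre <- mean(x, na.rm = TRUE)
    spread <- stats::sd(x, na.rm = TRUE)
  }
  lower <- centre - threshold * spread
  upper <- centre + threshold * spread
  # Clamp values outside [lower, upper] to the nearest bound.
  pmin(pmax(x, lower), upper)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
out_robust <- winsorize_zscore_sketch(x, threshold = 1.5, robust = TRUE)
out_classic <- winsorize_zscore_sketch(x, threshold = 1.5, robust = FALSE)
```

With a single extreme value such as 100, the median and MAD are barely affected by it, so the robust bounds are typically much tighter than the mean and SD bounds.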
Winsorizing or winsorization is the transformation of statistics by limiting
extreme values in the statistical data to reduce the effect of possibly
spurious outliers. The distribution of many statistics can be heavily
influenced by outliers. A typical strategy is to set all outliers (values
beyond a certain threshold) to a specified percentile of the data; for
example, a 90% winsorization would see all data below the 5th percentile set
to the 5th percentile, and data above the 95th percentile set to the 95th
percentile. Winsorized estimators are usually more robust to outliers than
their more standard forms.
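As a rough illustration of the percentile strategy described above, the following base R sketch clamps values to empirical quantiles. The helper name winsorize_percentile_sketch() is hypothetical, and the use of stats::quantile() with its default interpolation is an assumption made for this example; winsorize() itself may pick its cut-offs differently, so results need not match exactly.

```r
# Hypothetical helper for illustration only; not the package's internal code.
winsorize_percentile_sketch <- function(x, threshold = 0.2) {
  stopifnot(
    is.numeric(x),
    length(threshold) == 1,
    threshold >= 0,
    threshold <= 0.5
  )
  # Lower and upper cut-offs, taken symmetrically from each tail.
  bounds <- stats::quantile(
    x,
    probs = c(threshold, 1 - threshold),
    na.rm = TRUE,
    names = FALSE
  )
  # Values below the lower cut-off are raised to it,
  # values above the upper cut-off are lowered to it.
  pmin(pmax(x, bounds[1]), bounds[2])
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
# A "90% winsorization": 5% is winsorized from each tail.
out <- winsorize_percentile_sketch(x, threshold = 0.05)
```

Values between the two cut-offs are returned unchanged, so only the tails of the distribution are altered.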
A data frame with winsorized columns or a winsorized vector.
Functions to rename stuff: data_rename(), data_rename_rows(), data_addprefix(), data_addsuffix()
Functions to reorder or remove columns: data_reorder(), data_relocate(), data_remove()
Functions to reshape, pivot or rotate data frames: data_to_long(), data_to_wide(), data_rotate()
Functions to recode data: rescale(), reverse(), categorize(), recode_values(), slide()
Functions to standardize, normalize, rank-transform: center(), standardize(), normalize(), ranktransform(), winsorize()
Split and merge data frames: data_partition(), data_merge()
Functions to find or select columns: data_select(), extract_column_names()
Functions to filter rows: data_match(), data_filter()
hist(iris$Sepal.Length, main = "Original data")
hist(winsorize(iris$Sepal.Length, threshold = 0.2),
xlim = c(4, 8), main = "Percentile Winsorization"
)
hist(winsorize(iris$Sepal.Length, threshold = 1.5, method = "zscore"),
xlim = c(4, 8), main = "Mean (+/- SD) Winsorization"
)
hist(winsorize(iris$Sepal.Length, threshold = 1.5, method = "zscore", robust = TRUE),
xlim = c(4, 8), main = "Median (+/- MAD) Winsorization"
)
hist(winsorize(iris$Sepal.Length, threshold = c(5, 7.5), method = "raw"),
xlim = c(4, 8), main = "Raw Thresholds"
)
# Also works on a data frame:
winsorize(iris, threshold = 0.2)