impute: Impute outliers

View source: R/outlier_detection.R

Description

Impute detected outliers in a multidimensional data set

Usage

impute(x, flag = NULL, fill = "mean", level = 0.1, nmax = NULL,
  side = NULL, crit = "lof", k = 5, metric = "euclidean", q = 3,
  ...)

Arguments

x

a matrix, data frame or vector of data points (a vector will be understood as 1D data, equivalent to a 1-column matrix). Each row is a data point and each column is a dimension. NA values are allowed and will produce NAs in the output.

flag

a logical or integer (0-or-1) vector flagging outliers, such as the one produced by the function flag. If NULL, the flags are computed here by calling flag with the arguments level, nmax, side, crit, k and metric.

fill

method for imputing (or removing) outliers. If numeric or NA, it is the value that will replace the outliers. If the data is K-dimensional, fill is expected to be a vector of length K: if longer, only the first K components are used, and if shorter, the vector is extended with NAs. Alternatively, fill can be a character string: 'mean' and 'median' replace outliers with the mean or the (multidimensional) median of the remaining data, 'random' generates replacement values drawn from the estimated probability distribution of the non-outlier data points, and 'remove' removes the outliers by calling the function purge. Any unambiguous substring can be given, case insensitive.
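The length rule for a numeric fill and the behaviour of fill = 'mean' can be illustrated with a minimal base-R sketch. The helpers normalize_fill and impute_mean_sketch are hypothetical names introduced only for this illustration; they are not part of the package and are not claimed to reproduce its internals. The sketch assumes a complete numeric matrix with no NA values.

```r
# Hypothetical helper (not part of the package): fit a numeric fill vector
# to K columns, as described for the 'fill' argument.
normalize_fill <- function(fill, K) {
  if (length(fill) >= K) {
    fill[seq_len(K)]                       # longer: keep the first K components
  } else {
    c(fill, rep(NA, K - length(fill)))     # shorter: extend with NAs
  }
}

# Hypothetical helper: replace flagged rows with the column means of the
# remaining (unflagged) rows. Assumes a numeric matrix without NAs.
impute_mean_sketch <- function(x, flag) {
  flag <- as.logical(flag)
  centre <- colMeans(x[!flag, , drop = FALSE])
  # rep(..., each = ) fills the selected rows column by column
  x[flag, ] <- rep(centre, each = sum(flag))
  x
}

x <- cbind(a = c(1, 2, 3, 100), b = c(10, 20, 30, -500))
flag <- c(0, 0, 0, 1)                      # the last row is flagged as an outlier

normalize_fill(c(7, 8, 9), K = 2)          # 7 8
normalize_fill(7, K = 2)                   # 7 NA
impute_mean_sketch(x, flag)                # last row becomes a = 2, b = 20
```

Column names survive because the matrix is modified in place, which mirrors the statement that names are kept in the output.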

level

passed to the function flag if the argument 'flag' is NULL or missing

nmax

passed to the function flag if the argument 'flag' is NULL or missing

side

passed to the function flag if the argument 'flag' is NULL or missing

crit

passed to the function flag if the argument 'flag' is NULL or missing

k

passed to the function flag if the argument 'flag' is NULL or missing

metric

distance metric to be used in LOF (if flag is not provided) as well as for the multidimensional median if fill is 'median'. One of 'euclidean', 'maximum', 'manhattan', 'canberra', 'minkowski', or 'binary'. Any unambiguous substring can be given, case insensitive.
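The "unambiguous substring, case insensitive" rule can be sketched with base R's pmatch. The function match_option is a hypothetical name used only for this illustration, and the package may resolve its options differently.

```r
# Hypothetical helper (not part of the package): resolve a case-insensitive,
# unambiguous prefix to one of the allowed option names.
match_option <- function(arg, choices) {
  idx <- pmatch(tolower(arg), tolower(choices))
  if (is.na(idx)) {
    stop("'", arg, "' is ambiguous or does not match any option")
  }
  choices[idx]
}

metrics <- c("euclidean", "maximum", "manhattan",
             "canberra", "minkowski", "binary")

match_option("Eu", metrics)    # "euclidean"
match_option("MAN", metrics)   # "manhattan"

# "ma" is a prefix of both "maximum" and "manhattan", so it is rejected.
ambiguous <- tryCatch(match_option("ma", metrics),
                      error = function(e) "error")
ambiguous                      # "error"
```

With the metric names listed above, two letters are enough for 'euclidean', 'canberra', 'binary' and 'minkowski', while 'maximum' and 'manhattan' need at least three.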

q

power in Minkowski metric, used if fill='median' and metric='minkowski'

...

passed to Rcgmin if the argument fill is 'median' and data is multidimensional

Details

The output object will be a vector, a matrix or a data frame, matching the type of x. Row names, column names or (if x was a named vector) names will be kept.

Value

object like x but with outliers imputed.
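A possible call, following the signature shown under Usage, might look like the sketch below. It assumes the package is installed under the name outlieR (taken from the repository name, which may differ from the installed package name); the test data and the explicit flag vector are invented for illustration. The call is guarded so the snippet still runs when the package is absent.

```r
# Illustrative data: 20 ordinary 2D points plus one obvious outlier.
set.seed(1)
x <- rbind(matrix(rnorm(40), ncol = 2), c(25, -25))
colnames(x) <- c("u", "v")

# Flag the last row explicitly, so no outlier detection is needed here.
flag <- c(rep(FALSE, 20), TRUE)

if (requireNamespace("outlieR", quietly = TRUE)) {
  # Assumption: the package name is 'outlieR'.
  imputed <- outlieR::impute(x, flag = flag, fill = "mean")
  print(tail(imputed, 3))
} else {
  message("Package 'outlieR' is not installed; skipping the call to impute().")
}
```

Leaving flag = NULL instead would let impute compute the flags itself by calling flag with the remaining arguments.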


rushkin/outlieR documentation built on Dec. 20, 2020, 6:11 a.m.