fixout_conti: fixout_conti


View source: R/fixout_conti.R

Description

This utility identifies and replaces outliers in a continuous variable. It expects the variable to be a column of a data frame and, by default, replaces the outlying values with NA. It can also draw density plots of the data before and after the fix, so that users can assess the quality of the data once the outliers have been handled. Users can either drop the rows that contain outliers or keep them after replacement; by default the rows are retained. Optionally, the outliers can be replaced with the mean or median instead.

Usage

fixout_conti(
  dt,
  col_name,
  interactive = FALSE,
  iden.m = "resistant",
  fix.m = "asNABoth",
  rangeLU = NULL,
  k = 1.5,
  exclude = NA,
  logt = FALSE,
  plot = FALSE
)

Arguments

dt

A data frame or a data.table.

col_name

A string giving the name of the column whose data need to be fixed.

interactive

TRUE/FALSE, whether to fix the outliers in an interactive way.

iden.m

A string giving the method used to identify outliers. "resistant": standard boxplot; "asymmetric": modification of the standard method to deal with (moderately) skewed data; "adjbox": adjusted boxplot for skewed distributions; "winsorize": the smallest and/or the largest values are replaced by less extreme values.

fix.m

A string giving the method used to fix outliers. "dropBoth": drop every row that contains an outlier on either side of the data; "dropLeft": drop every row that contains an outlier on the left side of the data; "dropRight": drop every row that contains an outlier on the right side of the data; "asNABoth": convert outliers on both sides of the data to missing values; "asNALeft": convert outliers on the left side of the data to missing values; "asNARight": convert outliers on the right side of the data to missing values; "median": replace outliers with the median; "mean": replace outliers with the mean.

rangeLU

The lower and upper limits of the range for the data, given as a numeric vector of length two, e.g. c(2.5, 3.5).

k

A constant that determines how far beyond the lower and upper quartiles the outlier boundaries (fences) are placed.

exclude

Values to exclude from the data before processing. Missing values are removed by default when detecting outliers.

logt

TRUE/FALSE, whether the values are log-transformed before outliers are identified.

plot

TRUE/FALSE, whether plots are shown.
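
To make the roles of k and fix.m more concrete, here is a minimal base-R sketch of a standard boxplot rule followed by two of the fixing strategies. It is an illustration under stated assumptions, not the package's implementation: flag_outliers is a hypothetical helper, the fences are assumed to be the quartiles minus or plus k times the interquartile range, and the "median" analogue assumes the median is taken over the non-outlying values.

```r
# Illustrative only: a base-R version of a standard boxplot (Tukey-style) rule.
# `flag_outliers` is a hypothetical helper and is NOT part of outlierfix.
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - k * iqr   # assumed form of the lower fence
  upper <- q[2] + k * iqr   # assumed form of the upper fence
  list(
    fences = c(lower = lower, upper = upper),
    left   = !is.na(x) & x < lower,   # outliers on the left side
    right  = !is.na(x) & x > upper    # outliers on the right side
  )
}

x <- c(mtcars$wt, 9.9)   # append one artificial extreme value
out <- flag_outliers(x, k = 1.5)

# Analogue of fix.m = "asNABoth": outliers on either side become NA.
x_na <- replace(x, out$left | out$right, NA)

# Analogue of fix.m = "median": outliers are replaced by the median of the
# remaining values (an assumption about which median is meant).
x_med <- replace(x, out$left | out$right, median(x_na, na.rm = TRUE))
```

Keeping the left and right flags separate mirrors the Left/Right/Both variants of fix.m: a one-sided fix would use only out$left or only out$right when selecting values to replace or rows to drop.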

Details

More details on the identification methods can be found in D'Orazio (2021) and Signorell et mult. al. (2021).
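
Among the identification methods, winsorizing differs from the boxplot-based rules in that extreme values are pulled in to less extreme ones rather than flagged. The base-R sketch below shows the general idea only. winsorize_sketch is a hypothetical helper, and the 5% and 95% quantile cut-offs are an assumption made for illustration; the cut-offs actually used by the package are not stated here.

```r
# Illustrative only: quantile-based winsorizing in base R.
# `winsorize_sketch` is a hypothetical helper, not outlierfix or DescTools code.
winsorize_sketch <- function(x, probs = c(0.05, 0.95)) {
  lim <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  # Clip values to the interval [lim[1], lim[2]]; missing values stay missing.
  pmin(pmax(x, lim[1]), lim[2])
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
w <- winsorize_sketch(x)

range(x)   # range before clipping
range(w)   # range after clipping: the extreme value has been pulled in
```

Unlike the drop or NA strategies, this keeps every observation and changes only the values in the tails.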

Value

A data.table containing the fixed column.

References

Andri Signorell et mult. al. (2021). DescTools: Tools for descriptive statistics. R package version 0.99.43.

Hubert, M., and Vandervieren, E. (2008) ‘An Adjusted Boxplot for Skewed Distributions’, Computational Statistics and Data Analysis, 52, pp. 5186-5201.

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Marcello D'Orazio (2021). univOutl: Detection of Univariate Outliers. R package version 0.3. https://CRAN.R-project.org/package=univOutl

McGill, R., Tukey, J. W. and Larsen, W. A. (1978) ‘Variations of box plots’. The American Statistician, 32, pp. 12-16.

Examples

library(data.table)
library(outlierfix)
data <- as.data.table(mtcars)
output_table <- fixout_conti(data, col_name = "wt", rangeLU = c(2.5, 3.5))

sssEos/outlierfix documentation built on Dec. 23, 2021, 4:32 a.m.