This utility identifies and replaces outliers in continuous variables. It assumes that the continuous variable is a column of a data.frame and replaces the outlying elements of that column with NA. It also provides density plots of the data before and after the fix, so that users can assess the quality of the data after outlier removal. Users can either drop the rows that contain outliers or keep them after replacement; by default the function retains the rows. Optionally, outliers can be replaced with the mean or the median instead.
fixout_conti(
dt,
col_name,
interactive = FALSE,
iden.m = "resistant",
fix.m = "asNABoth",
rangeLU = NULL,
k = 1.5,
exclude = NA,
logt = FALSE,
plot = FALSE
)
dt
A data frame or a data.table.
col_name
A string giving the name of the column whose data need to be fixed.
interactive
TRUE/FALSE, whether to fix the outliers interactively.
iden.m
A string giving the method used to identify outliers. "resistant": standard boxplot; "asymmetric": modification of the standard method to deal with (moderately) skewed data; "adjbox": adjusted boxplot for skewed distributions; "winsorize": the smallest and/or the largest values are replaced by less extreme values.
fix.m
A string giving the method used to fix outliers. "dropBoth": drop the whole row for outliers on either side of the data; "dropLeft": drop the whole row for outliers on the left side of the data; "dropRight": drop the whole row for outliers on the right side of the data; "asNABoth": convert outliers on either side of the data to missing values; "asNALeft": convert outliers on the left side of the data to missing values; "asNARight": convert outliers on the right side of the data to missing values; "median": replace outliers with the median; "mean": replace outliers with the mean.
rangeLU
The lower and upper limits of the range for the data.
k
A constant that determines how far the outlier bounds extend beyond the lower and upper quartiles.
exclude
Values to exclude from the numbers to be processed. Missing values are removed by default when detecting outliers.
logt
TRUE/FALSE, whether the numbers are log-transformed (for data assumed to follow a lognormal distribution).
plot
TRUE/FALSE, whether plots are shown.
More details on the identification methods can be found in Marcello D'Orazio (2021) and Andri Signorell et mult. al. (2021).
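As a rough illustration of how the constant k enters the "resistant" and "asymmetric" rules, here is a minimal base R sketch. It assumes the conventional boxplot fences (quartiles plus or minus k times the interquartile range) and the asymmetric variant as commonly described for univOutl; `fence_bounds` is a hypothetical helper, not a function of this package, and the package's own computation may differ in detail (for example in how quantiles are estimated).

```r
# Hypothetical helper (not part of the package): boxplot-style outlier bounds.
fence_bounds <- function(x, k = 1.5, method = c("resistant", "asymmetric")) {
  method <- match.arg(method)
  # Quartiles with R's default quantile type; missing values are ignored.
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  if (method == "resistant") {
    # Standard boxplot rule: extend k * IQR beyond each quartile.
    iqr <- q[3] - q[1]
    c(lower = q[1] - k * iqr, upper = q[3] + k * iqr)
  } else {
    # Assumed asymmetric rule: each side scaled by its own half of the IQR.
    c(lower = q[1] - 2 * k * (q[2] - q[1]),
      upper = q[3] + 2 * k * (q[3] - q[2]))
  }
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
b <- fence_bounds(x, k = 1.5, method = "resistant")
# Flag values outside the bounds; NAs are never flagged.
is_out <- !is.na(x) & (x < b["lower"] | x > b["upper"])
x[is_out]
```

A larger k widens the bounds and flags fewer values; the asymmetric rule gives the same bounds as the resistant rule when the median sits exactly midway between the quartiles.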
A data table containing the fixed column.
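The shape of the returned table depends on the fixing strategy: dropping removes rows, while the NA, median and mean strategies keep every row. The following base R sketch illustrates that difference on a plain data.frame. `fix_flagged` and its `how` argument are hypothetical names used only for illustration, the bounds are supplied by hand, and computing the replacement median or mean from the non-outlying values is an assumption rather than documented behavior of the package.

```r
# Hypothetical helper (not the package's implementation).
fix_flagged <- function(df, col, lower, upper,
                        how = c("asNA", "median", "mean", "drop")) {
  how <- match.arg(how)
  x <- df[[col]]
  # Flag values outside [lower, upper]; existing NAs are left alone.
  out <- !is.na(x) & (x < lower | x > upper)
  if (how == "drop") {
    return(df[!out, , drop = FALSE])  # fewer rows
  }
  # Assumption: replacement statistics use only the non-outlying values.
  fill <- switch(how,
    asNA   = NA_real_,
    median = median(x[!out], na.rm = TRUE),
    mean   = mean(x[!out], na.rm = TRUE)
  )
  df[[col]][out] <- fill  # same number of rows
  df
}

df <- data.frame(id = 1:5, v = c(10, 11, 12, 13, 500))
as_na   <- fix_flagged(df, "v", lower = 0, upper = 100, how = "asNA")
as_med  <- fix_flagged(df, "v", lower = 0, upper = 100, how = "median")
dropped <- fix_flagged(df, "v", lower = 0, upper = 100, how = "drop")
```

Here `as_na` and `as_med` keep all five rows, with the value 500 replaced, while `dropped` has four rows.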
Andri Signorell et mult. al. (2021). DescTools: Tools for descriptive statistics. R package version 0.99.43.
Hubert, M., and Vandervieren, E. (2008) ‘An Adjusted Boxplot for Skewed Distributions’, Computational Statistics and Data Analysis, 52, pp. 5186-5201.
H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
Marcello D'Orazio (2021). univOutl: Detection of Univariate Outliers. R package version 0.3. https://CRAN.R-project.org/package=univOutl
McGill, R., Tukey, J. W. and Larsen, W. A. (1978) ‘Variations of box plots’. The American Statistician, 32, pp. 12-16.
library(data.table)  # provides as.data.table()
data <- as.data.table(mtcars)
output_table <- fixout_conti(data, col_name = "wt", rangeLU = c(2.5, 3.5))