This function fills missing values: numeric variables are filled with the median and categorical variables (factors) with the mode. Additionally, outliers in numeric variables are replaced according to the IQR rule, and rare factor levels are merged into an 'Other' level.
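The three steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual implementation: the helper names (`fill_missing`, `clip_outliers`, `merge_rare`, `clean_sketch`) are hypothetical, the lower fence Q1 - 1.5*IQR and the strict `<` comparison against `prop` are assumptions, and the `num_to_fac_amount` conversion is left out.

```r
# Hypothetical helpers illustrating the described steps;
# NOT the source of hugo_clean_data().

# Step 1: median for numeric columns, mode for factors.
fill_missing <- function(x) {
  if (is.numeric(x)) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
  } else if (is.factor(x)) {
    counts <- table(x)  # table() ignores NA by default
    x[is.na(x)] <- names(counts)[which.max(counts)]
  }
  x
}

# Step 2: pull values outside the IQR fences back onto the fences.
# Assumption: the lower fence is treated symmetrically to the upper one.
clip_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  pmin(pmax(x, q[1] - fence), q[2] + fence)
}

# Step 3: merge levels whose share is below `prop` into "Other".
# Assumption: the comparison is strict (share < prop).
merge_rare <- function(x, prop = 0.01) {
  share <- prop.table(table(x))
  rare <- names(share)[share < prop]
  levels(x)[levels(x) %in% rare] <- "Other"
  x
}

# Apply the steps column by column.
clean_sketch <- function(data, prop = 0.01) {
  data[] <- lapply(data, function(col) {
    col <- fill_missing(col)
    if (is.numeric(col)) {
      clip_outliers(col)
    } else if (is.factor(col)) {
      merge_rare(col, prop)
    } else {
      col
    }
  })
  data
}

cleaned <- clean_sketch(airquality)
print(cleaned[c(9, 5), ])

# A small factor with one missing value and one rare level.
f <- factor(c(rep("a", 60), rep("b", 39), "c", NA))
print(levels(merge_rare(fill_missing(f))))
```

Rows 9 and 5 of `airquality` are the two rows discussed in the Examples, so the printed rows can be compared with the output shown there.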
hugo_clean_data(data, prop = 0.01, num_to_fac_amount = 5)
data |
prop | proportion of occurrence of a level in a categorical variable, used to decide which levels are rare
num_to_fac_amount | numeric columns with less than
A data.frame that has been cleaned.
Eliza Kaczorek
## Not run:
# Dataset in base R: airquality
# There are 44 missing values
sum(is.na(airquality))
hugo_clean_data(airquality)
# The data was cleaned.
# Two original rows from data:
#   Ozone Solar.R  Wind Temp Month Day
#       8      19  20.1   61     5   9
#      NA      NA  14.3   56     5   5
# After cleaning:
#   Ozone Solar.R  Wind Temp Month Day
#       8      19 17.65   61     5   9
#    31.5     205 14.30   56     5   5
# We can see that the outlier in 'Wind' was
# replaced by the value Q3 + 1.5 * IQR for this column.
# Missing values were replaced with medians.
## End(Not run)