Performs a number of cleaning operations on a dataframe, detects numerical and categorical columns and returns a list containing the cleaned dataframe and vectors naming the columns with a specific data type.
f_clean_data(data, max_number_of_levels_factors = 10,
  min_number_of_levels_nums = 6, exclude_missing = T,
  replace_neg_values_with_zero = T, allow_neg_values = c("null"),
  id_cols = c("null"))
data | a dataframe
max_number_of_levels_factors | If a factor variable contains more than the maximum number of levels, the levels with the lowest frequency are collapsed into 'others'. Default: 10
min_number_of_levels_nums | If a numeric variable contains fewer than the minimum number of distinct values, it is converted to a factor. Default: 6
exclude_missing | exclude observations with missing values. Default: T
replace_neg_values_with_zero | set all negative values to 0. Default: T
allow_neg_values | columns for which negative values are allowed. Default: c("null")
id_cols | columns containing ids. Default: c("null")
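The rules described by these arguments can be sketched in base R. This is not the source of f_clean_data; the helper names (collapse_rare_levels, numeric_to_factor, clip_negatives) are hypothetical, and details such as whether 'others' counts toward the maximum number of levels are assumptions.

```r
# Illustrative sketch only; NOT the implementation of f_clean_data().
# All helper names below are hypothetical.

# Keep the most frequent levels and pool the rest into "others".
# Assumption: "others" counts toward max_levels.
collapse_rare_levels <- function(x, max_levels = 10) {
  x <- as.factor(x)
  if (nlevels(x) <= max_levels) return(x)
  freq <- sort(table(x), decreasing = TRUE)
  keep <- names(freq)[seq_len(max_levels - 1)]
  out <- as.character(x)
  out[!is.na(out) & !(out %in% keep)] <- "others"
  factor(out, levels = c(keep, "others"))
}

# Convert a numeric column with few distinct values to a factor.
numeric_to_factor <- function(x, min_distinct = 6) {
  if (is.numeric(x) && length(unique(stats::na.omit(x))) < min_distinct) {
    return(as.factor(x))
  }
  x
}

# Set negative values to zero, except in columns listed in `allow`.
clip_negatives <- function(df, allow = character(0)) {
  for (col in setdiff(names(df), allow)) {
    if (is.numeric(df[[col]])) {
      df[[col]] <- pmax(df[[col]], 0)
    }
  }
  df
}

# Apply the steps to mtcars.
cleaned <- clip_negatives(mtcars)
cleaned[] <- lapply(cleaned, numeric_to_factor)
cleaned <- stats::na.omit(cleaned)  # analogue of exclude_missing = T

pooled <- collapse_rare_levels(c(rep("a", 5), rep("b", 3), "c", "d"),
                               max_levels = 3)
```

With the defaults above, columns of mtcars with fewer than six distinct values, such as cyl, become factors, while carb, which has exactly six, stays numeric.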
The list this function returns can be tedious to work with: if you engineer a new feature afterwards, you have to update the categoricals or numericals vector by hand. I suggest doing all feature engineering before applying this function. The advantage of this approach is that in the modelling and visualisation steps you have full control over which columns go into a formula or a given type of plot, even if the dataframe has been bloated with junk columns.
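A minimal sketch of that workflow follows. To keep it runnable without the package, data_ls is a hand-built stand-in for the returned list (a plain data.frame rather than a tibble, with only some of the documented elements); the column choices and the hp_per_wt feature are assumptions made for illustration.

```r
# Hand-built stand-in for the list returned by f_clean_data();
# element names follow the documented return value.
data_ls <- list(
  data         = transform(mtcars, cyl = factor(cyl), am = factor(am)),
  categoricals = c("cyl", "am"),
  numericals   = c("mpg", "hp", "wt")
)

# Engineering a feature after cleaning: the new column must be
# registered by hand in the matching vector.
data_ls$data$hp_per_wt <- data_ls$data$hp / data_ls$data$wt
data_ls$numericals <- c(data_ls$numericals, "hp_per_wt")

# Build a model formula from the tracked columns only, so any
# untracked junk columns in the dataframe are ignored.
response   <- "mpg"
predictors <- setdiff(c(data_ls$numericals, data_ls$categoricals), response)
form <- stats::reformulate(predictors, response = response)
fit  <- stats::lm(form, data = data_ls$data)
```

Because the formula is assembled from the name vectors rather than from every column in the dataframe, adding or dropping a predictor only requires editing those vectors.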
Returns a list with the following elements:
data | the cleaned dataframe as a tibble
categoricals | vector of column names containing categorical data
categoricals_ordered | vector of column names containing ordered categorical data
numericals | vector of column names containing numerical data
ids | vector of column names containing ids
data_ls = f_clean_data(mtcars, id_cols = 'names')
str(data_ls)