f_clean_data: f_clean_data


View source: R/f_clean.R

Description

Performs a number of cleaning operations on a dataframe, detects numerical and categorical columns and returns a list containing the cleaned dataframe and vectors naming the columns with a specific data type.

Usage

f_clean_data(data, max_number_of_levels_factors = 10,
  min_number_of_levels_nums = 6, exclude_missing = T,
  replace_neg_values_with_zero = T, allow_neg_values = c("null"),
  id_cols = c("null"))

Arguments

data

a dataframe

max_number_of_levels_factors

If a factor variable contains more than the maximum number of levels, the levels with the lowest frequency will be collapsed into 'others', Default: 10

min_number_of_levels_nums

If a numeric column contains fewer than the minimum number of distinct values, it will be converted to a factor, Default: 6

exclude_missing

exclude observations with missing values, Default: T

replace_neg_values_with_zero

if TRUE, all negative values will be set to 0, except in columns listed in allow_neg_values, Default: T

allow_neg_values

specify columns for which negative values are allowed, Default: c("null")

id_cols

specify columns containing ids, Default: c("null")
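
The rules described by these arguments can be sketched in base R. This is only an illustration of the documented behaviour, not the package's implementation: the helper names (collapse_rare_levels, num_to_factor, clamp_negatives) are hypothetical, and whether 'others' counts towards the maximum number of levels is an assumption.

```r
# Hypothetical helpers illustrating the documented rules; not oetteR's code.

# Collapse the least frequent levels into 'others' when a factor has
# more than max_levels levels.
collapse_rare_levels <- function(x, max_levels = 10) {
  x <- as.factor(x)
  if (nlevels(x) <= max_levels) return(x)
  # Assumption: 'others' counts towards max_levels, so only the
  # (max_levels - 1) most frequent levels are kept as they are.
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)[seq_len(max_levels - 1)]
  x_chr <- as.character(x)
  x_chr[!is.na(x_chr) & !(x_chr %in% keep)] <- "others"
  factor(x_chr, levels = c(keep, "others"))
}

# Convert a numeric vector with fewer than min_levels distinct values
# to a factor; leave everything else untouched.
num_to_factor <- function(x, min_levels = 6) {
  if (is.numeric(x) && length(unique(x[!is.na(x)])) < min_levels) {
    return(as.factor(x))
  }
  x
}

# Set negative values to zero, keeping missing values as they are.
clamp_negatives <- function(x) {
  if (is.numeric(x)) x[!is.na(x) & x < 0] <- 0
  x
}

cyl <- num_to_factor(mtcars$cyl)   # 3 distinct values, becomes a factor
mpg <- num_to_factor(mtcars$mpg)   # many distinct values, stays numeric
carb <- collapse_rare_levels(mtcars$carb, max_levels = 3)
clamped <- clamp_negatives(c(-1, 0, 2.5, NA))
```

A column named in allow_neg_values would simply skip the last helper.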

Details

The list this function returns can be a bit tedious to work with. If you engineer a new feature after cleaning, you have to update the categoricals or the numericals vector manually. I suggest that you do all the feature engineering before applying this function. The advantage of this structure is that when you get to the modelling or visualisation steps you have full control over which columns are used for the formula or for the type of visualisation, even if you have bloated your dataframe with some junk columns.
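
As a minimal sketch of the manual update mentioned above, assume a list shaped like the documented return value. The list is built by hand here so that the example runs without the package, and the feature name hp_per_wt is hypothetical.

```r
# A hand-built list shaped like the documented return value (an assumption;
# in practice it would come from f_clean_data()).
data_ls <- list(
  data = mtcars[, c("mpg", "hp", "wt")],
  categoricals = character(0),
  categoricals_ordered = character(0),
  numericals = c("mpg", "hp", "wt"),
  ids = character(0)
)

# Engineer a new numeric feature after cleaning ...
data_ls$data$hp_per_wt <- data_ls$data$hp / data_ls$data$wt

# ... and register it manually, otherwise downstream code that relies on
# the numericals vector will not see the new column.
data_ls$numericals <- union(data_ls$numericals, "hp_per_wt")
```

Doing the feature engineering before cleaning avoids this bookkeeping.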

Value

returns a list with the following elements:

data

the cleaned dataframe as tibble

categoricals

vector of column names containing categorical data

categoricals_ordered

vector of column names containing all ordered categorical data

numericals

vector of column names containing numerical data

ids

vector of column names containing ids
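
The name vectors listed above can drive a model formula directly, so that junk columns in the dataframe are ignored. The following is a sketch under assumptions: the list is again built by hand rather than returned by f_clean_data, and the junk column is hypothetical.

```r
# Hand-built stand-in for the returned list (an assumption), including a
# junk column that is not registered in any of the name vectors.
data_ls <- list(
  data = transform(mtcars, cyl = factor(cyl), junk = seq_len(nrow(mtcars))),
  categoricals = "cyl",
  numericals = c("mpg", "hp", "wt")
)

# Build the formula only from the registered columns.
response <- "mpg"
predictors <- setdiff(c(data_ls$numericals, data_ls$categoricals), response)
model_formula <- reformulate(predictors, response = response)

# The junk column never enters the model.
fit <- lm(model_formula, data = data_ls$data)
```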

See Also

f_boxcox

Examples

 data_ls = f_clean_data( mtcars , id_cols = 'names')
 str(data_ls)

erblast/oetteR documentation built on Aug. 4, 2018, 11:03 p.m.