Converts all numerical variables into factor or character variables, depending on the 'stringsAsFactors' parameter,
using an equal-frequency criterion. The thresholds for each segment of each variable come from the
discretize_get_bins function, which returns a data frame
containing the thresholds for each variable. That result must be passed as the 'data_bins' parameter.
Note that the returned data frame contains the non-transformed variables plus the transformed ones.
More info about converting numerical into categorical variables
can be found at: https://livebook.datascienceheroes.com/data-preparation.html#data_types
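To make the equal-frequency idea concrete, here is a minimal base R sketch. It is not the package's implementation: the toy vector `x`, the use of `quantile()` and `cut()`, and the open-ended outer limits are all assumptions chosen for illustration.

```r
# Toy numeric vector (hypothetical data, not taken from heart_disease)
x <- c(1, 3, 5, 7, 9, 11, 13, 15, 17, 19)
n_bins <- 5

# Equal-frequency thresholds: quantiles at evenly spaced probabilities
cuts <- quantile(x, probs = seq(0, 1, length.out = n_bins + 1), names = FALSE)

# Assumption: open the outer limits so values outside the observed
# range (for example in new data) still fall into a segment
cuts[1] <- -Inf
cuts[length(cuts)] <- Inf

# Ordered factor result, analogous in spirit to stringsAsFactors = TRUE
x_factor <- cut(x, breaks = cuts, right = FALSE, ordered_result = TRUE)

# Plain character result, analogous in spirit to stringsAsFactors = FALSE
x_char <- as.character(x_factor)

# Count of observations per segment
table(x_factor)
```

With this toy vector each of the five segments holds two values, which is what "equal frequency" means: the cut points are chosen so the segments contain roughly the same number of rows, rather than spanning equal widths.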
data: Input data frame.
data_bins: Data frame generated by the 'discretize_get_bins' function. It contains the variable names and the thresholds for each bin, or segment.
stringsAsFactors: Boolean value indicating whether the discretization result is a factor (TRUE) or a character (FALSE). When TRUE, the segments are ordered. TRUE by default.
Data frame with the transformed variables
## Not run: 
# Get the bin thresholds for each variable. If input is missing, it runs for all numerical variables.
d_bins=discretize_get_bins(data=heart_disease,
  input=c("resting_blood_pressure", "oldpeak"),
  n_bins=5)

# Now they can be applied to the same data frame, or to a new one
# (for example, in a predictive model whose data changes over time)
heart_disease_discretized=discretize_df(data=heart_disease,
  data_bins=d_bins,
  stringsAsFactors=TRUE)

## End(Not run)