discretize_df: Discretize a data frame

View source: R/discretize.R

discretize_df    R Documentation

Discretize a data frame

Description

Converts all numerical variables into factor or character variables, depending on the 'stringsAsFactors' parameter, using an equal-frequency criterion. The thresholds for each segment of each variable come from the output of the discretize_get_bins function, which returns a data frame containing the thresholds for each variable. That result must be passed as the 'data_bins' parameter. Note that the returned data frame contains the non-transformed variables plus the transformed ones. More information about converting numerical variables into categorical ones can be found at: https://livebook.datascienceheroes.com/data-preparation.html#data_types
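The two-step idea described above (derive thresholds once, then apply them to any data frame) can be sketched in base R. This is only an illustration of equal-frequency binning, not funModeling's actual implementation: the data frames `train` and `new_data`, the column `x`, and the choice of open-ended outer limits (-Inf, Inf) are all assumptions made for the example.

```r
# Base-R sketch of equal-frequency discretization (illustrative only,
# not funModeling's code). 'train', 'new_data' and 'x' are made-up names.
set.seed(42)
train <- data.frame(x = rnorm(100))
new_data <- data.frame(x = rnorm(20))

n_bins <- 5

# Step 1: derive the thresholds from the training data only.
# The interior quantiles split the values into n_bins groups of similar size.
probs <- seq(0, 1, length.out = n_bins + 1)
inner_cuts <- quantile(train$x, probs = probs[-c(1, n_bins + 1)], names = FALSE)

# Assumption of this sketch: open-ended outer limits, so values outside
# the training range still fall into the first or last bin.
breaks <- c(-Inf, inner_cuts, Inf)

# Step 2: apply the same thresholds to any data frame.
train$x_bin <- cut(train$x, breaks = breaks)
new_data$x_bin <- cut(new_data$x, breaks = breaks)

# Bin counts on the training data should be roughly equal.
table(train$x_bin)
```

Keeping the thresholds separate from the data is what makes it possible to discretize later data with exactly the same segments as the data used to compute them.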

Usage

discretize_df(data, data_bins, stringsAsFactors = TRUE)

Arguments

data

Input data frame

data_bins

Data frame generated by the 'discretize_get_bins' function. It contains each variable name and the thresholds for each bin, or segment.

stringsAsFactors

Boolean value that indicates whether the discretization result is returned as factor or character. When TRUE, the result is a factor and the segments are ordered. TRUE by default.
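The practical difference between a factor and a character result can be shown with base R's cut(). This is a small sketch with made-up values and thresholds, not output from discretize_df itself: a factor keeps its levels in bin order, while the character version carries only the labels.

```r
# Illustrative values and thresholds (hypothetical, chosen for the example).
x <- c(3, 15, 27, 8, 21)
breaks <- c(-Inf, 10, 20, Inf)

# Factor result: the levels are stored in the numeric order of the bins.
as_factor <- cut(x, breaks = breaks)

# Character result: the same labels, but without any level information.
as_char <- as.character(as_factor)

levels(as_factor)      # bin labels, from lowest to highest segment
as.integer(as_factor)  # position of each value's bin within that order
class(as_char)         # "character"
```

The factor form is usually more convenient for plots and tables, because the segments appear from lowest to highest rather than in whatever order the labels happen to sort as text.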

Value

Data frame containing the non-transformed variables plus the transformed ones.

Examples


library(funModeling)

# Get the bin thresholds for each variable. If 'input' is missing,
# it runs for all numerical variables.
d_bins <- discretize_get_bins(data = heart_disease,
  input = c("resting_blood_pressure", "oldpeak"), n_bins = 5)

# Now the thresholds can be applied to the same data frame,
# or to a new one (for example, in a predictive model whose
# data changes over time).
heart_disease_discretized <-
  discretize_df(data = heart_disease,
                data_bins = d_bins,
                stringsAsFactors = TRUE)



funModeling documentation built on May 29, 2024, 3:24 a.m.