dtize_df (R Documentation)
Discretizes the numeric columns of a dataframe according to a chosen splitting criterion and handles missing values with a chosen imputation method.
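As a rough illustration of the splitting idea, the sketch below cuts every numeric column of a dataframe into two bins at its median (or mean) using base R's `cut()`. The helper `split_at_center` is hypothetical and is not the package's implementation; it assumes the split point is computed with missing values ignored and that non-numeric columns are left untouched.

```r
# Hypothetical helper: a minimal sketch of median/mean discretization,
# NOT the actual dtize_df implementation.
split_at_center <- function(df, cutoff = c("median", "mean"),
                            labels = c("low", "high")) {
  cutoff <- match.arg(cutoff)
  center <- switch(cutoff, median = stats::median, mean = mean)
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], function(x) {
    # Split point computed with NAs removed; NA inputs stay NA after cut().
    split_point <- center(x, na.rm = TRUE)
    cut(x, breaks = c(-Inf, split_point, Inf), labels = labels,
        right = TRUE, include.lowest = TRUE)
  })
  df
}

toy <- data.frame(x = c(1, 2, 3, 4, 10), g = letters[1:5])
split_at_center(toy)$x                    # median is 3: low low low high high
split_at_center(toy, cutoff = "mean")$x   # mean is 4: low low low low high
```

Because the outer breaks are `-Inf` and `Inf`, every non-missing value falls into one of the two bins regardless of the data range.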
dtize_df(
data,
cutoff = "median",
labels = c("low", "high"),
include_right = TRUE,
infinity = TRUE,
include_lowest = TRUE,
na_fill = "none",
m = 5,
maxit = 5,
seed = NULL,
printFlag = FALSE
)
data
A dataframe containing the data to be discretized.
cutoff
A character string specifying the splitting method for numeric columns. Default is "median"; the examples also use "mean".
labels
A character vector of labels for the discretized categories. Default is c("low", "high").
include_right
A logical value indicating whether the intervals should be closed on the right. Default is TRUE.
infinity
A logical value indicating whether the split intervals should extend to infinity. Default is TRUE.
include_lowest
A logical value indicating whether the lowest value should be included in the first interval. Default is TRUE.
na_fill
A character string specifying the imputation method for handling missing values. Default is "none"; the examples also use "pmm" (predictive mean matching).
m
An integer specifying the number of multiple imputations when missing values are imputed. Default is 5.
maxit
An integer specifying the maximum number of iterations for the imputation algorithm. Default is 5.
seed
An integer seed for reproducibility of the imputation process. Default is NULL.
printFlag
A logical value indicating whether progress of the imputation should be printed to the console. Default is FALSE.
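The interval arguments appear to mirror the `right` and `include.lowest` arguments of base R's `cut()`. That correspondence is an assumption, as is reading `infinity` as the choice between open-ended outer breaks (`-Inf`, `Inf`) and outer breaks at the observed minimum and maximum. The sketch below uses `cut()` alone to show how these choices decide which bin a boundary value lands in.

```r
x <- c(1, 3, 5)
split_point <- 3

# Right-closed intervals (-Inf, 3] and (3, Inf]: the value 3 falls in the first bin.
cut(x, breaks = c(-Inf, split_point, Inf), right = TRUE)

# Left-closed intervals [-Inf, 3) and [3, Inf): the value 3 moves to the second bin.
cut(x, breaks = c(-Inf, split_point, Inf), right = FALSE)

# Finite outer breaks at min/max: the minimum (1) is not inside (1, 3],
# so without include.lowest it becomes NA.
cut(x, breaks = c(min(x), split_point, max(x)), right = TRUE)

# include.lowest = TRUE closes the first interval as [1, 3], so 1 is kept.
cut(x, breaks = c(min(x), split_point, max(x)), right = TRUE,
    include.lowest = TRUE)
```

With open-ended outer breaks the lowest value can never be dropped, which is presumably why the defaults pair `infinity = TRUE` with `include_lowest = TRUE` as a safe combination.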
A dataframe with numeric columns discretized and missing values handled based on the specified imputation method.
data(BrookTrout)
# Example with median as cutoff
med_df <- dtize_df(
BrookTrout,
cutoff="median",
labels=c("below median", "above median")
)
# Example with mean as cutoff
mean_df <- dtize_df(
BrookTrout,
cutoff="mean",
include_right=FALSE
)
# Example with missing value imputation
air <- dtize_df(
airquality,
cutoff="mean",
na_fill="pmm",
m=10,
maxit=10,
seed=42
)
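Before choosing na_fill, it can help to see which columns actually contain missing values. The base R snippet below counts them in the built-in airquality dataset used in the last example; it does not call dtize_df, and treating the NA columns as the ones that would be imputed when na_fill is not "none" is an assumption.

```r
# Count missing values per column of the built-in airquality data
# (datasets package, attached by default).
na_counts <- colSums(is.na(airquality))
na_counts  # Ozone: 37, Solar.R: 7, all other columns: 0

# Columns that presumably need imputation when na_fill is not "none"
names(na_counts)[na_counts > 0]
```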