dtize_df: Discretize Dataframe Columns


dtize_df R Documentation

Discretize Dataframe Columns

Description

Discretizes the numeric columns of a dataframe at a chosen cutoff (median, mean, or custom split points) and optionally imputes missing values by mean, median, or predictive mean matching.

Usage

dtize_df(
  data,
  cutoff = "median",
  labels = c("low", "high"),
  include_right = TRUE,
  infinity = TRUE,
  include_lowest = TRUE,
  na_fill = "none",
  m = 5,
  maxit = 5,
  seed = NULL,
  printFlag = FALSE
)

Arguments

data

A dataframe containing the data to be discretized.

cutoff

The splitting method for numeric columns. Either a character string, "median" (default) or "mean", or a custom numeric vector of split points.

labels

A character vector of labels for the discretized categories. Default is c("low", "high").

include_right

A logical value indicating if the intervals should be closed on the right. Default is TRUE.

infinity

A logical value indicating if the split intervals should extend to infinity. Default is TRUE.

include_lowest

A logical value indicating if the lowest value should be included in the first interval. Default is TRUE.

na_fill

A character string specifying the imputation method for handling missing values. Options are "none" (default), "mean", "median", or "pmm" (predictive mean matching).

m

An integer specifying the number of imputed datasets to generate when na_fill = "pmm". Default is 5.

maxit

An integer specifying the maximum number of iterations for the mice algorithm (relevant when na_fill = "pmm"). Default is 5.

seed

An integer seed for reproducibility of the imputation process. Default is NULL.

printFlag

A logical value indicating if mice should print logs during imputation. Default is FALSE.
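How cutoff, labels, include_right, infinity and include_lowest interact can be pictured with base R's cut(). The sketch below is an illustration only, not the RulesTools implementation: split_at is a hypothetical helper, and it assumes the outer breaks are -Inf and Inf (the behaviour suggested by infinity = TRUE).

```r
# Hypothetical helper, not part of RulesTools: split one numeric vector
# at a single cutoff using base R's cut().
split_at <- function(x, cutoff = stats::median(x, na.rm = TRUE),
                     labels = c("low", "high"),
                     right = TRUE, include_lowest = TRUE) {
  # Assumption: infinite outer breaks, so every finite value falls in a bin
  breaks <- c(-Inf, cutoff, Inf)
  cut(x, breaks = breaks, labels = labels,
      right = right, include.lowest = include_lowest)
}

x <- c(1, 2, 3, 4, 5)

# The median is 3; with right-closed intervals, 3 belongs to (-Inf, 3]
as.character(split_at(x))
# "low" "low" "low" "high" "high"

# With left-closed intervals, 3 belongs to [3, Inf)
as.character(split_at(x, right = FALSE))
# "low" "low" "high" "high" "high"
```

The only value whose label changes between the two calls is the one sitting exactly on the cutoff, which is the case include_right is meant to settle.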

Value

A dataframe with numeric columns discretized and missing values handled based on the specified imputation method.
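To make "missing values handled" concrete for the simpler na_fill options, here is a minimal sketch of mean and median filling on a single vector. fill_na is a hypothetical helper written for this illustration; how dtize_df performs the fill internally is not stated in this page.

```r
# Hypothetical helper, not part of RulesTools: replace NAs in a numeric
# vector with the mean or median of the observed values.
fill_na <- function(x, method = c("none", "mean", "median")) {
  method <- match.arg(method)
  if (method == "none") return(x)  # leave missing values untouched
  fill <- if (method == "mean") {
    mean(x, na.rm = TRUE)
  } else {
    stats::median(x, na.rm = TRUE)
  }
  x[is.na(x)] <- fill
  x
}

x <- c(1, NA, 3, 8)
fill_na(x, "mean")    # NA replaced by 4, the mean of 1, 3 and 8
fill_na(x, "median")  # NA replaced by 3, the median of 1, 3 and 8
fill_na(x, "none")    # NA kept as is
```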

Examples

library(RulesTools)
data(BrookTrout)

# Example with median as cutoff
med_df <- dtize_df(
  BrookTrout,
  cutoff="median",
  labels=c("below median", "above median")
)

# Example with mean as cutoff
mean_df <- dtize_df(
  BrookTrout,
  cutoff="mean",
  include_right=FALSE
)

# Example with missing value imputation
air <- dtize_df(
  airquality,
  cutoff="mean",
  na_fill="pmm",
  m=10,
  maxit=10,
  seed=42
)
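For readers who want to see what the "pmm" settings in the last example correspond to, the sketch below calls the mice package directly with the same m, maxit and seed. It assumes mice is installed, and it is not taken from the RulesTools source: in particular, which of the m completed datasets dtize_df goes on to discretize is not documented here, so taking the first one is an assumption made for illustration.

```r
# Assumption: the mice package is installed; this is not RulesTools code.
if (requireNamespace("mice", quietly = TRUE)) {
  # Predictive mean matching on the built-in airquality data
  imp <- mice::mice(airquality, m = 10, maxit = 10, method = "pmm",
                    seed = 42, printFlag = FALSE)

  # Extract the first of the m completed datasets (an arbitrary choice)
  filled <- mice::complete(imp, 1)

  # Check whether any missing values remain after imputation
  anyNA(filled)
}
```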



RulesTools documentation built on April 3, 2025, 5:53 p.m.