dtize_df: Discretize Dataframe Columns
In RulesTools: Preparing, Analyzing, and Visualizing Association Rules

View source: R/dtize_df.R


Description

Discretizes the numeric columns of a dataframe at a chosen split point (median, mean, or custom cutoffs) and optionally fills missing values using a specified imputation method.

Usage

dtize_df(
  data,
  cutoff = "median",
  labels = c("low", "high"),
  include_right = TRUE,
  infinity = TRUE,
  include_lowest = TRUE,
  na_fill = "none",
  m = 5,
  maxit = 5,
  seed = NULL,
  printFlag = FALSE
)

Arguments

`data`	A dataframe containing the data to be discretized.
`cutoff`	The splitting method for numeric columns: either a character string, `"median"` (default) or `"mean"`, or a custom numeric vector of split points.
`labels`	A character vector of labels for the discretized categories. Default is `c("low", "high")`.
`include_right`	A logical value indicating if the intervals should be closed on the right. Default is `TRUE`.
`infinity`	A logical value indicating if the split intervals should extend to infinity. Default is `TRUE`.
`include_lowest`	A logical value indicating if the lowest value should be included in the first interval. Default is `TRUE`.
`na_fill`	A character string specifying the imputation method for handling missing values. Options are `"none"` (default), `"mean"`, `"median"`, or `"pmm"` (predictive mean matching).
`m`	An integer specifying the number of imputed datasets to generate when `na_fill = "pmm"`. Default is `5`.
`maxit`	An integer specifying the maximum number of iterations for the `mice` algorithm when `na_fill = "pmm"`. Default is `5`.
`seed`	An integer seed for reproducibility of the imputation process. Default is `NULL`.
`printFlag`	A logical value indicating if `mice` should print logs during imputation. Default is `FALSE`.
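
The interval arguments above have close counterparts in base R's `cut()`. The following is a minimal sketch of a median split using `cut()` directly. It is an illustration under the assumption that `include_right`, `infinity`, and `include_lowest` behave like `right`, infinite outer breaks, and `include.lowest`. It is not taken from the RulesTools source, and the toy vector `x` is made up for the example.

```r
# Illustration only: base R cut(), not RulesTools internals.
x <- c(2, 5, 7, 9, 12)
split_point <- median(x)  # 7

# Outer breaks at -Inf/Inf, analogous to infinity = TRUE
breaks_inf <- c(-Inf, split_point, Inf)

# Right-closed intervals: (-Inf, 7], (7, Inf]  -> the value 7 falls in "low"
right_closed <- cut(x, breaks = breaks_inf, labels = c("low", "high"), right = TRUE)

# Left-closed intervals: [-Inf, 7), [7, Inf)   -> the value 7 falls in "high"
left_closed <- cut(x, breaks = breaks_inf, labels = c("low", "high"), right = FALSE)

# Finite outer breaks: the minimum is dropped (NA) unless include.lowest = TRUE
breaks_fin <- c(min(x), split_point, max(x))
lowest_excluded <- cut(x, breaks = breaks_fin, labels = c("low", "high"),
                       right = TRUE, include.lowest = FALSE)
lowest_included <- cut(x, breaks = breaks_fin, labels = c("low", "high"),
                       right = TRUE, include.lowest = TRUE)

print(data.frame(x, right_closed, left_closed, lowest_excluded, lowest_included))
```

The only value whose label changes between the right-closed and left-closed versions is the one equal to the split point, so `include_right` matters mainly when the data contain ties at the cutoff.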

Value

A dataframe with numeric columns discretized and missing values handled based on the specified imputation method.
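
To make the imputation options concrete, the sketch below shows what mean or median filling and predictive mean matching could look like outside of `dtize_df`. The helper `fill_na_simple` is hypothetical and not part of RulesTools. The `mice::mice()` and `mice::complete()` calls are the standard `mice` interface, but how `dtize_df` combines or selects among the `m` imputed datasets is not stated here, so taking the first completed dataset is an assumption.

```r
# Hypothetical helper (not part of RulesTools): replace NAs in each numeric
# column with that column's mean or median.
fill_na_simple <- function(df, method = c("mean", "median")) {
  method <- match.arg(method)
  stat <- if (method == "mean") mean else median
  for (col in names(df)) {
    if (is.numeric(df[[col]]) && anyNA(df[[col]])) {
      df[[col]][is.na(df[[col]])] <- stat(df[[col]], na.rm = TRUE)
    }
  }
  df
}

# airquality ships with base R and has missing values in Ozone and Solar.R
filled <- fill_na_simple(airquality, "median")
print(colSums(is.na(filled)))

# Predictive mean matching via mice, run only if the package is installed
if (requireNamespace("mice", quietly = TRUE)) {
  imp <- mice::mice(airquality, method = "pmm", m = 5, maxit = 5,
                    seed = 42, printFlag = FALSE)
  # Assumption: use the first of the m completed datasets
  completed <- mice::complete(imp, 1)
  print(colSums(is.na(completed)))
}
```

Setting `seed` makes the `mice` run repeatable, which is the same role the `seed` argument plays in `dtize_df`.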

Examples

library(RulesTools)
data(BrookTrout)

# Example with median as cutoff
med_df <- dtize_df(
  BrookTrout,
  cutoff="median",
  labels=c("below median", "above median")
)

# Example with mean as cutoff
mean_df <- dtize_df(
  BrookTrout,
  cutoff="mean",
  include_right=FALSE
)

# Example with missing value imputation
air <- dtize_df(
  airquality,
  cutoff="mean",
  na_fill="pmm",
  m=10,
  maxit=10,
  seed=42
)

RulesTools documentation built on April 3, 2025, 5:53 p.m.