dtize_col: Discretize a Numeric Column

dtize_col R Documentation

Discretize a Numeric Column

Description

Discretizes a numeric vector into labeled categories using one or more cutoff points. Cutoffs can be given as numbers or as the string "mean" or "median", in which case the cutoff is computed from the data. The function can also impute missing values and extend the outermost intervals to -Inf and Inf.

Usage

dtize_col(
  column,
  cutoff = "median",
  labels = c("low", "high"),
  include_right = TRUE,
  infinity = TRUE,
  include_lowest = TRUE,
  na_fill = "none"
)

Arguments

column

A numeric vector to discretize.

cutoff

A numeric vector of cutoff points, or one of the strings "mean" or "median" to use that statistic of column as the cutoff (default "median").

labels

A character vector of labels for the resulting categories, one per interval (default c("low", "high")).

include_right

Logical. If TRUE, intervals are closed on the right (default TRUE).

infinity

Logical. If TRUE, extends cutoffs to -Inf and Inf (default TRUE).

include_lowest

Logical. If TRUE, the lowest interval is closed on the left (default TRUE).

na_fill

A string specifying the method to impute missing values: "none", "mean", or "median" (default "none").
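
The interval arguments above resemble the right and include.lowest arguments of base R's cut(). The sketch below uses only base R to show how a cutoff, infinite bounds, and interval closure interact. It assumes, without confirmation from the source, that dtize_col behaves like cut() in this respect; the vector x is made-up illustration data.

```r
# Base R sketch only; dtize_col's internals are assumed, not confirmed.
x <- c(2.1, 5.0, 7.4, 5.0, 9.8)

# A string cutoff such as "median" resolves to a single number.
cut_point <- median(x)

# Padding the cutoff with -Inf and Inf (the assumed effect of
# infinity = TRUE) lets two labels cover the whole number line.
breaks <- c(-Inf, cut_point, Inf)

# right = TRUE closes intervals on the right: values equal to the
# cutoff fall into the lower category.
right_closed <- cut(x, breaks = breaks, labels = c("low", "high"),
                    right = TRUE, include.lowest = TRUE)

# right = FALSE closes intervals on the left: values equal to the
# cutoff fall into the upper category.
left_closed <- cut(x, breaks = breaks, labels = c("low", "high"),
                   right = FALSE, include.lowest = TRUE)

print(right_closed)
print(left_closed)
```

The only values whose category changes between the two calls are those exactly equal to the cutoff, which is why include_right matters most for data with many ties at the cutoff.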

Value

A factor with the same length as column, where each value is categorized based on the cutoffs.

Examples

library(RulesTools)
data(BrookTrout)

# Example with a numeric cutoff
discrete_edna_conc <- dtize_col(
  BrookTrout$eDNAConc, cutoff = 13.3,
  labels = c("low", "high"),
  infinity = TRUE
)

# Example with median as cutoff
discrete_pH <- dtize_col(BrookTrout$pH, cutoff="median")

# Example with missing value imputation
filled_col <- dtize_col(
  c(1, 2, NA, 4, 5),
  cutoff = "mean",
  include_right=FALSE,
  na_fill = "mean"
)
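
For comparison, the imputation example can be approximated in base R. This is a sketch under assumptions, not the package's implementation: it assumes missing values are replaced by the mean of the observed values before binning, and that the bins then follow cut() semantics with left-closed intervals. The names fill_value, x_filled, and base_result are illustrative only.

```r
# Base R analogue of the imputation example; not dtize_col itself.
x <- c(1, 2, NA, 4, 5)

# Mean of the observed values, ignoring NA.
fill_value <- mean(x, na.rm = TRUE)

# Replace each NA with that mean.
x_filled <- ifelse(is.na(x), fill_value, x)

# Use the mean as the cutoff. For this vector the mean is the same
# before and after imputation, so the order of the steps does not matter.
cut_point <- mean(x_filled)

# Left-closed intervals: values equal to the cutoff count as "high".
base_result <- cut(x_filled, breaks = c(-Inf, cut_point, Inf),
                   labels = c("low", "high"),
                   right = FALSE, include.lowest = TRUE)

print(base_result)
```

Because the intervals are left-closed, the imputed value sits exactly on the cutoff and lands in the upper category.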


RulesTools documentation built on April 3, 2025, 5:53 p.m.