categorize_scale: Categorize numeric variables into categories

categorize_scale    R Documentation

Categorize numeric variables into categories

Description

This function recodes one or more numeric variables into categorical variables based on a specified lower end, upper end, and intermediate breaks. Each interval includes its right endpoint. For example, breaks = c(2, 3) with lower_end = 1 and upper_end = 5 creates three intervals: 1 to <= 2, > 2 to <= 3, and > 3 to <= 5. If the lower or upper end is not provided, the function falls back to the minimum or maximum value of the data and issues a warning. This fallback is error-prone because the observed data may not contain the scale's actual lower and upper ends, which can distort the recoding. It is therefore strongly recommended to set the lower and upper ends of the original continuous scale explicitly.
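The interval rule described above can be illustrated with base R's cut(). This is only a sketch of the right-closed logic, under the assumption that the lowest value belongs to the first interval; it is not necessarily how categorize_scale() is implemented internally. The vector x is made-up example data.

```r
# Made-up responses on a 1-5 scale (illustrative data, not from WoJ)
x <- c(1, 2, 2.5, 3, 4, 5)

# Boundaries: lower_end = 1, breaks = c(2, 3), upper_end = 5
bounds <- c(1, 2, 3, 5)

# right = TRUE closes each interval on the right: (1, 2], (2, 3], (3, 5]
# include.lowest = TRUE additionally puts the value 1 into the first interval
cats <- cut(x, breaks = bounds,
            labels = c("Low", "Medium", "High"),
            right = TRUE, include.lowest = TRUE)

# Show each value next to its category
print(data.frame(x = x, category = cats))
```

Note that the values 2 and 3 fall into the lower of their two neighbouring categories, because each interval includes its right endpoint.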

Usage

categorize_scale(
  data,
  ...,
  breaks,
  labels,
  lower_end = NULL,
  upper_end = NULL,
  name = NULL,
  overwrite = FALSE
)

Arguments

data

A tibble or a tdcmm model.

...

Variables to recode as factor variables in categories. If no variables are specified, all numeric columns will be recoded.

breaks

A vector of numeric values specifying the breaks for categorizing the data between the lower and upper ends. The breaks define the boundaries of the intervals. Setting this parameter is required.

labels

A vector of string labels for each interval. The number of labels must match the number of intervals defined by the breaks and lower/upper ends. Setting this parameter is required.

lower_end

Optional numeric value specifying the lower end of the scale. If not provided, defaults to the minimum value of the data.

upper_end

Optional numeric value specifying the upper end of the scale. If not provided, defaults to the maximum value of the data.

name

Optional string specifying the name of the new variable(s). By default, the new variable names are the original variable names suffixed with _cat.

overwrite

Logical indicating whether to overwrite the original variable(s) with the new categorical variables. If TRUE, the original variable(s) are overwritten.
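How breaks, labels, lower_end, and upper_end relate can be sketched in base R. The helper build_bounds() below is hypothetical and not part of tidycomm; it only mimics the documented fallback to the data minimum and maximum. How categorize_scale() itself treats the degenerate case shown here is not stated on this page.

```r
# Hypothetical helper (not a tidycomm function): assemble the full
# boundary vector from the scale ends and the intermediate breaks
build_bounds <- function(x, breaks, lower_end = NULL, upper_end = NULL) {
  if (is.null(lower_end)) lower_end <- min(x, na.rm = TRUE)
  if (is.null(upper_end)) upper_end <- max(x, na.rm = TRUE)
  c(lower_end, sort(breaks), upper_end)
}

# Made-up responses on a 1-5 scale where nobody chose 1 or 5
x <- c(2, 3, 3, 4)
breaks <- c(2, 3)
labels <- c("Low", "Medium", "High")

explicit <- build_bounds(x, breaks, lower_end = 1, upper_end = 5)
fallback <- build_bounds(x, breaks)

# k breaks plus two ends give k + 1 intervals, so k + 1 labels are needed
labels_match <- length(labels) == length(explicit) - 1

# With the fallback, the observed minimum coincides with the first break,
# so the first interval collapses to a single point
fallback_degenerate <- anyDuplicated(fallback) > 0

print(explicit)
print(fallback)
```

With explicit ends the boundaries are 1, 2, 3, 5 and the three labels fit. With the data-driven fallback the boundaries become 2, 2, 3, 4, which no longer describes the original scale. This is the kind of problem the warning about unset ends is meant to flag.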

Value

A modified tibble or tdcmm model with the recoded variables.

See Also

Other scaling: center_scale(), dummify_scale(), minmax_scale(), recode_cat_scale(), reverse_scale(), setna_scale(), z_scale()

Examples

# Recode two trust items into three categories; new columns get the _cat suffix
WoJ %>%
  dplyr::select(trust_parliament, trust_politicians) %>%
  categorize_scale(trust_parliament, trust_politicians,
                   lower_end = 1, upper_end = 5, breaks = c(2, 3),
                   labels = c("Low", "Medium", "High"), overwrite = FALSE)

# Recode one item into four categories and name the new column explicitly
WoJ %>%
  dplyr::select(autonomy_selection) %>%
  categorize_scale(autonomy_selection, breaks = c(2, 3, 4),
                   lower_end = 1, upper_end = 5,
                   labels = c("Low", "Medium", "High", "Very High"),
                   name = "autonomy_in_categories")

joon-e/tidycomm documentation built on Feb. 24, 2024, 8:58 a.m.