categorize_scale: Categorize numeric variables into categories
In joon-e/tidycomm: Data Modification and Analysis for Communication Research

categorize_scale

R Documentation

Categorize numeric variables into categories

Description

This function recodes one or more numeric variables into categorical variables based on a specified lower end, upper end, and intermediate breaks. The resulting intervals are closed on the right, that is, each interval includes its upper endpoint. For example, breaks = c(2, 3) with lower_end = 1 and upper_end = 5 creates three intervals: 1 to <= 2, > 2 to <= 3, and > 3 to <= 5. If the lower or upper end is not provided, the function defaults to the minimum or maximum value of the data, respectively, and issues a warning. This default is error-prone, however, because the observed data may not contain the scale's actual lower and upper ends, which in turn can affect the recoding. It is therefore strongly recommended to set the lower and upper ends of the original continuous scale manually.
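The right-closed interval logic described above can be illustrated with base R's `cut()`. This is only a sketch of the interval arithmetic and not necessarily how categorize_scale is implemented internally. The vectors `x` and `x_obs` are made-up example data.

```r
# Hypothetical responses on a 1-to-5 scale (illustrative data only)
x <- c(1, 2, 2.5, 3, 4, 5)

# lower_end = 1, breaks = c(2, 3), upper_end = 5 give the cut points 1, 2, 3, 5.
# right = TRUE closes each interval on the right; include.lowest = TRUE
# keeps the lower end (1) inside the first interval.
cats <- cut(
  x,
  breaks = c(1, 2, 3, 5),
  labels = c("Low", "Medium", "High"),
  right = TRUE,
  include.lowest = TRUE
)

# Values sitting exactly on a break (2 and 3) fall into the lower category
data.frame(x = x, category = cats)

# Why relying on the observed range is risky: if nobody chose 1 or 5,
# the observed minimum and maximum differ from the true scale ends.
x_obs <- c(2, 3, 4, 4, 3)
observed_range <- range(x_obs)  # 2 and 4, not 1 and 5
```

With `x_obs`, the observed minimum coincides with the first break, so the lowest interval would have no width. Setting `lower_end` and `upper_end` explicitly avoids this.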

Usage

categorize_scale(
  data,
  ...,
  breaks,
  labels,
  lower_end = NULL,
  upper_end = NULL,
  name = NULL,
  overwrite = FALSE
)

Arguments

`data`	A tibble or a tdcmm model.
`...`	Variables to recode as factor variables in categories. If no variables are specified, all numeric columns will be recoded.
`breaks`	A vector of numeric values specifying the breaks for categorizing the data between the lower and upper ends. The breaks define the boundaries of the intervals. Setting this parameter is required.
`labels`	A vector of string labels for each interval. The number of labels must match the number of intervals defined by the breaks and lower/upper ends. Setting this parameter is required.
`lower_end`	Optional numeric value specifying the lower end of the scale. If not provided, defaults to the minimum value of the data.
`upper_end`	Optional numeric value specifying the upper end of the scale. If not provided, defaults to the maximum value of the data.
`name`	Optional string specifying the name of the new variable(s). By default, the new variable names are the original variable names suffixed with `_cat`.
`overwrite`	Logical indicating whether to overwrite the original variable(s) with the new categorical variables. If `TRUE`, the original variable(s) are overwritten.
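As a quick sanity check before calling the function, the relationship between `breaks` and `labels` and the default naming can be sketched in base R. The variable names `n_intervals` and `new_names` are hypothetical and used for illustration only. The snippet mirrors the rules stated in the argument descriptions rather than the package's internal code.

```r
# Three breaks between the lower and upper end yield four intervals
breaks <- c(2, 3, 4)
labels <- c("Low", "Medium", "High", "Very High")

# Number of intervals = number of breaks + 1
n_intervals <- length(breaks) + 1

# Fails with an error if the label count does not match the interval count
stopifnot(length(labels) == n_intervals)

# Default names of the new variables: original names suffixed with "_cat"
vars <- c("trust_parliament", "trust_politicians")
new_names <- paste0(vars, "_cat")
new_names
```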

Value

A modified tibble or tdcmm model with the recoded variables.

Examples

WoJ %>%
  dplyr::select(trust_parliament, trust_politicians) %>%
  categorize_scale(trust_parliament, trust_politicians,
                   lower_end = 1, upper_end = 5, breaks = c(2, 3),
                   labels = c("Low", "Medium", "High"), overwrite = FALSE)

WoJ %>%
  dplyr::select(autonomy_selection) %>%
  categorize_scale(autonomy_selection, breaks = c(2, 3, 4),
                   lower_end = 1, upper_end = 5,
                   labels = c("Low", "Medium", "High", "Very High"),
                   name = "autonomy_in_categories")

joon-e/tidycomm documentation built on May 11, 2024, 9:07 a.m.