create_multilevel_factor: Create data with multilevel factor categories

create_multilevel_factor R Documentation

Description

create_multilevel_factor appends rows to the provided dataset for new factor levels, which are supplied to the function as a named list. The approach is based on the MULTILABEL format in SAS, which lets several overlapping categories apply to the same observations and so makes it easy to tally combinations of otherwise granular factor levels.
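
The idea of overlapping categories can be sketched by hand in base R. This is only an illustration of the concept under assumed toy data (obs, extra and multilabel are hypothetical names), not the implementation used by create_multilevel_factor:

```r
# Hypothetical toy data; this is NOT the farrago implementation.
obs <- data.frame(id = 1:4, condition = c("A", "B", "C", "A"))

# One overlapping category: 'AB' covers the granular levels A and B.
new_levels <- list(AB = c("A", "B"))

# Copy the rows whose level belongs to the combined category and relabel them.
extra <- obs[obs$condition %in% new_levels$AB, ]
extra$condition <- "AB"

# Original rows plus the appended rows for the combined level.
multilabel <- rbind(obs, extra)

# Each A or B observation is now counted under its own level and under 'AB'.
table(multilabel$condition)
```

Because the combined rows are appended rather than substituted, the granular counts stay intact while the combined level can be tallied alongside them.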

Usage

create_multilevel_factor(
  data,
  target_col,
  new_levels,
  group_col,
  collapse = TRUE,
  track = TRUE
)

Arguments

data

A dataset, preferably as a tibble.

target_col

A character vector naming the column to which the new factor levels are added.

new_levels

A named list provided to aggregate existing factor levels into additional combined levels.

group_col

A character vector for the column that will group observations.

collapse

Logical value to determine if new rows should be collapsed as unique combinations.

track

Logical value to determine if a new column should be added to track the added factor levels by their rows.
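
As a rough sketch of how the new_levels argument is shaped, the snippet below builds the named list and runs an optional pre-check that every referenced level exists in the target column. The check is an assumption about what may be useful before calling the function; it is not something the function is documented to do, and the variable names are hypothetical:

```r
# Target column from the example data, as a plain factor.
condition <- factor(c("A", "B", "C", "A", "B", "Q"))

# Names are the new combined labels; values are the existing levels they cover.
new_levels <- list(AB = c("A", "B"), QB = c("Q", "B"))

# Existing levels referenced by at least one new label.
referenced <- unique(unlist(new_levels))

# Optional sanity check (hypothetical, not part of the package):
# any referenced level that does not exist in the target column.
missing_levels <- setdiff(referenced, levels(condition))
missing_levels
```

An empty result means every level named in new_levels is present in the column.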

Details

There are some limitations. The function has not been tested for converting numerical data (e.g. ages) into overlapping groups (e.g. age groups). Furthermore, the list name used to label a new category is not vectorized. Take care when using the group_col parameter, as the function examines the existing levels within each group before deciding how to apply a new label. This works well for seeing which levels exist, but less well for checking which do not. Grouping requires that a single column define the entire grouping, so columns may need to be combined to ensure the right comparison is made.
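
When a grouping is defined by more than one column, one way to meet the single-column requirement is to paste the columns into a combined key first. The sketch below uses base R on assumed data; the visits data frame and the site_year column are hypothetical names chosen for illustration:

```r
# Hypothetical data: a group is defined by site AND year together.
visits <- data.frame(
  site = c("north", "north", "south", "south"),
  year = c(2020, 2021, 2020, 2020),
  condition = factor(c("A", "B", "A", "B"))
)

# Combine both columns into one key so a single column defines the grouping.
visits$site_year <- paste(visits$site, visits$year, sep = "_")

# Distinct groups after combining.
unique(visits$site_year)
```

The combined column could then be supplied as group_col = 'site_year'.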

Value

A new dataset with additional rows for the added factor categories.

Examples

library(tibble)
library(farrago)  # package that provides create_multilevel_factor()
# Example data (Repeat groups)
exampleData <- tibble(group = c(1, 1, 1, 2, 3, 3),
                      condition = factor(c('A', 'B', 'C', 'A', 'B', 'Q'), ordered = FALSE))

# With grouping
newData <- create_multilevel_factor(exampleData,
                                   target_col = 'condition',
                                   group_col = 'group',
                                   new_levels = list('AB' = c('A', 'B'), 'QB' = c('Q', 'B')),
                                   collapse = TRUE, track = TRUE)

newData
addmargins(table(newData$group, newData$condition))

# Without grouping
newData2 <- create_multilevel_factor(exampleData,
                                   target_col = 'condition',
                                   new_levels = list('AB' = c('A', 'B'), 'QB' = c('Q', 'B')),
                                   collapse = TRUE, track = TRUE)

newData2



al-obrien/farrago documentation built on April 14, 2023, 6:20 p.m.