In eribul/coder: Deterministic Categorization of Items Based on External Code Data

categorize

R Documentation

Categorize cases based on external data and classification scheme

Description

This is the main function of the package, which relies on a triad of objects: (1) data with unit IDs and possible dates of interest; (2) codedata for corresponding units, with optional dates of interest; and (3) a classification scheme (classcodes object; cc) with regular expressions to identify and categorize relevant codes. The function combines the three underlying steps performed by codify(), classify() and index(). Relevant arguments are passed to those functions by codify_args and cc_args.
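To make the first two steps concrete, here is a minimal base R sketch on hypothetical toy data. It mimics the idea of attaching code data to units by id (as `codify()` does) and flagging each group whose regular expression matches any of a unit's codes (as `classify()` does). It is not how coder implements these functions, and the data frames `people`, `codes` and `scheme`, as well as their regular expressions, are invented for illustration.

```r
# Hypothetical toy inputs (not the package's ex_people / ex_icd10 data)
people <- data.frame(id = c("a", "b", "c"), stringsAsFactors = FALSE)
codes <- data.frame(
  id   = c("a", "a", "b"),
  code = c("I21", "C50", "E11"),
  stringsAsFactors = FALSE
)
# Hypothetical classification scheme: one regular expression per group
scheme <- data.frame(
  group = c("myocardial infarction", "cancer", "diabetes"),
  regex = c("^I2[12]", "^C[0-7]", "^E1[0-4]"),
  stringsAsFactors = FALSE
)

# Step 1 (idea of codify()): left join of code data to units by id;
# unit "c" has no codes and gets NA
codified <- merge(people, codes, by = "id", all.x = TRUE)

# Step 2 (idea of classify()): TRUE if any of a unit's codes matches
# the group's regular expression
classified <- t(sapply(people$id, function(i) {
  unit_codes <- codified$code[codified$id == i & !is.na(codified$code)]
  vapply(scheme$regex, function(rx) any(grepl(rx, unit_codes)), logical(1))
}, USE.NAMES = FALSE))
colnames(classified) <- scheme$group

# One logical column per group, kept with its original (non-syntactic) name
result <- data.frame(people, classified, check.names = FALSE)
result
```

A unit without any codes (here `c`) ends up with `FALSE` for every group rather than being dropped, which matches the idea that the output keeps one row per unit in `x`.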

Usage

categorize(x, ...)

## S3 method for class 'data.frame'
categorize(x, ...)

## S3 method for class 'tbl_df'
categorize(x, ...)

## S3 method for class 'data.table'
categorize(x, ..., codedata, id, code, codify_args = list())

## S3 method for class 'codified'
categorize(
  x,
  ...,
  cc,
  index = NULL,
  cc_args = list(),
  check.names = TRUE,
  .data_cols = NULL
)

Arguments

`x`	data set with mandatory character id column (identified by argument `id = "<col_name>"`) and optional `Date` of interest (identified by argument `date = "<col_name>"`). Alternatively, the output from `codify()`.
`...`	arguments passed between methods
`codedata`	external code data with mandatory character id column (identified by `id = "<col_name>"`), code column (identified by argument `code = "<col_name>"`) and optional `Date` column (identified by `codify_args = list(code_date = "<col_name>")`).
`id`	name of the character id column found in both `x` and `codedata` (unique in `x`, but not necessarily unique in `codedata`).
`code`	name of code column in `codedata`.
`codify_args`	List of named arguments passed to `codify()`.
`cc`	`classcodes` object (or name of a default object from `all_classcodes()`).
`index`	Argument passed to `index()`. A character vector of names of columns with index weights from the corresponding classcodes object (as supplied by the `cc` argument). See `attr(cc, "indices")` for available options. Set to `FALSE` if no index should be calculated. If `NULL`, the default, all available indices (from `attr(cc, "indices")`) are provided.
`cc_args`	List with named arguments passed to `set_classcodes()`.
`check.names`	Column names are based on `cc$group`, which might include spaces. Those names are changed to syntactically correct names by `check.names = TRUE`. Syntactically invalid, but grammatically correct names might be preferred for presentation of the data as achieved by `check.names = FALSE`. Alternatively, if `categorize` is called repeatedly, longer informative names might be created by `cc_args = list(tech_names = TRUE)`.
`.data_cols`	used internally
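The effect of `check.names` can be illustrated with base R, whose `data.frame()` has an argument of the same name. The sketch below assumes that the renaming done by `categorize()` behaves like `make.names()`; the group names are hypothetical.

```r
# Hypothetical group names with spaces, as might appear in cc$group
groups <- c("congestive heart failure", "cardiac arrhythmias", "renal failure")

# Syntactically valid names (assumed to resemble check.names = TRUE)
valid <- make.names(groups)
valid
# "congestive.heart.failure" "cardiac.arrhythmias" "renal.failure"

# Base R data.frame() applies make.names() unless check.names = FALSE
m <- matrix(FALSE, nrow = 1, ncol = 3, dimnames = list(NULL, groups))
df_valid    <- data.frame(m)
df_original <- data.frame(m, check.names = FALSE)
names(df_valid)     # syntactic names, convenient for programming
names(df_original)  # original names with spaces, nicer for presentation
```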

Value

Object of the same class as `x` with additional logical columns indicating membership of groups identified by the classcodes object (the `cc` argument). Numeric indices are also included if requested by the `index` argument.
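A numeric index can be thought of as a weighted sum of the logical group columns, with one weight column per index. The base R sketch below shows that idea for several indices at once, including what `index = NULL` (all indices) and `index = FALSE` (no index) mean. The helper `add_indices()`, the columns `index_a` and `index_b`, and all weights are hypothetical and not taken from any classcodes object or from coder's `index()`.

```r
# Hypothetical group-membership columns (one row per unit)
groups <- data.frame(
  chf      = c(TRUE,  FALSE, TRUE),
  diabetes = c(FALSE, FALSE, TRUE),
  cancer   = c(TRUE,  FALSE, FALSE)
)

# Hypothetical weights: one row per group, one column per index
weights <- data.frame(
  group   = c("chf", "diabetes", "cancer"),
  index_a = c(1, 1, 2),
  index_b = c(2, 0, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: weighted sum of memberships for each requested index;
# NULL mimics "all available indices", FALSE mimics "no index"
add_indices <- function(groups, weights, index = NULL) {
  if (isFALSE(index)) return(groups)
  if (is.null(index)) index <- setdiff(names(weights), "group")
  m <- as.matrix(groups[weights$group]) * 1  # logical -> 0/1, columns aligned to weights
  for (ix in index) groups[[ix]] <- as.vector(m %*% weights[[ix]])
  groups
}

res <- add_indices(groups, weights)
res$index_a  # 3 0 2
res$index_b  # 4 0 2
```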

Examples

# For this example, 1 core would suffice:
old_threads <- data.table::getDTthreads()
data.table::setDTthreads(1)

# For some patient data (ex_people) and related hospital visit code data
# with ICD 10-codes (ex_icd10), add the Elixhauser comorbidity
# conditions based on all registered ICD10-codes
categorize(
   x            = ex_people,
   codedata     = ex_icd10,
   cc           = "elixhauser",
   id           = "name",
   code         = "icd10"
)


# Add Charlson categories and two versions of a calculated index
# ("quan_original" and "quan_updated").
categorize(
   x            = ex_people,
   codedata     = ex_icd10,
   cc           = "charlson",
   id           = "name",
   code         = "icd10",
   index        = c("quan_original", "quan_updated")
)


# Only include recent hospital visits, from 30 days up to 1 day before surgery:
categorize(
   x            = ex_people,
   codedata     = ex_icd10,
   cc           = "charlson",
   id           = "name",
   code         = "icd10",
   index        = c("quan_original", "quan_updated"),
   codify_args  = list(
      date      = "surgery",
      days      = c(-30, -1),
      code_date = "admission"
   )
)



# Multiple versions -------------------------------------------------------

# We can compare categorization according to Quan et al. (2005; "icd10")
# and Armitage et al. (2010; "icd10_rcs") (see `?charlson`).
# Note the use of `tech_names = TRUE` to distinguish the column names from the
# two versions.

# We first specify some common settings ...
ind <- c("quan_original", "quan_updated")
cd  <- list(date = "surgery", days = c(-30, -1), code_date = "admission")

# ... we then categorize once with "icd10" as the default regular expression ...
categorize(
   x            = ex_people,
   codedata     = ex_icd10,
   cc           = "charlson",
   id           = "name",
   code         = "icd10",
   index        = ind,
   codify_args  = cd,
   cc_args      = list(tech_names = TRUE)
) %>%

# ... and once more with `regex = "icd10_rcs"`
categorize(
   codedata     = ex_icd10,
   cc           = "charlson",
   id           = "name",
   code         = "icd10",
   index        = ind,
   codify_args  = cd,
   cc_args      = list(regex = "icd10_rcs", tech_names = TRUE)
)



# column names ------------------------------------------------------------

# Default column names are based on the group names of the corresponding
# classcodes object but are modified to be syntactically correct.
default <-
   categorize(ex_people, codedata = ex_icd10, cc = "elixhauser",
              id = "name", code = "icd10")

# Set `check.names = FALSE` to retain original names:
original <-
   categorize(
      ex_people, codedata = ex_icd10, cc = "elixhauser",
      id = "name", code = "icd10",
      check.names = FALSE
   )

# Or use `tech_names = TRUE` for informative but long names (see the use case above)
tech <-
   categorize(
      ex_people, codedata = ex_icd10, cc = "elixhauser",
      id = "name", code = "icd10",
      cc_args = list(tech_names = TRUE)
   )

# Compare
tibble::tibble(names(default), names(original), names(tech))

# Go back to original number of threads
data.table::setDTthreads(old_threads)
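The time window used in the examples (`codify_args = list(date = "surgery", days = c(-30, -1), code_date = "admission")`) keeps only codes whose date falls within a range relative to each unit's date of interest. A minimal base R sketch of such a filter follows. It assumes both window limits are inclusive, uses hypothetical data, and is not coder's implementation.

```r
# Hypothetical unit data with a date of interest (cf. date = "surgery")
people <- data.frame(
  id      = c("a", "b"),
  surgery = as.Date(c("2020-03-01", "2020-06-15")),
  stringsAsFactors = FALSE
)
# Hypothetical code data with code dates (cf. code_date = "admission")
visits <- data.frame(
  id        = c("a", "a", "b", "b"),
  code      = c("I21", "C50", "E11", "J45"),
  admission = as.Date(c("2020-02-10", "2019-12-01", "2020-06-15", "2020-05-20")),
  stringsAsFactors = FALSE
)

days <- c(-30, -1)  # window relative to the date of interest, in days

joined <- merge(people, visits, by = "id")
# Days from surgery to admission (negative = before surgery)
offset <- as.numeric(joined$admission - joined$surgery)
# Assumption: both limits are inclusive
in_window <- joined[offset >= days[1] & offset <= days[2], ]
in_window$code
```

Here `C50` (91 days before surgery) and `E11` (on the day of surgery) fall outside the window, so only `I21` and `J45` would be passed on to classification.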

eribul/coder documentation built on July 3, 2025, 9:46 p.m.