check_recode: Check accurate recoding of variables

View source: R/check_recode.R

check_recode    R Documentation

Check accurate recoding of variables

Description

This function was written a few days after a paper in JAMA was retracted because of an error in recoding the treatment variable (https://jamanetwork.com/journals/jama/fullarticle/2752474). It takes a data frame or tibble, fuzzy-matches variable names, and produces crosstables of all matched variables. Visual inspection of these crosstables should reveal any miscoding.
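The idea described above can be sketched in base R. This is only an illustration of fuzzy name matching followed by cross-tabulation, not the implementation used by check_recode; the toy data frame `df` and the helper objects `pairs`, `matched`, and `crosstabs` are hypothetical names introduced for the example.

```r
# Toy data (hypothetical): sex2 is an intentionally reversed recode of sex.
df <- data.frame(
  sex  = c("Male", "Female", "Male", "Female", "Male"),
  sex2 = c("F", "M", "F", "M", "F"),
  age  = c(34, 51, 47, 62, 29),
  stringsAsFactors = FALSE
)

vars <- names(df)

# All unordered pairs of variable names.
pairs <- utils::combn(vars, 2, simplify = FALSE)

# Keep a pair when either name approximately matches the other.
# agrepl() is the logical counterpart of base R's agrep().
is_match <- vapply(pairs, function(p) {
  agrepl(p[1], p[2]) || agrepl(p[2], p[1])
}, logical(1))
matched <- pairs[is_match]

# Cross-tabulate each matched pair; off-pattern cells hint at miscoding.
crosstabs <- lapply(matched, function(p) {
  table(df[[p[1]]], df[[p[2]]], dnn = p, useNA = "ifany")
})

crosstabs
```

In this toy case only the pair sex/sex2 is matched, and its crosstable places every "Male" under "F" and every "Female" under "M", which is the kind of pattern a visual check is meant to catch.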

Usage

check_recode(
  .data,
  dependent = NULL,
  explanatory = NULL,
  include_numerics = TRUE,
  ...
)

Arguments

.data

Data frame or tibble.

dependent

Optional character vector: name(s) of dependent variable(s).

explanatory

Optional character vector: name(s) of explanatory variable(s).

include_numerics

Logical. Whether to include numeric variables in the check.

...

Other arguments passed to agrep.
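To give a feel for the kind of argument that might be forwarded through `...`, the sketch below calls base R's agrep directly with two values of its `max.distance` argument. The vector `vars` is a hypothetical set of variable names, and how check_recode itself uses forwarded arguments is not shown here.

```r
# Hypothetical variable names, for illustration only.
vars <- c("age", "age.factor", "sex.factor", "mort_5yr")

# Default tolerance: max.distance = 0.1, a fraction of the pattern length.
default_hits <- agrep("age", vars, value = TRUE)

# A looser tolerance allows more edits, so it can pull in unrelated names.
loose_hits <- agrep("age", vars, max.distance = 2, value = TRUE)

default_hits
loose_hits
```

A tighter tolerance keeps the crosstables focused on genuine recodes, while a looser one may catch renamed variables at the cost of more tables to inspect.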

Value

List of length two. The first element is an index of variable combinations. The second is a nested list of crosstables as tibbles.

Examples

library(finalfit)
library(dplyr)
data(colon_s)
colon_s_small = colon_s %>%
  select(-id, -rx, -rx.factor) %>%
  mutate(
    age.factor2 = forcats::fct_collapse(age.factor,
      "<60 years" = c("<40 years", "40-59 years")),
    sex.factor2 = forcats::fct_recode(sex.factor,
      # Intentional miscode
      "F" = "Male",
      "M" = "Female")
  )

# Check
colon_s_small %>%
  check_recode(include_numerics = FALSE)

out = colon_s_small %>%
  select(-extent, -extent.factor, -time, -time.years) %>%
  check_recode()
out

# Select a tibble and expand
out$counts[[9]]
# Note: this variable (node4) appears to be miscoded in the original dataset, survival::colon.

# Restrict the check to the variables that you actually use.
# This uses standard finalfit grammar.
dependent = "mort_5yr"
explanatory = c("age.factor2", "sex.factor2")
colon_s_small %>%
  check_recode(dependent, explanatory)

finalfit documentation built on Sept. 11, 2024, 9:01 p.m.