assign: Derive an SDTM variable

assign_no_ct    R Documentation

Derive an SDTM variable

Description

  • assign_no_ct() maps a variable in a raw dataset to a target SDTM variable that has no terminology restrictions.

  • assign_ct() maps a variable in a raw dataset to a target SDTM variable following controlled terminology recoding.

Usage

assign_no_ct(
  tgt_dat = NULL,
  tgt_var,
  raw_dat,
  raw_var,
  id_vars = oak_id_vars()
)

assign_ct(
  tgt_dat = NULL,
  tgt_var,
  raw_dat,
  raw_var,
  ct_spec,
  ct_clst,
  id_vars = oak_id_vars()
)

Arguments

tgt_dat

Target dataset: a data frame to be merged against raw_dat by the variables indicated in id_vars. This parameter is optional; see section Value for how the output changes depending on its value.

tgt_var

The target SDTM variable: a single string indicating the name of the variable to be derived.

raw_dat

The raw dataset (data frame); must include the variables passed in id_vars and raw_var.

raw_var

The raw variable: a single string indicating the name of the raw variable in raw_dat.

id_vars

Key variables to be used in the join between the raw dataset (raw_dat) and the target dataset (tgt_dat).

ct_spec

Study controlled terminology specification: a data frame with a minimal set of columns; see ct_spec_vars() for details.

ct_clst

A codelist code indicating which subset of the controlled terminology to apply in the derivation.
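To make the roles of ct_spec and ct_clst more concrete, here is a minimal base R sketch of codelist-restricted recoding. It is not the sdtm.oak implementation: the table toy_ct is made up, its column names (codelist_code, collected_value, term_value) are assumed to follow ct_spec_vars(), and recode_with_codelist() is a hypothetical helper. Leaving unmatched values as collected is a choice of this sketch.

```r
# Toy controlled terminology table (made-up rows; column names are an
# assumption based on ct_spec_vars()).
toy_ct <- data.frame(
  codelist_code   = c("C71620", "C71620", "C66729"),
  collected_value = c("Gram", "Tablet", "By mouth"),
  term_value      = c("g", "TABLET", "ORAL"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: recode x using only the rows of one codelist.
recode_with_codelist <- function(x, ct, clst) {
  # Restrict the terminology to the requested codelist code.
  ct_sub <- ct[ct$codelist_code == clst, ]
  # Look up each collected value within that subset.
  idx <- match(x, ct_sub$collected_value)
  # Unmatched (and missing) values are kept as collected in this sketch.
  ifelse(is.na(idx), x, ct_sub$term_value[idx])
}

recoded <- recode_with_codelist(c("Gram", "Tablet", "mg", NA), toy_ct, "C71620")
recoded
```

Only the rows whose codelist_code equals the supplied code take part in the lookup, so the "By mouth" row from the other codelist is never used here.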

Value

The returned dataset depends on the value of tgt_dat:

  • If no target dataset is supplied, meaning that tgt_dat defaults to NULL, the returned dataset is raw_dat restricted to the variables indicated in id_vars, plus one new column: the derived variable, named as indicated in tgt_var.

  • If a target dataset is provided, it is merged with the raw dataset raw_dat by the variables indicated in id_vars, and gains one new column: the derived variable, named as indicated in tgt_var.
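The two output shapes can be pictured with a small base R sketch. This only illustrates the column layout and is not how sdtm.oak performs the join: the data are made up, id_vars is assumed to match the default returned by oak_id_vars(), and the use of base merge() (an inner join) is an assumption of the sketch.

```r
# Assumed to match the default of oak_id_vars().
id_vars <- c("oak_id", "raw_source", "patient_number")

# Made-up raw dataset with one raw variable, MDIND.
raw <- data.frame(
  oak_id = 1:3,
  raw_source = "MD1",
  patient_number = 101:103,
  MDIND = c("NAUSEA", "ANEMIA", "COLD"),
  stringsAsFactors = FALSE
)

# Case 1, no target dataset: key variables plus the derived column.
out_no_tgt <- raw[id_vars]
out_no_tgt$CMINDC <- raw$MDIND

# Made-up target dataset that already holds another SDTM variable.
tgt <- data.frame(
  oak_id = 1:3,
  raw_source = "MD1",
  patient_number = 101:103,
  CMTRT = c("ASPIRIN", "BENADRYL", "SOMINEX"),
  stringsAsFactors = FALSE
)

# Case 2, target dataset supplied: existing columns plus the derived column.
out_with_tgt <- merge(tgt, out_no_tgt, by = id_vars)

names(out_no_tgt)
names(out_with_tgt)
```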

Examples


md1 <-
  tibble::tibble(
    oak_id = 1:14,
    raw_source = "MD1",
    patient_number = 101:114,
    MDIND = c(
      "NAUSEA", "NAUSEA", "ANEMIA", "NAUSEA", "PYREXIA",
      "VOMITINGS", "DIARHHEA", "COLD",
      "FEVER", "LEG PAIN", "FEVER", "COLD", "COLD", "PAIN"
    )
  )

assign_no_ct(
  tgt_var = "CMINDC",
  raw_dat = md1,
  raw_var = "MDIND"
)

cm_inter <-
  tibble::tibble(
    oak_id = 1:14,
    raw_source = "MD1",
    patient_number = 101:114,
    CMTRT = c(
      "BABY ASPIRIN",
      "CORTISPORIN",
      "ASPIRIN",
      "DIPHENHYDRAMINE HCL",
      "PARCETEMOL",
      "VOMIKIND",
      "ZENFLOX OZ",
      "AMITRYPTYLINE",
      "BENADRYL",
      "DIPHENHYDRAMINE HYDROCHLORIDE",
      "TETRACYCLINE",
      "BENADRYL",
      "SOMINEX",
      "ZQUILL"
    ),
    CMROUTE = c(
      "ORAL",
      "ORAL",
      NA,
      "ORAL",
      "ORAL",
      "ORAL",
      "INTRAMUSCULAR",
      "INTRA-ARTERIAL",
      NA,
      "NON-STANDARD",
      "RANDOM_VALUE",
      "INTRA-ARTICULAR",
      "TRANSDERMAL",
      "OPHTHALMIC"
    )
  )

# Controlled terminology specification
(ct_spec <- read_ct_spec_example("ct-01-cm"))

assign_ct(
  tgt_dat = cm_inter,
  tgt_var = "CMINDC",
  raw_dat = md1,
  raw_var = "MDIND",
  ct_spec = ct_spec,
  ct_clst = "C66729"
)

# Variables are derived in sequence from multiple input sources.
# For each target variable, only missing (`NA`) values are filled
# during each step; previously assigned (non-missing) values are retained.

cm_raw <-
  tibble::tibble(
    oak_id = 1:4,
    raw_source = "cm_raw",
    patient_number = 370L + oak_id,
    PATNUM = patient_number,
    IT.CMTRT = c("BABY ASPIRIN", "CORTISPORIN", NA, NA),
    IT.CMTRTOTH = c("Other Treatment - ", NA, "Other Treatment - Baby Aspirin", NA)
  )

cm_raw

# Derivation of `CMTRT` first from `IT.CMTRT` and then from `IT.CMTRTOTH`.
assign_no_ct(
  raw_dat = cm_raw,
  raw_var = "IT.CMTRT",
  tgt_var = "CMTRT"
) |>
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "IT.CMTRTOTH",
    tgt_var = "CMTRT"
  )

# Derivation of `CMTRT` first from `IT.CMTRTOTH` and then from `IT.CMTRT`.
assign_no_ct(
  raw_dat = cm_raw,
  raw_var = "IT.CMTRTOTH",
  tgt_var = "CMTRT"
) |>
  assign_no_ct(
    raw_dat = cm_raw,
    raw_var = "IT.CMTRT",
    tgt_var = "CMTRT"
  )

# Another example of variables derived in sequence from multiple input
# sources but now with controlled terminology remapping, in this case,
# CDISC Dose Unit (C71620) recoding.

cm_raw2 <- tibble::tibble(
  oak_id = c(1:3, 6, 8:10, 12:14),
  raw_source = "cm_raw",
  patient_number = c(rep(375L, 2), 376:377, rep(378L, 3), rep(379L, 3)),
  PATNUM = patient_number,
  `IT.DOSUO` = c(NA, NA, NA, NA, NA, "Other Dose Unit", "cap", NA, NA, NA),
  `IT.CMDOSU` = c("mg", "Gram", NA, "Tablet", "g", "mg", NA, "IU", "mL", "%")
)

assign_ct(
  raw_dat = cm_raw2,
  raw_var = "IT.DOSUO",
  tgt_var = "CMDOSU",
  ct_spec = ct_spec,
  ct_clst = "C71620", # Dose Unit
  id_vars = oak_id_vars()
) |>
  assign_ct(
    raw_dat = cm_raw2,
    raw_var = "IT.CMDOSU",
    tgt_var = "CMDOSU",
    ct_spec = ct_spec,
    ct_clst = "C71620",
    id_vars = oak_id_vars()
  )
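
The fill-only-missing rule used in the sequential derivations above can be sketched in base R. This is an illustration of the described behaviour, not the sdtm.oak implementation; fill_missing() is a hypothetical helper, and the vectors copy the IT.CMTRT and IT.CMTRTOTH values from cm_raw.

```r
# Hypothetical helper (not part of sdtm.oak): keep existing target values
# and take the raw value only where the target is still missing.
fill_missing <- function(tgt, raw) {
  ifelse(is.na(tgt), raw, tgt)
}

it_cmtrt    <- c("BABY ASPIRIN", "CORTISPORIN", NA, NA)
it_cmtrtoth <- c("Other Treatment - ", NA, "Other Treatment - Baby Aspirin", NA)

# Start from an all-missing target, as when the variable is first derived.
cmtrt <- rep(NA_character_, 4)

# IT.CMTRT first, then IT.CMTRTOTH.
cmtrt_a <- fill_missing(fill_missing(cmtrt, it_cmtrt), it_cmtrtoth)

# IT.CMTRTOTH first, then IT.CMTRT.
cmtrt_b <- fill_missing(fill_missing(cmtrt, it_cmtrtoth), it_cmtrt)

cmtrt_a
cmtrt_b
```

In this sketch the two orders differ only in the first element, where both sources hold a value and the earlier step wins.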


sdtm.oak documentation built on June 9, 2025, 5:10 p.m.