annotateData: Annotate a dataset.
In clinDataReview: Clinical Data Review Tool

View source: R/dataManipulation-annotateData.R

annotateData

R Documentation

Annotate a dataset.

Description

Standard annotations are requested by name via the annotations parameter. Custom datasets/variables of interest are specified in the same parameter, as a list with the elements 'dataset'/'data' and 'vars'.

Usage

annotateData(
  data,
  dataPath = ".",
  annotations,
  subjectVar = "USUBJID",
  verbose = FALSE,
  labelVars = NULL,
  labelData = "data"
)

Arguments

`data`	Data.frame with input data to annotate.
`dataPath`	String with path to the data.
`annotations`	Annotations (or list of those), each specified either as:
  - a string with a standard annotation type, among:
      - demographics: standard variables from the demographics data (DM or ADSL) are extracted
      - exposed_subjects: a logical variable `EXFL` is added to `data`, identifying exposed subjects, i.e. subjects included in the exposure dataset (EX/ADEX) with a non-empty and non-missing start date ('EXSTDTC', 'STDY' or 'ASTDY')
      - functional_groups_lab: a character variable 'LBFCTGRP' is added to `data`, based on the standard naming of the parameter code ('PARAMCD' or 'LBTESTCD' variable)
  - a list of custom annotation, with:
      - (optional) annotation dataset, either:
          - 'dataset': String with name of the annotation dataset, e.g. 'ex' to import data from the file '[dataset].sas7bdat' in `dataPath`
          - 'data': Data.frame with annotation dataset
        The input `data` is used if 'data' and 'dataset' are not specified.
      - 'vars': Either:
          - Character vector with variables of interest from the annotation dataset. If not specified, all variables of the dataset are considered.
          - String with the name of the new variable computed from `varFct`
      - 'varFct': (optional) Either:
          - function of `data`, or string containing such a function (e.g. 'function(data) ...')
          - string containing manipulations of column names of `data` (e.g. 'col1 + col2')
        used to create the new variable specified in `vars`.
      - 'filters': (optional) Filters for the annotation dataset, see the `filters` parameter of `filterData`. The annotation dataset is filtered first, before being combined with the input `data`, so that only the records retained in the annotation dataset are annotated in the output `data`. Other records have missing values in the annotated variables.
      - 'varLabel': (optional) Label for the new variable, in case `varFct` is specified.
      - 'varsBy': (optional) Character vector with variables used to merge the input data and the annotation dataset. If not specified:
          - if an external dataset (`dataset`/`data`) is specified: `subjectVar` is used
          - otherwise: the annotation dataset and the input data are merged by row IDs
`subjectVar`	String with subject ID variable, 'USUBJID' by default.
`verbose`	Logical, if TRUE (FALSE by default) progress messages are printed in the current console. For the visualizations, progress messages during download of subject-specific report are displayed in the browser console.
`labelVars`	Named character vector containing variable labels of `data`. This will be updated with the labels of the extra annotation variables (in `attr(output, 'labelVars')`).
`labelData`	(optional) String with label for input `data`, that will be included in progress messages.
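
The string form of 'varFct' can be read as an R expression evaluated against the columns of `data`. The following is a minimal base-R sketch of that idea, assuming the evaluation behaves like `eval(parse(text = ...))` on the data frame. That mechanism is an assumption for illustration, not the package's confirmed implementation, and the data frame `toy` with its values is invented.

```r
# Toy data (invented for illustration); column names borrowed from the examples
toy <- data.frame(
  A1HI = c(10, 20),
  A1LO = c(2, 5),
  AVISIT = c("Visit 3 Week 4", "Baseline"),
  stringsAsFactors = FALSE
)

# Expression on column names, in the spirit of varFct = "A1HI / A1LO"
toy$HILORATIO <- eval(parse(text = "A1HI / A1LO"), envir = toy)

# Regex back-reference: the backslash is doubled once for the R string literal
# and once more because the string content is itself parsed as R code
exprText <- "sub('.* Week (.+)', 'Week \\\\1', AVISIT)"
toy$PERIOD <- eval(parse(text = exprText), envir = toy)

# Function form, in the spirit of varFct = function(data) ...
rangeFct <- function(data) data$A1HI - data$A1LO
toy$RANGE <- rangeFct(toy)

print(toy)
```

Visits that do not match the pattern (here "Baseline") are left unchanged by `sub`.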

Value

Annotated data. If labelVars is specified, the output contains an extra attribute 'labelVars' with the updated labelVars (accessible via attr(output, 'labelVars')).
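
To make the shape of the returned data concrete: with a filtered external annotation dataset, the result can be pictured as a left join of the input data with the filtered annotation data. The sketch below uses only base R on invented toy data (`toyLB`, `toyDM`). `merge(..., all.x = TRUE)` stands in for the package's internal merging, which may differ in details such as row order.

```r
# Toy input data and annotation dataset (invented for illustration)
toyLB <- data.frame(
  USUBJID = c("01", "01", "02", "03"),
  AVAL = c(1.2, 1.4, 0.9, 2.1),
  stringsAsFactors = FALSE
)
toyDM <- data.frame(
  USUBJID = c("01", "02", "03"),
  ARM = c("Placebo", "Treatment", "Placebo"),
  stringsAsFactors = FALSE
)

# Step 1: filter the annotation dataset,
# in the spirit of filters = list(var = "ARM", value = "Placebo")
toyAnnot <- toyDM[toyDM$ARM == "Placebo", c("USUBJID", "ARM")]

# Step 2: left join on the subject variable; all input records are kept,
# records without a retained annotation get a missing value
toyAnnotated <- merge(toyLB, toyAnnot, by = "USUBJID", all.x = TRUE)

print(toyAnnotated)
```

Subject "02" is kept in the output, with a missing `ARM`, because its demographics record was filtered out before the merge.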

Examples

library(clinUtils)

data(dataADaMCDISCP01)

dataLB <- dataADaMCDISCP01$ADLBC
dataDM <- dataADaMCDISCP01$ADSL
dataAE <- dataADaMCDISCP01$ADAE

labelVars <- attr(dataADaMCDISCP01, "labelVars")

# standard annotations:

# path to dataset should be specified via: 'dataPath'
# annotateData(dataLB, annotations = "demographics", dataPath = ...)

# add all variables in annotation data (if not already available)
head(annotateData(dataLB, annotations = list(data = dataDM)), 1)

# only variables of interest
head(annotateData(dataLB, annotations = list(data = dataDM, vars = c("ARM", "ETHNIC"))), 1)

# filter annotation dataset
dataAnnotated <- annotateData(dataLB, 
	annotations = list(
		data = dataDM, 
		vars = c("ARM", "ETHNIC"), 
		filters = list(var = "ARM", value = "Placebo")
	)
)
head(subset(dataAnnotated, ARM == "Placebo"), 1)
head(subset(dataAnnotated, is.na(ARM)), 1)

# worst-case scenario: add a new variable based on filtering condition
dataAE$AESEV <- factor(dataAE$AESEV, levels = c('MILD', "MODERATE", "SEVERE"))
dataAEWC <- annotateData(
	data = dataAE,
	annotations = list(
		vars = "WORSTINT", 
		# create new variable: 'WORSTINT', 
		# set to TRUE for records with the maximum severity per subject/adverse event term 
		# (if multiple, they are all retained)
		filters = list(
			var = "AESEV", 
			# max will take latest level in a factor 
			# (so 'MODERATE' if 'MILD'/'MODERATE' are available)
			valueFct = function(x) x[which.max(as.numeric(x))],
			varsBy = c("USUBJID", "AEDECOD"),
			keepNA = FALSE,
			varNew = "WORSTINT", 
			labelNew = "worst-case"
		)
	),
	labelVars = labelVars,
	verbose = TRUE
)
attr(dataAEWC, "labelVars")["WORSTINT"]

# add a new variable based on a combination of variables:
dataLB <- annotateData(dataLB, 
	annotations = list(vars = "HILORATIO", varFct = "A1HI / A1LO")
)

# add a new variable based on extraction from an existing variable
# Note: backslashes should be doubled when the function is specified as text
dataLB <- annotateData(dataLB, 
	annotations = list(vars = "PERIOD", varFct = "sub('.* Week (.+)', 'Week \\\\1', AVISIT)")
)

# multiple annotations:
dataAnnotated <- annotateData(dataLB, 
	annotations = list(
		list(data = dataDM, vars = c("ARM", "ETHNIC")),
		list(data = dataAE, vars = c("AESEV"))
	)
)
head(dataAnnotated, 1)
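
The worst-case annotation in the examples above flags, within each 'varsBy' group, the records whose `AESEV` equals the group maximum. As a rough illustration of that logic, here is a base-R sketch on invented toy data (`toyAE`). It mirrors the documented intent using `ave()` and is not taken from the package internals.

```r
# Toy adverse event data (invented for illustration)
toyAE <- data.frame(
  USUBJID = c("01", "01", "01", "02", "02"),
  AEDECOD = c("HEADACHE", "HEADACHE", "NAUSEA", "HEADACHE", "HEADACHE"),
  stringsAsFactors = FALSE
)
toyAE$AESEV <- factor(
  c("MILD", "MODERATE", "MILD", "SEVERE", "SEVERE"),
  levels = c("MILD", "MODERATE", "SEVERE")
)

# Numeric rank of each severity, following the order of the factor levels
sevRank <- as.numeric(toyAE$AESEV)

# One grouping key per subject / adverse event term
grp <- paste(toyAE$USUBJID, toyAE$AEDECOD, sep = "|")

# Maximum rank within each group, repeated for every record of the group
maxRank <- ave(sevRank, grp, FUN = max)

# TRUE for records at the group maximum (ties are all flagged)
toyAE$WORSTINT <- sevRank == maxRank

print(toyAE)
```

Because the comparison is done on the factor level order, the result depends on `AESEV` being a factor with levels sorted from least to most severe, as set up in the example above.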

clinDataReview documentation built on April 12, 2025, 1:14 a.m.