rc_outliers: Identify outliers in REDCap records data
In chillywings/rctools: Tools for REDCap API and data manipulation

rc_outliers

R Documentation

Identify outliers in REDCap records data

Description

Identifies outliers for each variable, defined as values further from the mean than the threshold number of standard deviations. This follows the convention used within REDCap, but defaults to a standard deviation threshold of 3 rather than 2. Data are returned in long format with a column specifying outlier status.
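
The rule described above can be sketched in base R. This is only an illustration of the standard-deviation rule on made-up data, not the package's implementation: the helper `flag_outliers()`, the toy `records` data frame, and the output column names (`id`, `variable`, `value`, `outlier`) are all assumptions introduced here.

```r
# Toy data (values are made up): one row per record, one column per variable.
records <- data.frame(
  record_id = 1:12,
  height_cm = c(158, 160, 161, 162, 163, 164, 165, 166, 167, 168, 170, 260),
  weight_kg = c(55, 58, 60, 61, 62, 63, 64, 65, 66, 68, 70, 72)
)

# Hypothetical helper: flag values lying more than `sd_threshold` standard
# deviations from their variable's mean, returning one row per
# record/variable pair (long format).
flag_outliers <- function(df, id_col, sd_threshold = 3) {
  value_cols <- setdiff(names(df), id_col)
  long <- do.call(rbind, lapply(value_cols, function(v) {
    x <- df[[v]]
    z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
    data.frame(
      id = df[[id_col]],
      variable = v,
      value = x,
      outlier = !is.na(z) & abs(z) > sd_threshold,
      stringsAsFactors = FALSE
    )
  }))
  rownames(long) <- NULL
  long
}

flagged <- flag_outliers(records, "record_id", sd_threshold = 3)

# Keep only the flagged rows, similar in spirit to `filtered = TRUE`.
flagged[flagged$outlier, ]
```

In this toy data only the 260 cm height lies more than 3 standard deviations from its mean. Note that a single extreme value also inflates the standard deviation it is judged against, so in small samples a high threshold can be hard to exceed.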

Usage

rc_outliers(
  record_data,
  sex_var = NA,
  sd_threshold = 3,
  fields = NULL,
  filtered = FALSE,
  data_dict = getOption("redcap_bundle")$data_dict,
  mappings = getOption("redcap_bundle")$mappings,
  id_field = getOption("redcap_bundle")$id_field
)

Arguments

`record_data`	Dataframe. Records data export from REDCap. For the purposes of this function, only quantitative data will be kept.
`sex_var`	String. Name of the variable indicating the sex of subjects. If supplied, outliers are determined separately within each sex group. Default is `NA`.
`sd_threshold`	Integer. Number of standard deviations from the mean beyond which a value is flagged as an outlier. Default is `3`.
`fields`	Character. A vector of field/variable names to be analyzed. Default is `NULL`, in which case the fields are guessed based on column type.
`filtered`	Logical. When `TRUE`, only outlier values will be returned. Default is `FALSE`.
`data_dict`	Dataframe. A REDCap project data dictionary. By default, $data_dict is expected in the REDCap bundle option, as created by `rc_bundle`.
`mappings`	Dataframe. A REDCap table containing form/event mappings.
`id_field`	Character. Field name corresponding to the 'record_id' field.

Details

Unless a vector of variable/field names is passed to the `fields` argument, the fields to be analyzed are guessed based on column type. All non-numeric data are removed before analysis. If mixed numeric/non-numeric values (e.g. "160 cm") are passed, the first number in each value is extracted. If a sex variable is provided, variables are grouped by sex for outlier analysis.
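
The two behaviors described above, extracting the first number from mixed values and grouping by sex, can be sketched as follows. This is a rough base R illustration under stated assumptions, not the function's actual code: `first_number()`, `z_score()`, the regular expression (including its handling of a leading minus sign and decimals), and the toy `heights` data are hypothetical. A threshold of 2 is used so that the small example can produce a flag.

```r
# Hypothetical helper: pull the first number out of each string.
# Entries that contain no digits (or are NA) become NA.
first_number <- function(x) {
  x <- as.character(x)
  hits <- regexpr("-?[0-9]+(\\.[0-9]+)?", x)
  found <- !is.na(hits) & hits != -1
  out <- rep(NA_character_, length(x))
  out[found] <- regmatches(x, hits)
  as.numeric(out)
}

first_number(c("160 cm", "72.5kg", "n/a", "12"))

# Hypothetical helper: distance from the mean in standard deviation units.
z_score <- function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)

# Toy data (values are made up): heights stored as text with units.
heights <- data.frame(
  sex = rep(c("F", "M"), each = 8),
  height = c(paste(c(150:156, 175), "cm"), paste(174:181, "cm")),
  stringsAsFactors = FALSE
)
heights$height_num <- first_number(heights$height)

sd_threshold <- 2

# Pooled: every value is compared against the overall mean and SD.
pooled_outlier <- abs(z_score(heights$height_num)) > sd_threshold

# Grouped: each value is compared against the mean and SD of its own sex.
z_within_sex <- ave(heights$height_num, heights$sex, FUN = z_score)
by_sex_outlier <- abs(z_within_sex) > sd_threshold

heights[by_sex_outlier, ]
```

Here the 175 cm value in the "F" group is unremarkable when all subjects are pooled, but stands out once it is compared only against the other values in its group. That is the kind of case grouping by sex is meant to catch.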