filterData: Filter a dataset based on specified filters.

View source: R/dataManipulation-filterData.R

filterData    R Documentation

Description

A dataset can be filtered:

  • on a specific value of interest

  • on a function of a variable (valueFct parameter, e.g. maximum of the variable)

  • to retain only non-missing values of a variable (keepNA set to FALSE)

  • by groups (varsBy parameter)

Note that, by default, missing values in the filtering variable are retained (which differs from the default behaviour in R). To filter out records with missing values, set the keepNA parameter to FALSE.
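
The difference with base R can be illustrated with a minimal sketch. The toy data frame and object names below are hypothetical, and the code only mimics the semantics described above in base R; it is not the implementation used by filterData.

```r
# Hypothetical toy data: the second record has a missing age.
data <- data.frame(id = 1:3, AGE = c(70, NA, 80))

# In base R, the comparison yields NA for the missing record...
cond <- data$AGE <= 75

# ... and subset() drops the records for which the condition is NA.
dataBaseR <- subset(data, AGE <= 75)

# Semantics of keepNA = TRUE: records with a missing value are retained.
dataKeepNA <- data[cond | is.na(data$AGE), ]

# Semantics of keepNA = FALSE: records with a missing value are removed.
dataNoNA <- data[cond & !is.na(data$AGE), ]
```

Here dataBaseR and dataNoNA contain the same single record, whereas dataKeepNA additionally contains the record with the missing age.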

Usage

filterData(
  data,
  filters,
  keepNA = TRUE,
  returnAll = FALSE,
  verbose = FALSE,
  labelVars = NULL,
  labelData = "data"
)

Arguments

data

Data.frame with data.

filters

Unique filter or list of filters.
Each filter is a list containing:

  • 'var': String with variable from data to filter on.

  • 'value': (optional) Character vector with values from var to consider/keep.

  • 'valueFct': (optional) Function (or string with this function) to be applied on var to extract value to consider.
    For example, valueFct = max will extract the records with the maximum value of the variable.

  • 'op': (optional) String with operator used to retain records from value. If not specified, the inclusion operator '%in%' is used, so records with var in value are retained.

  • 'rev': (optional) Logical, if TRUE (FALSE by default), filtering condition based on value/valueFct is reversed.

  • 'keepNA': (optional) Logical, if TRUE (by default), missing values in var are retained.
    If not specified, keepNA general parameter is used.

  • 'varsBy': (optional) Character vector with variables in data containing groups to filter by.

  • 'postFct': (optional) Function (or string with this function) with post-processing applied on the results of the filtering criteria (TRUE/FALSE for each record). This function should return TRUE/FALSE (for each record or for all considered records).
    For example, 'postFct = any, varsBy = "group"' retains all groups which contain at least one record that fulfills the criteria.

  • 'varNew': (optional) String with name of a new variable containing the results of the filtering criteria (as TRUE/FALSE).

  • 'labelNew': (optional) String with label for the varNew variable.
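
As a rough illustration of how the elements of a single filter relate to each other, the conditions can be mimicked in base R. This is only a sketch on a hypothetical toy dataset, assuming that each element acts on a logical vector with one entry per record; the internals of filterData may differ.

```r
# Hypothetical toy data: adverse events with a numeric severity.
data <- data.frame(
  USUBJID = c("01", "01", "02", "02"),
  AESEVN = c(1, 3, 2, 2)
)

# 'value' combined with 'op': records with a severity of at least 3
op <- ">="
isSevere <- match.fun(op)(data$AESEVN, 3)

# 'rev = TRUE': the condition is reversed
isNotSevere <- !isSevere

# 'valueFct = max' with 'varsBy': records with the maximum severity per subject
maxBySubject <- ave(data$AESEVN, data$USUBJID, FUN = max)
isWorst <- data$AESEVN == maxBySubject

# 'postFct = any' with 'varsBy': all records of subjects
# with at least one record fulfilling the condition
hasSevere <- ave(isSevere, data$USUBJID, FUN = any)
```

In this sketch, isWorst flags the second record of subject "01" and both records of subject "02", while hasSevere flags both records of subject "01" only.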

If a list of filters is specified, the different filters are independently executed on the entire dataset to identify the records to retain for each filtering condition.
The resulting selections are combined with a logical operator ('&' by default, i.e. 'AND' condition). A custom logical operator can be specified between the lists describing the filters, for example:
list(list(var = "SEX", value = "F"), "&", list(var = "COUNTRY", value = "DEU")).
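
The combination of several filters can be pictured as follows. This base R sketch uses a hypothetical toy dataset and assumes that each filter produces one logical selection over the entire dataset, which is then combined; it illustrates the description above and is not the code of filterData.

```r
# Hypothetical toy data.
data <- data.frame(
  SEX = c("F", "M", "F", "M"),
  COUNTRY = c("DEU", "DEU", "BEL", "BEL")
)

# Each filter is evaluated independently on the entire dataset.
selSex <- data$SEX %in% "F"
selCountry <- data$COUNTRY %in% "DEU"

# Default combination ('&'): records fulfilling both conditions.
dataAnd <- data[selSex & selCountry, ]

# Custom operator "|": records fulfilling at least one condition.
dataOr <- data[selSex | selCountry, ]
```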

keepNA

Logical, if TRUE (by default), missing values in var are retained. If set to FALSE, missing values are ignored for all filters. A keepNA specified within filters takes precedence over this parameter.

returnAll

Logical:

  • if FALSE (by default): the data for only the filtered records is returned.

  • if TRUE: the full data is returned. Records are flagged based on the filter conditions in a new column, varNew (if specified) or 'keep' otherwise, containing TRUE if the record fulfills all conditions and FALSE otherwise.

verbose

Logical, if TRUE (FALSE by default), progress messages are printed in the current console. For the visualizations, progress messages during download of subject-specific reports are displayed in the browser console.

labelVars

Named character vector containing variable labels.

labelData

(optional) String with label for input data, that will be included in progress messages.

Value

If returnAll

  • is FALSE: data filtered with the specified filters

  • is TRUE: data with the additional column: keep or varNew (if specified), containing TRUE for records which fulfill the specified condition(s) and FALSE otherwise.

The output contains the additional attribute msg, which contains a message describing the filtered records.
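
A short sketch of how the flag column and the msg attribute might be inspected. The toy data frame is hypothetical, and the call is guarded so that it only runs if the clinDataReview package is installed; only the documented 'keep' column and 'msg' attribute are used.

```r
# Only run the example if the package is available.
if (requireNamespace("clinDataReview", quietly = TRUE)) {

  # Hypothetical toy data.
  data <- data.frame(SEX = c("F", "M", "F"), AGE = c(70, 80, 65))

  # Return the full data, with the records flagged.
  dataFlagged <- clinDataReview::filterData(
    data = data,
    filters = list(var = "SEX", value = "F"),
    returnAll = TRUE
  )

  # The flag column is named 'keep', because varNew is not specified.
  print(dataFlagged$keep)

  # Message describing the filtered records.
  print(attr(dataFlagged, "msg"))

}
```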

Author(s)

Laure Cougnaud

Examples

library(clinDataReview)
# example dataset
library(clinUtils)

data(dataADaMCDISCP01)
labelVars <- attr(dataADaMCDISCP01, "labelVars")

dataDM <- dataADaMCDISCP01$ADSL

## single filter

# filter with inclusion criteria:
filterData(
	data = dataDM, 
	filters = list(var = "SEX", value = "M"),
	# optional
	labelVars = labelVars, verbose = TRUE
)

# filter with non-inclusion criteria
filterData(
	data = dataDM, 
	filters = list(var = "SEX", value = "M", rev = TRUE), 
	# optional
	labelVars = labelVars, verbose = TRUE
)

# filter based on inequality operator
filterData(
	data = dataDM, 
	filters = list(var = "AGE", value = 75, op = "<="), 
	# optional
	labelVars = labelVars, verbose = TRUE
)

# missing values are retained by default!
dataDMNA <- dataDM
dataDMNA[1 : 2, "AGE"] <- NA
filterData(
	data = dataDMNA, 
	filters = list(var = "AGE", value = 75, op = "<="), 
	# optional
	labelVars = labelVars, verbose = TRUE
)

# apply the condition and also remove records with missing values in the variable
filterData(
	data = dataDMNA, 
	filters = list(var = "AGE", value = 75, op = "<=", keepNA = FALSE), 
	# optional
	labelVars = labelVars, verbose = TRUE
)

# retain only missing values
filterData(
	data = dataDMNA, 
	filters = list(var = "AGE", value = NA, keepNA = TRUE), 
	# optional
	labelVars = labelVars, verbose = TRUE
)

# only remove records with missing values (no condition on the values)
filterData(
	data = dataDMNA, 
	filters = list(var = "AGE", keepNA = FALSE), 
	# optional
	labelVars = labelVars, verbose = TRUE
)


## multiple filters

# by default the records fulfilling all conditions are retained ('AND')
filterData(
	data = dataDM, 
	filters = list(
		list(var = "AGE", value = 75, op = "<="),
		list(var = "SEX", value = "M")
	), 
	# optional
	labelVars = labelVars, verbose = TRUE
)

# custom operator:
filterData(
	data = dataDM, 
	filters = list(
		list(var = "AGE", value = 75, op = "<="),
		"|",
		list(var = "SEX", value = "M")
	), 
	# optional
	labelVars = labelVars, verbose = TRUE
)

# filter by group

# only retain adverse event records with worst-case severity
dataAE <- dataADaMCDISCP01$ADAE
dataAE$AESEV <- factor(dataAE$AESEV, levels = c("MILD", "MODERATE", "SEVERE"))
dataAE$AESEVN <- as.numeric(dataAE$AESEV)
nrow(dataAE)
dataAEWorst <- filterData(
	data = dataAE,
	filters = list(
		var = "AESEVN",		
		valueFct = max,
		varsBy = c("USUBJID", "AEDECOD"),
		keepNA = FALSE
	),
	# optional
	labelVars = labelVars, verbose = TRUE
)
nrow(dataAEWorst)

# post-processing function
# keep subjects with at least one severe AE:
dataSubjectWithSevereAE <- filterData(
  data = dataAE,
  filters = list(
    var = "AESEV",		
    value = "SEVERE",
    varsBy = "USUBJID",
    postFct = any
  ),
  # optional
  labelVars = labelVars, verbose = TRUE
)

# for each laboratory parameter: keep only subjects which have at least one
# measurement classified as low or high
dataLB <- subset(dataADaMCDISCP01$ADLBC, !grepl("change", PARAM))
dataLBFiltered <- filterData(
  data = dataLB,
  filters = list(
    var = "LBNRIND",		
    value = c("LOW", "HIGH"),
    varsBy = c("PARAMCD", "USUBJID"),
    postFct = any
  ),
  # optional
  labelVars = labelVars, verbose = TRUE
)

clinDataReview documentation built on March 7, 2023, 5:13 p.m.