filterData: Filter a dataset based on specified filters.
In clinDataReview: Clinical Data Review Tool

View source: R/dataManipulation-filterData.R

Description

A dataset can be filtered:

on a specific value of interest (value parameter)
on a function of a variable (valueFct parameter, e.g. the maximum of the variable)
to retain only non-missing values of a variable (keepNA set to FALSE)
by groups (varsBy parameter)

Note that by default, records with missing values in the filtering variable are retained. This differs from the default behaviour in R, where e.g. subset() drops the records for which the condition evaluates to NA. To exclude records with missing values, set the keepNA parameter to FALSE.
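
A minimal base-R sketch of this difference, on a made-up data frame `df`. It only illustrates the documented keepNA convention next to subset(); it is not the filterData implementation.

```r
# Toy data (made up): two of the four ages are missing
df <- data.frame(id = 1:4, AGE = c(70, NA, 80, NA))
cond <- df$AGE <= 75  # TRUE, NA, FALSE, NA

# base R subset(): rows where the condition is NA are dropped
baseRes <- subset(df, AGE <= 75)

# keepNA = TRUE convention: records with a missing AGE are retained
keepNATrue <- df[cond | is.na(df$AGE), ]

# keepNA = FALSE: records with a missing AGE are excluded
keepNAFalse <- df[!is.na(df$AGE) & cond, ]
```

Indexing directly with `df[cond, ]` would behave differently again: it returns rows filled with NA for the missing comparisons, which is why the NA handling is made explicit in both masks.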

Usage

filterData(
  data,
  filters,
  keepNA = TRUE,
  returnAll = FALSE,
  verbose = FALSE,
  labelVars = NULL,
  labelData = "data"
)

Arguments

`data`	Data.frame with data.
`filters`	A single filter or a list of filters. Each filter is a list containing:
  'var': String with the variable from `data` to filter on.
  'value': (optional) Vector with the values of `var` to consider/keep.
  'valueFct': (optional) Function (or string with the name of this function) applied on `var` to extract the value to consider. For example, `valueFct = max` extracts the records with the maximum value of the variable.
  'op': (optional) String with the operator used to retain records based on `value`. If not specified, the inclusion operator '%in%' is used, so records with `var` in `value` are retained.
  'rev': (optional) Logical; if TRUE (FALSE by default), the filtering condition based on `value`/`valueFct` is reversed.
  'keepNA': (optional) Logical; if TRUE (by default), missing values in `var` are retained. If not specified, the general `keepNA` parameter is used.
  'varsBy': (optional) Character vector with the variables in `data` defining the groups to filter by.
  'postFct': (optional) Function (or string with the name of this function) for post-processing the results of the filtering criteria (TRUE/FALSE for each record). This function should return TRUE/FALSE, either for each record or once for all considered records. For example, `postFct = any` combined with `varsBy = "group"` retains all groups which contain at least one record that fulfills the criteria.
  'varNew': (optional) String with the name of a new variable containing the results of the filtering criteria (as TRUE/FALSE).
  'labelNew': (optional) String with the label for the `varNew` variable.
If a list of filters is specified, each filter is executed independently on the entire dataset to identify the records to retain for that filtering condition. The resulting selections are then combined with a logical operator ('&' by default, i.e. an 'AND' condition). A custom logical operator can be specified between the lists describing the filters, for example: `list(list(var = "SEX", value = "F"), "|", list(var = "COUNTRY", value = "DEU"))`.
`keepNA`	Logical; if TRUE (by default), missing values in `var` are retained. If FALSE, missing values are excluded for all filters. A 'keepNA' element specified within `filters` takes precedence over this parameter.
`returnAll`	Logical. If FALSE (by default), only the filtered records of `data` are returned. If TRUE, the full `data` is returned, and records are flagged based on the `filters` condition in a new column, `varNew` (if specified) or 'keep' otherwise, containing TRUE if the record fulfills the filtering condition(s) and FALSE otherwise.
`verbose`	Logical; if TRUE (FALSE by default), progress messages are printed in the current console. For the visualizations, progress messages during the download of subject-specific reports are displayed in the browser console.
`labelVars`	Named character vector containing variable labels.
`labelData`	(optional) String with label for input `data`, that will be included in progress messages.
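
To make the interplay of 'value', 'op', 'rev' and 'keepNA', and the combination of several filters, more concrete, here is a minimal base-R sketch. The helper `filterMaskSketch()` is hypothetical: it mimics the semantics documented above on made-up data and is not part of clinDataReview.

```r
# Hypothetical helper (not part of clinDataReview): TRUE/FALSE mask for one
# filter, mimicking the documented 'value', 'op', 'rev' and 'keepNA' elements
filterMaskSketch <- function(data, var, value, op = "%in%",
                             rev = FALSE, keepNA = TRUE) {
  x <- data[[var]]
  # look up the operator from its name, e.g. "%in%" or "<="
  opFct <- match.fun(op)
  mask <- opFct(x, value)
  # reverse the value-based condition if requested
  if (rev) mask <- !mask
  # records with a missing 'var' are kept or dropped according to keepNA
  mask[is.na(x)] <- keepNA
  mask
}

# Toy data (made up)
df <- data.frame(
  SEX = c("M", "F", "M", "F"),
  AGE = c(70, 80, 85, NA)
)

mAge <- filterMaskSketch(df, "AGE", 75, op = "<=")  # TRUE FALSE FALSE TRUE
mSex <- filterMaskSketch(df, "SEX", "M")            # TRUE FALSE TRUE FALSE

# each filter is evaluated on the full data, then the masks are combined
andRes <- df[mAge & mSex, ]  # default: 'AND'
orRes  <- df[mAge | mSex, ]  # custom "|" operator
```

Resolving the operator name with `match.fun()` lets any binary comparison operator (e.g. ">" or "!=") be passed as a string, in the same spirit as the 'op' element.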

Value

If returnAll is:

FALSE: the data filtered with the specified filters;
TRUE: the full data with an additional column, keep (or varNew, if specified), containing TRUE for records which fulfill the specified condition(s) and FALSE otherwise.

The output also has an attribute, msg, containing a message describing the filtered records.
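
The two output shapes can be sketched in base R, assuming a TRUE/FALSE mask has already been computed for made-up data. Only the column name 'keep' comes from the documentation above; the rest is illustrative.

```r
# Toy data (made up) and a precomputed filtering mask
df <- data.frame(id = 1:4, SEX = c("M", "F", "M", "F"))
mask <- df$SEX %in% "M"

# returnAll = FALSE: only the records fulfilling the condition
filtered <- df[mask, ]

# returnAll = TRUE: all records, with the flag stored in a new column
# ('keep' by default, or the name given in 'varNew')
flagged <- df
flagged$keep <- mask
```

With filterData itself, the accompanying message can be read from the result with `attr(result, "msg")`.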

Author(s)

Laure Cougnaud

Examples

library(clinDataReview)
library(clinUtils)

data(dataADaMCDISCP01)
labelVars <- attr(dataADaMCDISCP01, "labelVars")

dataDM <- dataADaMCDISCP01$ADSL

## single filter

# filter with inclusion criteria:
filterData(
	data = dataDM, 
	filters = list(var = "SEX", value = "M"),
	# optional
	labelVars = labelVars, verbose = TRUE
)

# filter with non-inclusion criteria
filterData(
	data = dataDM, 
	filters = list(var = "SEX", value = "M", rev = TRUE), 
	# optional
	labelVars = labelVars, verbose = TRUE
)

# filter based on inequality operator
filterData(
	data = dataDM, 
	filters = list(var = "AGE", value = 75, op = "<="), 
	# optional
	labelVars = labelVars, verbose = TRUE
)

# missing values are retained by default!
dataDMNA <- dataDM
dataDMNA[1 : 2, "AGE"] <- NA
filterData(
	data = dataDMNA, 
	filters = list(var = "AGE", value = 75, op = "<="), 
	# optional
	labelVars = labelVars, verbose = TRUE
)

# exclude records with missing values in the filtering variable
filterData(
	data = dataDMNA, 
	filters = list(var = "AGE", value = 75, op = "<=", keepNA = FALSE), 
	# optional
	labelVars = labelVars, verbose = TRUE
)

# retain only missing values
filterData(
	data = dataDMNA, 
	filters = list(var = "AGE", value = NA, keepNA = TRUE), 
	# optional
	labelVars = labelVars, verbose = TRUE
)

# exclude records with missing values (no value-based condition)
filterData(
	data = dataDMNA, 
	filters = list(var = "AGE", keepNA = FALSE), 
	# optional
	labelVars = labelVars, verbose = TRUE
)


## multiple filters

# by default the records fulfilling all conditions are retained ('AND')
filterData(
	data = dataDM, 
	filters = list(
		list(var = "AGE", value = 75, op = "<="),
		list(var = "SEX", value = "M")
	), 
	# optional
	labelVars = labelVars, verbose = TRUE
)

# custom operator: records fulfilling at least one condition are retained ('OR')
filterData(
	data = dataDM, 
	filters = list(
		list(var = "AGE", value = 75, op = "<="),
		"|",
		list(var = "SEX", value = "M")
	), 
	# optional
	labelVars = labelVars, verbose = TRUE
)

# filter by group

# only retain the adverse event records with the worst-case severity,
# per subject and adverse event term
dataAE <- dataADaMCDISCP01$ADAE
dataAE$AESEV <- factor(dataAE$AESEV, levels = c("MILD", "MODERATE", "SEVERE"))
dataAE$AESEVN <- as.numeric(dataAE$AESEV)
nrow(dataAE)
dataAEWorst <- filterData(
	data = dataAE,
	filters = list(
	var = "AESEVN",
		valueFct = max,
		varsBy = c("USUBJID", "AEDECOD"),
		keepNA = FALSE
	),
	# optional
	labelVars = labelVars, verbose = TRUE
)
nrow(dataAEWorst)

# post-processing function
# keep all records of the subjects with at least one severe AE:
dataSubjectWithSevereAE <- filterData(
  data = dataAE,
  filters = list(
    var = "AESEV",
    value = "SEVERE",
    varsBy = "USUBJID",
    postFct = any
  ),
  # optional
  labelVars = labelVars, verbose = TRUE
)

# for each laboratory parameter, keep only the subjects with at least one
# measurement classified as low or high
dataLB <- subset(dataADaMCDISCP01$ADLBC, !grepl("change", PARAM))
dataLBFiltered <- filterData(
  data = dataLB,
  filters = list(
    var = "LBNRIND",
    value = c("LOW", "HIGH"),
    varsBy = c("PARAMCD", "USUBJID"),
    postFct = any
  ),
  # optional
  labelVars = labelVars, verbose = TRUE
)
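
The two group-based examples above (worst-case severity via valueFct with varsBy, and postFct = any) can be mimicked with base R's ave(), which applies a function within groups and returns a vector aligned with the input. This is a sketch on made-up data, assuming the semantics described in the Arguments section; it is not how filterData is implemented.

```r
# Toy adverse event data (made up)
ae <- data.frame(
  USUBJID = c("S1", "S1", "S1", "S2", "S2"),
  AEDECOD = c("HEADACHE", "HEADACHE", "NAUSEA", "HEADACHE", "HEADACHE"),
  AESEV   = c("MILD", "SEVERE", "MODERATE", "MODERATE", NA)
)
ae$AESEVN <- as.numeric(factor(ae$AESEV, levels = c("MILD", "MODERATE", "SEVERE")))

# valueFct = max, varsBy = c("USUBJID", "AEDECOD"), keepNA = FALSE:
# keep records whose severity equals the maximum within their group
# (a group containing only missing values would give -Inf with a warning)
grp <- paste(ae$USUBJID, ae$AEDECOD)  # one key per observed group
grpMax <- ave(ae$AESEVN, grp, FUN = function(x) max(x, na.rm = TRUE))
keepWorst <- !is.na(ae$AESEVN) & ae$AESEVN == grpMax
aeWorst <- ae[keepWorst, ]

# value = "SEVERE", varsBy = "USUBJID", postFct = any:
# keep all records of subjects with at least one severe event
isSevere <- ae$AESEV %in% "SEVERE"
subjSevere <- ave(isSevere, ae$USUBJID, FUN = any)
aeSubjSevere <- ae[subjSevere, ]
```

Building a single grouping key with paste() keeps ave() from creating empty combinations of the two grouping variables, on which max() would warn.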

clinDataReview documentation built on April 12, 2025, 1:14 a.m.