
View source: R/rc_pool.R

rc_pool        R Documentation

Pools columns for aggregated analysis of fields with like data

Description

For each variable root provided in var_roots, all columns of record_data whose names contain that root are pooled into a single column, which is appended to the end of the data frame. To see which columns were pooled, run attributes([YOUR_DATA_FRAME])$pooled_vars on the returned data frame.

Exact (i.e. full-name) matching can also be performed with the fields_list argument: each field name it lists is matched against the complete column names of record_data. If both var_roots and fields_list are provided, fields_list is applied first.

If the columns selected for pooling contain more than one data point in the same row, only the first data point is used. Such overlap usually means pooling is inappropriate, and the pooled columns should be reviewed. If pooling is still desirable and all data points should be kept, use make_repeat = TRUE (the default, per Usage) to convert the pooled variables into repeats.
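The pooling behaviour described above can be sketched in base R. This is a minimal illustration, not the rctools implementation: the helper name pool_root_sketch(), the name of the new column, and the warning text are hypothetical. It assumes that "containing the root" means a regular-expression match via grep(), and that "the first data point" means the first non-missing value, reading left to right across the matched columns.

```r
# Hypothetical helper, NOT the rctools implementation. It pools every column
# whose name matches the regular expression `root` into one new column that is
# appended at the end, keeping the first non-missing value in each row.
pool_root_sketch <- function(df, root, new_col = root) {
  cols <- grep(root, names(df), value = TRUE)
  if (length(cols) == 0) return(df)
  vals <- df[cols]

  # More than one non-missing value in a row means pooling loses data.
  n_present <- rowSums(!is.na(vals))
  if (any(n_present > 1)) {
    warning("Rows with more than one data point: ",
            paste(which(n_present > 1), collapse = ", "),
            ". Keeping the first; review the pooled columns.")
  }

  # For each row, take the first non-missing value, scanning left to right.
  first_val <- unname(apply(vals, 1, function(r) {
    hit <- which(!is.na(r))
    if (length(hit) == 0) NA else r[[hit[1]]]
  }))

  # Record which source columns fed each pooled column.
  pooled <- attr(df, "pooled_vars")
  if (is.null(pooled)) pooled <- list()
  pooled[[new_col]] <- cols

  df[[new_col]] <- first_val
  attr(df, "pooled_vars") <- pooled
  df
}

visits <- data.frame(
  record_id      = 1:3,
  visit_1_weight = c(70, NA, 82),
  visit_2_weight = c(NA, 65, 80)
)
pooled_df <- pool_root_sketch(visits, "weight")  # warns about row 3
pooled_df$weight                # 70 65 82
attr(pooled_df, "pooled_vars")  # list(weight = c("visit_1_weight", "visit_2_weight"))
```

Row 3 shows the case the Description warns about: both visits hold a weight, so the sketch keeps 82, drops 80, and raises a warning instead of failing silently. apply() converts the selected columns to a matrix, which is only safe here because the columns are assumed to be numeric.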

Usage

rc_pool(
  record_data,
  var_roots = NULL,
  fields_list = NULL,
  make_repeat = TRUE,
  id_field = getOption("redcap_bundle")$id_field
)

Arguments

record_data

Data frame. Record data exported from REDCap. For the purposes of this function, only quantitative data are kept.

var_roots

Character. Vector of strings to search for within column names of record_data. For each variable root provided, all column names containing the root will be pooled into a single column. Regular expressions may be used.
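Because each root is searched for within column names, a short root can match more columns than intended. The snippet below assumes that root matching behaves like base R's grep() (an assumption about rc_pool's internals; the column names are hypothetical) and shows how anchoring the regular expression narrows the selection.

```r
# Hypothetical column names; assumes roots are matched the way grep() does.
cols <- c("record_id", "visit_1_weight", "visit_2_weight",
          "birth_weight", "weight_unit")

# A bare root matches any name that contains it anywhere.
grep("weight", cols, value = TRUE)
# "visit_1_weight" "visit_2_weight" "birth_weight" "weight_unit"

# An anchored regular expression keeps only the per-visit weight columns.
grep("^visit_[0-9]+_weight$", cols, value = TRUE)
# "visit_1_weight" "visit_2_weight"
```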

fields_list

List. A named list in the format list(new_column = c("old", "column", "names")), where each name is a new pooled column and each character vector gives the existing columns to pool into it. Unlike var_roots, these column names are matched exactly. If both var_roots and fields_list are provided, fields_list is applied first.
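The difference between exact fields_list matching and root matching can be illustrated in base R. This sketch assumes exact matching is equivalent to %in% on column names and root matching to grep(); the column names and the early_weight entry are hypothetical.

```r
# Hypothetical column names and fields_list entry.
cols <- c("record_id", "visit_1_weight", "visit_2_weight",
          "visit_10_weight", "visit_11_weight")
fields_list <- list(early_weight = c("visit_1_weight", "visit_2_weight"))

# Exact matching: only names listed verbatim are selected.
exact <- cols[cols %in% fields_list$early_weight]
exact
# "visit_1_weight" "visit_2_weight"

# Substring (root) matching on "visit_1" also picks up visits 10 and 11.
grep("visit_1", cols, value = TRUE)
# "visit_1_weight" "visit_10_weight" "visit_11_weight"

# Listed fields that are absent from the data are easy to detect.
setdiff(fields_list$early_weight, cols)
# character(0)
```

Exact names are the safer choice when column names share prefixes, as visit numbers 1, 10 and 11 do here.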

make_repeat

Logical. Determines whether the pooled columns are converted into repeat instruments. Default is TRUE. This option is useful when the columns to be pooled contain more than one data point in the same row. In the future, this will be applied automatically on an as-needed basis.
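Conceptually, converting pooled columns into repeats keeps every data point by giving each one its own row, numbered within its record, rather than keeping only the first value. The base R sketch below is a hypothetical illustration of that idea only: stack_as_repeats() and its source and instance columns are assumptions, and it does not reproduce the REDCap repeat-instrument layout that rctools may actually produce.

```r
# Hypothetical sketch, NOT rctools' make_repeat logic: stack the pooled columns
# so every non-missing value gets its own row, numbered per record.
stack_as_repeats <- function(df, cols, id_field, new_col) {
  # One long piece per source column: record ID, column name, value.
  pieces <- lapply(cols, function(col) {
    data.frame(id = df[[id_field]], source = col, value = df[[col]],
               stringsAsFactors = FALSE)
  })
  long <- do.call(rbind, pieces)

  # Keep every non-missing data point, grouped by record (order() is stable,
  # so values stay in source-column order within a record).
  long <- long[!is.na(long$value), ]
  long <- long[order(long$id), ]

  # Number each record's data points 1, 2, ... like repeat instances.
  long$instance <- ave(seq_along(long$id), long$id, FUN = seq_along)

  names(long)[names(long) == "id"] <- id_field
  names(long)[names(long) == "value"] <- new_col
  rownames(long) <- NULL
  long
}

visits <- data.frame(
  record_id      = 1:3,
  visit_1_weight = c(70, NA, 82),
  visit_2_weight = c(NA, 65, 80)
)
repeats <- stack_as_repeats(visits, c("visit_1_weight", "visit_2_weight"),
                            "record_id", "weight")
repeats
```

Record 3 ends up with two rows (instances 1 and 2), so the value 80, which first-value pooling would drop, is preserved.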

id_field

Character. Name of the field that serves as the record identifier (the 'record_id' field). Defaults to getOption("redcap_bundle")$id_field.

Details

This function is intended to compensate for inefficient REDCap project design, in which the same measurement has been assigned to multiple variables. For example, if the variables "visit_1_weight" and "visit_2_weight" were used to collect weight at different visits instead of re-using a single variable, they can be pooled into one column with the var_root "weight". A single pooled column is often more convenient for analysis.

Author(s)

Marcus Lehr


chillywings/rctools documentation built on Aug. 9, 2024, 11:52 p.m.