
Install the prostateredcap package

This step is only required once. To install (or update) the prostateredcap package from GitHub, use the remotes package:

install.packages("remotes")  # skip if 'remotes' package is already installed
remotes::install_github("stopsack/prostateredcap")
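If this step is part of a script that may be run more than once, the installation of remotes can be made conditional. A minimal sketch using base R's requireNamespace(); the repository name is the one given above:

```r
# Install 'remotes' only if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# Install (or update) prostateredcap from GitHub
remotes::install_github("stopsack/prostateredcap")
```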

An example dataset

The prostateredcap R package contains an example dataset of the prostate cancer database in the same format as it would be exported from REDCap as a "labeled CSV." All data in the example dataset are designed to mimic real clinical data but do not correspond to any real patients.

First, load the dplyr package for data handling, and take a look at the raw example dataset provided as part of the prostateredcap package.

library(dplyr)

raw_data <- system.file("extdata",
                        "SampleGUPIMPACTDatab_DATA_LABELS_2021-05-26.csv",
                        package = "prostateredcap")

readr::read_csv(file = raw_data) %>% 
  print(max_extra_cols = 0)  # do not print all other columns

The dataset, as a typical REDCap export, contains multiple rows per person, with each of the REDCap "forms" (baseline data, sample data, ...) in a separate row and blank values for variables not part of that "form."
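To illustrate this layout, here is a small sketch with made-up data. The column names (record_id, form, dob, sample_id) are hypothetical and are not the actual fields of the REDCap export; the point is only to show how one row per "form" leaves other variables blank, and how such data can be separated by form.

```r
library(dplyr)

# Hypothetical toy data; column names are illustrative, not the real REDCap fields
toy_export <- tibble::tibble(
  record_id = c(1, 1, 1, 2, 2),
  form      = c("baseline", "sample", "sample", "baseline", "sample"),
  dob       = as.Date(c("1950-01-15", NA, NA, "1962-07-01", NA)),
  sample_id = c(NA, "S-001", "S-002", NA, "S-003")
)

# One row per person: keep the baseline rows and the baseline variables
toy_baseline <- toy_export %>%
  filter(form == "baseline") %>%
  select(record_id, dob)

# One row per sample: keep the sample rows and the sample variables
toy_samples <- toy_export %>%
  filter(form == "sample") %>%
  select(record_id, sample_id)
```

In practice, this separation is handled by the package, as shown in the next section.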

Loading the data

We will load the prostateredcap package, read in the same dataset again using load_prostate_redcap(), and display its contents.

library(prostateredcap)

pts_smp <- load_prostate_redcap(raw_data)

Warnings are expected at this step: the example data, which cover only 8 patients, do not contain all tumor/stage combinations.

load_prostate_redcap() has returned a list with two separate data elements: pts, the patient-level data, and smp, the sample-level data.

The data in pts_smp are preprocessed. For example, rather than containing data on date of birth and date of diagnosis, the pts dataset contains age at diagnosis (age_dx, in years).

pts_smp$pts

pts_smp$smp
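As an illustration of the kind of preprocessing described above, age at diagnosis can be derived from two dates. This is a sketch under stated assumptions: the dates are made up, the variable names dob and dx_dt are hypothetical, and dividing by 365.25 days is one common convention that may differ from what the package does internally.

```r
# Hypothetical dates for two made-up patients
dob   <- as.Date(c("1950-01-15", "1962-07-01"))
dx_dt <- as.Date(c("2015-01-15", "2020-01-01"))

# Age at diagnosis in years, assuming an average year length of 365.25 days
age_dx <- as.numeric(difftime(dx_dt, dob, units = "days")) / 365.25
round(age_dx, 1)
```

Storing an age rather than two dates also removes two potential identifiers from the analysis dataset.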

By default, the argument deidentify = TRUE is set in load_prostate_redcap(). Thus, any identifiers except the sample IDs, which are needed to merge in molecular data and are shared on cBioPortal, have been removed from the returned datasets.
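To see which variables deidentification removes, one option is to load the data a second time with deidentify = FALSE and compare column names. This sketch assumes that raw_data and pts_smp exist as created above; pts_smp_id is a name introduced here for illustration, and which columns differ is not stated in this document.

```r
library(prostateredcap)

# Load again, this time keeping identifiers (use only in a secure environment)
pts_smp_id <- load_prostate_redcap(raw_data, deidentify = FALSE)

# Columns present only in the identified patient-level data
setdiff(names(pts_smp_id$pts), names(pts_smp$pts))
```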

Performing quality control

To help ensure data quality, the prostateredcap package contains the function check_prostate_redcap(), which further processes the output of load_prostate_redcap() (in our example, pts_smp).

Pass the data to check_prostate_redcap() with default parameters and review the number of records that do not pass checks:

pts_smp_qcd <- pts_smp %>%
  check_prostate_redcap(recommended_only = TRUE)

pts_smp_qcd$qc_pts

Running analyses

The data are now ready to be used for analyses. For example, the sample data and patient data can be merged into one data frame.

inner_join(pts_smp_qcd$pts,
           pts_smp_qcd$smp,
           by = "ptid") %>%
  rmarkdown::paged_table()  # print formatted version
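The choice of join determines which patients appear in the merged data. The following sketch uses made-up toy tables (toy_pts and toy_smp are hypothetical names, as is the sample_id column) rather than the package output, to show that inner_join() returns one row per sample and drops patients without any sample, while left_join() retains them.

```r
library(dplyr)

# Hypothetical toy versions of the patient and sample tables
toy_pts <- tibble::tibble(ptid = c(1, 2, 3), age_dx = c(61.2, 70.5, 58.9))
toy_smp <- tibble::tibble(ptid = c(1, 1, 2),
                          sample_id = c("S-001", "S-002", "S-003"))

# inner_join() keeps only patients with at least one sample
toy_joined <- inner_join(toy_pts, toy_smp, by = "ptid")

# left_join() keeps all patients; those without samples get NA in sample columns
toy_all <- left_join(toy_pts, toy_smp, by = "ptid")

# Number of samples per patient in the merged data
toy_joined %>%
  count(ptid, name = "n_samples")
```

Because patient-level variables are repeated on every sample row after merging, patient-level summaries are best computed on the patient table or after reducing the merged data to one row per patient.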

See the data dictionary of all derived variables recommended for analyses.



stopsack/prostateredcap documentation built on June 3, 2023, 12:51 a.m.