load_prostate_redcap: Load MSK-IMPACT Prostate REDCap Labeled CSV File

load_prostate_redcap    R Documentation

Description

Loads, merges, reformats, corrects, and labels the REDCap file used for the MSK-IMPACT Prostate clinical database. Passing the returned list to check_prostate_redcap as the next step is recommended.

Usage

load_prostate_redcap(
  labeled_csv,
  deidentify = TRUE,
  keep_also = list(baseline = NULL, sample = NULL, freeze = NULL)
)

Arguments

labeled_csv

CSV file with labels, exported from REDCap. Must be the labeled version and must contain dates in order to derive time intervals.

deidentify

Whether to de-identify the returned data set using deidentify_prostate_redcap. Defaults to TRUE. Should only be disabled if additional data need to be merged by identifiers first; in that case, call deidentify_prostate_redcap separately afterwards.

keep_also

Optional. Additional patient-level variables to keep without editing. As applicable, they need to be de-identified manually. Provide as a list with one vector of variable names per form, for example for the baseline and freeze forms: list(baseline = c("var1", "var2"), freeze = "varX").

Details

The following edits and assumptions are made:

  1. Potentially incomplete date variables are converted to date format, using guessdate.

  2. Various missingness indicators in strings and factors, c("Unknown / Not Reported", "N/A", "NA", "Unknown", "X", "x"), are converted to NA.

  3. "Undetectable" PSA is set to 0, PSA ">x" is set to x + 1, PSA "a-b" (e.g., 4.5-4.7) is set to the mean of the two values.

  4. Clinical T and N stage variables are set to missing if M1.

  5. Event dates and follow-up time for metastases (met_date), castration resistance (crpc_date), and death are set:

    • Event date is the last clinic visit (lastvisit) if a CRPC/metastases event has not occurred.

    • Event date is the last follow up/contact (lastfu) if last known survival status is alive.

    • If stage is M1 and the recorded metastasis date differs from the diagnosis date by no more than 1 month, met_date is set to the diagnosis date (dxdate).

    • If the sample is a variant histology (e.g., neuroendocrine), the castration resistance date (crpc_date) is the date of diagnosis and the event indicator for survival analyses (event_crpc) is NA.

    • Time intervals for these three survival outcomes are calculated from the time of sequencing. For late-entry survival models, time intervals from diagnosis to sequencing and from sample/biopsy to sequencing are also provided.

  6. Disease extent at sampling, distinguishing CRPC from castration-sensitive disease, is based on the sample date and the date of castration resistance. If the sample was obtained before the CRPC date, or CRPC did not occur, the sample is from castration-sensitive disease by definition.
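
To make rules 2, 3, and 6 above more concrete, here is a minimal base R sketch. The helpers recode_missing, parse_psa, and classify_extent are hypothetical names invented for illustration; they are not functions of the prostateredcap package and may differ from how the package implements these steps. In particular, representing "CRPC did not occur" as a logical crpc_occurred argument, and labeling samples taken on or after the CRPC date as "CRPC", are assumptions.

```r
# Hypothetical helpers for illustration only; not part of prostateredcap.

# Rule 2: recode the listed missingness indicators to NA.
missing_codes <- c("Unknown / Not Reported", "N/A", "NA", "Unknown", "X", "x")

recode_missing <- function(x) {
  x <- as.character(x)
  x[x %in% missing_codes] <- NA_character_
  x
}

# Rule 3: convert free-text PSA entries to numeric values.
parse_psa <- function(x) {
  x <- trimws(as.character(x))
  out <- rep(NA_real_, length(x))

  undetectable <- !is.na(x) & tolower(x) == "undetectable"
  above <- !is.na(x) & grepl("^>\\s*[0-9.]+$", x)
  ranged <- !is.na(x) & grepl("^[0-9.]+\\s*-\\s*[0-9.]+$", x)
  plain <- !is.na(x) & grepl("^[0-9.]+$", x)

  out[undetectable] <- 0                                       # "Undetectable" -> 0
  out[above] <- as.numeric(sub("^>\\s*", "", x[above])) + 1    # ">x" -> x + 1
  bounds <- strsplit(x[ranged], "\\s*-\\s*")
  out[ranged] <- vapply(bounds, function(b) mean(as.numeric(b)), numeric(1))  # "a-b" -> mean
  out[plain] <- as.numeric(x[plain])
  out
}

# Rule 6: castration-sensitive if CRPC never occurred or the sample predates it.
# Assumption: samples on or after the CRPC date are labeled "CRPC".
classify_extent <- function(sample_date, crpc_date, crpc_occurred) {
  ifelse(!crpc_occurred | sample_date < crpc_date,
         "castration-sensitive", "CRPC")
}

cleaned <- recode_missing(c("T2", "Unknown", "x", "N/A"))
psa <- parse_psa(c("Undetectable", ">100", "4.5-4.7", "6.2", NA))
extent <- classify_extent(
  sample_date = as.Date(c("2020-01-01", "2022-01-01", "2022-01-01")),
  crpc_date = as.Date(c("2021-01-01", "2021-01-01", "2021-01-01")),
  crpc_occurred = c(TRUE, TRUE, FALSE)
)
```

Entries that match none of the PSA patterns are left as NA in this sketch, so they remain visible for manual review.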

Value

List of three labeled tibbles (data frames):

  • pts: Patient-level data

  • smp: Sample-level data

  • trt: Treatment data

Access variable labels in RStudio via View or using attr(., "label").

The warning message, Duplicated column names deduplicated, is expected due to the design of the REDCap dataset. Another warning message that a factor does not contain all levels is also possible.
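
As a small illustration of reading labels with attr(., "label"), the following sketch uses a toy data frame in place of the returned tibbles. The variable psa and its label text are made up for this example and are not taken from the package.

```r
# Toy data frame standing in for, e.g., pts_smp$pts; variable and label are invented.
toy <- data.frame(psa = c(4.2, 10.5), id = c(1, 2))
attr(toy$psa, "label") <- "PSA at diagnosis (ng/mL)"

# Read the label of a single variable:
psa_label <- attr(toy$psa, "label")

# Collect the labels of all variables (NULL for variables without a label):
all_labels <- lapply(toy, attr, which = "label")
```

The same attr() call should work on any column of the returned tibbles that carries a "label" attribute.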

See Also

Overview of analysis-ready data elements: https://stopsack.github.io/prostateredcap/articles/dataelements.html

Examples

# Get path to toy data provided by the package:
example_csv_file <- system.file("extdata",
  "SampleGUPIMPACTDatab_DATA_LABELS_2021-05-26.csv",
  package = "prostateredcap",
  mustWork = TRUE)

# Load data:
pts_smp <- load_prostate_redcap(labeled_csv = example_csv_file)

# Access patient-level data:
pts_smp$pts

# Access sample-level data:
pts_smp$smp

# Access treatment data:
pts_smp$trt

# Pass 'pts_smp' to check_prostate_redcap() next

stopsack/prostateredcap documentation built on June 3, 2023, 12:51 a.m.