View source: R/load_prostate_redcap.R
load_prostate_redcap | R Documentation |
Loads, merges, reformats, corrects, and labels
the REDCap file used for the MSK-IMPACT Prostate clinical database.
It is recommended that the returned list is next processed by check_prostate_redcap.
load_prostate_redcap(
  labeled_csv,
  deidentify = TRUE,
  keep_also = list(baseline = NULL, sample = NULL, freeze = NULL)
)
labeled_csv |
CSV file with labels, exported from REDCap. Must be the labeled version and must contain dates in order to derive time intervals. |
deidentify |
De-identify the returned data set using
|
keep_also |
Optional. Additional patient-level variables to keep without editing.
Where applicable, they need to be de-identified manually.
Provide as list with vectors of variable names for baseline and freeze
forms: |
The following edits and assumptions are made:
Potentially incomplete date variables are converted to date format, using guessdate.
Various missingness indicators in strings and factors, c("Unknown / Not Reported", "N/A", "NA", "Unknown", "X", "x"), are converted to NA.
"Undetectable" PSA is set to 0, PSA ">x" is set to x + 1, and PSA "a-b" (e.g., 4.5-4.7) is set to the mean of the two values.
Clinical T and N stage variables are set to missing if M1.
Event dates and follow-up time for metastases (met_date), castration resistance (crpc_date), and death are set:
Event date is the last clinic visit (lastvisit) if a CRPC/metastases event has not occurred.
Event date is the last follow-up/contact (lastfu) if last known survival status is alive.
If stage is M1 and the recorded metastasis date is no more than 1 month discrepant from the diagnosis date, met_date is set to the diagnosis date (dxdate).
If the sample is a variant histology (e.g., neuroendocrine), the castration resistance date (crpc_date) is the date of diagnosis and the event indicator for survival analyses (event_crpc) is NA.
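The string recoding rules above (missingness indicators and PSA values) can be sketched as follows. This is a minimal illustration in base R, not the package's internal code: the helper names recode_missing and parse_psa are hypothetical, and the regular expressions are an assumption about how such entries might look.

```r
# Hypothetical helpers for illustration; not part of prostateredcap.
missing_codes <- c("Unknown / Not Reported", "N/A", "NA", "Unknown", "X", "x")

# Replace missingness indicators in a character vector with NA
recode_missing <- function(x) {
  x[x %in% missing_codes] <- NA_character_
  x
}

# Convert free-text PSA entries to numeric values
parse_psa <- function(x) {
  x <- trimws(recode_missing(as.character(x)))
  out <- rep(NA_real_, length(x))

  # "Undetectable" is set to 0
  is_undetectable <- !is.na(x) & tolower(x) == "undetectable"
  out[is_undetectable] <- 0

  # ">x" is set to x + 1
  is_greater <- !is.na(x) & grepl("^>\\s*[0-9.]+$", x)
  out[is_greater] <- as.numeric(sub("^>\\s*", "", x[is_greater])) + 1

  # "a-b" is set to the mean of a and b
  is_range <- !is.na(x) & grepl("^[0-9.]+\\s*-\\s*[0-9.]+$", x)
  bounds <- strsplit(x[is_range], "\\s*-\\s*")
  out[is_range] <- vapply(bounds, function(b) mean(as.numeric(b)), numeric(1))

  # Plain numbers are kept unchanged
  is_plain <- !is.na(x) & grepl("^[0-9.]+$", x)
  out[is_plain] <- as.numeric(x[is_plain])

  out
}

psa_values <- parse_psa(c("Undetectable", ">100", "4.5-4.7", "6.2", "Unknown"))
psa_values
```

Entries that match none of the patterns stay NA in this sketch, which makes unexpected formats easy to spot.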
Time intervals for these three survival outcomes are calculated from the time of sequencing. For late-entry survival models, time intervals from diagnosis to sequencing and from sample/biopsy to sequencing are also provided.
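As a rough illustration of such intervals, the sketch below computes follow-up from sequencing and the late-entry time from diagnosis to sequencing on toy data. The column names seqdate and event_date are hypothetical, and days are used as the unit only for simplicity; the units and variable names in the returned data may differ.

```r
# Toy data; 'seqdate' and 'event_date' are hypothetical column names
toy_dates <- data.frame(
  dxdate     = as.Date(c("2015-03-01", "2016-07-15")),
  seqdate    = as.Date(c("2017-01-10", "2016-09-01")),
  event_date = as.Date(c("2019-06-30", "2018-02-20"))
)

# Follow-up from sequencing to the event or censoring date, in days
toy_dates$seq_to_event <- as.numeric(toy_dates$event_date - toy_dates$seqdate)

# Late entry: time from diagnosis to sequencing, in days
toy_dates$dx_to_seq <- as.numeric(toy_dates$seqdate - toy_dates$dxdate)

toy_dates
```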
Disease extent at sampling, distinguishing CRPC from castration-sensitive disease, is based on the sample date and the date of castration resistance. If the sample was obtained before the CRPC date, or CRPC did not occur, the sample is from castration-sensitive disease by definition.
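The classification rule can be sketched on toy data as below. The column name sample_date and the two category labels are assumptions made for illustration; the sketch also does not cover variant histology, where event_crpc is NA.

```r
# Toy data; 'sample_date' is a hypothetical column name
toy_smp <- data.frame(
  sample_date = as.Date(c("2018-01-05", "2019-04-20", "2017-11-11")),
  crpc_date   = as.Date(c("2018-09-01", "2018-12-15", "2020-02-02")),
  event_crpc  = c(1, 1, 0)  # 1 = CRPC occurred, 0 = censored without CRPC
)

# Castration-sensitive if CRPC never occurred or the sample predates CRPC
is_sensitive <- toy_smp$event_crpc == 0 |
  toy_smp$sample_date < toy_smp$crpc_date

toy_smp$extent <- ifelse(is_sensitive, "Castration-sensitive", "CRPC")
toy_smp
```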
List of three labeled tibbles (data frames):
pts: Patient-level data
smp: Sample-level data
trt: Treatment data
Access variable labels in RStudio via View or using attr(., "label").
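A small sketch of reading labels with attr() follows. It uses a toy data frame with an invented label text rather than the returned tibbles, so the column name and label are illustrative only.

```r
# Toy data frame with a "label" attribute on one column
toy_pts <- data.frame(psa = c(4.6, 101), age = c(61, 70))
attr(toy_pts$psa, "label") <- "PSA at diagnosis"  # illustrative label text

# Label of a single variable
attr(toy_pts$psa, "label")

# Labels of all variables; NA where a column has no label
var_labels <- vapply(
  toy_pts,
  function(col) {
    lab <- attr(col, "label")
    if (is.null(lab)) NA_character_ else lab
  },
  character(1)
)
var_labels
```

The same pattern should apply to each element of the returned list, for example the pts tibble.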
The warning message, Duplicated column names deduplicated, is
expected due to the design of the REDCap dataset. Another warning message
that a factor does not contain all levels is also possible.
Overview of analysis-ready data elements: https://stopsack.github.io/prostateredcap/articles/dataelements.html
# Get path to toy data provided by the package:
example_csv_file <- system.file(
  "extdata",
  "SampleGUPIMPACTDatab_DATA_LABELS_2021-05-26.csv",
  package = "prostateredcap",
  mustWork = TRUE
)
# Load data:
pts_smp <- load_prostate_redcap(labeled_csv = example_csv_file)
# Access patient-level data:
pts_smp$pts
# Access sample-level data:
pts_smp$smp
# Access treatment data:
pts_smp$trt
# Pass 'pts_smp' to check_prostate_redcap() next