View source: R/check_functions.R
Check contents of a phenotype file for dbGaP posting
check_pheno(dsfile, ddfile = NULL, na_vals = c("NA", "N/A", "na",
  "n/a"), subj_exp = NULL, subjectID_col = "SUBJECT_ID")
dsfile | Path to the data file on disk
ddfile | Path to the data dictionary file on disk
na_vals | Vector of strings that should be read in as NA/missing in data file (see details of
subj_exp | Vector of expected subject IDs
subjectID_col | Column name for subject-level ID
Because phenotype file contents vary so widely, the only required column checked here is the subject-level ID. Note that dbGaP requests the following variables: (1) those described in the study description and/or study config; (2) affection status, if not already included in the subject consent file; (3) sex; and (4) race/ethnicity/ancestry/heritage.
If a data dictionary is provided (ddfile != NULL), the function additionally checks correspondence between column names in the data file and entries in the data dictionary.
Data dictionary files can be Excel (.xls, .xlsx) or tab-delimited .txt.
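The correspondence check described above can be pictured as a two-way set comparison. The following is a minimal base R sketch of that idea, not the package's actual implementation; the column names, dictionary entries, and result variable names are all hypothetical and invented for illustration.

```r
# Hypothetical inputs (assumptions, not from the package):
# column names found in a data file, and variable names listed in
# the corresponding data dictionary.
ds_cols <- c("SUBJECT_ID", "sex", "race", "bmi")
dd_vars <- c("SUBJECT_ID", "sex", "race", "height")

# Columns present in the data file but absent from the data dictionary
missing_dd_entries <- setdiff(ds_cols, dd_vars)

# Data dictionary entries with no matching column in the data file
extra_dd_entries <- setdiff(dd_vars, ds_cols)

# Report only when there is a discrepancy in either direction
if (length(missing_dd_entries) > 0 || length(extra_dd_entries) > 0) {
  message("Data file and data dictionary do not correspond")
}
```

In this made-up example, `bmi` has no dictionary entry and `height` has no data column, so both directions would be reported.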
pheno_report, a list of the following issues (when present):
flag_nonuniq_subjID | TRUE when subject ID column is not unique, which would require definition of UNIQUEKEY columns in the corresponding data dictionary
dd_errors | Differences in fields between data file and data dictionary
extra_subjects | Subjects in data file missing from
missing_subjects | Subjects in
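As a rough illustration of how the subject-level elements of the report might be derived, here is a base R sketch. It is not the package's code. It assumes that the extra and missing subjects are determined by comparing the subject ID column against `subj_exp`, and the data frame and ID values are hypothetical.

```r
# Hypothetical data file contents (invented for illustration)
ds <- data.frame(
  SUBJECT_ID = c("S1", "S2", "S2", "S4"),
  sex = c("F", "M", "M", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical vector of expected subject IDs
subj_exp <- c("S1", "S2", "S3")
subjectID_col <- "SUBJECT_ID"

ids <- ds[[subjectID_col]]

# TRUE when any subject ID appears more than once
flag_nonuniq_subjID <- anyDuplicated(ids) > 0

# Assumption: IDs in the data file that are not among the expected IDs
extra_subjects <- setdiff(ids, subj_exp)

# Assumption: expected IDs that do not appear in the data file
missing_subjects <- setdiff(subj_exp, ids)
```

Here `S2` is repeated, so the uniqueness flag is set; `S4` is unexpected and `S3` is absent from the data.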