check_pheno: Check phenotype file
In UW-GAC/dbgaptools: Creates and Checks Standard Files for dbGaP submission


View source: R/check_functions.R

Check contents of a phenotype file for dbGaP posting

check_pheno(dsfile, ddfile = NULL, na_vals = c("NA", "N/A", "na", "n/a"),
  subj_exp = NULL, subjectID_col = "SUBJECT_ID")

`dsfile`	Path to the data file on disk
`ddfile`	Path to the data dictionary file on disk
`na_vals`	Vector of strings that should be read in as NA/missing in data file (see details of `read_ds_file`)
`subj_exp`	Vector of expected subject IDs
`subjectID_col`	Column name for subject-level ID
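
As a rough illustration of what `na_vals` controls, the base R sketch below reads a small tab-delimited data file and treats the listed strings as missing. It uses `read.delim` rather than `read_ds_file`, whose internals are not shown here, so the mapping of `na_vals` to `na.strings` is an assumption. The file contents and column names are invented for the example.

```r
# Hypothetical phenotype data written to a temporary tab-delimited file
dsfile <- tempfile(fileext = ".txt")
writeLines(c("SUBJECT_ID\tSEX\tHEIGHT",
             "S1\tM\t170",
             "S2\tN/A\t165",
             "S3\tF\tna"), dsfile)

# Same default as the na_vals argument of check_pheno
na_vals <- c("NA", "N/A", "na", "n/a")

# Assumption: every string in na_vals is read in as NA
ds <- read.delim(dsfile, na.strings = na_vals, stringsAsFactors = FALSE)

# Count missing values per column after the read
n_missing <- colSums(is.na(ds))
```

Strings that are not listed in `na_vals` (for example `"missing"` or `"."`) would be read as ordinary values under this approach, so it is worth confirming that the vector covers every missing-value code used in the data file.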

Because phenotype file contents vary widely, the only required column checked here is the subject-level ID. Note that dbGaP requests the following variables: (1) those described in the study description and/or study config; (2) affection status, if not already included in the subject consent file; (3) sex; and (4) race/ethnicity/ancestry/heritage.

If a data dictionary is provided (`ddfile` is not `NULL`), the function additionally checks the correspondence between column names in the data file and entries in the data dictionary. Data dictionary files can be Excel (.xls, .xlsx) or tab-delimited .txt.
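
The correspondence check described above can be pictured as a two-way set comparison. The sketch below is not the package's implementation; it is a minimal base R approximation, and the column names, the `ddfile` path, and the element names `in_ds_not_dd` and `in_dd_not_ds` are all hypothetical.

```r
# Hypothetical column names from the data file and variable names
# from the data dictionary
ds_cols <- c("SUBJECT_ID", "SEX", "RACE", "HEIGHT")
dd_vars <- c("SUBJECT_ID", "SEX", "RACE", "WEIGHT")

# Hypothetical path; setting this to NULL skips the comparison
ddfile <- "pheno_DD.txt"

dd_errors <- NULL
if (!is.null(ddfile)) {
  dd_errors <- list(
    in_ds_not_dd = setdiff(ds_cols, dd_vars),  # columns lacking a dictionary entry
    in_dd_not_ds = setdiff(dd_vars, ds_cols)   # dictionary entries lacking a column
  )
}
```

Note the use of `is.null()` for the test: in R, comparing a value to `NULL` with `!=` returns a zero-length logical rather than `TRUE` or `FALSE`.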

pheno_report, a list of the following issues (when present):

`flag_nonuniq_subjID`	TRUE when subject ID column is not unique, which would require definition of UNIQUEKEY columns in the corresponding data dictionary
`dd_errors`	Differences in fields between data file and data dictionary
`extra_subjects`	Subjects in data file missing from `subj_exp`
`missing_subjects`	Subjects in `subj_exp` missing from data file
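
To make the subject-level entries concrete, the following base R sketch builds a report-like list from a vector of subject IDs and a vector of expected IDs. It only mirrors the documented element names; the actual logic inside `check_pheno` may differ, and the ID values are invented.

```r
# Hypothetical subject IDs found in the data file (note the repeated "S2")
ds_ids <- c("S1", "S2", "S2", "S4")

# Hypothetical expected subject IDs, as would be passed in subj_exp
subj_exp <- c("S1", "S2", "S3")

pheno_report <- list()

# Flag repeated subject IDs
if (anyDuplicated(ds_ids) > 0) {
  pheno_report$flag_nonuniq_subjID <- TRUE
}

# Compare observed and expected subjects in both directions
extra <- setdiff(ds_ids, subj_exp)
missing <- setdiff(subj_exp, ds_ids)
if (length(extra) > 0) pheno_report$extra_subjects <- extra
if (length(missing) > 0) pheno_report$missing_subjects <- missing
```

Elements are added only when an issue is found, matching the "when present" wording above, so an empty list would indicate that none of these checks raised a problem.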