check_pheno: Check phenotype file


View source: R/check_functions.R

Description

Check contents of a phenotype file for dbGaP posting

Usage

check_pheno(dsfile, ddfile = NULL, na_vals = c("NA", "N/A", "na",
  "n/a"), subj_exp = NULL, subjectID_col = "SUBJECT_ID")

Arguments

dsfile

Path to the data file on disk

ddfile

Path to the data dictionary file on disk

na_vals

Vector of strings that should be read in as NA/missing in data file (see details of read_ds_file)

subj_exp

Vector of expected subject IDs

subjectID_col

Column name for subject-level ID
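The effect of na_vals can be illustrated with base R alone. The sketch below does not call read_ds_file or check_pheno; it assumes that na_vals behaves like the na.strings argument of read.delim, and the file contents and column names other than SUBJECT_ID are made up for illustration.

```r
# Write a tiny tab-delimited phenotype file to a temporary location
# (contents are hypothetical).
tmp <- tempfile(fileext = ".txt")
writeLines(c("SUBJECT_ID\tSEX\tAGE",
             "S1\tF\t34",
             "S2\tN/A\t41",
             "S3\tM\tna"), tmp)

# The default na_vals from the Usage section, passed to base R's reader.
# Assumption: the package treats these strings the way na.strings does.
na_vals <- c("NA", "N/A", "na", "n/a")
ds <- read.delim(tmp, na.strings = na_vals, stringsAsFactors = FALSE)

# Count missing values per column after the strings are mapped to NA.
na_counts <- colSums(is.na(ds))
print(na_counts)
```

Because "na" is read as missing, the AGE column stays numeric instead of being read as character.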

Details

Because phenotype file contents vary widely, the only required column checked here is the subject-level ID. Note that dbGaP requests the following variables: (1) those described in the study description and/or study config; (2) affection status, if not already included in the subject consent file; (3) sex; and (4) race/ethnicity/ancestry/heritage.

If a data dictionary is provided (ddfile is not NULL), the function additionally checks the correspondence between column names in the data file and entries in the data dictionary. Data dictionary files can be Excel (.xls, .xlsx) or tab-delimited .txt.
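The correspondence check between data file columns and data dictionary entries might look roughly like the following. This is a minimal sketch, not the package's implementation: it assumes the data dictionary lists variable names in a VARNAME column, and the variable names and the objects missing_from_dd and missing_from_ds are hypothetical.

```r
# Column names as they might appear in the data file (hypothetical).
ds_cols <- c("SUBJECT_ID", "SEX", "AGE", "BMI")

# A toy data dictionary; the VARNAME/VARDESC layout is an assumption.
dd <- data.frame(
  VARNAME = c("SUBJECT_ID", "SEX", "AGE", "HEIGHT"),
  VARDESC = c("Subject ID", "Sex", "Age in years", "Height in cm"),
  stringsAsFactors = FALSE
)

# Variables present in the data file but not described in the dictionary.
missing_from_dd <- setdiff(ds_cols, dd$VARNAME)

# Dictionary entries that have no matching column in the data file.
missing_from_ds <- setdiff(dd$VARNAME, ds_cols)

print(missing_from_dd)
print(missing_from_ds)
```

Either set being non-empty is the kind of difference that would be reported under dd_errors.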

Value

pheno_report, a list of the following issues (when present):

flag_nonuniq_subjID

TRUE when values in the subject ID column are not unique, which would require defining UNIQUEKEY columns in the corresponding data dictionary

dd_errors

Differences in fields between data file and data dictionary

extra_subjects

Subjects in data file missing from subj_exp

missing_subjects

Subjects in subj_exp missing from data file
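The subject-level entries of the report can be approximated in base R as follows. This is a sketch under stated assumptions rather than the function's actual code: the toy data and the local variables ids, extra, missing, and report are hypothetical, while subj_exp and subjectID_col mirror the arguments in Usage.

```r
# Toy data file contents and expected subject IDs (hypothetical values).
ds <- data.frame(SUBJECT_ID = c("S1", "S2", "S2", "S4"),
                 stringsAsFactors = FALSE)
subj_exp <- c("S1", "S2", "S3")
subjectID_col <- "SUBJECT_ID"

ids <- ds[[subjectID_col]]
report <- list()

# Flag repeated subject IDs; anyDuplicated returns 0 when all are unique.
if (anyDuplicated(ids) > 0) {
  report$flag_nonuniq_subjID <- TRUE
}

# Compare observed IDs against the expected set in both directions.
extra <- setdiff(ids, subj_exp)
missing <- setdiff(subj_exp, ids)
if (length(extra) > 0) report$extra_subjects <- extra
if (length(missing) > 0) report$missing_subjects <- missing

str(report)
```

Adding entries only when an issue is found matches the "when present" wording above, so a clean file yields an empty list in this sketch.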


UW-GAC/dbgaptools documentation built on April 30, 2019, 9:41 p.m.