check_subj: Check subject consent file


View source: R/check_functions.R

Description

Check contents of a subject consent file for dbGaP posting.

Usage

check_subj(dsfile, ddfile = NULL, na_vals = c("NA", "N/A", "na",
  "n/a"), subj_exp = NULL, subjectID_col = "SUBJECT_ID",
  consent_col = "CONSENT")

Arguments

dsfile

Path to the data file on disk

ddfile

Path to the data dictionary file on disk

na_vals

Vector of strings that should be read in as NA/missing in data file (see details of read_ds_file)

subj_exp

Data frame of expected subject IDs (column 1) and consent values (column 2)

subjectID_col

Column name for subject-level ID

consent_col

Column name for consent variable
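A minimal sketch of a call, assuming the dbgaptools package is installed. The subject IDs, consent codes, and temporary file below are hypothetical illustration data and do not come from the package; only the argument names are taken from the usage above.

```r
# Hypothetical example data: three subjects with integer consent codes.
subj <- data.frame(
  SUBJECT_ID = c("S1", "S2", "S3"),
  CONSENT = c(1L, 2L, 0L),
  stringsAsFactors = FALSE
)

# Write a tab-delimited .txt file, the format expected for the
# subject consent file.
dsfile <- tempfile(fileext = ".txt")
write.table(subj, file = dsfile, sep = "\t", quote = FALSE,
            row.names = FALSE, na = "NA")

# Expected subjects: ID in column 1, consent value in column 2.
subj_exp <- subj[, c("SUBJECT_ID", "CONSENT")]

# Only call the package function if dbgaptools is available.
if (requireNamespace("dbgaptools", quietly = TRUE)) {
  subj_report <- dbgaptools::check_subj(dsfile, subj_exp = subj_exp)
  str(subj_report)
}
```

The defaults for subjectID_col and consent_col already match the column names used here, so they are not passed explicitly.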

Details

The subject consent file should be a tab-delimited .txt file. When subj_exp is provided (subj_exp != NULL), checks that the expected subject IDs are present and that each subject ID corresponds to the expected consent value. If only one of SUBJECT_SOURCE and SOURCE_SUBJECT_ID is present, returns a warning indicating that both variables must be submitted together. Checks that all consent groups are coded as integers (1, 2, 3, etc.).
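The checks described above can be approximated in base R. This is an illustrative sketch, not the package's implementation: the data frames ds and subj_exp hold hypothetical values, and the result names simply mirror the elements listed under Value.

```r
# Hypothetical data file contents (ds) and expected subjects (subj_exp).
ds <- data.frame(
  SUBJECT_ID = c("S1", "S2", "S2", "S4"),
  CONSENT = c("1", "2", "2", "1.5"),
  SUBJECT_SOURCE = c("A", "A", "A", "B"),
  stringsAsFactors = FALSE
)
subj_exp <- data.frame(
  SUBJECT_ID = c("S1", "S2", "S3"),
  CONSENT = c("1", "1", "2"),
  stringsAsFactors = FALSE
)

# Duplicated subject IDs in the data file.
dup_subjects <- unique(ds$SUBJECT_ID[duplicated(ds$SUBJECT_ID)])

# Subjects present in one source but not the other.
missing_subjects <- setdiff(subj_exp$SUBJECT_ID, ds$SUBJECT_ID)
extra_subjects <- setdiff(ds$SUBJECT_ID, subj_exp$SUBJECT_ID)

# Expected (ID, consent) pairs that do not appear in the data file,
# restricted to subjects found in both sources.
shared <- subj_exp[subj_exp$SUBJECT_ID %in% ds$SUBJECT_ID, ]
key <- function(d) paste(d$SUBJECT_ID, d$CONSENT, sep = "\t")
consent_diffs <- shared[!key(shared) %in% key(ds), ]

# Consent values that are not whole numbers.
consent_nonints <- unique(ds$CONSENT[!grepl("^[0-9]+$", ds$CONSENT)])

# TRUE when exactly one of the two alias columns is present.
alias_missvar <- xor("SUBJECT_SOURCE" %in% names(ds),
                     "SOURCE_SUBJECT_ID" %in% names(ds))
```

Consent values are kept as character strings here so that a malformed code such as "1.5" survives long enough to be reported rather than being silently coerced.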

If a data dictionary is provided (ddfile != NULL), additionally checks for agreement between the data file and the data dictionary. Assumes that CONSENT=0 need not be defined in the data dictionary, because dbGaP automatically treats this code as denoting subjects used as genotyping controls and/or pedigree linking members. Data dictionary files can be Excel (.xls, .xlsx) or tab-delimited .txt.
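The idea of the data dictionary comparison, including the exemption for CONSENT=0, can be sketched in base R as follows. Everything here is an assumption made for illustration: the dictionary layout (a VARNAME column), the "code=label" encoding strings, the label text, and all object names are hypothetical, and the package's own comparison may differ.

```r
# Hypothetical data file and data dictionary contents.
ds <- data.frame(
  SUBJECT_ID = c("S1", "S2", "S3"),
  CONSENT = c(0L, 1L, 2L),
  stringsAsFactors = FALSE
)
dd <- data.frame(
  VARNAME = c("SUBJECT_ID", "CONSENT"),
  VARDESC = c("Subject identifier", "Consent group"),
  stringsAsFactors = FALSE
)

# Assumed encoding format: one "code=label" string per consent group.
# The label is a made-up placeholder.
consent_encodings <- c("1=Example consent group")

# Variables that appear in only one of the two files.
vars_only_in_ds <- setdiff(names(ds), dd$VARNAME)
vars_only_in_dd <- setdiff(dd$VARNAME, names(ds))

# Codes defined in the dictionary (text before the first "=").
defined_codes <- sub("=.*$", "", consent_encodings)

# Codes used in the data but not defined in the dictionary.
# Code 0 is exempt, following the assumption stated above.
used_codes <- setdiff(as.character(unique(ds$CONSENT)), "0")
undefined_codes <- setdiff(used_codes, defined_codes)
```

In this hypothetical data, code 2 is used but never defined, so it is the only entry in undefined_codes, while code 0 is skipped.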

Value

subj_report, a list of the following issues (when present):

consent_varname

Logical, indicating consent variable is not named 'CONSENT'

alias_missvar

Logical, indicating when only one of SUBJECT_SOURCE or SOURCE_SUBJECT_ID is submitted

dd_errors

Differences in fields between data file and data dictionary

dup_subjects

List of duplicated subject IDs

extra_subjects

Subjects in data file missing from subj_exp

missing_subjects

Subjects in subj_exp missing from data file

consent_diffs

Discrepancies between subject ID and consent value: entries in subj_exp that disagree with the correspondence found in the data file

consent_nonints

List of non-integer consent values.

potential_pheno_vars

List of potential phenotype variable names in the data file. Note that phenotype variables should appear in only one of two files: the phenotype file or the subject consent file.


UW-GAC/dbgaptools documentation built on Nov. 3, 2020, 12:19 a.m.