check_dd: Check data dictionary (generic)


View source: R/check_functions.R

Description

Check data dictionary (generic)

Usage

.check_dd(dd, ds = NULL, dstype = "")

Arguments

dd

Data dictionary (DD) object

ds

Corresponding dataset (DS) object

dstype

Type of the corresponding DS file; one of "pheno", "ped", "sattr", "ssm", or "subj".

Details

Data dictionary files can be Excel (.xls, .xlsx) or tab-delimited .txt files. The first two columns must be 'VARNAME' and 'VARDESC' for the checks to proceed. Reports errors or issues with the DD file. When the corresponding DS file is also provided, it also checks the two files for consistency.
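The header precondition can be illustrated with a minimal R sketch. The helper name dd_header_ok is hypothetical (it is not part of dbgaptools), and the sketch only covers a tab-delimited .txt DD read with utils::read.delim; an Excel DD would need a reader such as readxl::read_excel.

```r
# Hypothetical helper, not part of dbgaptools: TRUE only when the first two
# DD columns are exactly 'VARNAME' and 'VARDESC', in that order.
dd_header_ok <- function(dd) {
  ncol(dd) >= 2 && identical(names(dd)[1:2], c("VARNAME", "VARDESC"))
}

# Write a tiny tab-delimited DD to a temporary file and read it back
txt <- tempfile(fileext = ".txt")
writeLines(c("VARNAME\tVARDESC\tUNITS",
             "AGE\tAge at exam\tyears"), txt)
dd <- utils::read.delim(txt, stringsAsFactors = FALSE)

dd_header_ok(dd)  # TRUE
```

Checking the column order (not just presence) mirrors the requirement that these be the first two columns.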

Even when the DS file is not provided (ds = NULL), dstype must be specified to enable the type-specific checks of the UNITS and VALUES columns: pheno = phenotype DS; ped = pedigree DS; sattr = sample attributes DS; ssm = sample-subject mapping DS; subj = subject consent DS. VALUES are considered required for the pheno, ped, sattr, and subj DS types, and UNITS are additionally considered required for the pheno and sattr DS types. Some studies may not actually require VALUES and UNITS in these file types, but when they are missing they are reported here for convenience.
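The dstype-dependent column requirements above can be expressed as a small lookup. This is a sketch under assumptions: the function names required_dd_cols and missing_dd_cols are hypothetical, the DD is assumed to be a data frame whose column names include the literal headers "VALUES" and "UNITS", and an empty or unrecognized dstype falls back to requiring only VARNAME and VARDESC.

```r
# Hypothetical sketch (not the dbgaptools code) of which DD columns are
# treated as required for each dstype, following the rules in Details.
required_dd_cols <- function(dstype) {
  cols <- c("VARNAME", "VARDESC")  # always required as the first two columns
  if (dstype %in% c("pheno", "ped", "sattr", "subj")) {
    cols <- c(cols, "VALUES")      # VALUES required for these DS types
  }
  if (dstype %in% c("pheno", "sattr")) {
    cols <- c(cols, "UNITS")       # UNITS additionally required here
  }
  cols
}

# Required columns absent from a DD data frame
missing_dd_cols <- function(dd, dstype) {
  setdiff(required_dd_cols(dstype), names(dd))
}

dd <- data.frame(VARNAME = "AGE", VARDESC = "Age at exam")
missing_dd_cols(dd, "pheno")  # "VALUES" "UNITS"
missing_dd_cols(dd, "ssm")    # character(0)
```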

Value

dd_report, a list of the following issues (when present):

lowercase

Logical flag indicating non-uppercase variable names

missing_reqvars

Required variables that are missing, based on dstype

extra_vars

Extra variables

uniquekey_flags

Returns a warning when the UNIQUEKEY column is populated for file types other than phenotype (the most common need for UNIQUEKEY) or sample attributes (which in some cases require UNIQUEKEY). Also returns a warning if, for the pheno or sattr dstypes, the UNIQUEKEY variable(s) do not identify unique rows in the DS.

vals_warnings

Vector of warnings about VALUES columns

missing_dsvars

Variables present in DS but not defined in DD

min_errors

Variables for which DS values are less than the DD MIN

max_errors

Variables for which DS values are greater than the DD MAX

illegal_vars

Variable names containing illegal characters ('\', '/', or ',' (comma)) or the string 'dbGaP'
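Several of the report entries above can be illustrated with a minimal R sketch, assuming dd and ds are already-loaded data frames and the DD has a VARNAME column plus optional MIN, MAX, and UNIQUEKEY columns. The function name sketch_dd_report is hypothetical, as are two assumptions: that any non-empty UNIQUEKEY cell marks a key variable, and that the 'dbGaP' match is case-sensitive. It is not the dbgaptools code and omits missing_reqvars, extra_vars, and vals_warnings.

```r
# Hypothetical sketch of several dd_report entries; NOT the dbgaptools code.
sketch_dd_report <- function(dd, ds = NULL, dstype = "") {
  report <- list()
  varnames <- as.character(dd$VARNAME)

  # lowercase: flag if any DD variable name is not entirely uppercase
  if (any(varnames != toupper(varnames), na.rm = TRUE)) report$lowercase <- TRUE

  # illegal_vars: names containing \, /, a comma, or "dbGaP" (case-sensitive here)
  bad <- grepl("[\\\\/,]|dbGaP", varnames, perl = TRUE)
  if (any(bad)) report$illegal_vars <- varnames[bad]

  # uniquekey_flags (assumption: a non-empty UNIQUEKEY cell marks a key variable)
  if ("UNIQUEKEY" %in% names(dd)) {
    uk <- trimws(as.character(dd$UNIQUEKEY))
    keys <- varnames[!is.na(uk) & uk != ""]
    if (length(keys) > 0 && !dstype %in% c("pheno", "sattr")) {
      report$uniquekey_flags <- c(report$uniquekey_flags,
        "UNIQUEKEY populated for a DS type other than pheno or sattr")
    }
    if (length(keys) > 0 && dstype %in% c("pheno", "sattr") && !is.null(ds) &&
        all(keys %in% names(ds)) && anyDuplicated(ds[, keys, drop = FALSE]) > 0) {
      report$uniquekey_flags <- c(report$uniquekey_flags,
        "UNIQUEKEY variable(s) do not identify unique DS rows")
    }
  }

  if (!is.null(ds)) {
    # missing_dsvars: DS columns with no VARNAME entry in the DD
    undefined <- setdiff(names(ds), varnames)
    if (length(undefined) > 0) report$missing_dsvars <- undefined

    # min_errors / max_errors: numeric DS values outside the DD MIN/MAX bounds
    num_col <- function(col) {
      if (col %in% names(dd)) suppressWarnings(as.numeric(dd[[col]]))
      else rep(NA_real_, nrow(dd))
    }
    dd_min <- num_col("MIN")
    dd_max <- num_col("MAX")
    below <- character(0)
    above <- character(0)
    for (i in seq_len(nrow(dd))) {
      v <- varnames[i]
      if (is.na(v) || !v %in% names(ds)) next
      x <- suppressWarnings(as.numeric(ds[[v]]))  # non-numeric values become NA
      if (!is.na(dd_min[i]) && any(x < dd_min[i], na.rm = TRUE)) below <- c(below, v)
      if (!is.na(dd_max[i]) && any(x > dd_max[i], na.rm = TRUE)) above <- c(above, v)
    }
    if (length(below) > 0) report$min_errors <- below
    if (length(above) > 0) report$max_errors <- above
  }
  report
}
```

Only issues that are actually found are added to the returned list, matching the "when present" wording of the Value section.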


UW-GAC/dbgaptools documentation built on April 30, 2019, 9:41 p.m.