View source: R/check_functions.R
Check contents of a pedigree file for dbGaP posting
dsfile | Path to the data file on disk
ddfile | Path to the data dictionary file on disk
na_vals | Vector of strings that should be read in as NA/missing in data file (see details of
subj_exp | Vector of expected subject IDs
subjectID_col | Column name for subject-level ID
check_incons | Logical whether to report pedigree inconsistencies, using
male | Encoded value for male in SEX column
female | Encoded value for female in SEX column
If an MZ twin column is detected, returns any issues found, including a column name other than 'MZ_TWIN_ID', and a data frame of all twin pairs with logical flags indicating: more than one family ID per pair (chk_family=TRUE); a non-unique subject ID (chk_subjectID=TRUE); and more than one sex, which could indicate that dizygotic twins are included (chk_sex=TRUE).
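The twin-pair flags described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data, the FAMILY_ID column name, the helper `more_than_one()`, and the exact interpretation of each flag (in particular chk_subjectID, taken here to mean "a subject ID in the pair appears more than once in the file") are assumptions made for illustration.

```r
# Toy pedigree; FAMILY_ID and all values are illustrative assumptions.
ped <- data.frame(
  FAMILY_ID  = c("F1", "F1", "F2", "F3", "F4"),
  SUBJECT_ID = c("S1", "S2", "S3", "S4", "S5"),
  SEX        = c("M",  "M",  "F",  "M",  "F"),
  MZ_TWIN_ID = c("T1", "T1", "T2", "T2", NA),
  stringsAsFactors = FALSE
)

# Keep only subjects assigned to a twin set.
twins <- ped[!is.na(ped$MZ_TWIN_ID), ]

# Hypothetical helper: TRUE for each twin set where `x` has > 1 distinct value.
more_than_one <- function(x, by) {
  as.vector(tapply(x, by, function(v) length(unique(v)) > 1))
}

# Subject IDs that occur more than once anywhere in the file.
dup_ids <- unique(ped$SUBJECT_ID[duplicated(ped$SUBJECT_ID)])

# One row per twin set, with one logical flag per check.
twin_flags <- data.frame(
  MZ_TWIN_ID    = sort(unique(twins$MZ_TWIN_ID)),
  chk_family    = more_than_one(twins$FAMILY_ID, twins$MZ_TWIN_ID),
  chk_subjectID = as.vector(tapply(twins$SUBJECT_ID %in% dup_ids,
                                   twins$MZ_TWIN_ID, any)),
  chk_sex       = more_than_one(twins$SEX, twins$MZ_TWIN_ID),
  stringsAsFactors = FALSE
)

print(twin_flags)
```

In this toy data, twin set T2 spans two families and two sexes, so both chk_family and chk_sex are TRUE for it, while T1 passes all three checks.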
If a data dictionary is provided (ddfile != NULL), additionally checks correspondence between column names in the data file and entries in the data dictionary.
Data dictionary files can be Excel (.xls, .xlsx) or tab-delimited .txt.
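A correspondence check of this kind can be sketched with two set differences. The snippet below is only an illustration under stated assumptions: the data dictionary is assumed to list variable names in a column called VARNAME, and the column names shown are made up for the example rather than taken from the package.

```r
# Column names as they might appear in the data file (illustrative values).
ds_cols <- c("FAMILY_ID", "SUBJECT_ID", "SEX")

# Toy data dictionary; the VARNAME column name is an assumption.
dd <- data.frame(
  VARNAME = c("FAMILY_ID", "SUBJECT_ID", "SEX", "MZ_TWIN_ID"),
  stringsAsFactors = FALSE
)

# Variables present in the data file but not described in the dictionary.
in_ds_not_dd <- setdiff(ds_cols, dd$VARNAME)

# Variables described in the dictionary but absent from the data file.
in_dd_not_ds <- setdiff(dd$VARNAME, ds_cols)

print(in_ds_not_dd)
print(in_dd_not_ds)
```

Reporting both directions separately makes it clear whether the data file or the data dictionary needs to be corrected.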
ped_report, a list of the following issues (when present):
lowercase | Logical flag indicating non-upper case variable names
missing_vars | Missing and required variables
dd_errors | Differences in fields between data file and data dictionary
dup_subjects | List of duplicated subject IDs
extra_subjects | Subjects in data file missing from
missing_subjects | Subjects in
extra_sexvals | Additional values in SEX column beyond what's specified by
mztwin_errors | List of potential errors with MZ twins
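Several of the subject-level entries above amount to simple set comparisons. The following base R sketch shows one way such values could be derived; it is not the package's code. It assumes the subject comparisons are made against subj_exp and the SEX comparison against the male and female arguments, and the data and the encodings "M" and "F" are illustrative only.

```r
# Toy pedigree data; all values are illustrative assumptions.
ped <- data.frame(
  subject_id = c("S1", "S2", "S2", "S4"),
  SEX        = c("M",  "F",  "F",  "U"),
  stringsAsFactors = FALSE
)
subj_exp <- c("S1", "S2", "S3")  # expected subject IDs
male     <- "M"                  # assumed encoding for male
female   <- "F"                  # assumed encoding for female

# TRUE if any variable name is not fully upper case.
lowercase <- any(names(ped) != toupper(names(ped)))

# Subject IDs that appear more than once in the data file.
dup_subjects <- unique(ped$subject_id[duplicated(ped$subject_id)])

# Subjects in the data file that were not expected, and vice versa.
extra_subjects   <- setdiff(ped$subject_id, subj_exp)
missing_subjects <- setdiff(subj_exp, ped$subject_id)

# SEX values other than the two expected encodings.
extra_sexvals <- setdiff(ped$SEX, c(male, female))

# Collect only the issues that are actually present.
ped_report <- Filter(function(x) length(x) > 0 && !identical(x, FALSE), list(
  lowercase        = lowercase,
  dup_subjects     = dup_subjects,
  extra_subjects   = extra_subjects,
  missing_subjects = missing_subjects,
  extra_sexvals    = extra_sexvals
))

str(ped_report)
```

Dropping empty entries with `Filter()` mirrors the "when present" wording above, so a clean file would yield an empty list.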