check_ped: Check pedigree file

View source: R/check_functions.R

Description

Check contents of a pedigree file for dbGaP posting

Usage

check_ped(dsfile, ddfile = NULL, na_vals = c("NA", "N/A", "na", "n/a"),
  subj_exp = NULL, subjectID_col = "SUBJECT_ID", check_incons = TRUE,
  male = 1, female = 2)

Arguments

dsfile

Path to the data file on disk

ddfile

Path to the data dictionary file on disk

na_vals

Vector of strings that should be read in as NA/missing in data file (see details of read_ds_file)

subj_exp

Vector of expected subject IDs

subjectID_col

Column name for subject-level ID

check_incons

Logical indicating whether to report pedigree inconsistencies, as detected by pedigreeCheck from the GWASTools package

male

Encoded value for male in SEX column

female

Encoded value for female in SEX column
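A minimal sketch of a call, assuming the dbgaptools package is installed and that the data file is tab-delimited text. The toy pedigree, its column values, and the temporary file are hypothetical and only illustrate how the arguments fit together; check_incons is set to FALSE here so the sketch does not depend on GWASTools.

```r
# Hypothetical toy pedigree: two founders and a pair of MZ twins.
ped <- data.frame(
  FAMILY_ID  = c("F1", "F1", "F1", "F1"),
  SUBJECT_ID = c("S1", "S2", "S3", "S4"),
  FATHER     = c("0", "0", "S1", "S1"),
  MOTHER     = c("0", "0", "S2", "S2"),
  SEX        = c(1, 2, 2, 2),
  MZ_TWIN_ID = c(NA, NA, "T1", "T1"),
  stringsAsFactors = FALSE
)

# Write the pedigree to a temporary tab-delimited file (assumed input format).
ped_file <- tempfile(fileext = ".txt")
write.table(ped, ped_file, sep = "\t", quote = FALSE,
            row.names = FALSE, na = "NA")

# Only call check_ped when the package is available.
if (requireNamespace("dbgaptools", quietly = TRUE)) {
  ped_report <- dbgaptools::check_ped(
    dsfile = ped_file,
    subj_exp = ped$SUBJECT_ID,   # expected subject IDs
    check_incons = FALSE,        # skip the GWASTools-based consistency check
    male = 1, female = 2
  )
  str(ped_report)
}
```

Missing MZ_TWIN_ID values are written as "NA", which is one of the default na_vals strings.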

Details

If an MZ twin column is detected, the report includes any issues found with it: a column name other than 'MZ_TWIN_ID', and a data frame of all twin pairs with logical flags indicating more than one family ID per pair (chk_family=TRUE), a non-unique subject ID (chk_subjectID=TRUE), or more than one sex, which could indicate that dizygotic twins are included (chk_sex=TRUE).
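The three twin-pair flags can be illustrated with a short base R sketch. This is not the package's implementation: the helper flag_pairs, the toy data, and the reading of "non-unique subject ID" as a repeated ID within a pair are all assumptions made for illustration.

```r
# Hypothetical twin records; T2 spans two families and two sexes,
# T3 repeats the same subject ID.
twins <- data.frame(
  FAMILY_ID  = c("F1", "F1", "F2", "F3", "F4", "F4"),
  SUBJECT_ID = c("S3", "S4", "S7", "S8", "S9", "S9"),
  SEX        = c(2, 2, 1, 2, 1, 1),
  MZ_TWIN_ID = c("T1", "T1", "T2", "T2", "T3", "T3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row of logical flags per twin pair.
flag_pairs <- function(dat) {
  by_pair <- split(dat, dat$MZ_TWIN_ID)
  n_distinct <- function(x) length(unique(x))
  data.frame(
    MZ_TWIN_ID = names(by_pair),
    # more than one family ID within the pair
    chk_family = vapply(by_pair, function(d) n_distinct(d$FAMILY_ID) > 1,
                        logical(1), USE.NAMES = FALSE),
    # the same subject ID appears more than once within the pair
    chk_subjectID = vapply(by_pair, function(d) anyDuplicated(d$SUBJECT_ID) > 0,
                           logical(1), USE.NAMES = FALSE),
    # more than one sex within the pair
    chk_sex = vapply(by_pair, function(d) n_distinct(d$SEX) > 1,
                     logical(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

twin_flags <- flag_pairs(twins)
print(twin_flags)
```

In this toy data, pair T1 raises no flags, T2 raises chk_family and chk_sex, and T3 raises chk_subjectID.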

If a data dictionary is provided (ddfile != NULL), check_ped additionally checks the correspondence between column names in the data file and entries in the data dictionary. Data dictionary files can be Excel (.xls, .xlsx) or tab-delimited .txt.
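As a rough illustration of what such a correspondence check involves, the base R sketch below compares two vectors of names. The helper compare_names and the example names are hypothetical; the package's own check may report differences in another form.

```r
# Column names as they might appear in the data file (hypothetical).
ds_cols <- c("FAMILY_ID", "SUBJECT_ID", "FATHER", "MOTHER", "SEX", "MZ_TWIN_ID")
# Variable names as they might be listed in the data dictionary (hypothetical).
dd_vars <- c("FAMILY_ID", "SUBJECT_ID", "FATHER", "MOTHER", "SEX", "TWIN_ID")

# Hypothetical helper: report names present on one side only,
# and whether the two sets match exactly, including order.
compare_names <- function(ds_cols, dd_vars) {
  list(
    in_ds_not_dd = setdiff(ds_cols, dd_vars),
    in_dd_not_ds = setdiff(dd_vars, ds_cols),
    same_order   = identical(ds_cols, dd_vars)
  )
}

dd_check <- compare_names(ds_cols, dd_vars)
str(dd_check)
```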

Value

ped_report, a list of the following issues (when present):

lowercase

Logical flag indicating that one or more variable names are not upper case

missing_vars

Required variables that are missing from the data file

dd_errors

Differences in fields between data file and data dictionary

extra_subjects

Subjects in data file missing from subj_exp

missing_subjects

Subjects in ssm_exp missing from data file

extra_sexvals

Additional values in SEX column beyond what's specified by male and female function arguments

mztwin_errors

List of potential errors with MZ twins
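The extra_subjects, missing_subjects, and extra_sexvals entries can be read as set differences. The base R sketch below shows the idea on hypothetical vectors; it is an illustration under that assumption, not the package's code.

```r
# Hypothetical inputs: IDs found in the data file, the expected IDs
# (the subj_exp argument), and the values of the SEX column.
subj_in_file <- c("S1", "S2", "S3", "S5")
subj_exp     <- c("S1", "S2", "S3", "S4")
sex_vals     <- c(1, 2, 2, 0)
male   <- 1
female <- 2

# Subjects in the data file that were not expected.
extra_subjects <- setdiff(subj_in_file, subj_exp)

# Expected subjects that do not appear in the data file.
missing_subjects <- setdiff(subj_exp, subj_in_file)

# Non-missing SEX values other than the male and female codes.
extra_sexvals <- setdiff(unique(sex_vals[!is.na(sex_vals)]), c(male, female))

print(list(extra_subjects = extra_subjects,
           missing_subjects = missing_subjects,
           extra_sexvals = extra_sexvals))
```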


UW-GAC/dbgaptools documentation built on April 30, 2019, 9:41 p.m.