check_cross_file: Cross file checks


View source: R/check_cross_file.R

Description

Check presence of expected subjects and samples across dbGaP files.

Usage

check_cross_file(subj_file, ssm_file, molecular_samples,
  sattr_file = NULL, pheno_file = NULL, ped_file = NULL,
  subjectID_col = "SUBJECT_ID", sampleID_col = "SAMPLE_ID",
  consent_col = "CONSENT")

Arguments

subj_file

Path to subject consent file on disk

ssm_file

Path to sample-subject mapping file on disk

molecular_samples

Vector of sample IDs with molecular data

sattr_file

Path to sample attributes file on disk

pheno_file

Path to phenotype file on disk

ped_file

Path to pedigree file on disk

subjectID_col

Column name for subject-level ID across files

sampleID_col

Column name for sample-level ID across files

consent_col

Column name for consent in subject file

Details

Checks for the presence of expected subjects and samples across a set of dbGaP files. At a minimum, it requires a subject consent file, a sample-subject mapping file, and a vector of sample IDs for which molecular data is being submitted. Subjects whose consent code is not 0 or a positive integer are reported as an error and excluded from further checks. Supplying additional files increases the number of pairwise checks performed across files. The basic principles behind these checks are:

Note that issues returned in the report do not always require corrective action. There may be extenuating circumstances: for example, consented study subjects may be missing from the current molecular data submission but expected in a future one, and are therefore retained in dbGaP files with a non-zero consent status.
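The consent rule and the pairwise presence checks described above can be illustrated with a small base-R sketch. This is a hypothetical illustration, not the dbgaptools implementation: the function name `sketch_cross_check`, the report element names, and the two presence checks shown are assumptions, and the sketch works on in-memory data frames rather than file paths.

```r
# Hypothetical sketch: NOT the dbgaptools implementation.
# Function name and report element names are made up for illustration.
sketch_cross_check <- function(subj, ssm, molecular_samples,
                               subjectID_col = "SUBJECT_ID",
                               sampleID_col = "SAMPLE_ID",
                               consent_col = "CONSENT") {
  report <- list()
  subj_ids <- as.character(subj[[subjectID_col]])
  ssm_subj <- as.character(ssm[[subjectID_col]])
  ssm_samp <- as.character(ssm[[sampleID_col]])
  molecular_samples <- as.character(molecular_samples)

  # Consent must be 0 or a positive integer; compare as text so that
  # values such as "1.5", "-1", "A" or NA are all flagged.
  consent <- as.character(subj[[consent_col]])
  valid <- !is.na(consent) & grepl("^[0-9]+$", consent)
  if (any(!valid)) {
    report$invalid_consent <- subj_ids[!valid]
  }

  # Exclude invalid-consent subjects (and their samples) from further checks.
  bad_ids <- subj_ids[!valid]
  bad_samples <- ssm_samp[ssm_subj %in% bad_ids]
  keep <- !(ssm_subj %in% bad_ids)
  ssm_subj <- ssm_subj[keep]
  ssm_samp <- ssm_samp[keep]
  subj_ids <- subj_ids[valid]
  molecular_samples <- setdiff(molecular_samples, bad_samples)

  # Pairwise presence checks as set differences between ID vectors;
  # an element is added to the report only when the issue is present.
  ssm_not_in_subj <- setdiff(ssm_subj, subj_ids)
  if (length(ssm_not_in_subj) > 0) {
    report$ssm_subjects_missing_from_subj_file <- ssm_not_in_subj
  }
  mol_not_in_ssm <- setdiff(molecular_samples, ssm_samp)
  if (length(mol_not_in_ssm) > 0) {
    report$molecular_samples_missing_from_ssm <- mol_not_in_ssm
  }
  report
}

# Usage with toy data: s3 has an invalid consent code, s4 is absent from
# the subject file, and sample a5 is absent from the sample-subject mapping.
subj <- data.frame(SUBJECT_ID = c("s1", "s2", "s3"),
                   CONSENT = c("1", "0", "X"),
                   stringsAsFactors = FALSE)
ssm <- data.frame(SUBJECT_ID = c("s1", "s2", "s3", "s4"),
                  SAMPLE_ID = c("a1", "a2", "a3", "a4"),
                  stringsAsFactors = FALSE)
res <- sketch_cross_check(subj, ssm, molecular_samples = c("a1", "a5"))
str(res)
```

Returning only the non-empty elements mirrors the "(when present)" wording of the report description; an empty list means none of these sketched checks found an issue.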

Value

cross_check_report, a list of the following issues (when present):


UW-GAC/dbgaptools documentation built on Nov. 3, 2020, 12:19 a.m.