check_ssm: Check sample subject mapping file


View source: R/check_functions.R

Description

Check contents of a sample subject mapping file for dbGaP posting.

Usage

check_ssm(dsfile, ddfile = NULL, na_vals = c("NA", "N/A", "na", "n/a"),
  ssm_exp = NULL, sampleID_col = "SAMPLE_ID",
  subjectID_col = "SUBJECT_ID")

Arguments

dsfile

Path to the data file on disk

ddfile

Path to the data dictionary file on disk

na_vals

Vector of strings that should be read in as NA/missing in the data file (see the Details section of read_ds_file)

ssm_exp

Data frame of expected SAMPLE_ID and SUBJECT_ID pairs

sampleID_col

Column name for sample-level ID

subjectID_col

Column name for subject-level ID
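As a rough illustration of how these arguments fit together, the sketch below writes a small tab-delimited mapping file and reads it back with the default na_vals. The file contents and the names ssm_path, ssm_read, and blank_rows are made up for this example. The base R read.delim call only stands in for whatever read_ds_file does internally, and the final check_ssm call runs only if dbgaptools is installed.

```r
# Hypothetical sample subject mapping data; the IDs are invented
ssm <- data.frame(
  SUBJECT_ID = c("P1", "P2", "n/a"),
  SAMPLE_ID  = c("S1", "S2", "S3"),
  stringsAsFactors = FALSE
)

# Write it as a tab-delimited .txt file in a temporary location
ssm_path <- tempfile(fileext = ".txt")
write.table(ssm, ssm_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Same strings as the default na_vals in the Usage section
na_vals <- c("NA", "N/A", "na", "n/a")

# Base R stand-in for the package's reader: listed strings become NA
ssm_read <- read.delim(ssm_path, na.strings = na_vals,
                       stringsAsFactors = FALSE)

# Rows with a missing subject or sample ID (compare blank_idx under Value)
blank_rows <- which(is.na(ssm_read$SUBJECT_ID) | is.na(ssm_read$SAMPLE_ID))

# Call the documented function only when the package is available
if (requireNamespace("dbgaptools", quietly = TRUE)) {
  ssm_report <- dbgaptools::check_ssm(ssm_path, na_vals = na_vals)
}
```

Here the third row is flagged because "n/a" is one of the strings treated as missing.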

Details

The sample subject mapping file should be a tab-delimited .txt file. When ssm_exp is not NULL, the function checks that SAMPLE_ID and SUBJECT_ID correspond as expected. Any differences in the mapping between the two, or in the sets of expected SAMPLE_IDs or SUBJECT_IDs, are returned in the output.
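The comparison against ssm_exp can be pictured with a few lines of base R. This is a minimal sketch of the idea, not the package's implementation: ssm_obs is a hypothetical data frame standing in for the contents of dsfile, and all IDs are invented.

```r
# Hypothetical observed mapping, standing in for the data file contents
ssm_obs <- data.frame(
  SAMPLE_ID  = c("S1", "S2", "S3", "S5"),
  SUBJECT_ID = c("P1", "P2", "P9", "P5"),
  stringsAsFactors = FALSE
)

# Hypothetical expected mapping, in the shape described for ssm_exp
ssm_exp <- data.frame(
  SAMPLE_ID  = c("S1", "S2", "S3", "S4"),
  SUBJECT_ID = c("P1", "P2", "P3", "P4"),
  stringsAsFactors = FALSE
)

# IDs present on one side but not the other
extra_samples    <- setdiff(ssm_obs$SAMPLE_ID, ssm_exp$SAMPLE_ID)
missing_samples  <- setdiff(ssm_exp$SAMPLE_ID, ssm_obs$SAMPLE_ID)
extra_subjects   <- setdiff(ssm_obs$SUBJECT_ID, ssm_exp$SUBJECT_ID)
missing_subjects <- setdiff(ssm_exp$SUBJECT_ID, ssm_obs$SUBJECT_ID)

# Samples found on both sides whose subject differs
both <- merge(ssm_exp, ssm_obs, by = "SAMPLE_ID",
              suffixes = c(".exp", ".obs"))
ssm_diffs <- both[both$SUBJECT_ID.exp != both$SUBJECT_ID.obs, ]
```

In this toy data, S5 is an extra sample, S4 is a missing sample, and S3 maps to P3 in the expected table but to P9 in the observed one.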

If a data dictionary is provided (ddfile != NULL), the function additionally checks correspondence between column names in the data file and entries in the data dictionary. Data dictionary files can be Excel (.xls, .xlsx) or tab-delimited .txt.
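A data dictionary check of this kind amounts to comparing two sets of names. The sketch below assumes the dictionary lists variable names in a column called VARNAME. That column name, the example variables, and the names not_in_dd and not_in_ds are assumptions for illustration and are not taken from this page.

```r
# Hypothetical column names found in the data file
ds_cols <- c("SUBJECT_ID", "SAMPLE_ID", "SAMPLE_USE")

# Hypothetical data dictionary; the VARNAME/VARDESC layout is an assumption
dd <- data.frame(
  VARNAME = c("SUBJECT_ID", "SAMPLE_ID", "BATCH"),
  VARDESC = c("Subject identifier", "Sample identifier", "Processing batch"),
  stringsAsFactors = FALSE
)

# Columns in the data file with no dictionary entry
not_in_dd <- setdiff(ds_cols, dd$VARNAME)

# Dictionary entries with no matching column in the data file
not_in_ds <- setdiff(dd$VARNAME, ds_cols)
```

Either kind of mismatch would be a candidate for the dd_errors element of the report.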

Value

ssm_report, a list of the following issues (when present):

dup_samples

List of duplicated sample IDs

blank_idx

Row index of blank/missing subject or sample IDs

dd_errors

Differences in fields between data file and data dictionary

extra_subjects

Subjects in data file missing from ssm_exp

missing_subjects

Subjects in ssm_exp missing from data file

extra_samples

Samples in data file missing from ssm_exp

missing_samples

Samples in ssm_exp missing from data file

ssm_diffs

Discrepancies in mapping between SAMPLE_ID and SUBJECT_ID. Lists entries in ssm_exp that disagree with the mapping in the data file.
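Because elements are included only when the corresponding issue is present, code that consumes the report should test for an element by name before using it. The sketch below uses a hand-built list in place of a real ssm_report. Its contents are invented, and how an issue-free report is represented (for example, an empty list) is an assumption.

```r
# Hypothetical report with two issues; a real one comes from check_ssm()
ssm_report <- list(
  dup_samples     = c("S2"),
  missing_samples = c("S4")
)

# Names of the issues that were actually reported
issues_found <- names(ssm_report)

# Test for a specific issue before reading it
has_dups      <- "dup_samples" %in% issues_found
has_dd_errors <- "dd_errors" %in% issues_found

if (has_dups) {
  message("Duplicated sample IDs: ",
          paste(ssm_report$dup_samples, collapse = ", "))
}
```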


UW-GAC/dbgaptools documentation built on Nov. 3, 2020, 12:19 a.m.