check_ssm: Check sample subject mapping file
In UW-GAC/dbgaptools: Creates and Checks Standard Files for dbGaP submission


View source: R/check_functions.R

Check contents of a sample subject mapping file for dbGaP posting.

check_ssm(dsfile, ddfile = NULL, na_vals = c("NA", "N/A", "na", "n/a"),
  ssm_exp = NULL, sampleID_col = "SAMPLE_ID",
  subjectID_col = "SUBJECT_ID")

`dsfile`	Path to the data file on disk
`ddfile`	Path to the data dictionary file on disk
`na_vals`	Vector of strings that should be read in as NA/missing in data file (see details of `read_ds_file`)
`ssm_exp`	Dataframe of expected SAMPLE_ID and SUBJECT_ID
`sampleID_col`	Column name for sample-level ID
`subjectID_col`	Column name for subject-level ID
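
The arguments above might be combined as in the following minimal sketch. The file contents, the IDs, and the `ssm_path` variable are hypothetical and only serve as illustration; the call is guarded so the snippet also runs when dbgaptools is not installed, and no particular output is implied.

```r
# Hypothetical sample subject mapping, written as a tab-delimited .txt file
ssm <- data.frame(SUBJECT_ID = c("P1", "P2", "P2"),
                  SAMPLE_ID  = c("S1", "S2", "S3"),
                  stringsAsFactors = FALSE)
ssm_path <- tempfile(fileext = ".txt")
write.table(ssm, file = ssm_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Expected mapping (hypothetical), using the default ID column names
ssm_exp <- data.frame(SUBJECT_ID = c("P1", "P2", "P2"),
                      SAMPLE_ID  = c("S1", "S2", "S3"),
                      stringsAsFactors = FALSE)

# Only call the package function if dbgaptools is available
if (requireNamespace("dbgaptools", quietly = TRUE)) {
  ssm_report <- dbgaptools::check_ssm(dsfile = ssm_path, ssm_exp = ssm_exp)
  str(ssm_report)
}
```

If the ID columns in the file use other names, they would be passed through `sampleID_col` and `subjectID_col`.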

The sample subject mapping file should be a tab-delimited .txt file. When `ssm_exp` is not NULL, the function checks that the mapping between SAMPLE_ID and SUBJECT_ID in the data file matches the expected mapping. Any differences in mapping between the two, or in the sets of expected SAMPLE_IDs or SUBJECT_IDs, are returned in the output.
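
The kind of comparison described here can be sketched in base R as follows. This is an illustration under assumed data, not the package's implementation; `ssm_obs` and all IDs are hypothetical.

```r
# Hypothetical observed mapping (as it might be read from the data file)
ssm_obs <- data.frame(SAMPLE_ID  = c("S1", "S2", "S3"),
                      SUBJECT_ID = c("P1", "P2", "P2"),
                      stringsAsFactors = FALSE)
# Hypothetical expected mapping
ssm_exp <- data.frame(SAMPLE_ID  = c("S1", "S2", "S4"),
                      SUBJECT_ID = c("P1", "P3", "P4"),
                      stringsAsFactors = FALSE)

# IDs present on one side only
extra_samples    <- setdiff(ssm_obs$SAMPLE_ID,  ssm_exp$SAMPLE_ID)
missing_samples  <- setdiff(ssm_exp$SAMPLE_ID,  ssm_obs$SAMPLE_ID)
extra_subjects   <- setdiff(ssm_obs$SUBJECT_ID, ssm_exp$SUBJECT_ID)
missing_subjects <- setdiff(ssm_exp$SUBJECT_ID, ssm_obs$SUBJECT_ID)

# Samples present on both sides whose subject assignment disagrees
both <- merge(ssm_exp, ssm_obs, by = "SAMPLE_ID", suffixes = c(".exp", ".obs"))
ssm_diffs <- both[both$SUBJECT_ID.exp != both$SUBJECT_ID.obs, ]
```

Here sample S2 is shared by both tables but assigned to different subjects, so it is the only row kept in `ssm_diffs`.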

If a data dictionary is provided (`ddfile` is not NULL), the function additionally checks the correspondence between column names in the data file and entries in the data dictionary. Data dictionary files can be Excel (.xls, .xlsx) or tab-delimited .txt.
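
A column-name check of this kind could look like the sketch below. It assumes the data dictionary lists variable names in a `VARNAME` column; the variable names and descriptions are hypothetical, and this is not the package's own code.

```r
# Hypothetical column names of the data file
ds_cols <- c("SUBJECT_ID", "SAMPLE_ID", "SAMPLE_USE")

# Hypothetical data dictionary; the VARNAME column is an assumption
dd <- data.frame(VARNAME = c("SUBJECT_ID", "SAMPLE_ID"),
                 VARDESC = c("Subject identifier", "Sample identifier"),
                 stringsAsFactors = FALSE)

# Columns in the data file without a dictionary entry
vars_missing_dd <- setdiff(ds_cols, dd$VARNAME)
# Dictionary entries without a matching column in the data file
vars_missing_ds <- setdiff(dd$VARNAME, ds_cols)
```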

ssm_report, a list of the following issues (when present):

`dup_samples`	List of duplicated sample IDs
`blank_idx`	Row index of blank/missing subject or sample IDs
`dd_errors`	Differences in fields between data file and data dictionary
`extra_subjects`	Subjects in data file missing from `ssm_exp`
`missing_subjects`	Subjects in `ssm_exp` missing from data file
`extra_samples`	Samples in data file missing from `ssm_exp`
`missing_samples`	Samples in `ssm_exp` missing from data file
`ssm_diffs`	Discrepancies in mapping between SAMPLE_ID and SUBJECT_ID. Lists entries in `ssm_exp` that disagree with mapping in the data file
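
For orientation, the first two entries can be approximated in base R as shown below. This is a rough sketch on hypothetical data; it assumes that blank IDs appear as either NA or empty strings, which may differ from how the package treats them.

```r
# Hypothetical mapping with one duplicated sample ID and two blank IDs
ssm <- data.frame(SAMPLE_ID  = c("S1", "S2", "S2", NA),
                  SUBJECT_ID = c("P1", "",   "P2", "P3"),
                  stringsAsFactors = FALSE)

# Sample IDs that occur more than once (ignoring missing values)
is_dup <- duplicated(ssm$SAMPLE_ID) & !is.na(ssm$SAMPLE_ID)
dup_samples <- unique(ssm$SAMPLE_ID[is_dup])

# Row indices where either ID is NA or an empty string
is_blank <- function(x) is.na(x) | x == ""
blank_idx <- which(is_blank(ssm$SAMPLE_ID) | is_blank(ssm$SUBJECT_ID))
```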