check_sattr: Check sample attributes file


Source: R/check_functions.R

Description

Check contents of a sample attributes file for dbGaP posting.

Usage

check_sattr(dsfile, ddfile = NULL, na_vals = c("NA", "N/A", "na", "n/a"),
  samp_exp = NULL, sampleID_col = "SAMPLE_ID", topmed = FALSE)

Arguments

dsfile

Path to the data file on disk

ddfile

Path to the data dictionary file on disk

na_vals

Vector of strings to be read in as NA (missing) in the data file (see the Details section of read_ds_file)

samp_exp

List of expected sample IDs

sampleID_col

Name of the column holding the sample-level ID

topmed

Logical; TRUE if the study is a TOPMed study
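The na_vals argument controls which strings count as missing when the data file is read. The sketch below shows that behaviour in base R. It uses a hypothetical helper, read_sattr_sketch(), and is not the package's read_ds_file(). It assumes every column is read as character so that sample IDs keep leading zeros.

```r
# Hypothetical helper (not part of dbgaptools): read a tab-delimited data
# file as character columns, converting any string in na_vals to NA.
read_sattr_sketch <- function(dsfile, na_vals = c("NA", "N/A", "na", "n/a")) {
  utils::read.delim(dsfile, na.strings = na_vals,
                    stringsAsFactors = FALSE, check.names = FALSE,
                    colClasses = "character")
}

# Usage with a small temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("SAMPLE_ID\tBODY_SITE",
             "S1\tBlood",
             "S2\tn/a",
             "S3\tN/A"), tmp)
ds <- read_sattr_sketch(tmp)
print(is.na(ds$BODY_SITE))  # FALSE TRUE TRUE
```

Blank fields in character columns are read as "" rather than NA, so an ID check has to test for both.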

Details

The sample attributes file should be a tab-delimited .txt file. When topmed = TRUE, the function also checks for the presence of additional TOPMed-specific sample attributes variables: SEQUENCING_CENTER, Funding_Source, TOPMed_Phase, TOPMed_Project, and Study_Name.

Note that none of the BioSample variables (BODY_SITE, ANALYTE_TYPE, HISTOLOGICAL_TYPE, IS_TUMOR) is strictly required, in the sense that their absence will not break the dbGaP processing pipeline or delay study release. However, their inclusion is strongly encouraged, and is necessary for cancer studies and other tissue-specific studies, so they are treated as "required" variables for the purposes of this checking script.
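The required-variable check amounts to a set difference between an expected list of column names and the columns actually present. The following is a minimal sketch of that idea, not the package's implementation. The helper find_missing_vars_sketch() is hypothetical, and it assumes that the sample ID column counts as required and that matching is exact and case-sensitive.

```r
# Hypothetical helper: report required columns absent from a data frame.
# The required sets mirror the variables named in the Details text; whether
# check_sattr() uses exactly these names and case rules is an assumption.
find_missing_vars_sketch <- function(ds, sampleID_col = "SAMPLE_ID", topmed = FALSE) {
  required <- c(sampleID_col, "BODY_SITE", "ANALYTE_TYPE",
                "HISTOLOGICAL_TYPE", "IS_TUMOR")
  topmed_required <- c("SEQUENCING_CENTER", "Funding_Source",
                       "TOPMed_Phase", "TOPMed_Project", "Study_Name")
  report <- list()
  missing_vars <- setdiff(required, names(ds))
  if (length(missing_vars) > 0) report$missing_vars <- missing_vars
  if (topmed) {
    missing_topmed <- setdiff(topmed_required, names(ds))
    if (length(missing_topmed) > 0) report$missing_topmed_vars <- missing_topmed
  }
  report  # empty list when nothing is missing
}

ds <- data.frame(SAMPLE_ID = c("S1", "S2"), BODY_SITE = c("Blood", "Saliva"),
                 IS_TUMOR = c("N", "N"), SEQUENCING_CENTER = c("X", "X"),
                 stringsAsFactors = FALSE)
res <- find_missing_vars_sketch(ds, topmed = TRUE)
str(res)
```

Each issue is added to the report only when it occurs, which mirrors the "when present" wording in the Value section.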

If a data dictionary is provided (ddfile is not NULL), the function also checks the correspondence between column names in the data file and entries in the data dictionary. Data dictionary files can be Excel (.xls, .xlsx) or tab-delimited .txt files.
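The data dictionary check compares two sets of names in both directions. The sketch below assumes the dictionary is already loaded as a data frame and lists its variable names in a column called VARNAME, which is the usual dbGaP convention. Both the helper compare_dd_sketch() and that column name are assumptions, not the package's API.

```r
# Hypothetical helper: compare data-file column names with the variable
# names in a data dictionary (assumed to be in a VARNAME column).
compare_dd_sketch <- function(ds, dd, varname_col = "VARNAME") {
  dd_vars <- as.character(dd[[varname_col]])
  list(
    in_data_not_dd = setdiff(names(ds), dd_vars),  # columns with no dictionary entry
    in_dd_not_data = setdiff(dd_vars, names(ds))   # dictionary entries with no column
  )
}

ds <- data.frame(SAMPLE_ID = "S1", BODY_SITE = "Blood", AGE = 40)
dd <- data.frame(VARNAME = c("SAMPLE_ID", "BODY_SITE", "IS_TUMOR"),
                 VARDESC = c("Sample ID", "Body site", "Tumor status"),
                 stringsAsFactors = FALSE)
res <- compare_dd_sketch(ds, dd)
str(res)
```

An Excel dictionary could first be read into a data frame, for example with readxl::read_excel(), and then passed to the same comparison.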

Value

satt_report, a list of the following issues (when present):

missing_vars

Required variables missing from the data file

dup_samples

List of duplicated sample IDs

blank_idx

Row indices of blank or missing sample IDs

dd_errors

Differences between the column names in the data file and the variables in the data dictionary (when ddfile is provided)

extra_samples

Samples in the data file that are missing from samp_exp

missing_samples

Samples in samp_exp that are missing from the data file

missing_topmed_vars

Required TOPMed-specific variables missing from the data file (when topmed = TRUE)
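The sample-level entries (dup_samples, blank_idx, extra_samples, missing_samples) can be illustrated with base R set operations. This is a sketch of the kind of logic involved, not the package's implementation. The helper check_samples_sketch() is hypothetical, and it assumes that NA and empty or whitespace-only strings both count as blank IDs.

```r
# Hypothetical helper: sample-level checks analogous to dup_samples,
# blank_idx, extra_samples and missing_samples; only issues found are returned.
check_samples_sketch <- function(ds, samp_exp = NULL, sampleID_col = "SAMPLE_ID") {
  ids <- ds[[sampleID_col]]
  report <- list()
  # Blank or missing IDs: NA, or strings that are empty after trimming
  blank <- is.na(ids) | trimws(ids) == ""
  if (any(blank)) report$blank_idx <- which(blank)
  present <- ids[!blank]
  # Each duplicated ID is reported once
  dups <- unique(present[duplicated(present)])
  if (length(dups) > 0) report$dup_samples <- dups
  if (!is.null(samp_exp)) {
    extra_ids <- setdiff(present, samp_exp)    # in data file, not expected
    missing_ids <- setdiff(samp_exp, present)  # expected, not in data file
    if (length(extra_ids) > 0) report$extra_samples <- extra_ids
    if (length(missing_ids) > 0) report$missing_samples <- missing_ids
  }
  report
}

ds <- data.frame(SAMPLE_ID = c("S1", "S2", "S2", "", NA, "S9"),
                 stringsAsFactors = FALSE)
res <- check_samples_sketch(ds, samp_exp = c("S1", "S2", "S3"))
str(res)
```

Here rows 4 and 5 are flagged as blank, S2 as duplicated, S9 as unexpected and S3 as absent from the file.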


UW-GAC/dbgaptools documentation built on Nov. 3, 2020, 12:19 a.m.