View source: R/check_functions.R
Check contents of a sample attributes file for dbGaP posting.
check_sattr(dsfile, ddfile = NULL, na_vals = c("NA", "N/A", "na", "n/a"),
  samp_exp = NULL, sampleID_col = "SAMPLE_ID", topmed = FALSE)
dsfile |
Path to the data file on disk |
ddfile |
Path to the data dictionary file on disk |
na_vals |
Vector of strings that should be read in as NA/missing in data file (see details of |
samp_exp |
List of expected sample IDs |
sampleID_col |
Column name for sample-level ID |
topmed |
Logical to indicate TOPMed study |
The sample attributes file should be a tab-delimited .txt file.
When topmed = TRUE, checks for the presence of additional, TOPMed-specific
sample attributes variables: SEQUENCING_CENTER, Funding_Source, TOPMed_Phase,
TOPMed_Project, Study_Name.
Note that none of the BioSample variables (BODY_SITE, ANALYTE_TYPE, HISTOLOGICAL_TYPE, IS_TUMOR) is strictly required, in the sense that their absence will not break the dbGaP processing pipeline or delay study release. However, their inclusion is strongly encouraged, and is indeed necessary for cancer studies and other tissue-specific studies; they are therefore treated as "required" variables for the purposes of this checking script.
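The required-variable check described above can be illustrated with a small base R sketch. This is not the package's implementation: find_missing_vars is a hypothetical helper, and the toy file contents are invented for illustration. Only the variable names and the na_vals defaults come from this page.

```r
# Hypothetical helper (not part of the package): list required columns
# that are absent from a sample attributes data frame.
find_missing_vars <- function(dat, topmed = FALSE) {
  biosample_vars <- c("BODY_SITE", "ANALYTE_TYPE", "HISTOLOGICAL_TYPE", "IS_TUMOR")
  topmed_vars <- c("SEQUENCING_CENTER", "Funding_Source", "TOPMed_Phase",
                   "TOPMed_Project", "Study_Name")
  out <- list()
  miss <- setdiff(biosample_vars, names(dat))
  if (length(miss) > 0) out$missing_vars <- miss
  if (topmed) {
    miss_topmed <- setdiff(topmed_vars, names(dat))
    if (length(miss_topmed) > 0) out$missing_topmed_vars <- miss_topmed
  }
  out
}

# Toy tab-delimited sample attributes file (invented data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("SAMPLE_ID\tBODY_SITE\tANALYTE_TYPE\tIS_TUMOR",
             "S1\tBlood\tDNA\tN",
             "S2\tBlood\tDNA\tN/A"), tmp)

# Strings listed in na.strings are read in as NA, mirroring the na_vals default
dat <- read.delim(tmp, na.strings = c("NA", "N/A", "na", "n/a"),
                  stringsAsFactors = FALSE)

res <- find_missing_vars(dat, topmed = TRUE)
print(res)
```

Here the toy file lacks HISTOLOGICAL_TYPE and all of the TOPMed-specific columns, so both elements of the returned list are populated.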
If a data dictionary is provided (ddfile != NULL), additionally checks
correspondence between column names in data file and entries in data dictionary.
Data dictionary files can be Excel (.xls, .xlsx) or tab-delimited .txt.
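As a rough illustration of the data file versus data dictionary comparison, the sketch below contrasts column names with dictionary entries in base R. It assumes the dictionary lists variable names in a column called VARNAME; that column name, the compare_dd helper, and the toy data are assumptions for illustration rather than details taken from this page.

```r
# Hypothetical helper: compare data file columns with dictionary variable names.
compare_dd <- function(ds_names, dd_names) {
  list(in_data_not_dd = setdiff(ds_names, dd_names),
       in_dd_not_data = setdiff(dd_names, ds_names))
}

# Toy data file columns and toy dictionary (invented; VARNAME is an assumed column name)
ds_names <- c("SAMPLE_ID", "BODY_SITE", "ANALYTE_TYPE", "IS_TUMOR")
dd <- data.frame(
  VARNAME = c("SAMPLE_ID", "BODY_SITE", "ANALYTE_TYPE", "HISTOLOGICAL_TYPE"),
  VARDESC = c("Sample ID", "Body site", "Analyte type", "Histological type"),
  stringsAsFactors = FALSE
)

diffs <- compare_dd(ds_names, dd$VARNAME)
print(diffs)
```

Reporting the two directions separately makes it clear whether a column is undocumented in the dictionary or a dictionary entry has no matching column.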
satt_report, a list of the following issues (when present):
missing_vars |
Missing and required variables |
dup_samples |
List of duplicated sample IDs |
blank_idx |
Row index of blank/missing sample IDs |
dd_errors |
Differences in fields between data file and data dictionary |
extra_samples |
Samples in data file missing from |
missing_samples |
Samples in |
missing_topmed_vars |
Missing and required variables for TOPMed |
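The sample ID elements of the report can be pictured with the following base R sketch. It is a hedged approximation, not the package's code: check_ids is a hypothetical helper, the comparison is assumed to be against samp_exp, and dropping blank IDs before looking for duplicates is a choice made for this sketch.

```r
# Hypothetical helper: sample ID checks similar in spirit to the report above.
check_ids <- function(ids, samp_exp = NULL) {
  report <- list()

  # Row indices of blank or missing sample IDs
  blank <- which(is.na(ids) | trimws(ids) == "")
  if (length(blank) > 0) report$blank_idx <- blank

  # Duplicates are assessed on non-blank IDs only (a choice made in this sketch)
  nonblank <- ids[setdiff(seq_along(ids), blank)]
  dups <- unique(nonblank[duplicated(nonblank)])
  if (length(dups) > 0) report$dup_samples <- dups

  # Assumed comparison against the expected sample list
  if (!is.null(samp_exp)) {
    extra <- setdiff(nonblank, samp_exp)
    missing <- setdiff(samp_exp, nonblank)
    if (length(extra) > 0) report$extra_samples <- extra
    if (length(missing) > 0) report$missing_samples <- missing
  }
  report
}

# Toy IDs (invented data)
ids <- c("S1", "S2", "S2", NA, "S9")
id_report <- check_ids(ids, samp_exp = c("S1", "S2", "S3"))
print(id_report)
```

As in the report described above, elements appear only when the corresponding issue is present, so a clean set of IDs yields an empty list.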