QC_histogram: Histogram(s) of expected and observed data distribution

View source: R/QC_histogram.R

QC_histogram    R Documentation

Description

QC_histogram creates two histograms: one showing the observed data distribution of a numeric variable, and one showing the expected distribution. The data can optionally be filtered with the high-quality filter (HQ_filter) first. histogram_series generates a series of such histograms for multiple filter settings.

Usage

QC_histogram(dataset, data_col = 1,
             save_name = "dataset", save_dir = getwd(),
             export_outliers = FALSE,
             filter_FRQ = NULL, filter_cal = NULL,
             filter_HWE = NULL, filter_imp = NULL,
             filter_NA = TRUE,
             filter_NA_FRQ = filter_NA, filter_NA_cal = filter_NA,
             filter_NA_HWE = filter_NA, filter_NA_imp = filter_NA,
             breaks = "Sturges",
             graph_name = colnames(dataset)[data_col],
             header_translations, check_impstatus = FALSE,
             ignore_impstatus = FALSE,
             T_strings = c("1", "TRUE", "yes", "YES", "y", "Y"),
             F_strings = c("0", "FALSE", "no", "NO", "n", "N"),
             NA_strings = c(NA, "NA", ".", "-"), ...)
histogram_series(dataset, data_col = 1,
   save_name = paste0("dataset_F", 1:nrow(plot_table)),
   save_dir = getwd(), export_outliers = FALSE,
   filter_FRQ = NULL, filter_cal = NULL,
   filter_HWE = NULL, filter_imp = NULL,
   filter_NA = TRUE,
   filter_NA_FRQ = filter_NA, filter_NA_cal = filter_NA,
   filter_NA_HWE = filter_NA, filter_NA_imp = filter_NA,
   breaks = "Sturges",
   header_translations, ignore_impstatus = FALSE,
   check_impstatus = FALSE,
   T_strings = c("1", "TRUE", "yes", "YES", "y", "Y"),
   F_strings = c("0", "FALSE", "no", "NO", "n", "N"),
   NA_strings = c(NA, "NA", ".", "-"),
   ...)

Arguments

dataset

vector or table containing the variable of interest.

data_col

name or number of the column of dataset containing the variable of interest.

save_name

for QC_histogram, a character string; for histogram_series, a vector of character strings; specifying the filename(s) of the graph, without extension.

save_dir

character string; the directory where the output files are saved. Note that R uses forward slash (/) where Windows uses the backslash (\).

export_outliers

logical or numeric value; should outlying entries (which are excluded from the plot) be exported to an output file? If numeric, the number specifies the maximum number of entries that are exported.

filter_FRQ, filter_cal, filter_HWE, filter_imp

Filter threshold values for allele frequency, callrate, HWE p-value and imputation quality, respectively. Passed to HQ_filter. QC_histogram takes only single values, but histogram_series accepts vectors as well (see 'Details').

filter_NA

logical; if TRUE, entries with missing values for a filter variable are excluded; if FALSE, missing values are ignored. QC_histogram takes only single values, but histogram_series accepts vectors as well (see 'Details'). filter_NA is the default setting for all variables; variable-specific settings can be specified with the following arguments.

filter_NA_FRQ, filter_NA_cal, filter_NA_HWE, filter_NA_imp

logical; variable-specific settings for filter_NA. These arguments are passed to HQ_filter.

breaks

argument passed to hist; determines the cell-borders in the histogram.

graph_name

character string; used in the title of the plot.

header_translations

translation table for column names. See translate_header for more information. If the argument is left empty, dataset is assumed to use the standard column-names used by QC_GWAS.

check_impstatus

logical; should convert_impstatus be called to convert the imputation-status column into standard values?

ignore_impstatus

logical; if FALSE, HWE p-value and callrate filters are applied only to genotyped SNPs, and imputation quality filters only to imputed SNPs. If TRUE, the filters are applied to all SNPs regardless of the imputation status.

T_strings, F_strings, NA_strings

arguments passed to convert_impstatus.

...

in histogram_series, arguments passed to QC_histogram; in QC_histogram, arguments passed to hist.
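
Because breaks and the arguments in ... are handed to hist, the usual base R behaviour applies. The sketch below calls hist directly on simulated data (the vector x is made up for illustration; nothing here calls QCGWAS) to show how different breaks values change the cell borders:

```r
# Toy data; not taken from the package.
set.seed(1)
x <- rnorm(200)

# Default rule, as in the Usage section.
h_default <- hist(x, breaks = "Sturges", plot = FALSE)

# A single number is only a suggestion for the number of cells.
h_fine <- hist(x, breaks = 40, plot = FALSE)

# A numeric vector sets the cell borders explicitly;
# it must span the full range of x.
h_manual <- hist(x, breaks = seq(-6, 6, by = 0.5), plot = FALSE)

length(h_default$breaks)
length(h_fine$breaks)
range(h_manual$breaks)
```
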

Details

histogram_series accepts multiple filter-values, and passes these one by one to QC_histogram to generate a series of histograms. For example, specifying:

filter_FRQ = c(0.05, 0.10), filter_cal = c(0.90, 0.95)

will generate two histograms. The first excludes SNPs with allele frequency < 0.05 or callrate < 0.90; the second excludes SNPs with allele frequency < 0.10 or callrate < 0.95. The same principle applies to the filter_NA settings. If the vectors submitted to the filter arguments are of unequal length, the shorter vector is recycled until it matches the length of the longest (if possible). To filter missing values only, set the filter to NA and the corresponding filter_NA argument to TRUE. Setting the filter argument to NULL disables the filter entirely, regardless of the filter_NA setting.
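
The rules above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the settings table and the helper keep_entry are hypothetical, and the threshold comparison simply follows the wording of this section (values below the threshold are excluded).

```r
# Hypothetical stand-in for the table of filter settings (one row per histogram).
# data.frame() recycles the length-1 filter_NA to match the longer vectors.
settings <- data.frame(filter_FRQ = c(0.05, 0.10),
                       filter_cal = c(0.90, 0.95),
                       filter_NA  = TRUE)

# One filename per row, mirroring the default save_name shown under Usage.
save_name <- paste0("dataset_F", 1:nrow(settings))

# Hypothetical helper: TRUE for entries that pass a single filter.
keep_entry <- function(x, threshold, exclude_NA) {
  if (is.null(threshold)) {
    return(rep(TRUE, length(x)))           # NULL: filter disabled entirely
  }
  below <- if (is.na(threshold)) {
    rep(FALSE, length(x))                  # NA: no threshold, missing values only
  } else {
    !is.na(x) & x < threshold
  }
  missing <- is.na(x) & exclude_NA
  !(below | missing)
}

frq <- c(0.02, 0.07, 0.20, NA)             # toy allele frequencies

keep_entry(frq, 0.05, TRUE)    # FALSE  TRUE  TRUE FALSE
keep_entry(frq, 0.05, FALSE)   # FALSE  TRUE  TRUE  TRUE
keep_entry(frq, NA,   TRUE)    #  TRUE  TRUE  TRUE FALSE
keep_entry(frq, NULL, TRUE)    #  TRUE  TRUE  TRUE  TRUE
```

Note that data.frame() only recycles when the longer length is an exact multiple of the shorter one, which is one way to read the "(if possible)" qualifier above.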

Value

Both functions return an invisible NULL.
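
As a minimal sketch of what an invisible NULL means in practice, the hypothetical function below (save_histogram is not part of QCGWAS) writes a histogram to a file built from save_dir and save_name and then returns invisible(NULL). The PDF device is used only because it is available in every R installation; the file type that QC_histogram itself writes is not stated here.

```r
# Hypothetical helper, for illustration only.
save_histogram <- function(x, save_name, save_dir = tempdir(), ...) {
  # file.path() joins with forward slashes on every platform.
  out_file <- file.path(save_dir, paste0(save_name, ".pdf"))
  pdf(out_file)
  hist(x, ...)          # extra arguments go to hist, as with ...
  dev.off()
  invisible(NULL)       # nothing is printed at the console
}

set.seed(42)
res <- withVisible(save_histogram(rnorm(100), "toy_histogram",
                                  main = "Toy histogram"))
res$value     # NULL
res$visible   # FALSE
```

The useful result of such a call is the file on disk, so there is nothing to assign from the return value.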

See Also

For creating QQ plots: QQ_plot.

Examples

## Not run: 
  data("gwa_sample")

  QC_histogram(dataset = gwa_sample, data_col = "EFFECT",
             save_name = "sample_histogram",
             filter_FRQ = 0.01, filter_cal = 0.95,
             filter_NA = FALSE,
             graph_name = "Effect size histogram")

  histogram_series(dataset = gwa_sample, data_col = "EFFECT",
             save_name = "sample_histogram",
             filter_FRQ = c(NA, 0.01, 0.01),
             filter_cal = c(NA, 0.95, 0.95),
             filter_NA = c(FALSE, FALSE, TRUE))

## End(Not run)

QCGWAS documentation built on May 30, 2022, 5:05 p.m.