HQ_filter: Select high-quality data in GWAS datasets

Description

This function takes a GWAS dataset that uses the standard QC_GWAS column names and returns a logical vector indicating which entries meet the quality criteria.

Usage

HQ_filter(data,
          ignore_impstatus = FALSE,
          FRQ_val = NULL, HWE_val = NULL,
          cal_val = NULL, imp_val = NULL,
          filter_NA = TRUE,
          FRQ_NA = filter_NA, HWE_NA = filter_NA,
          cal_NA = filter_NA, imp_NA = filter_NA)

Arguments

data

table to be filtered. HQ_filter assumes the dataset uses the standard QC_GWAS column names.

ignore_impstatus

logical; if FALSE, HWE p-value and callrate filters are applied only to genotyped SNPs, and imputation quality filters only to imputed SNPs. If TRUE, the filters are applied to all SNPs regardless of the imputation status.

FRQ_val, HWE_val, cal_val, imp_val

numeric; the minimum required value for allele frequency, HWE p-value, callrate and imputation quality, respectively. Note that the allele-frequency filter is two-sided: for a filter value of x, it excludes entries with freq < x or freq > 1 - x.

filter_NA

logical; if TRUE, entries with a missing value in a filter variable are excluded; if FALSE, missing values are ignored by that filter. filter_NA is the default setting for all variables; variable-specific settings can be specified with the following arguments.

FRQ_NA, HWE_NA, cal_NA, imp_NA

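logical; variable-specific settings for filter_NA (allele frequency, HWE p-value, callrate and imputation quality, respectively). Each defaults to the value of filter_NA.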
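Because the Usage section declares the variable-specific arguments with defaults such as FRQ_NA = filter_NA, setting filter_NA changes all four NA settings at once, while each can still be overridden individually. A minimal sketch of this R default-argument pattern, using a hypothetical toy function na_settings() that is not part of QCGWAS:

```r
# Hypothetical toy function (not QCGWAS code) mirroring the argument pattern
# of HQ_filter: the variable-specific NA settings default to filter_NA.
# R evaluates default arguments lazily inside the function, so a default
# may refer to another argument of the same function.
na_settings <- function(filter_NA = TRUE,
                        FRQ_NA = filter_NA, HWE_NA = filter_NA,
                        cal_NA = filter_NA, imp_NA = filter_NA) {
  c(FRQ = FRQ_NA, HWE = HWE_NA, cal = cal_NA, imp = imp_NA)
}

na_settings()                                  # all four TRUE
na_settings(filter_NA = FALSE)                 # all four FALSE
na_settings(filter_NA = FALSE, HWE_NA = TRUE)  # only HWE TRUE
```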

Details

A SNP is considered high-quality if it meets all quality criteria. The thresholds are inclusive, i.e. a SNP whose value is equal to or higher than the threshold passes that filter.
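A minimal sketch of the two-sided allele-frequency rule described under FRQ_val, combined with inclusive thresholds. It assumes a plain numeric vector of frequencies rather than a QC_GWAS table, and frq_pass() is a hypothetical helper, not a QCGWAS function. Missing frequencies are left as NA here, since their treatment is controlled separately by FRQ_NA:

```r
# Hypothetical helper (not part of QCGWAS): two-sided allele-frequency check.
# An entry passes if x <= freq <= 1 - x; a value equal to a bound passes
# because the thresholds are inclusive. NA inputs stay NA.
frq_pass <- function(freq, x) {
  freq >= x & freq <= 1 - x
}

freq <- c(0.005, 0.01, 0.30, 0.98, 0.995, NA)
frq_pass(freq, 0.01)
# FALSE TRUE TRUE TRUE FALSE NA
```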

To filter only missing values, set the threshold argument (e.g. HWE_val) to NA and the corresponding NA argument (e.g. HWE_NA) to TRUE.

To disable a filter entirely, set its threshold argument to NULL (the default). This disables the filtering of missing values for that variable as well.
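The rules for a single filter variable (inclusive threshold, NA threshold, NULL threshold) can be sketched as follows. threshold_pass() is a hypothetical helper, not the QCGWAS implementation, and it assumes that a missing value which is ignored (NA argument FALSE) counts as passing that filter:

```r
# Hypothetical helper (not the QCGWAS implementation) applying one filter:
#   threshold = NULL -> filter disabled, every entry passes (NAs included)
#   threshold = NA   -> only missing values are checked
#   numeric          -> inclusive comparison, value >= threshold passes
# exclude_NA: TRUE makes missing values fail, FALSE lets them pass.
threshold_pass <- function(values, threshold, exclude_NA) {
  if (is.null(threshold)) {
    return(rep(TRUE, length(values)))
  }
  missing <- is.na(values)
  pass <- if (is.na(threshold)) rep(TRUE, length(values)) else values >= threshold
  # Overwrite the NA comparison results according to exclude_NA
  pass[missing] <- !exclude_NA
  pass
}

callrate <- c(0.90, 0.95, 0.99, NA)
threshold_pass(callrate, 0.95, exclude_NA = TRUE)   # FALSE TRUE TRUE FALSE
threshold_pass(callrate, 0.95, exclude_NA = FALSE)  # FALSE TRUE TRUE TRUE
threshold_pass(callrate, NA, exclude_NA = TRUE)     # TRUE TRUE TRUE FALSE
threshold_pass(callrate, NULL, exclude_NA = TRUE)   # TRUE TRUE TRUE TRUE
```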

When imputation status is missing or invalid (and ignore_impstatus is FALSE), only the allele-frequency filter will be applied.
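How the imputation status decides which filters apply to each SNP can be sketched as below. combine_filters() is hypothetical, the status coding (0 = genotyped, 1 = imputed, anything else treated as missing or invalid) is an assumption for illustration, and the per-filter results are taken as precomputed logical vectors without NAs:

```r
# Hypothetical sketch (not the QCGWAS source) of combining per-filter results.
# Assumed status coding: 0 = genotyped, 1 = imputed, other/NA = missing/invalid.
combine_filters <- function(status, frq_ok, hwe_ok, cal_ok, imp_ok,
                            ignore_impstatus = FALSE) {
  if (ignore_impstatus) {
    # Every filter applies to every SNP
    return(frq_ok & hwe_ok & cal_ok & imp_ok)
  }
  genotyped <- !is.na(status) & status == 0
  imputed   <- !is.na(status) & status == 1
  # The allele-frequency filter applies to all SNPs
  ok <- frq_ok
  # HWE and callrate only for genotyped SNPs
  ok[genotyped] <- ok[genotyped] & hwe_ok[genotyped] & cal_ok[genotyped]
  # Imputation quality only for imputed SNPs
  ok[imputed] <- ok[imputed] & imp_ok[imputed]
  ok
}

status <- c(0, 0, 1, 1, NA)
frq_ok <- rep(TRUE, 5)
hwe_ok <- c(TRUE, FALSE, FALSE, TRUE, FALSE)
cal_ok <- c(TRUE, TRUE, FALSE, TRUE, FALSE)
imp_ok <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
combine_filters(status, frq_ok, hwe_ok, cal_ok, imp_ok)
# TRUE FALSE TRUE FALSE TRUE
combine_filters(status, frq_ok, hwe_ok, cal_ok, imp_ok, ignore_impstatus = TRUE)
# TRUE FALSE FALSE FALSE FALSE
```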

Value

A logical vector with one element per row of data, indicating whether that entry meets (TRUE) or fails (FALSE) the quality criteria.

Note

The table passed to the data argument must use the standard QC_GWAS column names. Functions that call HQ_filter usually let the user specify a translation table; if not, translate_header can be used to translate the header manually.

Examples

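  library(QCGWAS)   # provides HQ_filter and the gwa_sample dataset
  data("gwa_sample")

  # Keep SNPs with an allele frequency between 0.01 and 0.99 and a callrate
  # of at least 0.95 (callrate is checked for genotyped SNPs only);
  # missing values are not excluded
  selected_SNPs <- HQ_filter(data = gwa_sample,
                             FRQ_val = 0.01,
                             cal_val = 0.95,
                             filter_NA = FALSE)
  summary(gwa_sample[selected_SNPs, ])

  # Same thresholds, but the callrate filter now applies to all SNPs,
  # including imputed ones
  selected_SNPs <- HQ_filter(data = gwa_sample,
                             FRQ_val = 0.01,
                             cal_val = 0.95,
                             filter_NA = FALSE,
                             ignore_impstatus = TRUE)
  summary(gwa_sample[selected_SNPs, ])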

QCGWAS documentation built on May 30, 2022, 5:05 p.m.