HQ_filter: Select high-quality data in GWAS datasets

Description

This function accepts a QC_GWAS dataset and returns a vector of logical values indicating which entries meet the quality criteria.

Usage

HQ_filter(data,
          ignore_impstatus = FALSE,
          FRQ_val = NULL, HWE_val = NULL,
          cal_val = NULL, imp_val = NULL,
          filter_NA = TRUE,
          FRQ_NA = filter_NA, HWE_NA = filter_NA,
          cal_NA = filter_NA, imp_NA = filter_NA)

Arguments

data

table to be filtered. HQ_filter assumes the dataset uses the standard QC_GWAS column names.

ignore_impstatus

logical; if FALSE, HWE p-value and callrate filters are applied only to genotyped SNPs, and imputation quality filters only to imputed SNPs. If TRUE, the filters are applied to all SNPs regardless of the imputation status.

FRQ_val, HWE_val, cal_val, imp_val

numeric; the minimum required value for allele frequency, HWE p-value, callrate and imputation quality, respectively. Note that the allele-frequency filter is two-sided: for a filter value of x, it excludes entries with freq < x or freq > 1 - x.

filter_NA

logical; if TRUE, entries with a missing value for a filter variable are excluded; if FALSE, missing values are ignored. filter_NA sets the default for all four variables; variable-specific settings can be specified with the following arguments.

FRQ_NA, HWE_NA, cal_NA, imp_NA

logical; variable-specific settings for filter_NA.
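
The way the variable-specific arguments fall back on filter_NA follows from R's default-argument evaluation, as written in the Usage signature. The sketch below reproduces only that argument pattern; na_settings is a hypothetical helper made up for illustration and is not part of QCGWAS.

```r
# Hypothetical helper (not part of QCGWAS): mirrors the default-argument
# pattern of the HQ_filter signature to show how the NA settings resolve.
na_settings <- function(filter_NA = TRUE,
                        FRQ_NA = filter_NA, HWE_NA = filter_NA,
                        cal_NA = filter_NA, imp_NA = filter_NA) {
  # Each *_NA argument takes the value of filter_NA unless set explicitly
  c(FRQ = FRQ_NA, HWE = HWE_NA, cal = cal_NA, imp = imp_NA)
}

# Ignore missing values everywhere except for the HWE p-value
na_settings(filter_NA = FALSE, HWE_NA = TRUE)
```

Here only HWE resolves to TRUE; the other three settings inherit FALSE from filter_NA.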

Details

A SNP is considered high-quality if it meets all quality criteria. The thresholds are inclusive; i.e. SNPs with a value equal to or higher than the threshold are considered high-quality.
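
As an illustration of the inclusive, two-sided allele-frequency rule described under Arguments, here is a minimal base-R sketch. The function frq_pass is a hypothetical name used for illustration; it is not the package's implementation.

```r
# Hypothetical helper (not the QCGWAS implementation): inclusive,
# two-sided allele-frequency check for a threshold FRQ_val.
frq_pass <- function(freq, FRQ_val) {
  # Keep entries with FRQ_val <= freq <= 1 - FRQ_val
  freq >= FRQ_val & freq <= 1 - FRQ_val
}

freq <- c(0.005, 0.01, 0.50, 0.995)
frq_pass(freq, FRQ_val = 0.01)
# FALSE TRUE TRUE FALSE
```

The entry at exactly 0.01 passes because the threshold is inclusive, while 0.005 and 0.995 fail on the lower and upper side respectively.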

To filter missing values only, set the filter argument to NA, and the corresponding NA-filter to TRUE.

To disable a filter entirely, set the filter argument to NULL. This disables the filtering of missing values for that variable as well.
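
The three threshold modes (a numeric value, NA, and NULL) can be sketched for a single one-sided filter variable as follows. This is an assumption-laden illustration in base R: filter_var is a hypothetical function, and "ignored" is taken to mean that an entry is not excluded because of its missing value.

```r
# Hypothetical helper (not the QCGWAS implementation): one filter variable
# with a minimum threshold and a missing-value setting.
filter_var <- function(values, threshold, exclude_NA = TRUE) {
  # NULL threshold: filter disabled, missing values are not checked either
  if (is.null(threshold)) return(rep(TRUE, length(values)))
  missing <- is.na(values)
  pass <- rep(TRUE, length(values))
  # NA threshold: skip the value comparison, keep only the missing-value check
  if (!is.na(threshold)) {
    pass[!missing] <- values[!missing] >= threshold
  }
  # Missing values fail when excluded and pass when ignored (assumption)
  pass[missing] <- !exclude_NA
  pass
}

hwe <- c(0.50, 1e-8, NA)
filter_var(hwe, threshold = 1e-6, exclude_NA = TRUE)  # TRUE FALSE FALSE
filter_var(hwe, threshold = NA,   exclude_NA = TRUE)  # TRUE TRUE FALSE
filter_var(hwe, threshold = NULL, exclude_NA = TRUE)  # TRUE TRUE TRUE
```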

When imputation status is missing or invalid (and ignore_impstatus is FALSE), only the allele-frequency filter will be applied.
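
How the individual filters might combine depending on imputation status is sketched below. The function combine_filters and its arguments are hypothetical, and the coding of the status as 0 (genotyped) and 1 (imputed) is an assumption; the sketch follows the rules stated in this section rather than the package source.

```r
# Hypothetical sketch (not the QCGWAS implementation).
# Assumes imputed is coded 0 = genotyped, 1 = imputed; NA or any other
# value is treated as missing/invalid. The *_ok arguments are logical
# vectors giving the per-variable filter results.
combine_filters <- function(imputed, frq_ok, hwe_ok, cal_ok, imp_ok,
                            ignore_impstatus = FALSE) {
  if (ignore_impstatus) {
    # All filters apply to all SNPs
    return(frq_ok & hwe_ok & cal_ok & imp_ok)
  }
  genotyped  <- !is.na(imputed) & imputed == 0
  is_imputed <- !is.na(imputed) & imputed == 1
  pass <- frq_ok  # the allele-frequency filter applies to every SNP
  pass[genotyped]  <- pass[genotyped] & hwe_ok[genotyped] & cal_ok[genotyped]
  pass[is_imputed] <- pass[is_imputed] & imp_ok[is_imputed]
  pass  # SNPs with missing/invalid status keep the frequency result only
}

imputed <- c(0, 1, NA, 0)
frq_ok  <- c(TRUE, TRUE, TRUE, TRUE)
hwe_ok  <- c(TRUE, FALSE, FALSE, FALSE)
cal_ok  <- c(TRUE, TRUE, TRUE, TRUE)
imp_ok  <- c(FALSE, TRUE, FALSE, TRUE)
combine_filters(imputed, frq_ok, hwe_ok, cal_ok, imp_ok)
# TRUE TRUE TRUE FALSE
```

With ignore_impstatus = TRUE, the same input fails for all four SNPs, because each has at least one failing filter.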

Value

A vector of logical values, indicating which entries in data meet (TRUE) or fail (FALSE) the quality criteria.
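
A logical vector of this kind can be used directly for counting and subsetting in base R. The toy table and the keep vector below are made-up stand-ins for a real dataset and a real HQ_filter result.

```r
# Toy data and a stand-in for the HQ_filter() return value (both made up)
toy <- data.frame(MARKER = c("rs1", "rs2", "rs3"),
                  EFF_ALL_FREQ = c(0.20, 0.001, 0.45),
                  stringsAsFactors = FALSE)
keep <- c(TRUE, FALSE, TRUE)

sum(keep)               # number of high-quality entries
mean(keep)              # proportion of high-quality entries
hq_data <- toy[keep, ]  # high-quality subset
lq_data <- toy[!keep, ] # entries that failed the criteria
```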

Note

The table entered in the data argument must use the standard column names of QC_GWAS. Functions using HQ_filter usually allow the user to specify a translation table. If not, translate_header can be used to translate the header manually.

Examples

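  library(QCGWAS)
  data("gwa_sample")

  # Callrate filter applies to genotyped SNPs only (ignore_impstatus = FALSE)
  selected_SNPs <- HQ_filter(data = gwa_sample,
                             FRQ_val = 0.01,
                             cal_val = 0.95,
                             filter_NA = FALSE)
  summary(gwa_sample[selected_SNPs, ])
  
  # Same thresholds, applied to all SNPs regardless of imputation status
  selected_SNPs <- HQ_filter(data = gwa_sample,
                             FRQ_val = 0.01,
                             cal_val = 0.95,
                             filter_NA = FALSE,
                             ignore_impstatus = TRUE)
  summary(gwa_sample[selected_SNPs, ])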

Example output

QCGWAS library, version 1.0-8

A quick start guide (and other documentation)
can be found in the 'R/library/QCGWAS/doc' folder
    MARKER             STRAND              CHR               POSITION        
 Length:9765        Length:9765        Length:9765        Min.   :    37226  
 Class :character   Class :character   Class :character   1st Qu.: 32081975  
 Mode  :character   Mode  :character   Mode  :character   Median : 69123997  
                                                          Mean   : 78248986  
                                                          3rd Qu.:114381541  
                                                          Max.   :246531946  
                                                                             
  EFFECT_ALL         OTHER_ALL            N_TOTAL       EFF_ALL_FREQ   
 Length:9765        Length:9765        Min.   :896.0   Min.   :0.0100  
 Class :character   Class :character   1st Qu.:936.0   1st Qu.:0.2210  
 Mode  :character   Mode  :character   Median :936.0   Median :0.5010  
                                       Mean   :935.8   Mean   :0.5009  
                                       3rd Qu.:936.0   3rd Qu.:0.7850  
                                       Max.   :936.0   Max.   :0.9900  
                                                                       
    HWE_PVAL        CALLRATE          EFFECT               STDERR      
 Min.   :0.000   Min.   :0.9573   Min.   :-1.0340000   Min.   :0.0159  
 1st Qu.:0.236   1st Qu.:1.0000   1st Qu.:-0.0180000   1st Qu.:0.0209  
 Median :0.530   Median :1.0000   Median : 0.0000000   Median :0.0245  
 Mean   :0.515   Mean   :0.9998   Mean   :-0.0001974   Mean   :0.0333  
 3rd Qu.:0.783   3rd Qu.:1.0000   3rd Qu.: 0.0190000   3rd Qu.:0.0343  
 Max.   :1.000   Max.   :1.0000   Max.   : 0.4960000   Max.   :0.5965  
 NA's   :8799                                                          
     PVALUE             IMPUTED        IMP_QUALITY     
 Min.   :0.0000002   Min.   :0.0000   Min.   :-0.4265  
 1st Qu.:0.2449000   1st Qu.:1.0000   1st Qu.: 0.8890  
 Median :0.5000000   Median :1.0000   Median : 0.9640  
 Mean   :0.4995554   Mean   :0.9011   Mean   : 0.8949  
 3rd Qu.:0.7534000   3rd Qu.:1.0000   3rd Qu.: 0.9890  
 Max.   :0.9999000   Max.   :1.0000   Max.   : 1.0000  
                                                       
    MARKER             STRAND              CHR               POSITION        
 Length:9765        Length:9765        Length:9765        Min.   :    37226  
 Class :character   Class :character   Class :character   1st Qu.: 32081975  
 Mode  :character   Mode  :character   Mode  :character   Median : 69123997  
                                                          Mean   : 78248986  
                                                          3rd Qu.:114381541  
                                                          Max.   :246531946  
                                                                             
  EFFECT_ALL         OTHER_ALL            N_TOTAL       EFF_ALL_FREQ   
 Length:9765        Length:9765        Min.   :896.0   Min.   :0.0100  
 Class :character   Class :character   1st Qu.:936.0   1st Qu.:0.2210  
 Mode  :character   Mode  :character   Median :936.0   Median :0.5010  
                                       Mean   :935.8   Mean   :0.5009  
                                       3rd Qu.:936.0   3rd Qu.:0.7850  
                                       Max.   :936.0   Max.   :0.9900  
                                                                       
    HWE_PVAL        CALLRATE          EFFECT               STDERR      
 Min.   :0.000   Min.   :0.9573   Min.   :-1.0340000   Min.   :0.0159  
 1st Qu.:0.236   1st Qu.:1.0000   1st Qu.:-0.0180000   1st Qu.:0.0209  
 Median :0.530   Median :1.0000   Median : 0.0000000   Median :0.0245  
 Mean   :0.515   Mean   :0.9998   Mean   :-0.0001974   Mean   :0.0333  
 3rd Qu.:0.783   3rd Qu.:1.0000   3rd Qu.: 0.0190000   3rd Qu.:0.0343  
 Max.   :1.000   Max.   :1.0000   Max.   : 0.4960000   Max.   :0.5965  
 NA's   :8799                                                          
     PVALUE             IMPUTED        IMP_QUALITY     
 Min.   :0.0000002   Min.   :0.0000   Min.   :-0.4265  
 1st Qu.:0.2449000   1st Qu.:1.0000   1st Qu.: 0.8890  
 Median :0.5000000   Median :1.0000   Median : 0.9640  
 Mean   :0.4995554   Mean   :0.9011   Mean   : 0.8949  
 3rd Qu.:0.7534000   3rd Qu.:1.0000   3rd Qu.: 0.9890  
 Max.   :0.9999000   Max.   :1.0000   Max.   : 1.0000  
                                                       

QCGWAS documentation built on May 2, 2019, 3:19 p.m.