checkStat: Quality Control of Marker Summary Statistics

View source: R/genomic_statistics.R

checkStat R Documentation


Description

Quality control is a fundamental step in the analysis of GWAS summary statistics. checkStat maps marker ids, checks the effect allele and its frequency, determines the build version, and excludes markers based on multiple criteria.

Usage

checkStat(
  Glist = NULL,
  stat = NULL,
  excludeMAF = 0.01,
  excludeMAFDIFF = 0.05,
  excludeINFO = 0.8,
  excludeCGAT = TRUE,
  excludeINDEL = TRUE,
  excludeDUPS = TRUE,
  excludeMHC = FALSE,
  excludeMISS = 0.05,
  excludeHWE = 1e-12
)

Arguments

Glist

List containing information about genotype matrix stored on disk.

stat

Data frame of marker summary statistics. It should either follow the "internal" or "external" format.

excludeMAF

Numeric. Exclusion threshold for minor allele frequency. Default is 0.01.
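A minimal sketch of a MAF filter, assuming the minor allele frequency is derived from the effect allele frequency (eaf) as min(eaf, 1 - eaf) and that markers with MAF below the threshold are dropped. The helpers maf_from_eaf and keep_maf are hypothetical and not part of qgg; whether checkStat treats the threshold as inclusive is not stated on this page.

```r
# Hypothetical helpers (not part of qgg): MAF filter on effect allele frequency.
maf_from_eaf <- function(eaf) {
  # The minor allele frequency is the smaller of the two allele frequencies
  pmin(eaf, 1 - eaf)
}

keep_maf <- function(eaf, excludeMAF = 0.01) {
  # TRUE for markers to keep; markers with a missing eaf are dropped
  maf <- maf_from_eaf(eaf)
  !is.na(maf) & maf >= excludeMAF
}

eaf <- c(0.005, 0.30, 0.995, 0.50, NA)
keep_maf(eaf)
# FALSE TRUE FALSE TRUE FALSE
```

Note that an eaf of 0.995 is filtered as well, because its minor allele frequency is 0.005.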

excludeMAFDIFF

Numeric. Exclusion threshold for the difference between the allele frequency in stat and the allele frequency in the genotype data. Default is 0.05.
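A minimal sketch of the frequency-difference check, assuming both frequencies refer to the same (already aligned) effect allele and that markers whose absolute difference exceeds the threshold are excluded. keep_freq_diff is a hypothetical helper, not the qgg implementation.

```r
# Hypothetical helper (not part of qgg): compare reported and reference
# effect allele frequencies for the same, already aligned, effect allele.
keep_freq_diff <- function(eaf_stat, eaf_ref, excludeMAFDIFF = 0.05) {
  d <- abs(eaf_stat - eaf_ref)
  # Markers with a missing frequency on either side are dropped
  !is.na(d) & d <= excludeMAFDIFF
}

keep_freq_diff(c(0.20, 0.20, 0.60), c(0.22, 0.40, NA))
# TRUE FALSE FALSE
```

A large difference often signals mismatched alleles or a different ancestry in the GWAS sample, which is why such markers are removed before LD-based analyses.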

excludeINFO

Numeric. Exclusion threshold for the imputation info score. Default is 0.8.

excludeCGAT

Logical. Exclude strand-ambiguous markers (C/G or A/T allele pairs). Default is TRUE.
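A C/G or A/T marker is ambiguous because its two alleles are complements of each other, so a strand flip cannot be detected from the alleles alone. The sketch below flags such markers from the ea and nea columns; is_ambiguous is a hypothetical helper, not part of qgg.

```r
# Hypothetical helper (not part of qgg): flag strand-ambiguous allele pairs.
is_ambiguous <- function(ea, nea) {
  # Case-insensitive comparison of the allele pair in either order
  pair <- paste0(toupper(ea), toupper(nea))
  pair %in% c("AT", "TA", "CG", "GC")
}

is_ambiguous(c("A", "C", "A", "g"), c("T", "G", "G", "c"))
# TRUE TRUE FALSE TRUE
```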

excludeINDEL

Logical. Exclude insertion/deletion markers. Default is TRUE.
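One way to detect insertion/deletion markers is to treat any marker whose alleles are not both single nucleotides (A, C, G, T) as an indel. This is a sketch under that assumption; how checkStat encodes or detects indels (e.g. multi-base alleles, "-", or I/D codes) is not specified here, and is_indel is hypothetical.

```r
# Hypothetical helper (not part of qgg): a marker is treated as an indel
# unless both alleles are single nucleotides.
is_indel <- function(ea, nea) {
  snv <- c("A", "C", "G", "T")
  !(toupper(ea) %in% snv & toupper(nea) %in% snv)
}

is_indel(c("A", "AT", "C", "-"), c("G", "A", "CTT", "T"))
# FALSE TRUE TRUE TRUE
```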

excludeDUPS

Logical. Exclude markers with duplicated ids. Default is TRUE.
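The documentation does not say whether all copies of a duplicated id are removed or only the later ones. The sketch below assumes all copies are removed: duplicated() with fromLast = TRUE also catches the first occurrence. is_dup is a hypothetical helper.

```r
# Hypothetical helper (not part of qgg): flag every copy of a duplicated id.
is_dup <- function(id) {
  # duplicated() flags later copies; fromLast = TRUE flags earlier ones
  duplicated(id) | duplicated(id, fromLast = TRUE)
}

is_dup(c("rs1", "rs2", "rs1", "rs3"))
# TRUE FALSE TRUE FALSE
```

Keeping only the first copy instead would be duplicated(id) alone.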

excludeMHC

Logical. Exclude markers located in the MHC region. Default is FALSE.
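The MHC boundaries used by checkStat are not given on this page, and they depend on the genome build. The sketch below assumes GRCh37 bounds for the MHC on chromosome 6 (28,477,797 to 33,448,354) and exposes them as parameters so they can be changed for another build. in_mhc is a hypothetical helper.

```r
# Hypothetical helper (not part of qgg): flag markers inside the MHC region.
# Default bounds are assumed GRCh37 coordinates; adjust for other builds.
in_mhc <- function(chr, pos, mhc_chr = 6,
                   mhc_start = 28477797, mhc_end = 33448354) {
  as.character(chr) == as.character(mhc_chr) &
    pos >= mhc_start & pos <= mhc_end
}

in_mhc(c(6, 6, 1), c(30e6, 25e6, 30e6))
# TRUE FALSE FALSE
```

Chromosome labels such as "chr6" would need to be normalized before this comparison.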

excludeMISS

Numeric. Exclusion threshold for sample missingness, i.e. the proportion of samples with a missing genotype at a marker. Default is 0.05.
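A minimal sketch, assuming missingness is computed per marker as the proportion of samples with a missing genotype, using an in-memory matrix W (samples in rows, markers in columns, coded 0/1/2 with NA for missing). Glist stores genotypes on disk, so this is illustrative only, and keep_miss is a hypothetical helper.

```r
# Hypothetical helper (not part of qgg): per-marker missingness filter on an
# in-memory genotype matrix (rows = samples, columns = markers).
keep_miss <- function(W, excludeMISS = 0.05) {
  # Proportion of samples with a missing genotype at each marker
  miss <- colMeans(is.na(W))
  miss <= excludeMISS
}

W <- matrix(c(0, 1, 2, NA,    # marker 1: 1 of 4 samples missing
              0, 1, 1, 2),    # marker 2: no missing genotypes
            nrow = 4)
keep_miss(W)
# FALSE TRUE
```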

excludeHWE

Numeric. Exclusion threshold for the Hardy-Weinberg equilibrium (HWE) test p-value. Default is 1e-12.
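This page does not say which HWE test checkStat uses. As an illustration, the sketch below computes the 1-df chi-square goodness-of-fit test from genotype counts and keeps a marker when its p-value is at least excludeHWE; an exact test is a common alternative. hwe_chisq_p is a hypothetical helper and assumes a polymorphic marker (otherwise the expected counts contain zeros).

```r
# Hypothetical helper (not part of qgg): chi-square HWE test p-value from
# genotype counts n_aa, n_ab, n_bb for a single polymorphic marker.
hwe_chisq_p <- function(n_aa, n_ab, n_bb) {
  n <- n_aa + n_ab + n_bb
  p <- (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
  q <- 1 - p
  expected <- n * c(p^2, 2 * p * q, q^2)
  observed <- c(n_aa, n_ab, n_bb)
  stat <- sum((observed - expected)^2 / expected)
  pchisq(stat, df = 1, lower.tail = FALSE)
}

hwe_chisq_p(25, 50, 25)                # exactly in HWE: p-value 1
hwe_chisq_p(50, 0, 50) >= 1e-12        # no heterozygotes: excluded
```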

Details

Performs quality control on GWAS summary statistics, which includes:
- Mapping marker ids to the LD reference panel data.
- Checking the effect allele, its frequency, and the build version.
- Excluding markers based on MAF, allele frequency difference, info score, ambiguous (CG/AT) alleles, indels, duplicated ids, the MHC region, missingness, and HWE.

The function works with both "internal" and "external" formats of summary statistics. When the summary statistics format is "external", the function maps marker ids based on chr-pos-ref-alt information. It also aligns the effect allele with the LD reference panel and flips effect sizes if necessary. When allele frequencies are not provided, it uses the frequencies from the genotype data.
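Aligning the effect allele to the reference panel can be sketched as follows, assuming the rows of stat and the reference are already matched one-to-one by marker. Where the alleles are swapped, the signs of b and stat are flipped and eaf becomes 1 - eaf; markers matching in neither orientation are dropped. align_to_reference is a hypothetical helper, not the qgg code, and it does not attempt strand complementing.

```r
# Hypothetical helper (not part of qgg): align summary statistics to the
# reference alleles. `stat` needs columns ea, nea, eaf, b, stat; `ref` needs
# ea and nea, with rows matched one-to-one by marker.
align_to_reference <- function(stat, ref) {
  # %in% TRUE turns NA comparisons (missing alleles) into FALSE
  same    <- (stat$ea == ref$ea & stat$nea == ref$nea) %in% TRUE
  swapped <- (stat$ea == ref$nea & stat$nea == ref$ea) %in% TRUE
  out <- stat
  out$ea[swapped]   <- ref$ea[swapped]
  out$nea[swapped]  <- ref$nea[swapped]
  out$b[swapped]    <- -stat$b[swapped]      # effect now refers to ref$ea
  out$stat[swapped] <- -stat$stat[swapped]
  out$eaf[swapped]  <- 1 - stat$eaf[swapped]
  # Markers whose alleles match in neither orientation are dropped
  out[same | swapped, , drop = FALSE]
}

stat <- data.frame(ea = c("A", "G", "C"), nea = c("G", "A", "T"),
                   eaf = c(0.3, 0.4, 0.2), b = c(0.1, 0.2, 0.3),
                   stat = c(2, 3, 4), stringsAsFactors = FALSE)
ref <- data.frame(ea = c("A", "A", "G"), nea = c("G", "G", "A"),
                  stringsAsFactors = FALSE)
aligned <- align_to_reference(stat, ref)
```

In this example the second marker is flipped (b becomes -0.2, eaf becomes 0.6) and the third (C/T against G/A, a strand complement) is dropped.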

Required headers for external summary statistics: marker, chr, pos, ea, nea, eaf, b, seb, stat, p, n

Required headers for internal summary statistics: rsids, chr, pos, ea, nea, eaf, b, seb, stat, p, n
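The two formats differ only in the id column (marker versus rsids). A sketch of checking the required headers and telling the formats apart by column names alone is shown below; detect_stat_format is hypothetical, and how checkStat itself detects the format is not described on this page.

```r
# Hypothetical helper (not part of qgg): classify a summary statistics data
# frame as "internal" or "external" from the documented required headers.
detect_stat_format <- function(stat) {
  shared   <- c("chr", "pos", "ea", "nea", "eaf", "b", "seb", "stat", "p", "n")
  internal <- c("rsids", shared)
  external <- c("marker", shared)
  if (all(internal %in% names(stat))) return("internal")
  if (all(external %in% names(stat))) return("external")
  # Report what is missing relative to the external layout
  stop("missing required columns: ",
       paste(setdiff(external, names(stat)), collapse = ", "))
}

cols <- c("marker", "chr", "pos", "ea", "nea", "eaf", "b", "seb", "stat", "p", "n")
ext <- as.data.frame(matrix(0, nrow = 1, ncol = length(cols),
                            dimnames = list(NULL, cols)))
detect_stat_format(ext)
# "external"
```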

Value

A data frame with processed and quality-controlled summary statistics.

Author(s)

Peter Soerensen


psoerensen/qgg documentation built on March 9, 2024, 10:02 p.m.