mapStat: Map marker summary statistics to Glist

View source: R/genomic_statistics.R

mapStat    R Documentation

Map marker summary statistics to Glist

Description

Quality control is a critical step when working with summary statistics, in particular those obtained from external sources. Processing and quality control of GWAS summary statistics includes:

- map marker ids (rsids/cpra (chr, pos, ref, alt)) to LD reference panel data

- check effect allele (flip EA, EAF, Effect)

- check effect allele frequency

- thresholds for MAF and HWE

- exclude INDELS, CG/AT and MHC region

- remove duplicated marker ids

- check the genome build version

- check for concordance between marker effect and LD data
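
The effect allele check listed above can be illustrated with a minimal base R sketch. This is not the qgg implementation: the column names (`marker`, `ea`, `nea`, `eaf`, `b`) and the toy reference table `ref` are assumptions made for illustration only.

```r
# Toy summary statistics and reference panel (hypothetical names and values)
stat <- data.frame(
  marker = c("rs1", "rs2", "rs3"),
  ea     = c("A", "G", "C"),      # effect allele
  nea    = c("G", "A", "G"),      # non-effect allele
  eaf    = c(0.20, 0.70, 0.40),   # effect allele frequency
  b      = c(0.10, -0.05, 0.02),  # marker effect
  stringsAsFactors = FALSE
)
ref <- data.frame(
  marker = c("rs1", "rs2", "rs3"),
  a1     = c("A", "A", "C"),
  a2     = c("G", "G", "G"),
  stringsAsFactors = FALSE
)

# Map markers by id and keep only those found in the reference
idx  <- match(stat$marker, ref$marker)
stat <- stat[!is.na(idx), ]
ref  <- ref[idx[!is.na(idx)], ]

# A marker is aligned if alleles match, flipped if they are swapped
aligned <- stat$ea == ref$a1 & stat$nea == ref$a2
flipped <- stat$ea == ref$a2 & stat$nea == ref$a1

# Flip effect allele, its frequency and the sign of the effect
ea_old <- stat$ea
stat$ea[flipped]  <- stat$nea[flipped]
stat$nea[flipped] <- ea_old[flipped]
stat$eaf[flipped] <- 1 - stat$eaf[flipped]
stat$b[flipped]   <- -stat$b[flipped]

# Flag strand-ambiguous allele pairs (A/T and C/G)
stat$ambiguous <- paste0(stat$ea, stat$nea) %in% c("AT", "TA", "CG", "GC")

# Drop markers whose alleles match the reference in neither orientation
stat <- stat[aligned | flipped, ]
```

In this toy example `rs2` is flipped (its frequency and effect sign change accordingly) and `rs3` is flagged as ambiguous because its alleles are C and G.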

Usage

mapStat(
  Glist = NULL,
  stat = NULL,
  excludeMAF = 0.01,
  excludeMAFDIFF = 0.05,
  excludeINFO = 0.8,
  excludeCGAT = TRUE,
  excludeINDEL = TRUE,
  excludeDUPS = TRUE,
  excludeMHC = FALSE,
  excludeMISS = 0.05,
  excludeHWE = 1e-12
)

Arguments

Glist

list of information about genotype matrix stored on disk

stat

dataframe with marker summary statistics

excludeMAF

exclude marker if minor allele frequency (MAF) is below threshold (0.01 is default)

excludeMAFDIFF

exclude marker if minor allele frequency difference (MAFDIFF) between Glist$af and stat$af is above threshold (0.05 is default)

excludeINFO

exclude marker if info score (INFO) is below threshold (0.8 is default)

excludeCGAT

exclude marker if alleles are ambiguous (CG or AT)

excludeINDEL

exclude marker if it is an insertion/deletion

excludeDUPS

exclude marker id if duplicated

excludeMHC

exclude marker if located in MHC region

excludeMISS

exclude marker if missingness (MISS) is above threshold (0.05 is default)

excludeHWE

exclude marker if p-value for Hardy-Weinberg equilibrium test is below threshold (1e-12 is default)
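
How several of the exclusion thresholds above might combine can be sketched in base R. This is only an illustration under stated assumptions, not the code used by mapStat: the columns `eaf`, `ref_af` and `info` and all values are hypothetical, and the frequency difference is taken here as the absolute difference between the two allele frequencies.

```r
# Toy marker table; all column names and values are hypothetical
stat <- data.frame(
  marker = c("rs1", "rs2", "rs2", "rs4", "rs5", "rs6"),
  ea     = c("A", "G", "G", "AT", "C", "T"),
  nea    = c("G", "A", "A", "A", "T", "C"),
  eaf    = c(0.20, 0.005, 0.005, 0.30, 0.45, 0.60),  # frequency in summary statistics
  ref_af = c(0.22, 0.004, 0.004, 0.31, 0.44, 0.75),  # frequency in reference data
  info   = c(0.95, 0.90, 0.90, 0.99, 0.60, 0.92),
  stringsAsFactors = FALSE
)

# Thresholds set to the defaults shown in the Usage section
excludeMAF     <- 0.01
excludeMAFDIFF <- 0.05
excludeINFO    <- 0.8

# Minor allele frequency is the smaller of the two allele frequencies
maf <- pmin(stat$eaf, 1 - stat$eaf)

low_maf  <- maf < excludeMAF
af_diff  <- abs(stat$eaf - stat$ref_af) > excludeMAFDIFF
low_info <- stat$info < excludeINFO
indel    <- nchar(stat$ea) > 1 | nchar(stat$nea) > 1
dups     <- stat$marker %in% stat$marker[duplicated(stat$marker)]

# Keep markers that pass every filter
keep  <- !(low_maf | af_diff | low_info | indel | dups)
clean <- stat[keep, ]
```

Only `rs1` passes all filters in this toy table: `rs2` is duplicated and rare, `rs4` is an indel, `rs5` has a low info score, and `rs6` differs too much in allele frequency from the reference.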

Author(s)

Peter Soerensen


psoerensen/qgg documentation built on March 9, 2024, 10:02 p.m.