View source: R/genomic_statistics.R
mapStat | R Documentation |
Quality control is a critical step when working with summary statistics, in particular those obtained from external sources. Processing and quality control of GWAS summary statistics include:
- map marker ids (rsids/cpra (chr, pos, ref, alt)) to LD reference panel data
- check effect allele (flip EA, EAF, Effect)
- check effect allele frequency
- apply thresholds for MAF and HWE
- exclude INDELs, ambiguous CG/AT markers, and markers in the MHC region
- remove duplicated marker ids
- check the genome build version
- check for concordance between marker effect and LD data
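The effect allele check in the list above can be illustrated with a minimal base R sketch. It is not the implementation used by mapStat. The helper `align_alleles` and the column names (`marker`, `ea`, `nea`, `eaf`, `b`) are hypothetical and chosen only for this example. When the effect allele in the summary statistics is the non-effect allele in the reference, the sketch swaps the alleles, negates the effect, and replaces EAF with 1 - EAF.

```r
# Toy reference panel and summary statistics (hypothetical data and column names).
ref <- data.frame(
  marker = c("rs1", "rs2", "rs3"),
  ea     = c("A", "C", "G"),
  nea    = c("G", "T", "T"),
  stringsAsFactors = FALSE
)
stat <- data.frame(
  marker = c("rs1", "rs2", "rs3"),
  ea     = c("A", "T", "G"),
  nea    = c("G", "C", "T"),
  eaf    = c(0.20, 0.70, 0.45),
  b      = c(0.10, -0.05, 0.02),
  stringsAsFactors = FALSE
)

# Hypothetical helper: align summary statistics to the reference alleles.
align_alleles <- function(stat, ref) {
  # Keep only markers present in the reference, in matching row order.
  idx  <- match(stat$marker, ref$marker)
  stat <- stat[!is.na(idx), , drop = FALSE]
  ref  <- ref[idx[!is.na(idx)], , drop = FALSE]

  same    <- stat$ea == ref$ea  & stat$nea == ref$nea
  swapped <- stat$ea == ref$nea & stat$nea == ref$ea

  # Flip effect, effect allele frequency, and allele labels where swapped.
  stat$b[swapped]   <- -stat$b[swapped]
  stat$eaf[swapped] <- 1 - stat$eaf[swapped]
  tmp               <- stat$ea[swapped]
  stat$ea[swapped]  <- stat$nea[swapped]
  stat$nea[swapped] <- tmp

  # Drop markers whose alleles match the reference in neither orientation.
  stat[same | swapped, , drop = FALSE]
}

aligned <- align_alleles(stat, ref)
print(aligned)
```

In this toy data only rs2 is flipped. Strand flips are not handled here, which is one reason ambiguous CG/AT markers are commonly excluded.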
mapStat(
Glist = NULL,
stat = NULL,
excludeMAF = 0.01,
excludeMAFDIFF = 0.05,
excludeINFO = 0.8,
excludeCGAT = TRUE,
excludeINDEL = TRUE,
excludeDUPS = TRUE,
excludeMHC = FALSE,
excludeMISS = 0.05,
excludeHWE = 1e-12
)
Glist |
list of information about genotype matrix stored on disk |
stat |
dataframe with marker summary statistics |
excludeMAF |
exclude marker if minor allele frequency (MAF) is below threshold (0.01 is default) |
excludeMAFDIFF |
exclude marker if minor allele frequency difference (MAFDIFF) between Glist$af and stat$af is above threshold (0.05 is default) |
excludeINFO |
exclude marker if info score (INFO) is below threshold (0.8 is default) |
excludeCGAT |
exclude marker if alleles are ambiguous (CG or AT) |
excludeINDEL |
exclude marker if it is an insertion/deletion |
excludeDUPS |
exclude marker id if duplicated |
excludeMHC |
exclude marker if located in MHC region |
excludeMISS |
exclude marker if missingness (MISS) is above threshold (0.05 is default) |
excludeHWE |
exclude marker if p-value for Hardy-Weinberg equilibrium test is below threshold (1e-12 is default) |
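To make the exclusion arguments concrete, the following base R sketch applies comparable filters to a toy data frame. It is an illustration under stated assumptions, not the code mapStat runs. The column names (`eaf`, `ref_af`, `info`) are hypothetical, `ref_af` stands in for the reference panel allele frequency (Glist$af), and dropping every copy of a duplicated marker id is an assumption about how duplicates might be handled.

```r
# Toy summary statistics (hypothetical data and column names).
stat <- data.frame(
  marker = c("rs1", "rs2", "rs2", "rs4", "rs5", "rs6"),
  ea     = c("A", "C", "C", "AT", "A", "G"),
  nea    = c("G", "G", "G", "A", "C", "A"),
  eaf    = c(0.30, 0.40, 0.40, 0.25, 0.005, 0.20),
  ref_af = c(0.31, 0.42, 0.42, 0.26, 0.006, 0.35),  # stand-in for Glist$af
  info   = c(0.95, 0.99, 0.99, 0.90, 0.97, 0.85),
  stringsAsFactors = FALSE
)

# Thresholds mirroring the documented defaults.
excludeMAF     <- 0.01
excludeMAFDIFF <- 0.05
excludeINFO    <- 0.8

# Minor allele frequency derived from the effect allele frequency.
maf <- pmin(stat$eaf, 1 - stat$eaf)

# Assumption: every copy of a duplicated marker id is dropped.
is_dup <- stat$marker %in% stat$marker[duplicated(stat$marker)]

# INDEL: either allele is longer than one base.
is_indel <- nchar(stat$ea) != 1 | nchar(stat$nea) != 1

# Ambiguous markers: allele pairs that are their own reverse complement.
is_cgat <- paste0(stat$ea, stat$nea) %in% c("CG", "GC", "AT", "TA")

keep <- maf >= excludeMAF &
  abs(stat$eaf - stat$ref_af) <= excludeMAFDIFF &
  stat$info >= excludeINFO &
  !is_dup & !is_indel & !is_cgat

filtered <- stat[keep, , drop = FALSE]
print(filtered$marker)
```

In this toy data only rs1 survives: rs2 is both duplicated and C/G ambiguous, rs4 is an INDEL, rs5 falls below the MAF threshold, and rs6 differs from the reference frequency by more than 0.05.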
Peter Soerensen