genoQC: Quality Control for genotype data


View source: R/GT_genoQC.R

Description

genoQC takes genotype data in GenABEL gwaa format and performs quality control and principal component analysis (PCA).

Usage

genoQC(
  gwaa,
  projectfolder = "GT/QC",
  projectname = "QC1",
  trait.name = "affection01",
  trait.type = "binomial",
  export.genofile = "ped",
  p.level.hwe = 0.05,
  hwe.id.subset = T,
  maf = 0.01,
  checkX = T,
  PCA = T,
  maxCenters = 5,
  ...
)

Arguments

gwaa

gwaa object from GenABEL

projectfolder

character containing path to output folder (will be generated if not existing).

projectname

character used as suffix for output files.

trait.name

character indicating column name with trait of interest in pheno data of gwaa. Needed for qq-plots as well as for PCA plot annotation. Omitted if NULL.

trait.type

character with data type "gaussian" or "binomial" of trait.name.

export.genofile

character or character vector with type(s) of QC-purified ped file to export into projectfolder. Allowed values are "ped" for ped/map-file, "tped" for transposed ped file format or "add.tped" for transposed additive coded format. If NULL, no data is exported.

p.level.hwe

numeric cut-off p-value for HWE in check.marker. For the first round of QC it is recommended to skip the p-value cut-off, i.e. set p.level.hwe = 0.

hwe.id.subset

Subset of samples used for HWE checks in check.marker (the default means controls only if trait.name is a 0/1-coded affection status).

maf

numeric cut-off for minor allele frequency to be used in check.marker.

checkX

boolean. If TRUE, X-errors in gwaa are fixed by Xfix().

PCA

boolean. If TRUE, principal component analysis is performed on the genotype data.

maxCenters

numeric with the maximum number of cluster centers reported if PCA is performed.

...

further parameters passed to GenABEL's check.marker() function. See ?check.marker for details.
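To make the maf and p.level.hwe cut-offs more concrete, here is a minimal base-R sketch of what such filters compute. It is not GenABEL's implementation: the genotype matrix `geno`, the helper `hwe_pvalue()` and the chi-square test are assumptions chosen for illustration (check.marker may use a different test and different inequality conventions).

```r
# Illustrative only: base-R stand-ins for a MAF filter and an HWE filter.
set.seed(1)
n_samples <- 200

# Hypothetical genotype matrix: rows = samples, columns = SNPs,
# coded as the number of copies of one allele (0, 1 or 2).
true_freq <- c(0.30, 0.005, 0.15, 0.45)
geno <- sapply(true_freq, function(p) rbinom(n_samples, size = 2, prob = p))
colnames(geno) <- paste0("snp", seq_along(true_freq))

# Minor allele frequency per SNP.
allele_freq <- colMeans(geno) / 2
maf_obs <- pmin(allele_freq, 1 - allele_freq)

# Chi-square test (1 df) for Hardy-Weinberg equilibrium on a single SNP.
# Hypothetical helper, used here as a simple stand-in.
hwe_pvalue <- function(g) {
  n <- length(g)
  q <- mean(g) / 2
  observed <- tabulate(g + 1, nbins = 3)  # counts of genotypes 0, 1, 2
  expected <- n * c((1 - q)^2, 2 * q * (1 - q), q^2)
  keep <- expected > 0                    # avoid division by zero
  stat <- sum((observed[keep] - expected[keep])^2 / expected[keep])
  pchisq(stat, df = 1, lower.tail = FALSE)
}
hwe_p <- apply(geno, 2, hwe_pvalue)

# Apply the cut-offs (the direction of the inequalities is an assumption).
maf_cutoff <- 0.01
hwe_cutoff <- 0.05
pass <- maf_obs >= maf_cutoff & hwe_p >= hwe_cutoff
```

In this sketch, setting `hwe_cutoff <- 0` lets every SNP pass the HWE filter, which mirrors the recommendation to use p.level.hwe = 0 in a first QC round.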

Details

The check.marker function from the GenABEL package is used for quality control of genotype data. It is recommended to perform two rounds of quality control: a first QC, then removal of samples with a different genetic substructure, then a second QC.

Principal component analysis for detection of genetic substructure is done if PCA = TRUE. The first 10 principal components are added to the covariates of the gwaa object. Samples are assigned to clusters and colored accordingly in the PCA plots. Sample assignment is done for up to maxCenters cluster centers. All cluster sample lists are stored in a subfolder "ClusterLists".

The QC-purified gwaa object may be exported to PLINK-compatible file formats.
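The PCA and clustering step described above can be illustrated with a small base-R sketch. The source does not state which PCA or clustering routines genoQC uses internally, so `prcomp()` and `kmeans()` are assumptions here, as are the simulated matrix `geno` and all variable names.

```r
# Illustrative only: PCA followed by clustering for 2..maxCenters centers.
set.seed(42)

# Hypothetical data: two simulated subpopulations with different
# allele frequencies, genotypes coded 0/1/2.
n_per_group <- 50
n_snps <- 200
sim_group <- function(freq) {
  t(sapply(seq_len(n_per_group), function(i) rbinom(n_snps, 2, freq)))
}
geno <- rbind(sim_group(runif(n_snps, 0.1, 0.5)),
              sim_group(runif(n_snps, 0.1, 0.5)))
rownames(geno) <- paste0("id", seq_len(nrow(geno)))

# Drop monomorphic SNPs so that scaling is defined, then run the PCA.
geno <- geno[, apply(geno, 2, var) > 0, drop = FALSE]
pca <- prcomp(geno, center = TRUE, scale. = TRUE)
pcs <- pca$x[, 1:10]  # first 10 principal components, one row per sample

# Cluster samples on the first two components for each number of centers.
maxCenters <- 5
clusterings <- lapply(2:maxCenters, function(k) {
  kmeans(pcs[, 1:2], centers = k, nstart = 10)
})
names(clusterings) <- paste0("k", 2:maxCenters)

# Sample ID lists per cluster, here for two centers.
cluster_lists <- split(rownames(geno), clusterings$k2$cluster)
```

Lists like `cluster_lists` are the kind of output one would inspect to decide which samples to remove before a second QC round.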

Value

list containing two objects. First, the QC-purified GenABEL gwaa object with all samples removed as recommended. Second, an object of class check.marker containing the quality control information. Intermediary results and plots are stored in projectfolder as side effects.
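A first QC round might be called as sketched below. This is a hedged example, not taken from the package: it assumes that GenABEL and the package providing genoQC are installed and loaded, and that GenABEL's example data set `srdta` with its binary trait column `bt` is available. The names `qc_args`, `gwaa_clean` and `qc_info` are hypothetical, and the returned list is indexed by position because the element names are not documented here.

```r
# Arguments for a first QC round; the HWE cut-off is switched off
# (p.level.hwe = 0) as recommended for the first round.
qc_args <- list(
  projectfolder   = file.path("GT", "QC"),
  projectname     = "QC1",
  trait.name      = "bt",        # assumed 0/1-coded trait in the example data
  trait.type      = "binomial",
  export.genofile = NULL,        # do not export ped/tped files
  p.level.hwe     = 0,
  maf             = 0.01,
  PCA             = TRUE,
  maxCenters      = 5
)

# Only run the call when the required packages are actually available.
if (requireNamespace("GenABEL", quietly = TRUE) && exists("genoQC")) {
  data(srdta, package = "GenABEL")
  qc_result <- do.call(genoQC, c(list(gwaa = srdta), qc_args))

  gwaa_clean <- qc_result[[1]]   # QC-purified gwaa object
  qc_info    <- qc_result[[2]]   # object of class check.marker
}
```

For a second round, the same call could be repeated on the cleaned object after removing outlying samples, with a non-zero p.level.hwe and a different projectname so that the first-round output files are not overwritten.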

Author(s)

Frank Ruehle


frankRuehle/systemsbio documentation built on Sept. 14, 2020, 1:18 a.m.