ADDO_AddDom1_QC: Quality Control of Phenotype and Genotype (STEP1 of The...


View source: R/ADDO_AddDom1_QC.r

Description

Quality control of phenotypes and genotypes (input file format: PLINK or GenABEL). The function performs the following steps:
(1) Discard phenotypes with fewer than Phe_IndMinimum individuals (default 200) and phenotypes that are logical variables;
(2) Remove extreme values lying more than Phe_Extreme standard deviations from the mean (default 5 in Usage);
(3) Remove genotypes with MAF < GT_maf (default 0.05) or missing rate > GT_missing (default 0.1);
(4) Normalize phenotypes using the "quantile" or "-log2" transformation;
(5) Plot histograms of the raw, clean, residual, normalized and transformed phenotypes;
(6) Calculate the kinship matrix using GenABEL, EMMA, EMMAX, GEMMA, GCTA, HOMEBREW_AFW or HOMEBREW_AS;
(7) Summarize the mean, sd and sum of each phenotype.

Usage

ADDO_AddDom1_QC(indir = indir, outdir = outdir,
  Input_name = Input_name, Input_type = "PLINK",
  Kinship_type = Kinship_type, PheList_Choose = F, PheList = PheList,
  Phe_HistogramPlot = F, Phe_ResDone = F, Phe_NormDone = F,
  Normal_method = "QUANTILE", covariates_sum = covariates_sum,
  covariates_types = covariates_types, Phe_IndMinimum = 200,
  Phe_Extreme = 5, GT_maf = 0.05, GT_missing = 0.1, num_nodes = 10)

Arguments

indir

A character. The input directory containing the input binary PLINK or GenABEL data.

outdir

A character. The output directory in which the folder "1_PheGen" is created.

Input_name

A character. The prefix shared by the input files.

Input_type

A character. The format of input data. Please select from "PLINK" or "GenABEL".

Kinship_type

A character. The method used to generate the kinship matrix. Please select from "GenABEL", "EMMA", "EMMAX", "GEMMA", "GCTA", "GCTA_ad", "HOMEBREW_AFW" or "HOMEBREW_AS".

PheList_Choose

A logical variable. T: Investigate only the phenotypes specified in PheList; F: Investigate all phenotypes.

PheList

A character vector. Please specify a list like c("id","cov1","cov2","phe1","phe2") when "PheList_Choose=T".

Phe_HistogramPlot

A logical variable. T: Draw histogram plots for all phenotypes; F: Skip the histogram plots.

Phe_ResDone

A logical variable. T: The input data have already been residualized, so covariate effects will not be corrected; F: Correct for covariate effects.
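What "correcting the covariate effects" means can be illustrated with a minimal base-R sketch. This is not the package's internal code: the toy data, the column names and the use of lm() are assumptions made for illustration only.

```r
# Hypothetical toy phenotype table; all column names are illustrative.
set.seed(1)
phe <- data.frame(
  id    = 1:8,
  sex   = c(0, 1, 0, 1, 0, 1, 0, 1),                 # numeric covariate ("n")
  batch = c("a", "a", "b", "b", "a", "a", "b", "b"), # factor covariate ("f")
  phe1  = rnorm(8)
)

# Covariates typed "f" are converted to factors before fitting.
phe$batch <- factor(phe$batch)

# Regress the phenotype on its covariates and keep the residuals.
# na.exclude keeps the residual vector aligned with the rows of phe.
fit <- lm(phe1 ~ sex + batch, data = phe, na.action = na.exclude)
phe$phe1_res <- residuals(fit)
```

The residuals have mean zero and are uncorrelated with the covariates, which is the property a residualized phenotype is expected to have.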

Phe_NormDone

A logical variable. T: The input data have already been normalized, so no transformation is applied; F: Apply the transformation given by Normal_method.

Normal_method

A character. The normalization method, required when "Phe_NormDone = F". Please select from "LOG2" or "QUANTILE".
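The two normalization options can be sketched in base R as follows. This is an illustration under assumptions, not the package's implementation: "QUANTILE" is read here as a rank-based inverse normal transform, and "LOG2" as a plain log2 of positive values (the Description writes "-log2", and whether the package negates the result is not stated). The function names are hypothetical.

```r
# Rank-based inverse normal transform: one common reading of "quantile"
# normalization. Missing values stay missing.
quantile_normalize <- function(x) {
  out <- rep(NA_real_, length(x))
  ok <- !is.na(x)
  r <- rank(x[ok], ties.method = "average")
  out[ok] <- qnorm((r - 0.5) / sum(ok))
  out
}

# Plain log2 transform; non-positive and missing values become NA.
log2_transform <- function(x) {
  out <- rep(NA_real_, length(x))
  ok <- !is.na(x) & x > 0
  out[ok] <- log2(x[ok])
  out
}

x <- c(1, 2, 4, 8, NA, 1000)
q <- quantile_normalize(x)
l <- log2_transform(x)
```

The rank-based transform preserves the ordering of the values while pulling in the skewed tail (the value 1000 above), whereas the log2 transform keeps relative distances on a multiplicative scale.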

covariates_sum

A numeric variable. The total number of covariates.

covariates_types

A character vector. The type of each covariate. Please select from "n" and "f"; "f" marks a covariate that is treated as a factor.

Phe_IndMinimum

A numeric variable. Phenotype QC1: Remove phenotypes with fewer than Phe_IndMinimum available individuals.

Phe_Extreme

A numeric variable. Phenotype QC2: Remove extreme phenotype values lying more than Phe_Extreme*sd above or below the mean.
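The two phenotype filters (Phe_IndMinimum and Phe_Extreme) can be sketched in base R as below. The helper names are hypothetical, and the sketch assumes that the mean and sd are computed once on the available values, including any outliers; the package may proceed differently.

```r
# Set values further than k standard deviations from the mean to NA.
remove_extreme <- function(x, k = 5) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  x[!is.na(x) & abs(x - m) > k * s] <- NA
  x
}

# TRUE when a phenotype has at least min_n non-missing individuals.
enough_individuals <- function(x, min_n = 200) {
  sum(!is.na(x)) >= min_n
}

# Toy phenotype: 100 ordinary values plus one gross outlier.
x <- c(rep(0, 50), rep(1, 50), 100)
cleaned <- remove_extreme(x, k = 5)
keep_default <- enough_individuals(cleaned)             # threshold 200
keep_relaxed <- enough_individuals(cleaned, min_n = 100)
```

In this toy case only the value 100 is set to missing, and the phenotype would be discarded under the default minimum of 200 individuals but kept with a minimum of 100.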

GT_maf

A numeric variable. Genotype QC1: Remove genotypes with MAF<GT_maf.

GT_missing

A numeric variable. Genotype QC2: Remove genotypes with missing rate>GT_missing.
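The two genotype filters can be illustrated on a small 0/1/2-coded matrix. This is only a sketch of the filtering logic; the package relies on external tools such as plink for real data, and the matrix and variable names below are made up.

```r
# Toy genotype matrix: rows are individuals, columns are SNPs,
# entries count copies of one allele (0, 1 or 2); NA means missing.
geno <- cbind(
  snp1 = c(0, 1, 2, 1, 0, 1, 2, 1, 0, 1, 2, 1),   # common variant
  snp2 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),   # rare variant
  snp3 = c(0, 1, NA, NA, 2, 1, 0, 1, 2, 1, 0, 1)  # 2 of 12 calls missing
)

GT_maf <- 0.05
GT_missing <- 0.1

# Per-SNP missing rate and minor allele frequency.
missing_rate <- colMeans(is.na(geno))
allele_freq <- colMeans(geno, na.rm = TRUE) / 2
maf <- pmin(allele_freq, 1 - allele_freq)

# Keep SNPs that pass both thresholds.
keep <- maf >= GT_maf & missing_rate <= GT_missing
geno_qc <- geno[, keep, drop = FALSE]
```

Here snp2 fails the MAF threshold (1/24 is below 0.05) and snp3 fails the missing-rate threshold (2/12 is above 0.1), so only snp1 remains.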

num_nodes

A numeric variable. The number of cores used in parallel.

Details

NOTE1: PLINK input format
(1) Genotype files, named "file.bed", "file.bim" and "file.fam".
(2) Phenotype file, named "file.phe". The 1st column name should be "id"; the covariate columns should come before the phenotype columns; the sex column should be coded as female=0 and male=1.
(3) Covariates file, named "file.covs". The 1st column is the phenotype name; the 2nd column lists the corresponding covariates, separated by ",".

NOTE2: GenABEL input format
(1) file.ABEL.dat, containing just one GenABEL-type variable named "dat".
(2) file.covs. The 1st column is the phenotype name; the 2nd column lists the corresponding covariates, separated by ",".

NOTE3: Required software: plink (v.1.90) and gcta64 (or emma/emmax-kin/gemma, only required when specified).
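As a rough illustration of the ".phe" and ".covs" layouts described in the notes above, the following sketch writes a pair of toy files. Several details are assumptions not confirmed by the documentation: the tab delimiter, the absence of a header row in the ".covs" file, the coding of missing phenotypes as NA, and all sample and column names.

```r
# Hypothetical file prefix in a temporary directory.
prefix <- file.path(tempdir(), "TEST")

# Phenotype file: "id" first, then covariates, then phenotypes.
phe <- data.frame(
  id    = paste0("ind", 1:4),
  sex   = c(0, 1, 0, 1),          # female = 0, male = 1
  batch = c(1, 1, 2, 2),
  phe1  = c(1.2, 0.8, 1.5, NA),
  phe2  = c(10, 12, 9, 11)
)
write.table(phe, paste0(prefix, ".phe"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# Covariates file: phenotype name, then its comma-separated covariates.
covs <- data.frame(
  phenotype  = c("phe1", "phe2"),
  covariates = c("sex,batch", "sex")
)
write.table(covs, paste0(prefix, ".covs"),
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the files back to check the layout.
phe_in <- read.table(paste0(prefix, ".phe"), header = TRUE, sep = "\t")
covs_in <- read.table(paste0(prefix, ".covs"), header = FALSE, sep = "\t",
                      stringsAsFactors = FALSE)
phe1_covs <- strsplit(covs_in[1, 2], ",")[[1]]
```

The matching "TEST.bed", "TEST.bim" and "TEST.fam" genotype files would be produced separately, for example with plink.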

Value

A folder named "1_PheGen" containing the phenotypes and genotypes after QC.

Author(s)

Leilei Cui and Bin Yang

Examples

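# indir and outdir must already be defined as the paths of the input and output directories.
covariates_types = c("n","f")
names(covariates_types) = c("sex","batch")
# covariates_types has no usable default, so it is passed explicitly.
ADDO_AddDom1_QC(indir=indir, outdir=outdir, Input_name="TEST", Kinship_type="GCTA_ad", PheList=c("sex","batch"), covariates_sum=2, covariates_types=covariates_types)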

LeileiCui/ADDO documentation built on July 25, 2020, 1:51 a.m.