phaseImpute: Phasing and imputation

View source: R/phasingImpute.R

phaseImpute    R Documentation

Phasing and imputation

Description

Perform phasing, imputation and conversion from IMPUTE2 or GEN format into PLINK binary files.

Usage

phaseImpute(
  inputPrefix,
  outputPrefix,
  autosome = TRUE,
  plink,
  shapeit,
  imputeTool,
  impute,
  qctool,
  gtool,
  windowSize = 3e+06,
  effectiveSize = 20000,
  nCore = 40,
  threshold = 0.9,
  outputInfoFile,
  SNP = TRUE,
  referencePanel,
  impRefDIR,
  tmpImputeDir,
  keepTmpDir = TRUE
)

Arguments

inputPrefix

the prefix of the input PLINK binary files for the imputation.

outputPrefix

the prefix of the output PLINK binary files after imputation.

autosome

a logical value indicating if only autosomal chromosomes are imputed.

plink

the PLINK executable, located either in the current working directory or somewhere in the command path.

shapeit

the SHAPEIT executable, located either in the current working directory or somewhere in the command path.

imputeTool

a string indicating which imputation tool is used: "impute2" or "impute4".

impute

the imputation executable, located either in the current working directory or somewhere in the command path. It must be the IMPUTE2 or IMPUTE4 program, matching the value of imputeTool.

qctool

the QCTOOL executable, located either in the current working directory or somewhere in the command path. It is only used if imputeTool is "impute4".

gtool

the GTOOL executable, located either in the current working directory or somewhere in the command path.

windowSize

the size of each imputation chunk (window) in base pairs. The default value is 3e+06, i.e. 3 Mb.

effectiveSize

the effective population size, commonly denoted Ne. A universal Ne value of 20000 is suggested.

nCore

the number of cores used for splitting chromosomes with PLINK, phasing, imputation, genotype format modification, genotype conversion, and merging genotypes. The default value is 40.

threshold

the probability threshold for calling genotypes from GEN probabilities when the imputed data are merged and converted into PLINK format. The default value is 0.9.

outputInfoFile

the name of the output file of IMPUTE2 info scores, with two columns: the imputed SNPs and their info scores.

SNP

a logical value indicating whether the data consist entirely of single nucleotide polymorphisms (SNPs). If TRUE, genotypes are expressed as pairs of the alleles A, C, G and T, and unknown genotypes are represented as N N.

referencePanel

a string indicating which imputation reference panel is used: "1000Gphase1v3_macGT1" or "1000Gphase3".

impRefDIR

the directory where the imputation reference files are located.

tmpImputeDir

the name of the temporary directory used for storing phasing and imputation results.

keepTmpDir

a logical value indicating whether the directory 'tmpImputeDir' should be kept after the run. The default is TRUE.
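To illustrate how the threshold and SNP arguments interact, here is a minimal R sketch that calls hard genotypes from one row of GEN probabilities. It assumes the standard GEN layout of three probabilities (AA, AB, BB) per sample, and it assumes a call is made only when the largest probability is at least threshold. The helper callGenGenotypes is hypothetical and is not part of Gimpute, which presumably delegates this conversion to the external gtool program.

```r
## Hypothetical helper (not part of Gimpute): turn one GEN row's genotype
## probabilities into allele-pair calls, as described for SNP = TRUE.
callGenGenotypes <- function(alleleA, alleleB, probs, threshold = 0.9) {
  # probs holds three probabilities (AA, AB, BB) per sample, sample by sample
  stopifnot(length(probs) %% 3 == 0)
  p <- matrix(probs, ncol = 3, byrow = TRUE)    # one row per sample
  best <- max.col(p, ties.method = "first")     # index of most likely genotype
  bestProb <- p[cbind(seq_len(nrow(p)), best)]  # its probability
  pairs <- c(paste(alleleA, alleleA), paste(alleleA, alleleB),
             paste(alleleB, alleleB))
  # Assumption: a call requires bestProb >= threshold; otherwise unknown ("N N")
  ifelse(bestProb >= threshold, pairs[best], "N N")
}

## Example: three samples for a SNP with alleles A and G
callGenGenotypes("A", "G", c(0.95, 0.05, 0.00,
                             0.30, 0.40, 0.30,
                             0.00, 0.02, 0.98))
# returns "A A" "N N" "G G"
```

Raising threshold trades call rate for accuracy: more uncertain genotypes become N N instead of being forced to the most likely call.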

Details

The whole imputation process mainly consists of the following steps:

  1. Phasing the input PLINK data using an existing imputation reference panel.

  2. Imputing the input PLINK data using the phased results and the reference panel.

  3. Converting the IMPUTE2 or GEN format data into PLINK format.

  4. Combining all imputed data into whole-genome PLINK binary files.

  5. Filtering out imputed variants with poor imputation quality.

Parallel computing in R is supported through the nCore argument.
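Because imputation is run in chunks of windowSize base pairs, chunk boundaries can be derived from the variant positions of a chromosome (column 4 of the PLINK .bim file). The R sketch below shows one way to do this. makeChunks is a hypothetical helper, not Gimpute's internal code, and dropping windows that contain no variants is an assumption.

```r
## Hypothetical helper (not Gimpute's internal code): split one chromosome's
## base-pair range into consecutive windows of `windowSize` bp.
makeChunks <- function(positions, windowSize = 3e+06) {
  stopifnot(length(positions) > 0, windowSize > 0)
  first <- min(positions)
  last <- max(positions)
  starts <- seq(first, last, by = windowSize)
  ends <- pmin(starts + windowSize - 1, last)   # last window ends at last variant
  # Keep only windows that contain at least one variant (assumption)
  hasVariant <- vapply(seq_along(starts), function(i) {
    any(positions >= starts[i] & positions <= ends[i])
  }, logical(1))
  data.frame(start = starts[hasVariant], end = ends[hasVariant])
}

## Example: four variant positions (bp) on one chromosome
chunks <- makeChunks(c(100, 2e6, 5e6, 9.5e6), windowSize = 3e6)
print(chunks)
```

Each window could then be given to IMPUTE2 through its -int start end option. Since the windows are independent of each other, they can be processed in parallel, which is where nCore becomes useful.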

Value

The following files are produced:

  1. The filtered imputed PLINK binary files.

  2. The final PLINK binary files, including poorly imputed variants.

  3. A plain-text file containing the info scores of all imputed SNPs, with two columns: SNP names and the corresponding info scores.

Note that chromosome X is not supported when imputeTool is "impute4".
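The info-score file can be used to select well-imputed variants. The sketch below assumes a whitespace-separated two-column file without a header line and uses a cutoff of 0.6 purely for illustration; neither the helper filterByInfo nor that cutoff comes from Gimpute.

```r
## Hypothetical helper (not part of Gimpute): return the IDs of SNPs whose
## info score is at least `cutoff`. The 0.6 default is only an illustration.
filterByInfo <- function(infoFile, cutoff = 0.6, header = FALSE) {
  info <- read.table(infoFile, header = header, stringsAsFactors = FALSE)
  names(info)[1:2] <- c("snp", "info")   # column 1: SNP name, column 2: info
  info$snp[info$info >= cutoff]
}

## Example with a small temporary info-score file
infoFile <- tempfile(fileext = ".txt")
writeLines(c("rs1 0.95", "rs2 0.41", "rs3 0.60"), infoFile)
keep <- filterByInfo(infoFile)
print(keep)
```

The returned IDs could be written to a file with writeLines() and passed to PLINK's --extract option.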

Author(s)

Junfang Chen

References

  1. Howie, B., et al. (2012). Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet 44(8): 955-959.

  2. Howie, B. N., et al. (2009). A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 5(6): e1000529.

  3. Bycroft, C., et al. (2017). Genome-wide genetic data on ~500,000 UK Biobank participants. bioRxiv: 166298.

Examples

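## In the current working directory
bedFile <- system.file("extdata", "alignedData.bed", package="Gimpute")
bimFile <- system.file("extdata", "alignedData.bim", package="Gimpute")
famFile <- system.file("extdata", "alignedData.fam", package="Gimpute")
## Copy the example PLINK files into the working directory
file.copy(c(bedFile, bimFile, famFile), ".")
inputPrefix <- "alignedData"
outputPrefix <- "gwasImputed"
outputInfoFile <- "infoScore.txt"
tmpImputeDir <- "tmpImpute"
## Not run: Requires the executable programs PLINK, SHAPEIT, IMPUTE2 or
## IMPUTE4, QCTOOL (impute4 only) and GTOOL plus reference files, e.g.
## plink <- "/home/tools/plink"
## shapeit <- "/home/tools/shapeit"           # hypothetical path
## imputeTool <- "impute2"
## impute <- "/home/tools/impute2"            # hypothetical path
## qctool <- "/home/tools/qctool"             # hypothetical path
## gtool <- "/home/tools/gtool"               # hypothetical path
## referencePanel <- "1000Gphase3"
## impRefDIR <- "/home/imputeRef/1000Gphase3/"  # hypothetical path
## phaseImpute(inputPrefix, outputPrefix, autosome=TRUE,
##             plink, shapeit, imputeTool, impute, qctool, gtool,
##             windowSize=3000000, effectiveSize=20000,
##             nCore=40, threshold=0.9, outputInfoFile, SNP=TRUE,
##             referencePanel, impRefDIR, tmpImputeDir, keepTmpDir=TRUE)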

transbioZI/Gimpute documentation built on April 10, 2022, 4:20 a.m.