View source: R/phasingImpute.R
phaseImpute | R Documentation |
Perform phasing, imputation and conversion from IMPUTE2 or GEN format into PLINK binary files.
phaseImpute(inputPrefix, outputPrefix, autosome = TRUE, plink, shapeit,
            imputeTool, impute, qctool, gtool, windowSize = 3e+06,
            effectiveSize = 20000, nCore = 40, threshold = 0.9,
            outputInfoFile, SNP = TRUE, referencePanel, impRefDIR,
            tmpImputeDir, keepTmpDir = TRUE)
inputPrefix |
the prefix of the input PLINK binary files for the imputation. |
outputPrefix |
the prefix of the output PLINK binary files after imputation. |
autosome |
a logical value indicating if only autosomal chromosomes are imputed. |
plink |
the PLINK executable, located either in the current working directory or somewhere in the command path. |
shapeit |
the SHAPEIT executable, located either in the current working directory or somewhere in the command path. |
imputeTool |
a string indicating which imputation tool is used: "impute2" or "impute4". |
impute |
the imputation executable, located either in the current working directory or somewhere in the command path. It can be either "impute2" or "impute4". |
qctool |
the QCTOOL executable, located either in the current working directory or somewhere in the command path. This is only used if imputeTool is "impute4". |
gtool |
the GTOOL executable, located either in the current working directory or somewhere in the command path. |
windowSize |
the window size of each chunk. The default value is 3000000. |
effectiveSize |
this parameter controls the effective population size. Commonly denoted as Ne. A universal -Ne value of 20000 is suggested. |
nCore |
the number of cores used for splitting chromosome by PLINK, phasing, imputation, genotype format modification, genotype conversion, and merging genotypes. The default value is 40. |
threshold |
threshold for merging genotypes from GEN probability. Default 0.9. |
outputInfoFile |
the output file of impute2 info scores consisting of two columns: all imputed SNPs and their info scores. |
SNP |
a logical value. Set it to TRUE if the data consist entirely of single nucleotide polymorphisms; the genotypes are then expressed as pairs of A, C, G, T, and unknowns are represented as N N. |
referencePanel |
a string indicating which imputation reference panel is used: "1000Gphase1v3_macGT1" or "1000Gphase3". |
impRefDIR |
the directory where the imputation reference files are located. |
tmpImputeDir |
the name of the temporary directory used for storing phasing and imputation results. |
keepTmpDir |
a logical value indicating if the directory 'tmpImputeDir' should be kept or not. The default is TRUE. |
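As an illustration of the windowSize argument, the following minimal sketch splits a chromosome into consecutive chunks of at most windowSize base pairs. The helper makeChunks and the chromosome length are hypothetical, and the way phaseImpute actually derives its chunk boundaries may differ.

```r
# Hypothetical helper (not part of Gimpute): split a chromosome of length
# `chrLength` (in bp) into consecutive windows of at most `windowSize` bp.
makeChunks <- function(chrLength, windowSize = 3e6) {
    starts <- seq(1, chrLength, by = windowSize)
    # The last window is truncated at the end of the chromosome.
    ends <- pmin(starts + windowSize - 1, chrLength)
    data.frame(start = starts, end = ends)
}

# Made-up chromosome length of 10 Mb, default window size of 3 Mb.
chunks <- makeChunks(chrLength = 1e7, windowSize = 3e6)
print(chunks)
```

Each row is one chunk; under this assumption a larger windowSize gives fewer, longer chunks to distribute across cores.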
The imputation process consists of the following main steps:
1.) Phasing the input PLINK data using an existing imputation reference.
2.) Imputing the input PLINK data using the phased results and existing reference data.
3.) Converting IMPUTE2 or GEN format data into PLINK format.
4.) Combining all imputed data into whole-genome PLINK binary files.
5.) Filtering out imputed variants with poor imputation quality.
Parallel computing in R is supported.
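To make the role of the threshold argument in the conversion step more concrete, here is a minimal sketch of calling a hard genotype from the three GEN-format probabilities of one sample at one SNP. The helper callGenotype is hypothetical, and the exact comparison rule used by the external conversion tool (for example, strict versus non-strict inequality) is an assumption.

```r
# Hypothetical helper (not part of Gimpute): call a genotype from one
# sample's three GEN probabilities (AA, AB, BB). Returns NA when no
# probability reaches the threshold.
callGenotype <- function(probs, threshold = 0.9) {
    stopifnot(length(probs) == 3)
    labels <- c("AA", "AB", "BB")
    best <- which.max(probs)
    # Assumption: a call is made when the top probability is >= threshold.
    if (probs[best] >= threshold) labels[best] else NA_character_
}

confidentCall <- callGenotype(c(0.97, 0.02, 0.01))  # one clear winner
ambiguousCall <- callGenotype(c(0.50, 0.45, 0.05))  # nothing reaches 0.9
print(c(confidentCall, ambiguousCall))
```

Under this assumption, raising the threshold yields fewer but more certain genotype calls, with the remainder set to missing.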
Note that chromosome X is not supported for impute4.
The function produces the following outputs:
1.) The filtered imputed PLINK binary files.
2.) The final PLINK binary files, including poorly imputed variants.
3.) A plain text file containing the info scores of all imputed SNPs in two columns: SNP names and the corresponding info scores.
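The two-column info score file can be inspected in R after the run. The sketch below builds a small stand-in file with made-up values and then selects SNPs above a cutoff. It assumes the file is whitespace-delimited without a header line; the column names and the cutoff infoCutoff are hypothetical and not taken from the package.

```r
# Build a small stand-in for the info score file (made-up values).
infoFile <- tempfile(fileext = ".txt")
write.table(
    data.frame(SNP = c("rs1", "rs2", "rs3"), info = c(0.95, 0.42, 0.81)),
    file = infoFile, quote = FALSE, row.names = FALSE, col.names = FALSE
)

# Read the two columns: SNP name and info score.
# Assumption: whitespace-delimited, no header line.
infoScore <- read.table(infoFile, header = FALSE,
                        col.names = c("SNP", "info"),
                        stringsAsFactors = FALSE)

# Hypothetical cutoff; not a value taken from the documentation.
infoCutoff <- 0.6
wellImputed <- infoScore$SNP[infoScore$info >= infoCutoff]
print(wellImputed)
```

In practice, infoFile would be replaced by the path given as outputInfoFile.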
Junfang Chen
Howie, B., et al. (2012). Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet 44(8): 955-959.
Howie, B. N., et al. (2009). A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 5(6): e1000529.
Bycroft, C., et al. (2017). Genome-wide genetic data on ~500,000 UK Biobank participants. bioRxiv: 166298.
## In the current working directory
bedFile <- system.file("extdata", "alignedData.bed", package="Gimpute")
bimFile <- system.file("extdata", "alignedData.bim", package="Gimpute")
famFile <- system.file("extdata", "alignedData.fam", package="Gimpute")
system(paste0("scp ", bedFile, " ."))
system(paste0("scp ", bimFile, " ."))
system(paste0("scp ", famFile, " ."))
inputPrefix <- "alignedData"
outputPrefix <- "gwasImputed"
outputInfoFile <- "infoScore.txt"
tmpImputeDir <- "tmpImpute"
## Not run: Requires an executable program PLINK, e.g.
## plink <- "/home/tools/plink"
## phaseImpute(inputPrefix, outputPrefix, autosome=TRUE,
##             plink, shapeit, imputeTool, impute, qctool, gtool,
##             windowSize=3000000, effectiveSize=20000,
##             nCore=40, threshold=0.9, outputInfoFile, SNP=TRUE,
##             referencePanel, impRefDIR, tmpImputeDir, keepTmpDir=TRUE)
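Because the call depends on several external executables, it can help to check that they are reachable before starting a long run. The following is a minimal sketch using base R only; the helper toolAvailable and the tool names in the tools vector are hypothetical placeholders for whatever paths are used locally.

```r
# Hypothetical helper (not part of Gimpute): TRUE if `tool` is an existing
# file path or can be found somewhere in the command path.
toolAvailable <- function(tool) {
    file.exists(tool) || nzchar(Sys.which(tool))
}

# Placeholder names; replace with the actual paths or program names.
tools <- c(plink = "plink", shapeit = "shapeit", gtool = "gtool")
missingTools <- names(tools)[!vapply(tools, toolAvailable, logical(1))]

if (length(missingTools) > 0) {
    message("Not found: ", paste(missingTools, collapse = ", "))
}
```

Any names reported as not found would need to be installed or given as full paths before calling phaseImpute.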