BiocStyle::markdown()
knitr::opts_chunk$set(message=FALSE, error=FALSE, warning=FALSE, fig.width=10, fig.height=10) options(width=80)
Genome-wide association studies (GWAS) have become an essential tool in assisting the improvement of understanding of neuropathology of diseases. Moreover, the imputation of genotyping data with missing information can enhance the power of GWAS by using haplotype relationships between genetic variants [1,2]. In the meantime, the imputation can substantially reduce the cost of genotyping large number of samples. Furthermore, the comparison of significant findings across different cohorts and genotyping platforms is also achievable. Currently large amounts of genotyping data are produced at an unprecedented rate to be analyzed; however, the preparation of ready-to-use data sets needs to be of high quality. Thus the implementation of an efficient and reliable genetic data pre-processing and imputation pipeline is desirable.
In order to ensure the reliability and reproducibility of genome-wide association study (GWAS) data, we set up an efficient, automatic and comprehensive genotype data processing and imputation pipeline termed Gimpute. It consists of pre-processing (genetic variant information updating/matching/liftOver, quality control of genetic variants and study samples, the alignment of variants to the imputation references), pre-phasing, imputation and post-imputation quality control [3], as well as an extension to the Genipe framework in this vignette, which is easy-to-follow and user-friendly.
Gimpute runs on any 64-bit x86 Linux distribution and it requires the following tools [4,5,6,7]:
You can install the tools from the command line as the following examples:
## PLINK wget https://www.cog-genomics.org/static/bin/plink180717/plink_linux_x86_64.zip unzip plink_linux_x86_64.zip ## GCTA64 wget http://cnsgenomics.com/software/gcta/gcta_1.91.5beta.zip unzip gcta_1.91.5beta.zip ## SHAPEIT wget https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.v2.r904.glibcv2.17.linux.tar.gz tar -zxvf shapeit.v2.r904.glibcv2.17.linux.tar.gz ## IMPUTE2 wget https://mathgen.stats.ox.ac.uk/impute/impute_v2.3.2_x86_64_static.tgz tar -zxvf impute_v2.3.2_x86_64_static.tgz ## GTOOL wget http://www.well.ox.ac.uk/~cfreeman/software/gwas/gtool_v0.7.5_x86_64.tgz tar -zxvf gtool_v0.7.5_x86_64.tgz ## QCTOOLv2 wget http://www.well.ox.ac.uk/~gav/resources/qctool_v2.0-CentOS6.8-x86_64.tgz tar -zxvf qctool_v2.0-CentOS6.8-x86_64.tgz
Note that, IMPUTE4 can be downloaded only after applying for access. IMPUTE4 and QCTOOLv2 are not required if you are using IMPUTE2 for the imputation.
Development version from Github:
install.packages("devtools") library("devtools") install_github("transbioZI/Gimpute", build_vignettes=TRUE)
git clone https://github.com/transbioZI/Gimpute R CMD build Gimpute R CMD INSTALL Gimpute_*.tar.gz
Gimpute runs on any 64-bit x86 Linux distribution. Additional dependencies are described below.
PLINK format files (PED/MAP or BED/BIM/FAM) are required as the input genotyping data for our pipeline. Free datasets are available under ./extdata/ in our package. The way for loading these PLINK binary files is shown as below. Since sharing of genetic genotyping data is strictly prohibited, we did not upload any private genotyping data for this demonstration.
library(Gimpute) bedFile <- system.file("extdata", "dataChr21.bed", package="Gimpute") bimFile <- system.file("extdata", "dataChr21.bim", package="Gimpute") famFile <- system.file("extdata", "dataChr21.fam", package="Gimpute")
The files processed and generated are named in a consistent way across all datasets as follows:
The recommended directories are named as follows:
## Gimpute has the following dependencies: library(lattice) library(doParallel) ## Define required tools toolDIR <- "/home/Gimpute/tools/" plink <- paste0(toolDIR, "plink") gcta <- paste0(toolDIR, "gcta64") shapeit <- paste0(toolDIR, "shapeit") gtool <- paste0(toolDIR, "gtool") qctool <- paste0(toolDIR, "qctool") imputeTool <- "impute4" ## if (imputeTool == "impute2"){ impute <- paste0(toolDIR, "impute2") } else if (imputeTool == "impute4"){ impute <- paste0(toolDIR, "impute4.1_r291.2") }
Self-prepared configuration files
Genotyping chip annotation file: Most of common chip annotation files can be directly downloaded from Dr. William Rayner'page. This annotation file is particularly important for strand checking and genomic information of genotypes on a variety of genome builds. Gimpute requires the chip annotation file that is organized into two different types:
Imputation reference files: Publicly avaiable reference datasets can be downloaded from IMPUTE2's reference page. The reference files from "1000 Genome Project Phase1v3" are the default panel used in our package. 1000 Genomes Phase 3 is an alternative reference panel.
## Download imputation reference panels ## Reference panel 1: 1000Gphase1v3_macGT1 wget https://mathgen.stats.ox.ac.uk/impute/ALL_1000G_phase1integrated_v3_impute_macGT1.tgz tar -zxvf ALL_1000G_phase1integrated_v3_impute_macGT1.tgz ## Reference panel 2: 1000Gphase3 wget https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.tgz wget https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3_chrX.tgz tar -zxvf 1000GP_Phase3.tgz tar -zxvf 1000GP_Phase3_chrX.tgz
A string indicating the type of imputation reference panels is used: c("1000Gphase1v3_macGT1", "1000Gphase3"). This must be defined before running pipeline.
## Define the directory where you place the imputation reference files impRefDIR <- "/home/imputeRef/1000Gphase1v3_macGT1/" referencePanel <- "1000Gphase1v3_macGT1" ## Alternatively, impRefDIR <- "/home/imputeRef/1000GP_Phase3/" referencePanel <- "1000Gphase3"
All files named in this subsection are in the directory 1-genoUpdate/.
In this subsection, the function updateGenoInfo()
is mandatory and updates genotype information of original genotype data by the use of metadata and configuration files.
## Not run bedFile <- system.file("extdata", "dataChr21.bed", package="Gimpute") bimFile <- system.file("extdata", "dataChr21.bim", package="Gimpute") famFile <- system.file("extdata", "dataChr21.fam", package="Gimpute") metadataFile <- system.file("extdata", "1_01_metaData.txt", package="Gimpute") removedSampIDFile <- system.file("extdata", "excludedSampIDs.txt", package="Gimpute") excludedProbeIdsFile <- system.file("extdata", "excludedProbeIDs.txt", package="Gimpute") chipAnnoFile <- system.file("extdata", "coriellAffyChip.txt", package="Gimpute")
inputPrefix <- "dataChr21" ancestrySymbol <- "EUR" outputPrefix <- "1_11_removedYMtSnp" metaDataFile <- "1_01_metaData.txt" chipType <- "rsIDstudy" updateGenoInfo(plink, inputPrefix, metaDataFile, removedSampIDFile, ancestrySymbol, excludedProbeIdsFile, chipAnnoFile, chipType, outputPrefix, keepInterFile=TRUE)
Note that, the output files of each step are the input of the next step.
system(paste0("scp ", metadataFile, " ."))
inputPrefix <- "dataChr21" outputPrefix2 <- "1_02_removedExclInst" removedSampIDFile <- system.file("extdata", "excludedSampIDs.txt", package="Gimpute") removeSampID(plink, removedSampIDFile, inputPrefix, outputPrefix=outputPrefix2)
metaDataFile <- "1_01_metaData.txt" outputPrefix3 <- "1_03_replacedGroupAndSex" updateGroupIdAndSex(plink, inputPrefix=outputPrefix2, metaDataFile, outputPrefix=outputPrefix3)
outputPrefix4 <- "1_04_removedNoGroupId" removeNoGroupId(plink, inputPrefix=outputPrefix3, outputPrefix=outputPrefix4)
outputPrefix5 <- "1_05_removedWrongAnceInst" removedWrongAnceInst(plink, inputPrefix=outputPrefix4, metaDataFile, ancestrySymbol, outputPrefix=outputPrefix5)
outputPrefix6 <- "1_06_removedExclProbe" removedExclProbe(plink, inputPrefix=outputPrefix5, excludedProbeIdsFile, outputPrefix=outputPrefix6)
outputPrefix7 <- "1_07_removedUnmapProbes" outputSNPfile7 <- "1_07_probesUnmapped2ChipRef.txt" removedUnmapProbes(plink, inputPrefix=outputPrefix6, chipAnnoFile, outputPrefix=outputPrefix7, outputSNPfile=outputSNPfile7)
outputSNPdupFile8 <- "1_08_probesDouble.txt" outputPrefix8 <- "1_08_removedDoubleProbes" removedDoubleProbes(plink, inputPrefix=outputPrefix7, chipAnnoFile, chipType, outputSNPdupFile=outputSNPdupFile8, outputPrefix=outputPrefix8)
outputPrefix9 <- "1_09_updatedSnpInfo" updatedSnpInfo(plink, inputPrefix=outputPrefix8, chipAnnoFile, chipType, outputPrefix=outputPrefix9)
outputPrefix10 <- "1_10_splitXchr" splitXchr(plink, inputPrefix=outputPrefix9, outputPrefix=outputPrefix10)
outputPrefix11 <- "1_11_removedYMtSnp" removedYMtSnp(plink, inputPrefix=outputPrefix10, outputPrefix=outputPrefix11)
All files named in this subsection are in the directory 2-genoQC/.
In this subsection, the function genoQC()
is mandatory and performs QC on the
genotype data. removedSnpHetX()
, removedMaleHetX()
, and removeOutlierByPCs()
are optional and can be used part of QC . removedSnpHetX()
and removedMaleHetX()
exclude heterozygous markers in male, and males with haploid heterozygous SNPs
on chromosome X. Our default QC procedure does not remove any population outliers
before imputation. However, if your study data includes significant population outliers
and this may affect the qualtiy of subquent imputation analysis. For this, it is
recommended to remove them by the function removeOutlierByPCs()
after looking at
PCA plot generated by plotPCA4plink()
.
This step does QC in a single function, including all required substeps to ensure that the QC-ed genotype data is of high quality prior to imputation.
inputPrefix <- "2_02_removedInstHetX" outputPrefix <- "2_13_removedSnpHweFemaleX" genoQC(plink, inputPrefix, snpMissCutOffpre=0.05, sampleMissCutOff=0.02, Fhet=0.2, cutoffSubject=0.05, cutoffSNP=0.1, snpMissCutOffpost=0.02, snpMissDifCutOff=0.02, femaleChrXmissCutoff=0.05, pval4autoCtl=0.000001, pval4femaleXctl=0.000001, outputPrefix, keepInterFile=TRUE)
Alternatively, we elaborate step by step instructions of QC procedure in genoQC()
for better understanding.
Note that, the output files of each step are the input of the next step.
inputPrefix <- "1_11_removedYMtSnp" hhCutOff <- 0.005 ## can be tuned outputPrefix <- "2_01_removedSnpHetX" outputHetSNPfile <- "2_01_snpHHfreqAll.txt" outputRetainSNPfile <- "2_01_snpHHfreqRetained.txt" removedSnpHetX(plink, inputPrefix, hhCutOff, outputPrefix, outputHetSNPfile, outputRetainSNPfile)
inputPrefix <- "2_01_removedSnpHetX" hhSubjCutOff <- 15 outputPrefix <- "2_02_removedInstHetX" outputSubjHetFile <- "2_02_instHetXfreqAll.txt" outputRetainSubjectFile <- "2_02_instHetXfreqRetained.txt" outputHetSNPfile <- "2_02_snpHHfreqAll.txt" removedMaleHetX(plink, inputPrefix, hhSubjCutOff, outputPrefix, outputSubjHetFile, outputRetainSubjectFile, outputHetSNPfile)
These two steps are automatically skipped if you are working on the autosomal data.
outputPrefix3 <- "2_03_setHeteroHaploMissing" setHeteroHaploMissing(plink, inputPrefix, outputPrefix=outputPrefix3)
outputPrefix4 <- "2_04_removedSnpMissPre" removedSnpMiss(plink, snpMissCutOff=0.05, inputPrefix=outputPrefix3, outputPrefix=outputPrefix4)
outputPrefix5 <- "2_05_removedInstMiss" removedInstMiss(plink, sampleMissCutOff=0.02, inputPrefix=outputPrefix4, outputPrefix=outputPrefix5)
outputPrefix6 <- "2_06_removedInstFhet" removedInstFhet(plink, Fhet=0.2, inputPrefix=outputPrefix5, outputPrefix=outputPrefix6)
outputPrefix7 <- "2_07_removedParentIdsMiss" removedParentIdsMiss(plink, inputPrefix=outputPrefix6, outputPrefix=outputPrefix7)
outputPrefix8 <- "2_08_removedMendelErr" removedMendelErr(plink, inputPrefix=outputPrefix7, cutoffSubject, cutoffSNP, outputPrefix=outputPrefix8)
outputPrefix9 <- "2_09_removedSnpMissPost" removedSnpMiss(plink, snpMissCutOff=0.02, inputPrefix=outputPrefix8, outputPrefix=outputPrefix9)
outputPrefix10 <- "2_10_removedSnpMissDiff" groupLabel <- getGroupLabel(inputFAMfile=paste0(outputPrefix8, ".fam")) removedSnpMissDiff(plink, inputPrefix=outputPrefix9, snpMissDifCutOff=0.02, outputPrefix=outputPrefix10, groupLabel)
outputPrefix11 <- "2_11_removedSnpFemaleChrXmiss" removedSnpFemaleChrXmiss(plink, femaleChrXmissCutoff=0.05, inputPrefix=outputPrefix10, outputPrefix=outputPrefix11)
outputPvalFile <- "2_12_snpHwePvalAuto.txt" outputSNPfile <- "2_12_snpRemovedHweAuto.txt" outputPrefix12 <- "2_12_removedSnpHweAuto" if (groupLabel == "control" | groupLabel == "caseControl"){ removedSnpHWEauto(groupLabel="control", plink, inputPrefix=outputPrefix11, pval=0.000001, outputPvalFile, outputSNPfile, outputPrefix=outputPrefix12) }
outputPrefix <- "2_13_removedSnpHweFemaleX" outputPvalFile <- "2_13_snpHwePvalfemaleXct.txt" outputSNPfile <- "2_13_snpRemovedHweFemaleXct.txt" removedSnpFemaleChrXhweControl(plink, inputPrefix=outputPrefix12, pval=0.000001, outputPvalFile, outputSNPfile, outputPrefix)
inputPrefix <- "2_13_removedSnpHweFemaleX" outputPC4subjFile <- "2_14_eigenvalAfterQC.txt" outputPCplotFile <- "2_14_eigenvalAfterQC.png" plotPCA4plink(gcta, inputPrefix, nThread=20, outputPC4subjFile, outputPCplotFile) cutoff <- c(-0.4, 0.2) cutoffSign <- "greater" ## not used if cutoff == NULL, and with two values inputPC4subjFile <- "2_14_eigenvalAfterQC.txt" outputPC4outlierFile <- "2_14_eigenval4outliers.txt" outputPCplotFile <- "2_14_removedOutliers.png" outputPrefix <- "2_14_removedOutliers" removeOutlierByPCs(plink, gcta, inputPrefix, nThread=20, cutoff, cutoffSign, inputPC4subjFile, outputPC4outlierFile, outputPCplotFile, outputPrefix)
All files named in this subsection are in the directory 3-checkAlign/.
The function checkAlign2ref()
performs the alignment against a given reference
panel by considering the following parameters: variant name, genomic position and
the allele presentation. Output files are generated in order.
If the genome build of already QC-ed is different from the genome build of the imputation reference files, then lift the data from the first genome build to the second. If the genome build is the same, then just copy the input files.
Output files: 3_1_QCdata.#
Remove SNPs for which the name has a different position (i.e. combination of base pair position and chromosome) in the imputation reference files.
Primary output files: 3_2_removedSnpDiffNamePos.#
Output file with the SNPs names for which a different position is contained in the imputation reference files: 3_2_snpDiffNamePos.txt
Remove SNPs for which the combination of base pair position and chromosome is not contained in the imputation reference files (ignoring the SNP name).
Output file with the SNPs names for which the combination of base pair position and chromosome is not contained in the imputation reference files: 3_3_snpMissPos.txt
Remove SNPs from which the combination of base pair position and chromosome (ignoring the SNP name) has an allele which is not in the imputation reference files for that combination of base pair position and chromosome.
Primary output files: 3_4_removedSnpDiffAlleles.#
renamePlinkBFile(inputPrefix="2_14_removedOutliers", outputPrefix="3_1_QCdata", action="move") inputFile <- paste0(impRefDIR,"*.legend.gz") ## To recode the intermediate disk usage ## keep in the same directory. bimReferenceFile <- "bimImputeRef.txt" .prepareLegend2bim(inputFile, referencePanel, outputFile=bimReferenceFile, ncore=nCores) inputPrefix <- "3_1_QCdata" out2.snp <- "3_2_snpSameNameDiffPos" out2 <- "3_2_removedSnpSameNameDiffPos" out3 <- "3_3_removedSnpMissPos" out3.snp <- "3_3_snpMissPos" out4 <- "3_4_removedSnpDiffAlleles" out4.snp <- "3_4_snpDiffAlleles" out4.snpRetained <- "3_4_snpImpRefAlleles" nCores <- detectCores() checkAlign2ref(plink, inputPrefix, referencePanel, bimReferenceFile, out2, out2.snp, out3, out3.snp, out4, out4.snp, out4.snpRetained, nCore=nCores)
All files named in this subsection are in the directory 4-imputation/.
In this subsection, three major functions removedMonoSnp()
, phaseImpute()
and postImpQC()
are used for the purpose of
excluding monomorphic SNPs, phasing, imputation and post imputation QC.
outputPrefix <- "4_1_removedMonoSnp" outputMonoSNPfile <- "4_1_snpMonoRemoved.txt" removedMonoSnp(plink, inputPrefix="3_4_removedSnpDiffAlleles", outputPrefix, outputSNPfile=outputMonoSNPfile)
inputPrefix <- "4_1_removedMonoSnp" outputPrefix <- "4_2_imputedDataset" outputInfoFile <- "4_2_snpImputedInfoScore.txt" tmpImputeDir <- paste0(imputeTool, "_", referencePanel) nCores <- detectCores() phaseImpute(inputPrefix, outputPrefix, autosome=TRUE, plink, shapeit, imputeTool, impute, qctool, gtool, windowSize=3000000, effectiveSize=20000, nCore=nCores, threshold=0.9, outputInfoFile, SNP=TRUE, referencePanel, impRefDIR, tmpImputeDir, keepTmpDir=TRUE)
Output files: 4_3_wellImputeData.#
Remove all SNPs (if any) which have the same position as a SNP in 4_1_snpMonoRemoved.txt has in aligned instance data.
Output files: 4_4_removedMonoSnpAfter.#
Add previous identified monomorphic SNPs (if any) in 4_1_snpMonoRemoved.txt with their values from the lifted instance data.
Output files: 4_5_addedMonoSnpAfter.#
Remove SNPs which have a non missing value for less then 20 instances.
inputPrefix <- "4_2_imputedDataset" out1 <- "4_3_wellImputeData" out2 <- "4_4_removedMonoSnpAfter" out3 <- "4_5_addedMonoSnpAfter" out4 <- "4_6_removedSnpMissPostImp" outputInfoFile <- "4_2_snpImputedInfoScore.txt" outputMonoSNPfile <- "4_1_snpMonoRemoved.txt" prefixAlign2ref <- "3_4_removedSnpDiffAlleles" outRemovedSNPfile <- "4_6_snpRemovedMissPostImp.txt" outRetainSNPfile <- "4_6_snpRetainMissPostImp.txt" postImpQC(plink, inputPrefix, out1, out2, out3, out4, outputInfoFile, infoScore=0.6, outputMonoSNPfile, prefixAlign2ref, missCutoff=20, outRemovedSNPfile, outRetainSNPfile, referencePanel)
All files named in this subsection are in the directory 5-reductAndExpand/.
reductExpand()
function extracts the genotypes from well imputed data so that
PLINK data with only SNPs before imputation are remained. On the basis of
extracted genotypes, a subset of non-imputed genotypes that are different from
the imputation reference panel are then added, including SNPs with different alleles,
missing positions, and different positions.
Output files: 5_1_reducedToSpecific.#
Add the SNPs (if any) with different alleles from the dataset before removing SNPs.
Output files: 5_2_specificDiffAllele.#
Add the SNPs (if any) with missing positions from the dataset before removing SNPs.
Output files: 5_3_specificMissPos.#
Add the SNPs (if any) with different positions from the dataset before removing SNPs.
inputPrefix <- "4_6_removedSnpMissPostImp" inputQCprefix <- "3_1_QCdata" snpRefAlleleFile <- "3_4_snpImpRefAlleles.txt" snpDiffAlleleFile <- "3_4_snpDiffAlleles.txt" snpMissPosFile <- "3_3_snpMissPos.txt" snpSameNameDifPosFile <- "3_2_snpSameNameDiffPos.txt" dir5 <- "./5-reductAndExpand/" ## imputed dataset system(paste0("scp ./4-imputation/", inputPrefix, ".* ", dir5)) system(paste0("scp ./3-checkAlign/", inputQCprefix, ".* ", dir5)) system(paste0("scp ./3-checkAlign/", snpRefAlleleFile, " ", dir5)) system(paste0("scp ./3-checkAlign/", snpDiffAlleleFile, " ", dir5)) system(paste0("scp ./3-checkAlign/", snpMissPosFile, " ", dir5)) system(paste0("scp ./3-checkAlign/", snpSameNameDifPosFile, " ", dir5)) setwd(dir5) reducedToSpecificfn <- "5_1_reducedToSpecific" specificDiffAllelefn <- "5_2_specificDiffAllele" specificMissPosfn <- "5_3_specificMissPos" specificDiffPosfn <- "5_4_specificDiffPos" reductExpand(plink, referencePanel, inputPrefix, inputQCprefix, snpRefAlleleFile, snpDiffAlleleFile, snpMissPosFile, snpSameNameDifPosFile, reducedToSpecificfn, specificDiffAllelefn, specificMissPosfn, specificDiffPosfn) setwd("..")
Copy and rename the results of interest to the directory ./6-finalResults.
The metadata information file, the final well imputed and a subset of imputed dataset (dataset-specific) are essential for the downstream analysis.
dir6 <- "./6-finalResults/" system(paste0("scp ./1-genoUpdate/1_01_metaData.txt ", dir6)) system(paste0("scp ./4-imputation/4_6_removedSnpMissPostImp.* ", dir6)) system(paste0("scp ./5-reductAndExpand/5_4_specificDiffPos.* ", dir6)) setwd(dir6) renamePlinkBFile(inputPrefix="4_6_removedSnpMissPostImp", outputPrefix="imputedSnpsDataset", action="move") renamePlinkBFile(inputPrefix="5_4_specificDiffPos", outputPrefix="specificSnpsDataset", action="move")
Gimpute's modular structure allows the integration of other existing imputation workflows, allowing users to choose their preferred imputation strategy. For demonstration, we have embedded Genipe as an external imputation tool [8].
To set up Genipe, please refer to Genipe's installation page for more details.
The imputation reference can be downloaded directly from IMPUTE2's reference page: 1000 Genome phase 3 or Genipe's page.
This reference genome is used for the strand check, you can download from Genipe's page.
## example impRefDIR4genipe <- "../1000GP_Phase3/" fastaFile <- "../hg19/hg19.fasta"
chrs =22 inputPrefix <- "3_4_removedSnpDiffAlleles" thread4impute2 <- 20 ## tune by yourself thread4shapeit <- 30 segmentSize <- 3000000 imputeTool <- "impute2" ## only impute2 imputedByGenipe(chrs, impRefDIR4genipe, inputPrefix, shapeit, impute, plink, fastaFile, segmentSize, thread4impute2, thread4shapeit)
chr <- 2 inputImpute2 <- 'chr2.33000001_36000000.impute2' probability <- 0.9 completionRate <- 0.98 # info <- 0.6 outputPrefix <- paste0('imputedChr', chr) mergeByGenipe(inputImpute2, chr, probability, completionRate, info, outputPrefix)
## extract imputed markers using Genipe chr <- 3 inputImpute2 <- paste0('chr', chr,'.imputed.impute2') inputMAP <- paste0('chr', chr,'.imputed.map') format <- 'bed' prob <- 0.9 outputPrefix <- paste0('imputedChr', chr) extractByGenipe(inputImpute2, inputMAP, outputPrefix, format, prob)
## sessionInfo() # R version 3.5.0 (2018-04-23) # Platform: x86_64-redhat-linux-gnu (64-bit) # Running under: CentOS Linux 7 (Core) # Matrix products: default # BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so # locale: # [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C # [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 # [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 # [7] LC_PAPER=en_US.UTF-8 LC_NAME=C # [9] LC_ADDRESS=C LC_TELEPHONE=C # [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C # attached base packages: # [1] parallel stats graphics grDevices utils datasets methods # [8] base # other attached packages: # [1] doParallel_1.0.11 iterators_1.0.10 foreach_1.4.4 lattice_0.20-35 # [5] Gimpute_0.99.9 # loaded via a namespace (and not attached): # [1] compiler_3.5.0 codetools_0.2-15 grid_3.5.0
[1] Marchini, J., et al. (2010). Genotype imputation for genome-wide association studies. Nat Rev Genet 11(7): 499-511.
[2] Marchini, J., et al. (2007). A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39(7): 906-913.
[3] Schizophrenia Working Group of the Psychiatric Genomics, C. (2014). Biological insights from 108 schizophrenia-associated genetic loci. Nature 511(7510): 421-427.
[4] Purcell, Shaun, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American Journal of Human Genetics 81.3 (2007): 559-575.
[5] Howie, B., et al. (2012). Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet 44(8): 955-959.
[6] Howie, B. N., et al. (2009). A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 5(6): e1000529.
[7] Bycroft, Clare, Colin Freeman, Desislava Petkova, Gavin Band, Lloyd T. Elliott, Kevin Sharp, Allan Motyer et al. Genome-wide genetic data on~ 500,000 UK Biobank participants. BioRxiv (2017): 166298.
[8] Lemieux Perreault, L. P., et al. (2016). genipe: an automated genome-wide imputation pipeline with automatic reporting and statistical tools. Bioinformatics, 32(23), 3661-3663.
If you have any trouble or comment ? --> Contact: junfang.chen@zi-mannheim.de
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.