title: "main.r" author: "JohnHadish" date: "3/8/2020" output: html_document
Here we implement a new GWAS package "DJ.GLM.GWAS" which performs GLM. This package is implemented in r, and contains functions for performing the GWAS and graphing results. We hypthesis that this will outperform correlation methods.
knitr::opts_chunk$set(echo = TRUE)
Each Function should have its own demonstration script:
Download Data:
myGenotype=read.table(file="http://zzlab.net/GAPIT/data/mdp_numeric.txt",head=T) myPosition=read.table(file="http://zzlab.net/GAPIT/data/mdp_SNP_information.txt",head=T) myPhenotype=read.table(file="http://zzlab.net/GAPIT/data/CROP545_Phenotype.txt",head=T) myCovariate=read.table(file="http://zzlab.net/GAPIT/data/CROP545_Covariates.txt",head=T)
Load Dependencies and Packages:
library(plotly) library(ggplot2) library(gridExtra) library(knitr) library(devtools) install_github("JohnHadish/DJ.GLM.GWAS") library(DJ.GLM.GWAS)
Run GWAS using GLM and Correlation on the provided data:
# prcomp is a default principal component analysis allPC <- prcomp(myGenotype[,-1]) # pcThreshold removes all princpal components that account for less than threshold threshPC <- pcThreshold(allPC) # pcCovariance determines if covariates are linear with principal components finalPC <- pcCoVariance(threshPC, myCovariate, .7) P <- GWASbyGLM(myGenotype, myPhenotype[,2], finalPC) # GWAS by Correlation P2 <- GWASbyCor(myGenotype[,-1], myPhenotype[,-1])
P and P2 contains the p-values corresponding to the two competing methods GWASbyGLM and GWASbyCor for the association tests between phenotype and markers. Markers are tested one at a time with covariates in C included as covariates. This package GWASbyGLM also performs PCA and incorporates PCs as cofactors for GWAS. Also,it automatically excludes the PCs that are in linear dependent to the covariates.
plotManhatten(P, main = "GWAS by GLM") plotManhatten(P2, main = "GWAS by Cor")
Returns: a QQ plot of observed and expected p values
par(mfrow=c(1,2)) plotQQ(P, main = "GWAS by GLM") plotQQ(P2, main = "GWAS by Cor")
30 GWAS Simulations
# These only need to be calculated 1 time since they only pertain to structure of Genotype allPC <- prcomp(myGenotype[,-1]) threshPC <- pcThreshold(allPC) finalPC <- pcCoVariance(threshPC, myCovariate, .7) #Create BLINK input data GD=t(myGenotype[,-1]) write.table(GD,file="myData.dat",quote=F,sep="\t",col.name=F,row.name=F) write.table(myPosition,file="myData.map",quote=F,sep="\t",col.name=T,row.name=F) scoresTop10 = matrix(NA,3,30) time30 = matrix(NA,3,30) total <- 30 # create progress bar pb <- txtProgressBar(min = 0, max = total, style = 3) for(i in 1:30) { simP <- G2P(myGenotype[,-1],0.75,alpha=1,NQTN=10,distribution = "norm") ### Run Correlation GWAS### simCor_Time_Start <- proc.time() simCor <- GWASbyCor(myGenotype[,-1], simP$y) simCor_Time_End <- proc.time() ### Run GLM GWAS### simGLM_Time_Start <- proc.time() simGLM <- GWASbyGLM(myGenotype, simP$y, finalPC ) simGLM_Time_End <- proc.time() ### Run BLINK GWAS#### simBLINK_Time_Start <- proc.time() # write this phenotype for BLINK to use:\ P_BLINK <- data.frame(myPhenotype[,1], simP$y) names(P_BLINK)[1] <- "Taxa" names(P_BLINK)[2] <- "Obs" write.table(P_BLINK, file="myData.txt",quote=F,sep="\t",col.name=T,row.name=F) # # Blink rewrites files, so I have to recreate them every time if I wantit to run write.table(GD,file="myData.dat",quote=F,sep="\t",col.name=F,row.name=F) write.table(myPosition,file="myData.map",quote=F,sep="\t",col.name=T,row.name=F) # Run Blink on the command line, read in results system("/home/john/Documents/ZZClass/BLINK/blink_linux --gwas --file myData --numeric", wait = TRUE, ignore.stdout = TRUE, ignore.stderr = TRUE) GMP <- read.delim("Obs_GWAS_result.txt", head = T) simBLINK <- t(as.matrix(GMP[,5])) simBLINK_Time_End <- proc.time() time30[2,i] = simCor_Time_End[3] - simCor_Time_Start[3] time30[1,i] = simGLM_Time_End[3] -simGLM_Time_Start[3] time30[3,i] = simBLINK_Time_End[3] - simBLINK_Time_Start[3] scoresTop10[2,i] = detectTop10(simCor, simP$QTN.position) scoresTop10[1,i] = detectTop10(simGLM,simP$QTN.position) scoresTop10[3,i] = detectTop10(simBLINK, simP$QTN.position) setTxtProgressBar(pb, i) } close(pb) print(paste("Average of GWAS by Cor average: ", mean(scoresTop10[2,]), "SD: ", sd(scoresTop10[2,]), "Time: ", mean(time30[2,]))) print(paste("Average of GWAS by GLM average: ", mean(scoresTop10[1,]), "SD: ", sd(scoresTop10[1,]), "Time: ", mean(time30[1,]))) print(paste("Average of GWAS by BLINK average: ", mean(scoresTop10[3,]), "SD: ", sd(scoresTop10[3,]), "Time: ", mean(time30[3,])))
Observations:
We noted the mean times for running this data set using the 3 procedures, GWASbyCor,GWASbyGLM and BLINK.Our procedure is definitely a good competitor with respect to the other two,but the run time is takes a few more seconds.(We tried to apply the apply function but that didn't quite help)
system("R CMD Rd2pdf --output=./Reference_Manual.pdf .")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.