title: "main.r" author: "JohnHadish" date: "3/8/2020" output: html_document


Here we implement a new GWAS package "DJ.GLM.GWAS" which performs GLM. This package is implemented in r, and contains functions for performing the GWAS and graphing results. We hypthesis that this will outperform correlation methods.

knitr::opts_chunk$set(echo = TRUE)

Testing The Program:

Each Function should have its own demonstration script:

Download Data:

myGenotype=read.table(file="http://zzlab.net/GAPIT/data/mdp_numeric.txt",head=T)
myPosition=read.table(file="http://zzlab.net/GAPIT/data/mdp_SNP_information.txt",head=T)
myPhenotype=read.table(file="http://zzlab.net/GAPIT/data/CROP545_Phenotype.txt",head=T)
myCovariate=read.table(file="http://zzlab.net/GAPIT/data/CROP545_Covariates.txt",head=T)

Load Dependencies and Packages:

library(plotly)
library(ggplot2)
library(gridExtra)
library(knitr)

library(devtools)
install_github("JohnHadish/DJ.GLM.GWAS")
library(DJ.GLM.GWAS)

Run GWAS using GLM and Correlation on the provided data:

# prcomp is a default principal component analysis
allPC <- prcomp(myGenotype[,-1])
# pcThreshold removes all princpal components that account for less than threshold
threshPC <- pcThreshold(allPC)
# pcCovariance  determines if covariates are linear with principal components
finalPC <- pcCoVariance(threshPC, myCovariate, .7)
P <- GWASbyGLM(myGenotype, myPhenotype[,2], finalPC)

# GWAS by Correlation
P2 <- GWASbyCor(myGenotype[,-1], myPhenotype[,-1])

P and P2 contains the p-values corresponding to the two competing methods GWASbyGLM and GWASbyCor for the association tests between phenotype and markers. Markers are tested one at a time with covariates in C included as covariates. This package GWASbyGLM also performs PCA and incorporates PCs as cofactors for GWAS. Also,it automatically excludes the PCs that are in linear dependent to the covariates.

Manhattan Plots of GWAS by GLM and Cor

plotManhatten(P, main = "GWAS by GLM")
plotManhatten(P2, main = "GWAS by Cor")

Returns: a QQ plot of observed and expected p values

par(mfrow=c(1,2))
plotQQ(P, main = "GWAS by GLM")
plotQQ(P2, main = "GWAS by Cor")

30 GWAS Simulations

# These only need to be calculated 1 time since they only pertain to structure of Genotype
allPC <- prcomp(myGenotype[,-1])
threshPC <- pcThreshold(allPC)
finalPC <- pcCoVariance(threshPC, myCovariate, .7)

#Create BLINK input data
GD=t(myGenotype[,-1])
write.table(GD,file="myData.dat",quote=F,sep="\t",col.name=F,row.name=F)
write.table(myPosition,file="myData.map",quote=F,sep="\t",col.name=T,row.name=F)

scoresTop10 = matrix(NA,3,30)
time30 = matrix(NA,3,30)

total <- 30
# create progress bar
pb <- txtProgressBar(min = 0, max = total, style = 3)
for(i in 1:30)
{
  simP <- G2P(myGenotype[,-1],0.75,alpha=1,NQTN=10,distribution = "norm")


  ### Run Correlation GWAS###
  simCor_Time_Start <- proc.time()
  simCor <- GWASbyCor(myGenotype[,-1], simP$y)
  simCor_Time_End <- proc.time()

  ### Run GLM GWAS###
  simGLM_Time_Start <- proc.time()
  simGLM <- GWASbyGLM(myGenotype, simP$y, finalPC )
  simGLM_Time_End <- proc.time()

  ### Run BLINK GWAS####
  simBLINK_Time_Start <- proc.time()
    # write this phenotype for BLINK to use:\
  P_BLINK <- data.frame(myPhenotype[,1], simP$y)
  names(P_BLINK)[1] <- "Taxa"
  names(P_BLINK)[2] <- "Obs"
  write.table(P_BLINK, file="myData.txt",quote=F,sep="\t",col.name=T,row.name=F)
  # # Blink rewrites files, so I have to recreate them every time if I wantit to run
  write.table(GD,file="myData.dat",quote=F,sep="\t",col.name=F,row.name=F)
  write.table(myPosition,file="myData.map",quote=F,sep="\t",col.name=T,row.name=F)
  # Run Blink on the command line, read in results
  system("/home/john/Documents/ZZClass/BLINK/blink_linux --gwas --file myData --numeric", wait = TRUE, ignore.stdout = TRUE, ignore.stderr = TRUE)
  GMP <- read.delim("Obs_GWAS_result.txt", head = T)
  simBLINK <- t(as.matrix(GMP[,5]))
  simBLINK_Time_End <- proc.time()

  time30[2,i] = simCor_Time_End[3] - simCor_Time_Start[3]
  time30[1,i] = simGLM_Time_End[3] -simGLM_Time_Start[3] 
  time30[3,i] = simBLINK_Time_End[3] - simBLINK_Time_Start[3]

  scoresTop10[2,i] = detectTop10(simCor, simP$QTN.position)
  scoresTop10[1,i] = detectTop10(simGLM,simP$QTN.position)
  scoresTop10[3,i] = detectTop10(simBLINK, simP$QTN.position)

  setTxtProgressBar(pb, i)
}
close(pb)

print(paste("Average of GWAS by Cor average: ", mean(scoresTop10[2,]), "SD: ", sd(scoresTop10[2,]), "Time: ", mean(time30[2,])))
print(paste("Average of GWAS by GLM average: ", mean(scoresTop10[1,]), "SD: ", sd(scoresTop10[1,]), "Time: ", mean(time30[1,])))
print(paste("Average of GWAS by BLINK average: ", mean(scoresTop10[3,]), "SD: ", sd(scoresTop10[3,]), "Time: ", mean(time30[3,])))

Observations:

We noted the mean times for running this data set using the 3 procedures, GWASbyCor,GWASbyGLM and BLINK.Our procedure is definitely a good competitor with respect to the other two,but the run time is takes a few more seconds.(We tried to apply the apply function but that didn't quite help)

system("R CMD Rd2pdf --output=./Reference_Manual.pdf .")


JohnHadish/DJ.GLM.GWAS documentation built on April 5, 2020, 9:05 p.m.