In stp4freedom/GWhEAT: Performed GWAS in a fast and efficient manner

knitr::opts_chunk$set(echo = TRUE)

Professor: Zhiwu Zhang

Due on March 30, 2020, Monday, 3:10PM, PST

We will need to install the GWhEAT package since this is a working package in it’s early stages of development it’s only available through Github. To download files off Github first download and load the library of the packaged “devtools” using the code below.

#code to install package from github
#install.packages("devtools")
library(devtools)

Next the code below downloads and installs the package GWhEAT from Github. The bottom two line of code in the chunk below make sure the dependencies GWhEAT relies on are also downloaded and installed.

#install package
install_github("stp4freedom/GWhEAT", force =TRUE)
library(GWhEAT)#package name
#Load dependencies 
if (!require("pacman")) install.packages("pacman")
pacman::p_load(ggplot2,knitr,gridExtra)

The code below downloads and loads into the environment data that can be used for this tutorial.

if (!file.exists("GAPIT_Tutorial_Data.zip"))
{
  download.file("http://zzlab.net/GAPIT/GAPIT_Tutorial_Data.zip", destfile = "GAPIT_Tutorial_Data.zip")
  unzip("GAPIT_Tutorial_Data.zip")
}
download.file("http://zzlab.net/GAPIT/data/CROP545_Covariates.txt", destfile = "CROPS545_Covariates.txt")
download.file("http://zzlab.net/GAPIT/data/CROP545_Phenotype.txt", destfile = "CROPS545_Phenotype.txt")
# Import the GAPIT demo data genotypes
gt_scan <- data.frame(read.table("GAPIT_Tutorial_Data/mdp_numeric.txt", header = T, stringsAsFactors = F, sep = "\t", nrows = 1))
classes <- sapply(gt_scan, class)
genotypes <- data.frame(read.table("GAPIT_Tutorial_Data/mdp_numeric.txt", header = T, row.names = 1, colClasses = classes, stringsAsFactors = F, sep = "\t"))
#import genetic map, phenotypes, and covariates
GM <- read.table("GAPIT_Tutorial_Data/mdp_SNP_information.txt", header = T, stringsAsFactors = F, sep = "\t")
CV <- read.table("CROPS545_Covariates.txt", header = T, stringsAsFactors = F, sep = "\t")
phenotypes <- read.table("CROPS545_Phenotype.txt", header = T, stringsAsFactors = F, sep = "\t")
#Create phenotypes with just n by 1
phenotypes_n_1=phenotypes[,2]

Problem 1 (15 Pts)

The package should contain at least three input: y, X , and C that are R objects of numeric data frame. Their dimensions are n by 1, n by m, and n by t corresponding to phenotype, genotype and covariate data, where n is number of individuals, m is number of markers, and t is number of covariates. The function should return probability values with dimension of 1 by m for the association tests between phenotype and markers. Markers are tested one at a time with covariates in C included as covariates

Please see GWAS_apply.R for the source code to these questions

Our gwas function is called "GWASapply" and the input is "pheno" for the y object, "geno" for the X object, and "Cov" for the C object. They follow the correct dimensions. The function returns various results, but the probability values are contained in the result "P.value.res" and returns an object of 1 by m.

Problem 2 (25 Pts)

The package should perform PCA and incorporate PCs as cofactors for GWAS. Your package should also automatically exclude the PCs that are in linear dependent to the covariates provided by users.

Please see GWAS_apply.R for the source code to these questions

In our package we use the "prcomp" function to perform PCA on the X "geno" object and use a "fix_Dep" function to test for dependency and then excludethe PCs that are in linear dependence.

Problem 3 (20 Pts)

Develop a user manual and tutorials. Name your package and create a logo.

Please see User_Manual_Tutorial.Rmd and User_Manual_Tutorial.html for the user manual and tutorials.

Also included is a Reference_Manual.pdf for all our functions in the package which include "fix_Dep", "manhattan_plot", "qq_plot", "power.fdr", and our main functions "GWASapply", and "GWASapply_rapid".

Problem 4 (15 Pts)

Perform GWAS on the data provided or your own data which must contain cofactors.

Methods:

Our GWASapply function accepts the tutorial data phenotypes, genotypes, CV, and Genetic Map. It displays the PCA variance table and plots and scatterplots along with a QQ-plot and manhattan plot. The messages displayed give information on certain aspects of our GWAS results. The results returned returned are the probability values, cutoff used, order of snps and a table with the significant SNP information.

Results and Interpretation:

For the demo data, our function shows that since there was no cutoff specified, it defaults to a Bonferroni correction that results in a cutoff for significant values of "1.62e-05" with four significant SNPs. The results are shown below in the plots and the information is displayed in table 1. The MAF for each associated SNP ranged from 0.10 to 0.35. If these associated SNP’s MAF were very low like below 0.05 it would be cause of concern, however with the lowest MAF at 0.10 we can have confidence in these SNP associations.

Perform GWAS and display results

#run gwasapply function and display messages, PCA, and plots
p4=GWASapply(phenotypes_n_1,genotypes,CV,GM,PCA.M = 3,plots=TRUE,messages=TRUE,trait="Problem 4")

GWAS Threshold

#final threshold
p4$cutoff.final.res

Significant SNPs and Minor Allele Frequency

Table 1. Significant SNP Results.

#significant snps table
kable(p4$sig.table)

Problem 5 (25 Pts)

Demonstrate that your method is superior to the competing method (GWASbyCor) through simulation.

Methods:

In order to accurately compare our GWAS function and "GWASbyCor", we will simulate data and use 10 QTNs in order to calculate false positives and true positives. We will use Manhattan plots and QQ plots to compare our results with the QTNs, false positives, and true positives denoted on the Manhattan plot. Using the replicate function, We replicated the GWAS functions, and then calculated how many false positives and true positives were calculated. We then took the averages and standard deviations of all the replicates. Then we use a t-test for both results in order to use statisitcal inferences and show a significant difference between the two methods. We also plotted ROC curves for Power vs FDR and Power vs Type 1 error.

Results and Interpretation:

In comparing the Manhattan plots and qq plots. The red lines display the locations for each QTN simulated. The blue lines show false positives, and the black lines show true positives. If there are red lines, in the results they are technically false negatives since the black lines would cover them if present.Since there was no cutoff specified, it defaults to a Bonferroni correction. We can easily see the large amounts of false positives in the GWASbyCor Manhattan plot, with only one appearing in the GWASapply Manhattan plot. The qqplots also show the GWASapply function results with less deviation from the expected except for the significant SNPS. Both ROC curves both look similar between the two functions. For the replicated results, we see that GWASbyCor results in a very large number of false positives with an averge of 31 and a large standard deviation of 48, whereas, our GWASapply function only results in an average of 4 false positives and a standard deviation of around 6. However, both functions result in a similar average for true positives. The t-tests show a significant difference for false positives but not true positives. The results show the GWASapply function results in a much lower rate of false positives.

Simulated Data

source("http://www.zzlab.net/StaGen/2020/R/G2P.R")
source("http://www.zzlab.net/StaGen/2020/R/GWASbyCor.R")
NQTN = 10 #specify number of QTN
h2 = 0.75 #heritability
alpha = 0.6 #alpha value
distribution = "normal" #specify distribution
set.seed(1)
#plotting
#phenotype simulation
mySim=G2P(genotypes, h2, alpha, NQTN, distribution)
p=GWASbyCor(X=genotypes,y=mySim$y)#GWAS by using the correlation methods
p=t(matrix(p))
#power calculations
pres=power.fdr(p, QTN.position=mySim$QTN.position,cutoff=NULL)
#gwasapply function
p5=GWASapply(mySim$y,genotypes,CV,GM,PCA.M = 3,QTN.position=mySim$QTN.position,message=TRUE)

Manhattan Plots

#create manhattan plots
gcm=manhattan_plot(GM,p,cutoff=NULL,mySim$QTN.position,FP=pres$FP.fdr.power,TP=pres$TP.fdr.power, trait = "GWASbyCor",messages=FALSE)
gtm=manhattan_plot(GM,p5$P.value.res,cutoff=NULL,mySim$QTN.position,FP=p5$False.Positive,TP=p5$True.Positive, trait = "GWASapply",messages=FALSE)
grid.arrange(gcm, gtm, nrow = 2, ncol = 1, top = "Manhattan Plots")

QQ Plots

#create qq plots
gcq=qq_plot(GM,p,mySim$QTN.position, trait = "GWASbyCor")
gtq=qq_plot(GM,p5$P.value.res,mySim$QTN.position, trait = "GWASapply")
grid.arrange(gcq,gtq, nrow = 2, ncol = 1, top = "QQ-Plots")

ROC Curve Power vs FDR

#plot power vs fdr for both functions
par(mfrow=c(1,2))
plot(pres$power~pres$fdr,xlab="FDR",ylab="Power",main="ROC Curve: GWASbyCor",type="b")
plot(p5$power.fdr.type1error.res$power~p5$power.fdr.type1error.res$fdr,xlab="FDR",ylab="Power",main="ROC Curve: GWASapply",type="b")

ROC Curve Power vs Type 1

#plot power vs type1 for both functions
par(mfrow=c(1,2))
plot(pres$power~pres$type1error,xlab="Type 1",ylab="Power",main="ROC Curve: GWASbyCor",type="b")
plot(p5$power.fdr.type1error.res$power~p5$power.fdr.type1error.res$type1error,xlab="Type 1",ylab="Power",main="ROC Curve: GWASapply",type="b")

Mean Power, FDR, and True and False Positives

GWASbyCor

gcrep=replicate (30,{
  #simulate phenotype
  mySim=G2P(genotypes, h2, alpha, NQTN, distribution) #phenotype simulation
  p= GWASbyCor(X=genotypes,y=mySim$y)#GWAS by using the correlation methods
  p=data.frame(p)
  #power calculations
  pres=power.fdr(p, QTN.position=mySim$QTN.position,cutoff=NULL)
  p5_pwr=mean(pres$power)
  p5_fdr=mean(pres$fdr)
  FalP=length(pres$FP.fdr.power)
  TruP=length(pres$TP.fdr.power)
  num_output=c()
  num_output=c(p5_pwr,p5_fdr,FalP,TruP)
  })
means=rowMeans(gcrep)
stdevs=apply(gcrep,1,sd)
out4 <- rbind(means,stdevs)
out4 <- as.data.frame(out4)
colnames(out4) <-  c("Power","FDR","False Positive","True Positives")
rownames(out4)<-c("Averages","Standard Deviations")
kable(out4)

GWASapply

garep=replicate(30,{
  #simulate phenotype
  mySim=G2P(genotypes, h2, alpha, NQTN, distribution)
  #run gwasapply function
  p5_rep=GWASapply(mySim$y,genotypes,CV,GM,PCA.M = 3,QTN.position=mySim$QTN.position)
  p5_pwr=mean(p5$power.fdr.type1error.res$power)
  p5_fdr=mean(p5$power.fdr.type1error.res$fdr)
  FalP=length(p5_rep$False.Positive)
  TruP=length(p5_rep$True.Positive)
  num_output=c()
  num_output=c(p5_pwr,p5_fdr,FalP,TruP)
  })
means=rowMeans(garep)
stdevs=apply(garep,1,sd)
out5 <- rbind(means,stdevs)
out5 <- as.data.frame(out5)
colnames(out5) <- c("Power","FDR","False Positive","True Positives")
rownames(out5) <-c("Averages","Standard Deviations")
#output results in a table
kable(out5)

Statistical Inference

Power

#t-test for power
t.test(gcrep[1,],garep[1,], paired = T,  na.rm=T)

FDR

#t-test for FDR
t.test(gcrep[2,],garep[2,], paired = T,  na.rm=T)

False Positives

#t-test for false positive
t.test(gcrep[3,],garep[3,], paired = T,  na.rm=T)

True Positives

#t-test for true positives
t.test(gcrep[4,],garep[4,], paired = T,  na.rm=T)

Problem 6 (25 Pts)

Demonstrate that your package is better than BLINK C version (http://zzlab.net/blink) on either statistical power or speed.

Methods:

In order to accurately compare our GWAS function and Blink, we will simulate data and use 10 QTNs in order to calculate power, FDR, false positives and true positives, and most importantly compare the speed of our functions. We will use Manhattan plots and QQ plots to compare our results with the QTNs, false positives, and true positives denoted on the Manhattan plot. Using the replicate function, We replicated the GWAS functions, and then calculated how many false positives and true positives were calculated. We then took the averages and standard deviations of all the replicates. Then we use a t-test for both results in order to use statisitcal inferences and show a significant difference between the two methods.We did the same thing for Power and FDR. We then tested the replicated functions on two more datasets.

Results and Interpretation:

NQTN = 10 #specify number of QTN
h2 = 0.75 #heritability
alpha = 0.6 #alpha value
distribution = "normal" #specify distribution
set.seed(1)
#plotting
#phenotype simulation
mySim=G2P(genotypes, h2, alpha, NQTN, distribution)

#run replicate
garapid=replicate(30,{
mySim=G2P(genotypes, h2, alpha, NQTN, distribution)
#start time for function
loop_start <- proc.time()
p6_rep=GWASapply_rapid(mySim$y,genotypes,CV,PCA.M = 3)
loop_end <- proc.time()
#end time for function
loop_elapsed <- loop_end[3] - loop_start[3]
#power calculations
p6_power=power.fdr(p6_rep$P.value.res,mySim$QTN.position)
p6_pwr=mean(p6_power$power)
p6_fdr=mean(p6_power$fdr)
FalP=length(p6_power$FP.fdr.power)
TruP=length(p6_power$TP.fdr.power)
pwr.fdr=c(p6_pwr,p6_fdr,loop_elapsed,FalP,TruP)
#timet=c(loop_elapsed[[1]])
  })
means=rowMeans(garapid)
out6 <- rbind(means[1],means[2],means[3],means[4],means[5])
out6 <- as.data.frame(out6)
colnames(out6) <- c("GWASapply Average")
rownames(out6) <-c("Power","FDR","Seconds","False Positive","True Postive")

#write map and numeric genotype files compatible with BLINK
myGM=GM
colnames(myGM)=c("rs","chr","pos")
write.table(t(genotypes),file="myData.dat",quote=F,sep="\t",col.name=F,row.name=F)
write.table(myGM,file="myData.map",quote=F,sep="\t",col.name=T,row.name=F)
colnames(CV)=c("taxa","CV1","CV2")
write.table(CV,file="myData.cov",quote=F,sep="\t",col.name=T,row.name=F)
#run replicate
blinkrep=replicate(30,{
mySim=G2P(genotypes, h2, alpha, NQTN, distribution)
myY=cbind(rownames(mySim$y),mySim$y)
colnames(myY)=c("taxa","Sim")
write.table(myY,file="myData.txt",quote=F,sep="\t",col.name=T,row.name=F)
#start time for function
loop_start <- proc.time()
system("blink --file myData --numeric --gwas")
loop_end <- proc.time()
#end time for function
loop_elapsed <- loop_end[3] - loop_start[3]
#read in results
blink.p <- read.table("Sim_GWAS_result.txt", header = T, stringsAsFactors = F, sep = "\t")
p=t(blink.p$p_value)
#power calculations
p6_blink=power.fdr(p, QTN.position=mySim$QTN.position,cutoff=NULL)
p6_pwr_blink=mean(p6_blink$power)
p6_fdr_blink=mean(p6_blink$fdr)
FalP=length(p6_blink$FP.fdr.power)
TruP=length(p6_blink$TP.fdr.power)
pwr.fdr=c(p6_pwr_blink,p6_fdr_blink,loop_elapsed,FalP,TruP)
  })
means=rowMeans(blinkrep)
out7 <- rbind(means[1],means[2],means[3],means[4],means[5])
out7 <- as.data.frame(out7)
colnames(out7) <- c("Blink Average")
rownames(out7) <-c("Power","FDR","Seconds","False Positive","True Postive")
out1=data.frame(out6,out7)

Comparing BLINk and GWASapply

#output results
kable(out1)

Power

#t-test for power
t.test(garapid[1,],blinkrep[1,], paired = T,  na.rm=T)

FDR

#t-test for FDR
t.test(garapid[2,],blinkrep[2,], paired = T,  na.rm=T)

Time

#t-test for time
t.test(garapid[3,],blinkrep[3,], paired = T,  na.rm=T)

False Positives

#t-test for false positve
t.test(garapid[4,],blinkrep[4,], paired = T,  na.rm=T)

True Positives

#t-test for true positive
t.test(garapid[5,],blinkrep[5,], paired = T,  na.rm=T)

Manhattan Plots

#simulate phenotypes
mySim=G2P(genotypes, h2, alpha, NQTN, distribution)
myY=cbind(rownames(mySim$y),mySim$y)
colnames(myY)=c("taxa","Sim")
#write phenotype table compatabile with BLINK
write.table(myY,file="myData.txt",quote=F,sep="\t",col.name=T,row.name=F)
#run BLINK numeric GWAS
system("blink --file myData --numeric --gwas")
blink.p <- read.table("Sim_GWAS_result.txt", header = T, stringsAsFactors = F, sep = "\t")
p=t(blink.p$p_value)
pres=power.fdr(p, QTN.position=mySim$QTN.position,cutoff=NULL)

#Run GWASapply and power.fdr functions
p6_p=GWASapply_rapid(mySim$y,genotypes,CV,PCA.M = 3)
p6_plots=power.fdr(p6_p$P.value.res,mySim$QTN.position)

Manhattan Plots

#plot manhattan plots together
gcm=manhattan_plot(GM,p,cutoff=NULL,mySim$QTN.position,FP=pres$FP.fdr.power,TP=pres$TP.fdr.power, trait = "BLINK",messages=FALSE)
gtm=manhattan_plot(GM,p6_p$P.value.res,cutoff=NULL,mySim$QTN.position,FP=p6_plots$FP.fdr.power,TP=p6_plots$TP.fdr.power, trait = "GWASapply",messages=FALSE)
grid.arrange(gcm, gtm, nrow = 2, ncol = 1, top = "Manhattan Plots")

QQ Plots

#plot qq plots together
gcq=qq_plot(GM,p,mySim$QTN.position, trait = "BLINK")
gtq=qq_plot(GM,p6_p$P.value.res,mySim$QTN.position, trait = "GWASapply")
grid.arrange(gcq,gtq, nrow = 2, ncol = 1, top = "QQ-Plots")

Second Dataset

gt_scan1 <- data.frame(read.table("GAPIT_Tutorial_Data/mdp_numeric1.txt", header = T, stringsAsFactors = F, sep = "\t", nrows = 1))
classes1 <- sapply(gt_scan1, class)
genotypes1 <- data.frame(read.table("GAPIT_Tutorial_Data/mdp_numeric1.txt", header = T, row.names = 1, colClasses = classes1, stringsAsFactors = F, sep = "\t"))
#import genetic map, phenotypes, and covariates
GM1 <- read.table("GAPIT_Tutorial_Data/mdp_SNP_information1.txt", header = T, stringsAsFactors = F, sep = "\t")

#run replicate
garapid1=replicate(30,{
mySim=G2P(genotypes1, h2, alpha, NQTN, distribution)
#start time for function
loop_start <- proc.time()
p6_rep=GWASapply_rapid(mySim$y,genotypes1,CV,PCA.M = 3)
loop_end <- proc.time()
#end time for function
loop_elapsed <- loop_end[3] - loop_start[3]
#power calculations
p6_power=power.fdr(p6_rep$P.value.res,mySim$QTN.position)
p6_pwr=mean(p6_power$power)
p6_fdr=mean(p6_power$fdr)
FalP=length(p6_power$FP.fdr.power)
TruP=length(p6_power$TP.fdr.power)
pwr.fdr=c(p6_pwr,p6_fdr,loop_elapsed,FalP,TruP)
#timet=c(loop_elapsed[[1]])
  })
means1=rowMeans(garapid1)
out8 <- rbind(means1[1],means1[2],means1[3],means1[4],means1[5])
out8 <- as.data.frame(out8)
colnames(out8) <- c("GWASapply Average")
rownames(out8) <-c("Power","FDR","Seconds","False Positive","True Postive")

#write map and numeric genotype files compatible with BLINK
myGM1=GM1
colnames(myGM1)=c("rs","chr","pos")
write.table(t(genotypes1),file="myData1.dat",quote=F,sep="\t",col.name=F,row.name=F)
write.table(myGM1,file="myData1.map",quote=F,sep="\t",col.name=T,row.name=F)
colnames(CV)=c("taxa","CV1","CV2")
write.table(CV,file="myData1.cov",quote=F,sep="\t",col.name=T,row.name=F)
#run replicate
blinkrep1=replicate(30,{
mySim=G2P(genotypes1, h2, alpha, NQTN, distribution)
myY=cbind(rownames(mySim$y),mySim$y)
colnames(myY)=c("taxa","Sim")
write.table(myY,file="myData1.txt",quote=F,sep="\t",col.name=T,row.name=F)
#start time for function
loop_start <- proc.time()
system("blink --file myData1 --numeric --gwas")
loop_end <- proc.time()
#end time for function
loop_elapsed <- loop_end[3] - loop_start[3]
#read in results
blink.p <- read.table("Sim_GWAS_result.txt", header = T, stringsAsFactors = F, sep = "\t")
p=t(blink.p$p_value)
#power calculations
p6_blink=power.fdr(p, QTN.position=mySim$QTN.position,cutoff=NULL)
p6_pwr_blink=mean(p6_blink$power)
p6_fdr_blink=mean(p6_blink$fdr)
FalP=length(p6_blink$FP.fdr.power)
TruP=length(p6_blink$TP.fdr.power)
pwr.fdr=c(p6_pwr_blink,p6_fdr_blink,loop_elapsed,FalP,TruP)
  })
means1=rowMeans(blinkrep1)
out9 <- rbind(means1[1],means1[2],means1[3],means1[4],means1[5])
out9 <- as.data.frame(out9)
colnames(out9) <- c("Blink Average")
rownames(out9) <-c("Power","FDR","Seconds","False Positive","True Postive")
out2=data.frame(out8,out9)

Comparing BLINk and GWASapply Dataset 2

#output results
kable(out2)

Time

#t-test for time
t.test(garapid1[3,],blinkrep1[3,], paired = T,  na.rm=T)

Third Dataset

gt_scan2 <- data.frame(read.table("GAPIT_Tutorial_Data/mdp_numeric2.txt", header = T, stringsAsFactors = F, sep = "\t", nrows = 1))
classes2 <- sapply(gt_scan2, class)
genotypes2 <- data.frame(read.table("GAPIT_Tutorial_Data/mdp_numeric2.txt", header = T, row.names = 1, colClasses = classes2, stringsAsFactors = F, sep = "\t"))
#import genetic map, phenotypes, and covariates
GM2 <- read.table("GAPIT_Tutorial_Data/mdp_SNP_information2.txt", header = T, stringsAsFactors = F, sep = "\t")

#run replicate
garapid2=replicate(30,{
mySim=G2P(genotypes2, h2, alpha, NQTN, distribution)
#start time for function
loop_start <- proc.time()
p6_rep=GWASapply_rapid(mySim$y,genotypes2,CV,PCA.M = 3)
loop_end <- proc.time()
#end time for function
loop_elapsed <- loop_end[3] - loop_start[3]
#power calculations
p6_power=power.fdr(p6_rep$P.value.res,mySim$QTN.position)
p6_pwr=mean(p6_power$power)
p6_fdr=mean(p6_power$fdr)
FalP=length(p6_power$FP.fdr.power)
TruP=length(p6_power$TP.fdr.power)
pwr.fdr=c(p6_pwr,p6_fdr,loop_elapsed,FalP,TruP)
#timet=c(loop_elapsed[[1]])
  })
means2=rowMeans(garapid2)
out10 <- rbind(means2[1],means2[2],means2[3],means2[4],means2[5])
out10 <- as.data.frame(out10)
colnames(out10) <- c("GWASapply Average")
rownames(out10) <-c("Power","FDR","Seconds","False Positive","True Postive")

#write map and numeric genotype files compatible with BLINK
myGM2=GM2
colnames(myGM2)=c("rs","chr","pos")
write.table(t(genotypes2),file="myData2.dat",quote=F,sep="\t",col.name=F,row.name=F)
write.table(myGM2,file="myData2.map",quote=F,sep="\t",col.name=T,row.name=F)
colnames(CV)=c("taxa","CV1","CV2")
write.table(CV,file="myData2.cov",quote=F,sep="\t",col.name=T,row.name=F)
#run replicate
blinkrep2=replicate(30,{
mySim=G2P(genotypes2, h2, alpha, NQTN, distribution)
myY=cbind(rownames(mySim$y),mySim$y)
colnames(myY)=c("taxa","Sim")
write.table(myY,file="myData2.txt",quote=F,sep="\t",col.name=T,row.name=F)
#start time for function
loop_start <- proc.time()
system("blink --file myData2 --numeric --gwas")
loop_end <- proc.time()
#end time for function
loop_elapsed <- loop_end[3] - loop_start[3]
#read in results
blink.p <- read.table("Sim_GWAS_result.txt", header = T, stringsAsFactors = F, sep = "\t")
p=t(blink.p$p_value)
#power calculations
p6_blink=power.fdr(p, QTN.position=mySim$QTN.position,cutoff=NULL)
p6_pwr_blink=mean(p6_blink$power)
p6_fdr_blink=mean(p6_blink$fdr)
FalP=length(p6_blink$FP.fdr.power)
TruP=length(p6_blink$TP.fdr.power)
pwr.fdr=c(p6_pwr_blink,p6_fdr_blink,loop_elapsed,FalP,TruP)
  })
means2=rowMeans(blinkrep2)
out11 <- rbind(means2[1],means2[2],means2[3],means2[4],means2[5])
out11 <- as.data.frame(out11)
colnames(out11) <- c("Blink Average")
rownames(out11) <-c("Power","FDR","Seconds","False Positive","True Postive")
out3=data.frame(out10,out11)

Comparing BLINk and GWASapply Dataset 3

#output results
kable(out3)

Time

#t-test for time
t.test(garapid2[3,],blinkrep2[3,], paired = T,  na.rm=T)

stp4freedom/GWhEAT documentation built on April 3, 2020, 4:21 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

stp4freedom/GWhEAT Performed GWAS in a fast and efficient manner

In stp4freedom/GWhEAT: Performed GWAS in a fast and efficient manner

Professor: Zhiwu Zhang

Due on March 30, 2020, Monday, 3:10PM, PST

Problem 1 (15 Pts)

Please see GWAS_apply.R for the source code to these questions

Problem 2 (25 Pts)

Please see GWAS_apply.R for the source code to these questions

Problem 3 (20 Pts)

Please see User_Manual_Tutorial.Rmd and User_Manual_Tutorial.html for the user manual and tutorials.

Problem 4 (15 Pts)

Methods:

Results and Interpretation:

Perform GWAS and display results

GWAS Threshold

Significant SNPs and Minor Allele Frequency

Table 1. Significant SNP Results.

Problem 5 (25 Pts)

Methods:

Results and Interpretation:

Simulated Data

Manhattan Plots

QQ Plots

ROC Curve Power vs FDR

ROC Curve Power vs Type 1

Mean Power, FDR, and True and False Positives

GWASbyCor

GWASapply

Statistical Inference

Power

FDR

False Positives

True Positives

Problem 6 (25 Pts)

Methods:

Results and Interpretation:

Comparing BLINk and GWASapply

Power

FDR

Time

False Positives

True Positives

Manhattan Plots

Manhattan Plots

QQ Plots

Second Dataset

Comparing BLINk and GWASapply Dataset 2

Time

Third Dataset

Comparing BLINk and GWASapply Dataset 3

Time

R Package Documentation

Browse R Packages

We want your feedback!

stp4freedom/GWhEAT
Performed GWAS in a fast and efficient manner