Lance F. Merrick November 22, 2022
Wrappers for Easy Association, Tools, and Selection for Breeders (WhEATBreeders)
WhEATBreeders was created to lower the bar for plant breeders who want to implement genomic selection models within their own breeding programs. It includes functions for genotype quality control and filtering, as well as easy-to-use wrappers for the most commonly used models in many scenarios, with either K-fold cross-validation or validation sets. You can also implement GWAS-assisted genomic selection. Our function “WHEAT” is a full wrapper for quality control and genomic selection. Additionally, we walk through the setup of unreplicated data using adjusted means and the calculation of Cullis heritability. We also go through multi-output and multi-trait wrappers for GWAS in GAPIT. Finally, we walk through cross-prediction using PopVar, rrBLUP, and sommer.
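To make the K-fold idea concrete, here is a minimal base-R sketch of how lines might be split into folds. It is an illustration only and not the code WhEATBreeders runs internally; the helper name make_folds and the toy line names are hypothetical.

```r
# Hypothetical helper: randomly assign n individuals to k folds of near-equal size.
make_folds <- function(n, k = 5, seed = 1) {
  set.seed(seed)
  # rep_len() cycles 1..k so fold sizes differ by at most one; sample() shuffles them
  sample(rep_len(seq_len(k), n))
}

line_names <- paste0("Line", 1:23)     # toy genotype names (made up)
fold_id <- make_folds(length(line_names), k = 5)

for (f in 1:5) {
  valid_lines <- line_names[fold_id == f]   # held out for validation in this fold
  train_lines <- line_names[fold_id != f]   # used to train the model
  cat("Fold", f, ":", length(train_lines), "training,",
      length(valid_lines), "validation\n")
}
```

Each line is used for validation exactly once, so every phenotype contributes to the accuracy estimate. With a validation set, by contrast, one population is used only for training and another only for prediction.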
For a full list of functions within WhEATBreeders, see “Reference_Manual.pdf”. This file contains the full list of functions, a description of each, and each function’s arguments. As with all R packages, once WhEATBreeders is installed and loaded you can type ?function_name and that function’s full description will appear in the Help tab in RStudio.
First, if you do not already have R and RStudio installed on your computer, head over to https://www.r-project.org/ and install the version appropriate for your machine. Once R and RStudio are installed, you will need to install the WhEATBreeders package. Because this is a working package in its early stages of development, it is only available through GitHub. To install packages from GitHub, first install and load the package “devtools” using the code below.
For a deep dive into all the code in this package and the inner workings of the functions, please review the file WhEATBreeders_DeepDive_Into_Code.Rmd. It covers a lot, including the adjusted means for single-plot trials, all the quality control and GS models, GWAS, Manhattan plots, and even PopVar tutorials. There are also GBS and heterozygote calling pipelines available in the relevant folder.
if (!require("pacman")) install.packages("pacman")
pacman::p_load(devtools)
library(devtools)
#Better for FDR function
devtools::install_github("jiabowang/GAPIT3",force=TRUE)
library(GAPIT3)
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("impute")
#From the source
#require(compiler) #for cmpfun
#Only if you want the source code
#source("http://zzlab.net/GAPIT/GAPIT.library.R")
#source("http://zzlab.net/GAPIT/gapit_functions.txt") #make sure compiler is running
#source("http://zzlab.net/GAPIT/emma.txt")
if (!require("pacman")) install.packages("pacman")
pacman::p_load(BGLR,
rrBLUP,
caret,
tidyr,
dplyr,
Hmisc,
WeightIt,
mpath,
glmnetUtils,
glmnet,
MASS,
Metrics,
stringr,
lsa,
keras,
tensorflow,
BMTME,
plyr,
data.table,
bigmemory,
biganalytics,
ggplot2,
tidyverse,
knitr,
cvTools,
vcfR,
compiler,
gdata,
PopVar,
BLR,
sommer,
heritability,
arm,
optimx,
purrr,
psych,
lme4,
lmerTest,
gridExtra,
grid,
readxl,
devtools)
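If some of these packages fail to install, it can help to see which ones are still missing before moving on. The snippet below is a small sketch using only base R; the package vector is shortened for illustration and should be extended with the full list from the p_load() call.

```r
# Packages to check; shortened here for illustration.
pkgs <- c("rrBLUP", "BGLR", "sommer", "PopVar", "data.table")

# requireNamespace() returns TRUE/FALSE without attaching the package.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```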
Next, using the code below, download and install the package WhEATBreeders from GitHub. The bottom two lines of code in the chunk below make sure the dependencies WhEATBreeders relies on are also downloaded and installed.
#install package
install_github("lfmerrick21/WhEATBreeders")
library(WhEATBreeders)#package name
library(data.table)
#Read the genotype (Hapmap) and phenotype files; replace these paths with your own
Genotype<-fread("F:\\OneDrive\\OneDrive - Washington State University (email.wsu.edu)\\Documents\\Genomic Selection\\Genomic Selection Pipeline\\Jason_GBS\\WAC_2016-2020_production_filt.hmp.txt",fill=TRUE)
Phenotype<-fread("F:\\OneDrive\\OneDrive - Washington State University (email.wsu.edu)\\Documents\\Genomic Selection\\Genomic Selection Pipeline\\GS-Complex-Traits\\BL_EM_Pheno.csv",header=TRUE)
#Drop the first column
Phenotype=Phenotype[,-1]
#Reshape from wide (one column per environment) to long format
Phenotype1=Phenotype %>%
pivot_longer(!Genotype, names_to = "Env", values_to = "EM")
Phenotype=Phenotype1
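To see what the reshaping step above does, the following sketch applies the same pivot_longer() call to a tiny made-up table. The column names F5_2015 and DH_2020 mirror the trials used in this tutorial; the line names and values are invented for illustration.

```r
library(tidyr)

# Toy wide table (made-up values): one row per line, one column per environment
wide <- data.frame(Genotype = c("LineA", "LineB"),
                   F5_2015  = c(10.2, 8.7),
                   DH_2020  = c(11.5, 9.1))

# Every column except Genotype is stacked into Env (old column name) and EM (value)
long <- pivot_longer(wide, !Genotype, names_to = "Env", values_to = "EM")
print(long)
```

The result has one row per line-by-environment combination, the layout that the Phenotype object has when it is passed to WHEAT in the calls that follow.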
LIND_QC=WHEAT(Phenotype=Phenotype,
Genotype=Genotype,
QC=TRUE,
GS=FALSE,
#QC Info
Geno_Type="Hapmap",
Imputation="Beagle",
Filter=TRUE,
Missing_Rate=0.20,
MAF=0.05,
#Do not remove individuals
Filter_Ind=FALSE,
Missing_Rate_Ind=0.80,
Trait=c("EM"),
Study="Tutorial",
Outcome="Tested",
Trial=c("F5_2015","DH_2020"),
Scheme="K-Fold",#K-Fold or VS
Method="Two-Step", #Two-Step or #One-Step
Messages=TRUE)
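The Missing_Rate and MAF arguments in the call above set the marker filtering thresholds. As a rough sketch of what such filters mean, the base-R code below applies a 20% missing-rate cutoff and a 5% minor allele frequency cutoff to a simulated marker matrix. This is not the package's internal code: the 0/1/2 coding, the object names, and whether each threshold is inclusive are assumptions made for illustration.

```r
set.seed(42)
n_ind <- 50; n_mrk <- 8
# Toy marker matrix coded 0/1/2 (copies of one allele); all values are simulated
M <- matrix(rbinom(n_ind * n_mrk, size = 2, prob = 0.3), nrow = n_ind,
            dimnames = list(NULL, paste0("SNP", seq_len(n_mrk))))
M[sample(n_ind, 20), "SNP1"] <- NA   # make SNP1 40% missing
M[, "SNP2"] <- 0                     # make SNP2 monomorphic (MAF = 0)

missing_rate <- colMeans(is.na(M))       # fraction of missing calls per marker
p   <- colMeans(M, na.rm = TRUE) / 2     # frequency of the counted allele
maf <- pmin(p, 1 - p)                    # minor allele frequency

# Keep markers that pass both thresholds
keep <- missing_rate <= 0.20 & maf >= 0.05
M_filt <- M[, keep, drop = FALSE]
colnames(M_filt)
```

Here SNP1 is dropped for excessive missing data and SNP2 for being monomorphic. Filtering individuals (Filter_Ind, Missing_Rate_Ind) follows the same logic applied to rows instead of columns.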
load(file="Tutorial_Filt_Imputed.RData")
#Input
F515_DH20=WHEAT(Phenotype=Phenotype,
GDre=GDre,
GT=GT,
GIre=GIre,
#GS Info
Type="Regression",
Replications=2,
Training="F5_2015",
Prediction="DH_2020",
CV=NULL,
PC=NULL,
Trait="EM",
Study="Tutorial",
Outcome="Untested",
Trial=c("F5_2015","DH_2020"),
Scheme="VS",
Method="Two-Step",
Package="rrBLUP",
model="rrBLUP",
Kernel="Markers",
markers=NULL,
folds = 5,
nIter = 1500,
burnIn = 500,
Sparse=FALSE,
m=NULL,
degree=NULL,
nL=NULL,
transformation="none",
#GAGS Info
GAGS=FALSE,
PCA.total=3,
QTN=10,
GWAS=c("BLINK"),
alpha=0.05,
threshold="none",
GE=TRUE,
UN=FALSE,
GE_model="MTME",
sampling="up",
repeats=5,
method="repeatedcv",
digits=4,
nCVI=5,
Messages=TRUE)
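The call above trains an rrBLUP model on one trial and predicts another (Scheme="VS"). To illustrate the general idea behind ridge-regression BLUP with a validation set, here is a self-contained base-R sketch on simulated data. It is a simplified stand-in, not the wrapper's implementation: the shrinkage parameter lambda is fixed by assumption here, whereas the rrBLUP package estimates the variance components from the data, and all object names are hypothetical.

```r
set.seed(1)
n_train <- 80; n_pred <- 20; n_mrk <- 200
# Simulated markers coded -1/0/1 and simulated marker effects (nothing here is real data)
Z <- matrix(sample(c(-1, 0, 1), (n_train + n_pred) * n_mrk, replace = TRUE),
            ncol = n_mrk)
true_u <- rnorm(n_mrk, sd = 0.2)
y <- 5 + as.vector(Z %*% true_u) + rnorm(n_train + n_pred)

train <- seq_len(n_train)            # plays the role of the training trial
pred  <- n_train + seq_len(n_pred)   # plays the role of the prediction trial

lambda <- 25   # assumed ratio of residual variance to marker-effect variance
Zt <- Z[train, ]
mu <- mean(y[train])
# Ridge solution for marker effects: (Z'Z + lambda * I)^-1 Z'(y - mu)
u_hat <- solve(crossprod(Zt) + diag(lambda, n_mrk),
               crossprod(Zt, y[train] - mu))

gebv <- mu + as.vector(Z[pred, ] %*% u_hat)   # predicted values for the prediction set
accuracy <- cor(gebv, y[pred])                # Pearson correlation with held-out phenotypes
```

In this sketch, accuracy is the correlation between predicted and observed values in the set that was not used for training.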
load(file ='GBS_2_Tutorial.RData')
#Input
F515_DH20=WHEAT(Phenotype=Phenotype,
GBS_Train=GBS_2_F5_2015_EM,
GBS_Predict=GBS_2_Untested_DH_2020_EM,
#GS Info
Type="Regression",
Replications=2,
Training="F5_2015",
Prediction="DH_2020",
CV=NULL,
PC=NULL,
Trait="EM",
Study="Tutorial",
Outcome="Untested",
Trial=c("F5_2015","DH_2020"),
Scheme="VS",
Method="Two-Step",
Package="rrBLUP",
model="rrBLUP",
Kernel="Markers",
markers=NULL,
folds = 5,
nIter = 1500,
burnIn = 500,
Sparse=FALSE,
m=NULL,
degree=NULL,
nL=NULL,
transformation="none",
#GAGS Info
GAGS=FALSE,
PCA.total=3,
QTN=10,
GWAS=c("BLINK"),
alpha=0.05,
threshold="none",
GE=TRUE,
UN=FALSE,
GE_model="MTME",
sampling="up",
repeats=5,
method="repeatedcv",
digits=4,
nCVI=5,
Messages=TRUE)
F515_DH20=WHEAT(Phenotype=Phenotype,
Genotype=Genotype,
QC=TRUE,
GS=TRUE,
#QC Info
Geno_Type="Hapmap",
Imputation="Beagle",
Filter=TRUE,
Missing_Rate=0.20,
MAF=0.05,
Filter_Ind=TRUE,
Missing_Rate_Ind=0.80,
#If QC Info is FALSE
GDre=NULL,
GT=NULL,
GIre=NULL,
GBS_Train=NULL,
GBS_Predict=NULL,
Matrix=NULL,
#GS Info
Type="Regression",
Replications=2,
Training="F5_2015",
Prediction="DH_2020",
CV=NULL,
PC=NULL,
Trait="EM",
Study="Tutorial",
Outcome="Untested",
Trial=c("F5_2015","DH_2020"),
Scheme="VS",
Method="Two-Step",
Package="rrBLUP",
model="rrBLUP",
Kernel="Markers",
markers=NULL,
folds = 5,
nIter = 1500,
burnIn = 500,
Sparse=FALSE,
m=NULL,
degree=NULL,
nL=NULL,
transformation="none",
#GAGS Info
GAGS=FALSE,
PCA.total=3,
QTN=10,
GWAS=c("BLINK"),
alpha=0.05,
threshold="none",
GE=TRUE,
UN=FALSE,
GE_model="MTME",
Messages=TRUE)
load(file="Tutorial_Filt_Imputed.RData")
#Input
F515_DH20=WHEAT(Phenotype=Phenotype,
GDre=GDre,
GT=GT,
GIre=GIre,
#GS Info
Type="Regression",
Replications=2,
Training="F5_2015",
Prediction="DH_2020",
CV=NULL,
PC=NULL,
Trait="EM",
Study="Tutorial",
Outcome="Untested",
Trial=c("F5_2015","DH_2020"),
Scheme="K-Fold",
Method="One-Step",
Package="BGLR",
Kernel="Gaussian",
markers=NULL,
folds = 5,
nIter = 1500,
burnIn = 500,
Sparse=FALSE,
m=NULL,
degree=NULL,
nL=NULL,
transformation="none",
#GAGS Info
GAGS=FALSE,
PCA.total=3,
QTN=10,
GWAS=c("BLINK"),
alpha=0.05,
threshold="none",
GE=TRUE,
UN=FALSE,
GE_model="MTME",
sampling="up",
repeats=5,
method="repeatedcv",
digits=4,
nCVI=5,
Messages=TRUE)
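The call above sets Kernel="Gaussian". As a rough illustration of what a Gaussian kernel is, the base-R sketch below builds one from a simulated marker matrix. The bandwidth h and the scaling of distances by their median are assumptions chosen for this example, not necessarily what WhEATBreeders or BGLR uses.

```r
set.seed(7)
# Simulated marker matrix: 6 individuals x 100 markers coded 0/1/2
M <- matrix(rbinom(6 * 100, size = 2, prob = 0.4), nrow = 6,
            dimnames = list(paste0("Line", 1:6), NULL))

D2 <- as.matrix(dist(M))^2    # squared Euclidean distance between individuals
h  <- 1                       # bandwidth; an assumed value, usually tuned
# Scale distances by their median (one common choice, assumed here)
K  <- exp(-h * D2 / median(D2[upper.tri(D2)]))

round(K, 2)
```

K is a symmetric matrix with ones on the diagonal, and its off-diagonal values shrink toward zero as two lines become more different at the marker level. A matrix like this takes the place of the marker matrix as the relationship between individuals in a kernel-based model.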
load(file ='GBS_2_Tutorial.RData')
#Input
F515_DH20=WHEAT(Phenotype=Phenotype,
Matrix=Matrix,
#GS Info
Type="Regression",
Replications=2,
Training="F5_2015",
Prediction="DH_2020",
CV=NULL,
PC=NULL,
Trait="EM",
Study="Tutorial",
Outcome="Untested",
Trial=c("F5_2015","DH_2020"),
Scheme="K-Fold",
Method="One-Step",
Package="BGLR",
Kernel="Gaussian",
markers=NULL,
folds = 5,
nIter = 1500,
burnIn = 500,
Sparse=FALSE,
m=NULL,
degree=NULL,
nL=NULL,
transformation="none",
#GAGS Info
GAGS=FALSE,
PCA.total=3,
QTN=10,
GWAS=c("BLINK"),
alpha=0.05,
threshold="none",
GE=TRUE,
UN=FALSE,
GE_model="MTME",
sampling="up",
repeats=5,
method="repeatedcv",
digits=4,
nCVI=5,
Messages=TRUE)
LIND_QC_One_Step=WHEAT(Phenotype=Phenotype,
Genotype=Genotype,
QC=TRUE,
GS=FALSE,
#QC Info
Geno_Type="Hapmap",
Imputation="Beagle",
Filter=TRUE,
Missing_Rate=0.20,
MAF=0.05,
#Do not remove individuals
Filter_Ind=FALSE,
Missing_Rate_Ind=0.80,
Trait=c("EM"),
Study="Tutorial",
Outcome="Tested",
Trial=c("F5_2015","DH_2020"),
Scheme="K-Fold",#K-Fold or VS
Method="One-Step", #Two-Step or #One-Step
Kernel="Gaussian",
folds = 5,
Sparse=FALSE,
m=NULL,
degree=NULL,
nL=NULL,
GE=TRUE,
UN=FALSE,
GE_model="MTME",
Messages=TRUE)
LIND_QC_One_Step=WHEAT(Phenotype=Phenotype,
Genotype=Genotype,
QC=TRUE,
GS=FALSE,
#QC Info
Geno_Type="Hapmap",
Imputation="Beagle",
Filter=TRUE,
Missing_Rate=0.20,
MAF=0.05,
#Do not remove individuals
Filter_Ind=FALSE,
Missing_Rate_Ind=0.80,
Trait=c("EM"),
Study="Tutorial",
Outcome="Tested",
Trial=c("F5_2015","DH_2020"),
Scheme="K-Fold",#K-Fold or VS
Method="One-Step", #Two-Step or #One-Step
Kernel="Gaussian",
folds = 5,
Sparse=FALSE,
m=NULL,
degree=NULL,
nL=NULL,
GE=TRUE,
UN=FALSE,
GE_model="BMTME",
Messages=TRUE)
LIND_QC_One_Step=WHEAT(Phenotype=Phenotype,
Genotype=Genotype,
QC=TRUE,
GS=FALSE,
#QC Info
Geno_Type="Hapmap",
Imputation="Beagle",
Filter=TRUE,
Missing_Rate=0.20,
MAF=0.05,
#Do not remove individuals
Filter_Ind=FALSE,
Missing_Rate_Ind=0.80,
Trait=c("EM"),
Study="Tutorial",
Outcome="Tested",
Trial=c("F5_2015","DH_2020"),
Scheme="K-Fold",#K-Fold or VS
Method="One-Step", #Two-Step or #One-Step
Kernel="Gaussian",
folds = 5,
Sparse=FALSE,
m=NULL,
degree=NULL,
nL=NULL,
GE=TRUE,
UN=FALSE,
GE_model="MTME",
Messages=TRUE)
LIND_QC_One_Step=WHEAT(Phenotype=Phenotype,
Genotype=Genotype,
QC=TRUE,
GS=FALSE,
#QC Info
Geno_Type="Hapmap",
Imputation="Beagle",
Filter=TRUE,
Missing_Rate=0.20,
MAF=0.05,
#Do not remove individuals
Filter_Ind=FALSE,
Missing_Rate_Ind=0.80,
Trait=c("EM"),
Study="Tutorial",
Outcome="Tested",
Trial=c("F5_2015","DH_2020"),
Scheme="K-Fold",#K-Fold or VS
Method="One-Step", #Two-Step or #One-Step
Kernel="Gaussian",
folds = 5,
Sparse=FALSE,
m=NULL,
degree=NULL,
nL=NULL,
GE=TRUE,
UN=TRUE,
GE_model="MEI",
Messages=TRUE)
LIND_QC_One_Step=WHEAT(Phenotype=Phenotype,
Genotype=Genotype,
QC=TRUE,
GS=FALSE,
#QC Info
Geno_Type="Hapmap",
Imputation="Beagle",
Filter=TRUE,
Missing_Rate=0.20,
MAF=0.05,
#Do not remove individuals
Filter_Ind=FALSE,
Missing_Rate_Ind=0.80,
Trait=c("EM"),
Study="Tutorial",
Outcome="Tested",
Trial=c("F5_2015","DH_2020"),
Scheme="K-Fold",#K-Fold or VS
Method="One-Step", #Two-Step or #One-Step
Kernel="Gaussian",
folds = 5,
Sparse=FALSE,
m=NULL,
degree=NULL,
nL=NULL,
GE=TRUE,
UN=FALSE,
GE_model="MTME",
Messages=TRUE)
LIND_QC_One_Step=WHEAT(Phenotype=Phenotype,
Genotype=Genotype,
QC=TRUE,
GS=FALSE,
#QC Info
Geno_Type="Hapmap",
Imputation="Beagle",
Filter=TRUE,
Missing_Rate=0.20,
MAF=0.05,
#Do not remove individuals
Filter_Ind=FALSE,
Missing_Rate_Ind=0.80,
Trait=c("EM"),
Study="Tutorial",
Outcome="Tested",
Trial=c("F5_2015","DH_2020"),
Scheme="K-Fold",#K-Fold or VS
Method="One-Step", #Two-Step or #One-Step
Kernel="Gaussian",
folds = 5,
Sparse=FALSE,
m=NULL,
degree=NULL,
nL=NULL,
GE=TRUE,
UN=FALSE,
GE_model="BMTME",
Messages=TRUE)
LIND_QC_One_Step=WHEAT(Phenotype=Phenotype,
Genotype=Genotype,
QC=TRUE,
GS=FALSE,
#QC Info
Geno_Type="Hapmap",
Imputation="Beagle",
Filter=TRUE,
Missing_Rate=0.20,
MAF=0.05,
#Do not remove individuals
Filter_Ind=FALSE,
Missing_Rate_Ind=0.80,
Trait=c("EM"),
Study="Tutorial",
Outcome="Tested",
Trial=c("F5_2015","DH_2020"),
Scheme="K-Fold",#K-Fold or VS
Method="One-Step", #Two-Step or #One-Step
Kernel="Gaussian",
folds = 5,
Sparse=FALSE,
m=NULL,
degree=NULL,
nL=NULL,
GE=TRUE,
UN=FALSE,
GE_model="MTME",
Messages=TRUE)
LIND_QC_One_Step=WHEAT(Phenotype=Phenotype,
Genotype=Genotype,
QC=TRUE,
GS=FALSE,
#QC Info
Geno_Type="Hapmap",
Imputation="Beagle",
Filter=TRUE,
Missing_Rate=0.20,
MAF=0.05,
#Do not remove individuals
Filter_Ind=FALSE,
Missing_Rate_Ind=0.80,
Trait=c("EM"),
Study="Tutorial",
Outcome="Tested",
Trial=c("F5_2015","DH_2020"),
Scheme="K-Fold",#K-Fold or VS
Method="One-Step", #Two-Step or #One-Step
Kernel="Gaussian",
folds = 5,
Sparse=FALSE,
m=NULL,
degree=NULL,
nL=NULL,
GE=TRUE,
UN=TRUE,
GE_model="MEI",
Messages=TRUE)
For more information on individual functions, please see “Reference_Manual.pdf” or type ?FUNCTION_NAME into the R console; this will pull up specific information on each function inside WhEATBreeders. For example, typing ?manhattan_plot will pull up the help page with details about the function that creates the Manhattan plots.